smx.datasets.synthetic#
Functions#
- gaussian_peak_model: Generate a one-dimensional Gaussian peak.
- generate_synthetic_spectral_data: Generate a synthetic spectral dataset for multiple classes.
Module Contents#
- smx.datasets.synthetic.gaussian_peak_model(x, center, amplitude, width)[source]#
Generate a one-dimensional Gaussian peak.
Implements the equation g(x) = A * exp(-(x - c)² / (2σ²)), where A is amplitude, c is center, and σ is width.
Parameters#
- x : array_like
Spectral axis (wavelengths, energy, channels).
- center : float
Central position of the peak (same units as x).
- amplitude : float
Maximum height of the peak (intensity at the center).
- width : float
Standard deviation (σ) of the peak, in the same units as x; controls the spread of the peak.
Returns#
- ndarray
Array with the Gaussian peak evaluated at each point of x.
Notes#
For XRF: use a small width (~5–15) to simulate narrow lines.
For Vis-NIR: use a larger width (~20–50) for broad absorption bands.
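The equation above can be sketched directly in NumPy. This is a minimal illustration, not the library's source: the function name `gaussian_peak` is a hypothetical stand-in, and the widths are picked from the ranges given in the notes.

```python
import numpy as np

def gaussian_peak(x, center, amplitude, width):
    """Evaluate g(x) = A * exp(-(x - c)^2 / (2 * sigma^2)) at each point of x.

    Hypothetical stand-in for gaussian_peak_model, written from the formula.
    """
    x = np.asarray(x, dtype=float)
    return amplitude * np.exp(-((x - center) ** 2) / (2.0 * width ** 2))

# Spectral axis from 0 to 1000 in steps of 2, so x = 500 is a grid point.
x = np.linspace(0, 1000, 501)

# Narrow line (XRF-like) and broad band (Vis-NIR-like) at the same position.
narrow = gaussian_peak(x, center=500, amplitude=1.0, width=10)
broad = gaussian_peak(x, center=500, amplitude=1.0, width=40)
```

Both curves reach the same height at the center. One standard deviation away from the center, the intensity falls to exp(-0.5), roughly 61% of the amplitude.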
- smx.datasets.synthetic.generate_synthetic_spectral_data(classes_config, n_points=500, x_min=0, x_max=1000, seed=None)[source]#
Generate a synthetic spectral dataset for multiple classes.
Returns a DataFrame where:
- First column: 'Class' (values defined by the user: 'A', 'B', 'C', …).
- Remaining columns: spectral variables (intensity values).
- Rows: individual samples.
Parameters#
- classes_config : list of dict
List of dicts, each defining one class. Supported keys:
  - 'name' (str): class label (e.g. 'A', 'B', 'Soil').
  - 'n_samples' (int): number of samples to generate.
  - 'peaks' (list): peak definitions on the spectral axis. Supported formats:
        [250, 550, 700]
    or:
        [
            {'center': 250, 'amplitude_mean': 0.9, 'width_mean': 10},
            {'center': 550, 'amplitude_mean': 1.3, 'width_mean': 18},
            {'center': 700, 'amplitude_mean': 0.7, 'width_mean': 25},
        ]
    The second form allows per-peak amplitude/width customisation. Optional per-peak keys: amplitude_mean, amplitude_std, width_mean, width_std. Missing keys fall back to the class-level defaults below.
  - 'amplitude_mean' (float, optional, default 1.0): mean peak amplitude.
  - 'amplitude_std' (float, optional, default 0.1): std dev of amplitude.
  - 'width_mean' (float, optional, default 15.0): mean peak width (σ).
  - 'width_std' (float, optional, default 2.0): std dev of peak width.
  - 'noise_std' (float, optional, default 0.02): std dev of baseline noise.
Example:
    [
        {
            'name': 'A',
            'n_samples': 50,
            'peaks': [
                {'center': 250, 'amplitude_mean': 0.9, 'width_mean': 12},
                {'center': 550, 'amplitude_mean': 1.4, 'width_mean': 20},
                {'center': 700, 'amplitude_mean': 0.8, 'width_mean': 16},
                {'center': 850, 'amplitude_mean': 1.1, 'width_mean': 24},
            ],
            'amplitude_mean': 1.0,
            'amplitude_std': 0.1,
            'width_mean': 15.0,
            'width_std': 2.0,
        },
        {
            'name': 'B',
            'n_samples': 50,
            'peaks': [250, 700, 850],
            'amplitude_mean': 1.2,
            'width_mean': 20.0,
        },
    ]
- n_points : int, default 500
Number of points on the spectral axis (resolution).
- x_min, x_max : float, default 0, 1000
Limits of the spectral axis (e.g. 400–1000 nm for Vis-NIR, 0–40 keV for XRF).
- seed : int, optional
Random seed for reproducibility.
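The two accepted 'peaks' formats and the fallback to class-level defaults can be illustrated with a small sketch. The helper `resolve_peaks` and the constant `CLASS_DEFAULTS` are hypothetical names introduced here for illustration; the library's internal handling may differ. Mixing both formats in a single list is also an assumption made for brevity.

```python
# Class-level defaults as listed in the parameter description.
CLASS_DEFAULTS = {
    "amplitude_mean": 1.0,
    "amplitude_std": 0.1,
    "width_mean": 15.0,
    "width_std": 2.0,
}

def resolve_peaks(class_cfg):
    """Return one fully specified dict per peak (hypothetical helper)."""
    # Class-level values override the documented defaults.
    class_level = {k: class_cfg.get(k, v) for k, v in CLASS_DEFAULTS.items()}
    resolved = []
    for peak in class_cfg["peaks"]:
        # A bare number is treated as shorthand for {'center': number}.
        spec = dict(peak) if isinstance(peak, dict) else {"center": peak}
        # Per-peak keys override class-level values.
        resolved.append({**class_level, **spec})
    return resolved

cfg_b = {
    "name": "B",
    "n_samples": 50,
    "peaks": [250, {"center": 700, "width_mean": 25}],
    "amplitude_mean": 1.2,
    "width_mean": 20.0,
}
peaks_b = resolve_peaks(cfg_b)
```

Here the first peak inherits `width_mean=20.0` from the class, while the second keeps its own `width_mean=25`. Both take `amplitude_std=0.1` from the documented defaults.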
Returns#
- df : pandas.DataFrame
Synthetic spectral dataset.
Column 0: 'Class' (str, class name from classes_config).
Columns 1 … n_points: spectral intensities named after x-axis values.
Shape: (total_samples, n_points + 1).
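To make the returned layout concrete, the following self-contained sketch builds a DataFrame with the documented shape. It is an illustrative stand-in, not the smx implementation: the function name `sketch_generate`, the use of normal draws for amplitude, width, and noise, and the two-decimal column names are all assumptions. It also handles only the bare-center 'peaks' format.

```python
import numpy as np
import pandas as pd

def sketch_generate(classes_config, n_points=500, x_min=0, x_max=1000, seed=None):
    """Illustrative stand-in for generate_synthetic_spectral_data."""
    rng = np.random.default_rng(seed)
    x = np.linspace(x_min, x_max, n_points)
    labels, rows = [], []
    for cfg in classes_config:
        for _ in range(cfg["n_samples"]):
            # Start from baseline noise, then add one Gaussian per peak.
            y = rng.normal(0.0, cfg.get("noise_std", 0.02), n_points)
            for center in cfg["peaks"]:  # bare-center form only
                amp = rng.normal(cfg.get("amplitude_mean", 1.0),
                                 cfg.get("amplitude_std", 0.1))
                # abs() keeps the sampled width positive (assumption).
                sigma = abs(rng.normal(cfg.get("width_mean", 15.0),
                                       cfg.get("width_std", 2.0)))
                y += amp * np.exp(-((x - center) ** 2) / (2.0 * sigma ** 2))
            labels.append(cfg["name"])
            rows.append(y)
    # Spectral columns are named after the x-axis values (format assumed).
    df = pd.DataFrame(np.vstack(rows), columns=[f"{v:.2f}" for v in x])
    df.insert(0, "Class", labels)
    return df

config = [
    {"name": "A", "n_samples": 3, "peaks": [250, 550]},
    {"name": "B", "n_samples": 2, "peaks": [700]},
]
df = sketch_generate(config, n_points=100, seed=0)
```

With 3 + 2 samples and `n_points=100`, the result has 5 rows and 101 columns, with 'Class' first. Passing the same `seed` again reproduces the same table.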