smx.datasets.synthetic

Functions

gaussian_peak_model(x, center, amplitude, width)

Generate a one-dimensional Gaussian peak.

generate_synthetic_spectral_data(classes_config[, ...])

Generate a synthetic spectral dataset for multiple classes.

Module Contents

smx.datasets.synthetic.gaussian_peak_model(x, center, amplitude, width)

Generate a one-dimensional Gaussian peak.

Implements the equation g(x) = A * exp(-(x - c)² / (2σ²)), where A is amplitude, c is center, and σ is width.

Parameters

x : array_like

Spectral axis (wavelengths, energy, channels).

center : float

Central position of the peak (same units as x).

amplitude : float

Maximum height of the peak (intensity at the center).

width : float

Standard deviation (σ) of the peak; controls its spread.

Returns

ndarray

Array with the Gaussian peak evaluated at each point of x.
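The equation above can be sketched in NumPy as follows. This is a minimal illustration, not the library's source: the name `gaussian_peak_sketch` is hypothetical, and the sketch assumes `width` is the standard deviation σ, as the parameter description states.

```python
import numpy as np


def gaussian_peak_sketch(x, center, amplitude, width):
    """Evaluate A * exp(-(x - c)^2 / (2 * sigma^2)) at each point of x.

    Hypothetical stand-in for illustration; not the smx implementation.
    """
    x = np.asarray(x, dtype=float)
    return amplitude * np.exp(-((x - center) ** 2) / (2.0 * width ** 2))


# 501 points over [0, 1000] give a spacing of 2, so x = 250 lies on the grid.
x = np.linspace(0.0, 1000.0, 501)
y = gaussian_peak_sketch(x, center=250.0, amplitude=0.9, width=10.0)

# The maximum sits at the center and equals the amplitude.
peak_index = int(np.argmax(y))
print(x[peak_index], y[peak_index])
```

One standard deviation away from the center, the intensity drops to `amplitude * exp(-0.5)`, roughly 61% of the peak height.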

Notes

  • For XRF: use a small width (~5–15) to simulate narrow lines.

  • For Vis-NIR: use a larger width (~20–50) for broad absorption bands.
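Because `width` is a standard deviation, the ranges in these notes are not full widths at half maximum (FWHM). For any Gaussian, FWHM = 2·sqrt(2·ln 2)·σ, about 2.355σ. The helpers below are hypothetical conveniences for converting between the two; they are not part of `smx`.

```python
import math

# For a Gaussian, FWHM = 2 * sqrt(2 * ln 2) * sigma (about 2.355 * sigma).
FWHM_PER_SIGMA = 2.0 * math.sqrt(2.0 * math.log(2.0))


def sigma_to_fwhm(sigma):
    """Convert a standard deviation to a full width at half maximum."""
    return FWHM_PER_SIGMA * sigma


def fwhm_to_sigma(fwhm):
    """Convert a full width at half maximum to a standard deviation."""
    return fwhm / FWHM_PER_SIGMA


# Print the FWHM for the ends of the width ranges suggested in the notes.
for sigma in (5, 15, 20, 50):
    print(f"width (sigma) = {sigma:>2} -> FWHM = {sigma_to_fwhm(sigma):.1f}")
```

If a peak width is known from an instrument specification as a FWHM, `fwhm_to_sigma` gives the value to pass as `width`.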

smx.datasets.synthetic.generate_synthetic_spectral_data(classes_config, n_points=500, x_min=0, x_max=1000, seed=None)

Generate a synthetic spectral dataset for multiple classes.

Returns a DataFrame where:

  • First column: 'Class' (values defined by the user: 'A', 'B', 'C', …).

  • Remaining columns: spectral variables (intensity values).

  • Rows: individual samples.

Parameters

classes_config : list of dict

List of dicts, each defining one class. Supported keys:

  • 'name' (str): class label (e.g. 'A', 'B', 'Soil').

  • 'n_samples' (int): number of samples to generate.

  • 'peaks' (list): peak definitions on the spectral axis.

    Supported formats:

    [250, 550, 700]
    

    or:

    [
        {'center': 250, 'amplitude_mean': 0.9, 'width_mean': 10},
        {'center': 550, 'amplitude_mean': 1.3, 'width_mean': 18},
        {'center': 700, 'amplitude_mean': 0.7, 'width_mean': 25},
    ]
    

    The second form allows per-peak amplitude/width customisation. Optional per-peak keys: amplitude_mean, amplitude_std, width_mean, width_std. Missing keys fall back to the class-level defaults below.

  • 'amplitude_mean' (float, optional, default 1.0): mean peak amplitude.

  • 'amplitude_std' (float, optional, default 0.1): std dev of amplitude.

  • 'width_mean' (float, optional, default 15.0): mean peak width (σ).

  • 'width_std' (float, optional, default 2.0): std dev of peak width.

  • 'noise_std' (float, optional, default 0.02): std dev of baseline noise.
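The fallback rule for peak definitions can be sketched in plain Python. This is an illustration of the documented behaviour, not the library's code: `resolve_peaks` and `CLASS_DEFAULTS` are hypothetical names, and the sketch assumes a bare number is shorthand for a peak with only a `center`.

```python
# Class-level defaults as documented in the key list above.
CLASS_DEFAULTS = {
    "amplitude_mean": 1.0,
    "amplitude_std": 0.1,
    "width_mean": 15.0,
    "width_std": 2.0,
}


def resolve_peaks(class_cfg):
    """Return one fully specified dict per peak (hypothetical helper)."""
    # Class-level values override the documented defaults.
    class_level = {k: class_cfg.get(k, d) for k, d in CLASS_DEFAULTS.items()}
    resolved = []
    for peak in class_cfg["peaks"]:
        # Assumption: a bare number means {'center': number}.
        spec = dict(peak) if isinstance(peak, dict) else {"center": peak}
        # Per-peak keys override class-level values.
        resolved.append({**class_level, **spec})
    return resolved


short_form = {"name": "B", "n_samples": 50, "peaks": [250, 700], "amplitude_mean": 1.2}
long_form = {
    "name": "A",
    "n_samples": 50,
    "peaks": [{"center": 550, "amplitude_mean": 1.3, "width_mean": 18}],
}

for peak in resolve_peaks(short_form) + resolve_peaks(long_form):
    print(peak)
```

In the short form every peak inherits the class-level `amplitude_mean` of 1.2; in the long form the per-peak values win and only `amplitude_std` and `width_std` come from the defaults.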

Example:

[
    {
        'name': 'A',
        'n_samples': 50,
        'peaks': [
            {'center': 250, 'amplitude_mean': 0.9, 'width_mean': 12},
            {'center': 550, 'amplitude_mean': 1.4, 'width_mean': 20},
            {'center': 700, 'amplitude_mean': 0.8, 'width_mean': 16},
            {'center': 850, 'amplitude_mean': 1.1, 'width_mean': 24},
        ],
        'amplitude_mean': 1.0,
        'amplitude_std': 0.1,
        'width_mean': 15.0,
        'width_std': 2.0,
    },
    {
        'name': 'B',
        'n_samples': 50,
        'peaks': [250, 700, 850],
        'amplitude_mean': 1.2,
        'width_mean': 20.0,
    },
]

n_points : int, default 500

Number of points on the spectral axis (resolution).

x_min, x_max : float, default 0, 1000

Limits of the spectral axis (e.g. 400–1000 nm for Vis-NIR, 0–40 keV for XRF).
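How the axis is built from these three parameters is not stated here. A reasonable reading, shown below as an assumption, is an evenly spaced grid that includes both limits, as produced by `numpy.linspace`.

```python
import numpy as np

n_points, x_min, x_max = 500, 0, 1000

# Assumption: evenly spaced axis with both endpoints included.
x = np.linspace(x_min, x_max, n_points)

# With endpoints included, the spacing is (x_max - x_min) / (n_points - 1).
step = x[1] - x[0]
print(len(x), x[0], x[-1], step)
```

Under this assumption the default spacing is 1000/499, slightly more than 2, so peak centers such as 250 generally fall between grid points rather than on them.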

seed : int, optional

Random seed for reproducibility.
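The effect of a seed can be illustrated with NumPy's `default_rng`. Which random generator `smx` uses internally is not documented here, so the snippet is only a sketch of the general principle; `draw_amplitudes` is a hypothetical helper.

```python
import numpy as np


def draw_amplitudes(seed, n=3, mean=1.0, std=0.1):
    """Draw n peak amplitudes from a normal distribution (illustrative)."""
    rng = np.random.default_rng(seed)
    return rng.normal(mean, std, size=n)


a = draw_amplitudes(seed=42)
b = draw_amplitudes(seed=42)
c = draw_amplitudes(seed=7)

# Same seed, same draws; a different seed gives different draws.
print(np.array_equal(a, b))
print(np.array_equal(a, c))
```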

Returns

df : pandas.DataFrame

Synthetic spectral dataset.

  • Column 0: 'Class' (str; class name from classes_config).

  • Columns 1 … n_points: spectral intensities named after x-axis values.

  • Shape: (total_samples, n_points + 1).
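The layout described above can be reproduced with a small stand-in generator. This is a sketch under stated assumptions, not the `smx` implementation: `sketch_dataset` is a hypothetical name, it handles only the bare-number peak format, it assumes normally distributed amplitudes and widths with additive Gaussian noise, and it formats column names as two-decimal strings, which may differ from what the library produces.

```python
import numpy as np
import pandas as pd


def sketch_dataset(classes_config, n_points=500, x_min=0, x_max=1000, seed=None):
    """Illustrative stand-in; not the smx implementation."""
    rng = np.random.default_rng(seed)
    x = np.linspace(x_min, x_max, n_points)  # assumption: endpoints included
    labels, spectra = [], []
    for cfg in classes_config:
        for _ in range(cfg["n_samples"]):
            # Assumption: baseline is zero-mean Gaussian noise.
            y = rng.normal(0.0, cfg.get("noise_std", 0.02), size=n_points)
            for center in cfg["peaks"]:  # bare-number peak format only
                amp = rng.normal(cfg.get("amplitude_mean", 1.0),
                                 cfg.get("amplitude_std", 0.1))
                # abs() keeps the sampled width positive.
                sig = abs(rng.normal(cfg.get("width_mean", 15.0),
                                     cfg.get("width_std", 2.0)))
                y += amp * np.exp(-((x - center) ** 2) / (2.0 * sig ** 2))
            labels.append(cfg["name"])
            spectra.append(y)
    # Assumption: columns are named after the x values, formatted as strings.
    df = pd.DataFrame(np.vstack(spectra), columns=[f"{v:.2f}" for v in x])
    df.insert(0, "Class", labels)
    return df


config = [
    {"name": "A", "n_samples": 3, "peaks": [250, 550]},
    {"name": "B", "n_samples": 2, "peaks": [700], "amplitude_mean": 1.2},
]
df = sketch_dataset(config, n_points=100, seed=0)

print(df.shape)  # (5, 101), i.e. (total_samples, n_points + 1)
print(df["Class"].tolist())
```

The same checks on `df.shape` and `df.columns[0]` are a quick way to confirm the layout of a DataFrame returned by the real function.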