smx.zones.aggregation

ZoneAggregator: reduce each spectral zone (DataFrame) to a single score per sample.

Supports simple column-wise aggregations (sum, mean, …) and PCA-based aggregation (PC1 score). A fit/transform interface ensures that the same PCA model fitted on calibration data can be applied consistently to prediction data.

Classes

ZoneAggregator

Aggregate spectral zones to a single score per sample.

Module Contents

class smx.zones.aggregation.ZoneAggregator(method: str = 'pca')

Aggregate spectral zones to a single score per sample.

Parameters

method : str, default 'pca'

Aggregation strategy.

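  • 'pca': fit a single-component PCA per zone and use the PC1 scores. Preserves directional information (each column is weighted by a signed loading) and maximises the variance captured by a single score.

  • 'sum', 'mean', 'median', 'max', 'min', 'std', 'var', 'extreme': simple aggregations computed across each zone's columns, giving one value per sample.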
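The simple methods reduce each zone DataFrame (rows are samples, columns are the zone's variables) across its columns. The pandas sketch below illustrates that reading for the named pandas reductions only.

Assumptions in this sketch:
  • 'extreme' is not defined on this page, so it is left out.
  • 'std' and 'var' use the pandas default ddof=1, which may differ from smx.
  • The helper aggregate_zones_simple is hypothetical and not part of smx.

```python
from typing import Dict

import pandas as pd

# Hypothetical helper (not part of smx): reduce each zone to one score per sample.
# 'extreme' is deliberately omitted because its definition is not documented here.
SIMPLE_METHODS = {"sum", "mean", "median", "max", "min", "std", "var"}


def aggregate_zones_simple(zones: Dict[str, pd.DataFrame], method: str) -> pd.DataFrame:
    if method not in SIMPLE_METHODS:
        raise ValueError(f"unsupported method: {method!r}")
    # axis=1 reduces across a zone's columns, giving one value per row (sample).
    # std/var use the pandas default ddof=1 in this sketch.
    scores = {name: getattr(df, method)(axis=1) for name, df in zones.items()}
    # Series align on their shared sample index; columns follow zone order.
    return pd.DataFrame(scores)


zones = {
    "zone_a": pd.DataFrame({"w1": [1.0, 2.0], "w2": [3.0, 4.0]}, index=["s1", "s2"]),
    "zone_b": pd.DataFrame({"w3": [10.0, 20.0]}, index=["s1", "s2"]),
}
print(aggregate_zones_simple(zones, "mean"))
```

Here each zone becomes one column of the result, and the sample index is kept.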

Attributes (set after fit())

pca_info_ : dict or None

{zone_name: {'pca_model', 'loadings', 'mean', 'variance_explained', 'columns'}}. Only populated when method='pca'; None otherwise.

is_fitted_ : bool

True after fit() has been called.

method = 'pca'
pca_info_: Dict | None = None
is_fitted_: bool = False
fit(spectral_zones_dict: Dict[str, pandas.DataFrame]) → ZoneAggregator

Fit the aggregator on calibration zone data.

For method='pca' this trains a 1-component PCA per zone and stores the models so the same projections can be applied to new data. For simple aggregation methods, fit is a no-op (nothing to learn).
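A 1-component PCA on a single zone can be written directly with numpy's SVD. The sketch below mirrors the stored fields listed under pca_info_: 'loadings', 'mean', 'variance_explained' and 'columns'. It omits 'pca_model'.

This reflects an assumption about what those fields hold, not the smx implementation:
  • 'variance_explained' is taken here as the explained-variance ratio of PC1.
  • fit_zone_pca is a hypothetical name.

```python
from typing import Any, Dict

import numpy as np
import pandas as pd


def fit_zone_pca(df: pd.DataFrame) -> Dict[str, Any]:
    """Hypothetical 1-component PCA fit for one zone (rows = samples)."""
    X = df.to_numpy(dtype=float)
    mean = X.mean(axis=0)  # per-column mean, to be reused when transforming new data
    Xc = X - mean
    # Rows of Vt are the principal axes; squared singular values are
    # proportional to the variance along each axis.
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    total = float((s ** 2).sum())
    return {
        "loadings": Vt[0],  # PC1 direction; its sign is arbitrary
        "mean": mean,
        # Assumed meaning: fraction of total variance captured by PC1.
        "variance_explained": float(s[0] ** 2 / total) if total > 0 else 0.0,
        "columns": list(df.columns),
    }


# Two perfectly correlated columns: PC1 captures all of the variance.
cal = pd.DataFrame({"w1": [1.0, 2.0, 3.0], "w2": [2.0, 4.0, 6.0]})
info = fit_zone_pca(cal)
print(info["variance_explained"])
```

Storing the calibration mean and loadings is what lets new data be projected onto the same axis later, instead of refitting on it.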

Parameters

spectral_zones_dict : dict[str, pd.DataFrame]

Calibration spectral zones as returned by smx.zones.extraction.extract_spectral_zones().

Returns

self

transform(spectral_zones_dict: Dict[str, pandas.DataFrame]) → pandas.DataFrame

Apply the fitted aggregator to zone data.

Parameters

spectral_zones_dict : dict[str, pd.DataFrame]

Spectral zones to transform (same structure as used for fit).

Returns

pd.DataFrame

Scores DataFrame (samples × zones). For method='pca' the index is taken from the first zone’s DataFrame; for simple methods it is the shared index of the input DataFrames.
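Applying stored PCA parameters to new data is a centred projection: score = (X - mean) · loadings, using the calibration mean rather than the mean of the new data. The sketch below takes the result index from the first zone, as described above.

Assumptions in this sketch:
  • Each pca_info_ entry holds 'columns', 'mean' and 'loadings' as plain lists or arrays.
  • transform_zones_pca is a hypothetical name, not the smx method.

```python
from typing import Dict

import numpy as np
import pandas as pd


def transform_zones_pca(zones: Dict[str, pd.DataFrame], info: Dict[str, dict]) -> pd.DataFrame:
    """Hypothetical: project each zone onto its stored PC1 axis."""
    # Index of the result comes from the first zone's DataFrame.
    first_index = next(iter(zones.values())).index
    scores = {}
    for name, df in zones.items():
        entry = info[name]
        # Select the calibration columns, in calibration order.
        X = df[entry["columns"]].to_numpy(dtype=float)
        # Centre with the stored calibration mean, then project onto PC1.
        scores[name] = (X - entry["mean"]) @ entry["loadings"]
    return pd.DataFrame(scores, index=first_index)


info = {"z": {"columns": ["w1", "w2"],
              "mean": np.array([2.0, 4.0]),
              "loadings": np.array([1.0, 2.0]) / np.sqrt(5.0)}}
new = pd.DataFrame({"w1": [2.0, 3.0], "w2": [4.0, 6.0]}, index=["p1", "p2"])
print(transform_zones_pca({"z": new}, info))
```

A sample equal to the calibration mean scores 0. Other samples are placed along the same PC1 axis that was learned during fit.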

fit_transform(spectral_zones_dict: Dict[str, pandas.DataFrame]) → pandas.DataFrame

Fit and transform in one step (convenience wrapper).

Parameters

spectral_zones_dict : dict[str, pd.DataFrame]

Calibration spectral zones.

Returns

pd.DataFrame

Scores DataFrame (samples × zones).

get_variance_explained() → Dict[str, float] | None

Return per-zone explained variance (PCA method only).

Returns None for non-PCA methods.
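Since pca_info_ already stores a 'variance_explained' entry per zone, a method like this plausibly reads that field back. The sketch below works under that assumption.

Assumptions in this sketch:
  • variance_explained_from_info is a hypothetical stand-in, not the smx method.
  • It also returns None when nothing has been fitted; the real method's behaviour before fit() is not documented here.

```python
from typing import Dict, Optional


def variance_explained_from_info(method: str, pca_info: Optional[dict]) -> Optional[Dict[str, float]]:
    """Hypothetical stand-in: per-zone variance explained, or None for non-PCA methods."""
    if method != "pca" or pca_info is None:
        return None
    # Read back the value stored for each zone during fit.
    return {zone: float(entry["variance_explained"]) for zone, entry in pca_info.items()}


print(variance_explained_from_info("pca", {"z1": {"variance_explained": 0.93}}))  # {'z1': 0.93}
print(variance_explained_from_info("mean", None))  # None
```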