smx.graph.interpretation

Threshold mapping and predicate interpretation utilities.

Functions in this module translate LRC results from the preprocessed (score) space back to the natural (unpreprocessed) spectral space, and reconstruct multivariate threshold spectra for visualisation.

Functions

extract_predicate_info(→ dict)

Extract components from a predicate rule string.

map_thresholds_to_natural(→ pandas.DataFrame)

Map predicate thresholds from the preprocessed space to natural space.

reconstruct_threshold_to_spectrum(→ pandas.Series)

Reconstruct a scalar threshold to a multivariate threshold spectrum.

Module Contents

smx.graph.interpretation.extract_predicate_info(predicate_rule: str) → dict

Extract components from a predicate rule string.

Parameters

predicate_rule : str

Rule in the format "zone_name <= threshold" or "zone_name > threshold".

Returns

dict

{'zone': str, 'operator': str, 'threshold': float}

Examples

>>> extract_predicate_info("Ca ka <= 25.50")
{'zone': 'Ca ka', 'operator': '<=', 'threshold': 25.5}
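The rule format above lends itself to a single regular expression. The following is a minimal sketch of one way such parsing could work, not the library's actual implementation; the name `parse_rule_sketch` and the pattern are assumptions made for illustration, and only the two operators named in the parameter description (`<=` and `>`) are handled.

```python
import re

# Hypothetical pattern: zone name, then "<=" or ">", then a decimal number.
# The zone name may contain spaces (e.g. "Ca ka"), so it is matched lazily.
_RULE_PATTERN = re.compile(
    r"^\s*(?P<zone>.+?)\s*(?P<operator><=|>)\s*"
    r"(?P<threshold>[-+]?\d*\.?\d+(?:[eE][-+]?\d+)?)\s*$"
)


def parse_rule_sketch(predicate_rule: str) -> dict:
    """Split a rule such as "Ca ka <= 25.50" into zone, operator and threshold."""
    match = _RULE_PATTERN.match(predicate_rule)
    if match is None:
        raise ValueError(f"Unrecognised predicate rule: {predicate_rule!r}")
    return {
        "zone": match["zone"],
        "operator": match["operator"],
        "threshold": float(match["threshold"]),
    }
```

Raising `ValueError` on malformed input is a design choice of this sketch; how extract_predicate_info itself reacts to an invalid rule is not stated here.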

smx.graph.interpretation.map_thresholds_to_natural(lrc_df: pandas.DataFrame, zone_sums_preprocessed: pandas.DataFrame, zone_sums_natural: pandas.DataFrame) → pandas.DataFrame

Map predicate thresholds from the preprocessed space to natural space.

For each predicate in lrc_df, the function finds the calibration sample whose zone score in the preprocessed space is closest to the predicate's threshold, then takes that sample's zone score in the natural (unpreprocessed) space as the approximate natural threshold.

Parameters

lrc_df : pd.DataFrame

LRC results. Must contain columns 'Zone', 'Threshold', 'Operator', and 'Node'.

zone_sums_preprocessed : pd.DataFrame

Zone aggregation scores computed on preprocessed calibration data (same zones as lrc_df).

zone_sums_natural : pd.DataFrame

Zone aggregation scores computed on the natural (unpreprocessed) data for the same calibration samples.

Returns

pd.DataFrame

Copy of lrc_df with additional columns:

  • 'Threshold_Natural' — threshold value in the natural space

  • 'Reference_Sample_Index' — index of the nearest calibration sample

  • 'Approximation_Error' — distance (preprocessed space) to the nearest sample

  • 'Node_Natural' — predicate rule string using the natural threshold
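The nearest-sample lookup described above can be sketched with pandas as follows. This is an illustrative approximation, not the library's code: the name `map_thresholds_sketch` is hypothetical, both zone-sum frames are assumed to share the same sample index and to hold one column per zone, and the two-decimal formatting of 'Node_Natural' is an assumption.

```python
import pandas as pd


def map_thresholds_sketch(lrc_df, zone_sums_preprocessed, zone_sums_natural):
    """Hypothetical nearest-sample mapping of thresholds to the natural space."""
    result = lrc_df.copy()
    natural_values, reference_indices, errors, natural_nodes = [], [], [], []

    for zone, operator, threshold in zip(
        result["Zone"], result["Operator"], result["Threshold"]
    ):
        # Distance of every calibration sample to the threshold (preprocessed space).
        distances = (zone_sums_preprocessed[zone] - threshold).abs()
        nearest = distances.idxmin()

        # Assumption: both frames are indexed by the same calibration samples.
        natural_value = float(zone_sums_natural.loc[nearest, zone])

        natural_values.append(natural_value)
        reference_indices.append(nearest)
        errors.append(float(distances.loc[nearest]))
        # Assumption: rule strings are rebuilt with two decimals.
        natural_nodes.append(f"{zone} {operator} {natural_value:.2f}")

    result["Threshold_Natural"] = natural_values
    result["Reference_Sample_Index"] = reference_indices
    result["Approximation_Error"] = errors
    result["Node_Natural"] = natural_nodes
    return result
```

A large 'Approximation_Error' indicates that no calibration sample lies near the threshold in the preprocessed space, so the corresponding 'Threshold_Natural' should be read with more caution.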

smx.graph.interpretation.reconstruct_threshold_to_spectrum(threshold_value: float, zone_name: str, pca_info_dict: Dict) → pandas.Series

Reconstruct a scalar threshold to a multivariate threshold spectrum.

Uses the PCA model fitted during zone aggregation to map a threshold value from PC1 score space back into the original spectral variable space:

\[\tau = \bar{x} + q \cdot \mathbf{w}\]

where \(\bar{x}\) is the zone mean, \(\mathbf{w}\) the PC1 loadings vector, and \(q\) the threshold score value.

Parameters

threshold_value : float

Threshold in PC1 score space.

zone_name : str

Name of the spectral zone.

pca_info_dict : dict

PCA info dictionary as stored in smx.zones.aggregation.ZoneAggregator.pca_info_.

Returns

pd.Series

Threshold spectrum indexed by original column names.
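The reconstruction formula can be illustrated with a few lines of NumPy. The layout of pca_info_dict is not documented on this page, so the keys used below ("mean", "loadings", "columns") are purely hypothetical, as is the name `reconstruct_sketch`; the sketch only demonstrates the arithmetic of adding the scaled PC1 loadings to the zone mean.

```python
import numpy as np
import pandas as pd


def reconstruct_sketch(threshold_value, zone_name, pca_info_dict):
    """Hypothetical reconstruction: zone mean plus threshold score times PC1 loadings."""
    # Assumed layout: one entry per zone holding the mean spectrum, the PC1
    # loadings and the original column names. The real structure of
    # ZoneAggregator.pca_info_ may differ.
    info = pca_info_dict[zone_name]
    mean = np.asarray(info["mean"], dtype=float)
    loadings = np.asarray(info["loadings"], dtype=float)

    if mean.shape != loadings.shape:
        raise ValueError("Zone mean and PC1 loadings must have the same length.")

    # tau = x_bar + q * w, evaluated element-wise over the zone's variables.
    spectrum = mean + threshold_value * loadings
    return pd.Series(spectrum, index=info["columns"], name=zone_name)
```

With a zone mean of [1.0, 2.0, 3.0], loadings of [0.0, 1.0, 0.0] and a threshold score of 2.0, the sketch returns [1.0, 4.0, 3.0]: only the variable with a non-zero loading moves away from the mean.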