smx.pipeline#

SMX: high-level facade for the full SMX explanation pipeline.

This class internalises the seed-loop orchestration that every caller would otherwise have to rewrite manually (zone extraction → predicate generation → bagging → metric → graph → LRC → natural-scale mapping across N seeds).

Individual component classes (ZoneAggregator, PredicateGenerator, etc.) remain available for power users who need fine-grained control.

Attributes#

`logger`
`SpectralCuts`

Classes#

SMX

Full SMX explanation pipeline as a single fit/transform object.

Module Contents#

smx.pipeline.logger#

smx.pipeline.SpectralCuts#

class smx.pipeline.SMX(spectral_cuts: SpectralCuts, quantiles: List[float], n_repetitions: int = 4, n_bags: int = 10, n_samples_fraction: float = 0.8, replace: bool = False, metric: Literal['covariance', 'perturbation'] = 'perturbation', estimator: Any | None = None, perturbation_mode: str = 'median', perturbation_value: float = 0, perturbation_metric: str = 'probability_shift', perturbation_stats_source: str = 'full', normalize_by_zone_size: bool = True, zone_size_exponent: float = 1.0, covariance_threshold: float = 0.01, var_exp: bool = True, show_graph_details: bool = False, class_threshold: float = 0.5)#

Full SMX explanation pipeline as a single fit/transform object.

Runs zone extraction → PCA aggregation → predicate generation → seed-loop (bagging → metric → graph → LRC) → cross-seed aggregation → natural-scale threshold mapping.

Parameters#

spectral_cuts : list of (name, start, end) tuples

Zone definitions, e.g. [("Low", 1.0, 4.0), ("High", 4.0, 10.0)].
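As a rough illustration of how such cuts partition a spectral axis, the sketch below groups axis positions by zone. It assumes half-open intervals [start, end), so that a shared boundary such as 4.0 lands in only one zone; the boundary handling actually used by SMX is not documented here, and `assign_zones` is a hypothetical helper, not part of the package.

```python
from typing import Dict, List, Tuple

Cut = Tuple[str, float, float]  # (name, start, end), as in spectral_cuts


def assign_zones(axis: List[float], cuts: List[Cut]) -> Dict[str, List[float]]:
    """Group axis positions by zone, assuming half-open [start, end) intervals."""
    zones: Dict[str, List[float]] = {name: [] for name, _, _ in cuts}
    for x in axis:
        for name, start, end in cuts:
            if start <= x < end:  # assumption: the end boundary is exclusive
                zones[name].append(x)
                break
    return zones


spectral_cuts = [("Low", 1.0, 4.0), ("High", 4.0, 10.0)]
axis = [1.0, 2.5, 4.0, 7.0, 10.0]
zones = assign_zones(axis, spectral_cuts)
print(zones)  # 10.0 is left out because the last interval is treated as exclusive
```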

quantiles : list of float

Quantile fractions for predicate generation, e.g. [0.25, 0.5, 0.75].
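To make the role of the quantile fractions concrete, the sketch below turns a list of zone scores into threshold predicates. It is an illustration under assumptions: the linear-interpolation quantile, the pair of operators ("<=" and ">"), and the helper names `linear_quantile` and `make_predicates` are all hypothetical and are not taken from the SMX source.

```python
from typing import List, Tuple

Predicate = Tuple[str, str, float]  # (zone, operator, threshold)


def linear_quantile(values: List[float], q: float) -> float:
    """Quantile with linear interpolation between order statistics."""
    ordered = sorted(values)
    position = q * (len(ordered) - 1)
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    weight = position - lower
    return ordered[lower] + weight * (ordered[upper] - ordered[lower])


def make_predicates(zone: str, scores: List[float], quantiles: List[float]) -> List[Predicate]:
    """Emit one '<=' and one '>' predicate per quantile threshold (assumed operators)."""
    predicates: List[Predicate] = []
    for q in quantiles:
        threshold = linear_quantile(scores, q)
        predicates.append((zone, "<=", threshold))
        predicates.append((zone, ">", threshold))
    return predicates


scores = [0.0, 1.0, 2.0, 3.0, 4.0]  # toy zone scores
predicates = make_predicates("Low", scores, [0.25, 0.5, 0.75])
```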

n_repetitions : int, default 4

Number of independent bagging repetitions. Seeds are generated as [0, 1, …, n_repetitions-1].

n_bags : int, default 10

Number of bags per seed.

n_samples_fraction : float, default 0.8

Fraction of calibration samples drawn per bag. The minimum number of samples per predicate is hard-coded to 20% of the dataset.

replace : bool, default False

Whether to sample bags with replacement.
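The interaction of n_repetitions, n_bags, n_samples_fraction and replace can be pictured with the following sketch, which draws index bags for each seed. It is a minimal illustration, not the SMX implementation: `draw_bags` is a hypothetical helper, and rounding the bag size to the nearest integer is an assumption.

```python
import random
from typing import Dict, List


def draw_bags(
    n_samples: int,
    n_bags: int = 10,
    n_samples_fraction: float = 0.8,
    replace: bool = False,
    seed: int = 0,
) -> List[List[int]]:
    """Draw n_bags index bags from range(n_samples) using a seeded generator."""
    rng = random.Random(seed)
    bag_size = round(n_samples_fraction * n_samples)  # assumption: nearest integer
    population = range(n_samples)
    bags: List[List[int]] = []
    for _ in range(n_bags):
        if replace:
            bag = rng.choices(population, k=bag_size)  # duplicates allowed
        else:
            bag = rng.sample(population, bag_size)  # each index at most once
        bags.append(sorted(bag))
    return bags


n_repetitions = 4
seeds = list(range(n_repetitions))  # [0, 1, ..., n_repetitions - 1], as documented
bags_by_seed: Dict[int, List[List[int]]] = {
    seed: draw_bags(n_samples=50, seed=seed) for seed in seeds
}
```

Seeding the generator per repetition keeps each repetition reproducible while still varying the bags across seeds.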

metric : {'covariance', 'perturbation'}, default 'perturbation'

Importance metric to use.

estimator : sklearn-compatible estimator, optional

Trained model. Required when metric='perturbation'.

perturbation_mode : str, default 'median'

Replacement strategy for perturbation ('constant', 'mean', 'median', 'min', 'max').

perturbation_value : float, default 0

Constant replacement value used when perturbation_mode='constant'.
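The replacement strategies can be sketched as follows. This is an illustration under assumptions: the statistic is taken per spectral variable (column) over the supplied data, and `replacement_value` and `perturb_zone` are hypothetical helpers, not SMX functions.

```python
import statistics
from typing import List


def replacement_value(column: List[float], mode: str = "median", value: float = 0.0) -> float:
    """Return the fill value for one spectral variable under the given mode."""
    if mode == "constant":
        return value
    if mode == "mean":
        return statistics.fmean(column)
    if mode == "median":
        return statistics.median(column)
    if mode == "min":
        return min(column)
    if mode == "max":
        return max(column)
    raise ValueError(f"Unknown perturbation mode: {mode!r}")


def perturb_zone(
    X: List[List[float]], zone_columns: List[int], mode: str = "median", value: float = 0.0
) -> List[List[float]]:
    """Copy X and overwrite the zone's columns with their replacement values."""
    perturbed = [list(row) for row in X]
    for j in zone_columns:
        fill = replacement_value([row[j] for row in X], mode, value)
        for row in perturbed:
            row[j] = fill
    return perturbed


X = [[1.0, 10.0, 5.0], [3.0, 20.0, 5.0], [8.0, 60.0, 5.0]]
X_median = perturb_zone(X, zone_columns=[0, 1])  # columns 0 and 1 form the zone
X_constant = perturb_zone(X, zone_columns=[0, 1], mode="constant", value=0.0)
```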

perturbation_metric : str, default 'probability_shift'

Perturbation importance measure. Determines how the impact of perturbing a spectral zone is quantified. The choice depends on the estimator type and the desired sensitivity:

Classification estimators (with predict_proba):

'probability_shift' — Mean total variation distance between pre- and post-perturbation class probabilities. Sensitive to confidence changes across all classes. Requires predict_proba.
'accuracy_drop' — Drop in accuracy when perturbed predictions are compared to original predictions.
'f1_drop' — Weighted F1-score decrease after perturbation.
'decision_function_shift' — Mean absolute change in decision function values (e.g. signed distances from hyperplane for SVC). Requires decision_function().

Regression estimators (with predict returning continuous values):

'mean_abs_diff' — Mean absolute difference between original and perturbed predictions.
'mean_diff' — Mean signed difference (bias direction). Positive values indicate perturbation increases predictions, negative decreases.
'mean_relative_dev' — Mean relative deviation, normalized by original prediction magnitude. Treats zero predictions as NaN.
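A few of these measures are sketched below in plain Python to show what they compute. The total variation distance is taken as half the L1 distance between probability vectors, which is its standard definition; whether SMX applies the same factor, whether the relative deviation is absolute or signed, and whether NaN entries are dropped from the mean are all assumptions here.

```python
import math
from typing import List


def probability_shift(p_before: List[List[float]], p_after: List[List[float]]) -> float:
    """Mean total variation distance between per-sample class probabilities."""
    distances = [
        0.5 * sum(abs(a - b) for a, b in zip(before, after))
        for before, after in zip(p_before, p_after)
    ]
    return sum(distances) / len(distances)


def mean_abs_diff(y_orig: List[float], y_pert: List[float]) -> float:
    """Mean absolute difference between original and perturbed predictions."""
    return sum(abs(p - o) for o, p in zip(y_orig, y_pert)) / len(y_orig)


def mean_diff(y_orig: List[float], y_pert: List[float]) -> float:
    """Mean signed difference; positive when perturbation raises predictions."""
    return sum(p - o for o, p in zip(y_orig, y_pert)) / len(y_orig)


def mean_relative_dev(y_orig: List[float], y_pert: List[float]) -> float:
    """Mean |perturbed - original| / |original|, skipping zero originals (assumed)."""
    ratios = [abs(p - o) / abs(o) for o, p in zip(y_orig, y_pert) if o != 0.0]
    return sum(ratios) / len(ratios) if ratios else math.nan


p_before = [[0.9, 0.1], [0.2, 0.8]]
p_after = [[0.6, 0.4], [0.2, 0.8]]
shift = probability_shift(p_before, p_after)

y_orig = [2.0, 4.0, 0.0]
y_pert = [3.0, 3.0, 1.0]
abs_diff = mean_abs_diff(y_orig, y_pert)
signed_diff = mean_diff(y_orig, y_pert)
relative_dev = mean_relative_dev(y_orig, y_pert)
```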

normalize_by_zone_size : bool, default True

Divide raw perturbation importance by zone width.

zone_size_exponent : float, default 1.0

Exponent applied to zone size during normalisation.
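One plausible reading of these two parameters is sketched below: the raw importance is divided by the zone size raised to the exponent. The exact formula, and the use of the number of spectral variables as the zone size, are assumptions; `normalize_importance` is a hypothetical helper.

```python
def normalize_importance(
    raw_importance: float,
    zone_size: int,
    normalize_by_zone_size: bool = True,
    zone_size_exponent: float = 1.0,
) -> float:
    """Scale a raw importance by zone_size ** exponent (assumed formula)."""
    if not normalize_by_zone_size:
        return raw_importance
    return raw_importance / (zone_size ** zone_size_exponent)


full = normalize_importance(0.6, zone_size=3)  # divide by the full width
soft = normalize_importance(0.6, zone_size=3, zone_size_exponent=0.5)  # divide by sqrt(width)
off = normalize_importance(0.6, zone_size=3, normalize_by_zone_size=False)
```

Under this reading, an exponent between 0 and 1 softens the penalty on wide zones, and an exponent of 0 leaves the raw importance unchanged.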

covariance_threshold : float, default 0.01

Minimum covariance value to keep a predicate (covariance metric only).

var_exp : bool, default True

Weight graph edges by PC1 explained variance of the source zone.

show_graph_details : bool, default False

Print bidirectional-edge details during graph construction.

class_threshold : float, default 0.5

Decision boundary for Class_Predicted annotation on bags.

Attributes (set after `fit()`)#

lrc_natural_ (pd.DataFrame or None): LRC with natural-scale thresholds (available only when X_cal_natural is provided to fit()). Columns: Node, Local_Reaching_Centrality, Zone, Threshold, Operator, Threshold_Natural.
lrc_summed_ (pd.DataFrame): Mean-aggregated LRC across seeds, preprocessed-scale thresholds.
lrc_summed_unique_ (pd.DataFrame): Zone-deduplicated version of lrc_summed_ (one row per zone).
zone_scores_ (pd.DataFrame): PCA zone scores on the preprocessed calibration data.
predicates_df_ (pd.DataFrame): Full predicate catalogue (generated from zone_scores_).
pca_info_ (dict): PCA info for the preprocessed zones.
pca_info_natural_ (dict or None): PCA info for the natural (unpreprocessed) zones (only when X_cal_natural is provided to fit()).
zones_natural_ (dict or None): Raw spectral zone DataFrames from the unpreprocessed data (only when X_cal_natural is provided to fit()).
graphs_by_seed_ (dict[int, nx.DiGraph]): Per-seed directed predicate graphs (useful for debugging).
valid_seeds_ (list[int]): Seeds that produced a non-empty graph (subset of seeds).
faithfulness_ (dict): Progressive top-k masking evaluation summary produced by evaluate_faithfulness().

spectral_cuts#

quantiles#

n_repetitions = 4#

seeds#

n_bags = 10#

n_samples_fraction = 0.8#

replace = False#

metric = 'perturbation'#

estimator = None#

perturbation_mode = 'median'#

perturbation_value = 0#

perturbation_metric = 'probability_shift'#

perturbation_stats_source = 'full'#

normalize_by_zone_size = True#

zone_size_exponent = 1.0#

covariance_threshold = 0.01#

var_exp = True#

show_graph_details = False#

class_threshold = 0.5#

lrc_natural_: pandas.DataFrame | None = None#

lrc_summed_: pandas.DataFrame | None = None#

lrc_summed_unique_: pandas.DataFrame | None = None#

zone_scores_: pandas.DataFrame | None = None#

predicates_df_: pandas.DataFrame | None = None#

pca_info_: Dict | None = None#

pca_info_natural_: Dict | None = None#

zones_natural_: Dict | None = None#

graphs_by_seed_: Dict[int, networkx.DiGraph]#

valid_seeds_: List[int] = []#

faithfulness_: Dict[str, Any] | None = None#

fit(X_cal_prep: pandas.DataFrame, y_pred_cal: pandas.Series | numpy.ndarray, X_cal_natural: pandas.DataFrame | None = None) → SMX#

Run the full SMX explanation pipeline.

Parameters#

X_cal_prep (pd.DataFrame): Pre-processed calibration spectra (samples × features).
y_pred_cal (pd.Series or np.ndarray): Continuous model predictions for the calibration set (aligned with X_cal_prep).
X_cal_natural (pd.DataFrame, optional): Unpreprocessed calibration spectra with the same shape as X_cal_prep. Required for lrc_natural_ threshold mapping. When None, the natural-scale mapping step is skipped and lrc_natural_, zones_natural_, and pca_info_natural_ remain None.

Returns#

self

evaluate_faithfulness(X_eval: pandas.DataFrame, *, ranking: Literal['unique', 'summed', 'natural'] = 'unique', X_reference: pandas.DataFrame | None = None, metric: Literal['auto', 'probability_shift', 'mean_abs_diff', 'decision_function_shift'] = 'auto', masking_strategy: Literal['zero', 'constant', 'mean', 'median', 'min', 'max'] = 'zero', constant_value: float = 0.0, max_k: int | None = None, n_random_rankings: int = 100, random_state: int | None = 42, output_path: str | pathlib.Path | None = None, plot_title: str | None = None, plot_width: int = 1100, plot_height: int = 560) → Dict[str, Any]#

Evaluate SMX faithfulness via progressive top-k zone masking.

The ranked spectral zones are progressively masked on X_eval following the selected SMX ranking, and the resulting prediction shift is summarized by the area under the masking curve (AUC).
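The idea of progressive masking can be sketched with a toy model. The example below masks ranked zones one at a time with zeros, records the mean absolute prediction shift after each step, and integrates the curve with the trapezoid rule. Everything here is an assumption made for illustration: the toy `predict` function, the inclusion of a k = 0 point, and the normalisation of the x-axis to [0, 1] are not taken from the SMX source.

```python
from typing import Callable, List


def predict(row: List[float]) -> float:
    """Toy stand-in for a fitted estimator: the sum of all spectral variables."""
    return sum(row)


def masking_curve(
    X: List[List[float]],
    ranked_zones: List[List[int]],
    model: Callable[[List[float]], float],
) -> List[float]:
    """Mean absolute prediction shift after masking the top-k zones, k = 0..K."""
    baseline = [model(row) for row in X]
    masked = [list(row) for row in X]
    curve = [0.0]  # k = 0: nothing masked yet
    for columns in ranked_zones:
        for row in masked:
            for j in columns:
                row[j] = 0.0  # 'zero' masking strategy
        shifts = [abs(model(row) - base) for row, base in zip(masked, baseline)]
        curve.append(sum(shifts) / len(shifts))
    return curve


def trapezoid_auc(curve: List[float]) -> float:
    """Area under the curve with k rescaled to the unit interval."""
    if len(curve) < 2:
        return 0.0
    step = 1.0 / (len(curve) - 1)
    return sum(step * (a + b) / 2.0 for a, b in zip(curve, curve[1:]))


X_eval = [[1.0, 2.0, 3.0, 4.0], [2.0, 0.0, 1.0, 1.0]]
ranked_zones = [[2, 3], [0, 1]]  # column indices per zone, most important first
curve = masking_curve(X_eval, ranked_zones, predict)
auc = trapezoid_auc(curve)
```

A ranking that places influential zones first makes the curve rise early, which yields a larger area than a ranking that masks them last.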

Parameters#

X_eval (pd.DataFrame): Evaluation spectra to be masked progressively.
ranking ({'unique', 'summed', 'natural'}, default 'unique'): Ranking table used to derive the ordered list of spectral zones. 'unique' uses the one-zone-per-row ranking in lrc_summed_unique_. 'summed' and 'natural' are deduplicated internally to one row per zone before masking.
X_reference (pd.DataFrame, optional): Reference spectra used to compute replacement values for non-zero masking strategies. Defaults to X_eval.
metric ({'auto', 'probability_shift', 'mean_abs_diff', 'decision_function_shift'}, default 'auto'): Prediction-shift metric to evaluate. 'auto' chooses 'probability_shift' when the estimator exposes predict_proba(), 'decision_function_shift' when it exposes decision_function(), otherwise 'mean_abs_diff'.
masking_strategy ({'zero', 'constant', 'mean', 'median', 'min', 'max'}, default 'zero'): How masked spectral variables are replaced.
constant_value (float, default 0.0): Replacement value used when masking_strategy='constant'.
max_k (int, optional): Maximum number of ranked zones to mask. Defaults to all ranked zones available in X_eval.
n_random_rankings (int, default 100): Number of random rankings used to contextualize the observed AUC.
random_state (int, optional): Seed controlling the random baseline.
output_path (str or Path, optional): If provided, also export a faithfulness plot to this path. The extension determines the format (.html or a static image).
plot_title (str, optional): Title override used when output_path is provided.
plot_width (int, default 1100): Plot width in pixels. Used only when output_path is provided.
plot_height (int, default 560): Plot height in pixels. Used only when output_path is provided.
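The 'auto' rule described for the metric parameter amounts to a capability check on the estimator. The sketch below mirrors the documented order of precedence; `resolve_metric` and the three stand-in classes are hypothetical and exist only for this illustration.

```python
from typing import Any


def resolve_metric(estimator: Any, metric: str = "auto") -> str:
    """Pick a prediction-shift metric following the documented 'auto' rule."""
    if metric != "auto":
        return metric  # an explicit choice is passed through unchanged
    if hasattr(estimator, "predict_proba"):
        return "probability_shift"
    if hasattr(estimator, "decision_function"):
        return "decision_function_shift"
    return "mean_abs_diff"


class ProbabilisticClassifier:
    def predict_proba(self, X):  # stand-in for a classifier with probabilities
        return [[0.5, 0.5] for _ in X]


class MarginClassifier:
    def decision_function(self, X):  # stand-in for a margin-based classifier
        return [0.0 for _ in X]


class Regressor:
    def predict(self, X):  # stand-in for a regressor
        return [0.0 for _ in X]


choices = [
    resolve_metric(ProbabilisticClassifier()),
    resolve_metric(MarginClassifier()),
    resolve_metric(Regressor()),
]
```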

Returns#

dict: Faithfulness summary including curve_df, auc, auc_normalized, level, and null-baseline statistics.

plot_zone_ranking_over_spectrum(output_path: str | pathlib.Path, *, ranking: Literal['unique', 'natural'] = 'unique', aggregation: Literal['mean', 'median'] = 'mean', title: str | None = None, X_natural: pandas.DataFrame | None = None, y_labels: pandas.Series | None = None, class_colors: dict | None = None, width: int | None = 1200, height: int | None = 500) → pandas.DataFrame#

Plot ranked spectral zones over a reference spectrum and save to file.

The output format is inferred from output_path — .html for an interactive figure, or .png / .svg / .pdf for a static image (requires kaleido).

Parameters#

output_path (str or Path): Destination file. Use .html for an interactive figure, or .png / .svg / .pdf for a static image.
ranking ({'unique', 'natural'}, default 'unique'): Ranking source. 'unique' uses lrc_summed_unique_ (one row per zone). 'natural' uses lrc_natural_ and collapses multiple predicates per zone to the strongest LRC value.
aggregation ({'mean', 'median'}, default 'mean'): Aggregation used to build the reference spectrum from zones_natural_.
title (str, optional): Figure title override.
X_natural (pd.DataFrame, optional): Full calibration dataset in natural (unpreprocessed) units. When provided together with y_labels, a mean spectrum is drawn for each class on top of the overall reference spectrum.
y_labels (pd.Series, optional): Class labels aligned with the rows of X_natural. Required when X_natural is given.
class_colors (dict[str, str], optional): Mapping from class label to hex/CSS color string. Missing labels fall back to a built-in palette.
width (int, default 1200): Figure width in pixels. Used only for static image exports.
height (int, default 500): Figure height in pixels. Used only for static image exports.
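Collapsing several predicates per zone to the strongest LRC value, as described for ranking='natural', can be pictured with the sketch below. It uses plain dictionaries with the documented column names (Node, Zone, Local_Reaching_Centrality); the node labels and numeric values are made up, and `collapse_to_strongest` is a hypothetical helper rather than the routine SMX uses.

```python
from typing import Dict, List

LRC = "Local_Reaching_Centrality"


def collapse_to_strongest(rows: List[Dict]) -> List[Dict]:
    """Keep the highest-LRC row per zone, then sort zones by descending LRC."""
    best: Dict[str, Dict] = {}
    for row in rows:
        zone = row["Zone"]
        if zone not in best or row[LRC] > best[zone][LRC]:
            best[zone] = row
    return sorted(best.values(), key=lambda r: r[LRC], reverse=True)


# Illustrative rows only; real Node labels may be formatted differently.
rows = [
    {"Node": "Low <= 0.2", "Zone": "Low", LRC: 0.40},
    {"Node": "Low > 0.5", "Zone": "Low", LRC: 0.75},
    {"Node": "High <= 1.1", "Zone": "High", LRC: 0.60},
]
ranking = collapse_to_strongest(rows)
```

With a pandas DataFrame, the same result could be obtained by grouping on Zone and selecting the row with the maximum Local_Reaching_Centrality in each group.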

Returns#

pd.DataFrame: Normalized ranking table used in the figure.

plot_faithfulness(output_path: str | pathlib.Path, *, title: str | None = None, width: int = 1100, height: int = 560) → pandas.DataFrame#

Plot the progressive masking faithfulness curve saved in faithfulness_.

Parameters#

output_path (str or Path): Destination file. Use .html for an interactive figure or an image extension for static export.
title (str, optional): Figure title override.
width (int, default 1100): Figure width in pixels. Used only for static image exports.
height (int, default 560): Figure height in pixels. Used only for static image exports.

Returns#

pd.DataFrame: Faithfulness masking curve used in the figure.
