smx.predicates.metrics#

Predicate metric strategy classes.

Each class implements the compute(bags_dict) -> dict[str, DataFrame] interface, so the metrics can be swapped transparently in the SMX pipeline.

Available metrics#

  • CovarianceMetric — covariance (or mutual information) between zone scores and model predictions within each predicate bag.

  • PerturbationMetric — perturbation-based importance: replace the spectral zone of each predicate with a constant/statistic value and measure the impact on model predictions.

Classes#

BasePredicateMetric

Strategy interface for predicate importance metrics.

CovarianceMetric

Association metric between zone scores and model predictions.

PerturbationMetric

Spectral-perturbation importance metric.

Module Contents#

class smx.predicates.metrics.BasePredicateMetric#

Bases: abc.ABC

Strategy interface for predicate importance metrics.

Subclasses implement compute(), which accepts a bags dictionary (as returned by smx.predicates.bagging.PredicateBagger) and returns a dictionary mapping bag name → DataFrame with columns ['Predicate', <MetricName>].

abstractmethod compute(bags_dict: Dict[str, Dict[str, pandas.DataFrame]]) → Dict[str, pandas.DataFrame]#

Compute metric for every predicate in every bag.

Parameters#

bags_dict : dict

{'Bag_1': {rule: DataFrame(['Zone_Sum', 'Predicted_Y', ...]), ...}, ...}

Returns#

dict[str, pd.DataFrame]

{'Bag_1': DataFrame(['Predicate', MetricName]), ...}

Each inner DataFrame is sorted in descending order by the metric column.
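The contract above can be illustrated with a minimal sketch. It defines a local stand-in for the abstract base class so that it runs without SMX installed. The names MetricSketch and ZoneSpreadMetric, the 'Spread' column, and the rule strings are all hypothetical and exist only for this illustration; the toy "spread" score is not a metric shipped with SMX.

```python
from abc import ABC, abstractmethod
from typing import Dict

import pandas as pd


class MetricSketch(ABC):
    """Local stand-in for BasePredicateMetric (hypothetical)."""

    @abstractmethod
    def compute(
        self, bags_dict: Dict[str, Dict[str, pd.DataFrame]]
    ) -> Dict[str, pd.DataFrame]:
        ...


class ZoneSpreadMetric(MetricSketch):
    """Toy strategy: range (max - min) of 'Zone_Sum' for each predicate."""

    def compute(self, bags_dict):
        results = {}
        for bag_name, predicates in bags_dict.items():
            rows = [
                {
                    "Predicate": rule,
                    "Spread": float(df["Zone_Sum"].max() - df["Zone_Sum"].min()),
                }
                for rule, df in predicates.items()
            ]
            table = pd.DataFrame(rows, columns=["Predicate", "Spread"])
            # The interface asks for descending order by the metric column.
            results[bag_name] = table.sort_values(
                "Spread", ascending=False
            ).reset_index(drop=True)
        return results


# Hand-built bags dictionary; the rule strings are made up for the example.
bags_dict = {
    "Bag_1": {
        "zone_a > 0.5": pd.DataFrame(
            {"Zone_Sum": [1.0, 2.0, 4.0], "Predicted_Y": [0.1, 0.4, 0.9]}
        ),
        "zone_b <= 1.2": pd.DataFrame(
            {"Zone_Sum": [3.0, 3.5, 9.0], "Predicted_Y": [0.2, 0.3, 0.8]}
        ),
    }
}

ranked = ZoneSpreadMetric().compute(bags_dict)
print(ranked["Bag_1"])
```

Because callers depend only on compute(), any object honouring this input and output shape could be passed where a metric is expected.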

class smx.predicates.metrics.CovarianceMetric(metric: Literal['covariance', 'mutual_info'] = 'covariance', threshold: float = 0.01, n_neighbors: int = 10)#

Bases: BasePredicateMetric

Association metric between zone scores and model predictions.

Supports two association measures:

  • 'covariance' — absolute covariance between zone aggregation values and continuous model predictions (linear dependency).

  • 'mutual_info' — mutual information (captures non-linear dependencies, requires scikit-learn).
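The practical difference between the two measures shows up in a small numerical example. The sketch below uses only NumPy and made-up data. It does not call SMX or scikit-learn, and the use of the sample covariance (NumPy's default) is an assumption about how the covariance would be computed.

```python
import numpy as np

zone_scores = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])

linear_pred = 2.0 * zone_scores     # linear dependency on the zone score
quadratic_pred = zone_scores ** 2   # symmetric, purely non-linear dependency

# np.cov returns the 2x2 sample covariance matrix; [0, 1] is the cross term.
abs_cov_linear = abs(np.cov(zone_scores, linear_pred)[0, 1])
abs_cov_quadratic = abs(np.cov(zone_scores, quadratic_pred)[0, 1])

print(abs_cov_linear)     # 5.0
print(abs_cov_quadratic)  # 0.0
```

The quadratic predictions are fully determined by the zone scores, yet their covariance is zero. This is the kind of dependency that a mutual-information estimate is meant to pick up.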

Parameters#

metric : {'covariance', 'mutual_info'}, default 'covariance'

Association measure to compute.

threshold : float, default 0.01

Predicates with metric value ≤ threshold are excluded from the result.

n_neighbors : int, default 10

Number of nearest neighbours for mutual information estimation. Ignored when metric='covariance'.

metric = 'covariance'#
threshold = 0.01#
n_neighbors = 10#
property metric_column: str#
compute(bags_dict: Dict[str, Dict[str, pandas.DataFrame]]) → Dict[str, pandas.DataFrame]#

Compute the association metric for each predicate in each bag.

Parameters#

bags_dict : dict

Bags as returned by smx.predicates.bagging.PredicateBagger.

Returns#

dict[str, pd.DataFrame]

Keys are bag names. Each DataFrame has columns ['Predicate', 'Covariance'] (or ['Predicate', 'Mutual_Info'] when metric='mutual_info'), sorted descending by the metric and filtered by threshold.
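The shape of one returned table can be sketched for the 'covariance' case as follows. This is an illustrative re-implementation on made-up data, not the library's code; the rule names are hypothetical, and the sample covariance is assumed.

```python
import numpy as np
import pandas as pd

threshold = 0.01  # same value as the documented default

# One hand-built bag: predicate name -> DataFrame with the two needed columns.
bag = {
    "rule_a": pd.DataFrame({"Zone_Sum": [1.0, 2.0, 3.0], "Predicted_Y": [0.2, 0.4, 0.6]}),
    "rule_b": pd.DataFrame({"Zone_Sum": [1.0, 2.0, 3.0], "Predicted_Y": [0.5, 0.5, 0.5]}),
    "rule_c": pd.DataFrame({"Zone_Sum": [1.0, 2.0, 3.0], "Predicted_Y": [3.0, 2.0, 1.0]}),
}

rows = []
for rule, df in bag.items():
    cov = np.cov(df["Zone_Sum"].to_numpy(), df["Predicted_Y"].to_numpy())[0, 1]
    rows.append({"Predicate": rule, "Covariance": abs(cov)})

table = pd.DataFrame(rows, columns=["Predicate", "Covariance"])
# Predicates with a value <= threshold are excluded.
table = table[table["Covariance"] > threshold]
table = table.sort_values("Covariance", ascending=False).reset_index(drop=True)
print(table)
```

Here rule_b is dropped because its predictions are constant (zero covariance), and rule_c ranks first because the absolute value is used, so a strong negative association still counts.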

class smx.predicates.metrics.PerturbationMetric(estimator: Any, Xcalclass_prep: pandas.DataFrame, predicates_df: pandas.DataFrame, spectral_cuts: List[Tuple[str, float, float]], perturbation_value: float = 0, perturbation_mode: Literal['constant', 'mean', 'median', 'min', 'max'] = 'constant', stats_source: Literal['full', 'predicate'] = 'full', metric: str = 'mean_abs_diff', normalize_by_zone_size: bool = False, zone_size_exponent: float = 1.0, verbose: bool = False, save_detailed_results: bool = True)#

Bases: BasePredicateMetric

Spectral-perturbation importance metric.

For each predicate, the spectral zone is temporarily replaced by a fixed value (or a per-column statistic) and the change in model prediction is measured.
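The idea can be sketched on a toy dataset. Everything below is an assumption made for illustration: LinearStandIn is a hypothetical replacement for a trained estimator, columns are taken to be numeric wavelengths, zone boundaries are treated as inclusive, and the score is read as the mean absolute difference between baseline and perturbed predictions, as the name 'mean_abs_diff' suggests.

```python
import numpy as np
import pandas as pd


class LinearStandIn:
    """Hypothetical stand-in for a trained estimator exposing predict()."""

    def __init__(self, weights):
        self.weights = np.asarray(weights, dtype=float)

    def predict(self, X):
        return np.asarray(X, dtype=float) @ self.weights


# Samples x features; column labels play the role of wavelengths.
X = pd.DataFrame(
    [[1.0, 2.0, 3.0, 4.0], [2.0, 1.0, 0.0, 1.0], [0.0, 1.0, 2.0, 3.0]],
    columns=[1000.0, 1100.0, 1200.0, 1300.0],
)
model = LinearStandIn([0.5, 0.0, 1.0, 2.0])

# One (name, start, end) tuple in the style of spectral_cuts.
zone_name, start, end = ("zone_b", 1150.0, 1300.0)
zone_cols = [c for c in X.columns if start <= c <= end]

# Replace the zone with a constant on a copy, leaving the original intact.
X_perturbed = X.copy()
X_perturbed.loc[:, zone_cols] = 0.0

baseline = model.predict(X)
perturbed = model.predict(X_perturbed)
mean_abs_diff = float(np.mean(np.abs(baseline - perturbed)))
print(zone_name, mean_abs_diff)  # zone_b 7.0
```

Working on a copy is what makes the replacement temporary: the calibration data are untouched when the next predicate is evaluated.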

Parameters#

estimator : sklearn-compatible estimator

Trained model with a predict() method.

Xcalclass_prep : pd.DataFrame

Pre-processed calibration dataset (samples × features).

predicates_df : pd.DataFrame

Predicate catalogue with columns 'rule' and 'zone'.

spectral_cuts : list of (name, start, end) tuples

Name and boundaries (start, end) of every spectral zone.

perturbation_value : float, default 0

Constant replacement value when perturbation_mode='constant'.

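perturbation_mode : {'constant', 'mean', 'median', 'min', 'max'}, default 'constant'

How the zone is replaced: with perturbation_value ('constant') or with a per-column statistic ('mean', 'median', 'min', 'max').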

stats_source : {'full', 'predicate'}, default 'full'

Data source for computing per-column statistics.
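How the mode and the data source might interact can be sketched as below. The helper replacement_values is hypothetical, and so is the reading of stats_source: this sketch assumes 'full' means statistics over all calibration samples and 'predicate' means statistics over only the samples that satisfy the predicate. The source does not spell this out.

```python
import pandas as pd


def replacement_values(
    zone_data: pd.DataFrame, mode: str, constant: float = 0.0
) -> pd.Series:
    """Return one replacement value per zone column (hypothetical helper)."""
    if mode == "constant":
        return pd.Series(constant, index=zone_data.columns)
    if mode in {"mean", "median", "min", "max"}:
        # DataFrame.mean/median/min/max reduce over rows, one value per column.
        return getattr(zone_data, mode)()
    raise ValueError(f"unknown perturbation mode: {mode!r}")


# Zone columns of a toy calibration set (column names are made up).
full = pd.DataFrame({"w1": [1.0, 2.0, 3.0, 10.0], "w2": [0.0, 4.0, 8.0, 4.0]})

# Assumed meaning of stats_source='predicate': keep only matching samples.
subset = full[full["w1"] > 1.5]

mean_full = replacement_values(full, "mean")
median_subset = replacement_values(subset, "median")
constant_fill = replacement_values(full, "constant", constant=0.0)
print(mean_full.tolist(), median_subset.tolist(), constant_fill.tolist())
```

Under this reading, 'predicate' yields replacement values that are typical for the samples a rule selects, while 'full' uses values typical for the whole calibration set.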

metric : str, default 'mean_abs_diff'

Name of the importance metric. The available options depend on aim; see the PerturbationMetric docstring in the source for the list.

normalize_by_zone_size : bool, default False

Divide raw importance by the number of zone features (raised to zone_size_exponent) to compensate for wide-zone bias.

zone_size_exponent : float, default 1.0

Exponent applied to zone size for normalisation.
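The normalisation described by normalize_by_zone_size and zone_size_exponent amounts to a single division. The function name below is hypothetical, and the numbers are made up to show the effect of the exponent.

```python
def normalize_importance(
    raw: float, n_zone_features: int, exponent: float = 1.0
) -> float:
    """Divide raw importance by the zone size raised to `exponent`."""
    return raw / (n_zone_features ** exponent)


# A wide zone (16 features) with a large raw score and a narrow zone
# (4 features) with a small one end up equal under full normalisation.
wide = normalize_importance(8.0, 16)                     # 8 / 16 = 0.5
narrow = normalize_importance(2.0, 4)                    # 2 / 4  = 0.5

# An exponent below 1 penalises zone width less strongly.
softened = normalize_importance(8.0, 16, exponent=0.5)   # 8 / 4  = 2.0
print(wide, narrow, softened)
```

With an exponent of 0 the divisor is 1 and the raw importance is returned unchanged, so the exponent acts as a dial between no correction and a strictly per-feature score.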

verbose : bool, default False

Print per-predicate progress.

save_detailed_results : bool, default True

Attach a '__detailed_perturbation_results__' key to the result.

estimator#
Xcalclass_prep#
predicates_df#
spectral_cuts#
aim = 'classification'#
perturbation_value = 0#
perturbation_mode = 'constant'#
stats_source = 'full'#
metric = 'mean_abs_diff'#
normalize_by_zone_size = False#
zone_size_exponent = 1.0#
verbose = False#
save_detailed_results = True#
property metric_column: str#
compute(bags_dict: Dict[str, Dict[str, pandas.DataFrame]]) → Dict[str, pandas.DataFrame]#

Compute perturbation importance for each predicate in each bag.

Parameters#

bags_dict : dict

Bags as returned by smx.predicates.bagging.PredicateBagger.

Returns#

dict[str, pd.DataFrame]

Keys are bag names. Each DataFrame has columns ['Predicate', 'Perturbation'], sorted descending by the metric. When save_detailed_results=True, the result also contains the key '__detailed_perturbation_results__', which is not a bag.
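Since the extra key sits alongside the bag names, code that iterates over the result may want to separate it first. The sketch below uses a hand-built stand-in for a compute() result. The payload stored under the detail key is a placeholder, because its real structure is not described here, and the rule names are made up.

```python
import pandas as pd

DETAIL_KEY = "__detailed_perturbation_results__"

# Hand-built stand-in for what compute() might return.
result = {
    "Bag_1": pd.DataFrame(
        {"Predicate": ["rule_c", "rule_a"], "Perturbation": [0.9, 0.3]}
    ),
    "Bag_2": pd.DataFrame(
        {"Predicate": ["rule_a", "rule_b"], "Perturbation": [0.7, 0.1]}
    ),
    DETAIL_KEY: {"placeholder": True},  # real payload structure unknown
}

# Keep only the per-bag tables before iterating.
bag_tables = {name: df for name, df in result.items() if name != DETAIL_KEY}

# Tables are sorted descending, so the first row is the top predicate.
top_predicates = {name: df.iloc[0]["Predicate"] for name, df in bag_tables.items()}
print(top_predicates)
```

Passing save_detailed_results=False avoids the extra key altogether when only the rankings are needed.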