smx.predicates.metrics

Predicate metric strategy classes.

Each class implements the compute(bags_dict) → dict[str, DataFrame] interface, so the metrics can be swapped transparently in the SMX pipeline.

Available metrics

CovarianceMetric — covariance (or mutual information) between zone scores and model predictions within each predicate bag.
PerturbationMetric — perturbation-based importance: replace the spectral zone of each predicate with a constant/statistic value and measure the impact on model predictions.

Classes

`BasePredicateMetric`	Strategy interface for predicate importance metrics.
`CovarianceMetric`	Association metric between zone scores and model predictions.
`PerturbationMetric`	Spectral-perturbation importance metric.

Module Contents

class smx.predicates.metrics.BasePredicateMetric

Bases: abc.ABC

Strategy interface for predicate importance metrics.

Subclasses implement compute(), which accepts a bags dictionary (as returned by smx.predicates.bagging.PredicateBagger) and returns a dictionary mapping bag name → DataFrame with columns ['Predicate', <MetricName>].

abstractmethod compute(bags_dict: Dict[str, Dict[str, pandas.DataFrame]]) → Dict[str, pandas.DataFrame]

Compute the metric for every predicate in every bag.

Parameters

bags_dict (dict): {'Bag_1': {rule: DataFrame(['Zone_Sum', 'Predicted_Y', ...]), ...}, ...}

Returns

dict[str, pd.DataFrame]: {'Bag_1': DataFrame(['Predicate', MetricName]), ...}. Each inner DataFrame is sorted in descending order by the metric column.
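The interface above can be illustrated with a small custom metric. This is a minimal sketch, not smx code: BasePredicateMetricSketch is a local stand-in that only mirrors the documented compute() signature, ZoneSpreadMetric and its 'Zone_Spread' column are hypothetical, and the rule strings are made up for the example.

```python
from abc import ABC, abstractmethod
from typing import Dict

import pandas as pd


class BasePredicateMetricSketch(ABC):
    """Local stand-in mirroring the documented interface (not the smx class)."""

    @abstractmethod
    def compute(
        self, bags_dict: Dict[str, Dict[str, pd.DataFrame]]
    ) -> Dict[str, pd.DataFrame]:
        ...


class ZoneSpreadMetric(BasePredicateMetricSketch):
    """Hypothetical metric: population standard deviation of 'Zone_Sum'."""

    metric_column = "Zone_Spread"

    def compute(self, bags_dict):
        results = {}
        for bag_name, predicates in bags_dict.items():
            # One row per predicate (rule) in the bag.
            rows = [
                {
                    "Predicate": rule,
                    self.metric_column: float(df["Zone_Sum"].std(ddof=0)),
                }
                for rule, df in predicates.items()
            ]
            table = pd.DataFrame(rows, columns=["Predicate", self.metric_column])
            # The documented contract: sorted descending by the metric column.
            results[bag_name] = table.sort_values(
                self.metric_column, ascending=False
            ).reset_index(drop=True)
        return results


# Hand-built bags in the documented shape: bag name -> rule -> DataFrame.
bags = {
    "Bag_1": {
        "zone_a > 0.5": pd.DataFrame(
            {"Zone_Sum": [1.0, 1.0, 1.0], "Predicted_Y": [0.2, 0.3, 0.4]}
        ),
        "zone_b <= 1.2": pd.DataFrame(
            {"Zone_Sum": [0.0, 2.0, 4.0], "Predicted_Y": [0.1, 0.5, 0.9]}
        ),
    }
}
ranked = ZoneSpreadMetric().compute(bags)
```

Because the return shape matches the one documented for compute(), a class like this could in principle be swapped for another metric without changing the code that consumes the result.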

class smx.predicates.metrics.CovarianceMetric(metric: Literal['covariance', 'mutual_info'] = 'covariance', threshold: float = 0.01, n_neighbors: int = 10)

Bases: BasePredicateMetric

Association metric between zone scores and model predictions.

Supports two association measures:

'covariance' — absolute covariance between zone aggregation values and continuous model predictions (linear dependency).
'mutual_info' — mutual information (captures non-linear dependencies, requires scikit-learn).

Parameters

metric ({'covariance', 'mutual_info'}, default 'covariance'): Association measure to compute.
threshold (float, default 0.01): Predicates with a metric value ≤ threshold are excluded from the result.
n_neighbors (int, default 10): Number of nearest neighbours for mutual information estimation. Ignored when metric='covariance'.

metric = 'covariance'

threshold = 0.01

n_neighbors = 10

property metric_column: str

compute(bags_dict: Dict[str, Dict[str, pandas.DataFrame]]) → Dict[str, pandas.DataFrame]

Compute the association metric for each predicate in each bag.

Parameters

bags_dict (dict): Bags as returned by smx.predicates.bagging.PredicateBagger.

Returns

dict[str, pd.DataFrame]: Keys are bag names. Each DataFrame has columns ['Predicate', 'Covariance'] (or ['Predicate', 'Mutual_Info'] when metric='mutual_info'), sorted in descending order by the metric and filtered by threshold.
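The 'covariance' option for a single bag can be sketched as follows. This is an illustration under stated assumptions, not the library's implementation: abs_covariance_table is a hypothetical helper, 'Zone_Sum' is assumed to be the zone score column, and the sample covariance (NumPy's default normalisation) is assumed because the documentation does not say which estimator is used.

```python
import numpy as np
import pandas as pd


def abs_covariance_table(predicates, threshold=0.01):
    """Rank the predicates of one bag by |cov(Zone_Sum, Predicted_Y)|."""
    rows = []
    for rule, df in predicates.items():
        # np.cov returns a 2x2 matrix; entry [0, 1] is the cross-covariance.
        cov = np.cov(df["Zone_Sum"], df["Predicted_Y"])[0, 1]
        rows.append({"Predicate": rule, "Covariance": abs(float(cov))})
    table = pd.DataFrame(rows, columns=["Predicate", "Covariance"])
    # Values <= threshold are excluded, matching the documented filter.
    table = table[table["Covariance"] > threshold]
    return table.sort_values("Covariance", ascending=False).reset_index(drop=True)


# Hypothetical bag with three predicates sharing the same zone scores.
zone = [1.0, 2.0, 3.0, 4.0]
bag = {
    "rising": pd.DataFrame({"Zone_Sum": zone, "Predicted_Y": [0.1, 0.2, 0.3, 0.4]}),
    "falling": pd.DataFrame({"Zone_Sum": zone, "Predicted_Y": [0.8, 0.6, 0.4, 0.2]}),
    "flat": pd.DataFrame({"Zone_Sum": zone, "Predicted_Y": [0.5, 0.5, 0.5, 0.5]}),
}
cov_table = abs_covariance_table(bag)
```

Taking the absolute value means a strongly negative relationship ('falling') ranks above a weaker positive one ('rising'), while a predicate whose predictions do not vary with the zone score ('flat') falls under the threshold and is dropped.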

class smx.predicates.metrics.PerturbationMetric(estimator: Any, Xcalclass_prep: pandas.DataFrame, predicates_df: pandas.DataFrame, spectral_cuts: List[Tuple[str, float, float]], perturbation_value: float = 0, perturbation_mode: Literal['constant', 'mean', 'median', 'min', 'max'] = 'constant', stats_source: Literal['full', 'predicate'] = 'full', metric: str = 'mean_abs_diff', normalize_by_zone_size: bool = False, zone_size_exponent: float = 1.0, verbose: bool = False, save_detailed_results: bool = True)

Bases: BasePredicateMetric

Spectral-perturbation importance metric.

For each predicate, the spectral zone is temporarily replaced by a fixed value (or a per-column statistic) and the change in model prediction is measured.

Parameters

estimator (sklearn-compatible estimator): Trained model with a predict() method.
Xcalclass_prep (pd.DataFrame): Pre-processed calibration dataset (samples × features).
predicates_df (pd.DataFrame): Predicate catalogue with columns 'rule' and 'zone'.
spectral_cuts (list of (name, start, end) tuples): Defines every spectral zone boundary.
perturbation_value (float, default 0): Constant replacement value when perturbation_mode='constant'.
perturbation_mode ({'constant', 'mean', 'median', 'min', 'max'}, default 'constant'): How to replace the zone.
stats_source ({'full', 'predicate'}, default 'full'): Data source for computing per-column statistics.
metric (str, default 'mean_abs_diff'): Importance metric. See smx.predicates.metrics.PerturbationMetric docstring for available options per aim.
normalize_by_zone_size (bool, default False): Divide raw importance by the number of zone features (raised to zone_size_exponent) to compensate for wide-zone bias.
zone_size_exponent (float, default 1.0): Exponent applied to zone size for normalisation.
verbose (bool, default False): Print per-predicate progress.
save_detailed_results (bool, default True): Attach a '__detailed_perturbation_results__' key to the result.
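The perturbation idea described by these parameters can be sketched for a single zone. This is a rough illustration rather than the library's code: LinearStub and perturbation_importance are hypothetical, the zone is passed as an already resolved list of column names (how spectral_cuts maps to columns is not shown here), statistics are taken over the full dataset (as with stats_source='full'), and the mean absolute prediction difference is assumed to be what 'mean_abs_diff' denotes.

```python
import numpy as np
import pandas as pd


class LinearStub:
    """Hypothetical stand-in for a trained estimator exposing predict()."""

    def __init__(self, weights):
        self.weights = np.asarray(weights, dtype=float)

    def predict(self, X):
        return np.asarray(X, dtype=float) @ self.weights


def perturbation_importance(estimator, X, zone_columns, mode="constant",
                            value=0.0, normalize_by_zone_size=False,
                            zone_size_exponent=1.0):
    """Mean absolute change in predictions after overwriting one zone."""
    if mode not in {"constant", "mean", "median", "min", "max"}:
        raise ValueError(f"unknown perturbation mode: {mode!r}")
    baseline = estimator.predict(X)
    X_pert = X.copy()
    if mode == "constant":
        for col in zone_columns:
            X_pert[col] = value
    else:
        # Each mode name matches the DataFrame method computing that statistic.
        stats = getattr(X[zone_columns], mode)()
        for col in zone_columns:
            X_pert[col] = stats[col]
    importance = float(np.mean(np.abs(estimator.predict(X_pert) - baseline)))
    if normalize_by_zone_size:
        # Compensate for wide zones: divide by (zone width) ** exponent.
        importance /= len(zone_columns) ** zone_size_exponent
    return importance


# Hypothetical three-feature dataset; the model ignores column 'w3'.
X = pd.DataFrame({"w1": [1.0, 3.0], "w2": [1.0, 2.0], "w3": [5.0, 7.0]})
model = LinearStub([1.0, 2.0, 0.0])

imp_constant = perturbation_importance(model, X, ["w1", "w2"])
imp_mean = perturbation_importance(model, X, ["w1", "w2"], mode="mean")
imp_normalized = perturbation_importance(
    model, X, ["w1", "w2"], normalize_by_zone_size=True
)
imp_unused = perturbation_importance(model, X, ["w3"])
```

In this toy setup, the zone the model ignores gets zero importance, and normalisation halves the score of the two-column zone, which is the wide-zone compensation the normalize_by_zone_size option describes.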

estimator

Xcalclass_prep

predicates_df

spectral_cuts

aim = 'classification'

perturbation_value = 0

perturbation_mode = 'constant'

stats_source = 'full'

metric = 'mean_abs_diff'

normalize_by_zone_size = False

zone_size_exponent = 1.0

verbose = False

save_detailed_results = True

property metric_column: str

compute(bags_dict: Dict[str, Dict[str, pandas.DataFrame]]) → Dict[str, pandas.DataFrame]

Compute perturbation importance for each predicate in each bag.

Parameters

bags_dict (dict): Bags as returned by smx.predicates.bagging.PredicateBagger.

Returns

dict[str, pd.DataFrame]: Keys are bag names. Each DataFrame has columns ['Predicate', 'Perturbation'], sorted in descending order. When save_detailed_results=True, the result dictionary also includes the key '__detailed_perturbation_results__'.
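Because the detailed-results entry shares the dictionary with the per-bag tables, code that iterates over bags may want to set it aside first. A minimal sketch, assuming only the key name documented above: split_perturbation_results is a hypothetical helper, and since the structure of the detailed payload is not described here, a plain dict stands in for it.

```python
import pandas as pd

DETAIL_KEY = "__detailed_perturbation_results__"


def split_perturbation_results(results):
    """Separate per-bag tables from the optional detailed-results entry."""
    bag_tables = {
        name: table for name, table in results.items() if name != DETAIL_KEY
    }
    # .get() returns None when save_detailed_results was False.
    return bag_tables, results.get(DETAIL_KEY)


# Hand-built stand-in for a compute() result (placeholder detailed payload).
results = {
    "Bag_1": pd.DataFrame({"Predicate": ["zone_a > 0.5"], "Perturbation": [0.42]}),
    DETAIL_KEY: {"placeholder": True},
}
bag_tables, detailed = split_perturbation_results(results)
```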