smx.predicates.metrics#
Predicate metric strategy classes.
Each class implements the compute(bags_dict) → dict[str, DataFrame]
interface, so the metrics can be swapped transparently in the SMX pipeline.
Available metrics#
- CovarianceMetric: covariance (or mutual information) between zone scores and model predictions within each predicate bag.
- PerturbationMetric: perturbation-based importance, which replaces the spectral zone of each predicate with a constant or statistic value and measures the impact on model predictions.
Classes#
| BasePredicateMetric | Strategy interface for predicate importance metrics. |
| CovarianceMetric | Association metric between zone scores and model predictions. |
| PerturbationMetric | Spectral-perturbation importance metric. |
Module Contents#
- class smx.predicates.metrics.BasePredicateMetric[source]#
Bases: abc.ABC
Strategy interface for predicate importance metrics.
Subclasses implement compute(), which accepts a bags dictionary (as returned by smx.predicates.bagging.PredicateBagger) and returns a dictionary mapping bag name → DataFrame with columns ['Predicate', <MetricName>].
- abstractmethod compute(bags_dict: Dict[str, Dict[str, pandas.DataFrame]]) → Dict[str, pandas.DataFrame][source]#
Compute metric for every predicate in every bag.
Parameters#
- bags_dict : dict
{'Bag_1': {rule: DataFrame(['Zone_Sum', 'Predicted_Y', ...]), ...}, ...}
Returns#
- dict[str, pd.DataFrame]
{'Bag_1': DataFrame(['Predicate', MetricName]), ...}
Each inner DataFrame is sorted descending by the metric column.
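The contract above can be sketched with a small custom strategy. This is only an illustration: `BasePredicateMetricSketch` is a local stand-in that mirrors the documented `compute()` signature so the example runs without `smx` installed, and `AbsMeanMetric`, its `'Abs_Mean'` column, and the rule strings are hypothetical names, not part of the library.

```python
from abc import ABC, abstractmethod
from typing import Dict

import pandas as pd


class BasePredicateMetricSketch(ABC):
    """Local stand-in mirroring the documented compute() contract."""

    @abstractmethod
    def compute(
        self, bags_dict: Dict[str, Dict[str, pd.DataFrame]]
    ) -> Dict[str, pd.DataFrame]:
        ...


class AbsMeanMetric(BasePredicateMetricSketch):
    """Hypothetical metric: absolute mean of Zone_Sum per predicate."""

    def compute(self, bags_dict):
        results = {}
        for bag_name, rules in bags_dict.items():
            rows = [
                {"Predicate": rule, "Abs_Mean": abs(df["Zone_Sum"].mean())}
                for rule, df in rules.items()
            ]
            table = pd.DataFrame(rows, columns=["Predicate", "Abs_Mean"])
            # The documented contract asks for descending order by the metric.
            results[bag_name] = table.sort_values(
                "Abs_Mean", ascending=False
            ).reset_index(drop=True)
        return results


# Toy bags dictionary shaped like the documented input.
bags = {
    "Bag_1": {
        "zone_a <= 0.5": pd.DataFrame(
            {"Zone_Sum": [1.0, 3.0], "Predicted_Y": [0.2, 0.4]}
        ),
        "zone_b > 1.2": pd.DataFrame(
            {"Zone_Sum": [4.0, 6.0], "Predicted_Y": [0.7, 0.9]}
        ),
    }
}
ranked = AbsMeanMetric().compute(bags)
```

Because every strategy returns the same `{bag name: DataFrame}` shape, downstream code only needs to know the metric column name to consume any of them.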
- class smx.predicates.metrics.CovarianceMetric(metric: Literal['covariance', 'mutual_info'] = 'covariance', threshold: float = 0.01, n_neighbors: int = 10)[source]#
Bases: BasePredicateMetric
Association metric between zone scores and model predictions.
Supports two association measures:
- 'covariance': absolute covariance between zone aggregation values and continuous model predictions (linear dependency).
- 'mutual_info': mutual information (captures non-linear dependencies, requires scikit-learn).
Parameters#
- metric : {‘covariance’, ‘mutual_info’}, default ‘covariance’
Association measure to compute.
- threshold : float, default 0.01
Predicates with metric value ≤ threshold are excluded from the result.
- n_neighbors : int, default 10
Number of nearest neighbours for mutual information estimation. Ignored when metric='covariance'.
- metric = 'covariance'#
- threshold = 0.01#
- n_neighbors = 10#
- compute(bags_dict: Dict[str, Dict[str, pandas.DataFrame]]) → Dict[str, pandas.DataFrame][source]#
Compute the association metric for each predicate in each bag.
Parameters#
- bags_dict : dict
Bags as returned by smx.predicates.bagging.PredicateBagger.
Returns#
- dict[str, pd.DataFrame]
Keys = bag names. Each DataFrame has columns ['Predicate', 'Covariance'] (or 'Mutual_Info'), sorted descending by the metric, filtered by threshold.
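The 'covariance' measure, threshold filter, and ordering described above might look roughly like the following for a single bag. This is a sketch, not the library's code: the helper name `abs_covariance_table` is hypothetical, and the use of the sample covariance (NumPy's default, `ddof=1`) is an assumption, since the page does not say which estimator is used.

```python
import numpy as np
import pandas as pd


def abs_covariance_table(rules, threshold=0.01):
    """Rank the predicates of one bag by |cov(Zone_Sum, Predicted_Y)|."""
    rows = []
    for rule, df in rules.items():
        # np.cov returns the 2x2 covariance matrix; [0, 1] is the cross term.
        # Assumption: sample covariance (ddof=1), NumPy's default.
        cov = np.cov(df["Zone_Sum"].to_numpy(), df["Predicted_Y"].to_numpy())[0, 1]
        rows.append({"Predicate": rule, "Covariance": abs(float(cov))})
    table = pd.DataFrame(rows, columns=["Predicate", "Covariance"])
    # Predicates with metric value <= threshold are excluded.
    table = table[table["Covariance"] > threshold]
    return table.sort_values("Covariance", ascending=False).reset_index(drop=True)


zone_sum = [1.0, 2.0, 3.0, 4.0]
bag = {
    "strong": pd.DataFrame({"Zone_Sum": zone_sum, "Predicted_Y": [2.0, 4.0, 6.0, 8.0]}),
    "flat": pd.DataFrame({"Zone_Sum": zone_sum, "Predicted_Y": [5.0, 5.0, 5.0, 5.0]}),
    "inverse": pd.DataFrame({"Zone_Sum": zone_sum, "Predicted_Y": [4.0, 3.0, 2.0, 1.0]}),
}
table = abs_covariance_table(bag)
```

Taking the absolute value means a strongly negative association ("inverse") still ranks as important, while the constant-prediction predicate ("flat") has zero covariance and is dropped by the threshold.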
- class smx.predicates.metrics.PerturbationMetric(estimator: Any, Xcalclass_prep: pandas.DataFrame, predicates_df: pandas.DataFrame, spectral_cuts: List[Tuple[str, float, float]], perturbation_value: float = 0, perturbation_mode: Literal['constant', 'mean', 'median', 'min', 'max'] = 'constant', stats_source: Literal['full', 'predicate'] = 'full', metric: str = 'mean_abs_diff', normalize_by_zone_size: bool = False, zone_size_exponent: float = 1.0, verbose: bool = False, save_detailed_results: bool = True)[source]#
Bases: BasePredicateMetric
Spectral-perturbation importance metric.
For each predicate, the spectral zone is temporarily replaced by a fixed value (or a per-column statistic) and the change in model prediction is measured.
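The idea can be illustrated with a toy model. The sketch below is an assumption-laden illustration rather than the library's implementation: `ToyLinearModel`, `perturb_zone`, and `mean_abs_diff` are hypothetical helpers, and the zone is given directly as a list of column names instead of being derived from `spectral_cuts`.

```python
import numpy as np
import pandas as pd


class ToyLinearModel:
    """Hypothetical stand-in for a trained estimator with predict()."""

    def __init__(self, weights):
        self.weights = np.asarray(weights, dtype=float)

    def predict(self, X):
        return np.asarray(X, dtype=float) @ self.weights


def perturb_zone(X, zone_cols, mode="constant", value=0.0):
    """Return a copy of X with the zone columns replaced."""
    X_pert = X.copy()
    if mode == "constant":
        X_pert[zone_cols] = value
    else:
        # Per-column statistic ('mean', 'median', 'min' or 'max'),
        # broadcast to every row of that column.
        stats = X[zone_cols].agg(mode)
        for col in zone_cols:
            X_pert[col] = stats[col]
    return X_pert


def mean_abs_diff(model, X, zone_cols, **kwargs):
    """Mean absolute change in prediction after perturbing the zone."""
    baseline = model.predict(X)
    perturbed = model.predict(perturb_zone(X, zone_cols, **kwargs))
    return float(np.mean(np.abs(baseline - perturbed)))


# Three samples, three spectral variables (column names are illustrative).
X = pd.DataFrame(
    {"1000": [1.0, 2.0, 3.0], "1010": [2.0, 2.0, 2.0], "1020": [0.0, 1.0, 2.0]}
)
model = ToyLinearModel([1.0, 0.5, 2.0])
zone = ["1000", "1010"]

imp_constant = mean_abs_diff(model, X, zone, mode="constant", value=0.0)
imp_mean = mean_abs_diff(model, X, zone, mode="mean")
```

In this toy case the constant replacement gives a larger importance than the mean replacement, because zeroing the zone moves every sample far from its original value while the column mean leaves the average sample unchanged.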
Parameters#
- estimator : sklearn-compatible estimator
Trained model with a predict() method.
- Xcalclass_prep : pd.DataFrame
Pre-processed calibration dataset (samples × features).
- predicates_df : pd.DataFrame
Predicate catalogue with columns 'rule' and 'zone'.
- spectral_cuts : list of (name, start, end) tuples
Defines every spectral zone boundary.
- perturbation_value : float, default 0
Constant replacement value when perturbation_mode='constant'.
- perturbation_mode : {‘constant’, ‘mean’, ‘median’, ‘min’, ‘max’}, default ‘constant’
How to replace the zone.
- stats_source : {‘full’, ‘predicate’}, default ‘full’
Data source for computing per-column statistics.
- metric : str, default ‘mean_abs_diff’
Importance metric. See the smx.predicates.metrics.PerturbationMetric docstring for available options per aim.
- normalize_by_zone_size : bool, default False
Divide raw importance by the number of zone features (raised to zone_size_exponent) to compensate for wide-zone bias.
- zone_size_exponent : float, default 1.0
Exponent applied to zone size for normalisation.
- verbose : bool, default False
Print per-predicate progress.
- save_detailed_results : bool, default True
Attach a '__detailed_perturbation_results__' key to the result.
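The zone-size normalisation described for normalize_by_zone_size and zone_size_exponent amounts to a single division. A minimal sketch, assuming the formula is exactly raw importance divided by zone size raised to the exponent (the helper name `normalize_importance` is hypothetical):

```python
def normalize_importance(raw, n_zone_features, exponent=1.0):
    """Divide raw importance by zone size raised to `exponent`."""
    if n_zone_features <= 0:
        raise ValueError("zone must contain at least one feature")
    return raw / (n_zone_features ** exponent)


# A wide zone (100 features) with a large raw importance versus a
# narrow zone (4 features) with a small one.
wide = normalize_importance(6.0, 100)
narrow = normalize_importance(0.5, 4)

# An exponent below 1 softens the penalty on wide zones.
damped = normalize_importance(6.0, 100, exponent=0.5)
```

With the default exponent the narrow zone ranks above the wide one after normalisation, which is the wide-zone bias correction the parameter description refers to; a smaller exponent gives a milder correction.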
- estimator#
- Xcalclass_prep#
- predicates_df#
- spectral_cuts#
- aim = 'classification'#
- perturbation_value = 0#
- perturbation_mode = 'constant'#
- stats_source = 'full'#
- metric = 'mean_abs_diff'#
- normalize_by_zone_size = False#
- zone_size_exponent = 1.0#
- verbose = False#
- save_detailed_results = True#
- compute(bags_dict: Dict[str, Dict[str, pandas.DataFrame]]) → Dict[str, pandas.DataFrame][source]#
Compute perturbation importance for each predicate in each bag.
Parameters#
- bags_dict : dict
Bags as returned by smx.predicates.bagging.PredicateBagger.
Returns#
- dict[str, pd.DataFrame]
Keys = bag names. Each DataFrame has columns ['Predicate', 'Perturbation'], sorted descending. When save_detailed_results=True the key '__detailed_perturbation_results__' is also included.
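Since the extra key sits alongside the bag names, code that iterates over the result may want to separate it first. The sketch below uses a hand-built dictionary shaped like the documented return value; the payload stored under the detailed key is a placeholder, because its actual type is not described on this page.

```python
import pandas as pd

DETAIL_KEY = "__detailed_perturbation_results__"

# Hypothetical result shaped like the documented return value.
result = {
    "Bag_1": pd.DataFrame({"Predicate": ["r1", "r2"], "Perturbation": [0.8, 0.1]}),
    "Bag_2": pd.DataFrame({"Predicate": ["r3"], "Perturbation": [0.4]}),
    # Placeholder payload: the real structure is an unknown here.
    DETAIL_KEY: {"note": "placeholder"},
}

# Pull the detailed payload aside (None if save_detailed_results was False).
details = result.get(DETAIL_KEY)

# Keep only the per-bag ranking tables.
bag_tables = {name: df for name, df in result.items() if name != DETAIL_KEY}

# Tables are sorted descending, so the first row is the top predicate.
top_per_bag = {name: df.iloc[0]["Predicate"] for name, df in bag_tables.items()}
```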