smx.predicates.metrics
======================

.. py:module:: smx.predicates.metrics

.. autoapi-nested-parse::

   Predicate metric strategy classes.

   Each class implements the ``compute(bags_dict) → dict[str, DataFrame]`` interface
   so they can be swapped transparently in the SMX pipeline.

   Available metrics
   -----------------
   * :class:`CovarianceMetric`: covariance (or mutual information) between zone
     scores and model predictions within each predicate bag.
   * :class:`PerturbationMetric`: perturbation-based importance. The spectral zone
     of each predicate is replaced with a constant or statistic value, and the
     impact on model predictions is measured.


Classes
-------

.. autoapisummary::

   smx.predicates.metrics.BasePredicateMetric
   smx.predicates.metrics.CovarianceMetric
   smx.predicates.metrics.PerturbationMetric


Module Contents
---------------

.. py:class:: BasePredicateMetric

   Bases: :py:obj:`abc.ABC`


   Strategy interface for predicate importance metrics.

   Subclasses implement :meth:`compute`, which accepts a bags dictionary
   (as returned by :class:`smx.predicates.bagging.PredicateBagger`) and returns
   a dictionary mapping bag name → DataFrame with columns
   ``['Predicate', MetricName]``.


   .. py:method:: compute(bags_dict: Dict[str, Dict[str, pandas.DataFrame]]) -> Dict[str, pandas.DataFrame]
      :abstractmethod:


      Compute metric for every predicate in every bag.

      Parameters
      ----------
      bags_dict : dict
          ``{'Bag_1': {rule: DataFrame(['Zone_Sum', 'Predicted_Y', ...]), ...}, ...}``

      Returns
      -------
      dict[str, pd.DataFrame]
          ``{'Bag_1': DataFrame(['Predicate', MetricName]), ...}``
          Each inner DataFrame is sorted descending by the metric column.



.. py:class:: CovarianceMetric(metric: Literal['covariance', 'mutual_info'] = 'covariance', threshold: float = 0.01, n_neighbors: int = 10)

   Bases: :py:obj:`BasePredicateMetric`


   Association metric between zone scores and model predictions.

   Supports two association measures:

   * ``'covariance'``: absolute covariance between zone aggregation values
     and continuous model predictions (linear dependency).
   * ``'mutual_info'``: mutual information (captures non-linear dependencies,
     requires ``scikit-learn``).

   Parameters
   ----------
   metric : {'covariance', 'mutual_info'}, default 'covariance'
       Association measure to compute.
   threshold : float, default 0.01
       Predicates with metric value ≤ threshold are excluded from the result.
   n_neighbors : int, default 10
       Number of nearest neighbours for mutual information estimation.
       Ignored when ``metric='covariance'``.


   .. py:attribute:: metric
      :value: 'covariance'


   .. py:attribute:: threshold
      :value: 0.01


   .. py:attribute:: n_neighbors
      :value: 10


   .. py:property:: metric_column
      :type: str


   .. py:method:: compute(bags_dict: Dict[str, Dict[str, pandas.DataFrame]]) -> Dict[str, pandas.DataFrame]

      Compute the association metric for each predicate in each bag.

      Parameters
      ----------
      bags_dict : dict
          Bags as returned by :class:`smx.predicates.bagging.PredicateBagger`.

      Returns
      -------
      dict[str, pd.DataFrame]
          Keys = bag names. Each DataFrame has columns
          ``['Predicate', 'Covariance']`` (or ``'Mutual_Info'``),
          sorted descending by the metric, filtered by *threshold*.



.. py:class:: PerturbationMetric(estimator: Any, Xcalclass_prep: pandas.DataFrame, predicates_df: pandas.DataFrame, spectral_cuts: List[Tuple[str, float, float]], perturbation_value: float = 0, perturbation_mode: Literal['constant', 'mean', 'median', 'min', 'max'] = 'constant', stats_source: Literal['full', 'predicate'] = 'full', metric: str = 'mean_abs_diff', normalize_by_zone_size: bool = False, zone_size_exponent: float = 1.0, verbose: bool = False, save_detailed_results: bool = True)

   Bases: :py:obj:`BasePredicateMetric`


   Spectral-perturbation importance metric.

   For each predicate, the spectral zone is temporarily replaced by a fixed
   value (or a per-column statistic) and the change in model prediction is
   measured.

   Parameters
   ----------
   estimator : sklearn-compatible estimator
       Trained model with a ``predict()`` method.
   Xcalclass_prep : pd.DataFrame
       Pre-processed calibration dataset (samples × features).
   predicates_df : pd.DataFrame
       Predicate catalogue with columns ``'rule'`` and ``'zone'``.
   spectral_cuts : list of (name, start, end) tuples
       Defines every spectral zone boundary.
   perturbation_value : float, default 0
       Constant replacement value when ``perturbation_mode='constant'``.
   perturbation_mode : {'constant', 'mean', 'median', 'min', 'max'}, default 'constant'
       How to replace the zone.
   stats_source : {'full', 'predicate'}, default 'full'
       Data source for computing per-column statistics.
   metric : str, default 'mean_abs_diff'
       Importance metric. See :class:`smx.predicates.metrics.PerturbationMetric`
       docstring for available options per *aim*.
   normalize_by_zone_size : bool, default False
       Divide raw importance by the number of zone features (raised to
       *zone_size_exponent*) to compensate for wide-zone bias.
   zone_size_exponent : float, default 1.0
       Exponent applied to zone size for normalisation.
   verbose : bool, default False
       Print per-predicate progress.
   save_detailed_results : bool, default True
       Attach a ``'__detailed_perturbation_results__'`` key to the result.


   .. py:attribute:: estimator


   .. py:attribute:: Xcalclass_prep


   .. py:attribute:: predicates_df


   .. py:attribute:: spectral_cuts


   .. py:attribute:: aim
      :value: 'classification'


   .. py:attribute:: perturbation_value
      :value: 0


   .. py:attribute:: perturbation_mode
      :value: 'constant'


   .. py:attribute:: stats_source
      :value: 'full'


   .. py:attribute:: metric
      :value: 'mean_abs_diff'


   .. py:attribute:: normalize_by_zone_size
      :value: False


   .. py:attribute:: zone_size_exponent
      :value: 1.0


   .. py:attribute:: verbose
      :value: False


   .. py:attribute:: save_detailed_results
      :value: True


   .. py:property:: metric_column
      :type: str


   .. py:method:: compute(bags_dict: Dict[str, Dict[str, pandas.DataFrame]]) -> Dict[str, pandas.DataFrame]

      Compute perturbation importance for each predicate in each bag.

      Parameters
      ----------
      bags_dict : dict
          Bags as returned by :class:`smx.predicates.bagging.PredicateBagger`.

      Returns
      -------
      dict[str, pd.DataFrame]
          Keys = bag names. Each DataFrame has columns
          ``['Predicate', 'Perturbation']``, sorted descending.
          When ``save_detailed_results=True`` the key
          ``'__detailed_perturbation_results__'`` is also included.
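
The perturbation idea described for :class:`PerturbationMetric` can be illustrated with a minimal, self-contained sketch. This is not the ``smx`` implementation. ``SumModel``, ``perturbation_scores``, the ``zones`` mapping and the column names are hypothetical names introduced for illustration. The sketch covers only the constant replacement mode and assumes that ``'mean_abs_diff'`` denotes the mean absolute difference between baseline and perturbed predictions.

```python
import numpy as np
import pandas as pd


class SumModel:
    """Hypothetical stand-in for a trained estimator exposing ``predict()``."""

    def predict(self, X):
        # Predict the row-wise sum of all features.
        return np.asarray(X, dtype=float).sum(axis=1)


def perturbation_scores(estimator, X, zones, value=0.0, normalize=False, exponent=1.0):
    """Score each zone by the mean absolute prediction change after perturbation.

    ``zones`` maps a predicate label to the column names of its spectral zone
    (an assumed layout, chosen for this sketch).
    """
    baseline = estimator.predict(X)
    rows = []
    for predicate, columns in zones.items():
        X_pert = X.copy()
        X_pert[columns] = value  # constant replacement of the whole zone
        score = float(np.abs(estimator.predict(X_pert) - baseline).mean())
        if normalize:
            # Divide by zone size raised to the exponent to offset wide-zone bias.
            score /= len(columns) ** exponent
        rows.append({"Predicate": predicate, "Perturbation": score})
    return (
        pd.DataFrame(rows)
        .sort_values("Perturbation", ascending=False)
        .reset_index(drop=True)
    )


X = pd.DataFrame(
    {"w1": [1.0, 2.0], "w2": [3.0, 4.0], "w3": [0.5, 0.5], "w4": [0.0, 1.0]}
)
zones = {"wide > 0.3": ["w1", "w2"], "narrow > 0.1": ["w3"]}

raw = perturbation_scores(SumModel(), X, zones)
norm = perturbation_scores(SumModel(), X, zones, normalize=True)
print(raw)
print(norm)
```

In this toy setup the two-column zone scores higher than the single-column zone simply because it removes more signal. Dividing by the zone size, as the ``normalize_by_zone_size`` and ``zone_size_exponent`` parameters describe, narrows that gap.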