smx.predicates.metrics
======================

.. py:module:: smx.predicates.metrics

.. autoapi-nested-parse::

   Predicate metric strategy classes.

   Each class implements the ``compute(bags_dict) → dict[str, DataFrame]``
   interface so they can be swapped transparently in the SMX pipeline.

   Available metrics
   -----------------
   * :class:`CovarianceMetric` — covariance (or mutual information) between
     zone scores and model predictions within each predicate bag.
   * :class:`PerturbationMetric` — perturbation-based importance: replace the
     spectral zone of each predicate with a constant or a per-column statistic
     and measure the resulting change in model predictions.
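
   The sketch below shows what "swapped transparently" means for calling code.
   It relies only on the ``compute()`` contract described here. The names
   ``PredicateMetric``, ``top_predicate_per_bag`` and ``ToyMetric`` are
   hypothetical and used only for illustration. In practice ``ToyMetric()``
   would be replaced by a :class:`CovarianceMetric` or :class:`PerturbationMetric`
   instance.

   .. code-block:: python

      from typing import Dict, Protocol

      import pandas as pd

      DETAILED_KEY = "__detailed_perturbation_results__"


      class PredicateMetric(Protocol):
          # Hypothetical structural type: any object with this compute() method.
          def compute(
              self, bags_dict: Dict[str, Dict[str, pd.DataFrame]]
          ) -> Dict[str, pd.DataFrame]: ...


      def top_predicate_per_bag(metric: PredicateMetric, bags_dict) -> Dict[str, str]:
          """Return the best-ranked predicate of each bag, whichever metric is used."""
          top = {}
          for bag_name, ranking in metric.compute(bags_dict).items():
              # Skip the optional detailed-results entry and empty rankings.
              if bag_name == DETAILED_KEY or ranking.empty:
                  continue
              # Rankings are sorted in descending order, so row 0 is the best.
              top[bag_name] = ranking["Predicate"].iloc[0]
          return top


      class ToyMetric:
          # Toy strategy for the demo: scores each predicate by its sample count.
          def compute(self, bags_dict):
              out = {}
              for bag_name, rules in bags_dict.items():
                  table = pd.DataFrame(
                      {"Predicate": list(rules), "Size": [len(df) for df in rules.values()]}
                  )
                  out[bag_name] = table.sort_values("Size", ascending=False, ignore_index=True)
              return out


      bags = {"Bag_1": {"r1": pd.DataFrame({"Zone_Sum": [1.0, 2.0]}),
                        "r2": pd.DataFrame({"Zone_Sum": [1.0, 2.0, 3.0]})}}
      print(top_predicate_per_bag(ToyMetric(), bags))  # prints {'Bag_1': 'r2'}

   The helper depends only on the ``compute()`` signature and the
   ``'Predicate'`` column. It therefore stays unchanged when the metric strategy changes.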


Classes
-------

.. autoapisummary::

   smx.predicates.metrics.BasePredicateMetric
   smx.predicates.metrics.CovarianceMetric
   smx.predicates.metrics.PerturbationMetric


Module Contents
---------------

.. py:class:: BasePredicateMetric

   Bases: :py:obj:`abc.ABC`


   Strategy interface for predicate importance metrics.

   Subclasses implement :meth:`compute`, which accepts a bags dictionary
   (as returned by :class:`smx.predicates.bagging.PredicateBagger`) and
   returns a dictionary mapping bag name → DataFrame with columns
   ``['Predicate', <MetricName>]``.
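
   The following is a hypothetical custom strategy that honours this contract. It is
   not part of ``smx``. A local stand-in replaces :class:`BasePredicateMetric`
   so the snippet runs on its own. The ``AbsCorrelationMetric`` class and its
   ``'Abs_Correlation'`` column are illustrative assumptions, as are the
   example rule strings.

   .. code-block:: python

      from abc import ABC, abstractmethod
      from typing import Dict

      import pandas as pd


      class BasePredicateMetric(ABC):
          # Local stand-in for smx.predicates.metrics.BasePredicateMetric.
          @abstractmethod
          def compute(
              self, bags_dict: Dict[str, Dict[str, pd.DataFrame]]
          ) -> Dict[str, pd.DataFrame]: ...


      class AbsCorrelationMetric(BasePredicateMetric):
          """Hypothetical metric: |Pearson r| between Zone_Sum and Predicted_Y."""

          def compute(self, bags_dict):
              results = {}
              for bag_name, rules in bags_dict.items():
                  rows = []
                  for rule, df in rules.items():
                      r = df["Zone_Sum"].corr(df["Predicted_Y"])
                      if pd.notna(r):  # NaN for constant columns or < 2 samples
                          rows.append((rule, abs(r)))
                  table = pd.DataFrame(rows, columns=["Predicate", "Abs_Correlation"])
                  # Contract: one row per predicate, sorted in descending order.
                  results[bag_name] = table.sort_values(
                      "Abs_Correlation", ascending=False, ignore_index=True
                  )
              return results


      bags = {"Bag_1": {
          "zone_A > 0.5": pd.DataFrame({"Zone_Sum": [1, 2, 3], "Predicted_Y": [0.1, 0.2, 0.3]}),
          "zone_B <= 0.2": pd.DataFrame({"Zone_Sum": [1, 2, 3], "Predicted_Y": [0.3, 0.1, 0.2]}),
      }}
      ranking = AbsCorrelationMetric().compute(bags)["Bag_1"]
      print(ranking)  # zone_A first (|r| about 1.0), then zone_B (|r| = 0.5)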


   .. py:method:: compute(bags_dict: Dict[str, Dict[str, pandas.DataFrame]]) -> Dict[str, pandas.DataFrame]
      :abstractmethod:


      Compute metric for every predicate in every bag.

      Parameters
      ----------
      bags_dict : dict
          ``{'Bag_1': {rule: DataFrame(['Zone_Sum', 'Predicted_Y', ...]), ...}, ...}``

      Returns
      -------
      dict[str, pd.DataFrame]
          ``{'Bag_1': DataFrame(['Predicate', MetricName]), ...}``
          Each inner DataFrame is sorted in descending order of the metric column.


.. py:class:: CovarianceMetric(metric: Literal['covariance', 'mutual_info'] = 'covariance', threshold: float = 0.01, n_neighbors: int = 10)

   Bases: :py:obj:`BasePredicateMetric`


   Association metric between zone scores and model predictions.

   Supports two association measures:

   * ``'covariance'`` — absolute covariance between the zone aggregation values
     and the continuous model predictions (captures linear dependencies).
   * ``'mutual_info'`` — mutual information (captures non-linear dependencies,
     requires ``scikit-learn``).
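
   Let :math:`z_i` be the zone score and :math:`\hat y_i` the prediction for
   sample :math:`i` in a predicate's bag entry (:math:`n` samples). The two
   measures can then be written as below. The :math:`1/(n-1)` normalisation is
   an assumption, since the estimator's degrees of freedom are not stated here.
   In practice the mutual information is not computed exactly. It is estimated
   from the samples using ``n_neighbors`` nearest neighbours.

   .. math::

      \mathrm{Cov} = \left| \frac{1}{n-1} \sum_{i=1}^{n} (z_i - \bar z)(\hat y_i - \bar{\hat y}) \right|

      \mathrm{MI} = I(Z; \hat Y) = \iint p(z, \hat y) \log \frac{p(z, \hat y)}{p(z)\, p(\hat y)} \, \mathrm{d}z \, \mathrm{d}\hat y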

   Parameters
   ----------
   metric : {'covariance', 'mutual_info'}, default 'covariance'
       Association measure to compute.
   threshold : float, default 0.01
       Predicates with metric value ≤ threshold are excluded from the result.
   n_neighbors : int, default 10
       Number of nearest neighbours for mutual information estimation.
       Ignored when ``metric='covariance'``.


   .. py:attribute:: metric
      :value: 'covariance'


   .. py:attribute:: threshold
      :value: 0.01


   .. py:attribute:: n_neighbors
      :value: 10


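   .. py:property:: metric_column
      :type: str

      Name of the metric column in the result DataFrames: ``'Covariance'`` or
      ``'Mutual_Info'``, depending on *metric*.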


   .. py:method:: compute(bags_dict: Dict[str, Dict[str, pandas.DataFrame]]) -> Dict[str, pandas.DataFrame]

      Compute the association metric for each predicate in each bag.

      Parameters
      ----------
      bags_dict : dict
          Bags as returned by :class:`smx.predicates.bagging.PredicateBagger`.

      Returns
      -------
      dict[str, pd.DataFrame]
          Keys are bag names. Each DataFrame has columns
          ``['Predicate', 'Covariance']`` (``['Predicate', 'Mutual_Info']`` when
          ``metric='mutual_info'``). It is sorted in descending order of the metric
          and contains only predicates whose value exceeds *threshold*.
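
      Examples
      --------
      The following sketches the ``'covariance'`` branch only. It assumes the
      input layout documented for
      :meth:`smx.predicates.metrics.BasePredicateMetric.compute` (``'Zone_Sum'``
      and ``'Predicted_Y'`` columns) and NumPy's sample covariance (``ddof=1``).
      ``covariance_ranking`` is a hypothetical helper and does not reproduce the
      actual method body.

      .. code-block:: python

         import numpy as np
         import pandas as pd


         def covariance_ranking(bags_dict, threshold=0.01):
             """Sketch: |sample covariance| per predicate, filtered and sorted."""
             results = {}
             for bag_name, rules in bags_dict.items():
                 rows = []
                 for rule, df in rules.items():
                     if len(df) < 2:  # sample covariance needs at least 2 samples
                         continue
                     # Off-diagonal entry of the 2x2 covariance matrix (ddof=1).
                     value = abs(np.cov(df["Zone_Sum"], df["Predicted_Y"])[0, 1])
                     if value > threshold:  # values <= threshold are excluded
                         rows.append((rule, float(value)))
                 table = pd.DataFrame(rows, columns=["Predicate", "Covariance"])
                 results[bag_name] = table.sort_values(
                     "Covariance", ascending=False, ignore_index=True
                 )
             return results


         bags = {"Bag_1": {
             "r1": pd.DataFrame({"Zone_Sum": [1.0, 2.0, 3.0], "Predicted_Y": [2.0, 4.0, 6.0]}),
             "r2": pd.DataFrame({"Zone_Sum": [1.0, 2.0, 3.0], "Predicted_Y": [1.0, 1.0, 1.0]}),
         }}
         print(covariance_ranking(bags)["Bag_1"])  # only r1 survives, Covariance 2.0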


.. py:class:: PerturbationMetric(estimator: Any, Xcalclass_prep: pandas.DataFrame, predicates_df: pandas.DataFrame, spectral_cuts: List[Tuple[str, float, float]], perturbation_value: float = 0, perturbation_mode: Literal['constant', 'mean', 'median', 'min', 'max'] = 'constant', stats_source: Literal['full', 'predicate'] = 'full', metric: str = 'mean_abs_diff', normalize_by_zone_size: bool = False, zone_size_exponent: float = 1.0, verbose: bool = False, save_detailed_results: bool = True)

   Bases: :py:obj:`BasePredicateMetric`


   Spectral-perturbation importance metric.

   For each predicate, the spectral zone is temporarily replaced by a
   fixed value (or a per-column statistic) and the change in model
   prediction is measured.
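
   The core step for a single zone can be sketched as follows. This is an
   illustration, not the class implementation. It makes several assumptions:

   * column labels are numeric wavelengths and zone bounds are inclusive;
   * ``'mean_abs_diff'`` is the mean absolute change in predictions.

   ``perturb_zone``, ``mean_abs_diff``, ``stats_X`` and ``SumModel`` are
   hypothetical names. ``stats_X`` plays the role of *stats_source*.

   .. code-block:: python

      import numpy as np
      import pandas as pd

      STATS = ("mean", "median", "min", "max")


      def perturb_zone(X, start, end, mode="constant", value=0.0, stats_X=None):
          """Return a copy of X with the zone [start, end] replaced, plus its size."""
          # Assumption: column labels are numeric wavelengths; bounds are inclusive.
          cols = [c for c in X.columns if start <= float(c) <= end]
          X_pert = X.copy()
          if not cols:
              return X_pert, 0
          if mode == "constant":
              X_pert[cols] = value
          elif mode in STATS:
              source = X if stats_X is None else stats_X
              # Per-column statistic (e.g. source[cols].mean()), repeated for every row.
              stats = getattr(source[cols], mode)().to_numpy()
              X_pert[cols] = np.tile(stats, (len(X_pert), 1))
          else:
              raise ValueError(f"unknown mode: {mode!r}")
          return X_pert, len(cols)


      def mean_abs_diff(estimator, X, X_pert):
          # Average absolute change in the model output caused by the perturbation.
          return float(np.mean(np.abs(estimator.predict(X) - estimator.predict(X_pert))))


      class SumModel:
          # Toy "model": predicts the sum of each spectrum.
          def predict(self, X):
              return X.to_numpy().sum(axis=1)


      X = pd.DataFrame([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]], columns=["400", "500", "600"])
      X_zero, n = perturb_zone(X, 450, 650)            # zone = columns "500", "600"
      print(n, mean_abs_diff(SumModel(), X, X_zero))    # prints 2 8.0
      X_mean, _ = perturb_zone(X, 450, 650, mode="mean")
      print(mean_abs_diff(SumModel(), X, X_mean))       # prints 3.0

   ``X`` is copied before modification, so the calibration data is never
   altered in place. This copying is what makes the replacement "temporary".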

   Parameters
   ----------
   estimator : sklearn-compatible estimator
       Trained model with a ``predict()`` method.
   Xcalclass_prep : pd.DataFrame
       Pre-processed calibration dataset (samples × features).
   predicates_df : pd.DataFrame
       Predicate catalogue with columns ``'rule'`` and ``'zone'``.
   spectral_cuts : list of (name, start, end) tuples
       Name, start and end of every spectral zone.
   perturbation_value : float, default 0
       Constant replacement value when ``perturbation_mode='constant'``.
   perturbation_mode : {'constant', 'mean', 'median', 'min', 'max'}, default 'constant'
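       How the zone is replaced: by *perturbation_value* (``'constant'``) or by
       the named per-column statistic (``'mean'``, ``'median'``, ``'min'``,
       ``'max'``).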
   stats_source : {'full', 'predicate'}, default 'full'
       Data from which the per-column statistics are computed. This only
       matters when *perturbation_mode* is not ``'constant'``.
   metric : str, default 'mean_abs_diff'
       Name of the importance metric. The available options depend on
       :attr:`aim` (e.g. ``'classification'``).
   normalize_by_zone_size : bool, default False
       Divide raw importance by the number of zone features (raised to
       *zone_size_exponent*) to compensate for wide-zone bias.
   zone_size_exponent : float, default 1.0
       Exponent applied to zone size for normalisation.
   verbose : bool, default False
       Print per-predicate progress.
   save_detailed_results : bool, default True
       If True, include a ``'__detailed_perturbation_results__'`` key in the
       returned dictionary.
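
   Examples
   --------
   The following is a minimal sketch of the zone-size normalisation controlled by
   *normalize_by_zone_size* and *zone_size_exponent*. It assumes a plain
   division by the zone size raised to the exponent, as described above.
   ``normalize_importance`` is a hypothetical helper.

   .. code-block:: python

      def normalize_importance(raw, n_zone_features, exponent=1.0):
          """Divide raw importance by (zone size) ** exponent."""
          if n_zone_features <= 0:
              raise ValueError("a zone must contain at least one feature")
          return raw / n_zone_features ** exponent


      narrow, wide = (2.0, 10), (8.0, 100)  # (raw importance, zone size)
      for exp in (1.0, 0.5, 0.0):
          print(exp, normalize_importance(*narrow, exp), normalize_importance(*wide, exp))

   Consider a 100-feature zone with raw importance 8.0 and a 10-feature zone
   with raw importance 2.0. With ``exponent=1.0`` they score 0.08 and 0.2, so
   the wide zone ranks below the narrow one. ``exponent=0.5`` softens the
   penalty (0.8 versus about 0.63), and ``exponent=0.0`` disables it.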


   .. py:attribute:: estimator


   .. py:attribute:: Xcalclass_prep


   .. py:attribute:: predicates_df


   .. py:attribute:: spectral_cuts


   .. py:attribute:: aim
      :value: 'classification'


   .. py:attribute:: perturbation_value
      :value: 0


   .. py:attribute:: perturbation_mode
      :value: 'constant'


   .. py:attribute:: stats_source
      :value: 'full'


   .. py:attribute:: metric
      :value: 'mean_abs_diff'


   .. py:attribute:: normalize_by_zone_size
      :value: False


   .. py:attribute:: zone_size_exponent
      :value: 1.0


   .. py:attribute:: verbose
      :value: False


   .. py:attribute:: save_detailed_results
      :value: True


   .. py:property:: metric_column
      :type: str

      Name of the metric column in the result DataFrames (``'Perturbation'``).


   .. py:method:: compute(bags_dict: Dict[str, Dict[str, pandas.DataFrame]]) -> Dict[str, pandas.DataFrame]

      Compute perturbation importance for each predicate in each bag.

      Parameters
      ----------
      bags_dict : dict
          Bags as returned by :class:`smx.predicates.bagging.PredicateBagger`.

      Returns
      -------
      dict[str, pd.DataFrame]
          Keys are bag names. Each DataFrame has columns
          ``['Predicate', 'Perturbation']`` and is sorted in descending order
          of importance. When ``save_detailed_results=True``, the dictionary
          also contains the key ``'__detailed_perturbation_results__'``.
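
      Examples
      --------
      The returned dictionary can mix per-bag rankings with the detailed-results
      entry. The sketch below separates the two before iterating over bags.
      ``split_perturbation_result`` is a hypothetical helper. The detailed
      entry's structure is not documented in this section, so the helper treats
      it as opaque. The ``result`` dictionary is a hand-made stand-in, not real output.

      .. code-block:: python

         from typing import Any, Dict, Optional, Tuple

         import pandas as pd

         DETAILED_KEY = "__detailed_perturbation_results__"


         def split_perturbation_result(
             result: Dict[str, Any],
         ) -> Tuple[Dict[str, pd.DataFrame], Optional[Any]]:
             """Separate the per-bag rankings from the optional detailed entry."""
             rankings = {k: v for k, v in result.items() if k != DETAILED_KEY}
             # None when compute() ran with save_detailed_results=False.
             details = result.get(DETAILED_KEY)
             return rankings, details


         # Hand-made stand-in for a compute() result; real values will differ.
         result = {
             "Bag_1": pd.DataFrame({"Predicate": ["r2", "r1"], "Perturbation": [0.8, 0.3]}),
             DETAILED_KEY: {"opaque": "structure not documented in this section"},
         }
         rankings, details = split_perturbation_result(result)
         print(list(rankings))  # prints ['Bag_1']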