smx.predicates.bagging
======================

.. py:module:: smx.predicates.bagging

.. autoapi-nested-parse::

   PredicateBagger: bootstrap/subsample predicates across multiple bags for
   robust metric estimation.


Classes
-------

.. autoapisummary::

   smx.predicates.bagging.PredicateBagger


Module Contents
---------------

.. py:class:: PredicateBagger(random_seed, n_bags: int = 10, n_predicates_per_bag: int = 20, n_samples_fraction: float = 0.8, replace: bool = False, sample_bagging: bool = True, predicate_bagging: bool = False)

   Perform predicate bagging with granular control over sampling strategy.

   Bagging creates repeated random subsets of samples and/or predicates,
   evaluating each predicate on the subset.  This yields a distribution of
   predicate coverage that is used downstream to compute robust association
   metrics (see :mod:`smx.predicates.metrics`).

   Parameters
   ----------
   n_bags : int, default 10
       Number of bags (iterations) to create.
   n_predicates_per_bag : int, default 20
       Number of predicates to draw per bag (ignored when
       ``predicate_bagging=False``).
   n_samples_fraction : float, default 0.8
       Fraction of samples to draw per bag (ignored when
       ``sample_bagging=False``).  The minimum number of samples per
       predicate is hardcoded to 20% of the dataset.
   replace : bool, default False
       Whether to sample with replacement (bootstrap).  Ignored when
       ``sample_bagging=False``.
   random_seed : int
       Base random seed for reproducibility.  Required: the constructor
       signature defines no default.
   sample_bagging : bool, default True
       If ``False``, all samples are used in every bag.
   predicate_bagging : bool, default False
       If ``False``, all predicates are used in every bag.


   .. py:attribute:: n_bags
      :value: 10


   .. py:attribute:: n_predicates_per_bag
      :value: 20


   .. py:attribute:: n_samples_fraction
      :value: 0.8


   .. py:attribute:: replace
      :value: False


   .. py:attribute:: random_seed


   .. py:attribute:: sample_bagging
      :value: True


   .. py:attribute:: predicate_bagging
      :value: False


   .. py:method:: run(zone_scores_df: pandas.DataFrame, y_predicted_numeric: Union[pandas.Series, numpy.ndarray], predicates_df: pandas.DataFrame) -> Dict[str, Dict[str, pandas.DataFrame]]

      Create bags by sampling samples and/or predicates.

      Parameters
      ----------
      zone_scores_df : pd.DataFrame
          Aggregated zone scores (samples × zones).
      y_predicted_numeric : pd.Series or np.ndarray
          Continuous model predictions aligned with *zone_scores_df*.
      predicates_df : pd.DataFrame
          Predicate catalogue with columns ``rule``, ``zone``,
          ``thresholds``, ``operator``.

      Returns
      -------
      dict
          ``{'Bag_1': {rule: DataFrame(['Zone_Sum', 'Predicted_Y',
          'Sample_Index']), ...}, 'Bag_2': ...}``
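
The nested return structure of ``PredicateBagger.run`` can be easier to picture with a small stand-alone sketch. The code below is not the library's implementation. It is a minimal illustration of sample bagging under several assumptions: the helper name ``sketch_bags`` is hypothetical, each bag is seeded with ``random_seed + b``, ``thresholds`` holds a single number per rule, ``operator`` is one of the strings in ``OPS``, and each per-rule frame keeps only the drawn samples that satisfy the predicate. Predicate bagging and the minimum-samples check are left out.

```python
import operator

import numpy as np
import pandas as pd

# Assumed operator encoding; the real predicate catalogue may differ.
OPS = {">": operator.gt, ">=": operator.ge, "<": operator.lt, "<=": operator.le}


def sketch_bags(zone_scores_df, y_pred, predicates_df,
                n_bags=10, n_samples_fraction=0.8, replace=False, random_seed=0):
    """Hypothetical stand-in that mimics the documented return layout."""
    y_pred = np.asarray(y_pred)
    n_samples = len(zone_scores_df)
    n_draw = max(1, int(n_samples_fraction * n_samples))
    bags = {}
    for b in range(n_bags):
        # Assumption: one generator per bag, derived from the base seed.
        rng = np.random.default_rng(random_seed + b)
        pos = rng.choice(n_samples, size=n_draw, replace=replace)
        sub = zone_scores_df.iloc[pos]
        bag = {}
        for pred in predicates_df.itertuples(index=False):
            zone_values = sub[pred.zone].to_numpy()
            # Assumption: keep only the drawn samples that satisfy the rule.
            mask = OPS[pred.operator](zone_values, pred.thresholds)
            bag[pred.rule] = pd.DataFrame({
                "Zone_Sum": zone_values[mask],
                "Predicted_Y": y_pred[pos][mask],
                "Sample_Index": sub.index.to_numpy()[mask],
            })
        bags[f"Bag_{b + 1}"] = bag
    return bags


# Toy inputs: 10 samples, 2 zones, 2 rules.
zone_scores = pd.DataFrame({
    "zone_a": np.arange(10.0),
    "zone_b": np.arange(9.0, -1.0, -1.0),
})
y_pred = np.linspace(0.0, 1.0, 10)
predicates = pd.DataFrame({
    "rule": ["zone_a > 4", "zone_b <= 2"],
    "zone": ["zone_a", "zone_b"],
    "thresholds": [4.0, 2.0],
    "operator": [">", "<="],
})

bags = sketch_bags(zone_scores, y_pred, predicates, n_bags=3)
print(list(bags))
print(bags["Bag_1"]["zone_a > 4"].head())
```

Because ``replace=False`` in this sketch, every ``Sample_Index`` value appears at most once per bag. With ``replace=True`` the same sample may be drawn several times, which is the bootstrap case.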