smx.predicates.bagging
======================

.. py:module:: smx.predicates.bagging

.. autoapi-nested-parse::

   PredicateBagger: bootstrap/subsample predicates across multiple bags
   for robust metric estimation.


Classes
-------

.. autoapisummary::

   smx.predicates.bagging.PredicateBagger


Module Contents
---------------

.. py:class:: PredicateBagger(random_seed, n_bags: int = 10, n_predicates_per_bag: int = 20, n_samples_fraction: float = 0.8, replace: bool = False, sample_bagging: bool = True, predicate_bagging: bool = False)

   Perform predicate bagging with granular control over sampling strategy.

   Bagging repeatedly draws random subsets of samples and/or predicates and
   evaluates each predicate on each subset. This yields a distribution of
   predicate coverage that is used downstream to compute robust association
   metrics (see :mod:`smx.predicates.metrics`).

   Parameters
   ----------
   random_seed : int
       Base random seed for reproducibility. Required: the constructor
       signature defines no default.
   n_bags : int, default 10
       Number of bags (iterations) to create.
   n_predicates_per_bag : int, default 20
       Number of predicates to draw per bag (ignored when
       ``predicate_bagging=False``).
   n_samples_fraction : float, default 0.8
       Fraction of samples to draw per bag (ignored when
       ``sample_bagging=False``). The minimum number of samples per
       predicate is hardcoded to 20% of the dataset.
   replace : bool, default False
       Whether to sample with replacement (bootstrap). Ignored when
       ``sample_bagging=False``.
   sample_bagging : bool, default True
       If ``False``, all samples are used in every bag.
   predicate_bagging : bool, default False
       If ``False``, all predicates are used in every bag.


   .. py:attribute:: n_bags
      :value: 10


   .. py:attribute:: n_predicates_per_bag
      :value: 20


   .. py:attribute:: n_samples_fraction
      :value: 0.8


   .. py:attribute:: replace
      :value: False


   .. py:attribute:: random_seed


   .. py:attribute:: sample_bagging
      :value: True


   .. py:attribute:: predicate_bagging
      :value: False


   .. py:method:: run(zone_scores_df: pandas.DataFrame, y_predicted_numeric: Union[pandas.Series, numpy.ndarray], predicates_df: pandas.DataFrame) -> Dict[str, Dict[str, pandas.DataFrame]]

      Create bags by sampling samples and/or predicates.

      Parameters
      ----------
      zone_scores_df : pd.DataFrame
          Aggregated zone scores (samples × zones).
      y_predicted_numeric : pd.Series or np.ndarray
          Continuous model predictions aligned with *zone_scores_df*.
      predicates_df : pd.DataFrame
          Predicate catalogue with columns ``rule``, ``zone``,
          ``thresholds``, ``operator``.

      Returns
      -------
      dict
          Nested mapping from bag label to per-rule DataFrames:
          ``{'Bag_1': {rule: DataFrame(['Zone_Sum', 'Predicted_Y', 'Sample_Index']), ...}, 'Bag_2': ...}``
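
The sampling options described above can be illustrated with a minimal, standard-library sketch. This is not the implementation of ``PredicateBagger``. The function ``make_bags``, its return shape, and the per-bag seeding rule (base seed plus bag number) are assumptions made for illustration only; the sketch merely shows how ``sample_bagging``, ``predicate_bagging``, ``replace``, and ``n_samples_fraction`` could interact.

```python
import random


def make_bags(n_samples, predicates, random_seed, n_bags=10,
              n_predicates_per_bag=20, n_samples_fraction=0.8,
              replace=False, sample_bagging=True, predicate_bagging=False):
    """Hypothetical helper: return {'Bag_i': (sample_indices, predicate_subset)}."""
    all_idx = list(range(n_samples))
    bags = {}
    for i in range(n_bags):
        # Assumption: one RNG per bag, seeded from the base seed plus the bag number.
        rng = random.Random(random_seed + i)

        if sample_bagging:
            k = min(n_samples, max(1, int(n_samples_fraction * n_samples)))
            if replace:
                idx = rng.choices(all_idx, k=k)  # bootstrap: duplicates allowed
            else:
                idx = rng.sample(all_idx, k)     # subsample: no duplicates
        else:
            idx = list(all_idx)                  # every sample in every bag

        if predicate_bagging:
            k_pred = min(n_predicates_per_bag, len(predicates))
            preds = rng.sample(list(predicates), k_pred)
        else:
            preds = list(predicates)             # every predicate in every bag

        bags[f"Bag_{i + 1}"] = (sorted(idx), preds)
    return bags


# Three bags over ten samples, using the defaults documented above.
bags = make_bags(n_samples=10, predicates=["r1", "r2", "r3"],
                 random_seed=0, n_bags=3)
```

With these arguments each bag holds 8 distinct sample indices (80% of 10, drawn without replacement) and all three predicates, because ``predicate_bagging`` is ``False``. Calling the function again with the same ``random_seed`` reproduces the same bags.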