add_shuffled_attrs
- skrough.dataprep.add_shuffled_attrs(df: pd.DataFrame, target_attr: str | int, shuffled_attrs_prefix: str = 'shuffled_', seed: rght.Seed = None) → pd.DataFrame
Add shuffled attrs.
Add a shuffled counterpart for each conditional attribute of the input dataset, that is, for every attribute except the distinguished target attribute. A shuffled (reordered) attribute consists of the same values as its original attribute, permuted in random order. In other words, a shuffled attribute has the same empirical distribution as the original one but is (possibly) uncorrelated with the target attribute.
- Parameters:
df – Input dataset.
target_attr – Identifier of the target column in the input dataset.
shuffled_attrs_prefix – A prefix for shuffled attribute names. Defaults to 'shuffled_'.
seed – Random seed. Defaults to None.
- Returns:
A dataset with shuffled counterpart attributes added.
Examples
>>> df = pd.DataFrame([[5, 3, 3],
...                    [9, 3, 1],
...                    [5, 2, 3]], columns=["a", "b", "d"])
>>> add_shuffled_attrs(df, target_attr="d", shuffled_attrs_prefix="s_", seed=0)
   a  b  s_a  s_b  d
0  5  3    5    2  3
1  9  3    5    3  1
2  5  2    9    3  3
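The shuffling described above can be sketched with plain pandas and NumPy. This is a minimal illustration of the idea, not skrough's implementation: the function name `add_shuffled_attrs_sketch` and its `prefix` parameter are hypothetical, and it assumes NumPy's `default_rng` as the random source, so the permutation it produces for a given seed will generally differ from the output shown in the example above.

```python
import numpy as np
import pandas as pd


def add_shuffled_attrs_sketch(df, target_attr, prefix="shuffled_", seed=None):
    """Illustrative stand-in (hypothetical name), NOT skrough's implementation."""
    # Assumption: NumPy's Generator is used here; skrough may seed differently.
    rng = np.random.default_rng(seed)
    # Work on a copy holding only the conditional attributes.
    result = df.drop(columns=[target_attr])
    for col in list(result.columns):
        # Permute the column's values; the multiset of values is unchanged.
        result[f"{prefix}{col}"] = rng.permutation(df[col].to_numpy())
    # Re-attach the target column last, as in the example output above.
    result[target_attr] = df[target_attr]
    return result


df = pd.DataFrame([[5, 3, 3], [9, 3, 1], [5, 2, 3]], columns=["a", "b", "d"])
out = add_shuffled_attrs_sketch(df, target_attr="d", prefix="s_", seed=0)

# Each shuffled column holds the same values as its original, in some order.
assert sorted(out["s_a"]) == sorted(df["a"])
assert sorted(out["s_b"]) == sorted(df["b"])
```

Because each shuffled column is a permutation of its original, any relationship it shows with the target attribute arises by chance only. That makes such columns usable as a baseline when judging whether an original attribute is genuinely informative.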