add_shuffled_attrs

skrough.dataprep.add_shuffled_attrs(df: pd.DataFrame, target_attr: str | int, shuffled_attrs_prefix: str = 'shuffled_', seed: rght.Seed = None) → pd.DataFrame

Add shuffled attributes.

Add a shuffled counterpart for each conditional attribute of the input dataset, that is, for every attribute except the one designated as the target attribute. A shuffled (reordered) attribute consists of the same values as its original attribute, permuted in random order. In other words, a shuffled attribute has the same empirical distribution as the original one, but its row-wise link to the target attribute is broken, so it is (possibly) uncorrelated with the target attribute.
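The idea described above can be sketched with plain pandas and NumPy. This is an illustrative re-implementation, not the skrough source: the helper name `add_shuffled_attrs_sketch`, the column names, and the output column order (originals, shuffled counterparts, then the target) are assumptions made for the example.

```python
import numpy as np
import pandas as pd


def add_shuffled_attrs_sketch(df, target_attr, prefix="shuffled_", seed=None):
    """Hypothetical sketch of the technique; not the library implementation."""
    rng = np.random.default_rng(seed)
    target = df[target_attr]
    conditional = df.drop(columns=[target_attr])
    # Permute each conditional column independently: the values are kept,
    # only their assignment to rows changes.
    shuffled = pd.DataFrame(
        {
            f"{prefix}{col}": rng.permutation(conditional[col].to_numpy())
            for col in conditional.columns
        },
        index=df.index,
    )
    # Assumed layout: originals, shuffled counterparts, then the target.
    return pd.concat([conditional, shuffled, target], axis=1)


data = pd.DataFrame(
    [[5, 3, 3], [9, 3, 1], [5, 2, 3]], columns=["x", "y", "label"]
)
result = add_shuffled_attrs_sketch(data, target_attr="label", prefix="s_", seed=0)

# Each shuffled column holds exactly the same values as its original.
for col in ["x", "y"]:
    assert sorted(result[f"s_{col}"]) == sorted(result[col])
```

Comparing the sorted values is a quick way to confirm that a shuffled column preserves the empirical distribution of its original while the row order may differ.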

Parameters:
  • df – Input dataset.

  • target_attr – Identifier of the target column in the input dataset.

  • shuffled_attrs_prefix – A prefix for shuffled attribute names.

  • seed – Random seed. Defaults to None.

Returns:

A dataset with shuffled counterpart attributes added.
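The effect of the seed parameter can be illustrated with NumPy's random generator alone. The snippet below assumes that a fixed seed is handled the way `np.random.default_rng` handles it; the source does not state how `rght.Seed` is consumed internally, so treat this as a sketch of the general behavior rather than a description of skrough itself.

```python
import numpy as np

values = np.array([5, 9, 5, 2, 7])

# The same seed yields the same permutation on every run.
first = np.random.default_rng(0).permutation(values)
second = np.random.default_rng(0).permutation(values)
assert (first == second).all()

# With seed=None the generator is seeded from OS entropy, so the order
# may differ between runs; the set of values is still unchanged.
unseeded = np.random.default_rng(None).permutation(values)
assert sorted(unseeded) == sorted(values)
```

Passing a fixed seed is therefore the usual way to make results that depend on shuffled attributes repeatable.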

Examples

>>> df = pd.DataFrame([[5, 3, 3],
...                    [9, 3, 1],
...                    [5, 2, 3]], columns=["a", "b", "dec"])
>>> add_shuffled_attrs(df, target_attr="dec", shuffled_attrs_prefix="s_", seed=0)
   a  b  s_a  s_b  dec
0  5  3    5    2    3
1  9  3    5    3    1
2  5  2    9    3    3