prepare_factorized_data

skrough.dataprep.prepare_factorized_data(df: pd.DataFrame, target_attr: str | int) -> tuple[np.ndarray, np.ndarray, np.ndarray, int]

Factorize the conditional and target attributes of a data frame.

Factorize the data frame and return the factorized values together with the feature domain sizes (the number of distinct values per column) for the conditional and target attributes.

Parameters:
  • df – A dataset to be factorized.

  • target_attr – Identifier of the target column in the input dataset.

Returns:

The result consists of the following elements:

  • factorized conditional data, returned as a 2D array

  • conditional data feature domain sizes, returned as a 1D array, i.e., a single value (domain size) for each column

  • factorized target data, returned as a 1D array

  • target feature domain size

Examples

>>> import pandas as pd
>>> from skrough.dataprep import prepare_factorized_data
>>> df = pd.DataFrame([[5, 3, 3],
...                    [9, 3, 1],
...                    [5, 2, 3]], columns=["a", "b", "dec"])
>>> prepare_factorized_data(df, target_attr="dec")
(array([[0, 0],
        [1, 0],
        [0, 1]]),
array([2, 2]),
array([0, 1, 0]),
2)
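
To make the output above easier to read, the following is a minimal sketch of the factorization step written with plain pandas and NumPy. It is not skrough's implementation: `factorize_sketch` is a hypothetical helper, it assumes `target_attr` is a column label, and it assumes codes are assigned in order of first appearance (which is what `pd.factorize` does and what the example output suggests).

```python
import numpy as np
import pandas as pd


def factorize_sketch(df, target_attr):
    """Hypothetical stand-in for illustration; not the library's code."""
    # Split the frame into conditional columns and the target column.
    # Assumption: target_attr is a column label present in df.
    target = df[target_attr]
    cond = df.drop(columns=[target_attr])

    # pd.factorize returns (codes, uniques); codes are integers assigned
    # in order of first appearance, so len(uniques) is the domain size.
    cond_codes = []
    cond_sizes = []
    for col in cond.columns:
        codes, uniques = pd.factorize(cond[col])
        cond_codes.append(codes)
        cond_sizes.append(len(uniques))

    x = np.column_stack(cond_codes)   # 2D array: one column per attribute
    x_counts = np.array(cond_sizes)   # 1D array: one domain size per column
    y, y_uniques = pd.factorize(target)
    return x, x_counts, y, len(y_uniques)


df = pd.DataFrame([[5, 3, 3],
                   [9, 3, 1],
                   [5, 2, 3]], columns=["a", "b", "dec"])
x, x_counts, y, y_count = factorize_sketch(df, "dec")
```

Reading the result column by column: in `a`, the value 5 is seen first and becomes code 0, while 9 becomes code 1; in `dec`, 3 becomes 0 and 1 becomes 1. Each column has two distinct values, so every reported domain size is 2.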