prepare_factorized_data
- skrough.dataprep.prepare_factorized_data(df: pd.DataFrame, target_attr: str | int) → tuple[np.ndarray, np.ndarray, np.ndarray, int]
Factorize the conditional and target attributes of a data frame.
Factorize the data frame and return the factorized data together with the feature domain sizes (the number of distinct values) for the conditional and target attributes.
- Parameters:
df – A dataset to be factorized.
target_attr – Identifier of the target column in the input dataset.
- Returns:
The result is a tuple consisting of the following elements, in order:
factorized conditional data, returned as a 2D array
conditional data feature domain sizes, returned as a 1D array, i.e., one value (the domain size) per column
factorized target data, returned as a 1D array
target feature domain size, returned as an int
Examples
>>> df = pd.DataFrame([[5, 3, 3],
...                    [9, 3, 1],
...                    [5, 2, 3]], columns=["a", "b", "dec"])
>>> prepare_factorized_data(df, target_attr="dec")
(array([[0, 0],
        [1, 0],
        [0, 1]]), array([2, 2]), array([0, 1, 0]), 2)
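For intuition, the factorization described above can be approximated with plain pandas and NumPy. The sketch below is not skrough's implementation: the helper name factorize_sketch is hypothetical, and it assumes that each column is encoded independently with pandas.factorize, which assigns integer codes in order of first appearance. Under that assumption it reproduces the values shown in the example.

```python
import numpy as np
import pandas as pd


def factorize_sketch(df, target_attr):
    """Hypothetical stand-in for illustration; not skrough's actual code."""
    # Conditional attributes are all columns except the target.
    cond = df.drop(columns=[target_attr])
    # Encode each conditional column on its own; pd.factorize returns
    # (integer codes, unique values), with codes in order of first appearance.
    encoded = [pd.factorize(cond[col]) for col in cond.columns]
    x = np.column_stack([codes for codes, _ in encoded])
    # Domain size of a column = number of distinct values seen in it.
    x_counts = np.array([len(uniques) for _, uniques in encoded])
    # Encode the target column the same way.
    y, y_uniques = pd.factorize(df[target_attr])
    return x, x_counts, y, len(y_uniques)


df = pd.DataFrame([[5, 3, 3],
                   [9, 3, 1],
                   [5, 2, 3]], columns=["a", "b", "dec"])
x, x_counts, y, y_count = factorize_sketch(df, target_attr="dec")
```

Note that pandas.factorize encodes missing values as -1 by default and does not count them among the unique values; how prepare_factorized_data treats missing values is not stated here, so the sketch should not be relied on for data containing NaN.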