heterogeneous_groups_decisions_replace

skrough.homogeneity.heterogeneous_groups_decisions_replace(x: ndarray, x_counts: ndarray, y: ndarray, y_count: int, attrs: Union[Sequence[int], ndarray[Any, dtype[int64]]], distinguish_generalized_decisions: bool = False) → Tuple[ndarray, int]

Return consistent decision values.

Prepare new decision values that make the data consistent (in the sense of a consistent decision table). The groups (equivalence classes of the indiscernibility relation) are induced from the given dataset x and the subset of attributes attrs. The original decisions y are then processed to produce new, consistent decision values: decision values are preserved for objects in homogeneous groups and replaced for objects in heterogeneous ones.

The distinguish_generalized_decisions boolean flag controls whether heterogeneous groups are distinguished from each other (distinguish_generalized_decisions is True) or treated alike (distinguish_generalized_decisions is False). Distinguishing heterogeneous groups means that objects from groups with different characteristics (a different subset of decision values appearing in the group, cf. get_heterogeneity()) are assigned different new decision values. When heterogeneous groups are not distinguished, objects from all heterogeneous groups are assigned the same new decision value.
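The behaviour described above can be sketched in plain NumPy and Python. This is only an illustration under stated assumptions, not the library's implementation: the helper name sketch_replace_decisions and its distinguish parameter are hypothetical, and the particular new values it assigns (and their order) may differ from what skrough returns.

```python
from collections import defaultdict

import numpy as np


def sketch_replace_decisions(x, y, y_count, attrs, distinguish=False):
    """Hypothetical helper illustrating the described replacement logic."""
    x = np.asarray(x)
    y = np.asarray(y)

    # Group object indices by their values on the selected attributes.
    groups = defaultdict(list)
    for i, row in enumerate(x[:, list(attrs)]):
        groups[tuple(row.tolist())].append(i)

    new_y = y.copy()
    next_value = y_count  # first value outside the original decision domain
    assigned = {}  # replacement value per heterogeneity "signature"
    for members in groups.values():
        decisions = frozenset(y[members].tolist())
        if len(decisions) == 1:
            continue  # homogeneous group: keep the original decision
        # When not distinguishing, all heterogeneous groups share one key.
        key = decisions if distinguish else None
        if key not in assigned:
            assigned[key] = next_value
            next_value += 1
        new_y[members] = assigned[key]

    # Assumption: the domain size is the number of values in use so far.
    return new_y, next_value
```

With distinguish=False every heterogeneous group receives the single new value y_count; with distinguish=True each distinct set of decisions found in a heterogeneous group receives its own new value.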

Parameters:
  • x – Factorized data table representing conditional features/attributes for the objects the computation should be performed on. The values in each column should be integer codes of the factorized conditional attribute values, i.e., 0-based values that index the distinct values of the conditional attribute.

  • x_counts – Number of distinct attribute values given for each conditional attribute. The argument is expected to be given as a 1D array.

  • y – Factorized decision values for the objects represented by the input x argument. The values should be integer codes of the factorized decision values, i.e., 0-based values that index the distinct decisions.

  • y_count – Number of distinct decision attribute values.

  • attrs – A subset of conditional attributes used to induce the groups. It should be given as a sequence of integer positions of the selected conditional attributes in x. None value means to use all available conditional attributes. Defaults to None.

  • distinguish_generalized_decisions – A flag to control whether heterogeneous groups should be distinguished from each other or not. Defaults to False.
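To make the expected "factorized" form of x and x_counts concrete, here is a minimal sketch that builds such arrays with numpy.unique. The helper name factorize_columns is hypothetical. Note that numpy.unique assigns codes in sorted order of the values, so the codes may differ from those produced by the library's own preparation functions; only the shape and meaning of the result are illustrated.

```python
import numpy as np


def factorize_columns(data):
    """Hypothetical helper: factorize each column into 0-based integer codes."""
    data = np.asarray(data)
    codes = np.empty(data.shape, dtype=np.int64)
    counts = np.empty(data.shape[1], dtype=np.int64)
    for j in range(data.shape[1]):
        # np.unique returns the sorted distinct values and, with
        # return_inverse=True, the index of each original value among them.
        uniques, inverse = np.unique(data[:, j], return_inverse=True)
        codes[:, j] = inverse.reshape(-1)
        counts[j] = len(uniques)
    return codes, counts
```

Each column of the returned codes array contains values in the range 0 to counts[j] - 1, which is the form the x and x_counts parameters describe.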

Returns:

New decision values returned as a 2-element tuple with the following elements:

  • factorized decision attribute returned as a 1D array

  • decision attribute domain size

The new decision values together with the input data x and x_counts form a consistent decision table.
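The consistency property can be checked directly: a decision table is consistent when objects that agree on the considered attributes never disagree on the decision. The following is a minimal sketch of such a check; the helper name is_consistent is hypothetical and not part of skrough.

```python
import numpy as np


def is_consistent(x, y, attrs):
    """Hypothetical check: rows equal on attrs always share one decision."""
    x = np.asarray(x)
    y = np.asarray(y)
    first_decision = {}
    for row, decision in zip(x[:, list(attrs)], y):
        key = tuple(row.tolist())
        # setdefault stores the first decision seen for this group and
        # returns the stored one on later visits.
        if first_decision.setdefault(key, int(decision)) != int(decision):
            return False
    return True
```

Applied to the original decisions of an inconsistent table this returns False, while the replaced decisions should make it return True for the same attrs.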

Examples

>>> import numpy as np
>>> from skrough.dataprep import (
...     prepare_factorized_array,
...     prepare_factorized_vector
... )
>>> from skrough.homogeneity import heterogeneous_groups_decisions_replace
>>> x, x_counts = prepare_factorized_array(np.asarray([[8, 8, 8],
...                                                    [8, 8, 8],
...                                                    [1, 7, 8],
...                                                    [1, 8, 8],
...                                                    [1, 1, 8],
...                                                    [1, 1, 1]]))
>>> y, y_count = prepare_factorized_vector(np.asarray([3, 4, 8, 9, 4, 5]))
>>> y, y_count
(array([0, 1, 2, 3, 1, 4]), 5)
>>> heterogeneous_groups_decisions_replace(
...     x,
...     x_counts,
...     y,
...     y_count,
...     attrs=[0, 1],
...     distinguish_generalized_decisions=False,
... )
(array([5, 5, 2, 3, 5, 5]), 6)
>>> heterogeneous_groups_decisions_replace(
...     x,
...     x_counts,
...     y,
...     y_count,
...     attrs=[0, 1],
...     distinguish_generalized_decisions=True,
... )
(array([6, 6, 2, 3, 5, 5]), 7)