heterogeneous_groups_decisions_replace
- skrough.homogeneity.heterogeneous_groups_decisions_replace(x: ndarray, x_counts: ndarray, y: ndarray, y_count: int, attrs: Union[Sequence[int], ndarray[Any, dtype[int64]]], distinguish_generalized_decisions: bool = False) -> Tuple[ndarray, int]
Return consistent decision values.
Prepare new decision values in a way that makes the data consistent (in the sense of a consistent decision table). The groups (equivalence classes of the indiscernibility relation) are induced from the given dataset x and a subset of attributes attrs. The original decisions y are then processed to produce new, consistent decision values: decision values are preserved for homogeneous groups and replaced for objects from heterogeneous ones.
The distinguish_generalized_decisions boolean flag controls whether heterogeneous groups are distinguished from each other (distinguish_generalized_decisions is True) or treated equally (distinguish_generalized_decisions is False). Distinguishing the heterogeneous groups means that objects from groups of different characteristics (a different subset of decision values appearing in a group, cf. get_heterogeneity()) are assigned different new decision values. When heterogeneous groups are not distinguished, objects from all heterogeneous groups are assigned the same new decision value.
- Parameters:
x – Factorized data table representing conditional features/attributes for the objects the computation should be performed on. The values in each column should be given in a form of integer-location based indexing sequence of the factorized conditional attribute values, i.e., 0-based values that index distinct values of the conditional attribute.
x_counts – Number of distinct attribute values given for each conditional attribute. The argument is expected to be given as a 1D array.
y – Factorized decision values for the objects represented by the input x argument. The values should be given in the form of an integer-location based indexing sequence of the factorized decision values, i.e., 0-based values that index distinct decisions.
y_count – Number of distinct decision attribute values.
attrs – A subset of conditional attributes the check should be performed on. It should be given in the form of a sequence of integer-location based indexes of the selected conditional attributes from x. A None value means that all available conditional attributes are used. Defaults to None.
distinguish_generalized_decisions – A flag to control whether heterogeneous groups should be distinguished from each other or not. Defaults to False.
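As a rough illustration of the factorized form expected for x and x_counts, the sketch below encodes each column of a raw table as 0-based codes using plain NumPy. The helper name factorize_columns is hypothetical and this is not how skrough.dataprep is implemented; in particular, np.unique numbers values in sorted order, so the codes may differ from those produced by the library's own helpers.

```python
import numpy as np


def factorize_columns(raw):
    """Hypothetical helper: encode every column as 0-based integer codes."""
    raw = np.asarray(raw)
    codes = np.empty(raw.shape, dtype=np.int64)
    counts = np.empty(raw.shape[1], dtype=np.int64)
    for j in range(raw.shape[1]):
        # np.unique returns the sorted distinct values; return_inverse gives,
        # for each original entry, the index of its value among them.
        uniques, inverse = np.unique(raw[:, j], return_inverse=True)
        codes[:, j] = inverse.reshape(-1)
        counts[j] = len(uniques)
    return codes, counts


raw = [[8, 8, 8],
       [8, 8, 8],
       [1, 7, 8],
       [1, 8, 8],
       [1, 1, 8],
       [1, 1, 1]]
x, x_counts = factorize_columns(raw)
print(x_counts)  # [2 3 2]
```

Every code in column j lies in the range 0 to x_counts[j] - 1, which is the property the parameter descriptions above rely on.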
- Returns:
New decision values returned in the form of a 2-element tuple with the following elements:
factorized decision attribute, returned as a 1D array
decision attribute domain size
The new decision values together with the input data x and x_counts form a consistent decision table.
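To make the notion of a consistent decision table concrete, here is a minimal NumPy sketch of a consistency check: a table is consistent on attrs when all objects that agree on those attributes also share one decision. The function is_consistent is a hypothetical name used only for illustration and is not part of skrough; the data are taken from the example on this page.

```python
import numpy as np


def is_consistent(x, y, attrs):
    """Hypothetical check: True if rows equal on attrs share one decision."""
    x = np.asarray(x)
    y = np.asarray(y)
    # Rows that are identical on the selected attributes get the same group id.
    _, group_ids = np.unique(x[:, list(attrs)], axis=0, return_inverse=True)
    group_ids = group_ids.reshape(-1)
    # Consistent means exactly one distinct (group, decision) pair per group.
    pairs = np.unique(np.column_stack([group_ids, y]), axis=0)
    return len(pairs) == len(np.unique(group_ids))


x = [[8, 8, 8],
     [8, 8, 8],
     [1, 7, 8],
     [1, 8, 8],
     [1, 1, 8],
     [1, 1, 1]]
y_original = [0, 1, 2, 3, 1, 4]
y_replaced = [5, 5, 2, 3, 5, 5]

print(is_consistent(x, y_original, attrs=[0, 1]))  # False
print(is_consistent(x, y_replaced, attrs=[0, 1]))  # True
```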
Examples
>>> import numpy as np
>>> from skrough.dataprep import (
...     prepare_factorized_array,
...     prepare_factorized_vector
... )
>>> from skrough.homogeneity import heterogeneous_groups_decisions_replace
>>> x, x_counts = prepare_factorized_array(np.asarray([[8, 8, 8],
...                                                    [8, 8, 8],
...                                                    [1, 7, 8],
...                                                    [1, 8, 8],
...                                                    [1, 1, 8],
...                                                    [1, 1, 1]]))
>>> y, y_count = prepare_factorized_vector(np.asarray([3, 4, 8, 9, 4, 5]))
>>> y, y_count
(array([0, 1, 2, 3, 1, 4]), 5)
>>> heterogeneous_groups_decisions_replace(
...     x,
...     x_counts,
...     y,
...     y_count,
...     attrs=[0, 1],
...     distinguish_generalized_decisions=False,
... )
(array([5, 5, 2, 3, 5, 5]), 6)
>>> heterogeneous_groups_decisions_replace(
...     x,
...     x_counts,
...     y,
...     y_count,
...     attrs=[0, 1],
...     distinguish_generalized_decisions=True,
... )
(array([6, 6, 2, 3, 5, 5]), 7)
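The replacement logic described above can be sketched in plain NumPy as follows. This is an illustrative re-implementation under stated assumptions, not the library's code: the name replace_heterogeneous_sketch and its distinguish parameter are hypothetical, new decision values are numbered from y_count upwards in the order groups are visited, and the inputs are hand-written 0-based codes for the data used in the example.

```python
import numpy as np


def replace_heterogeneous_sketch(x, y, y_count, attrs, distinguish=False):
    """Hypothetical re-implementation for illustration; not skrough code."""
    x = np.asarray(x)
    y = np.asarray(y)
    # Rows that are identical on the selected attributes form one group.
    _, group_ids = np.unique(x[:, list(attrs)], axis=0, return_inverse=True)
    group_ids = group_ids.reshape(-1)

    new_y = y.copy()
    new_values = {}  # maps a heterogeneity "signature" to a new decision value
    for g in np.unique(group_ids):
        members = group_ids == g
        decisions = tuple(np.unique(y[members]).tolist())
        if len(decisions) == 1:
            continue  # homogeneous group: keep the original decision
        # With distinguish=True each distinct set of decisions gets its own
        # new value; otherwise all heterogeneous groups share a single one.
        key = decisions if distinguish else None
        if key not in new_values:
            new_values[key] = y_count + len(new_values)
        new_y[members] = new_values[key]
    return new_y, y_count + len(new_values)


# Hand-written 0-based codes (assumed, order of first appearance).
x = [[0, 0, 0],
     [0, 0, 0],
     [1, 1, 0],
     [1, 0, 0],
     [1, 2, 0],
     [1, 2, 1]]
y = [0, 1, 2, 3, 1, 4]

merged, merged_count = replace_heterogeneous_sketch(x, y, 5, attrs=[0, 1])
print(merged, merged_count)  # [5 5 2 3 5 5] 6

split, split_count = replace_heterogeneous_sketch(
    x, y, 5, attrs=[0, 1], distinguish=True
)
print(split, split_count)  # [5 5 2 3 6 6] 7
```

With distinguish=True the sketch labels the two heterogeneous groups 5 and 6 in its own visiting order, which is the reverse of the labelling shown in the library example; the induced partition of objects and the resulting domain size are the same.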