get_heterogeneity

skrough.homogeneity.get_heterogeneity(distribution: ndarray[Any, dtype[int64]]) → ndarray[Any, dtype[int64]]

Compute distribution heterogeneity.

Compute the heterogeneity of a given input distribution. The function is mainly used to compute the heterogeneity of decision attributes. The distribution is a 2D array where:

  • rows correspond to separate contexts, e.g., groups of objects or equivalence classes,

  • the values in the columns of a given row represent a discrete distribution, i.e., the number of occurrences of each distinct value of the decision attribute.

The result is a sequence of integer values (0 or >=1), one per group/context (row) of the distribution input. A value of 0 means that the row has at most one non-zero count, i.e., the row is homogeneous. Values >=1 mark heterogeneous rows, and different positive values indicate different kinds of heterogeneity. E.g., the function distinguishes a row with non-zero values at positions 0 and 1 from a row with non-zero values at positions 1 and 2. The return value for a heterogeneous row is a binary-encoded number with a bit set for each position whose count is greater than 0. In the Examples section, the first column maps to the most significant bit, so with three columns, non-zero counts at positions 0 and 1 give binary 110, i.e., 6.
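
The encoding described above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the skrough source code: the name `heterogeneity_sketch` is hypothetical, and the bit order (first column as the most significant bit) is inferred from the output shown in the Examples section.

```python
import numpy as np


def heterogeneity_sketch(distribution: np.ndarray) -> np.ndarray:
    """Illustrative re-implementation (hypothetical, not the library code)."""
    distribution = np.asarray(distribution)
    if distribution.ndim != 2:
        raise ValueError("distribution must be a two-dimensional array")
    n_columns = distribution.shape[1]
    if n_columns > 63:
        raise ValueError("distribution must have at most 63 columns")

    # Bit weight per column; assumption: column 0 is the most significant bit.
    shifts = np.arange(n_columns - 1, -1, -1, dtype=np.int64)
    weights = np.left_shift(np.int64(1), shifts)

    # True where a decision value occurs at least once in the row.
    present = distribution > 0
    masks = (present * weights).sum(axis=1).astype(np.int64)

    # Rows with at most one non-zero count are homogeneous and map to 0.
    return np.where(present.sum(axis=1) > 1, masks, 0)


rows = np.asarray([[0, 0, 0], [1, 0, 0], [1, 9, 0], [1, 0, 9], [0, 9, 1], [1, 8, 9]])
result = heterogeneity_sketch(rows)
```

Only the positions of the non-zero counts matter, not their magnitudes, which is why `[1, 9, 0]` and `[9, 1, 0]` share the same value in the documented example.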

Parameters:

distribution – A 2D array representing a distribution.

Raises:
  • ValueError – If distribution is not a two-dimensional array.

  • ValueError – If the number of columns in the distribution input argument is greater than 63.
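
The 63-column limit is consistent with one bit per column being packed into a signed 64-bit integer. This is an interpretation, not something the documentation states explicitly. The short check below, whose variable names are purely illustrative, shows that a 63-bit mask still fits into `int64` while a 64-bit mask does not.

```python
import numpy as np

# Largest value representable by the int64 result dtype.
INT64_MAX = np.iinfo(np.int64).max

# 63 columns: all 63 bits set still fits into a signed 64-bit integer.
mask_63 = (1 << 63) - 1
# 64 columns: the largest possible mask would need a 64th value bit.
mask_64 = (1 << 64) - 1

fits_63 = mask_63 <= INT64_MAX
fits_64 = mask_64 <= INT64_MAX
```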

Returns:

An array of integer values, one per row of the distribution input argument: 0 if the row is homogeneous, or a value >=1 if the row is heterogeneous.

Examples

>>> import numpy as np
>>> from skrough.homogeneity import get_heterogeneity
>>> get_heterogeneity(
...     np.asarray(
...         [
...             [0, 0, 0],
...             [1, 0, 0],
...             [0, 1, 0],
...             [0, 0, 1],
...             [1, 1, 0],
...             [1, 9, 0],
...             [9, 1, 0],
...             [1, 0, 1],
...             [1, 0, 9],
...             [9, 0, 1],
...             [0, 1, 1],
...             [0, 9, 1],
...             [0, 1, 9],
...             [1, 1, 1],
...             [1, 8, 9],
...             [8, 9, 1],
...         ]
...     )
... )
array([0, 0, 0, 0, 6, 6, 6, 5, 5, 5, 3, 3, 3, 7, 7, 7])
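
To read a returned value back as column positions, the bits can be unpacked again. The helper below is a hypothetical sketch, not part of skrough, and it assumes the bit order visible in the output above, where the first column is the most significant bit.

```python
def decode_heterogeneity(value: int, n_columns: int) -> list[int]:
    """Return the column indices whose bit is set in ``value``.

    Assumption: column 0 corresponds to the most significant of the
    ``n_columns`` bits, as suggested by the documented example output.
    """
    return [
        col
        for col in range(n_columns)
        if (value >> (n_columns - 1 - col)) & 1
    ]


# Decode each distinct value from the three-column example.
decoded = {value: decode_heterogeneity(value, 3) for value in (0, 3, 5, 6, 7)}
```

For instance, the value 6 (binary 110) decodes to columns 0 and 1, matching the rows `[1, 1, 0]`, `[1, 9, 0]`, and `[9, 1, 0]`.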