In our paper we propose and study one way of normalizing the
mutual information. There are two factors which make our
normalization attractive.
First, the resulting coefficient behaves consistently across a family of
test distributions. When we generate random variables with a
"prescribed amount of dependence" among them, the entropy-based
correlation coefficient agrees closely with this a priori amount of
dependence.
Secondly, the definition of the information and the normalization
procedure generalize directly to three dimensions. They produce a
measure of total dependence among the three variables that can also
reveal inverse association, or negative dependence, between the
random variables (even for purely categorical variables).
In the two-dimensional case we define the entropy correlation
coefficient r(H) by
(1) r(H) = (2*I(X,Y)/(H(X)+H(Y)))**(1/2) = (2*(1 - H(X,Y)/(H(X)+H(Y))))**(1/2),
where H(X,Y) is the joint entropy of X and Y, H(X) and H(Y) are the
entropies of X and Y, respectively, and I(X,Y) = H(X) + H(Y) - H(X,Y)
is the mutual information between X and Y.
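For a finite joint probability table, the quantities in (1) can be computed directly from the cell probabilities. The following sketch is illustrative only (the Python function and variable names, such as ecc_2d and p_xy, are ours rather than the paper's):

    import numpy as np

    def entropy(p):
        """Shannon entropy (in nats) of a probability array; zero cells are skipped."""
        p = p[p > 0]
        return -np.sum(p * np.log(p))

    def ecc_2d(p_xy):
        """Entropy correlation coefficient (1) for a joint probability table p_xy."""
        p_x = p_xy.sum(axis=1)               # marginal distribution of X
        p_y = p_xy.sum(axis=0)               # marginal distribution of Y
        h_x, h_y = entropy(p_x), entropy(p_y)
        h_xy = entropy(p_xy)                 # joint entropy H(X,Y)
        mi = max(h_x + h_y - h_xy, 0.0)      # I(X,Y), clipped at 0 against rounding error
        return np.sqrt(2.0 * mi / (h_x + h_y))

    # Independent fair coins give r(H) = 0, identical fair coins give r(H) = 1.
    print(ecc_2d(np.outer([0.5, 0.5], [0.5, 0.5])), ecc_2d(np.diag([0.5, 0.5])))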
The entropy correlation coefficient is shown to have, among others, the
following properties: r(H) is scaled to (0, 1), with 0 indicating full
independence and 1 complete dependence between the two
variables. Further, r(H) increases almost linearly from 0 to 1 with
the amount of dependence between X and Y.
The entropy correlation coefficient for a three-dimensional
distribution is defined as
(2) r(H) = (3*I(X,Y,Z)/(H(X)+H(Y)+H(Z)))**(1/3),
where the total information I(X,Y,Z) between the three random
variables X, Y and Z is defined using the entropies of different
orders:
(3) I(X,Y,Z) = H(X,Y,Z) - H(X,Y) - H(Y,Z) - H(Z,X) + H(X) + H(Y) + H(Z).
It can be shown that r(H) in (2) is scaled to (-1, 1), with 0 indicating
independence, 1 complete dependence, and -1 complete inverse
dependence.
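The three-dimensional coefficient can be evaluated in the same way from a joint probability table. In the sketch below (again illustrative; the names ecc_3d and p_xyz are ours) the cube root in (2) is taken with its sign, so a negative total information (3) yields a negative r(H); the XOR example at the end gives r(H) = -1, the complete inverse dependence among purely categorical variables mentioned above.

    import numpy as np

    def entropy(p):
        """Shannon entropy (in nats) of a probability array; zero cells are skipped."""
        p = p[p > 0]
        return -np.sum(p * np.log(p))

    def ecc_3d(p_xyz):
        """Entropy correlation coefficient (2) for a three-way probability table p_xyz."""
        h_x = entropy(p_xyz.sum(axis=(1, 2)))   # H(X)
        h_y = entropy(p_xyz.sum(axis=(0, 2)))   # H(Y)
        h_z = entropy(p_xyz.sum(axis=(0, 1)))   # H(Z)
        h_xy = entropy(p_xyz.sum(axis=2))       # H(X,Y)
        h_yz = entropy(p_xyz.sum(axis=0))       # H(Y,Z)
        h_zx = entropy(p_xyz.sum(axis=1))       # H(Z,X)
        h_xyz = entropy(p_xyz)                  # H(X,Y,Z)
        # Total information (3); unlike mutual information it can be negative.
        total = h_xyz - h_xy - h_yz - h_zx + h_x + h_y + h_z
        # Signed cube root, so the sign of the total information is preserved.
        return np.cbrt(3.0 * total / (h_x + h_y + h_z))

    # Z = X xor Y with fair coins X and Y: each pair of variables is independent,
    # yet the triple is completely inversely dependent, r(H) = -1 (up to rounding).
    p = np.zeros((2, 2, 2))
    for x in (0, 1):
        for y in (0, 1):
            p[x, y, x ^ y] = 0.25
    print(ecc_3d(p))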
(Proceedings of The First World Congress of Bernoulli Society, Tashkent 8.-14.9.1986, 4p.)