Entropy as a Measure of Homogeneity in Categorical Grouping Analysis

Erkki Latosaari and Ilkka Virtanen

Abstract

The paper deals with the concept of Shannon's entropy from the point of view of statistics. Entropy is considered as a measure of dispersion for a categorical variable. Of special interest in the paper is the case where the classes or categories of the variable have been aggregated to form groups that are homogeneous with respect to the class frequencies. The total entropy of the variable is divided into two components, the entropy between the groups and the entropy within the groups. This division forms the basis for analyzing the homogeneity of the aggregated groups. Further, an entropy-based test statistic, viz. Kullback's information statistic, is introduced to carry out homogeneity tests for the groups in the case of sample data. The grouping procedure is illustrated with an application to the Finnish representative elections.
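
The division of total entropy into between-group and within-group components can be illustrated with a minimal sketch. It relies on the standard grouping property of Shannon entropy, not on the paper's own notation: the function names, the use of natural logarithms, and the class frequencies below are all assumptions made for illustration.

```python
import math


def entropy(probs):
    """Shannon entropy (natural log) of a probability vector; 0*log(0) is taken as 0."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def decompose(freqs, groups):
    """Split total entropy into between-group and within-group parts.

    freqs  : class frequencies (counts or proportions)
    groups : list of lists of class indices forming a partition of the classes
    """
    total = sum(freqs)
    p = [f / total for f in freqs]
    # Probability mass of each aggregated group.
    group_p = [sum(p[i] for i in g) for g in groups]
    # Entropy of the group-level distribution.
    between = entropy(group_p)
    # Group-weighted average of the entropies of the conditional
    # class distributions inside each group.
    within = sum(
        P * entropy([p[i] / P for i in g])
        for P, g in zip(group_p, groups)
        if P > 0
    )
    return entropy(p), between, within


# Hypothetical frequencies for five classes, aggregated into two groups.
freqs = [30, 28, 22, 11, 9]
groups = [[0, 1, 2], [3, 4]]
h_total, h_between, h_within = decompose(freqs, groups)
```

In this sketch the total entropy equals the sum of the two components, and a within-group entropy close to its maximum indicates that the class frequencies inside each group are nearly equal, which is the sense of homogeneity described above.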

Key words: categorical variables, entropy decomposition, grouping analysis, information statistic, measure of homogeneity.

(Proceedings of the University of Vaasa, Research Papers, No. 96, 1983, 44 p.)