On Joint Dimension Reduction and Clustering of Categorical Data


Speaker


Abstract

There exist several methods for clustering high-dimensional data. One popular approach is a two-step procedure: in the first step, a dimension reduction technique is used to reduce the dimensionality of the data; in the second step, cluster analysis is applied to the data in the reduced space. This method is referred to as the tandem approach. An important drawback of the tandem approach is that the dimension reduction may distort or hide the cluster structure. Vichi and Kiers (2001) showed in a simulation study how the tandem approach may fail to retrieve the clusters in a low-dimensional space. In the context of categorical data, Van Buuren and Heiser (1989) proposed a method in which object scores are restricted using cluster memberships. Hwang, Dillon and Takane (2006) proposed a joint multiple correspondence analysis (MCA) and K-means clustering method that uses user-specified weights for the clustering and dimension reduction parts. For binary data, Iodice D'Enza and Palumbo (2012) recently proposed a method they refer to as iterative factorial clustering for binary data (i-FCB). In this paper we review existing joint dimension reduction and clustering methods in a unified framework that facilitates comparison.
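For concreteness, the tandem approach for categorical data can be sketched as MCA followed by K-means on the resulting object scores. This is a minimal illustration under stated assumptions, not any of the cited joint methods: MCA is computed here as correspondence analysis of the indicator matrix via an SVD of the standardized residuals, and the clustering step is plain Lloyd's K-means with a simple deterministic farthest-point seeding. The function names (`indicator_matrix`, `mca_row_scores`, `kmeans`, `tandem_clustering`) are hypothetical.

```python
import numpy as np


def indicator_matrix(data):
    """One-hot (indicator) coding of categorical records, one block per variable."""
    data = np.asarray(data, dtype=object)
    blocks = []
    for j in range(data.shape[1]):
        levels = sorted(set(data[:, j]))
        blocks.append(np.array([[1.0 if v == lev else 0.0 for lev in levels]
                                for v in data[:, j]]))
    return np.hstack(blocks)


def mca_row_scores(Z, n_dims):
    """Step 1 (assumed MCA variant): CA of the indicator matrix Z.

    Returns principal row coordinates F = D_r^{-1/2} U Sigma, truncated to n_dims.
    """
    P = Z / Z.sum()
    r = P.sum(axis=1)            # row masses
    c = P.sum(axis=0)            # column (category) masses
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))  # standardized residuals
    U, sigma, _ = np.linalg.svd(S, full_matrices=False)
    return (U[:, :n_dims] * sigma[:n_dims]) / np.sqrt(r)[:, None]


def kmeans(X, k, n_iter=100):
    """Step 2: Lloyd's K-means with farthest-point seeding (a simple, assumed choice)."""
    centers = [X[0]]
    for _ in range(1, k):
        # Distance of every row to its nearest already-chosen center.
        d = np.min([((X - ctr) ** 2).sum(axis=1) for ctr in centers], axis=0)
        centers.append(X[d.argmax()])
    centers = np.array(centers)
    for _ in range(n_iter):
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        # Recompute centers; keep the old center if a cluster becomes empty.
        new_centers = np.array([
            X[labels == g].mean(axis=0) if np.any(labels == g) else centers[g]
            for g in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers


def tandem_clustering(data, n_dims, k):
    """Tandem approach: reduce first (MCA), then cluster the reduced scores."""
    F = mca_row_scores(indicator_matrix(data), n_dims)
    labels, _ = kmeans(F, k)
    return labels


if __name__ == "__main__":
    records = [["a", "x", "p"]] * 5 + [["b", "y", "q"]] * 5
    print(tandem_clustering(records, n_dims=2, k=2))
```

Because the MCA step maximizes explained inertia without reference to any cluster structure, nothing in this pipeline guarantees that the retained dimensions are the ones that separate the clusters. That is precisely the drawback that motivates the joint methods reviewed in the paper.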

This event is organised by the Econometric Institute.
Twitter: @MetricsSeminars