A preliminary study of optimal variable weighting in k-means clustering
Authors: Paul E. Green, Jonathan Kim, Frank J. Carmone
Affiliation: (1) University of Pennsylvania, Suite 1400 Steinberg Hall-Dietrich Hall, 19104 Philadelphia, PA; (2) Marketing Department, Drexel University, 19104 Philadelphia, PA, USA
Abstract: Recently, algorithms for optimally weighting variables in non-hierarchical and hierarchical clustering methods have been proposed. Preliminary Monte Carlo research has shown that at least one of these algorithms cross-validates extremely well. The present study applies a k-means optimal weighting procedure to two empirical data sets and contrasts its cross-validation performance with that of unit (i.e., equal) weighting of the variables. We find that the optimal weighting procedure cross-validates better in one of the two data sets. In the second data set, its comparative performance depends strongly on the approach used to find seed values for the initial k-means partition.
Acknowledgments: The authors would like to acknowledge the support of the Citibank Fellowship from the Sol C. Snider Entrepreneurial Center at the Wharton School. The authors would also like to express their appreciation to J. Douglas Carroll and Abba M. Kreiger for comments on an earlier version of the paper.
Keywords: k-means clustering; optimal weighting; Rand index; cross-validation
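The abstract contrasts optimal with unit (equal) variable weighting in k-means, notes that the comparison can hinge on the seed values for the initial partition, and the keywords name the Rand index. The Python sketch below illustrates those ingredients under stated assumptions only: the variable weights are taken as given (the paper's optimal weighting algorithm is not reproduced), distance is weighted squared Euclidean, k-means starts from explicit seed centroids, and cross-validation uses a nearest-centroid scheme (cluster one half, assign the other half to the nearest centroid, and compare with a direct clustering of that half via the Rand index). All function names and the cross-validation scheme are hypothetical illustrations, not the authors' procedure.

```python
from itertools import combinations


def weighted_sq_dist(x, c, w):
    """Squared Euclidean distance with per-variable weights w (assumed metric)."""
    return sum(wj * (xj - cj) ** 2 for xj, cj, wj in zip(x, c, w))


def assign(points, centroids, w):
    """Index of the nearest centroid (under weights w) for each point; ties go to the lower index."""
    return [min(range(len(centroids)),
                key=lambda j: weighted_sq_dist(p, centroids[j], w))
            for p in points]


def weighted_kmeans(points, seeds, w, max_iter=100):
    """Lloyd-style k-means started from explicit seed centroids.

    Returns (labels, centroids). An empty cluster keeps its previous centroid.
    Unit weighting corresponds to w = [1.0] * number_of_variables.
    """
    centroids = [list(s) for s in seeds]
    labels = assign(points, centroids, w)
    for _ in range(max_iter):
        # Update step: move each centroid to the mean of its members.
        for j in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
        # Assignment step: stop once no point changes cluster.
        new_labels = assign(points, centroids, w)
        if new_labels == labels:
            break
        labels = new_labels
    return labels, centroids


def rand_index(a, b):
    """Proportion of object pairs on which partitions a and b agree (needs >= 2 objects)."""
    pairs = list(combinations(range(len(a)), 2))
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return agree / len(pairs)


def nearest_centroid_agreement(train, test, seeds_train, seeds_test, w):
    """Hypothetical cross-validation check: cluster `train`, assign `test` points
    to the nearest training centroid, and compare that partition of `test`
    with a direct clustering of `test` via the Rand index."""
    _, train_centroids = weighted_kmeans(train, seeds_train, w)
    predicted = assign(test, train_centroids, w)
    direct, _ = weighted_kmeans(test, seeds_test, w)
    return rand_index(predicted, direct)
```

For example, calling `nearest_centroid_agreement` once with unit weights and once with a candidate weight vector, on the same split and the same seeds, yields two Rand index values that can be compared directly; rerunning with different `seeds_train` or `seeds_test` shows how sensitive that comparison is to seeding.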
This article is indexed in SpringerLink and other databases.