The Remarkable Simplicity of Very High Dimensional Data: Application of Model-Based Clustering |
| |
Authors: | Fionn Murtagh |
| |
Affiliation: | (1) Genzyme Corporation, Framingham, MA 01702, USA; (2) Department of Statistics, Pennsylvania State University, University Park, PA 16802, USA |
| |
Abstract: | An ultrametric topology formalizes the notion of hierarchical structure. An ultrametric embedding, referred to here as ultrametricity, is implied by a hierarchical embedding. Such hierarchical structure can be global in the data set, or local. By quantifying extent or degree of ultrametricity in a data set, we show that ultrametricity becomes pervasive as dimensionality and/or spatial sparsity increases. This leads us to assert that very high dimensional data are of simple structure. We exemplify this finding through a range of simulated data cases. We discuss also application to very high frequency time series segmentation and modeling. |
| |
Keywords: | |
This article is indexed in SpringerLink and other databases. |
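The abstract's central claim, that ultrametricity becomes pervasive as dimensionality increases, can be illustrated with a small simulation. The sketch below is not the measure used in the paper; it is a simplified proxy chosen for illustration. It relies on the fact that in an ultrametric space every triangle has its two longest sides equal, so the mean relative gap between those two sides should shrink toward 0 as the data approach ultrametricity. The function name `ultrametric_gap`, the uniform hypercube sampling, and all parameter values are assumptions introduced here.

```python
import math
import random


def ultrametric_gap(n_points, dim, n_triplets=2000, seed=0):
    """Mean relative gap between the two longest sides of random triangles.

    A value near 0 means sampled triangles are close to isosceles with a
    small base (or equilateral), i.e. the sample is close to ultrametric.
    This is an illustrative proxy, not the coefficient from the paper.
    """
    rng = random.Random(seed)
    # Assumption: points are drawn uniformly from the unit hypercube.
    points = [[rng.random() for _ in range(dim)] for _ in range(n_points)]
    total = 0.0
    for _ in range(n_triplets):
        a, b, c = rng.sample(points, 3)
        sides = sorted([math.dist(a, b), math.dist(b, c), math.dist(a, c)])
        # Ultrametric inequality d(x,z) <= max(d(x,y), d(y,z)) holds for a
        # triangle exactly when its two longest sides are equal.
        total += (sides[2] - sides[1]) / sides[2]
    return total / n_triplets


if __name__ == "__main__":
    # Print the proxy for increasing dimensionality.
    for dim in (2, 20, 200, 2000):
        print(dim, round(ultrametric_gap(100, dim), 4))
```

Under these assumptions the printed gap should fall as `dim` grows, because pairwise distances between uniformly random points concentrate around a common value in high dimensions, making triangles nearly equilateral.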