
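Semantic compression method for data tables based on principal component analysis and matching clustering analysis
Cite this article: Feng Jing, Jin Yuanping, Feng Xin. Semantic compression method for data tables based on principal component analysis and matching clustering analysis[J]. Journal of Southeast University (Natural Science Edition), 2006, 36(6): 927-930.
Authors: Feng Jing  Jin Yuanping  Feng Xin
Affiliation (all three authors): School of Computer Science and Engineering, Southeast University, Nanjing 210096, China
Funding: Major Research Program of the National Natural Science Foundation of China (90412014); Southeast University Science Foundation (XJ0409150).
Abstract: This paper proposes PCA-Clustering, a semantic compression method for data tables based on principal component analysis and matching clustering analysis. Principal component analysis exploits the correlation between attributes and extracts principal components to achieve column-wise (vertical) compression. Matching clustering assigns each tuple to a cluster by measuring its matching degree and replaces all tuples with a smaller set of cluster representative tuples to achieve row-wise (horizontal) compression, making full use of a small permitted error to obtain a better compression ratio. Simulation results show that when the linear correlation between data attributes is pronounced, PCA-Clustering outperforms Fascicles and ItCompress in compression ratio by about 10% to 15% on average. Compared with SPARTAN, which uses the CaRT model, PCA-Clustering still achieves a better compression ratio, because CaRT is not very effective on numeric attributes with pronounced linear correlation.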

Keywords: semantic compression  principal component analysis  matching degree
Article ID: 1001-0505(2006)06-0927-04
Received: 2006-03-15
Revised: 2006-03-15

Semantic compression for data tables based on principal component and matching clustering analysis
Feng Jing, Jin Yuanping, Feng Xin. Semantic compression for data tables based on principal component and matching clustering analysis[J]. Journal of Southeast University (Natural Science Edition), 2006, 36(6): 927-930.
Authors: Feng Jing  Jin Yuanping  Feng Xin
Institution: School of Computer Science and Engineering, Southeast University, Nanjing 210096, China
Abstract: PCA-Clustering, an approach to semantic compression for data tables based on principal component analysis and matching clustering analysis, is proposed. Principal component analysis exploits the correlation between attributes to extract principal components, implementing column-wise compression. Matching clustering analysis determines which group a row belongs to by measuring its matching degree, and replaces all rows with a much smaller number of cluster representative rows, implementing row-wise compression. Simulation results show that when there is a strong linear correlation between data attributes, PCA-Clustering achieves better compression than existing methods. More specifically, the compression ratio of PCA-Clustering is about 10% to 15% higher than that of Fascicles and ItCompress. Compared with SPARTAN, which uses the CaRT model, PCA-Clustering also has a better compression ratio, because CaRT is not very effective for numeric attributes with a strong linear correlation.
Keywords: semantic compression  principal component analysis  matching degree
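
The two-stage idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not define the matching degree or the cluster construction, so the code below assumes a simple greedy rule in which a row matches a representative when every retained principal-component score differs by at most a tolerance. The function names (`pca_compress`, `matching_cluster`, `reconstruct`), the `tolerance` parameter, and the synthetic table are all hypothetical.

```python
import numpy as np


def pca_compress(table, n_components):
    """Column-wise step: project rows onto the top principal components."""
    mean = table.mean(axis=0)
    centered = table - mean
    # Rows of vt are the principal directions of the centered data.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:n_components]
    scores = centered @ components.T
    return scores, components, mean


def matching_cluster(scores, tolerance):
    """Row-wise step (assumed rule): a row joins the first representative
    whose scores all lie within `tolerance`; otherwise it becomes a new
    representative."""
    representatives = []
    labels = np.empty(len(scores), dtype=int)
    for i, row in enumerate(scores):
        for j, rep in enumerate(representatives):
            if np.all(np.abs(row - rep) <= tolerance):
                labels[i] = j
                break
        else:
            representatives.append(row)
            labels[i] = len(representatives) - 1
    return np.array(representatives), labels


def reconstruct(representatives, labels, components, mean):
    """Lossy decompression: map each row's representative back to attributes."""
    return representatives[labels] @ components + mean


# Synthetic table with three strongly linearly correlated numeric attributes.
rng = np.random.default_rng(0)
base = rng.integers(0, 5, size=200).astype(float)
table = np.column_stack([base, 2 * base + 1, -base + 3])
table += rng.normal(scale=0.01, size=table.shape)

scores, components, mean = pca_compress(table, n_components=1)
reps, labels = matching_cluster(scores, tolerance=0.5)
approx = reconstruct(reps, labels, components, mean)

print("stored representatives:", len(reps), "of", len(table), "rows")
print("max absolute reconstruction error:", np.abs(approx - table).max())
```

In this sketch the compressed form is the representative scores, one cluster label per row, the retained components, and the column means. A larger tolerance yields fewer representatives at the cost of a larger reconstruction error, which mirrors the trade-off between permitted error and compression ratio mentioned in the abstract.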
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.