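An Outlier Mining Algorithm Based on Attribute Entropy and Weighted Cosine Similarity
Citation: LIU Ai-qin, XUN Ya-ling. An Outlier Mining Algorithm Based on Attribute Entropy and Weighted Cosine Similarity[J]. Journal of Taiyuan University of Science and Technology, 2014, 0(3): 171-174
Authors: LIU Ai-qin, XUN Ya-ling
Affiliation: School of Computer Science and Technology, Taiyuan University of Science and Technology, Taiyuan 030024
Funding: Youth Foundation of Taiyuan University of Science and Technology (20093015)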
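Abstract: Outlier detection is an important research direction in data mining, but most outlier mining algorithms are inefficient when applied to high-dimensional datasets. This paper presents LEAWCD, an outlier mining algorithm based on attribute entropy and weighted cosine similarity. The algorithm first analyzes local attribute entropy to identify the local outlier attributes of each object within its k-neighborhood, and automatically sets an attribute weight vector according to the deviation degree of each outlier attribute. It then applies weighted cosine similarity, which is effective on high-dimensional data, to measure each object's outlier degree within its k-neighborhood, thereby detecting local outliers in high-dimensional data. Finally, experiments on celestial spectrum data provided by the National Astronomical Observatory verify that LEAWCD offers strong scalability and high detection precision.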

Keywords: attribute entropy; cosine similarity; outlier data; celestial spectra
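The abstract does not give LEAWCD's formulas, so the Python below is only a minimal sketch of the first stage under stated assumptions, not the authors' method. It assumes Euclidean k-nearest neighbors, substitutes the differential entropy of a fitted Gaussian for the paper's local attribute entropy, treats an attribute as an outlier attribute when adding the object to its neighbors raises that attribute's entropy by more than a hypothetical `threshold`, and takes the deviation degree as the distance from the neighborhood mean in standard deviations. The function names `k_neighbors`, `gaussian_entropy`, and `attribute_weights`, and the weighting rule (base weight 1, plus the deviation degree for outlier attributes, normalized to sum to 1), are all illustrative.

```python
import math
import statistics


def k_neighbors(data, i, k):
    """Indices of the k nearest neighbors of data[i] by Euclidean distance (i excluded)."""
    others = sorted((math.dist(data[i], data[j]), j) for j in range(len(data)) if j != i)
    return [j for _, j in others[:k]]


def gaussian_entropy(values, eps=1e-9):
    """Differential entropy (nats) of a Gaussian fitted to values.

    Assumption: a stand-in for the paper's local attribute entropy.
    """
    var = statistics.pvariance(values) + eps
    return 0.5 * math.log(2 * math.pi * math.e * var)


def attribute_weights(data, i, k, threshold=0.5):
    """Return (outlier_attrs, weights) for object i in its k-neighborhood.

    Hypothetical rule: attribute a is an outlier attribute when including
    data[i] raises the neighborhood entropy of a by more than `threshold`.
    """
    nbrs = [data[j] for j in k_neighbors(data, i, k)]
    x = data[i]
    outlier_attrs, raw = [], []
    for a in range(len(x)):
        col = [p[a] for p in nbrs]
        gain = gaussian_entropy(col + [x[a]]) - gaussian_entropy(col)
        if gain > threshold:
            outlier_attrs.append(a)
            # Deviation degree: distance from the neighborhood mean in std units.
            sd = statistics.pstdev(col) or 1e-9
            raw.append(1.0 + abs(x[a] - statistics.mean(col)) / sd)
        else:
            raw.append(1.0)  # ordinary attributes keep a base weight of 1
    total = sum(raw)
    return outlier_attrs, [r / total for r in raw]
```

With six three-attribute points in which only the first is extreme in attribute 1, `attribute_weights(data, 0, 4)` flags attribute 1 as the only outlier attribute and gives it nearly all of the weight, while an ordinary point receives uniform weights.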

An Outlier Mining Algorithm Based on Attribute Entropy and Weighted Cosine Similarity
LIU Ai-qin,XUN Ya-ling. An Outlier Mining Algorithm Based on Attribute Entropy and Weighted Cosine Similarity[J]. Journal of Taiyuan University of Science and Technology, 2014, 0(3): 171-174
Authors: LIU Ai-qin, XUN Ya-ling
Affiliation: School of Computer Science and Technology, Taiyuan University of Science and Technology, Taiyuan 030024, China
Abstract: Outlier mining is an important branch of data mining. At present, most outlier mining algorithms are inefficient when applied to high-dimensional data. This paper proposes LEAWCD, an outlier mining algorithm based on attribute entropy and weighted cosine similarity. First, the outlier attributes of each object in its k-neighborhood are determined by analyzing local attribute entropy. Second, the attribute weight vector is set automatically according to the deviation degree of each outlier attribute. Then the weighted cosine similarity, which is effective for high-dimensional data, is used to measure each object's outlier degree within its k-neighborhood, so that local outliers can be mined in high-dimensional data. Finally, experiments on celestial spectrum data provided by the National Astronomical Observatory show that LEAWCD has strong scalability and high detection precision.
Keywords: attribute entropy; cosine similarity; outlier data; celestial spectra
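A similarly hedged sketch of the scoring stage follows. The weighted cosine similarity is the standard form sum(w·x·y) / (sqrt(sum(w·x²)) · sqrt(sum(w·y²))), but choosing an object's neighbors by highest weighted similarity and defining its outlier degree as 1 minus the mean similarity to those k neighbors are assumptions for illustration, as are the names `weighted_cosine` and `outlier_degree`. The weight vector `w` can be any non-negative attribute weight vector, for example one derived from deviation degrees as the abstract describes.

```python
import math


def weighted_cosine(x, y, w):
    """Weighted cosine similarity of vectors x and y under non-negative weights w."""
    num = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    nx = math.sqrt(sum(wi * xi * xi for wi, xi in zip(w, x)))
    ny = math.sqrt(sum(wi * yi * yi for wi, yi in zip(w, y)))
    if nx == 0 or ny == 0:
        return 0.0  # a zero vector has no direction; treat it as dissimilar
    return num / (nx * ny)


def outlier_degree(data, i, k, w):
    """Hypothetical score: 1 minus the mean weighted cosine similarity between
    data[i] and its k most similar objects (assumes k >= 1)."""
    sims = sorted(
        (weighted_cosine(data[i], data[j], w) for j in range(len(data)) if j != i),
        reverse=True,
    )
    top = sims[:k]  # the k-neighborhood under the weighted cosine measure
    return 1.0 - sum(top) / len(top)
```

Cosine similarity compares direction and ignores magnitude. If nearly all of the weight sits on a single attribute, any two vectors that are positive on that attribute look almost identical under this measure, so how the weight vector is scaled matters in practice.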
This article is indexed in CNKI, VIP, and other databases.