
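Method of Web Document Clustering Based on Mutual Information (基于互信息的Web文档聚类方法)

Cite this article:	SUO Hong-guang, YANG Tao. Method of Web Document Clustering Based on Mutual Information[J]. Journal of Guangxi Normal University (Natural Science Edition), 2007, 25(2): 131-134

Authors:	SUO Hong-guang (索红光), YANG Tao (杨涛)

Affiliations:	College of Computer and Communication Engineering, China University of Petroleum, Dongying, Shandong 257061; School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081; College of Computer and Communication Engineering, China University of Petroleum, Dongying, Shandong 257061

Funding:	National Natural Science Foundation of China (60503050)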

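Abstract:	With the rapid growth of information on the Web, making full use of this large volume of information and serving Web users effectively has become an urgent problem. Related research shows that Web document clustering can narrow the scope of information retrieval and improve query precision. By analyzing the characteristics of Web documents and the strengths and weaknesses of commonly used Web document clustering methods, this paper proposes a Web document clustering method based on mutual information theory. During clustering, the mutual information between feature terms is computed, and a threshold is used to decide whether the terms belong to the same category. Experimental results show that, compared with the K-Means clustering algorithm, the method improves both precision and recall.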
Keywords:	information retrieval; document clustering; mutual information; feature selection; vector space model
Article ID:	1001-6600(2007)02-0131-04
Received:	2006-12-15
Revised:	2006-12-15
Method of Web Document Clustering Based on Mutual Information

SUO Hong-guang, YANG Tao. Method of Web Document Clustering Based on Mutual Information[J]. Journal of Guangxi Normal University (Natural Science Edition), 2007, 25(2): 131-134

Authors:	SUO Hong-guang, YANG Tao

Affiliation:	1. College of Computer and Communication Engineering, China University of Petroleum, Dongying 257061, China; 2. School of Computer Science and Engineering, Beijing Institute of Technology, Beijing 100081, China

Abstract:	With the growth of information on the Web, making full use of that information and providing effective services has become a pressing problem. Web document clustering reduces the scope of search and raises the precision of information retrieval. The characteristics of Web text and of commonly used text clustering methods are analyzed, and a Web document clustering method based on mutual information is proposed. During clustering, the mutual information between terms is calculated and compared against a threshold to judge whether the terms belong to the same category. Evaluation results show that precision and recall are significantly improved compared with the K-Means clustering method.

Keywords:	information retrieval; document clustering; mutual information; term selection; vector space model
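
The abstract describes the clustering step only in outline: compute the mutual information between terms and use a threshold to decide whether they fall in the same category. The sketch below is one possible reading of that idea, not the authors' implementation. It assumes pointwise mutual information estimated from document-level co-occurrence, merges terms whose score meets the threshold, and assigns each document to the term group that covers most of its terms. The function names (`term_pmi`, `group_terms`, `cluster_docs`), the PMI formula, the assignment rule, and the threshold value are all assumptions introduced for illustration.

```python
import math
from itertools import combinations


def term_pmi(docs):
    """PMI (log base 2) for each term pair that co-occurs in at least one document.

    Assumption: probabilities are estimated from document frequencies.
    """
    n = len(docs)
    doc_sets = [set(d) for d in docs]
    df = {}   # term -> number of documents containing it
    co = {}   # (term_a, term_b) -> number of documents containing both
    for terms in doc_sets:
        for t in terms:
            df[t] = df.get(t, 0) + 1
        for a, b in combinations(sorted(terms), 2):
            co[(a, b)] = co.get((a, b), 0) + 1
    return {
        (a, b): math.log2(c * n / (df[a] * df[b]))
        for (a, b), c in co.items()
    }


def group_terms(docs, threshold):
    """Merge terms whose PMI meets the threshold; return term -> group label."""
    parent = {t: t for d in docs for t in d}

    def find(t):
        # Union-find lookup with path halving.
        while parent[t] != t:
            parent[t] = parent[parent[t]]
            t = parent[t]
        return t

    for (a, b), score in term_pmi(docs).items():
        if score >= threshold:
            parent[find(a)] = find(b)
    return {t: find(t) for t in parent}


def cluster_docs(docs, threshold):
    """Assign each document to the term group covering most of its terms."""
    groups = group_terms(docs, threshold)
    clusters = {}
    for i, d in enumerate(docs):
        counts = {}
        for t in d:
            counts[groups[t]] = counts.get(groups[t], 0) + 1
        if not counts:
            continue  # skip documents with no terms
        # Sorting first makes tie-breaking deterministic.
        label = max(sorted(counts), key=counts.get)
        clusters.setdefault(label, []).append(i)
    return sorted(clusters.values())


if __name__ == "__main__":
    # Toy corpus: each document is a list of already-extracted feature terms.
    docs = [
        ["oil", "drilling", "well"],
        ["oil", "well", "pipeline"],
        ["cluster", "vector", "term"],
        ["cluster", "term", "retrieval"],
    ]
    print(cluster_docs(docs, threshold=0.5))
```

On this toy corpus, terms from the two topics never co-occur, so they stay in separate groups and the documents split along the same lines. Raising the threshold yields smaller term groups and therefore more, finer clusters; the paper's actual threshold and feature selection procedure are not given in the abstract.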
This article is indexed in CNKI, VIP (维普), Wanfang Data, and other databases.
