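Web Text Mining with a Semi-supervised Clustering Algorithm
Cite this article: HU Min-jie. Web Text Mining with a Semi-supervised Clustering Algorithm[J]. Journal of Zhangzhou Teachers College (漳州师院学报), 2010, 0(4): 50-57
Author: HU Min-jie (胡敏杰)
Affiliation: Computing Center, Zhangzhou Normal University, Zhangzhou, Fujian 363000, China
Abstract: With the rapid growth of the Internet, Web documents typically consist of a massive number of unlabeled documents and relatively few labeled ones. How to use the few labeled documents effectively to cluster the mass of unlabeled documents, and thereby better extract valuable information, is the semi-supervised learning problem, which has become an active research topic. In current Web text mining, unsupervised learning algorithms have low detection rates, while supervised learning algorithms require large amounts of labeled data that are hard to obtain. To address this, the paper combines the label binding technique from semi-supervised learning with an optimized spherical k-means clustering algorithm for Web text mining, and evaluates the resulting Web text mining system on real test data. The results show that the method achieves a high detection rate and a low false-alarm rate for valuable texts, and that its overall detection performance is better than that of Web text mining algorithms based on supervised or unsupervised learning.

Keywords: Web text mining; clustering; semi-supervised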

Semi-supervised Clustering Method Based on Web Text Mining
HU Min-jie. Semi-supervised Clustering Method Based on Web Text Mining[J]. Journal of ZhangZhou Teachers College (Philosophy & Social Sciences), 2010, 0(4): 50-57
Authors: HU Min-jie
Affiliation: HU Min-jie (Computing Center, Zhangzhou Normal University, Zhangzhou, Fujian 363000, China)
Abstract: With the rapid development of the Internet, Web documents commonly consist of a massive amount of unlabeled data and only a small amount of labeled data. How to use the small amount of labeled data to cluster the mass of unlabeled data, and so obtain useful information more effectively, is the semi-supervised learning problem, which has become a hot research topic. Because unsupervised learning yields a low detection rate in Web text mining, and the large amount of labeled data that supervised learning requires is not easily available, label binding from semi-supervised learning is combined with a refined bisecting k-means clustering algorithm for Web text mining. Experiments on real test data show that this method has a higher detection rate and a lower false-alarm rate for valuable texts. Its overall detection performance is superior to that of Web text mining algorithms based on unsupervised or supervised learning.
Keywords: Web text mining; clustering; semi-supervised
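
The abstract says that label binding from semi-supervised learning is combined with a k-means clustering step, but it gives no algorithmic detail. The following is only a rough sketch of one way such a combination could look, under two assumptions that are not taken from the paper: that "label binding" means the labeled documents seed the initial centroids and stay fixed to their clusters, and that documents are compared by cosine similarity on unit-length vectors (spherical k-means). All function and parameter names are hypothetical.

```python
import math


def normalize(vec):
    """Scale a vector to unit length (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def dot(a, b):
    """Dot product; equals cosine similarity for unit-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def mean_direction(vectors):
    """Unit-length direction of the sum of a non-empty list of vectors."""
    dim = len(vectors[0])
    total = [sum(v[i] for v in vectors) for i in range(dim)]
    return normalize(total)


def seeded_spherical_kmeans(labeled, unlabeled, k, max_iter=100):
    """Cluster unlabeled vectors, using labeled ones as fixed seeds.

    labeled:   list of (vector, cluster_id) pairs, cluster_id in range(k)
    unlabeled: list of vectors (e.g. TF-IDF document vectors)
    Returns (centroids, assignments), one assignment per unlabeled vector.
    """
    labeled = [(normalize(v), c) for v, c in labeled]
    unlabeled = [normalize(v) for v in unlabeled]

    # Assumption: every cluster has at least one labeled seed document.
    centroids = []
    for c in range(k):
        seeds = [v for v, lab in labeled if lab == c]
        if not seeds:
            raise ValueError(f"cluster {c} has no labeled seed")
        centroids.append(mean_direction(seeds))

    assignments = None
    for _ in range(max_iter):
        # Assign each unlabeled document to its most similar centroid.
        new_assignments = [
            max(range(k), key=lambda c: dot(v, centroids[c]))
            for v in unlabeled
        ]
        if new_assignments == assignments:
            break  # converged: no assignment changed
        assignments = new_assignments

        # Recompute centroids; labeled documents never leave their cluster.
        for c in range(k):
            members = [v for v, lab in labeled if lab == c]
            members += [v for v, a in zip(unlabeled, assignments) if a == c]
            centroids[c] = mean_direction(members)

    return centroids, assignments


# Toy usage: two labeled seeds, three unlabeled 2-D "documents".
labeled_docs = [([1.0, 0.0], 0), ([0.0, 1.0], 1)]
unlabeled_docs = [[0.9, 0.1], [0.2, 0.8], [3.0, 0.5]]
centroids, assignments = seeded_spherical_kmeans(labeled_docs, unlabeled_docs, k=2)
print(assignments)  # [0, 1, 0]
```

Keeping the labeled documents bound to their clusters is what distinguishes this sketch from plain k-means: the few labels both choose the starting centroids and keep pulling each centroid toward its known class on every update.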
This article is indexed in VIP (维普) and other databases.