
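Sub-topic detecting for Chinese multi-documents based on semi-supervised learning
Cite this article: XU Xiaodan. Sub-topic detecting for Chinese multi-documents based on semi-supervised learning[J]. Journal of Zhejiang Normal University (Natural Sciences), 2011, 34(3): 302-305.
Author: XU Xiaodan
Affiliation: College of Mathematics, Physics and Information Engineering, Zhejiang Normal University, Jinhua, Zhejiang 321004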
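Abstract: To partition sub-topics more effectively during automatic multi-document summarization, a sub-topic partitioning method based on semi-supervised learning is proposed. First, the semantic similarity between sentences is computed. Next, hierarchical clustering is used to assign topic labels to high-confidence sentences, producing a small set of topic-labeled sentences; on this basis, constrained-k-means clustering is applied to all sentences, and the number of sub-topics k is determined by cross-validation. Finally, k-means clustering is used to obtain the sub-topics of the document set. Experimental results show that the method effectively improves the sub-topic recognition rate.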
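The abstract does not spell out how the constrained-k-means step works, so the following is only a minimal sketch under one common reading: the sentences labeled in the hierarchical-clustering stage act as seeds, each centroid is initialized from its seeds, and seed labels are held fixed while the remaining sentences are assigned to the nearest centroid. The function name `constrained_kmeans`, the toy two-dimensional vectors, and the use of Euclidean distance in place of the paper's semantic similarity are all assumptions made for illustration. The cross-validation used to choose k is not shown.

```python
from math import dist  # Euclidean distance (Python 3.8+)


def mean(points):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def constrained_kmeans(vectors, seed_labels, max_iter=100):
    """Cluster `vectors` while keeping the seed labels fixed.

    vectors     : list of numeric feature vectors, one per sentence
    seed_labels : dict {vector index: cluster id in 0..k-1}; every
                  cluster id needs at least one seed
    Returns (labels, centroids).
    """
    k = max(seed_labels.values()) + 1

    # Initialise each centroid from the seeds of its cluster.
    centroids = []
    for c in range(k):
        members = [vectors[i] for i, lab in seed_labels.items() if lab == c]
        if not members:
            raise ValueError(f"cluster {c} has no seed")
        centroids.append(mean(members))

    labels = None
    for _ in range(max_iter):
        # Assignment step: seeds keep their label, others go to the
        # nearest centroid.
        new_labels = []
        for i, v in enumerate(vectors):
            if i in seed_labels:
                new_labels.append(seed_labels[i])
            else:
                nearest = min(range(k), key=lambda c: dist(v, centroids[c]))
                new_labels.append(nearest)
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels

        # Update step: a cluster is never empty because its seeds stay in it.
        for c in range(k):
            centroids[c] = mean([v for v, lab in zip(vectors, labels) if lab == c])
    return labels, centroids


# Hypothetical toy data: six "sentence vectors", two of them pre-labeled.
vectors = [[0.0, 0.0], [0.2, 0.1], [0.1, 0.3],
           [5.0, 5.0], [5.2, 4.9], [4.8, 5.1]]
seeds = {0: 0, 3: 1}
labels, centroids = constrained_kmeans(vectors, seeds)
print(labels)
```

In a fuller pipeline, the seed dictionary would come from the high-confidence output of the hierarchical clustering, and the routine would be rerun for several candidate values of k.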

Keywords: multi-document summarization  sub-topic  semi-supervised learning  k-means clustering

Sub-topic detecting for Chinese multi-documents based on semi-supervised learning
XU Xiaodan. Sub-topic detecting for Chinese multi-documents based on semi-supervised learning[J]. Journal of Zhejiang Normal University (Natural Sciences), 2011, 34(3): 302-305.
Authors: XU Xiaodan
Institution: College of Mathematics, Physics and Information Engineering, Zhejiang Normal University, Jinhua, Zhejiang 321004, China
Abstract: To partition the sub-topics of multiple documents more effectively, a new method based on semi-supervised learning is proposed. The method first obtains initial topic sets by hierarchical clustering based on the semantic distance between sentences and labels the sentences that score highly within those topics. It then uses constrained-k-means to determine the number of topics k, and finally obtains the topic sets by k-means clustering. The experimental results indicate that this method improves the accuracy of sub-t...
Keywords: multi-document summarization  sub-topic  semi-supervised learning  k-means clustering
This article is indexed in CNKI, Wanfang Data, and other databases.