
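Document Topic Extraction Algorithm Based on Word Relevancy
Cite this article: YUAN Xiaofeng. Document topic extraction algorithm based on word relevancy[J]. Journal of Chengdu University (Natural Science), 2012, 31(4): 367-369
Author: YUAN Xiaofeng (袁晓峰)
Affiliation: College of Information Science and Technology, Yancheng Teachers University, Yancheng, Jiangsu 224002
Abstract: Since the words that occur most frequently in a document tend to reflect its topic, this paper designs a topic extraction algorithm for Chinese documents. The algorithm first preprocesses the target document, then computes the occurrence frequency of each word in it, and takes the few most frequent words as the document's topic. When computing occurrence frequencies, the relevancy between words is taken into account as a contributing factor. Word relevancy is computed with a method based on the Chinese knowledge base HowNet (《知网》). Experiments show that the algorithm achieves relatively high accuracy.

Keywords: word relevancy; occurrence frequency; HowNet; topic extraction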
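The abstract states only that word relevancy is a contributing factor when counting frequencies; it does not give the formula. The sketch below assumes one plausible reading: a word's score is its own count plus, for every other word in the document, that word's count multiplied by their relevancy. It also assumes the document has already been segmented into words. The stop-word and punctuation lists, the `toy_relevancy` table, and all function names are hypothetical stand-ins for a real segmenter and the HowNet-based measure.

```python
from collections import Counter
from typing import Callable, Dict, Iterable, List

# Hypothetical, deliberately tiny filter lists; a real system needs full Chinese lists.
STOPWORDS = {"的", "了", "是", "在", "和"}
PUNCTUATION = {"，", "。", "、", "！", "？", "；", "："}


def preprocess(tokens: Iterable[str]) -> List[str]:
    """Drop stop words, punctuation and empty tokens from an already-segmented document."""
    return [t for t in tokens if t.strip() and t not in STOPWORDS and t not in PUNCTUATION]


def weighted_frequency(words: List[str], relevancy: Callable[[str, str], float]) -> Dict[str, float]:
    """Assumed scoring: own count plus relevancy-weighted counts of every other word."""
    counts = Counter(words)
    scores: Dict[str, float] = {}
    for w in counts:
        scores[w] = sum(c * (1.0 if v == w else relevancy(w, v)) for v, c in counts.items())
    return scores


def extract_topics(tokens: Iterable[str], relevancy: Callable[[str, str], float], k: int = 3) -> List[str]:
    """Return the k highest-scoring words; ties are broken by code-point order for determinism."""
    scores = weighted_frequency(preprocess(tokens), relevancy)
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [w for w, _ in ranked[:k]]


# Hypothetical symmetric relevancy table standing in for a HowNet-based measure.
TOY_RELEVANCY = {frozenset({"算法", "计算"}): 0.6, frozenset({"文档", "主题"}): 0.5}


def toy_relevancy(a: str, b: str) -> float:
    return TOY_RELEVANCY.get(frozenset({a, b}), 0.0)


if __name__ == "__main__":
    doc = ["文档", "的", "主题", "抽取", "算法", "计算", "文档", "主题", "算法", "。"]
    print(extract_topics(doc, toy_relevancy, k=3))  # ['主题', '文档', '算法']
```

With raw counts alone, 文档, 主题 and 算法 would tie at 2. Under the assumed scoring, the relevancy terms lift 文档 and 主题 to 3.0 and 算法 to 2.6, so related words reinforce each other's frequency, which is the effect the abstract describes.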

Algorithm of Document Subject Extraction Based on Word Relevancy
YUAN Xiaofeng. Algorithm of Document Subject Extraction Based on Word Relevancy[J]. Journal of Chengdu University (Natural Science), 2012, 31(4): 367-369
Authors: YUAN Xiaofeng
Affiliation: YUAN Xiaofeng (College of Information Science and Technology, Yancheng Teachers University, Yancheng 224002, China)
Abstract: A subject extraction algorithm was designed based on the observation that frequently occurring words can represent the theme of a document. The algorithm first preprocessed the target document and calculated the occurrence frequency of each word in it; the few most frequent words were then used to represent the subject. The relevancy between words was taken into account when calculating each word's frequency, and relevancy was computed using the HowNet ontology. Experiments showed that the algorithm achieved high accuracy.
Keywords: word relevancy; occurrence frequency; HowNet; subject extraction
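The abstract says relevancy is computed from HowNet but does not give the measure. A common HowNet-based approach compares sememes (HowNet's primitive units of meaning) by their path distance d in the sememe hierarchy and scores them as α/(d + α); α = 1.6 is a value often used with this formula, but here it is an assumption. The sketch below uses a hypothetical toy hierarchy and a hypothetical word-to-sememe table instead of real HowNet data, and takes the best-matching sense pair as the word relevancy. It illustrates the idea only and is not the author's method.

```python
from typing import Dict, List, Optional

# Hypothetical toy sememe hierarchy (child -> parent); not actual HowNet data.
PARENT: Dict[str, Optional[str]] = {
    "entity": None,
    "thing": "entity",
    "information": "thing",
    "text": "information",
    "method": "information",
    "event": "entity",
    "compute": "event",
}

# Hypothetical word -> primary sememe of each sense.
WORD_SEMEMES: Dict[str, List[str]] = {
    "文档": ["text"],
    "主题": ["information"],
    "算法": ["method"],
    "计算": ["compute"],
}


def path_to_root(sememe: str) -> List[str]:
    """List the sememe followed by its ancestors up to the root."""
    path: List[str] = []
    node: Optional[str] = sememe
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path


def sememe_distance(a: str, b: str) -> Optional[int]:
    """Edges between a and b via their lowest common ancestor (None if no common root)."""
    pa, pb = path_to_root(a), path_to_root(b)
    pb_index = {s: i for i, s in enumerate(pb)}
    for i, s in enumerate(pa):
        if s in pb_index:
            return i + pb_index[s]
    return None


def sememe_similarity(a: str, b: str, alpha: float = 1.6) -> float:
    """alpha / (d + alpha): 1.0 for identical sememes, shrinking as distance grows."""
    d = sememe_distance(a, b)
    return 0.0 if d is None else alpha / (d + alpha)


def word_relevancy(w1: str, w2: str) -> float:
    """Best score over all sense pairs; words missing from the table get 0.0."""
    s1, s2 = WORD_SEMEMES.get(w1, []), WORD_SEMEMES.get(w2, [])
    return max((sememe_similarity(a, b) for a in s1 for b in s2), default=0.0)


if __name__ == "__main__":
    print(round(word_relevancy("文档", "主题"), 3))  # 0.615 (d = 1)
    print(round(word_relevancy("算法", "计算"), 3))  # 0.242 (d = 5)
```

`word_relevancy` takes two words and returns a score in [0, 1], which is the shape a relevancy factor in frequency counting needs. Note that a pure distance measure behaves like similarity rather than relatedness: 算法 and 计算 are clearly related yet score low because they sit in different branches, so a measure aimed at relevancy might also draw on HowNet's relation links.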
This article is indexed in CNKI, VIP (Weipu), Wanfang Data, and other databases.