
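Automatic Text Summarization Based on Thematic Word Weight and Sentence Features
Cite this article: Jiang Chang-jin, Peng Hong, Chen Jian-chao, Ma Qian-li. Automatic Text Summarization Based on Thematic Word Weight and Sentence Features[J]. Journal of South China University of Technology (Natural Science Edition), 2010, 38(7).
Authors: Jiang Chang-jin  Peng Hong  Chen Jian-chao  Ma Qian-li
Affiliations: 1. School of Computer Science and Engineering, South China University of Technology, Guangzhou 510006, Guangdong, China
2. School of Mathematics and Computational Science, Guangdong University of Business Studies, Guangzhou 510320, Guangdong, China
Funding: Supported by the Natural Science Foundation of Guangdong Province and the Science and Technology Research Project of Guangdong Province
Abstract: To obtain high-quality automatic summaries, a word weight formula is constructed on the basis of a combined-word recognition algorithm, taking full account of word frequency, part of speech, word position and word length. The formula gives higher weights to the words and phrases that express the theme. Sentence weights are computed from the content and position of each sentence, the effect of cue words, and user preference. Summary generation takes the similarity of candidate summary sentences into account, which avoids adding redundant information. Summary evaluation is refined from sentence granularity to word granularity, and a method for computing precision and recall at word granularity is proposed. Experiments show that the summaries generated by the algorithm are of high quality, with an average precision of 77.1%.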
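The abstract does not give the exact word-granularity formulas, so the following is only a minimal sketch of one plausible reading: precision and recall are computed over the multiset of words shared by the generated summary and a reference summary. The function name `word_level_precision_recall` and the use of multiset overlap are assumptions made for illustration, not the authors' definition.

```python
from collections import Counter


def word_level_precision_recall(candidate_words, reference_words):
    """Return (precision, recall) computed over word multisets.

    Assumption: overlap is the multiset intersection of the two word
    lists; the paper's own definition may differ.
    """
    cand = Counter(candidate_words)
    ref = Counter(reference_words)
    # Counter intersection keeps the minimum count of each shared word.
    overlap = sum((cand & ref).values())
    precision = overlap / sum(cand.values()) if cand else 0.0
    recall = overlap / sum(ref.values()) if ref else 0.0
    return precision, recall


if __name__ == "__main__":
    generated = ["topic", "word", "weight", "summary"]
    reference = ["topic", "word", "sentence", "feature", "summary"]
    print(word_level_precision_recall(generated, reference))
```

Compared with sentence-level scoring, a word-level measure of this kind gives partial credit to a generated sentence that shares most of its words with a reference sentence without matching it exactly.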

Keywords: Thematic Words  Automatic Text Summarization  Combined Word  Weight Computing
Received: 2009-12-4
Revised: 2010-2-28

Automatic Text Summarization Based on Thematic Word Weight and Sentence Features
Jiang Chang-jin, Peng Hong, Chen Jian-chao, Ma Qian-li. Automatic Text Summarization Based on Thematic Word Weight and Sentence Features[J]. Journal of South China University of Technology (Natural Science Edition), 2010, 38(7).
Authors: Jiang Chang-jin  Peng Hong  Chen Jian-chao  Ma Qian-li
Abstract: A formula for computing the weight of a word or phrase in a text is constructed from its frequency, part of speech, position and length. The formula builds on a combined-word recognition algorithm and assigns a greater weight to a word or phrase whose meaning is close to the theme of the text. The weight of a sentence depends on its content, its position, the cue words it contains and the user's preference, so a sentence-weight formula is built from these factors. To avoid redundancy, only one of any two similar sentences in the candidate set is selected for the final summary. Furthermore, the evaluation approach based on precision and recall is improved: the precision and recall of a summary are computed at the word level instead of the sentence level. Experimental results show that the proposed algorithm performs better than traditional automatic summarization algorithms based on the TF-ISF method.
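The redundancy step described in the abstract can be sketched as a greedy selection over weighted candidate sentences. This is an illustrative sketch only: the abstract does not say which similarity measure or threshold the authors use, so the Jaccard similarity, the `threshold` value, and the names `jaccard` and `select_summary` are all assumptions.

```python
def jaccard(words_a, words_b):
    """Jaccard similarity of two word lists (assumed similarity measure)."""
    a, b = set(words_a), set(words_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def select_summary(sentences, weights, max_sentences=3, threshold=0.5):
    """Pick the highest-weight sentences, skipping near-duplicates.

    sentences: list of word lists; weights: one score per sentence.
    Returns the chosen sentence indices in original document order.
    """
    # Visit candidates from highest to lowest weight.
    order = sorted(range(len(sentences)), key=lambda i: weights[i], reverse=True)
    chosen = []
    for i in order:
        if len(chosen) >= max_sentences:
            break
        # Keep a sentence only if it is dissimilar to every chosen one.
        if all(jaccard(sentences[i], sentences[j]) < threshold for j in chosen):
            chosen.append(i)
    return sorted(chosen)


if __name__ == "__main__":
    sents = [["a", "b", "c"], ["a", "b", "c", "d"], ["x", "y"], ["p", "q"]]
    print(select_summary(sents, [0.9, 0.8, 0.5, 0.4], max_sentences=2))
```

In this example the second sentence is dropped because it overlaps heavily with the first, so the lower-weight but dissimilar third sentence is selected instead.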
Keywords: Thematic Words  Automatic Text Summarization  Combined Word  Weight Computing
This article is indexed in CNKI, Wanfang Data and other databases.