
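A Study of Chinese Microblog Opinion Summarization Based on a Semantic Graph Optimization Algorithm
Cite this article:ZHANG Cong, PEI Jia-huan, HUANG Kai-yu, HUANG De-gen, YIN Zhang-zhi. A Study of Chinese Microblog Opinion Summarization Based on a Semantic Graph Optimization Algorithm[J]. Journal of Shandong University (Natural Science), 2017, 52(7): 59-65. DOI: 10.6040/j.issn.1671-9352.1.2016.PC2
Authors:ZHANG Cong  PEI Jia-huan  HUANG Kai-yu  HUANG De-gen  YIN Zhang-zhi
Affiliation:School of Computer Science and Technology, Dalian University of Technology, Dalian 116024, Liaoning, China
Funding:Supported by the National Natural Science Foundation of China (61672127)
Abstract:To efficiently extract key information on different topics from massive volumes of microblog posts, microblog opinion summarization has recently become a research hotspot in natural language processing. The baseline method extracts keywords from microblog sentences with the TF-IDF algorithm, computes an importance score for each microblog from those keywords, and selects the opinion summary directly. The naive improved method builds on the baseline by adding a sentiment classification step and by using the semantic distance between microblog sentences to remove candidate summary sentences that are semantically redundant and of lower importance, producing the opinion summary. The method based on the semantic graph optimization algorithm builds on the naive improved method: it uses the importance scores of microblog sentences and the semantic distances between them to construct a semantic graph, and selects the opinion summary with a graph optimization algorithm. On the test dataset of COAE2016 evaluation Task 1, the naive improved method achieved an average ROUGE-1 of 26.39%, an average ROUGE-2 of 0.68%, and an average ROUGE-SU4 of 5.69% over 10 topics, and the officially published evaluation results show that it achieved the best performance on 6 of the 9 evaluation metrics. The method based on the semantic graph optimization algorithm was tested on the evaluation's sample dataset; the results show that it improves on the naive improved method by 0.63%, 1.51%, and 2.69% in ROUGE-1, ROUGE-2, and ROUGE-SU4, respectively.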

Keywords:microblog summarization  TF-IDF  semantic graph optimization  sentence similarity
Received:2016-11-25

Semantic graph optimization algorithm based Chinese microblog opinion summarization
ZHANG Cong,PEI Jia-huan,HUANG Kai-yu,HUANG De-gen,YIN Zhang-zhi. Semantic graph optimization algorithm based Chinese microblog opinion summarization[J]. Journal of Shandong University, 2017, 52(7): 59-65. DOI: 10.6040/j.issn.1671-9352.1.2016.PC2
Authors:ZHANG Cong  PEI Jia-huan  HUANG Kai-yu  HUANG De-gen  YIN Zhang-zhi
Affiliation:School of Computer Science and Technology, Dalian University of Technology, Dalian 116024, Liaoning, China
Abstract:To obtain key information on different topics efficiently, microblog opinion summarization has recently become a hot spot in natural language processing. The baseline method of this paper extracts keywords using the TF-IDF algorithm and calculates the importance scores of microblogs to select the opinion summary directly; the naive improved method adds a sentiment classification step and uses the semantic distance between microblogs to remove microblogs that are of low importance and high semantic repetition when generating the opinion summary; the method based on the semantic graph optimization algorithm constructs a complete graph using the importance scores and semantic distances of microblogs, and selects the opinion summary using a graph optimization algorithm. According to the official evaluation results, on the COAE2016 test dataset the naive improved method reached average ROUGE-1, ROUGE-2 and ROUGE-SU4 values of 26.39%, 0.68% and 5.69% respectively over 10 topics, and achieved the best value on 6 of the 9 evaluation metrics. In addition, experiments on the COAE2016 sample dataset show that the method based on the semantic graph optimization algorithm increased the ROUGE-1, ROUGE-2 and ROUGE-SU4 values by 0.63%, 1.51% and 2.69% respectively relative to the naive improved method.
Keywords:microblog summarization  semantic graph optimization  TF-IDF  sentence similarity
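
The abstract describes the pipeline only at a high level, so the following is a minimal sketch of the general idea behind the baseline and naive improved methods, not the authors' implementation. It assumes posts are already word-segmented into token lists, scores each post as the sum of its top TF-IDF keyword weights, and uses cosine distance over TF-IDF vectors as a stand-in for the paper's semantic distance. The sentiment classification step and the graph optimization step are omitted because the abstract does not specify them. All function names, `top_k`, `max_sentences`, and `min_distance` are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(posts):
    """Return one {token: tf-idf weight} dict per pre-tokenized post."""
    n = len(posts)
    doc_freq = Counter()
    for tokens in posts:
        doc_freq.update(set(tokens))  # count each token once per post
    vectors = []
    for tokens in posts:
        counts = Counter(tokens)
        total = len(tokens) or 1
        vectors.append({
            # +1.0 keeps tokens that occur in every post from getting weight 0
            tok: (cnt / total) * (math.log(n / doc_freq[tok]) + 1.0)
            for tok, cnt in counts.items()
        })
    return vectors


def importance(vector, top_k=5):
    """Assumed scoring rule: sum of the top_k keyword weights of a post."""
    return sum(sorted(vector.values(), reverse=True)[:top_k])


def cosine_distance(a, b):
    """1 - cosine similarity; a stand-in for the paper's semantic distance."""
    dot = sum(w * b.get(tok, 0.0) for tok, w in a.items())
    norm = (math.sqrt(sum(w * w for w in a.values()))
            * math.sqrt(sum(w * w for w in b.values())))
    return 1.0 - dot / norm if norm else 1.0


def summarize(posts, max_sentences=3, min_distance=0.3):
    """Greedily pick high-scoring posts, skipping near-duplicates of picked ones."""
    vectors = tfidf_vectors(posts)
    scores = [importance(v) for v in vectors]
    ranked = sorted(range(len(posts)), key=lambda i: -scores[i])
    selected = []
    for i in ranked:
        if len(selected) >= max_sentences:
            break
        if all(cosine_distance(vectors[i], vectors[j]) >= min_distance
               for j in selected):
            selected.append(i)
    return selected  # indices into posts, most important first
```

Calling `summarize(posts)` returns the indices of the selected posts. The greedy filter mirrors the abstract's description of dropping candidates that are semantically redundant and less important; a graph-based variant would instead treat posts as nodes weighted by importance and edges weighted by distance, and optimize the selection jointly.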
This article is indexed in CNKI and other databases.