Method for new sentiment word extraction based on boundary feature
Citation: ZHU Bo, HOU Min. Method for new sentiment word extraction based on boundary feature[J]. Journal of Chongqing University of Posts and Telecommunications (Natural Science Edition), 2014, 26(6): 786-802.
Authors: ZHU Bo, HOU Min
Institution: Broadcast Media Language Branch, National Language Resources Monitoring and Research Center, Communication University of China, Beijing 100024, P.R. China
Abstract: A sentiment lexicon is a basic resource for sentiment analysis and an important basis for opinion mining and sentiment polarity classification. With the rapid emergence of new words on the Internet, extracting new sentiment words has become a pressing problem. To address it, this paper proposes a method for extracting new sentiment words based on boundary features. The method uses the skip-gram model and existing sentiment words to mine the boundary features of sentiment words and build a boundary feature set. It then uses that set to extract a candidate set of new sentiment words, filters the candidates with bigram collocations, sequential patterns, and other methods, and scores each candidate string according to its frequency and the corpus distribution of the boundary features it co-occurs with. Experiments on a microblog corpus show that the precision of new sentiment word identification is positively correlated with the candidate score, reaching 83.33% when the candidate score is 11. The experiments demonstrate that the method can effectively identify new sentiment words in large-scale corpora.

Keywords: new sentiment words; boundary feature; skip-gram; sequential pattern
Received: 2014/7/11
Revised: 2014/10/9
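
The pipeline summarized in the abstract (mine boundary features from known sentiment words, extract candidates that appear between those features, then score them) can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the abstract does not say whether "skip-gram" means gapped n-grams or a neural embedding model, so the sketch reads it as gapped context windows. The function names, the `max_skip` and `min_count` parameters, and the scoring rule (frequency plus number of distinct boundary features) are all hypothetical, the paper's actual scoring formula is not given in the abstract, and the bigram and sequential-pattern filters are omitted.

```python
from collections import Counter, defaultdict


def mine_boundary_features(sentences, seed_words, max_skip=1, min_count=2):
    """Collect tokens seen to the left/right of seed sentiment words.

    Up to `max_skip` tokens may sit between a seed word and its boundary
    token (a gapped-context reading of "skip-gram"; an assumption).
    """
    left, right = Counter(), Counter()
    for tokens in sentences:
        for i, tok in enumerate(tokens):
            if tok not in seed_words:
                continue
            for d in range(1, max_skip + 2):
                if i - d >= 0:
                    left[tokens[i - d]] += 1
                if i + d < len(tokens):
                    right[tokens[i + d]] += 1
    left_feats = {t for t, n in left.items() if n >= min_count}
    right_feats = {t for t, n in right.items() if n >= min_count}
    return left_feats, right_feats


def extract_candidates(sentences, seed_words, left_feats, right_feats):
    """Find unknown tokens framed by a left and a right boundary feature."""
    stats = defaultdict(lambda: {"freq": 0, "features": set()})
    for tokens in sentences:
        for i in range(1, len(tokens) - 1):
            cand = tokens[i]
            if cand in seed_words:
                continue  # already in the lexicon, so not a new word
            l, r = tokens[i - 1], tokens[i + 1]
            if l in left_feats and r in right_feats:
                stats[cand]["freq"] += 1
                stats[cand]["features"].update({("L", l), ("R", r)})
    return stats


def score_candidates(stats):
    """Hypothetical score: frequency plus number of distinct boundary features."""
    return {c: s["freq"] + len(s["features"]) for c, s in stats.items()}


# Toy, pre-segmented corpus with placeholder tokens (not data from the paper).
corpus = [
    ["really", "good", "ah"],
    ["really", "bad", "ah"],
    ["really", "geili", "ah"],
    ["really", "geili", "ah"],
    ["the", "geili", "cat"],
]
seeds = {"good", "bad"}

left_feats, right_feats = mine_boundary_features(corpus, seeds)
stats = extract_candidates(corpus, seeds, left_feats, right_feats)
scores = score_candidates(stats)
```

In this toy run, `really` and `ah` become boundary features because they frame the seed words at least twice, and `geili` becomes the only candidate because it appears between them. A real system would work on segmented or character-level microblog text and would apply the filtering steps before scoring.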