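Method for Extracting Subjective Sentences from Chinese Microblogs Based on a Combination of Lexicon and Corpus
Cite this article: ZHU Hai-huan, YU Qing-song. Method for extracting subjective sentences from Chinese microblogs based on a combination of lexicon and corpus[J]. Journal of East China Normal University (Natural Science), 2014, 2014(4): 62-68, 87.
Authors: ZHU Hai-huan, YU Qing-song
Institution: Computer Center, East China Normal University, Shanghai 200062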
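Abstract: A method for extracting subjective sentences from Chinese microblogs is proposed that combines a lexicon with a corpus. A sentence is judged to be subjective if it contains sentiment expression text. First, sentiment words with a relatively fixed sentiment orientation are selected from existing sentiment lexicons to build a highly credible sentiment lexicon. This lexicon is used to extract sentiment expressions from sentences, which ensures the precision of sentiment expression extraction. Then the N-POSW model is proposed. Based on the 2-POSW model, the remaining sentiment expressions in a sentence are extracted fairly accurately through corpus learning, which ensures the recall of sentiment expression extraction. Experimental results show that, compared with the traditional method based on a large-scale sentiment lexicon, the proposed method improves the F-measure of subjective sentence extraction by 7%.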
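The decision rule described above (a sentence is subjective if it contains at least one sentiment expression, found first by a high-precision lexicon lookup) can be sketched as follows. This is a minimal illustration, not the authors' implementation. The toy lexicon `HIGH_CREDIBILITY_LEXICON`, the helper names, and the `fallback` hook are hypothetical. Input sentences are assumed to be already word-segmented. The corpus-trained 2-POSW stage is not defined in the abstract, so it is represented only as an optional callback.

```python
from typing import Callable, Iterable, List, Optional, Set

# Hypothetical toy lexicon of words assumed to have a fixed sentiment orientation.
# The paper's actual highly credible lexicon is not reproduced here.
HIGH_CREDIBILITY_LEXICON: Set[str] = {"开心", "讨厌", "喜欢", "失望"}


def lexicon_expressions(tokens: Iterable[str], lexicon: Set[str]) -> List[str]:
    """Stage 1 (precision-oriented): return tokens found in the lexicon."""
    return [t for t in tokens if t in lexicon]


def is_subjective(
    tokens: List[str],
    lexicon: Set[str] = HIGH_CREDIBILITY_LEXICON,
    fallback: Optional[Callable[[List[str]], List[str]]] = None,
) -> bool:
    """Treat a sentence as subjective if any sentiment expression is found.

    `fallback` is a hypothetical hook standing in for a recall-oriented
    second stage (the paper uses a corpus-trained 2-POSW model, not shown).
    """
    if lexicon_expressions(tokens, lexicon):
        return True
    if fallback is not None:
        return len(fallback(tokens)) > 0
    return False


print(is_subjective(["今天", "很", "开心"]))          # True
print(is_subjective(["会议", "下午", "三点", "开始"]))  # False
```

The lexicon lookup alone misses expressions that are absent from the lexicon, so it favors precision. The callback is where a learned, recall-oriented extractor would plug in.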

Keywords: sentiment lexicon; highly credible sentiment lexicon; N-POSW model; subjective sentence
Received: 2013-07-01

Study on the extraction of Chinese microblog subjective sentences based on lexicon and corpus
ZHU Hai-huan, YU Qing-song. Study on the extraction of Chinese microblog subjective sentences based on lexicon and corpus[J]. Journal of East China Normal University (Natural Science), 2014, 2014(4): 62-68, 87.
Authors: ZHU Hai-huan, YU Qing-song
Institution:Computer Center, East China Normal University, Shanghai 200062, China
Abstract: In this paper, we propose a method for extracting subjective sentences from Chinese microblogs that combines a lexicon with a corpus. A sentence is classified as subjective or objective according to whether it contains sentiment expressions. First, a highly credible sentiment lexicon is built from words in existing sentiment dictionaries whose sentiment orientation is fixed. Using this lexicon, sentiment expressions can be extracted with high precision. Then, an N-POSW model is proposed for corpus-based learning, and the 2-POSW model is used to extract the remaining sentiment expressions in the sentence, which ensures overall recall. Experimental results show that the F-measure of the proposed method is 7% higher than that of the traditional method based on a large-scale sentiment lexicon.
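The comparison above is stated in terms of the F-measure, which balances the precision-oriented lexicon stage against the recall-oriented corpus stage. The sketch below computes precision, recall and F-measure for the "subjective" class from per-sentence predictions. It assumes the balanced F1 (the harmonic mean of precision and recall), since the abstract does not specify the variant, nor whether the 7% gain is in absolute points or relative. The function name is illustrative.

```python
from typing import Sequence, Tuple


def precision_recall_f1(
    predicted: Sequence[bool], gold: Sequence[bool]
) -> Tuple[float, float, float]:
    """Precision, recall and balanced F1 for the 'subjective' (True) class."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    tp = sum(1 for p, g in zip(predicted, gold) if p and g)      # correctly flagged subjective
    fp = sum(1 for p, g in zip(predicted, gold) if p and not g)  # objective flagged as subjective
    fn = sum(1 for p, g in zip(predicted, gold) if not p and g)  # subjective sentences missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


gold = [True, True, True, True, False, False]
pred = [True, True, False, False, False, True]
print(precision_recall_f1(gold=gold, predicted=pred))
```

In this toy example there are 2 true positives, 1 false positive and 2 false negatives. That gives precision 2/3, recall 1/2 and F1 = 4/7.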
Keywords: sentiment lexicon; highly credible sentiment lexicon; N-POSW model; subjective sentence
This article is indexed in CNKI and other databases.