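Chinese Word Sense Disambiguation Based on Hidden Feature Extraction and CRF Model
Cite this article: HUANG Ying, CHEN Xiao-rong. Chinese word sense disambiguation based on hidden feature extraction and CRF model[J]. Journal of Guizhou University (Natural Science), 2013(6): 91-95.
Authors: HUANG Ying, CHEN Xiao-rong
Affiliation: College of Computer Science and Technology, Guizhou University, Guiyang, Guizhou 550025
Funding: National Natural Science Foundation of China (61363066)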
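Abstract: Traditional Chinese word sense disambiguation (WSD) methods build the disambiguation model from explicit features of the text, such as the surrounding context and part of speech. An in-depth analysis of the causes of ambiguity shows that implicit grammatical structure and semantic information between words also give rise to ambiguity, so this information can be added to the disambiguation model. The HowNet knowledge base summarizes collocation information between words. This paper therefore uses HowNet to extract implicit semantic features from the word collocations found in the training corpus. It combines these with explicit context features and performs word sense disambiguation with a conditional random field (CRF). Experiments show that the proposed method achieves higher disambiguation accuracy than conventional CRF-based disambiguation.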

Keywords: conditional random field; word sense disambiguation; machine learning
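The abstract does not spell out the paper's feature templates. The sketch below is a minimal illustration, assuming a ±2-word window. It shows how explicit features (neighbouring words and POS tags) and a hidden semantic feature (the semantic class of each neighbour) could be combined into one feature dict per token. `SEMANTIC_CLASS`, `token_features`, and `sentence_features` are hypothetical names. The table is a toy stand-in for HowNet lookups, not real HowNet data. The intuition is that in 打电话 ("make a phone call") versus 打毛衣 ("knit a sweater"), the class of the collocating object (tool vs. clothing) points to the sense of 打.

```python
from typing import Dict, List

# Hypothetical stand-in for HowNet: maps a word to a coarse semantic class.
# Real HowNet entries use sememe-based definitions; these labels are illustrative only.
SEMANTIC_CLASS: Dict[str, str] = {
    "打": "act",
    "电话": "tool",
    "毛衣": "clothing",
    "篮球": "sport",
}


def token_features(tokens: List[str], pos_tags: List[str], i: int,
                   window: int = 2) -> Dict[str, str]:
    """Build a feature dict for position i combining explicit and hidden features."""
    feats: Dict[str, str] = {"w0": tokens[i], "p0": pos_tags[i]}
    for offset in range(-window, window + 1):
        if offset == 0:
            continue
        j = i + offset
        key = f"{offset:+d}"  # e.g. "-2", "-1", "+1", "+2"
        if 0 <= j < len(tokens):
            # Explicit context features: neighbouring word and its POS tag.
            feats["w" + key] = tokens[j]
            feats["p" + key] = pos_tags[j]
            # Hidden semantic feature: the neighbour's semantic class, if known.
            cls = SEMANTIC_CLASS.get(tokens[j])
            if cls is not None:
                feats["s" + key] = cls
        else:
            # Positions outside the sentence get a padding marker.
            feats["w" + key] = "<PAD>"
    return feats


def sentence_features(tokens: List[str], pos_tags: List[str]) -> List[Dict[str, str]]:
    """One feature dict per token, i.e. one CRF observation sequence."""
    return [token_features(tokens, pos_tags, i) for i in range(len(tokens))]
```

A list of such per-sentence feature lists is the input shape accepted by dict-based CRF toolkits such as sklearn-crfsuite. For 他/r 打/v 电话/n, the dict for 打 includes `"s+1": "tool"` alongside the plain word and POS features.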

Chinese Word Sense Disambiguation Based on Hidden Feature Extraction and CRF Model
HUANG Ying*, CHEN Xiao-rong. Chinese Word Sense Disambiguation Based on Hidden Feature Extraction and CRF Model[J]. Journal of Guizhou University (Natural Science), 2013(6): 91-95.
Authors: HUANG Ying*, CHEN Xiao-rong
Institution: College of Computer Science and Technology, Guizhou University, Guiyang 550025, China
Abstract: Traditional methods of Chinese word sense disambiguation build the disambiguation model from explicit features, such as context information and part of speech. An in-depth analysis of how ambiguity arises showed that the grammatical structure and semantic information implicit between words also cause ambiguity, so this information can be incorporated into the disambiguation model. HowNet summarizes collocation information between words, so we used it to extract hidden semantic features from the collocations obtained from the training corpus. We then combined these features with explicit context features and performed word sense disambiguation using a conditional random field (CRF). Experiments showed that this method improves word sense disambiguation accuracy compared with a traditional CRF.
Keywords: CRF; word sense disambiguation; machine learning
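The abstract names a CRF but not its exact structure. A common setup, assumed here, is a linear-chain CRF. In that model, the score of a label sequence is the sum of weights of the active (feature, label) pairs plus label-to-label transition weights, and the best sequence is found by Viterbi decoding. The sketch below shows that decoding step only. `emission_scores`, `viterbi`, and the "key=value" feature-string convention are hypothetical. In practice the weights would be learned from the training corpus, whereas here they are supplied by hand.

```python
from typing import Dict, List, Tuple


def emission_scores(feats: Dict[str, str], labels: List[str],
                    weights: Dict[Tuple[str, str], float]) -> List[float]:
    """Score each label at one position as the sum of weights of its active features."""
    return [sum(weights.get((f"{k}={v}", y), 0.0) for k, v in feats.items())
            for y in labels]


def viterbi(emissions: List[List[float]],
            transitions: List[List[float]]) -> Tuple[List[int], float]:
    """Return the highest-scoring label index sequence and its score.

    emissions[t][y]: score of label y at position t.
    transitions[a][b]: score of moving from label a to label b.
    """
    n_labels = len(emissions[0])
    best = list(emissions[0])  # best score of any path ending in each label
    backptrs: List[List[int]] = []
    for t in range(1, len(emissions)):
        new_best, ptrs = [], []
        for y in range(n_labels):
            # Best predecessor label for label y at position t.
            prev = max(range(n_labels), key=lambda a: best[a] + transitions[a][y])
            new_best.append(best[prev] + transitions[prev][y] + emissions[t][y])
            ptrs.append(prev)
        best = new_best
        backptrs.append(ptrs)
    # Trace back from the best final label.
    last = max(range(n_labels), key=lambda y: best[y])
    path = [last]
    for ptrs in reversed(backptrs):
        path.append(ptrs[path[-1]])
    path.reverse()
    return path, best[last]
```

Here is a toy case with two hypothetical senses of 打, `call` and `knit`. A learned weight on the hidden feature `s+1=tool` can tip the emission score toward `call`, even when the word-identity feature `w0=打` is equally weighted for both senses.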
This article is indexed in VIP (维普) and other databases.