Named Entity Recognition in Chinese Micro-blogs (中文微博命名体识别)
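Cite this article (Chinese): 韩春燕, 刘玉娇, 琚生根, 李若晨, 苏翀. 中文微博命名体识别 [Named entity recognition in Chinese micro-blogs][J]. 四川大学学报(自然科学版) [Journal of Sichuan University (Natural Science Edition)], 2015, 52(3): 511-516.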
Authors: 韩春燕 (HAN Chun-Yan), 刘玉娇 (LIU Yu-Jiao), 琚生根 (JU Sheng-Gen), 李若晨 (LI Ruo-Chen), 苏翀 (SU Chong)
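Affiliations: Department of Computer Science, Sichuan University for Nationalities (HAN Chun-Yan); College of Computer Science, Sichuan University (LIU Yu-Jiao, JU Sheng-Gen, LI Ruo-Chen, SU Chong)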
Funding: National Natural Science Foundation of China (61332066, 81373239)
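Abstract: In recent years, the rapid growth of micro-blogs has given named entity recognition (NER) a new medium, while the characteristics of micro-blog text have also posed new challenges for NER research. To address these characteristics, this paper proposes normalizing micro-blog text with a clustering algorithm based on pinyin similarity distance and text similarity distance, which removes the interference caused by nonstandard language use in micro-blogs. The paper also proposes feature extraction at three levels of granularity (document, sentence, and word), trains a conditional random field (CRF) model on these features to recognize named entities, and corrects the predicted entity types using entity-relation classes obtained by clustering similar micro-blog texts. Because large amounts of labeled micro-blog training data are unavailable, the model is trained within a semi-supervised learning framework. Experiments on Sina Weibo data show that the method effectively improves NER performance on micro-blogs.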
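The abstract names document-, sentence- and word-level features but does not list them. The Python sketch below shows one hypothetical way to organize such a three-level feature template for a CRF tagger. Every feature name (`w.*`, `s.*`, `d.*`), the use of single characters as tokens in the example, and the helpers `token_features`/`post_features` are illustrative assumptions, not the authors' feature set.

```python
from typing import Dict, List, Union

Feature = Union[str, bool, int, float]


def token_features(doc: List[List[str]], s: int, i: int) -> Dict[str, Feature]:
    """Hypothetical features for token i of sentence s.

    `doc` is one micro-blog post split into sentences, each a list of tokens
    (characters or segmented words). All feature choices are assumptions.
    """
    sent = doc[s]
    tok = sent[i]
    return {
        # Word level: the token itself and its immediate neighbours.
        "w.token": tok,
        "w.prev": sent[i - 1] if i > 0 else "<BOS>",
        "w.next": sent[i + 1] if i + 1 < len(sent) else "<EOS>",
        "w.is_digit": tok.isdigit(),
        # Sentence level: relative position and micro-blog markers in this sentence.
        "s.position": i / max(len(sent) - 1, 1),
        "s.has_mention": any("@" in t for t in sent),
        "s.has_hashtag": any("#" in t for t in sent),
        # Document level: statistics over the whole post.
        "d.token_count": sum(other.count(tok) for other in doc),
        "d.num_sentences": len(doc),
    }


def post_features(doc: List[List[str]]) -> List[List[Dict[str, Feature]]]:
    """One feature dict per token, one list per sentence."""
    return [[token_features(doc, s, i) for i in range(len(sent))]
            for s, sent in enumerate(doc)]


# Example post with character tokens: two sentences.
post = [list("@小明 我在成都"), list("#春节# 快乐")]
features = post_features(post)
```

The result is a list of feature dictionaries per sentence, the shape that CRF toolkits such as sklearn-crfsuite accept as training input; pairing each sentence with a per-token label sequence (for example BIO tags) would complete a training example.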

Keywords: micro-blog; conditional random field; named entity; three-level granularity features; short text
Received: 1/4/2015

Named entity recognition in Chinese micro blog
HAN Chun-Yan;LIU Yu-Jiao;JU Sheng-Gen;LI Ruo-Chen;SU Chong.Named entity recognition in Chinese micro blog[J].Journal of Sichuan University (Natural Science Edition),2015,52(3):511-516.
Authors:HAN Chun-Yan;LIU Yu-Jiao;JU Sheng-Gen;LI Ruo-Chen;SU Chong
Institution: College of Computer Science, Sichuan University for Nationalities; College of Computer, Sichuan University; College of Computer, Sichuan University; College of Computer, Sichuan University; College of Computer, Sichuan University
Abstract: In recent years, the rapid development of micro-blogs has provided named entity recognition (NER) with a new medium, while the characteristics of micro-blogs have also brought new challenges for NER research. To address these characteristics, this paper proposed a method based on pinyin similarity distance and text similarity distance to normalize micro-blog text, eliminating the interference caused by nonstandard expressions. The paper also proposed feature extraction at three levels and applied a conditional random field (CRF) model to train on the data and identify named entities. In addition, a simple method based on clustering similar micro-blog texts was employed to correct the NER results. Because training data were scarce, the paper built a semi-supervised learning framework to train the model. Experimental results on Sina Weibo data showed that this approach effectively improves named entity recognition.
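The abstract does not define the two distances or the clustering procedure. The following is a minimal sketch under stated assumptions: both distances are Levenshtein distances normalized by the longer string (one over pinyin spellings, one over characters), single-link clustering merges two terms when either distance is within a threshold (0.2 and 0.25 here, both arbitrary), and each cluster is normalized to its most frequent member. The toy `PINYIN` table, the slang pairs, the frequency counts and the function names are all hypothetical.

```python
from typing import Dict, List, Sequence

# Hypothetical character-to-pinyin table (tones omitted) covering only the demo
# terms; a real system would use a full pinyin dictionary.
PINYIN: Dict[str, str] = {
    "同": "tong", "学": "xue", "童": "tong", "鞋": "xie",
    "什": "shen", "么": "me", "神": "shen", "马": "ma",
    "姑": "gu", "娘": "niang", "菇": "gu", "凉": "liang",
}


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance, computed one DP row at a time."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,               # delete x
                           cur[j - 1] + 1,            # insert y
                           prev[j - 1] + (x != y)))   # substitute (free if equal)
        prev = cur
    return prev[-1]


def norm_dist(a: Sequence[str], b: Sequence[str]) -> float:
    """Edit distance scaled to [0, 1] by the longer input."""
    return edit_distance(a, b) / max(len(a), len(b), 1)


def to_pinyin(word: str) -> str:
    # Characters missing from the table are kept unchanged.
    return "".join(PINYIN.get(ch, ch) for ch in word)


def pinyin_distance(a: str, b: str) -> float:
    return norm_dist(to_pinyin(a), to_pinyin(b))


def text_distance(a: str, b: str) -> float:
    return norm_dist(a, b)


def build_normalization_map(freq: Dict[str, int], pinyin_thr: float = 0.2,
                            text_thr: float = 0.25) -> Dict[str, str]:
    """Single-link clustering (union-find): two terms join the same cluster if
    either distance is within its threshold. Each term maps to the most
    frequent term of its cluster."""
    parent = {w: w for w in freq}

    def find(w: str) -> str:
        while parent[w] != w:
            parent[w] = parent[parent[w]]  # path halving
            w = parent[w]
        return w

    words = list(freq)
    for i, a in enumerate(words):
        for b in words[i + 1:]:
            if pinyin_distance(a, b) <= pinyin_thr or text_distance(a, b) <= text_thr:
                parent[find(a)] = find(b)

    clusters: Dict[str, List[str]] = {}
    for w in words:
        clusters.setdefault(find(w), []).append(w)
    mapping: Dict[str, str] = {}
    for members in clusters.values():
        canonical = max(members, key=lambda m: freq[m])
        for m in members:
            mapping[m] = canonical
    return mapping


# Hypothetical term frequencies from a micro-blog corpus.
freq = {"同学": 50, "童鞋": 10, "什么": 80, "神马": 20,
        "姑娘": 30, "菇凉": 5, "四川大学": 40, "四川大學": 2}
mapping = build_normalization_map(freq)
```

Pinyin distance catches homophone-style variants (童鞋 for 同学) whose characters share nothing, while text distance catches variants written with overlapping characters (四川大學 for 四川大学), which is one plausible reason to combine the two measures.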
Keywords: Micro blog; Conditional random fields; Named entity; Three-level features; Short text
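The abstract states that entity types are corrected using results obtained by clustering similar micro-blog texts, without detailing either step. As a deliberately simpler stand-in rather than the authors' method, the sketch below groups posts by single-pass clustering on character-bigram Jaccard similarity (the 0.3 threshold is an assumption) and then gives every mention of an entity string the type it was predicted as most often within its cluster. The function names and the example posts and predictions are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Set, Tuple


def bigrams(text: str) -> Set[str]:
    """Character bigrams; a text shorter than two characters is its own feature."""
    return {text[i:i + 2] for i in range(len(text) - 1)} or {text}


def jaccard(a: Set[str], b: Set[str]) -> float:
    return len(a & b) / len(a | b) if a | b else 1.0


def cluster_posts(posts: List[str], threshold: float = 0.3) -> List[int]:
    """Single-pass clustering: a post joins the first cluster whose seed post
    is similar enough, otherwise it seeds a new cluster."""
    seeds: List[Set[str]] = []
    labels: List[int] = []
    for p in posts:
        grams = bigrams(p)
        for cid, seed in enumerate(seeds):
            if jaccard(grams, seed) >= threshold:
                labels.append(cid)
                break
        else:
            seeds.append(grams)
            labels.append(len(seeds) - 1)
    return labels


def correct_types(predictions: List[List[Tuple[str, str]]],
                  labels: List[int]) -> List[List[Tuple[str, str]]]:
    """predictions[k] holds (entity_text, type) pairs predicted for post k.
    Within each cluster, relabel every mention of an entity string with the
    type it received most often in that cluster."""
    votes: Dict[Tuple[int, str], Counter] = defaultdict(Counter)
    for cid, ents in zip(labels, predictions):
        for text, etype in ents:
            votes[(cid, text)][etype] += 1
    return [[(text, votes[(cid, text)].most_common(1)[0][0]) for text, _ in ents]
            for cid, ents in zip(labels, predictions)]


posts = ["成都大熊猫基地今天开放", "今天成都大熊猫基地开放了",
         "成都大熊猫基地开放啦", "苹果发布新款手机"]
predictions = [[("成都大熊猫基地", "LOC")], [("成都大熊猫基地", "ORG")],
               [("成都大熊猫基地", "LOC")], [("苹果", "ORG")]]
labels = cluster_posts(posts)
corrected = correct_types(predictions, labels)
```

Majority voting only helps when a cluster contains several mentions of the same string; on a tie, `Counter.most_common` keeps the type that was counted first.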
This article is indexed in databases including CNKI and Wanfang Data.