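Research on Methods for Identifying New Words in Chinese
Citation: WANG Qian-qian, FAN Tong-rang. Research on methods for identifying new words in Chinese [J]. Journal of the Hebei Academy of Sciences, 2014(2): 35-40.
Authors: WANG Qian-qian, FAN Tong-rang
Institution: School of Information Science and Technology, Shijiazhuang Tiedao University
Abstract: With the rapid development of the Internet and of society, new words keep emerging. Identifying and organizing these new words is an important research topic in Chinese information processing. This paper proposes a new-word identification method that extracts candidate strings as repeated strings found with a PAT-Array, which improves the recall of new words. On this basis, it analyzes the internal patterns of new words and adds a garbage-string filtering mechanism: single-character strings are filtered mainly with a garbage dictionary, while new words with multi-character patterns are confirmed by combining an improved mutual information measure with the independent word-formation probability. As a result, the precision of new-word identification improves substantially.

Keywords: new words; PAT-Array; mutual information; garbage string filtering; internal pattern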
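The abstract names the PAT-Array as the tool for extracting repeated strings as candidates but gives no implementation details. A PAT-Array is essentially a suffix array, so the Python sketch below shows one way to extract repeated candidates with a suffix array plus an LCP (longest common prefix) array. It rests on assumptions that are not from the paper: a naive O(n² log n) suffix-array construction, candidate lengths of 2 to 4 characters, a minimum frequency of 2, and a restriction to CJK ideographs so that strings spanning punctuation are dropped. All function names and parameters are hypothetical.

```python
def is_cjk(s: str) -> bool:
    """True if every character is in the CJK Unified Ideographs block (U+4E00..U+9FFF)."""
    return all("\u4e00" <= ch <= "\u9fff" for ch in s)


def build_suffix_array(text: str) -> list[int]:
    """Naive PAT-Array (suffix array): suffix start positions sorted by suffix text."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def build_lcp(text: str, sa: list[int]) -> list[int]:
    """lcp[k] = length of the common prefix of suffixes sa[k-1] and sa[k]; lcp[0] = 0."""
    lcp = [0] * len(sa)
    for k in range(1, len(sa)):
        a, b, n = sa[k - 1], sa[k], 0
        while a + n < len(text) and b + n < len(text) and text[a + n] == text[b + n]:
            n += 1
        lcp[k] = n
    return lcp


def repeated_strings(text: str, min_len: int = 2, max_len: int = 4,
                     min_freq: int = 2) -> dict[str, int]:
    """Return every CJK-only substring of length min_len..max_len seen >= min_freq times."""
    sa = build_suffix_array(text)
    lcp = build_lcp(text, sa)
    counts: dict[str, int] = {}
    for length in range(min_len, max_len + 1):
        k = 1
        while k < len(sa):
            if lcp[k] < length:
                k += 1
                continue
            # Suffixes sa[start..k-1] form a maximal run sharing a prefix of this length,
            # so that prefix occurs exactly (k - start) times in the text.
            start = k - 1
            while k < len(sa) and lcp[k] >= length:
                k += 1
            candidate = text[sa[start]:sa[start] + length]
            if k - start >= min_freq and is_cjk(candidate):
                counts[candidate] = k - start
    return counts
```

For example, `repeated_strings("网红直播，网红带货，网红")` returns `{"网红": 3}`; the repeated strings "，网" and "，网红" are found as well but discarded by the CJK check. All suffixes that share a prefix sit next to each other in a suffix array, which is why a single pass over the LCP array per length is enough to count every repeat.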

Research of Chinese new word identification method
WANG Qian-qian,FAN Tong-rang.Research of Chinese new word identification method[J].Journal of The Hebei Academy of Sciences,2014(2):35-40.
Authors:WANG Qian-qian  FAN Tong-rang
Institution: (School of Information Science and Technology, Shijiazhuang Tiedao University, Shijiazhuang, Hebei 050043, China)
Abstract: With the rapid development of the Internet and society, new words keep emerging. Identifying and organizing these new words is an important research topic in Chinese information processing. This paper presents a new-word recognition method that uses repeated strings extracted with a PAT-Array as candidate strings, improving the recall of new words. Building on this method, the paper analyzes the internal patterns of new words and adds a garbage-string filtering mechanism: a garbage dictionary filters single-character strings, and an improved mutual information measure combined with the independent word probability determines multi-character new words. Together these measures significantly improve the accuracy of new-word recognition.
Keywords: new words; PAT-Array; mutual information; garbage string filter; internal model
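The filtering stage described in the abstract can be sketched in the same hedged spirit. The abstract does not give the garbage dictionary, the "improved" mutual information formula, or the exact definition of the independent word probability, so the Python code below substitutes assumptions of its own: a small hypothetical set of garbage characters that may not begin or end a new word, the minimum pointwise mutual information over all binary splits of a candidate, and a character-level independent word probability IWP(c) = (occurrences of c as a single-character word) / (all occurrences of c), computed from a segmented reference corpus and multiplied across the candidate's characters. The decision rule and thresholds are illustrative only.

```python
import math
from collections import Counter

# Hypothetical garbage dictionary (an assumption, not the paper's list):
# characters assumed never to begin or end a new word.
GARBAGE_CHARS = set("的了是在和与也就都着")


def passes_garbage_filter(word: str) -> bool:
    """Reject candidates whose first or last character is in the garbage dictionary."""
    return word[0] not in GARBAGE_CHARS and word[-1] not in GARBAGE_CHARS


def ngram_counts(text: str, max_len: int = 4) -> Counter:
    """Count every substring of length 1..max_len in the raw text."""
    return Counter(text[i:i + n] for n in range(1, max_len + 1)
                   for i in range(len(text) - n + 1))


def min_mutual_information(word: str, freq: Counter, total: int) -> float:
    """Pointwise MI for each split word = left + right (len(word) >= 2); keep the weakest."""
    p_word = freq[word] / total
    return min(math.log2(p_word / ((freq[word[:k]] / total) * (freq[word[k:]] / total)))
               for k in range(1, len(word)))


def char_iwp_table(segmented_corpus: list[str]) -> dict[str, float]:
    """IWP(c) = occurrences of c as a single-character word / all occurrences of c."""
    total, alone = Counter(), Counter()
    for w in segmented_corpus:
        total.update(w)  # counts each character of w
        if len(w) == 1:
            alone[w] += 1
    return {ch: alone[ch] / total[ch] for ch in total}


def word_iwp(word: str, table: dict[str, float], default: float = 0.5) -> float:
    """Product of per-character IWP; unseen characters get a hypothetical default."""
    p = 1.0
    for ch in word:
        p *= table.get(ch, default)
    return p


def is_new_word(word: str, freq: Counter, total: int, table: dict[str, float],
                mi_threshold: float = 1.0, iwp_threshold: float = 0.3) -> bool:
    """Hypothetical rule: pass the garbage filter, cohere internally (high MI),
    and consist of characters that rarely stand alone as words (low IWP)."""
    return (passes_garbage_filter(word)
            and min_mutual_information(word, freq, total) >= mi_threshold
            and word_iwp(word, table) <= iwp_threshold)
```

With the 12-character text "网红直播，网红带货，网红", the candidate 网红 has a minimum mutual information of log2(0.25 / (0.25 × 0.25)) = 2.0. With the toy segmented corpus `["网", "网络", "红色", "红", "的"]`, IWP(网红) = 0.5 × 0.5 = 0.25, so `is_new_word` accepts it under the default thresholds, while a candidate such as 的网 is rejected by the garbage filter.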
This article is indexed in CNKI, VIP (Weipu), and other databases.