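Building of N-gram-based candidate dictionary
Cite this article: LI Qun. Building of N-gram-based candidate dictionary[J]. Journal of Bohai University (Natural Science Edition), 2005, 26(2): 134-136.
Author: LI Qun
Affiliation: Computing Center, Anshan Normal University, Anshan, Liaoning 114005, China
Abstract: With the growth of the Internet, new words are being coined and spread online faster than ever before, and the automatic recognition of new words has long been an active research topic in Chinese information processing. This paper studies techniques for automatically recognizing new words that appear online. Working from a roughly processed corpus, it applies a decomposition strategy that divides the construction of the N-gram candidate dictionary into several stages: preprocessing, bigram candidate fields, trigram candidate fields, and four-gram candidate fields. This decomposition reduces the overall processing difficulty. The paper proposes a new-word recognition technique that combines rule-based removal of noise strings with word formation.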

Keywords: Internet words; N-gram; automatic new-word recognition
Article ID: 1673-0569(2005)02-0134-03
Revised: April 18, 2005

Building of N-gram-based candidate dictionary
LI Qun. Building of N-gram-based candidate dictionary[J]. Journal of Bohai University: Natural Science Edition, 2005, 26(2): 134-136.
Authors: LI Qun
Abstract: With the development of the Internet, large numbers of new words are emerging online at great speed, so automatic recognition of new words is required. New-word recognition has become a hotspot of Chinese information processing. In this article, techniques for automatic recognition of new words are studied. Based on a preprocessed corpus, a decomposition strategy is applied to the formation of the N-gram candidate dictionary, which is divided into four phases: preprocessing, bigram candidate fields, trigram candidate fields, and four-gram candidate fields. This reduces the difficulty of the whole procedure. A new-word recognition technique, based on a combination of a rule-based elimination method and a word-building method, is proposed, and a word-building rule base for new words is established.
Keywords: Internet word; N-gram; automatic new-word recognition
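
The staged construction described in the abstract (preprocessing, then bigram, trigram, and four-gram candidates, with rule-based noise removal) can be illustrated with a minimal sketch. This is not the author's implementation: the function names, the frequency threshold min_count, and the NOISE_CHARS rule are all assumptions made for illustration, since the record does not give the paper's actual rule base.

```python
import re
from collections import Counter

# Hypothetical noise rule (an assumption, not from the paper): discard any
# candidate that starts or ends with one of these common function characters.
NOISE_CHARS = set("的了是在和")


def preprocess(text):
    """Split raw text into runs of Chinese characters.

    Punctuation, digits, and Latin letters act as segment boundaries.
    """
    return re.findall(r"[\u4e00-\u9fff]+", text)


def ngram_candidates(segments, n, min_count=2):
    """Count character n-grams inside each segment and keep the frequent ones
    that pass the noise rule."""
    counts = Counter(
        seg[i:i + n]
        for seg in segments
        for i in range(len(seg) - n + 1)
    )
    return {
        gram: count
        for gram, count in counts.items()
        if count >= min_count
        and gram[0] not in NOISE_CHARS
        and gram[-1] not in NOISE_CHARS
    }


def build_candidate_dictionary(text, min_count=2):
    """Run the stages in order: preprocessing, then 2-, 3-, and 4-gram passes."""
    segments = preprocess(text)
    return {n: ngram_candidates(segments, n, min_count) for n in (2, 3, 4)}


if __name__ == "__main__":
    sample = "博客的文章,博客很火。网民写博客"
    for n, candidates in build_candidate_dictionary(sample).items():
        print(n, candidates)
```

Handling each n in its own pass keeps every stage small and independently checkable, which is one plausible reading of how the decomposition reduces overall difficulty. A real system would replace the single noise rule with a fuller rule base and add the word-formation checks the abstract mentions.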
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.