Building of N-gram-based candidate dictionary


Cite this article:	LI Qun. Building of N-gram-based candidate dictionary[J]. Journal of Bohai University (Natural Science Edition), 2005, 26(2): 134-136.

Author:	LI Qun

Affiliation:	Computing Center, Anshan Normal University, Anshan, Liaoning 114005

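Abstract:	With the growth of the Internet, new words are being coined and spreading online faster than at any earlier time, and automatic recognition of new words has long been an active research topic in Chinese information processing. This paper studies techniques for automatically recognizing new words found online. Working from a lightly processed corpus, it uses a decomposition strategy that splits the construction of the N-gram candidate dictionary into several stages (preprocessing, 2-gram candidate fields, 3-gram candidate fields, and 4-gram candidate fields), which lowers the overall processing difficulty. It proposes a new-word recognition technique that combines rule-based removal of noise strings with word-formation rules.
Keywords:	Internet words; N-gram; automatic recognition of new words
Article ID:	1673-0569(2005)02-0134-03
Revision date:	April 18, 2005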
Building of N-gram-based candidate dictionary

LI Qun. Building of N-gram-based candidate dictionary[J]. Journal of Bohai University: Natural Science Edition, 2005, 26(2): 134-136.

Authors:	LI Qun

Abstract:	With the development of the Internet, large numbers of new words are emerging online at great speed, so automatic recognition of new words is required. Automatic new-word recognition has become an active topic in Chinese information processing. This article studies the technology of automatic new-word recognition. Starting from a preprocessed corpus, a decomposition strategy is applied to the formation of the N-gram candidate dictionary, which is divided into four phases (preprocessing, followed by 2-gram, 3-gram, and 4-gram candidate fields), reducing the difficulty of the whole procedure. A new-word recognition technique that combines rule-based elimination of noise strings with word-building rules is proposed, and a word-building rule base for new words is established.

Keywords:	Internet words; N-gram; automatic recognition of new words
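
The staged construction described in the abstract (preprocessing, then 2-gram, 3-gram, and 4-gram candidates, with rule-based removal of noise strings) can be illustrated with a minimal Python sketch. The abstract does not give the paper's actual rules or thresholds, so everything specific here is an assumption made for illustration: the function names, the `min_freq` cutoff, the `NOISE_CHARS` set, and the rule that a candidate is noise when it starts or ends with a common function character.

```python
import re
from collections import Counter

# Hypothetical noise rule (not from the paper): function characters that
# rarely begin or end a real word.
NOISE_CHARS = set("的了是在和")


def preprocess(text):
    """Split raw text into runs of CJK characters, dropping punctuation,
    digits, and Latin text."""
    return re.findall(r"[\u4e00-\u9fff]+", text)


def ngram_counts(segments, n):
    """Count every contiguous n-character string inside each segment."""
    counts = Counter()
    for seg in segments:
        for i in range(len(seg) - n + 1):
            counts[seg[i:i + n]] += 1
    return counts


def is_noise(candidate):
    """Assumed rule: reject strings that start or end with a function character."""
    return candidate[0] in NOISE_CHARS or candidate[-1] in NOISE_CHARS


def build_candidates(text, min_freq=2):
    """Build one candidate table per n (2, 3, 4), handling each size as its
    own stage after a shared preprocessing pass."""
    segments = preprocess(text)
    candidates = {}
    for n in (2, 3, 4):
        counts = ngram_counts(segments, n)
        candidates[n] = {
            s: c for s, c in counts.items()
            if c >= min_freq and not is_noise(s)
        }
    return candidates


if __name__ == "__main__":
    sample = "博客很火，博客的用户很多。我写博客。"
    for n, table in build_candidates(sample).items():
        print(n, table)
```

Keeping a separate table for each n mirrors the decomposition the abstract describes: each stage can be inspected and filtered on its own before any word-building rules are applied.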
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.