New word identification based on large-scale corpus


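Cite this article:	SHI Shui-cai, YU Hong-kui, LÜ Xue-qiang, LI Yu-qin. New word identification based on large-scale corpus[J]. Journal of Shandong University (Natural Science), 2006, 41(3): 42-45.

Authors:	SHI Shui-cai; YU Hong-kui; LÜ Xue-qiang; LI Yu-qin

Affiliation:	Chinese Information Processing and Research Center, Beijing Information Science & Technology University, Beijing 100101

Funding:	National High Technology Research and Development Program of China (863 Program); Science and Technology Development Program of the Beijing Municipal Education Commission; Joint Construction Project of the Beijing Municipal Education Commission

Abstract:	Based on the distinct features of new words, a complete method for detecting them automatically is proposed. Large-scale statistical analysis is used to build separate dictionaries of characters, words, and N-grams, from which candidate new words are detected automatically. The candidates are then filtered further according to word-formation rules, and the new words in the corpus are finally extracted. A system implemented according to this scheme can extract new words with no restriction on length or domain.
Keywords:	new words; catchwords; corpus
Article ID:	1671-9352(2006)03-0089-03
Received:	2006-03-29
Revised:	2006-03-29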
New word identification based on large-scale corpus

SHI Shui-cai, YU Hong-kui, LÜ Xue-qiang, LI Yu-qin. New word identification based on large-scale corpus[J]. Journal of Shandong University, 2006, 41(3): 42-45.

Authors:	SHI Shui-cai; YU Hong-kui; LÜ Xue-qiang; LI Yu-qin

Institution:	Chinese Information Processing and Research Center, Beijing Information Science & Technology University

Abstract:	String frequency statistics, substring reduction, and several filtering methods are applied in a Chinese new-word mining system that identifies new words using character, word, and N-gram dictionaries built from statistics over a large-scale corpus. With a system based on these methods, new words can be identified without limits on length or domain.

Keywords:	new word; catchword; corpus
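
The abstract names three stages: string frequency statistics, substring reduction, and filtering. The following is a minimal Python sketch of how such a pipeline could fit together, not the authors' implementation. All function names, the frequency threshold `min_freq`, the N-gram length limit, and the list of edge characters used as a stand-in for word-formation rules are assumptions made for illustration. Substring reduction is taken here to mean dropping a string when a longer string containing it occurs equally often, which is one common reading of the term.

```python
from collections import Counter


def ngram_frequencies(sentences, min_n=2, max_n=4):
    """Count every character n-gram of length min_n..max_n (frequency statistics)."""
    counts = Counter()
    for s in sentences:
        for n in range(min_n, max_n + 1):
            for i in range(len(s) - n + 1):
                counts[s[i:i + n]] += 1
    return counts


def reduce_substrings(counts, min_freq=2):
    """Keep frequent n-grams, dropping any that a longer n-gram fully explains.

    An n-gram is dropped when some longer frequent n-gram contains it and has
    the same frequency (assumed definition of substring reduction).
    """
    frequent = {g: f for g, f in counts.items() if f >= min_freq}
    kept = {}
    for gram, freq in frequent.items():
        absorbed = any(
            len(other) > len(gram) and gram in other and other_freq == freq
            for other, other_freq in frequent.items()
        )
        if not absorbed:
            kept[gram] = freq
    return kept


def filter_candidates(candidates, known_words, bad_edge_chars="的了是在"):
    """Drop known words and candidates that start or end with a function character.

    bad_edge_chars is a hypothetical, deliberately tiny rule set.
    """
    return sorted(
        g for g in candidates
        if g not in known_words
        and g[0] not in bad_edge_chars
        and g[-1] not in bad_edge_chars
    )


def find_new_words(sentences, known_words, min_freq=2, max_n=4):
    """Run the three stages in order and return the surviving candidates."""
    counts = ngram_frequencies(sentences, max_n=max_n)
    reduced = reduce_substrings(counts, min_freq=min_freq)
    return filter_candidates(reduced, known_words)
```

Calling `find_new_words(sentences, known_words)` on a list of unsegmented sentences and a set of dictionary words returns the sorted candidate strings. The pairwise comparison in `reduce_substrings` is quadratic in the number of frequent n-grams, so a real large-scale system would need an indexed structure instead.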
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.