Maximum Term Segmentation Algorithm in Reverse Order for Multiform Natural Language

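Cite this article:	WANG Zhi-hui, JIANG Jian-guo, ZHANG Qiu-liang. Maximum Term Segmentation Algorithm in Reverse Order for Multiform Natural Language[J]. Science Technology and Engineering, 2007, 7(17): 4311-4315.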

Authors:	WANG Zhi-hui, JIANG Jian-guo, ZHANG Qiu-liang

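Affiliations:	1. School of Computer Science, Xidian University, Xi'an 710071; 2. School of Computer Science, North China Electric Power University, Baoding 071003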

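Abstract:	A word segmentation algorithm that supports multiple languages is proposed. The algorithm can be understood in three stages: first, source lexicon files in different encodings are converted into Unicode-encoded source lexicon files; then, a Unicode lexicon index is generated from the Unicode-encoded lexicon files; finally, the natural-language sentence to be segmented is converted to Unicode and segmented in reverse order according to the index. The algorithm has been implemented in C++. The analysis system built on it automatically detects lexicon updates and determines whether the index needs to be updated, supports multiple encodings, and its encoding-conversion and segmentation code is platform independent. Segmentation throughput is above 9 MB/s and accuracy is above 90%.
Keywords:	multi-language; index tree; reverse-order segmentation; maximum term algorithm
Article ID:	1671-1819(2007)17-4311-05
Revised:	2007-04-26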
Maximum Term Segmentation Algorithm in Reverse Order for Multiform Natural Language

WANG Zhi-hui, JIANG Jian-guo, ZHANG Qiu-liang. Maximum Term Segmentation Algorithm in Reverse Order for Multiform Natural Language[J]. Science Technology and Engineering, 2007, 7(17): 4311-4315.

Authors:	WANG Zhi-hui, JIANG Jian-guo, ZHANG Qiu-liang

Abstract:	A word segmentation algorithm that supports multiple languages is proposed. The algorithm works in three stages. First, source lexicon files in different encodings are converted to Unicode lexicon files. Then, a Unicode lexicon index is generated from the Unicode lexicon files. Finally, the natural-language text to be segmented is converted to Unicode and segmented in reverse order according to the Unicode lexicon index. The algorithm has been implemented in C++. The system automatically detects changes to the source lexicon files and determines whether the Unicode lexicon index needs to be updated, and it supports a variety of encodings. The code conversion and word segmentation code is platform independent. Segmentation throughput is more than 9 MB/s, and the accuracy rate is more than 90%.

Keywords:	multi-language; file index; segmentation in reverse; maximum term
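
The abstract does not give the index layout or the matching loop, so the following is only a minimal C++ sketch of reverse-order longest-match segmentation under stated assumptions: text and lexicon entries are already converted to UTF-32 (`std::u32string`), the index is a trie keyed on each word spelled back to front, and a character with no lexicon match is emitted as a one-character token. The names `TrieNode`, `ReverseLexicon`, and `segment_reverse` are hypothetical and are not taken from the paper.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Hypothetical index tree node. Words are stored back to front, so a scan
// from the end of the text toward its start follows a single path.
struct TrieNode {
    std::map<char32_t, std::unique_ptr<TrieNode>> next;
    bool is_word = false;
};

class ReverseLexicon {
public:
    // Add one lexicon word (already decoded to UTF-32 code points).
    void insert(const std::u32string& word) {
        assert(!word.empty());
        TrieNode* node = &root_;
        for (auto it = word.rbegin(); it != word.rend(); ++it) {
            auto& child = node->next[*it];
            if (!child) child = std::make_unique<TrieNode>();
            node = child.get();
        }
        node->is_word = true;
    }

    // Length of the longest lexicon word occupying text[end - len, end),
    // or 0 if no lexicon word ends at that position.
    std::size_t longest_match_ending_at(const std::u32string& text,
                                        std::size_t end) const {
        const TrieNode* node = &root_;
        std::size_t best = 0;
        for (std::size_t len = 1; len <= end; ++len) {
            auto it = node->next.find(text[end - len]);
            if (it == node->next.end()) break;
            node = it->second.get();
            if (node->is_word) best = len;
        }
        return best;
    }

private:
    TrieNode root_;
};

// Segment `text` from its last character to its first, always taking the
// longest lexicon word that ends at the current position.
std::vector<std::u32string> segment_reverse(const ReverseLexicon& lexicon,
                                            const std::u32string& text) {
    std::vector<std::u32string> tokens;
    std::size_t end = text.size();
    while (end > 0) {
        std::size_t len = lexicon.longest_match_ending_at(text, end);
        if (len == 0) len = 1;  // assumption: unknown character becomes its own token
        tokens.push_back(text.substr(end - len, len));
        end -= len;
    }
    std::reverse(tokens.begin(), tokens.end());  // restore reading order
    return tokens;
}
```

With a lexicon holding 研究, 研究生, 生命, 的 and 起源, `segment_reverse` splits 研究生命的起源 into 研究 / 生命 / 的 / 起源, whereas a forward longest-match scan would take 研究生 first and leave 命 stranded. Working on code points rather than bytes is what lets one matching loop serve lexicons that originally arrived in different encodings.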
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.