Fast Hash Algorithm for Chinese Word Segmentation
Cite this article: LI Xiang-yang, ZHANG Ya-fei. Fast Hash Algorithm for Chinese Word Segmentation[J]. Journal of PLA University of Science and Technology (Natural Science Edition), 2004, 5(2): 40-44
Authors: LI Xiang-yang (1), ZHANG Ya-fei (2)
Affiliation: 1. Institute of Communications Engineering, PLA University of Science and Technology, Nanjing 210007, Jiangsu; 2. Training Department, PLA University of Science and Technology, Nanjing 210007, Jiangsu
Abstract: Segmentation speed is a key requirement for Chinese language processing systems such as word-based search engines. The paper designs an efficient data structure for an electronic Chinese word list that supports Hash lookup both by the first character of a string and by the whole word. Building on this structure, it proposes a fast Hash algorithm for Chinese word segmentation. Theoretical analysis shows that the average number of matching operations is below 1.08, which is better than that of comparable existing segmentation algorithms.

Keywords: automatic segmentation; data structure; Hash
Article ID: 1009-3443(2004)02-0040-05
Revised: 2003-05-27
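
The abstract does not spell out the data structure or the matching procedure, so the following is only a minimal sketch of the general idea, written under stated assumptions. It indexes a toy word list by first character and then by word length, and segments text with forward maximum matching, trying only the word lengths that actually exist for the current first character. The names `HashLexicon` and `segment`, the length buckets, and the use of forward maximum matching are illustrative assumptions, not the authors' published design, and the sketch makes no claim about the 1.08 figure.

```python
from collections import defaultdict


class HashLexicon:
    """Toy word list: hash by first character, then by word length.

    Hypothetical structure for illustration only; the paper's actual
    layout is not given in the abstract.
    """

    def __init__(self, words):
        # first character -> {word length -> set of words of that length}
        self.index = defaultdict(lambda: defaultdict(set))
        for word in words:
            if word:
                self.index[word[0]][len(word)].add(word)

    def candidate_lengths(self, first_char):
        """Word lengths stored under this first character, longest first."""
        buckets = self.index.get(first_char)
        return sorted(buckets, reverse=True) if buckets else []

    def contains(self, word):
        """Whole-word hash lookup."""
        buckets = self.index.get(word[0]) if word else None
        return bool(buckets) and word in buckets.get(len(word), ())


def segment(text, lexicon):
    """Forward maximum matching restricted to lengths known for each first character."""
    tokens = []
    i = 0
    while i < len(text):
        match = text[i]  # fall back to a single character if nothing matches
        for n in lexicon.candidate_lengths(text[i]):
            if n <= 1 or i + n > len(text):
                continue
            candidate = text[i:i + n]
            if lexicon.contains(candidate):
                match = candidate
                break
        tokens.append(match)
        i += len(match)
    return tokens


lexicon = HashLexicon(["中文", "分词", "算法", "中文分词", "高速"])
print(segment("中文分词算法", lexicon))  # ['中文分词', '算法']
```

Because the candidate lengths come from the first-character bucket, the segmenter never probes a length for which no word exists. This is one plausible way a first-character hash can keep the number of match attempts per token small.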
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.