Research of Chinese Word Segmentation Based on Double-Array Trie
Cite this article: ZHAO Huan, ZHU Hong-quan. Research of Chinese Word Segmentation Based on Double-Array Trie[J]. Journal of Hunan University (Natural Science), 2009, 36(5).
Authors: ZHAO Huan  ZHU Hong-quan
Affiliation: College of Computer and Communication, Hunan University, Changsha, Hunan 410082
Funding: Key Project of Science and Technology Research, Ministry of Education
Abstract: This paper proposes several optimizations to the Double-Array Trie word segmentation algorithm. First, while the double-array trie is being constructed from an ordinary trie, nodes with more branches are processed first in order to reduce collisions. Second, a sequence of empty states is maintained. Third, colliding nodes are placed in a hash table, so that nodes do not have to be reallocated. A Chinese word segmentation system was then built with these methods and compared with several other segmentation methods. The results show that the optimized Double-Array Trie greatly improves insertion speed and space utilization, and that segmentation lookup efficiency is also improved.
Keywords: natural language processing systems  double-array  trie  lexicon  word segmentation
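
The abstract above gives no implementation details, so the following is only a minimal sketch of the general idea: a double-array trie (`base`/`check` arrays) built from an ordinary trie, with nodes that have more children placed first. The names `DoubleArrayTrie`, `build_trie`, `segment`, and `END` are hypothetical, and the longest-match segmentation loop is an assumption, since the abstract does not name the matching strategy. The empty-state sequence and the hash table for colliding nodes are not reproduced; this static build simply scans for a free base offset.

```python
import heapq
from itertools import count

END = "\0"  # hypothetical end-of-word marker appended to every entry


def build_trie(words):
    """Build an ordinary nested-dict trie; each word ends with END."""
    root = {}
    for w in words:
        node = root
        for ch in w + END:
            node = node.setdefault(ch, {})
    return root


class DoubleArrayTrie:
    def __init__(self, words):
        chars = sorted({ch for w in words for ch in w})
        self.code = {END: 1}
        self.code.update({ch: i + 2 for i, ch in enumerate(chars)})
        self.base = [0, 0]   # index 0 is unused, state 1 is the root
        self.check = [0, 0]  # check[t] == 0 means slot t is free
        self._build(build_trie(words))

    def _grow(self, size):
        while len(self.base) < size:
            self.base.append(0)
            self.check.append(0)

    def _build(self, root):
        tie = count()  # tie-breaker so the heap never compares dicts
        heap = [(-len(root), next(tie), 1, root)]
        while heap:
            # Among already placed nodes, take the one with most children.
            _, _, state, node = heapq.heappop(heap)
            if not node:
                continue
            codes = [self.code[ch] for ch in node]
            b = 1
            while True:  # linear scan for a base where all children fit
                self._grow(b + max(codes) + 1)
                if all(self.check[b + c] == 0 for c in codes):
                    break
                b += 1
            self.base[state] = b
            for ch, child in node.items():
                t = b + self.code[ch]
                self.check[t] = state
                heapq.heappush(heap, (-len(child), next(tie), t, child))

    def _step(self, state, ch):
        """Follow one transition; return 0 if it does not exist."""
        c = self.code.get(ch)
        if c is None:
            return 0
        t = self.base[state] + c
        if t < len(self.check) and self.check[t] == state:
            return t
        return 0

    def contains(self, word):
        state = 1
        for ch in word + END:
            state = self._step(state, ch)
            if not state:
                return False
        return True

    def longest_match(self, text, start):
        """Length of the longest dictionary word starting at text[start]."""
        state, best = 1, 0
        for i in range(start, len(text)):
            state = self._step(state, text[i])
            if not state:
                break
            if self._step(state, END):
                best = i + 1 - start
        return best


def segment(dat, text):
    """Greedy longest-match segmentation; unknown characters stand alone."""
    out, i = [], 0
    while i < len(text):
        n = dat.longest_match(text, i) or 1
        out.append(text[i:i + n])
        i += n
    return out


dat = DoubleArrayTrie(["中文", "分词", "中文分词", "研究", "双数组"])
print(segment(dat, "中文分词研究"))  # ['中文分词', '研究']
```

Each lookup step costs one addition and one comparison (`check[base[s] + c] == s`), which is why the structure suits dictionary lookup during segmentation. Placing wide nodes first means the nodes that are hardest to fit are positioned while the arrays are still mostly empty.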
This article is indexed in VIP (维普), Wanfang Data (万方数据), and other databases.