首页 | 本学科首页   官方微博 | 高级检索  
     检索      

一种面向中文的快速字串多模式匹配算法
引用本文:沈洲,王永成,许一震.一种面向中文的快速字串多模式匹配算法[J].上海交通大学学报,2001,35(9):1285-1289.
作者姓名:沈洲  王永成  许一震
作者单位:上海交通大学电子信息学院,
基金项目:国家"863”高科技资助项目(863-306-ZD03-04-1)
摘    要:针对中文字串匹配问题,提出一种快速模式匹配算法,算法采用新型组合状态自动机,将2个状态组合起来匹配一个双字符,从而解决了双字节符构建完全Hash表时带来的存储空间膨胀问题;同时考虑到待匹配模式串中的字符在大字符集中稀疏分布的特点,尝试将单模式QS匹配算法的思想与DFSA算法进行结合,应用于多模式匹配中,实验结果显示,本算法明显优于DFSA算法,平均所花费时间仅为DFSA算法的45.2%。

关 键 词:字符串  有限状态自动机  多模式匹配  单模式QS匹配  DFSA算法  存储空间膨胀
文章编号:1006-2467(2001)09-1285-05
修稿时间:2000年11月28

A Fast Multiple Pattern Algorithm for Chinese String Matching
SHEN Zhou,WANG Yong cheng,XU Yi zhen.A Fast Multiple Pattern Algorithm for Chinese String Matching[J].Journal of Shanghai Jiaotong University,2001,35(9):1285-1289.
Authors:SHEN Zhou  WANG Yong cheng  XU Yi zhen
Abstract:For the problem of Chinese string matching, a fast multiple pattern matching algorithm was provided. With the new combinatorial state automata, the unbearable memory cost problem which results from constructing Hash table for every double byte characters, is resolved by binding two states to match one double byte character. The characters in the specified strings, which would be searched in the whole text, represent a kind of sparse distribution state in the large character set language. So, we try to combine the advantages of QS algorithm with the DFSA algorithm and apply to the multiple pattern matching. The experiment datas show that the new algorithm is much better than the DFSA algorithm. For the average case, the time spent by the new algorithm is only 45.2% of that spent by the DFSA.
Keywords:match  string  finite state automata  multiple pattern match
本文献已被 CNKI 维普 万方数据 等数据库收录!
设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号