
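A Chinese Retrieval Algorithm Based on Rough Set
Cite this article: LIAO Jian-ping, YUAN Chang-an, DENG Song, RAO Yuan. A Chinese Retrieval Algorithm Based on Rough Set[J]. Journal of Guangxi Teachers Education University (Natural Science Edition), 2005, 22(4): 33-39.
Authors: LIAO Jian-ping, YUAN Chang-an, DENG Song, RAO Yuan
Affiliations: School of Resources and Environmental Science, Guangxi Teachers Education University, Nanning, Guangxi 530001, China; Department of Information Technology, Guangxi Teachers Education University, Nanning, Guangxi 530001, China
Funding: National 973 Program (2002CB111504); Guangxi Natural Science Foundation (0339039)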
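Abstract: Traditional corpus retrieval has three main shortcomings: (1) it cannot perform fuzzy-match retrieval; (2) it cannot find words that span a line break, so recall cannot be guaranteed; (3) it is difficult to narrow or broaden a search on the basis of its results. To overcome these shortcomings, this paper proposes a Rough Set-based method for batch processing the words and sentences of Chinese corpora. Drawing on the properties of Rough Set theory and of Chinese corpora, it presents a fuzzy retrieval algorithm, AMTRT. A comparison with a single-character index retrieval algorithm confirms the effectiveness of AMTRT: while supporting various kinds of fuzzy matching and reducing space overhead, without lowering the precision of exact matching, AMTRT raises the recall of words and sentences by nearly 50%.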

Keywords: corpus retrieval; rough set; AMTRT algorithm
Article ID: 1002-8743(2005)04-0033-07
Revised: October 9, 2005
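The abstract names line-spanning words as a cause of lost recall and uses a single-character index retrieval algorithm as the comparison baseline, but gives no implementation details. The following Python sketch is therefore our own illustration, not the paper's AMTRT: it builds a hypothetical single-character positional index over the corpus with line breaks removed, so that an exact query split across two lines is still found, and maps each hit back to its first and last lines. The functions `build_char_index` and `search` and the sample lines are hypothetical.

```python
from bisect import bisect_right
from collections import defaultdict

def build_char_index(lines):
    """Hypothetical sketch: index every character of the corpus by its offset.

    Lines are stripped and concatenated with no separator, so a word broken
    across a line boundary becomes contiguous again in `text`.
    """
    stripped = [line.strip() for line in lines]
    line_starts = []           # offset in `text` where each line begins
    offset = 0
    for s in stripped:
        line_starts.append(offset)
        offset += len(s)
    text = "".join(stripped)
    index = defaultdict(list)  # character -> ascending list of offsets
    for pos, ch in enumerate(text):
        index[ch].append(pos)
    return text, index, line_starts

def search(query, text, index, line_starts):
    """Return (first_line, last_line) for each exact occurrence of `query`."""
    if not query:
        return []
    hits = []
    # Candidate positions come from the index entry of the query's first character.
    for start in index.get(query[0], []):
        if text.startswith(query, start):
            first = bisect_right(line_starts, start) - 1
            last = bisect_right(line_starts, start + len(query) - 1) - 1
            hits.append((first, last))
    return hits

if __name__ == "__main__":
    corpus = ["基于粗糙", "集的汉语检索"]
    text, index, starts = build_char_index(corpus)
    print(search("粗糙集", text, index, starts))  # [(0, 1)]
```

With the lines `基于粗糙` and `集的汉语检索`, a line-by-line substring test misses `粗糙集`, whereas `search` reports it as spanning lines 0 and 1. Fuzzy matching, which the abstract says AMTRT supports, is not modeled here.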

Algorithm of Chinese Retrieval Based on Rough Set
LIAO Jian-ping, YUAN Chang-an, DENG Song, RAO Yuan. Algorithm of Chinese Retrieval Based on Rough Set[J]. Journal of Guangxi Teachers Education University: Natural Science Edition, 2005, 22(4): 33-39.
Authors: LIAO Jian-ping, YUAN Chang-an, DENG Song, RAO Yuan
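Institution: LIAO Jian-ping (a), YUAN Chang-an (b), DENG Song (a), RAO Yuan (a); (a) School of Resources and Environmental Science, Guangxi Teachers Education University, Nanning, Guangxi 530001, China; (b) Department of Information Technology, Guangxi Teachers Education University, Nanning, Guangxi 530001, China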
Abstract: Traditional methods for retrieving text from Chinese corpora have three main deficiencies: (1) they cannot perform fuzzy-match retrieval; (2) they cannot guarantee finding a phrase whose words span two lines; (3) it is difficult to narrow or broaden a search on the basis of its results. To address these problems, this paper makes the following contributions: (1) it proposes batch processing of the words and sentences of Chinese corpora based on Rough Set; (2) it proposes the retrieval algorithm Ambiguous Multi-retrieval in Text Based on Rough Set Technique (AMTRT); (3) experiments show that, compared with traditional methods, AMTRT increases recall by nearly 50% without reducing precision.
Keywords: Chinese corpus system; Rough Set; AMTRT algorithm
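The abstract attributes the ability to narrow and broaden a search to Rough Set theory without giving the construction. For illustration only, the Python sketch below computes the standard Pawlak lower and upper approximations of a target set of text fragments under the indiscernibility relation induced by a chosen set of attributes. Reading the lower approximation as a narrowed result (fragments that certainly match) and the upper approximation as a broadened result (fragments that possibly match) is our assumption, not a method stated in the paper; the attribute table, the function names and the sample data are hypothetical.

```python
from collections import defaultdict

# Hypothetical decision table: each text fragment records whether it contains
# each query character (1) or not (0). Not data from the paper.
TABLE = {
    "d1": {"粗": 1, "糙": 1, "集": 1},
    "d2": {"粗": 1, "糙": 1, "集": 1},
    "d3": {"粗": 1, "糙": 1, "集": 0},
    "d4": {"粗": 1, "糙": 1, "集": 0},
    "d5": {"粗": 0, "糙": 0, "集": 1},
}

def equivalence_classes(table, attributes):
    """Partition objects into classes that agree on every attribute in `attributes`."""
    classes = defaultdict(set)
    for obj, values in table.items():
        classes[tuple(values[a] for a in attributes)].add(obj)
    return list(classes.values())

def approximations(table, attributes, target):
    """Return the Pawlak (lower, upper) approximations of the object set `target`."""
    lower, upper = set(), set()
    for block in equivalence_classes(table, attributes):
        if block <= target:   # class lies entirely inside target: certain members
            lower |= block
        if block & target:    # class overlaps target: possible members
            upper |= block
    return lower, upper

if __name__ == "__main__":
    lower, upper = approximations(TABLE, ["粗", "糙", "集"], {"d1", "d2", "d3"})
    print(sorted(lower), sorted(upper))  # ['d1', 'd2'] ['d1', 'd2', 'd3', 'd4']
```

Dropping attributes coarsens the equivalence classes: with only the attribute `粗`, the same target yields an empty lower approximation and the upper approximation {d1, d2, d3, d4}, so the boundary region widens.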
This article is indexed in CNKI, VIP (Weipu), Wanfang Data and other databases.