
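Inverted index compression and its implementation in RDBMS full-text retrieval
Cite this article:Zhu Hong, Wu Lin. Inverted index compression and its implementation in RDBMS full-text retrieval[J]. Journal of Huazhong University of Science and Technology (Natural Science Edition), 2005, 33(4): 7-9
Authors:Zhu Hong  Wu Lin
Affiliation:College of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, Hubei, China
Funding:Hubei Province Key Science and Technology Project (2002AA103A06).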
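Abstract:A method for compressing inverted indices is proposed that provides random access to the compressed data while maintaining a high compression ratio. The compressed data are divided into two parts: the first records the number of occurrences of a word in each sub-interval, and the second records the exact positions of the word within each sub-interval. The retrieval process is described in detail; using the information in the first part, any position in the second part can be decompressed directly, which demonstrates the random access capability. The compression ratio and retrieval efficiency are analyzed. The implementation of this compression method in RDBMS full-text retrieval is discussed, including how to store the compressed index in table form, and the algorithm is optimized for multi-keyword retrieval. The implementation takes full advantage of the database system to obtain good dynamic performance, while also reducing the space required by the inverted index and improving retrieval efficiency.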

Keywords:full-text retrieval  inverted index  index compression  coding
Article ID:1671-4512(2005)04-0007-03
Revised manuscript received:2004-07-13

Compression of inverted indices and its implementation in an RDBMS full-text information retrieval system
Zhu Hong,Wu Lin. Compression of inverted and implementation in full-text information retrieval system RDBMS[J]. JOURNAL OF HUAZHONG UNIVERSITY OF SCIENCE AND TECHNOLOGY.NATURE SCIENCE, 2005, 33(4): 7-9
Authors:Zhu Hong  Wu Lin
Affiliation:Zhu Hong (Assoc. Prof.), Wu Lin; College of Computer Sci. & Tech., Huazhong Univ. of Sci. & Tech., Wuhan 430074, China.
Abstract:A method for compressing inverted indices that offers both random access and a high compression ratio is proposed. The compressed data are divided into two parts: the first counts the occurrences of each word in every sub-area, and the second stores the exact positions of the word within those sub-areas. The query process is described; because the second part can be decompressed directly at any position using the data in the first part, the query process demonstrates the random access capability. The compression ratio and query efficiency are analyzed. The implementation of this compression method in the full-text information retrieval system of an RDBMS (Relational Database Management System) is presented, with the index stored in table form, and the query algorithm is optimized for multi-word queries. On the one hand, the implementation takes full advantage of the RDBMS and thereby gains good dynamic performance; on the other hand, it reduces the storage space required and improves query efficiency.
Keywords:full-text information retrieval  inverted indices  index compression  integer coding  
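
The two-part layout described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the actual encoding, so everything concrete here is an assumption made for illustration: the class name `TwoPartPostings`, the `interval_size` parameter, and the choice of fixed-width bit packing for the second part. The sketch only shows why per-interval counts make direct access possible: a prefix sum over the counts gives the slot in the packed data where an interval begins, so one interval can be decoded without touching the others.

```python
from itertools import accumulate


class TwoPartPostings:
    """Hypothetical two-part posting list (not the paper's exact encoding).

    Part 1: counts[i]  = number of occurrences in sub-interval i.
    Part 2: packed     = local offsets, each stored in a fixed number of bits.
    """

    def __init__(self, positions, interval_size=16):
        if interval_size <= 0:
            raise ValueError("interval_size must be positive")
        self.interval_size = interval_size
        # Bits needed for one local offset in [0, interval_size).
        self.width = max(1, (interval_size - 1).bit_length())
        ordered = sorted(set(positions))
        n_intervals = ordered[-1] // interval_size + 1 if ordered else 0
        self.counts = [0] * n_intervals
        self.packed = 0  # assumption: a Python int stands in for a bit string
        for k, p in enumerate(ordered):
            self.counts[p // interval_size] += 1
            self.packed |= (p % interval_size) << (k * self.width)
        # starts[i] = index of the first packed offset of interval i.
        self.starts = [0, *accumulate(self.counts)]

    def positions_in_interval(self, i):
        """Decode only interval i, using part 1 to locate its slots in part 2."""
        if not 0 <= i < len(self.counts):
            return []
        mask = (1 << self.width) - 1
        base = i * self.interval_size
        return [
            base + ((self.packed >> (k * self.width)) & mask)
            for k in range(self.starts[i], self.starts[i + 1])
        ]

    def contains(self, position):
        """Check one position by decoding a single sub-interval."""
        return position in self.positions_in_interval(position // self.interval_size)


index = TwoPartPostings([3, 5, 17, 40, 41, 47], interval_size=16)
print(index.counts)                    # per-interval occurrence counts
print(index.positions_in_interval(2))  # decoded without reading intervals 0 and 1
```

In this sketch the counts are kept uncompressed and the prefix sums are precomputed; a space-conscious implementation would presumably encode the counts as well and trade some lookup time for space, which is a design choice the abstract does not specify.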
This article is indexed in CNKI, VIP (维普), Wanfang Data, and other databases.