An incremental data cleansing algorithm based on a clustering tree

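Cite as:	Liu Fang, He Fei. An incremental data cleansing algorithm based on a clustering tree [J]. Journal of Huazhong University of Science and Technology (Natural Science Edition), 2005, 33(3): 46-48.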

Authors:	Liu Fang, He Fei

Affiliation:	College of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan, Hubei 430074

Funding:	Supported by a National "Tenth Five-Year Plan" Major Science and Technology Fund project (2001BA102A0611).

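Abstract:	This paper studies the identification of approximately duplicate records when a data set grows dynamically while the data schema and matching rules remain unchanged, and proposes IACT, an incremental data cleansing algorithm based on a clustering tree. The algorithm first partitions the records by building a clustering tree, then computes similarity within each partition to identify approximately duplicate records, thereby completing incremental detection of similar duplicate records. Experimental results show that IACT is more efficient than the multi-pass sorted neighborhood (MPN) algorithm with no loss of precision.
Keywords:	data cleansing; approximately duplicate records; clustering tree
Article ID:	1671-4512(2005)03-0046-03
Revised:	July 16, 2004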
An incremental algorithms of data cleansing based on clustering tree

Liu Fang, He Fei. An incremental algorithms of data cleansing based on clustering tree [J]. JOURNAL OF HUAZHONG UNIVERSITY OF SCIENCE AND TECHNOLOGY. NATURE SCIENCE, 2005, 33(3): 46-48.

Authors:	Liu Fang, He Fei

Institution:	Liu Fang, Dr.; College of Computer Sci. & Tech., Huazhong Univ. of Sci. & Tech., Wuhan 430074, China.

Abstract:	This paper studies the problem of detecting approximately duplicate records as increments of data arrive, with no changes to the data schema or the matching rule set, and presents an incremental algorithm, IACT (Incremental Algorithm based on Clustering Trees for data cleansing). IACT builds a clustering tree to divide the data records into a few areas, then computes similarity within each partitioned area to identify the approximately duplicate records and so accomplish the data cleansing task. Compared with the MPN algorithm, the experimental results show that IACT is more efficient while achieving the same precision.

Keywords:	data cleansing; approximately duplicate record; clustering tree
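
The partition-then-compare idea in the abstract can be illustrated with a minimal sketch. It is not the IACT algorithm itself, whose tree construction and matching rules are not given on this page. Everything here is an assumption made for illustration: the `PartitionTree` class, partitioning on the first character of a normalized record (a one-level tree, far simpler than a real clustering tree), `difflib.SequenceMatcher` as the similarity measure, and the 0.85 threshold.

```python
from difflib import SequenceMatcher


def normalize(record: str) -> str:
    """Lowercase and collapse whitespace so trivial differences do not matter."""
    return " ".join(record.lower().split())


def similarity(a: str, b: str) -> float:
    """String similarity in [0, 1]; a stand-in for the paper's matching rules."""
    return SequenceMatcher(None, a, b).ratio()


class PartitionTree:
    """Hypothetical one-level tree: root -> first character -> leaf record list."""

    def __init__(self, threshold: float = 0.85):
        self.threshold = threshold  # assumed cut-off, not taken from the paper
        self.leaves: dict[str, list[str]] = {}

    def insert(self, record: str) -> list[str]:
        """Add one record; return earlier records in its leaf that look like duplicates."""
        key = normalize(record)
        leaf = self.leaves.setdefault(key[:1], [])
        # Compare only inside the partition, never against the whole data set.
        matches = [
            old for old in leaf
            if similarity(normalize(old), key) >= self.threshold
        ]
        leaf.append(record)
        return matches


tree = PartitionTree()
for rec in ["Acme Corp, 12 Main St", "Beta Ltd, 9 High Rd"]:  # initial data set
    tree.insert(rec)

# A later increment is checked only against the partition it falls into.
duplicates = tree.insert("acme  corp, 12 Main St.")
```

Because each new record is compared only with records in its own leaf, an increment does not trigger a rescan of the existing data. The price of this simple partitioning key is that two records differing in their first character are never compared, which a real clustering tree would be expected to handle better.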
This article is indexed in CNKI, Wanfang Data, and other databases.
