首页 | 本学科首页   官方微博 | 高级检索  
     检索      

基于动态双重前缀的模糊相似性连接算法
引用本文:于长永,王雯函,温秀静,赵宇海.基于动态双重前缀的模糊相似性连接算法[J].东北大学学报(自然科学版),2022,43(3):321-327.
作者姓名:于长永  王雯函  温秀静  赵宇海
作者单位:(东北大学 计算机科学与工程学院, 辽宁 沈阳110169)
摘    要:针对相似性连接问题, 提出了动态双重前缀的模糊相似性连接算法.与之前的算法不同的是,本文采用双重前缀,即在查找候选以及构建索引时使用不同的前缀来提高过滤效率,并在此基础上进行了优化.首先通过取各个前缀生成的候选集合的交集来缩小候选集合;其次提出最大区分任选前缀,利用此前缀进行预验证来减少最终进入到验证过程的候选对,以此来减少连接时间.并且在三个真实数据集上进行实验,将本文算法与Silkmoth算法以及MF-Join算法进行比较,结果表明所提算法可以生成更小的候选集集合并且需要更少的连接时间.

关 键 词:相似性连接  任选前缀  候选  前缀过滤  验证过程  
修稿时间:2021-09-07

Fuzzy Similarity Join Algorithm Based on Dynamic Double Prefixes
YU Chang-yong,WANG Wen-han,WEN Xiu-jing,ZHAO Yu-hai.Fuzzy Similarity Join Algorithm Based on Dynamic Double Prefixes[J].Journal of Northeastern University(Natural Science),2022,43(3):321-327.
Authors:YU Chang-yong  WANG Wen-han  WEN Xiu-jing  ZHAO Yu-hai
Institution:School of Computer Science & Engineering, Northeastern University, Shenyang 110169, China.
Abstract:Focusing on the similarity join problem, a fuzzy similarity join algorithm was proposed based on dynamic double. The difference from the previous algorithms is that double prefixes are introduced, which improves the filtering efficiency when searching for candidates and building indexes due to the differences of prefixes. On this basis, optimization is realized. First, the candidate set is narrowed by taking the intersection of the candidate sets generated by each prefix. Afterwards, the maximum distinguishing arbitrary-selected prefix is proposed, and this prefix is used for pre-verification to reduce the final candidate pairs that enter the verification process, thereby reducing the join time. Experiments are conducted on three real datasets, and the proposed algorithm is compared with the Silkmoth and MF-Join. The results show that the proposed algorithm can generate a smaller set of candidate set and requires less join time.
Keywords:similarity join  arbitrary-selected prefix  candidate  prefix filter  verification process  
本文献已被 万方数据 等数据库收录!
点击此处可从《东北大学学报(自然科学版)》浏览原始摘要信息
点击此处可从《东北大学学报(自然科学版)》下载免费的PDF全文
设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号