首页 | 本学科首页   官方微博 | 高级检索  
     

基于距离的相似最近邻搜索算法研究
引用本文:姜大光,孙贺娟,易军凯. 基于距离的相似最近邻搜索算法研究[J]. 北京化工大学学报(自然科学版), 2017, 44(5): 94-98. DOI: 10.13543/j.bhxbzr.2017.05.015
作者姓名:姜大光  孙贺娟  易军凯
作者单位:北京化工大学信息科学与技术学院,北京,100029;北京化工大学信息科学与技术学院,北京,100029;北京化工大学信息科学与技术学院,北京,100029
摘    要:为了提高相似最近邻搜索(ANN)算法的精度,提出了一种在度量空间下基于距离的相似最近邻搜索算法-优化的VP森林(OVF)算法。在传统VP树(VT)算法的基础上,首先采用改进的选择优势点的方法,通过从数据集采样优势点候选集,对其进行评估,选取其中区分度大的点作为优势点;然后提出构建多棵VP树的新方法,改进距离优势点远的子树中最近邻不紧凑问题;接着提出使用优先队列与剪枝搜索方法结合的新搜索方法查找最近邻,减少了很多不必要的距离计算。最后通过实验结果表明,本文方法在数据维度、数据集大小、返回不同邻居个数、不同的距离函数及建树个数方面精度有了很大的提高。

关 键 词:相似最近邻搜索(ANN)算法  VP树  优化的VP森林(OVF)算法  剪枝方法
收稿时间:2016-11-21

Approximate nearest neighbor searching algorithm based on distance
JIANG DaGuang,SUN HeJuan,YI JunKai. Approximate nearest neighbor searching algorithm based on distance[J]. Journal of Beijing University of Chemical Technology, 2017, 44(5): 94-98. DOI: 10.13543/j.bhxbzr.2017.05.015
Authors:JIANG DaGuang  SUN HeJuan  YI JunKai
Affiliation:College of Information Science and Technology, Beijing University of Chemical Technology, Beijing 100029, China
Abstract:In order to improve the accuracy of approximate nearest neighbor (ANN) algorithms,this paper proposes an approximate nearest neighbor algorithm based on distance in metric space—the optimized vantage point (VP) forest algorithm (OVF).Starting from the traditional VP tree algorithm,we first adopted an improved method for selecting vantage points,evaluated the vantage point candidate set sampled from the data set,and selected the point which had the biggest degree of differentiation as the vantage point.We then put forward a new method of building multiple VP trees,overcoming the problem of the nearest neighbors not being compact in the sub-tree,which further distances the vantage point.Finally,we proposed a new search method to find the nearest neighbors by combining a priority queue and a pruning method,which significantly reduces unnecessary distance calculations.The experimental results show that the algorithm proposed in this paper gives a large enhancement in precision in terms of the data dimension,the size of the data set,the return of different numbers of neighbors,the different distance functions,and building different number of trees.
Keywords:approximate nearest neighbor searching (ANN)  vantage point (VP) tree  the optimized VP forest (OVF)  pruning method
本文献已被 CNKI 万方数据 等数据库收录!
点击此处可从《北京化工大学学报(自然科学版)》浏览原始摘要信息
点击此处可从《北京化工大学学报(自然科学版)》下载全文
设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号