
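Document retrieval based on a pre-clustered latent semantic analysis model
Cite as: He Xiaoping, Li Di, Wang Mili, Ma Xuesong, Zhou Weihong. Document retrieval based on a pre-clustered latent semantic analysis model[J]. Journal of Yunnan Nationalities University (Natural Sciences Edition), 2015(3): 257-260.
Authors: He Xiaoping, Li Di, Wang Mili, Ma Xuesong, Zhou Weihong
Affiliation: School of Mathematics and Computer Science, Yunnan Minzu University
Funding: State Ethnic Affairs Commission research project (12YNZ008); Scientific Research Fund of the Yunnan Provincial Department of Education (2012Y315); Yunnan Minzu University Youth Fund (11QN08)
Abstract: A latent semantic document retrieval algorithm based on pre-clustering is proposed. First, the document collection to be searched is pre-clustered: building on latent semantic analysis, the k-means clustering algorithm is used to find the center of each cluster. Second, at retrieval time, the search is carried out by computing the similarity between the query vector and the center of each cluster. This addresses a shortcoming of existing latent semantic retrieval algorithms, which spend a large amount of time at retrieval computing the similarity between the query vector and every text vector. In addition, the feature weighting method is redefined to suit the characteristics of document retrieval. Experimental results show that the method shortens retrieval time and improves retrieval efficiency.

Keywords: latent semantic analysis  document retrieval  singular value decomposition  k-means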

A new pre-clustering-based latent semantic analysis algorithm for document retrieval
HE Xiao-ping; LI Di; WANG Mi-li; MA Xue-song; ZHOU Wei-hong. A new pre-clustering-based latent semantic analysis algorithm for document retrieval[J]. Journal of Yunnan Nationalities University: Natural Sciences Edition, 2015(3): 257-260.
Authors: HE Xiao-ping; LI Di; WANG Mi-li; MA Xue-song; ZHOU Wei-hong
Institution: School of Mathematics and Computer Science, Yunnan Minzu University
Abstract: This paper proposes a pre-clustering-based latent semantic analysis algorithm for document retrieval. The documents are first clustered with k-means in the latent semantic space, and the center of each cluster is recorded. At query time, the similarity between the query vector and each cluster center is computed for retrieval. This avoids the time-consuming step in conventional latent semantic retrieval of computing the similarity between the query vector and every document vector. The paper also proposes a feature-weighting method tailored to the characteristics of document retrieval. Experimental results show that the new algorithm reduces search time and improves retrieval efficiency.
Keywords: latent semantic analysis  document retrieval  singular value decomposition  k-means
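
The pipeline outlined in the abstract (latent semantic analysis, k-means pre-clustering, then matching the query against cluster centers) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The function names, the toy term-by-document matrix, the farthest-first initialization, and the final ranking of documents inside the best cluster are all assumptions made for illustration. The paper's feature-weighting scheme is not given in the abstract, so raw term counts are used instead.

```python
import numpy as np


def unit_rows(X):
    """Scale each row to unit length so dot products are cosine similarities."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    return X / np.where(norms == 0.0, 1.0, norms)


def lsa_fit(term_doc, rank):
    """Truncated SVD of a term-by-document matrix: A ~= U_k S_k V_k^T."""
    U, s, Vt = np.linalg.svd(term_doc, full_matrices=False)
    # Rows of V_k are the documents expressed in the latent space.
    return U[:, :rank], s[:rank], unit_rows(Vt[:rank].T)


def fold_in(query_terms, U_k, s_k):
    """Project a query term vector into the latent space: q_k = q^T U_k S_k^{-1}."""
    q = (query_terms @ U_k) / s_k
    return unit_rows(q[None, :])[0]


def assign(X, centers):
    """Index of the nearest center (squared Euclidean distance) for each row of X."""
    d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)


def kmeans(X, k, n_iter=100):
    """Lloyd's k-means; farthest-first initialization is an illustrative choice."""
    centers = [X[0]]
    for _ in range(1, k):
        d = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[int(d.argmax())])
    centers = np.array(centers)
    for _ in range(n_iter):
        labels = assign(X, centers)
        updated = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(updated, centers):
            break
        centers = updated
    return centers, assign(X, centers)


def search(query_terms, U_k, s_k, docs, centers, labels, top_n=3):
    """Pick the closest cluster center, then rank only that cluster's documents."""
    q = fold_in(query_terms, U_k, s_k)
    nearest = int((unit_rows(centers) @ q).argmax())  # k comparisons, not n
    members = np.flatnonzero(labels == nearest)
    sims = docs[members] @ q                           # assumed within-cluster ranking
    order = np.argsort(-sims)[:top_n]
    return [(int(members[i]), float(sims[i])) for i in order]


# Toy corpus (hypothetical): rows are terms, columns are documents.
# Documents 0-2 use only terms 0-2; documents 3-5 use only terms 3-5.
A = np.array([
    [2, 1, 1, 0, 0, 0],
    [1, 2, 0, 0, 0, 0],
    [1, 0, 2, 0, 0, 0],
    [0, 0, 0, 3, 1, 2],
    [0, 0, 0, 1, 3, 0],
    [0, 0, 0, 2, 0, 3],
], dtype=float)

U_k, s_k, docs = lsa_fit(A, rank=2)
centers, labels = kmeans(docs, k=2)           # offline pre-clustering step

query = np.array([1, 1, 0, 0, 0, 0], dtype=float)  # query mentions terms 0 and 1
hits = search(query, U_k, s_k, docs, centers, labels, top_n=2)
print(labels, hits)
```

In this sketch the clustering runs once, offline. At query time only `k` center similarities are computed, plus the similarities for the members of one cluster, rather than one similarity per document in the collection.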
This article is indexed in CNKI and other databases.