
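A query expansion method based on constrained semi-supervised clustering
Cite this article: Yang Jing, Liu Ning, Zhang Jianpei. A query expansion method based on constrained semi-supervised clustering[J]. Sciencepaper Online, 2013(10): 994-997.
Authors: Yang Jing  Liu Ning  Zhang Jianpei
Institution: College of Computer Science and Technology, Harbin Engineering University, Harbin 150001
Funding: National Natural Science Foundation of China (61073041, 61073043); Natural Science Foundation of Heilongjiang Province (F200901); Specialized Research Fund for the Doctoral Program of Higher Education (20112304110011, 20122304110012)
Abstract: To address query drift in the pseudo-relevance feedback model, which is caused by low-quality feedback documents and inappropriately selected expansion terms, a query expansion method based on constrained semi-supervised clustering is proposed. The method manually labels the top k documents of the initial retrieval results as either relevant or irrelevant, then applies a semi-supervised clustering algorithm to the top n documents of the initial retrieval results and extracts the documents relevant to the query as feedback documents. By learning how a small number of labeled documents relate to the query, the method can estimate the relevance of a large number of unlabeled documents to the query more accurately. This improves the quality of the feedback documents and thereby effectively improves both the recall and the precision of retrieval. Experimental results show that the method achieves better retrieval performance than both traditional pseudo-relevance feedback and pseudo-relevance feedback based on unsupervised clustering.

Keywords: information retrieval  query expansion  constrained clustering  semi-supervised clustering  pseudo-relevance feedback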
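The abstract does not name the semi-supervised clustering algorithm it uses. As a minimal sketch, assuming a seeded two-cluster k-means in which the k manually labeled documents act as hard constraints (they never change cluster), documents are compared by cosine similarity of term-frequency vectors, and the labeled documents are the first k of the top n, the feedback-document selection step might look like this. The names `tf_vector`, `centroid`, `cosine`, `select_feedback_docs`, and `max_iter` are hypothetical and not taken from the paper.

```python
import math
from collections import defaultdict

def tf_vector(text):
    """Hypothetical helper: L2-normalised term-frequency vector of a whitespace-tokenised text."""
    counts = defaultdict(int)
    for token in text.lower().split():
        counts[token] += 1
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {t: c / norm for t, c in counts.items()}

def centroid(vectors):
    """L2-normalised sum of the vectors (same direction as their mean)."""
    total = defaultdict(float)
    for v in vectors:
        for t, w in v.items():
            total[t] += w
    norm = math.sqrt(sum(w * w for w in total.values()))
    return {t: w / norm for t, w in total.items()} if norm else {}

def cosine(u, v):
    # Both vectors are unit length (or empty), so the dot product is the cosine similarity.
    return sum(w * v.get(t, 0.0) for t, w in u.items())

def select_feedback_docs(top_n_docs, labels, max_iter=20):
    """
    top_n_docs: texts of the top n documents from the initial retrieval.
    labels: manual judgements for the first k of them (True = relevant, False = irrelevant).
    Returns the indices of the documents placed in the relevant cluster.
    """
    if len(labels) > len(top_n_docs):
        raise ValueError("cannot label more documents than were retrieved")
    if True not in labels or False not in labels:
        raise ValueError("need at least one relevant and one irrelevant labelled document")
    vecs = [tf_vector(d) for d in top_n_docs]
    seeds = dict(enumerate(labels))
    assign = dict(seeds)  # start from the labelled documents only
    for _ in range(max_iter):
        rel = centroid([vecs[i] for i, r in assign.items() if r])
        irr = centroid([vecs[i] for i, r in assign.items() if not r])
        new_assign = {}
        for i, v in enumerate(vecs):
            if i in seeds:
                new_assign[i] = seeds[i]  # constraint: labelled documents never change cluster
            else:
                # Strict '>' sends ties (e.g. no shared terms) to the irrelevant cluster.
                new_assign[i] = cosine(v, rel) > cosine(v, irr)
        if new_assign == assign:
            break  # assignments are stable: converged
        assign = new_assign
    return sorted(i for i, r in assign.items() if r)

docs = [
    "semi supervised clustering query expansion",     # labelled relevant
    "football match score league",                    # labelled irrelevant
    "query expansion relevance feedback clustering",  # unlabelled
    "league football transfer news",                  # unlabelled
]
feedback = select_feedback_docs(docs, labels=[True, False])
print(feedback)  # [0, 2]
```

Keeping the labeled documents fixed is one simple way to turn manual relevance judgements into clustering constraints; it also guarantees that neither cluster can become empty, so the centroids are always defined.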

A query expansion method based on constrained semi-supervised clustering
Yang Jing, Liu Ning, Zhang Jianpei. A query expansion method based on constrained semi-supervised clustering[J]. Sciencepaper Online, 2013(10): 994-997.
Authors:Yang Jing  Liu Ning  Zhang Jianpei
Institution:(College of Computer Science and Technology, Harbin Engineering University, Harbin 150001, China)
Abstract: Because the feedback documents of the pseudo-relevance feedback model are often of poor quality and expansion terms are selected inappropriately, the expanded query often drifts away from the original query. We propose a query expansion method based on constrained semi-supervised clustering. The method first manually labels the top k documents of the initial retrieval set as relevant or irrelevant; it then analyzes the top n documents with a semi-supervised clustering algorithm and extracts the query-relevant documents to use as feedback documents. By learning from a small number of labeled documents, the algorithm can more accurately estimate the relevance of a large number of unlabeled documents to the query, thereby improving the quality of the feedback information. Experimental results show that the proposed method outperforms both pseudo-relevance feedback and the query-likelihood language model.
Keywords:information retrieval  query expansion  constrained clustering  semi-supervised clustering  pseudo-relevance feedback
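Neither abstract says how expansion terms are chosen once the feedback documents have been selected. Purely as an illustrative assumption, the sketch below scores each candidate term by its mean length-normalized term frequency across the feedback documents (roughly the positive-centroid part of the Rocchio formula, with no negative feedback and no stop-word removal) and appends the highest-scoring terms that are not already in the query. `expand_query` and `num_terms` are hypothetical names, not the paper's.

```python
import math
from collections import defaultdict

def expand_query(query, feedback_docs, num_terms=3):
    """
    Hypothetical sketch of positive-only, Rocchio-style expansion:
    score = mean L2-normalised term frequency over the feedback documents.
    Returns the original query terms followed by up to `num_terms` new terms.
    """
    query_terms = query.lower().split()
    scores = defaultdict(float)
    for doc in feedback_docs:
        counts = defaultdict(int)
        for token in doc.lower().split():
            counts[token] += 1
        norm = math.sqrt(sum(c * c for c in counts.values())) or 1.0
        for t, c in counts.items():
            scores[t] += c / norm / len(feedback_docs)
    # Only terms that are not already in the query are candidates for expansion.
    candidates = [t for t in scores if t not in query_terms]
    # Highest score first; ties broken alphabetically so the output is deterministic.
    candidates.sort(key=lambda t: (-scores[t], t))
    return query_terms + candidates[:num_terms]

feedback_docs = [
    "semi supervised clustering query expansion",
    "query expansion relevance feedback clustering",
]
expanded = expand_query("query expansion", feedback_docs, num_terms=2)
print(expanded)  # ['query', 'expansion', 'clustering', 'feedback']
```

Restricting the scoring to documents the clustering step judged relevant is what is meant to limit drift here: terms that occur only in irrelevant top-ranked documents never become candidates.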
This article is indexed in VIP (维普) and other databases.