
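Text Similarity Calculation Method Based on Sentence Vectors
Cite this article: Liu Jiming, Yu Minmin, Yuan Ye. Text similarity calculation method based on sentence vectors[J]. Science Technology and Engineering, 2020, 20(17): 6950-6955
Authors: Liu Jiming, Yu Minmin, Yuan Ye
Affiliation (all three authors): Key Laboratory of Electronic Commerce and Modern Logistics, School of Economics and Management, Chongqing University of Posts and Telecommunications, Chongqing 400065
Abstract: To further improve the accuracy of text similarity calculation, a sentence-vector-based text similarity function, PO-SIF (part of speech and order smooth inverse frequency), is proposed. It refines the smooth inverse frequency (SIF) method with respect to part of speech and word order. The core of SIF is to obtain sentence vectors through weighting and noise removal and to use them to compute sentence similarity. In the computation, a part-of-speech subtraction factor is added to adjust the SIF weighting parameter, yielding sentence vectors that carry part-of-speech information. In addition, word order similarity is linearly weighted with the SIF sentence-vector similarity to refine the sentence similarity score. Experimental results show that adding part of speech and word order improves the accuracy of the algorithm.

Keywords: smooth inverse frequency, sentence vector, part of speech, word order similarity
Received: 2019-09-07
Revised: 2020-06-13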

Research on Computing Method of Text Similarity Based on Sentence Vector
Liu Jiming, Yu Minmin, Yuan Ye. Research on Computing Method of Text Similarity Based on Sentence Vector[J]. Science Technology and Engineering, 2020, 20(17): 6950-6955
Authors: Liu Jiming, Yu Minmin, Yuan Ye
Abstract: To further improve the accuracy of text similarity calculation, this paper presents PO-SIF (Part of speech and Order Smooth Inverse Frequency), a text similarity function based on sentence vectors. It optimizes the Smooth Inverse Frequency (SIF) method with respect to part of speech and word order. The core of SIF is to obtain sentence vectors by weighting word vectors and removing noise, and then to compute sentence similarity from those vectors. PO-SIF makes two changes. First, a part-of-speech subtraction factor adjusts the SIF weighting parameter, so that the resulting sentence vectors carry part-of-speech information. Second, the sentence similarity score is a linear weighting of word order similarity and SIF sentence-vector similarity. The results show that adding part of speech and word order improves the accuracy of the algorithm.
Keywords: smooth inverse frequency (SIF), sentence vector, part of speech, word order similarity
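
The pipeline summarized in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract gives no formulas, so the multiplicative `pos_factor` table, the position-vector form of `word_order_similarity`, the mixing weight `lam`, and the default `a = 1e-3` are all hypothetical choices made here for illustration. Only the SIF weight `a / (a + p(w))` and the removal of the first principal component follow the standard SIF procedure.

```python
import numpy as np

def sif_weights(tokens, word_prob, a=1e-3, pos_tags=None, pos_factor=None):
    """SIF weight a / (a + p(w)) per token, optionally scaled by a POS factor."""
    weights = np.array([a / (a + word_prob.get(t, 0.0)) for t in tokens], dtype=float)
    if pos_tags is not None and pos_factor is not None:
        # Assumption: the POS factor multiplies the SIF weight (1.0 if tag unknown).
        weights = weights * np.array([pos_factor.get(p, 1.0) for p in pos_tags], dtype=float)
    return weights

def sentence_vectors(sentences, embeddings, word_prob, a=1e-3,
                     pos_tags=None, pos_factor=None):
    """Weighted average of word vectors, then remove the first principal component."""
    dim = len(next(iter(embeddings.values())))
    zero = np.zeros(dim)
    rows = []
    for i, tokens in enumerate(sentences):
        tags = pos_tags[i] if pos_tags is not None else None
        w = sif_weights(tokens, word_prob, a, tags, pos_factor)
        vecs = np.array([embeddings.get(t, zero) for t in tokens],
                        dtype=float).reshape(len(tokens), dim)
        rows.append((w[:, None] * vecs).sum(axis=0) / max(len(tokens), 1))
    X = np.vstack(rows)
    # "Noise removal": subtract each row's projection on the top right singular vector.
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    u = vt[0]
    return X - np.outer(X @ u, u)

def word_order_similarity(tokens_a, tokens_b):
    """Assumed formula: 1 - ||r1 - r2|| / ||r1 + r2|| over first-occurrence positions."""
    joint = list(dict.fromkeys(list(tokens_a) + list(tokens_b)))
    def order_vector(tokens):
        first_pos = {}
        for i, t in enumerate(tokens, start=1):
            first_pos.setdefault(t, i)
        return np.array([first_pos.get(t, 0) for t in joint], dtype=float)
    r1, r2 = order_vector(tokens_a), order_vector(tokens_b)
    denom = np.linalg.norm(r1 + r2)
    return 1.0 if denom == 0 else float(1.0 - np.linalg.norm(r1 - r2) / denom)

def cosine(u, v):
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return 0.0 if denom == 0 else float(u @ v / denom)

def combined_similarity(vec_a, vec_b, tokens_a, tokens_b, lam=0.8):
    """Linear weighting of sentence-vector similarity and word order similarity."""
    return lam * cosine(vec_a, vec_b) + (1.0 - lam) * word_order_similarity(tokens_a, tokens_b)
```

In this sketch, `sentence_vectors` expects tokenized sentences, a dict of word vectors, and a dict of unigram probabilities, and `pos_tags` is a list of tag lists parallel to `sentences`. The component removal is estimated from the batch it is given, so it is only meaningful when that batch is reasonably large.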
This article is indexed in CNKI, Wanfang Data, and other databases.
