Similar words analysis based on POS-CBOW language model
Cite this article: RUAN Dongru, PAN Hongyan, GAO Kai. Similar words analysis based on POS-CBOW language model[J]. Journal of Hebei University of Science and Technology, 2015, 36(5): 532-538
Authors: RUAN Dongru  PAN Hongyan  GAO Kai
Affiliation: 1. School of Information Science and Engineering, Hebei University of Science and Technology
Funding: Hebei Province Social Science Development Research Project (2015030344)
Abstract: Similar-word analysis is an active research topic in natural language processing, with research value and practical applications in text classification, machine translation, and information recommendation. Targeting the characteristics of short texts on Sina Weibo, this paper presents POS-CBOW, a continuous bag-of-words (CBOW) language model with part-of-speech information. The model adds a filtering layer and a part-of-speech tagging layer to CBOW in order to optimize the word vectors and tag them with parts of speech. It judges the similarity of two word vectors from their cosine similarity and their part-of-speech similarity, and uses a statistical analysis model to select the optimal set of similar words. Experiments show that the similar-word analysis algorithm based on the POS-CBOW language model outperforms the one based on the traditional CBOW language model.

Keywords: natural language processing   language model   word vector   similar words   POS-CBOW
Received: 2015-04-14
Revised: 2015-06-26
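
The abstract says that similarity is judged from both the cosine similarity of the word vectors and a part-of-speech similarity, but it does not give the formula that combines them. The sketch below is therefore only an illustration under stated assumptions: the part-of-speech similarity is taken to be 1 for identical tags and 0 otherwise, the two scores are mixed with a hypothetical weight `alpha`, and the vectors and tags in `VOCAB` are made-up toy values. The filtering layer and the statistical analysis model mentioned in the abstract are not modeled here.

```python
import math

# Toy vocabulary: word -> (vector, part-of-speech tag).
# The vectors and tags are invented for illustration, not taken from the paper.
VOCAB = {
    "happy":     ([0.90, 0.10, 0.00], "a"),
    "glad":      ([0.80, 0.20, 0.10], "a"),
    "happiness": ([0.85, 0.15, 0.00], "n"),
    "run":       ([0.00, 0.20, 0.90], "v"),
}


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def pos_similarity(pos_a, pos_b):
    """Assumed part-of-speech similarity: 1.0 for identical tags, else 0.0."""
    return 1.0 if pos_a == pos_b else 0.0


def combined_similarity(entry_a, entry_b, alpha=0.8):
    """Weighted mix of cosine and part-of-speech similarity.

    The linear combination and the weight alpha are assumptions made for
    this sketch; the abstract does not state how the two scores are combined.
    """
    vec_a, pos_a = entry_a
    vec_b, pos_b = entry_b
    cos = cosine_similarity(vec_a, vec_b)
    pos = pos_similarity(pos_a, pos_b)
    return alpha * cos + (1.0 - alpha) * pos


def most_similar(query, vocab, top_k=3, alpha=0.8):
    """Return the top_k words most similar to query as (word, score) pairs."""
    query_entry = vocab[query]
    scored = [
        (word, combined_similarity(query_entry, entry, alpha))
        for word, entry in vocab.items()
        if word != query
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    # alpha=1.0 reduces to plain cosine ranking, for comparison.
    print(most_similar("happy", VOCAB, alpha=1.0))
    print(most_similar("happy", VOCAB, alpha=0.8))
```

With these toy values, plain cosine ranking (`alpha=1.0`) places the noun "happiness" closest to the adjective "happy", while the mixed score with `alpha=0.8` ranks the adjective "glad" first. This is the kind of reordering that adding part-of-speech information is meant to produce, though the paper's actual scoring may differ.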