A Semantic Similarity Computing Approach Based on WordNet and Corpus Statistics

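Cite this article:	ZHANG Dong-na, ZHOU Chun-guang, LIU Yan-bin, GUO Dong-wei. A Semantic Similarity Computing Approach Based on WordNet and Corpus Statistics[J]. Journal of Jilin University (Science Edition), 2010, 48(5): 811-816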

Authors:	ZHANG Dong-na, ZHOU Chun-guang, LIU Yan-bin, GUO Dong-wei

Affiliation:	College of Computer Science and Technology, Jilin University, Changchun 130012

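Funding:	National Natural Science Foundation of China; National High Technology Research and Development Program of China (863 Program); Graduate Innovation Fund of Jilin University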

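Abstract:	We propose a new method, based on WordNet and a text corpus, for computing the semantic parameter IC (information content). The method jointly considers a concept's semantic information in WordNet and its probability in the corpus, that is, the concept's self-information. It also uses a new parameter to account for the information that a pair of concepts shares in WordNet. On this basis we design a general-purpose method for computing the semantic similarity of concepts. The method simplifies traditional semantic similarity algorithms and addresses related problems in semantic similarity computation. It can be applied to information extraction, information retrieval, document classification and ontology learning. Experiments on RB, a standard benchmark data set in the field, show that the method is effective for computing semantic similarity.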
Keywords:	semantic similarity; Brown corpus; IC model
Received:	2009-12-15
A Semantic Similarity Computing Approach Based on WordNet and Corpus Statistics

ZHANG Dong-na, ZHOU Chun-guang, LIU Yan-bin, GUO Dong-wei. A Semantic Similarity Computing Approach Based on WordNet and Corpus Statistics[J]. Journal of Jilin University: Sci Ed, 2010, 48(5): 811-816

Authors:	ZHANG Dong-na, ZHOU Chun-guang, LIU Yan-bin, GUO Dong-wei

Affiliation:	College of Computer Science and Technology, Jilin University, Changchun 130012, China

Abstract:	We first propose a new method for calculating information content (IC), a parameter used in semantic similarity computation. The new algorithm draws on a concept's semantic information in the WordNet knowledge base and on its probability in a corpus, known as its self-information. Existing algorithms are all domain-dependent and computationally complicated. We therefore propose a general-purpose method for calculating semantic similarity based on corpus statistics and WordNet. It can be used in information extraction, information retrieval, document clustering and ontology learning. Experiments on the benchmark R&B concept pairs show that the proposed method yields a substantial improvement.

Keywords:	semantic similarity of concepts; Brown corpus; information content method
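The abstract does not give the paper's formulas or define its new shared-information parameter. For reference only, the sketch below shows the standard self-information definition that the abstract refers to: IC(c) = -log p(c), where a concept's corpus frequency also counts every concept it subsumes. It then applies two classic IC-based measures, Resnik's and Lin's. This is not the authors' method. The toy taxonomy and the counts are hypothetical stand-ins for WordNet's hypernym hierarchy and Brown corpus statistics, and the names `PARENT`, `COUNTS`, `information_content`, `resnik` and `lin` are illustrative only.

```python
import math

# Hypothetical toy is-a taxonomy (child -> parent; None marks the root).
# A stand-in for WordNet's hypernym hierarchy, not real WordNet data.
PARENT = {
    "entity": None,
    "animal": "entity",
    "vehicle": "entity",
    "dog": "animal",
    "cat": "animal",
    "car": "vehicle",
}

# Hypothetical corpus counts of each concept's own occurrences
# (a stand-in for Brown corpus statistics).
COUNTS = {"entity": 0, "animal": 10, "vehicle": 10, "dog": 40, "cat": 30, "car": 10}


def ancestors(concept):
    """Return the concept itself plus every concept above it, up to the root."""
    chain = []
    while concept is not None:
        chain.append(concept)
        concept = PARENT[concept]
    return chain


def information_content():
    """IC(c) = -log p(c) = log(N / freq(c)); freq(c) includes all subsumed concepts.

    Assumes every concept ends up with a nonzero frequency; real data would
    need smoothing for zero counts.
    """
    freq = {c: 0 for c in PARENT}
    for concept, count in COUNTS.items():
        for ancestor in ancestors(concept):
            freq[ancestor] += count
    total = freq["entity"]  # the root subsumes every occurrence
    return {c: math.log(total / freq[c]) for c in PARENT}


def resnik(ic, c1, c2):
    """Resnik similarity: IC of the most informative common subsumer."""
    common = set(ancestors(c1)) & set(ancestors(c2))
    return max(ic[c] for c in common)


def lin(ic, c1, c2):
    """Lin similarity: twice the shared IC divided by the pair's total IC."""
    denom = ic[c1] + ic[c2]
    return 2 * resnik(ic, c1, c2) / denom if denom > 0 else 1.0


if __name__ == "__main__":
    ic = information_content()
    for pair in [("dog", "cat"), ("dog", "car")]:
        print(pair, round(resnik(ic, *pair), 3), round(lin(ic, *pair), 3))
```

With these toy counts, "dog" and "cat" share the informative subsumer "animal" and score higher than "dog" and "car". The only concept that "dog" and "car" have in common is the root, whose IC is 0.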
This article is indexed in CNKI, Wanfang Data and other databases.