
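Research on Expert Disambiguation of Same Name Based on Multi-feature Fusion
Cite this article: ZENG Jianrong, ZHANG Yangsen, WANG Siyuan, HUANG Gaijuan, CUI Jia, MA Huan. Research on Expert Disambiguation of Same Name Based on Multi-feature Fusion[J]. Acta Scientiarum Naturalium Universitatis Pekinensis, 2020, 56(4): 607-613.
Authors: ZENG Jianrong, ZHANG Yangsen, WANG Siyuan, HUANG Gaijuan, CUI Jia, MA Huan
Affiliations: 1. Institute of Intelligent Information Processing, Beijing Information Science and Technology University, Beijing 100101; 2. Beijing Laboratory of National Economic Security Early-warning Engineering, Beijing 100044; 3. National Computer Network and Information Security Management Center, Beijing 100029
Funding: National Natural Science Foundation of China (61772081) and the Graduate Science and Technology Innovation Project of the Program for Promoting the Intrinsic Development of Universities (5121911044)
Abstract: To address ambiguity among experts who share the same name when an expert database is built, this paper proposes a same-name expert disambiguation method based on multi-feature fusion. Experts' paper records are collected from China National Knowledge Infrastructure (CNKI), and their key fields (title, abstract, keywords, author affiliation, and co-authors) are extracted as attribute features to build a feature representation model, on which a similarity function between same-name experts is defined. Using the computed similarities, the disambiguation task is recast as a clustering problem and solved with the affinity propagation clustering algorithm. Experiments on the collected expert paper data show that the method reaches an accuracy of 92%, giving good disambiguation results.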
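The abstract describes the pipeline (per-feature similarities fused into one score, then affinity propagation over the resulting similarity matrix) but not the paper's formulas. The Python sketch below illustrates that pipeline under stated assumptions and is not the authors' implementation: the record fields (`title`, `abstract`, `keywords`, `affiliation`, `coauthors`), the weights in `WEIGHTS`, the bag-of-words cosine over title plus abstract, exact-match affiliation scoring, Jaccard overlap for keywords and co-authors, and the median-similarity preference are all hypothetical choices. `affinity_propagation` is a plain version of the standard damped responsibility/availability message passing (Frey and Dueck).

```python
import math
import statistics
from collections import Counter

# Hypothetical feature weights: the abstract does not say how the features
# are weighted, so these numbers are placeholders.
WEIGHTS = {"text": 0.3, "keywords": 0.2, "affiliation": 0.2, "coauthors": 0.3}


def jaccard(a, b):
    """Jaccard overlap of two collections (keywords, co-author names)."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0


def cosine(text_a, text_b):
    """Bag-of-words cosine over space-separated tokens (assumption: Chinese
    titles/abstracts would need word segmentation before this step)."""
    va, vb = Counter(text_a.split()), Counter(text_b.split())
    dot = sum(count * vb[token] for token, count in va.items())
    norm = (math.sqrt(sum(c * c for c in va.values()))
            * math.sqrt(sum(c * c for c in vb.values())))
    return dot / norm if norm else 0.0


def fused_similarity(p, q):
    """Weighted sum of per-feature similarities between two paper records."""
    text = cosine(p["title"] + " " + p["abstract"], q["title"] + " " + q["abstract"])
    return (WEIGHTS["text"] * text
            + WEIGHTS["keywords"] * jaccard(p["keywords"], q["keywords"])
            + WEIGHTS["affiliation"] * float(p["affiliation"] == q["affiliation"])
            + WEIGHTS["coauthors"] * jaccard(p["coauthors"], q["coauthors"]))


def similarity_matrix(papers, preference=None):
    """Pairwise fused similarities; the diagonal holds the AP preference
    (median of the off-diagonal similarities unless one is given)."""
    n = len(papers)
    S = [[fused_similarity(papers[i], papers[k]) if i != k else 0.0
          for k in range(n)] for i in range(n)]
    if preference is None:
        off = [S[i][k] for i in range(n) for k in range(n) if i != k]
        preference = statistics.median(off) if off else 0.0
    for i in range(n):
        S[i][i] = preference
    return S


def affinity_propagation(S, damping=0.5, iterations=200):
    """Damped affinity propagation on similarity matrix S.
    Returns one cluster label per row (or -1s if no exemplar emerges)."""
    n = len(S)
    if n == 1:
        return [0]
    R = [[0.0] * n for _ in range(n)]  # responsibilities r(i, k)
    A = [[0.0] * n for _ in range(n)]  # availabilities a(i, k)
    for _ in range(iterations):
        # r(i,k) = s(i,k) - max_{k' != k} (a(i,k') + s(i,k'))
        for i in range(n):
            vals = [A[i][k] + S[i][k] for k in range(n)]
            first = max(range(n), key=vals.__getitem__)
            second = max(vals[k] for k in range(n) if k != first)
            for k in range(n):
                best_other = second if k == first else vals[first]
                R[i][k] = damping * R[i][k] + (1 - damping) * (S[i][k] - best_other)
        # a(k,k) = sum_{i' != k} max(0, r(i',k))
        # a(i,k) = min(0, r(k,k) + sum_{i' not in {i,k}} max(0, r(i',k)))
        for k in range(n):
            pos = [max(0.0, R[i][k]) for i in range(n)]
            total = sum(pos) - pos[k]
            for i in range(n):
                new = total if i == k else min(0.0, R[k][k] + total - pos[i])
                A[i][k] = damping * A[i][k] + (1 - damping) * new
    exemplars = [k for k in range(n) if A[k][k] + R[k][k] > 0]
    if not exemplars:
        return [-1] * n  # degenerate run (e.g. perfectly tied data)
    # Each exemplar labels itself; other points join their most similar exemplar.
    return [exemplars.index(i) if i in exemplars
            else exemplars.index(max(exemplars, key=lambda e: S[i][e]))
            for i in range(n)]
```

A call would look like `labels = affinity_propagation(similarity_matrix(papers))`, where papers sharing a label are attributed to the same person. Raising the preference tends to produce more, smaller clusters. On perfectly tied toy data this plain version can fail to elect any exemplar; scikit-learn's `AffinityPropagation(affinity="precomputed")` guards against that by adding a tiny random perturbation to the similarity matrix.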

Keywords: multi-feature fusion; homonymy disambiguation; expert database; clustering algorithm; data collection
Received: 2019-07-17

This article is indexed in CNKI and other databases.