
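Thematic Word Extraction from Chinese Text Based on Tongyici Cilin
Citation: CHENG Tao, SHI Shui-cai, WANG Xia, LÜ Xue-qiang. Thematic word extraction from Chinese text based on Tongyici Cilin[J]. Journal of Guangxi Normal University (Natural Science Edition), 2007, 25(2): 145-148.
Authors: CHENG Tao, SHI Shui-cai, WANG Xia, LÜ Xue-qiang
Affiliations: 1. Chinese Information Processing Research Center, Beijing Information Science and Technology University, Beijing 100101
2. Fushun No. 15 Middle School, Fushun, Liaoning 113006
Funding: National Natural Science Foundation of China (60272084); Key Project of the Science and Technology Development Program of the Beijing Municipal Education Commission (KZ200310772013); Beijing Municipal Education Commission projects (KM200510772008, KM200610772008)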
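Abstract: Extracting thematic words from Chinese text can condense an article, distill the content of a Chinese webpage, and help match online advertisements precisely to webpages. This paper proposes a thematic word extraction method for Chinese text based on Tongyici Cilin. Besides the traditional factors that affect a word's weight, the method also considers how the occurrence of synonyms, related words, and hyponyms affects that weight. Experiments show that the method reaches an accuracy of 83.25% in extracting thematic words from Chinese text.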

Keywords: thematic word extraction; Tongyici Cilin; weight; synonym
Article ID: 1001-6600(2007)02-0145-04
Received: 2006-12-18
Revised: 2006-12-18

Thematic Words Extracting from Chinese Text Based on Tongyici Cilin
CHENG Tao, SHI Shui-cai, WANG Xia, LÜ Xue-qiang. Thematic Words Extracting from Chinese Text Based on Tongyici Cilin[J]. Journal of Guangxi Normal University (Natural Science Edition), 2007, 25(2): 145-148.
Authors: CHENG Tao, SHI Shui-cai, WANG Xia, LÜ Xue-qiang
Institution: 1. Chinese Information Processing Research Center, Beijing Information Science and Technology University, Beijing 100101, China; 2. Fushun No. 15 Middle School, Fushun 113006, China
Abstract: Extracting thematic words from Chinese text can condense an article, extract the main ideas of a Chinese webpage, and help achieve precise matching between online advertisements and webpages. This paper presents a thematic word extraction method based on Tongyici Cilin. The method takes into account not only the traditional factors affecting the weight of a thematic word but also the occurrence of synonyms, related words, and hyponyms. Experiments show that the method reaches an accuracy of 83.25% in extracting thematic words from Chinese text.
Keywords: thematic word extraction; Tongyici Cilin; weight; synonymy
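The abstract describes the weighting idea only at a high level: a word's weight depends on traditional factors and also on whether its synonyms, related words, and hyponyms occur in the same text. The Python sketch below illustrates one possible form of that weight propagation; it is not the authors' method, and every detail is an assumption. Raw term frequency stands in for the unspecified "traditional factors". The thesaurus is assumed to follow the extended Cilin line format, in which an 8-character code ends in `=` (synonym group), `#` (related-word group), or `@` (single-word group). The factors `SYN_FACTOR`, `REL_FACTOR`, and `HYPO_FACTOR` are illustrative values. Hyponyms are passed in as a separate, hypothetical mapping because the sketch does not derive them from the code hierarchy.

```python
from collections import Counter

# Hypothetical propagation factors; the abstract does not report the paper's values.
SYN_FACTOR = 0.5   # other words in the same group when the code ends in '=' (synonyms)
REL_FACTOR = 0.3   # other words in the same group when the code ends in '#' (related words)
HYPO_FACTOR = 0.2  # hyponyms from a caller-supplied mapping (assumption)


def parse_cilin(lines):
    """Map each word to the thesaurus groups that contain it.

    Assumes the extended Cilin line format: an 8-character code whose last
    character is '=', '#' or '@', then space-separated member words.
    """
    word_groups = {}
    for line in lines:
        parts = line.split()
        if len(parts) < 2 or len(parts[0]) != 8:
            continue  # skip blank or malformed lines
        marker, members = parts[0][-1], parts[1:]
        for word in members:
            word_groups.setdefault(word, []).append((marker, members))
    return word_groups


def thematic_weights(tokens, word_groups, hyponyms=None):
    """Weight = own frequency + weighted frequencies of relatives found in the text."""
    hyponyms = hyponyms or {}
    tf = Counter(tokens)  # raw term frequency stands in for "traditional factors"
    weights = {}
    for word, freq in tf.items():
        score = float(freq)
        for marker, members in word_groups.get(word, []):
            if marker == "=":
                factor = SYN_FACTOR
            elif marker == "#":
                factor = REL_FACTOR
            else:
                factor = 0.0  # '@' groups hold a single word with no relatives
            # Counter returns 0 for absent words without inserting them.
            score += factor * sum(tf[m] for m in members if m != word)
        score += HYPO_FACTOR * sum(tf[h] for h in hyponyms.get(word, ()))
        weights[word] = score
    return weights


def top_thematic_words(tokens, word_groups, k=5, hyponyms=None):
    """Return the k highest-weighted words (ties broken by string order)."""
    weights = thematic_weights(tokens, word_groups, hyponyms)
    return sorted(weights, key=lambda w: (-weights[w], w))[:k]


if __name__ == "__main__":
    # Toy thesaurus lines written for this example, not real Cilin entries.
    toy_cilin = ["Xa01A01= 电脑 计算机 微机", "Xa01A02# 软件 程序"]
    tokens = ["计算机", "软件", "电脑", "计算机", "程序", "网络"]
    groups = parse_cilin(toy_cilin)
    print(top_thematic_words(tokens, groups, k=2))  # ['计算机', '电脑']
```

Propagating a fraction of each relative's frequency lets a concept expressed through several synonymous surface forms outrank a word that merely repeats once or twice more; in any real use the factors would presumably need tuning against labeled data.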
This article is indexed in CNKI, VIP (Weipu), Wanfang Data, and other databases.