
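Automatic Summarization Algorithm Based on Local and Global Information
Cite this article: WANG Meng, WANG Xiao-rong, LI Chun-gui, TANG Pei-he. Automatic Summarization Algorithm Based on Local and Global Information[J]. Journal of Guangxi Academy of Sciences, 2007, 23(4): 226-228.
Authors: WANG Meng  WANG Xiao-rong  LI Chun-gui  TANG Pei-he
Affiliation: Department of Computer Engineering, Guangxi University of Technology, Liuzhou, Guangxi 545006, China
Funding: National Natural Science Foundation of China; research project of the Guangxi Department of Education; Doctoral and Master's Fund of Guangxi University of Technology
Abstract: Term weights are computed with an average term frequency strategy, a fast n-grams algorithm is used to weight the concept unit in which each term occurs, and an improved K-means clustering algorithm is used to cluster paragraphs. On this basis, an automatic summarization algorithm based on local and global information is proposed and evaluated. The algorithm obtains the value of k adaptively and effectively prevents the random choice of initial points from affecting the clustering result. The evaluation shows that precision and recall on economics and science-and-technology articles are markedly higher than on news and literary articles, and that classification using the machine-generated summaries is markedly more accurate than classification using the original text. The summaries produced by the algorithm outperform those produced by traditional methods on every measure.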

Keywords: K-means  n-grams  paragraph clustering  natural language understanding
Article ID: 1002-7378(2007)04-0226-03
Received: 2007-09-10
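
The abstract above reports precision and recall for the generated summaries but does not say how they were computed. The following is a minimal sketch, assuming an extractive setting in which a summary is a set of sentence indices and is compared against a human-selected reference set. The function name `precision_recall` and the sample indices are hypothetical and do not come from the paper.

```python
def precision_recall(machine_summary, reference_summary):
    """Sentence-level precision and recall of an extractive summary.

    Both arguments are iterables of sentence identifiers (assumed here to be
    integer indices into the source document).
    """
    machine = set(machine_summary)
    reference = set(reference_summary)
    overlap = len(machine & reference)
    # Precision: share of selected sentences that are also in the reference.
    precision = overlap / len(machine) if machine else 0.0
    # Recall: share of reference sentences that were selected.
    recall = overlap / len(reference) if reference else 0.0
    return precision, recall


# Hypothetical example: the system picked sentences 0, 2, 5, 7;
# the reference summary contains sentences 0, 2, 3.
p, r = precision_recall([0, 2, 5, 7], [0, 2, 3])
```

Under these assumptions, two of the four selected sentences match the reference, and two of the three reference sentences are recovered.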

Research of Automatic Summarization Based on Local and Global Information of Sentences
WANG Meng, WANG Xiao-rong, LI Chun-gui and TANG Pei-he. Research of Automatic Summarization Based on Local and Global Information of Sentences[J]. Journal of Guangxi Academy of Sciences, 2007, 23(4): 226-228.
Authors: WANG Meng  WANG Xiao-rong  LI Chun-gui and TANG Pei-he
Institution: Department of Computer Engineering, Guangxi University of Technology, Liuzhou, Guangxi, 545006, China
Abstract: The idea of our approach is to exploit both the local and global properties of sentences. To obtain the local property, we use a term weighting scheme that employs the average term frequency in a document as the normalization factor, and a fast algorithm for matching N-grams is used to optimize the term weighting. The method uses an improved K-means algorithm to cluster paragraphs and discovers thematic areas from the clustering results. It then integrates the local and global properties to produce the summary. Experiments show that the method is feasible as a basis for a domain-specific automatic abstracting system and is worth further study.
Keywords: K-means  n-grams  paragraph clustering  natural language understanding
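
The abstract names a term weighting scheme normalized by the average term frequency in a document, but gives no formula. The sketch below assumes one common formulation of that idea, w(t) = (1 + ln tf) / (1 + ln avg_tf); whether the paper uses exactly this form is not stated. The function name `avg_tf_weights` and the sample tokens are hypothetical.

```python
import math
from collections import Counter


def avg_tf_weights(tokens):
    """Weight each term by (1 + ln tf) / (1 + ln avg_tf).

    `tokens` is a list of already-segmented terms from one document.
    avg_tf is the mean frequency over the distinct terms of that document,
    so it is at least 1 and the denominator is at least 1.
    """
    counts = Counter(tokens)
    if not counts:
        return {}
    avg_tf = sum(counts.values()) / len(counts)
    denom = 1.0 + math.log(avg_tf)
    return {term: (1.0 + math.log(tf)) / denom for term, tf in counts.items()}


# Hypothetical document: "cluster" occurs twice, "summary" once.
weights = avg_tf_weights(["cluster", "cluster", "summary"])
```

With this normalization, a term that occurs more often than the document's average gets a weight above 1 and a rarer term gets a weight below 1, which damps the effect of document length on raw counts.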
This article is indexed in VIP (维普), Wanfang Data (万方数据), and other databases.