
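Algorithm Implementation and Improvement of the Hedge Trimmer Sentence Compression Technique
Cite this article: JING Xiu-li. Algorithm implementation and improvement of the Hedge Trimmer sentence compression technique[J]. Journal of Shenyang Normal University (Natural Science Edition), 2012, 30(4): 519-524
Author: JING Xiu-li
Affiliation: Graduate School, Dongbei University of Finance and Economics, Dalian 116025, Liaoning, China; Kexin Software College, Shenyang Normal University, Shenyang 110034, China
Funding: National Natural Science Foundation of China; Scientific Research Project for Higher Education Institutions of the Liaoning Provincial Department of Education; "Twelfth Five-Year Plan" Higher Education Research Project of the Liaoning Association of Higher Education
Abstract: Compression techniques aim to simulate the human abilities of text summarization and information extraction. Sentence compression automatically generates short sentences that preserve the core content of the original sentence while remaining grammatical and semantically coherent. This paper analyzes Hedge Trimmer, a parsing-based technique for English sentence compression, discusses the related compression theory, explores the compression process, and implements the algorithm in a C-like language. It proposes that a good compressed sentence should satisfy at least three criteria: first, it retains the core content of the original sentence; second, it is grammatically correct; third, its compressed length is reasonable. For the evaluation, 624 original sentences and their corresponding manually compressed sentences were selected from the DUC 2003 corpus and compared with the compressed sentences generated automatically by the Hedge Trimmer algorithm. Five situations in which the compression results were unsatisfactory were identified, their causes were analyzed, and improvement strategies were proposed. Finally, a comparison on worked examples between the sentences produced by the improved algorithm and those produced by the original algorithm shows that the improved algorithm yields better compressed sentences. The improved Hedge Trimmer sentence compression algorithm is worth promoting and applying in the field of English sentence compression.

Keywords: sentence compression; Hedge Trimmer algorithm; evaluation; improvement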

Algorithm realization and improvement of Hedge Trimmer sentence compression technology
JING Xiu-li. Algorithm realization and improvement of Hedge Trimmer sentence compression technology[J]. Journal of Shenyang Normal University(Natural Science Edition), 2012, 30(4): 519-524
Authors:JING Xiu-li
Affiliation: 1. Graduate School, Dongbei University of Finance and Economics, Dalian 116025, China; 2. Software College, Shenyang Normal University, Shenyang 110034, China
Abstract: Compression technology aims to simulate the human abilities of document summarization and information extraction. Sentence compression automatically generates short sentences that capture the salient information of the original sentences in a grammatical and semantically coherent way. The paper analyzes Hedge Trimmer, a syntax-based technique for English sentence compression, discusses the underlying compression theory, and explores the compression process with an algorithm implementation in a C-like language. The paper proposes that a good compression should meet at least three criteria: first, it retains the main idea of the original sentence; second, it is grammatical; third, it is reasonable in length. For the evaluation, we selected 624 original sentences and their manually compressed counterparts from the DUC 2003 corpus, and compared them with the compressed sentences produced automatically by the Hedge Trimmer algorithm. We found five situations in which the automatically compressed sentences were not ideal, analyzed the causes, and proposed improvement strategies. Finally, a comparison of the new automatically compressed sentences with the old ones on worked examples shows that the refined algorithm produces better compressed sentences. The improved Hedge Trimmer sentence compression algorithm is worth promoting and applying in English sentence compression.
Keywords:sentence compression  Hedge Trimmer algorithm  evaluation  improvement
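
The abstract describes a syntax-based approach in which a parsed sentence is shortened by removing constituents until its length is reasonable. The following is a minimal Python sketch of that general parse-and-trim idea, not the paper's C-like implementation and not its improved algorithm. The tree encoding, the function names (`drop_determiners`, `remove_last_trailing`, `trim`), the choice of removable labels (`PP`, `SBAR`), and the `max_words` threshold are all assumptions made for illustration.

```python
# Toy parse tree (assumed encoding): a phrase is (label, [children]),
# a leaf is (pos_tag, word).

def is_leaf(node):
    return isinstance(node[1], str)

def words(node):
    """Collect the words of a tree in left-to-right order."""
    if is_leaf(node):
        return [node[1]]
    out = []
    for child in node[1]:
        out.extend(words(child))
    return out

def drop_determiners(node):
    """Return a copy of the tree without leaves tagged DT."""
    if is_leaf(node):
        return node
    kept = [drop_determiners(c) for c in node[1]
            if not (is_leaf(c) and c[0] == "DT")]
    return (node[0], kept)

def remove_last_trailing(node, labels=("PP", "SBAR")):
    """Remove one trailing constituent on the right edge of the tree whose
    label is in `labels`, trying the deepest candidate first.
    Returns (new_node, removed_flag)."""
    if is_leaf(node) or not node[1]:
        return node, False
    *rest, last = node[1]
    new_last, removed = remove_last_trailing(last, labels)
    if removed:
        return (node[0], rest + [new_last]), True
    # Only remove `last` if the parent keeps at least one other child.
    if not is_leaf(last) and last[0] in labels and rest:
        return (node[0], rest), True
    return node, False

def trim(tree, max_words=10):
    """Drop determiners, then strip trailing PP/SBAR constituents until the
    sentence has at most `max_words` words or nothing more can be removed."""
    tree = drop_determiners(tree)
    while len(words(tree)) > max_words:
        tree, removed = remove_last_trailing(tree)
        if not removed:
            break
    return " ".join(words(tree))

# Hand-built parse of: "The rebels attacked a village in the north
# on Tuesday after the talks failed"
TREE = ("S", [
    ("NP", [("DT", "The"), ("NNS", "rebels")]),
    ("VP", [
        ("VBD", "attacked"),
        ("NP", [("DT", "a"), ("NN", "village")]),
        ("PP", [("IN", "in"), ("NP", [("DT", "the"), ("NN", "north")])]),
        ("PP", [("IN", "on"), ("NP", [("NNP", "Tuesday")])]),
        ("SBAR", [("IN", "after"), ("S", [
            ("NP", [("DT", "the"), ("NNS", "talks")]),
            ("VP", [("VBD", "failed")]),
        ])]),
    ]),
])

print(trim(TREE, max_words=5))  # rebels attacked village in north
```

The sketch addresses only the length criterion directly. Whether the result keeps the core content and stays grammatical depends on which constituents are allowed to be removed, which is where a real system's rules, and the improvements the paper proposes, would come in.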
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.