Research on Statistics-Based Automatic Extraction of Chinese Keyphrase (基于统计的中文关键短语自动抽取)



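Cite this article:	ZHANG Yong-gang, LIANG Ying-hong, YAN Zhen-xiang, YAO Jian-min. Research on Statistics-Based Automatic Extraction of Chinese Keyphrase[J]. Journal of Southern Yangtze University: Natural Science Edition (江南大学学报(自然科学版)), 2010, 9(1): 26-29.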

Authors:	ZHANG Yong-gang, LIANG Ying-hong, YAN Zhen-xiang, YAO Jian-min

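Affiliations:	1. School of Computer Science and Technology, Soochow University, Suzhou, Jiangsu 215006; 2. Jiangsu Province Support Software Engineering R&D Center for Modern Information Technology Application in Enterprise, Suzhou, Jiangsu 215104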

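Funding:	National Natural Science Foundation of China (60970057); Open Fund of the Jiangsu Province Support Software Engineering R&D Center for Modern Information Technology Application in Enterprise (SX200907)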

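Abstract:	Keyphrases are extracted automatically from a single document using a statistical approach. Experiments confirm that frequency and first-occurrence position are effective features. Several methods are used to filter out invalid word strings; candidate phrases are weighted by combining phrase position with statistical features, and keyphrases are then selected according to the observed distribution of keyphrases. In addition, an analysis of keyphrase distribution provides a basis for filtering N-gram phrases and for selecting them by proportion. The experimental results are fairly good: for the top 5, precision 21.80%, recall 28.27%, and F-measure 25%; for the top 10, precision 17.10%, recall 44.50%, and F-measure 30.80%.
Keywords:	keyphrase extraction; text features; mutual information; N-gram phrases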
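
The abstract above names frequency and first-occurrence position as the two features found effective for weighting candidates. The following is a minimal illustrative sketch of that idea only, not the authors' method: the function name `score_candidates`, the mixing weight `alpha`, the max-frequency normalization, and the linear "earliness" term are all assumptions made here for illustration, and the filtering, mutual-information, and proportional-selection steps described in the abstract are not modeled. Input is assumed to be a document that has already been word-segmented.

```python
from collections import Counter


def score_candidates(tokens, max_n=3, alpha=0.5):
    """Rank n-gram candidates by frequency and first-occurrence position.

    All parameters and the scoring formula are hypothetical choices,
    not taken from the paper.
    """
    total = len(tokens)
    if total == 0:
        return []

    freq = Counter()   # how often each n-gram occurs
    first_pos = {}     # token index of each n-gram's first occurrence
    for n in range(1, max_n + 1):
        for i in range(total - n + 1):
            gram = tuple(tokens[i:i + n])
            freq[gram] += 1
            first_pos.setdefault(gram, i)

    max_freq = max(freq.values())
    scores = {}
    for gram, count in freq.items():
        tf = count / max_freq                      # normalized frequency in (0, 1]
        earliness = 1.0 - first_pos[gram] / total  # 1.0 at the very start of the text
        scores[gram] = alpha * tf + (1 - alpha) * earliness

    # Highest score first; ties are broken by the n-gram itself for determinism.
    return sorted(scores, key=lambda g: (-scores[g], g))


if __name__ == "__main__":
    doc = ["key", "phrase", "extraction", "key", "phrase"]
    for gram in score_candidates(doc)[:5]:
        print(" ".join(gram))
```

Taking the first 5 or 10 entries of the returned list corresponds to the TOP5 and TOP10 settings reported in the abstract, under the assumptions stated above.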
Research on Statistics-Based Automatic Extraction of Chinese Keyphrase

ZHANG Yong-gang, LIANG Ying-hong, YAN Zhen-xiang, YAO Jian-min. Research on Statistics-Based Automatic Extraction of Chinese Keyphrase[J]. Journal of Southern Yangtze University: Natural Science Edition, 2010, 9(1): 26-29.

Authors:	ZHANG Yong-gang, LIANG Ying-hong, YAN Zhen-xiang, YAO Jian-min

Institution:	1. Jiangsu Province Support Software Engineering R&D Center for Modern Information Technology Application in Enterprise, Suzhou 215104, China; 2. School of Computer Science and Technology, Soochow University, Suzhou 215006, China

Abstract:	A statistics-based approach is proposed for automatically extracting keyphrases from Chinese scientific documents. Term frequency and first occurrence are shown to be effective features in this approach. Several filtering methods are used to remove invalid terms. Text features and statistical information are combined to select keyphrases. The final keyphrases are output on the basis of the actual distribution of keyphrases. Keyphrase distribution information provides some experimental evidence for N-gram filtering and output by proportio...

Keywords:	keyphrase extraction; text feature; MI; N-gram
This article is indexed in CNKI, VIP (维普), Wanfang Data (万方数据), and other databases.
