
Research and Implementation of Related Algorithm of Chinese Text Categorization

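Cite this article:	XU Pei-juan, LI Xiong-fei, HUI Yue, ZHANG Gui-lin. Research and Implementation of Related Algorithm of Chinese Text Categorization[J]. Journal of Jilin University (Science Edition), 2009, 47(4): 790-794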

Authors:	XU Pei-juan, LI Xiong-fei, HUI Yue, ZHANG Gui-lin

Affiliation:	College of Computer Science and Technology, Jilin University, Changchun 130012, China (all four authors)

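Funding:	National Natural Science Foundation of China; Major Project of the National Key Technology R&D Program of the 11th Five-Year Plan period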

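Abstract:	Based on an analysis of how word segmentation ambiguity is handled, this paper proposes a context-based bidirectional scanning word segmentation algorithm and improves the segmentation dictionary by adding fixed collocations of words and phrases to it. It discusses the selection of feature terms and the setting of their weights, and introduces the χ² statistic into term weight calculation to address the shortcomings of the commonly used TF-IDF weighting method. It also proposes a term-scoring classification algorithm, which improves the effectiveness of feature terms for text categorization. Experimental results show that the improved weight calculation method performs better.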
Keywords:	text categorization; context; bidirectional scan; vector space model; weighting; feature selection
Received:	2009-01-14
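The abstract does not spell out the segmentation algorithm, so the following is only a rough illustration of what bidirectional scanning usually means: forward and backward maximum matching are run over the same dictionary, and a disagreement between them signals an ambiguity. When the two scans differ, this sketch picks one with a simple heuristic (fewer words, then fewer single-character words); that heuristic is an assumption standing in for the paper's context-based rule, which is not reproduced here. The function names and the toy dictionaries are hypothetical. The sentence 研究生命的起源 ("study the origin of life") is a classic overlap case, and the entry 文本分类 shows how storing a fixed collocation in the dictionary keeps it as one token.

```python
from typing import List, Set


def forward_max_match(text: str, vocab: Set[str], max_len: int) -> List[str]:
    """Forward maximum matching: take the longest dictionary word starting at the left."""
    words: List[str] = []
    i = 0
    while i < len(text):
        # Try the longest candidate first; an unknown single character is kept as-is.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in vocab:
                words.append(piece)
                i += size
                break
    return words


def backward_max_match(text: str, vocab: Set[str], max_len: int) -> List[str]:
    """Backward maximum matching: take the longest dictionary word ending at the right."""
    words: List[str] = []
    j = len(text)
    while j > 0:
        for size in range(min(max_len, j), 0, -1):
            piece = text[j - size:j]
            if size == 1 or piece in vocab:
                words.append(piece)
                j -= size
                break
    words.reverse()  # pieces were collected right to left
    return words


def bidirectional_segment(text: str, vocab: Set[str]) -> List[str]:
    """Run both scans; if they disagree, choose one with a simple heuristic (assumption)."""
    max_len = max([len(w) for w in vocab] + [1])
    fwd = forward_max_match(text, vocab, max_len)
    bwd = backward_max_match(text, vocab, max_len)
    if fwd == bwd:
        return fwd  # both scans agree: no ambiguity detected

    # Hypothetical tie-break, not the paper's context rule:
    # fewer words first, then fewer single-character words; remaining ties go to backward.
    def cost(seg: List[str]) -> tuple:
        return (len(seg), sum(1 for w in seg if len(w) == 1))

    return fwd if cost(fwd) < cost(bwd) else bwd


if __name__ == "__main__":
    vocab = {"研究", "研究生", "生命", "的", "起源"}
    print(forward_max_match("研究生命的起源", vocab, 3))   # ['研究生', '命', '的', '起源']
    print(bidirectional_segment("研究生命的起源", vocab))  # ['研究', '生命', '的', '起源']
    # A fixed collocation stored as one dictionary entry stays one token.
    print(bidirectional_segment("中文文本分类", {"中文", "文本", "分类", "文本分类"}))  # ['中文', '文本分类']
```

Here the forward scan greedily grabs 研究生 and strands 命, while the backward scan recovers 研究 / 生命; both produce four words, so the single-character count decides.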
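The χ² statistic that the abstract brings into term weighting is conventionally computed from a 2×2 contingency table of document counts for a term t and a class c: A = documents of c containing t, B = other documents containing t, C = documents of c without t, D = other documents without t, and χ²(t, c) = N(AD − BC)² / ((A+C)(B+D)(A+B)(C+D)). The abstract does not give the exact formula that combines χ² with TF-IDF, so the sketch below assumes one simple variant, w(t, d) = tf(t, d) · log(N / df(t)) · max over c of χ²(t, c). The function names and the toy corpus are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Set


def chi_square(term: str, cls: str, doc_sets: Sequence[Set[str]], labels: Sequence[str]) -> float:
    """chi^2(t, c) = N(AD - BC)^2 / ((A+C)(B+D)(A+B)(C+D)) over document counts."""
    a = b = c = d = 0
    for tokens, label in zip(doc_sets, labels):
        has_term = term in tokens
        in_class = label == cls
        if has_term and in_class:
            a += 1  # A: class c, contains t
        elif has_term:
            b += 1  # B: other class, contains t
        elif in_class:
            c += 1  # C: class c, lacks t
        else:
            d += 1  # D: other class, lacks t
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return len(doc_sets) * (a * d - b * c) ** 2 / denom if denom else 0.0


def tfidf_chi2_weights(doc: List[str], docs: List[List[str]], labels: List[str]) -> Dict[str, float]:
    """Assumed variant: w(t, d) = tf(t, d) * log(N / df(t)) * max_c chi^2(t, c)."""
    doc_sets = [set(tokens) for tokens in docs]
    n = len(doc_sets)
    classes = sorted(set(labels))
    weights: Dict[str, float] = {}
    for term, tf in Counter(doc).items():
        df = sum(1 for tokens in doc_sets if term in tokens)
        idf = math.log(n / df) if df else 0.0  # terms unseen in training get zero weight
        chi = max((chi_square(term, cls, doc_sets, labels) for cls in classes), default=0.0)
        weights[term] = tf * idf * chi
    return weights


if __name__ == "__main__":
    docs = [["球队", "比赛", "进球"], ["比赛", "球员"], ["股票", "市场", "比赛"], ["股票", "基金"]]
    labels = ["sports", "sports", "finance", "finance"]
    print(chi_square("股票", "finance", [set(t) for t in docs], labels))  # 4.0
    w = tfidf_chi2_weights(["股票", "股票", "比赛"], docs, labels)
    print(sorted(w, key=w.get, reverse=True))  # ['股票', '比赛']
```

In this toy corpus 股票 occurs only in finance documents (χ² = 4), while 比赛 occurs in both classes (χ² = 4/3), so multiplying by χ² widens the gap that plain TF-IDF already puts between them; a term found in every document still gets weight 0 because its idf is 0.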
This article is indexed in CNKI, Wanfang Data, and other databases.
