
Research on a Text Classification Model Based on Random Forest

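Citation:	ZHANG Huawei, WANG Mingwen, GAN Lixin. Research on a text classification model based on random forest[J]. Journal of Shandong University (Natural Science), 2006, 41(3): 139-143.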

Authors:	ZHANG Huawei, WANG Mingwen, GAN Lixin

Affiliation:	Institute of Computational Linguistics, Peking University, Beijing 100871

Funding:	National Natural Science Foundation of China; project funded by the Chinese Academy of Sciences

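Abstract:	With the rapid growth of the WWW, text classification has become a key technology for processing and organizing large volumes of document data. The random forest model is an ensemble of decision trees in which the construction of each tree is determined by a random vector. As the number of trees in the forest increases, the generalization error of the random forest converges to an upper bound. The random forest model was applied to text classification, and experiments on the Reuters-21578 dataset show that it classifies well and performs stably. It was compared with four typical text classifiers (C4.5, KNN, SMO and SVM): it outperforms C4.5 and performs comparably to KNN, SMO and SVM.
Keywords:	text classification; random forest; decision tree; generalization error
Article ID:	1671-9352(2006)03-0139-05
Received:	2006-03-09
Revised:	2006-03-09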
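The abstract describes a random forest as an ensemble of decision trees whose construction is governed by a random vector. The following is a minimal sketch of that idea applied to text. It assumes Breiman-style randomization: each tree gets a bootstrap sample of the documents and a random subset of candidate terms at every split. It also assumes binary term-presence splits scored by Gini impurity and majority voting. It is not the authors' implementation, and all function names and parameter defaults (`n_trees`, `n_features`, `max_depth`) are hypothetical.

```python
import random
from collections import Counter


def vectorize(docs):
    """Return a function mapping a document to term counts over the training vocabulary."""
    vocab = sorted({w for d in docs for w in d.lower().split()})
    index = {w: i for i, w in enumerate(vocab)}

    def to_vec(doc):
        v = [0] * len(vocab)
        for w in doc.lower().split():
            if w in index:  # out-of-vocabulary terms are ignored
                v[index[w]] += 1
        return v

    return to_vec


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def build_tree(X, y, n_features, max_depth, rng, depth=0):
    """Grow one tree; leaves are labels, inner nodes are (term, absent_subtree, present_subtree)."""
    majority = Counter(y).most_common(1)[0][0]
    if depth >= max_depth or len(set(y)) == 1:
        return majority
    n_terms = len(X[0])
    # Random subset of candidate terms at this node (the per-node random choice).
    candidates = rng.sample(range(n_terms), min(n_features, n_terms))
    best, best_score = None, gini(y)
    for f in candidates:
        absent = [i for i, row in enumerate(X) if row[f] == 0]
        present = [i for i, row in enumerate(X) if row[f] != 0]
        if not absent or not present:
            continue
        score = (len(absent) * gini([y[i] for i in absent])
                 + len(present) * gini([y[i] for i in present])) / len(y)
        if score < best_score:
            best, best_score = (f, absent, present), score
    if best is None:  # no candidate term reduces impurity
        return majority
    f, absent, present = best

    def grow(idx):
        return build_tree([X[i] for i in idx], [y[i] for i in idx],
                          n_features, max_depth, rng, depth + 1)

    return (f, grow(absent), grow(present))


def predict_tree(node, x):
    """Follow splits until a leaf label is reached."""
    while isinstance(node, tuple):
        f, absent, present = node
        node = present if x[f] != 0 else absent
    return node


def train_forest(X, y, n_trees=25, n_features=None, max_depth=10, seed=0):
    """Train n_trees trees, each on a bootstrap sample of the training documents."""
    rng = random.Random(seed)
    if n_features is None:
        n_features = max(1, int(len(X[0]) ** 0.5))  # common sqrt(#terms) heuristic
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]  # sample with replacement
        forest.append(build_tree([X[i] for i in idx], [y[i] for i in idx],
                                 n_features, max_depth, rng))
    return forest


def predict_forest(forest, x):
    """Majority vote over the individual trees' predictions."""
    return Counter(predict_tree(t, x) for t in forest).most_common(1)[0][0]


if __name__ == "__main__":
    # Toy, made-up documents for illustration only; not data from the paper.
    docs = ["wheat grain harvest", "grain wheat exports", "corn grain wheat",
            "crude oil barrel", "oil crude opec", "crude oil refinery"]
    labels = ["grain"] * 3 + ["crude"] * 3
    to_vec = vectorize(docs)
    forest = train_forest([to_vec(d) for d in docs], labels)
    for query in ["wheat grain prices", "opec crude output"]:
        print(query, "->", predict_forest(forest, to_vec(query)))
```

Because every tree sees a different bootstrap sample and different candidate terms, individual trees disagree. Averaging their votes is what lets the ensemble's error stabilize as `n_trees` grows, which matches the behaviour the abstract attributes to the model.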
Automatic text classification model based on random forest

SU Qi,XIANG Kun,SUN Bin.Automatic text classification model based on random forest[J].Journal of Shandong University,2006,41(3):139-143.

Authors:	SU Qi, XIANG Kun, SUN Bin

Institution:	Institute of Computational Linguistics, Peking Univ., Beijing 100871, China

Abstract:	Based on an analysis of the focused-crawling algorithm Shark-Search, an improved Shark-Search algorithm with link clustering is proposed. The new algorithm is validated through several comparative experiments. The results show that it identifies the relevance between a link and the focused topic more effectively.

Keywords:	Shark-Search algorithm; focused crawling; link clustering
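The abstract builds on Shark-Search, which ranks unvisited links by a potential score. That score combines a value inherited from the parent page with a value computed from the link's anchor text and its surrounding context. The following is a minimal sketch of that baseline scoring only. The link-clustering extension is not described in the abstract and is not modelled here. Raw-count cosine similarity stands in for the similarity measure. The parameter defaults (`delta`, `beta`, `gamma`) and the function names are illustrative assumptions.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity of two texts using raw term counts (assumed similarity measure)."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[t] * vb[t] for t in va)
    norm = (math.sqrt(sum(c * c for c in va.values()))
            * math.sqrt(sum(c * c for c in vb.values())))
    return dot / norm if norm else 0.0


def potential_score(query, parent_text, parent_inherited, anchor, context,
                    delta=0.5, beta=0.8, gamma=0.5):
    """Return (potential score, inherited score) for one link found on a parent page.

    delta: decay of inherited relevance; beta: anchor vs. context weight;
    gamma: inherited vs. neighborhood weight. Defaults are illustrative only.
    """
    parent_sim = cosine(query, parent_text)
    # A relevant parent passes on its own similarity; otherwise its inherited score decays further.
    inherited = delta * (parent_sim if parent_sim > 0 else parent_inherited)
    anchor_score = cosine(query, anchor)
    # If the anchor text itself is relevant, the context score takes its maximum value.
    context_score = 1.0 if anchor_score > 0 else cosine(query, context)
    neighborhood = beta * anchor_score + (1 - beta) * context_score
    return gamma * inherited + (1 - gamma) * neighborhood, inherited


def rank_links(query, parent_text, parent_inherited, links):
    """Order (url, anchor, context) triples from one page by descending potential score."""
    scored = [(potential_score(query, parent_text, parent_inherited, a, c)[0], url)
              for url, a, c in links]
    return [url for _, url in sorted(scored, reverse=True)]


if __name__ == "__main__":
    links = [("http://example.org/a", "cooking recipes", "pasta sauce"),
             ("http://example.org/b", "random forest tutorial", "decision trees")]
    print(rank_links("random forest", "text classification with random forest", 0.0, links))
```

In a full crawler, the inherited score returned for each child URL would be stored with it in the frontier. It is then passed as `parent_inherited` when that child's own outgoing links are scored.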
This article is indexed in CNKI, VIP, Wanfang Data and other databases.
