Automatic text classification based on sentence correlation (基于句子相关度的文本自动分类)

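Cite this article:	ZHANG You-hua (张友华), XIONG Fan-lun (熊范纶). Automatic text classification based on sentence correlation [J]. Journal of University of Science and Technology of China (中国科学技术大学学报), 2006, 36(5): 540-545.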

Authors:	ZHANG You-hua (张友华), XIONG Fan-lun (熊范纶)

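Affiliations:	1. Department of Automation, University of Science and Technology of China, Hefei, Anhui 230027; Institute of Intelligent Machines, Chinese Academy of Sciences, Hefei, Anhui 230031 2. Institute of Intelligent Machines, Chinese Academy of Sciences, Hefei, Anhui 230031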

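Abstract:	A text classification model based on sentence correlation (TCSC) is proposed. The model uses training samples to update a category corpus automatically and incrementally, computes the category correlation of each sentence from the sentence's position weight and corpus weight, and assembles these values into a sentence correlation matrix, which is then used to classify the document. In the classification phase the model avoids word segmentation of the text to be classified, which matters most for Chinese text, and it blurs the problem of word polysemy. In text classification experiments it achieves recall and precision above 86%. Classification performance can improve further as the corpus continues to be trained and adjusted, and the model is simple to implement.
Keywords:	text classification; corpus; correlation matrix; sentence weight
Article ID:	0253-2778(2006)05-0540-06
Received:	2004-09-13
Revised:	2005-01-17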
Text classification based on sentence correlation

ZHANG You-hua, XIONG Fan-lun. Text classification based on sentence correlation [J]. Journal of University of Science and Technology of China, 2006, 36(5): 540-545.

Authors:	ZHANG You-hua, XIONG Fan-lun

Institution:	1. Department of Automation, University of Science and Technology of China, Hefei 230027, China; 2. Institute of Intelligent Machines, Chinese Academy of Sciences, Hefei 230031, China

Abstract:	A text classification model based on sentence correlation (TCSC) is presented, which automatically and incrementally updates the category corpus with training documents. The category correlation of each sentence is then computed from the sentence position weight and the corpus item weight, yielding a correlation matrix for text classification. In the classification phase, the model avoids word segmentation of Chinese documents and reduces the effect of words with multiple meanings. Experimental results show that the recall and precision of the model both exceed 86% and can be improved further by updating the corpus. The model is also easy to implement.

Keywords:	text classification; corpus; sentence correlation matrix; sentence weight
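
The pipeline outlined in the abstract (incremental corpus update, sentence-level correlation from position and corpus weights, a correlation matrix, then classification) might be sketched roughly as follows. The abstract gives no formulas, so every detail here is an assumption made for illustration and not the authors' method: the class name `CategoryCorpus`, the count-based item weights, the rule that the first and last sentences weigh double, the product of position weight and summed item weights, and the column-sum decision rule are all hypothetical. The one point taken from the abstract is that the text being classified is never segmented into words; corpus items are matched as substrings of each sentence.

```python
import re
from collections import defaultdict


def split_sentences(text):
    """Split on common Chinese and Western sentence terminators."""
    return [s.strip() for s in re.split(r"[。！？；!?;.\n]+", text) if s.strip()]


def position_weight(index, total):
    """Hypothetical rule: the first and last sentences count double."""
    return 2.0 if index in (0, total - 1) else 1.0


class CategoryCorpus:
    """Per-category weighted items, updated incrementally (illustrative only)."""

    def __init__(self):
        self.weights = defaultdict(lambda: defaultdict(float))

    def update(self, category, items):
        # Assumption: each training document has already been reduced to a
        # list of items; every occurrence adds 1 to that item's weight.
        for item in items:
            self.weights[category][item] += 1.0

    def correlation_matrix(self, text):
        """Return (categories, matrix); rows are sentences, columns categories."""
        categories = sorted(self.weights)
        sentences = split_sentences(text)
        matrix = []
        for i, sentence in enumerate(sentences):
            pw = position_weight(i, len(sentences))
            row = []
            for cat in categories:
                # Substring matching: the sentence itself is never segmented.
                hits = sum(w for item, w in self.weights[cat].items()
                           if item in sentence)
                row.append(pw * hits)
            matrix.append(row)
        return categories, matrix

    def classify(self, text):
        """Pick the category with the largest column sum, or None if empty."""
        categories, matrix = self.correlation_matrix(text)
        if not categories or not matrix:
            return None
        totals = [sum(column) for column in zip(*matrix)]
        return categories[totals.index(max(totals))]


# Toy usage with made-up training items.
corpus = CategoryCorpus()
corpus.update("sports", ["足球", "比赛", "球员"])
corpus.update("finance", ["股票", "市场", "银行"])
print(corpus.classify("昨晚的足球比赛很精彩。球员表现出色。股票市场没有变化。"))
```

Because `update` only adds to the stored weights, further training documents can be folded in at any time without rebuilding the corpus, which is one simple way to read the "incremental" update the abstract mentions.
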
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.
