
Method of Chinese text categorization based on the word vector space model

Cite this article:	HU Xue-gang, DONG Xue-chun, XIE Fei. Method of Chinese text categorization based on the word vector space model[J]. Journal of Hefei University of Technology (Natural Science), 2007, 30(10)

Authors:	HU Xue-gang, DONG Xue-chun, XIE Fei

Affiliations:	School of Computer and Information, Hefei University of Technology, Hefei, Anhui 230009; School of Computer and Information, Hefei University of Technology, Hefei, Anhui 230009; Detachment 12, Unit 96161, Chizhou, Anhui 247100

Funding:	Supported by the Natural Science Foundation of Anhui Province (050420207)

Abstract:	Most text categorization methods are based on the vector space model, but document vectors built under this model have high dimensionality, which makes it hard to improve classifier efficiency. To address this shortcoming, this paper proposes a text categorization method based on the word vector space model. The main idea is to represent the feature words of a text as space vectors, obtain a word-class support matrix through training, and then compute the similarity between a text to be classified and each category from the words of that text and the word-class support matrix. Experiments show that the method achieves high classification precision and efficiency.

Keywords:	text categorization; vector space model; K-nearest neighbor; word vector space model
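
The classification procedure outlined in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not give the formula for word-class support or for text-category similarity, so the sketch assumes that support is the share of a word's training occurrences that fall in a class and that similarity is the sum of the supports of the text's words. The function names and the toy data are hypothetical, and the input is assumed to be already segmented into words.

```python
from collections import Counter, defaultdict


def train_support_matrix(docs, labels):
    """Build a word-class support matrix from pre-segmented documents.

    ASSUMPTION: support(word, cls) is the share of the word's training
    occurrences that fall in cls. The abstract does not give the formula.
    """
    counts = defaultdict(Counter)  # word -> Counter mapping class -> occurrences
    for words, label in zip(docs, labels):
        for word in words:
            counts[word][label] += 1
    support = {}
    for word, per_class in counts.items():
        total = sum(per_class.values())
        support[word] = {cls: n / total for cls, n in per_class.items()}
    return support


def class_similarities(words, support, classes):
    """Score each class by summing the support of the text's known words.

    ASSUMPTION: a plain sum is used as the text-category similarity.
    Words never seen in training contribute nothing.
    """
    scores = {cls: 0.0 for cls in classes}
    for word in words:
        for cls, value in support.get(word, {}).items():
            scores[cls] += value
    return scores


def classify(words, support, classes):
    """Return the class with the highest similarity score."""
    scores = class_similarities(words, support, classes)
    return max(classes, key=lambda cls: scores[cls])


# Hypothetical toy corpus, already segmented into words.
train_docs = [
    ["football", "match", "goal"],
    ["team", "goal", "league"],
    ["stock", "market", "price"],
    ["bank", "price", "market"],
]
train_labels = ["sports", "sports", "finance", "finance"]
classes = ["sports", "finance"]

support = train_support_matrix(train_docs, train_labels)
predicted = classify(["goal", "league", "price"], support, classes)
```

With this toy data, `predicted` is `"sports"`: "goal" and "league" each give full support to sports, while "price" gives full support to finance. In this sketch the scoring step only looks up the words that occur in the text, so it never builds a vocabulary-sized document vector, which is in the spirit of the efficiency motivation stated in the abstract.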
This article is indexed in CNKI, Wanfang Data, and other databases.
