
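Key Technologies of Text Classification Systems
Cite this article: XIE Ke, ZHANG Hui, CHEN Peng, PANG Bin. Key Technologies of Text Classification Systems [J]. Journal of Guangxi Normal University (Natural Science Edition), 2007, 25(2): 123-126.
Authors: XIE Ke, ZHANG Hui, CHEN Peng, PANG Bin
Affiliation: State Key Laboratory of Software Development Environment, Beijing University of Aeronautics and Astronautics, Beijing 100083
Funding: Supported by the National Science and Technology Infrastructure Platform Portal Application System Construction Fund (2005DKA63901)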
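Abstract: Part-of-speech selection is considered from the perspective of natural language, and feature words whose document frequency is too low are removed from the perspective of statistics, which avoids the curse of dimensionality. Category feature vectors are extracted by examining both the features of each category and the relationships among categories, and the traditional angle cosine formula is used to measure the similarity between a text and a category. The result is a text classification system whose procedure is simple, which is easy to understand, and which classifies reasonably well.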

Keywords: text classification; angle cosine; vector space model; feature
Article ID: 1001-6600(2007)02-0123-04
Received: 2006-12-15
Revised: 2006-12-15
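
The similarity step named in the abstract can be illustrated with a minimal Python sketch. It covers only the angle cosine comparison between a document vector and category vectors in a vector space model. The abstract does not give the formula used to build category feature vectors, so the averaging in `category_vector` is an assumption made for illustration, and the function names and toy data are hypothetical.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # an empty vector is treated as similar to nothing
    return dot / (norm_u * norm_v)


def category_vector(docs):
    """Assumption: mean term frequency over the category's training documents.

    The paper derives category vectors from category features and
    inter-category relationships; that method is not reproduced here.
    """
    total = Counter()
    for tokens in docs:
        total.update(tokens)
    return {t: c / len(docs) for t, c in total.items()}


def classify(tokens, category_vectors):
    """Return the category whose vector has the highest cosine with the document."""
    doc_vec = Counter(tokens)
    return max(category_vectors, key=lambda c: cosine(doc_vec, category_vectors[c]))


# Toy, made-up training data (already tokenized).
training = {
    "sports": [["match", "team", "goal"], ["team", "coach", "season"]],
    "finance": [["stock", "market", "bank"], ["bank", "loan", "rate"]],
}
vectors = {label: category_vector(docs) for label, docs in training.items()}
predicted = classify(["team", "goal", "season"], vectors)
```

Because cosine similarity normalizes by vector length, a long document and a short one with the same term proportions score the same against a category.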

Key Technologies of Document Classification System
XIE Ke, ZHANG Hui, CHEN Peng, PANG Bin. Key Technologies of Document Classification System [J]. Journal of Guangxi Normal University (Natural Science Edition), 2007, 25(2): 123-126.
Authors: XIE Ke, ZHANG Hui, CHEN Peng, PANG Bin
Institution: National Key Laboratory of Software Development Environment, Beijing University of Aeronautics and Astronautics, Beijing 100083, China
Abstract: Part-of-speech (POS) selection is carried out from a natural-language perspective, and low-frequency features are deleted from a statistical perspective. The features of each category and the relationships among categories are examined to extract category feature vectors. The angle cosine formula is used to measure the similarity between a document and a category. The result is a document classification system that is relatively simple, easy to understand, and works well.
Keywords: document classification; angle cosine; vector space model; feature
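
The feature-selection step described in the abstract (POS selection followed by removal of low-frequency features) could look roughly like the Python sketch below. The abstract does not state which parts of speech are kept or what frequency threshold is used, so the tag set `KEPT_POS`, the `min_df` value, and the pre-tagged toy input are all assumptions for illustration only.

```python
from collections import Counter

# Assumption: keep nouns ("n") and verbs ("v"); the paper's actual tag set is not given.
KEPT_POS = {"n", "v"}


def select_features(tagged_docs, min_df=2, kept_pos=KEPT_POS):
    """Return the words that pass the POS filter and occur in at least min_df documents.

    tagged_docs is a list of documents, each a list of (word, pos) pairs
    produced by some POS tagger (hypothetical here).
    """
    df = Counter()
    for doc in tagged_docs:
        # Count each word once per document to get document frequency.
        seen = {word for word, pos in doc if pos in kept_pos}
        df.update(seen)
    return {word for word, count in df.items() if count >= min_df}


# Toy, made-up input: "d" marks adverbs, which the POS filter drops.
tagged_docs = [
    [("team", "n"), ("wins", "v"), ("quickly", "d"), ("team", "n")],
    [("team", "n"), ("loses", "v"), ("badly", "d")],
    [("bank", "n"), ("wins", "v"), ("quickly", "d")],
]
vocabulary = select_features(tagged_docs)
```

Dropping rare words shrinks the vector space before any similarity is computed, which is how this step limits dimensionality.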
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.