Hierarchical Document Categorization Based on Fisher Linear Discriminant
Cite this article: XU Min, ZHANG Li-ping, ZHU Wu-jia. Hierarchical Document Categorization Based on Fisher Linear Discriminant[J]. Journal of Nanjing University of Science and Technology (Nature Science), 2005, 29(4): 460-463.
Authors: XU Min, ZHANG Li-ping, ZHU Wu-jia
Affiliations: College of Information Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 210016, Jiangsu; College of Science, Nanjing University of Aeronautics and Astronautics, Nanjing 210016, Jiangsu
Funding: National Key Basic Research Development Program ("973" Program) (G1999032701)
Abstract: Documents are categorized hierarchically according to their topics. The idea of the Fisher linear discriminant is used to extract positive feature words and negative feature words for each category, and a hierarchical document categorization algorithm based on the Fisher linear discriminant (HDCF) is given. HDCF removes the assumption, made by general hierarchical categorization algorithms, that feature words occur independently of one another, and it also handles documents that belong to more than one category. In experiments, HDCF is compared with other algorithms using two measures, recall and precision; the results show that HDCF is more effective than the other algorithms.
Keywords: feature selection; positive feature words; negative feature words; Fisher linear discriminant; hierarchical document categorization
Article ID: 1005-9830(2005)04-0460-04
Received: 2004-06-01
Revised: 2004-06-01
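
The abstract does not give the HDCF formulas, so the following is only a minimal sketch of how a Fisher-style criterion could rank words for one category. It assumes a per-term score equal to the squared difference of the mean term frequency inside and outside the category, divided by the sum of the two variances, with the sign taken from the direction of the difference. The function names `fisher_term_scores` and `split_feature_words`, the cutoff `k`, and the small smoothing constant are all hypothetical and are not taken from the paper.

```python
from statistics import mean, pvariance


def fisher_term_scores(docs, labels, category):
    """Return a signed Fisher-style score per term for one category.

    docs   : list of dicts mapping term -> frequency in that document
    labels : list of sets of category names (a document may have several)
    A positive score means the term is more frequent inside the category,
    a negative score means it is more frequent outside it.
    """
    inside = [d for d, cats in zip(docs, labels) if category in cats]
    outside = [d for d, cats in zip(docs, labels) if category not in cats]
    if not inside or not outside:
        raise ValueError("need documents both inside and outside the category")

    vocabulary = set().union(*docs)  # iterating a dict yields its keys
    scores = {}
    for term in vocabulary:
        x_in = [d.get(term, 0) for d in inside]
        x_out = [d.get(term, 0) for d in outside]
        gap = mean(x_in) - mean(x_out)
        spread = pvariance(x_in) + pvariance(x_out)
        # Assumed smoothing constant: avoids division by zero when both
        # groups have zero variance for this term.
        score = gap * gap / (spread + 1e-9)
        scores[term] = score if gap > 0 else -score
    return scores


def split_feature_words(scores, k):
    """Pick up to k positive and up to k negative feature words."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    positive = [term for term, s in ranked[:k] if s > 0]
    negative = [term for term, s in ranked[::-1][:k] if s < 0]
    return positive, negative


if __name__ == "__main__":
    # Toy corpus, invented for illustration only.
    docs = [
        {"goal": 3, "match": 2},
        {"goal": 2, "team": 1},
        {"stock": 3, "market": 2},
        {"stock": 2, "team": 1},
    ]
    labels = [{"sports"}, {"sports"}, {"finance"}, {"finance"}]
    scores = fisher_term_scores(docs, labels, "sports")
    print(split_feature_words(scores, k=1))
```

Because each label is a set, a document can count as "inside" several categories at once, which mirrors the multi-category case the abstract mentions. In a hierarchical setting, a scorer of this kind would presumably be run at each node of the topic tree against that node's sibling categories, but that arrangement is an assumption here as well.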
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.