Lazy learner text categorization algorithm based on embedded feature selection
Authors:Yan Peng  Zheng Xuefeng  Zhu Jianyong  Xiao Yunhong
Affiliation:1. Information Engineering School, Univ. Science and Technology Beijing, Beijing 100083, P. R. China; China State Information Center, Beijing 100045, P. R. China
2. Information Engineering School, Univ. Science and Technology Beijing, Beijing 100083, P. R. China
3. China State Information Center, Beijing 100045, P. R. China
Abstract:To avoid the curse of dimensionality, text categorization (TC) algorithms based on machine learning (ML) have to use a feature selection (FS) method to reduce the dimensionality of the feature space. Although widely used, the FS process generally causes information loss, which in turn degrades the overall performance of TC algorithms. Exploiting the sparsity of text vectors, a new TC algorithm based on lazy feature selection (LFS) is presented. As a new type of embedded feature selection approach, the LFS method can greatly reduce the dimension of the features without any information loss, which improves both the efficiency and the performance of the algorithm. Experiments show that the new algorithm simultaneously achieves much higher performance and efficiency than several classical TC algorithms.
Keywords:machine learning  text categorization  embedded feature selection  lazy learner  cosine similarity
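
The abstract does not give the details of LFS, so the following is only a minimal sketch of one way sparsity can make per-query feature restriction lossless in a lazy (nearest-neighbour) learner with cosine similarity. It is not the authors' algorithm. The function names, the raw term-frequency weighting, the inverted index, and the similarity-weighted vote with `k=3` are all assumptions made for illustration. The idea shown: the dot product between a query and a training document only involves terms that occur in the query, so visiting only those terms yields exactly the same cosine scores as using the full feature space.

```python
import math
from collections import Counter, defaultdict


def to_vector(text):
    """Sparse bag-of-words vector {term: count} (assumed weighting: raw counts)."""
    return Counter(text.lower().split())


def norm(vec):
    """Euclidean norm of a sparse vector."""
    return math.sqrt(sum(w * w for w in vec.values()))


def build_index(train):
    """Build an inverted index term -> [(doc_id, weight)], plus norms and labels.

    No feature is discarded here; every term of every training document is kept.
    """
    index = defaultdict(list)
    norms, labels = [], []
    for doc_id, (text, label) in enumerate(train):
        vec = to_vector(text)
        for term, weight in vec.items():
            index[term].append((doc_id, weight))
        norms.append(norm(vec))
        labels.append(label)
    return index, norms, labels


def classify(text, index, norms, labels, k=3):
    """Label a document by a similarity-weighted vote of its k nearest neighbours."""
    query = to_vector(text)
    q_norm = norm(query)
    dots = defaultdict(float)
    # Feature restriction happens at query time: only terms present in the
    # query are visited. Every other term contributes 0 to the dot product,
    # so the cosine scores are identical to a full-dimensional computation.
    for term, q_weight in query.items():
        for doc_id, weight in index.get(term, ()):
            dots[doc_id] += q_weight * weight
    if not dots:
        return None  # the query shares no term with any training document
    ranked = sorted(
        ((dot / (q_norm * norms[doc_id]), doc_id) for doc_id, dot in dots.items()),
        reverse=True,
    )[:k]
    votes = defaultdict(float)
    for similarity, doc_id in ranked:
        votes[labels[doc_id]] += similarity
    return max(votes, key=votes.get)
```

In this sketch the cost of classifying a document grows with the number of distinct terms in the query and the length of their posting lists, not with the size of the full vocabulary.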
This article is indexed in databases including VIP (维普) and Wanfang Data (万方数据).
Source: Journal of Systems Engineering and Electronics (《系统工程与电子技术(英文版)》)