Lazy learner text categorization algorithm based on embedded feature selection |
| |
Authors: | Yan Peng, Zheng Xuefeng, Zhu Jianyong, Xiao Yunhong |
| |
Affiliation: | 1. Information Engineering School, Univ. Science and Technology Beijing, Beijing 100083, P. R. China; China State Information Center, Beijing 100045, P. R. China 2. Information Engineering School, Univ. Science and Technology Beijing, Beijing 100083, P. R. China 3. China State Information Center, Beijing 100045, P. R. China |
| |
Abstract: | To avoid the curse of dimensionality, text categorization (TC) algorithms based on machine learning (ML) have to use a feature selection (FS) method to reduce the dimensionality of the feature space. Although widely used, the FS process generally causes information loss, which degrades the overall performance of TC algorithms. Exploiting the sparsity of text vectors, a new TC algorithm based on lazy feature selection (LFS) is presented. As a new type of embedded feature selection approach, the LFS method can greatly reduce the feature dimension without any information loss, which improves both the efficiency and the performance of the algorithm. Experiments show that the new algorithm simultaneously achieves much higher performance and efficiency than several classical TC algorithms. |
| |
Keywords: | machine learning; text categorization; embedded feature selection; lazy learner; cosine similarity |
Indexed in: | VIP (维普), Wanfang Data (万方数据), and other databases |
Source: | Journal of Systems Engineering and Electronics (《系统工程与电子技术(英文版)》) |
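The abstract does not spell out how LFS works, so the following is only a minimal sketch of the general idea it names: a lazy (nearest-neighbour style) learner that uses cosine similarity and exploits the sparsity of text vectors. It assumes that the "lazy" selection amounts to touching only the features that are non-zero in the document being classified; terms absent from that document contribute nothing to the dot product, so skipping them loses no information. The function names (`to_vector`, `cosine`, `classify`), the term-frequency weighting, the similarity-weighted vote, and the toy data are all hypothetical and are not taken from the paper.

```python
import math
from collections import Counter, defaultdict


def to_vector(tokens):
    """Sparse term-frequency vector: only terms that occur are stored."""
    return dict(Counter(tokens))


def norm(vec):
    """Euclidean length of a sparse vector."""
    return math.sqrt(sum(w * w for w in vec.values()))


def cosine(query, doc):
    """Cosine similarity that iterates only over the query's non-zero terms.

    Terms missing from the query add zero to the dot product, so this
    restriction changes nothing in the result (assumed reading of LFS).
    """
    qn, dn = norm(query), norm(doc)
    if qn == 0 or dn == 0:
        return 0.0
    dot = sum(w * doc.get(term, 0.0) for term, w in query.items())
    return dot / (qn * dn)


def classify(query_tokens, training_set, k=3):
    """Label a document by a similarity-weighted vote of its k nearest neighbours.

    training_set is a list of (sparse_vector, label) pairs; no model is
    built in advance, which is what makes the learner "lazy".
    """
    query = to_vector(query_tokens)
    scored = sorted(
        ((cosine(query, vec), label) for vec, label in training_set),
        key=lambda pair: pair[0],
        reverse=True,
    )
    votes = defaultdict(float)
    for sim, label in scored[:k]:
        votes[label] += sim
    return max(votes, key=votes.get) if votes else None


# Toy data, invented purely for illustration.
training = [
    (to_vector("stock market shares fall".split()), "finance"),
    (to_vector("bank raises interest rates".split()), "finance"),
    (to_vector("team wins football match".split()), "sports"),
    (to_vector("player scores in final match".split()), "sports"),
]

print(classify("football team scores".split(), training, k=3))
```

With this toy data the query shares terms only with the two sports documents, so the vote goes to that class. The cost of each comparison scales with the number of distinct terms in the query rather than with the vocabulary size, which is the efficiency argument the abstract makes for sparsity-based selection.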