Non-Independent Term Selection for Chinese Text Categorization
Authors: LI Jingyang, SUN Maosong
Institution: Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China
Abstract: Chinese text categorization differs from English text categorization in that its term set (of words or character n-grams) is much larger, which makes both training and running modern high-performance classifiers very slow. This study hypothesizes that this high-dimensionality problem is related to redundancy in the term set, which traditional term selection methods cannot remove because they evaluate each term independently. A greedy algorithm framework named "non-independent term selection" is presented, which reduces this redundancy according to string-level correlations between terms. Several preliminary implementations of the idea are demonstrated. Experimental results show that a good tradeoff can be reached between classification performance and the size of the term set.
Keywords: Chinese text categorization; term selection; dimensionality
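The abstract does not spell out the redundancy criterion, so the following is only a minimal sketch of what a greedy, non-independent selection over character n-grams could look like. It assumes (hypothetically, not as the authors' method) that two terms are "string-level correlated" when one is a substring of the other and their document sets overlap by a Jaccard index of at least `overlap`; document frequency stands in for whatever independent score (e.g. chi-square) would normally rank the terms. All function names and parameters are illustrative.

```python
from collections import defaultdict


def char_ngrams(text, n_max=2):
    """Return the set of character n-grams (1 <= n <= n_max) in text."""
    grams = set()
    for n in range(1, n_max + 1):
        for i in range(len(text) - n + 1):
            grams.add(text[i:i + n])
    return grams


def build_postings(docs, n_max=2):
    """Map each n-gram term to the set of document ids containing it."""
    postings = defaultdict(set)
    for doc_id, text in enumerate(docs):
        for gram in char_ngrams(text, n_max):
            postings[gram].add(doc_id)
    return postings


def jaccard(a, b):
    """Jaccard overlap of two document-id sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def select_terms(postings, k, overlap=0.8):
    """Greedily pick up to k terms in descending score order.

    Hypothetical redundancy rule: skip a candidate if it is a substring or
    superstring of an already-selected term AND their document sets have
    Jaccard overlap >= `overlap`. Score = document frequency (a stand-in for
    any independent term-selection score); ties are broken by the term string.
    """
    ranked = sorted(postings, key=lambda t: (-len(postings[t]), t))
    selected = []
    for term in ranked:
        if len(selected) == k:
            break
        redundant = any(
            (term in s or s in term)
            and jaccard(postings[term], postings[s]) >= overlap
            for s in selected
        )
        if not redundant:
            selected.append(term)
    return selected


docs = ["中文文本分类", "中文分词", "文本分类方法"]
postings = build_postings(docs)
print(select_terms(postings, 4))                # redundancy filter on
print(select_terms(postings, 4, overlap=1.01))  # filter effectively off
```

With the filter on, the bigram 中文 is skipped because it occurs in exactly the same documents as the already-selected unigram 中, so the budget goes to 分类 instead. The greedy order means each decision depends on what has already been kept, which is the sense in which such a selection is "non-independent".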
This article is indexed in CNKI, Wanfang Data, ScienceDirect, and other databases.