
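Research on Email classification based on topic features
Cite this article: YU Kun, CAI Qing-sheng. Research on Email classification based on topic features[J]. Journal of University of Science and Technology of China, 2006, 36(5): 535-539.
Authors: YU Kun  CAI Qing-sheng
Affiliation: Department of Computer Science and Technology, University of Science and Technology of China, Hefei, Anhui 230027
Funding: National Natural Science Foundation of China; project funded by the Chinese Academy of Sciences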
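Abstract: To address the low overall performance (F-score) of Email classification based on word features, an Email classification method based on topic features is proposed. The method uses domain knowledge and statistical information to extract topic features from the word feature space of Emails, and then classifies Emails using the extracted topic features. In a classification test on 1080 Emails, the results show that, because topic features express the main topic of an Email more accurately, the method improves the average F-score over the word-feature-based method by 13.16% when classifying on the full text of the Email and by 17.16% when classifying on the subject line, raising the average F-score to 72.37%, which can basically meet the needs of practical applications.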

Keywords: Email classification  topic features  word features
Article ID: 0253-2778(2006)05-0535-05
Received: January 25, 2005
Revised: September 23, 2005

Email classification based on topic feature
YU Kun, CAI Qing-sheng. Email classification based on topic feature[J]. Journal of University of Science and Technology of China, 2006, 36(5): 535-539.
Authors: YU Kun  CAI Qing-sheng
Institution: Department of Computer Science and Technology, University of Science and Technology of China, Hefei 230027, China
Abstract: To address the low F-score of Email classification based on word features, an Email classification approach based on topic features is presented. The approach extracts domain topic features and statistical topic features from the feature space, using domain knowledge and statistical information respectively, and then classifies Emails with the extracted topic features. Experimental results on 1080 Emails show that, compared with the word-feature-based approach, this approach improves the average F-score by 13.16% when classifying on the full text (body and subject) and by 17.16% when classifying on the subject alone, raising the average F-score to 72.37%, which can basically meet the requirements of practical applications.
Keywords: Email classification  topic feature  word feature
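
The abstract reports its results as an average F-score but does not give the formula. As a minimal sketch, assuming the F-score is the balanced F1 (harmonic mean of precision and recall) and that "average" means an unweighted mean over categories (neither assumption is confirmed by the abstract), the metric could be computed as follows. The function names and the toy category labels are hypothetical and are not taken from the paper.

```python
from collections import Counter


def f1_per_category(y_true, y_pred):
    """Return {category: F1}, assuming F1 = 2PR / (P + R) per category."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted category gains a false positive
            fn[truth] += 1  # true category gains a false negative
    scores = {}
    for cat in set(y_true) | set(y_pred):
        predicted = tp[cat] + fp[cat]
        actual = tp[cat] + fn[cat]
        precision = tp[cat] / predicted if predicted else 0.0
        recall = tp[cat] / actual if actual else 0.0
        denom = precision + recall
        scores[cat] = 2 * precision * recall / denom if denom else 0.0
    return scores


def average_f1(y_true, y_pred):
    """Unweighted (macro) mean of per-category F1; an assumed averaging scheme."""
    scores = f1_per_category(y_true, y_pred)
    return sum(scores.values()) / len(scores)


# Toy labels for illustration only (not data from the paper).
y_true = ["work", "work", "spam", "personal"]
y_pred = ["work", "spam", "spam", "personal"]
print(round(average_f1(y_true, y_pred), 4))
```

Under these assumptions, the reported gains of 13.16% and 17.16% would correspond to differences in this averaged value between the topic-feature and word-feature classifiers; the abstract does not say whether the gains are absolute or relative.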
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.