Chinese spam microblog filtering based on the fusion of multi-angle features
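Cite this article: YU Ran, LIU Chun-yang, JIN Xiao-long, WANG Yuan-zhuo, CHENG Xue-qi. Chinese spam microblog filtering based on the fusion of multi-angle features[J]. Journal of Shandong University (Natural Science Edition), 2013(11): 53-58.
Authors: YU Ran, LIU Chun-yang, JIN Xiao-long, WANG Yuan-zhuo, CHENG Xue-qi
Affiliations: [1] Research Center of Network Data Science and Engineering, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190; [2] University of Chinese Academy of Sciences, Beijing 100190; [3] National Computer Network Emergency Response Technical Team/Coordination Center of China, Beijing 100029
Funding: National Basic Research Program of China (973 Program) (2012CB316303, 2012BAH39804); National High Technology Research and Development Program of China (863 Program) (2012AA011003); Key Program of the National Natural Science Foundation of China (60933005, 61232010); General Program of the National Natural Science Foundation of China (61173(Y34)); National 242 Program (2012F124)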
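Abstract: Microblogs contain valuable information tied to specific topics, such as hot spots of public opinion. As a result, microblog data analysis (e.g., topic detection) has become an active research area. Because microblog content and form are almost entirely unconstrained, such work must cope with heavy noise from spam and with the difficulty of extracting useful data. However, there is currently no effective method for filtering Chinese spam microblogs on non-public topics. This paper proposes a spam microblog filtering method based on the fusion of multi-angle features. The method first builds rules from two views of a microblog, its structure and its content, then fuses these rules with the word segmentation results of the microblog text to construct composite features, which are used to filter spam microblogs. Experiments on a real dataset show that the fused multi-angle features clearly improve filtering performance.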

Keywords: spam microblog filtering; feature selection; multi-angle feature fusion
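The abstract describes the pipeline only at a high level: rule features from a structure view and a content view are fused with word segmentation tokens into composite features, which are then passed to a classifier. The sketch below illustrates that fusion step under explicit assumptions. The specific rules (URL, three or more @-mentions, a `#topic#` tag, an 11-digit phone number, a small ad vocabulary), the `R:`/`W:` feature prefixes, the toy training data, and the choice of a binarized multinomial Naive Bayes classifier are all hypothetical and are not taken from the paper. Text is assumed to be already segmented into space-separated words by some Chinese word segmenter.

```python
import math
import re
from collections import Counter

# HYPOTHETICAL rules: the paper builds rules from a structure view and a
# content view but does not list them in the abstract; these are illustrative.
URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\S+")
HASHTAG_RE = re.compile(r"#[^#]+#")   # Weibo-style topic tag, e.g. #topic#
PHONE_RE = re.compile(r"\d{11}")      # 11-digit mobile-number pattern
AD_WORDS = {"代购", "加微信", "优惠", "免费"}  # hypothetical ad vocabulary


def structure_rules(raw):
    """Structure-view rule flags (hypothetical)."""
    return {
        "R:has_url": bool(URL_RE.search(raw)),
        "R:many_mentions": len(MENTION_RE.findall(raw)) >= 3,
        "R:has_hashtag": bool(HASHTAG_RE.search(raw)),
    }


def content_rules(raw, tokens):
    """Content-view rule flags (hypothetical)."""
    return {
        "R:has_phone": bool(PHONE_RE.search(raw)),
        "R:ad_word": any(t in AD_WORDS for t in tokens),
    }


def fused_features(raw):
    """Fuse segmentation tokens (W:) with fired rule flags (R:) into one set.

    Assumes `raw` is already word-segmented with spaces between words.
    """
    tokens = raw.split()
    rules = {**structure_rules(raw), **content_rules(raw, tokens)}
    feats = {"W:" + t for t in tokens}
    feats |= {name for name, fired in rules.items() if fired}
    return feats


class BinarizedNaiveBayes:
    """Multinomial Naive Bayes over feature sets, with add-one smoothing."""

    def fit(self, samples, labels):
        self.prior = Counter(labels)
        self.counts = {c: Counter() for c in self.prior}
        self.vocab = set()
        for feats, y in zip(samples, labels):
            self.counts[y].update(feats)  # each feature counted once per post
            self.vocab |= feats
        self.total = {c: sum(cnt.values()) for c, cnt in self.counts.items()}
        return self

    def predict(self, feats):
        n, v = sum(self.prior.values()), len(self.vocab)

        def log_score(c):
            s = math.log(self.prior[c] / n)
            for f in feats & self.vocab:  # features unseen in training are ignored
                s += math.log((self.counts[c][f] + 1) / (self.total[c] + v))
            return s

        return max(self.prior, key=log_score)


# Toy, hand-made examples (not the paper's dataset).
train = [
    ("加微信 免费 领取 优惠 券 http://t.cn/a1", "spam"),
    ("代购 正品 优惠 联系 13800138000", "spam"),
    ("免费 抽奖 @a @b @c 转发 http://t.cn/b2", "spam"),
    ("今天 北京 下雨 了 心情 不错", "ham"),
    ("#春运# 回家 的 火车票 终于 买到 了", "ham"),
    ("这 篇 论文 的 实验 很 扎实", "ham"),
]
model = BinarizedNaiveBayes().fit(
    [fused_features(text) for text, _ in train], [y for _, y in train]
)

if __name__ == "__main__":
    for text in ["加微信 送 优惠 http://t.cn/c3", "今天 的 火车票 买到 了"]:
        print(text, "->", model.predict(fused_features(text)))
```

Prefixing rule flags with `R:` and tokens with `W:` keeps both views in a single vocabulary without name collisions, so any classifier that accepts sparse binary features could stand in for the Naive Bayes model used here.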

Chinese spam microblog filtering based on the fusion of multi-angle features
YU Ran, LIU Chun-yang, JIN Xiao-long, WANG Yuan-zhuo, CHENG Xue-qi. Chinese spam microblog filtering based on the fusion of multi-angle features[J]. Journal of Shandong University (Natural Science Edition), 2013(11): 53-58.
Authors: YU Ran, LIU Chun-yang, JIN Xiao-long, WANG Yuan-zhuo, CHENG Xue-qi
Institution: 1. Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China; 2. University of Chinese Academy of Sciences, Beijing 100190, China; 3. National Computer Network Emergency Response Technical Team/Coordination Center of China, Beijing 100029, China
Abstract: As microblogs contain valuable information, data analysis on microblogs, such as topic detection, has become a research hotspot. Due to the high flexibility of microblog content and form, noisy data poses a major challenge for microblog analysis. However, no effective method has yet been developed for filtering Chinese spam microblogs on non-public topics. To fill this gap, a new method is proposed that fuses multi-angle features extracted from both the content and the structure of microblogs. The fused features are then used by classifiers to filter spam microblogs. Experiments on real data demonstrate that the fusion of multi-angle features effectively improves spam filtering performance.
Keywords: spam microblog filtering; feature selection; multi-angle feature fusion