
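Online Classification Algorithm for Uncertain Data Streams in a Big Data Environment
Cite this article:LYU Yan-xia, WANG Cui-rong, WANG Cong, YU Chang-yong. Online Classification Algorithm for Uncertain Data Stream in Big Data[J]. Journal of Northeastern University (Natural Science), 2016, 37(9): 1245-1249. DOI: 10.12068/j.issn.1005-3026.2016.09.007
Authors:LYU Yan-xia  WANG Cui-rong  WANG Cong  YU Chang-yong
Affiliation:(School of Information Science & Engineering, Northeastern University, Shenyang 110819, Liaoning, China)
Funding:National Natural Science Foundation of China (61300195); Natural Science Foundation of Hebei Province (F2014501078); Scientific Research Project of the Education Department of Liaoning Province (L2013099); Research Fund of Northeastern University at Qinhuangdao (XNK201402).
Abstract:In a big data environment, data are commonly uncertain because of privacy protection, data loss, and similar causes. In a data stream system, data arrive continuously, can be scanned only once, and cannot all be obtained at one time, so an incremental classification model is needed to classify uncertain data streams. Building on the VFDT algorithm, this paper proposes the WBVFDTu algorithm, which analyzes uncertain information quickly and effectively in both the learning and the classification stage. During learning, the decision tree model is constructed using the Hoeffding bound. During classification, a weighted Bayes classifier at the leaf nodes of the decision tree improves the model's classification accuracy and the algorithm's execution efficiency. The results show that the algorithm learns uncertain data streams very quickly and improves classification accuracy.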

Keywords:uncertain data stream  weighted Bayes  VFDT  classification algorithm  big data  

Online Classification Algorithm for Uncertain Data Stream in Big Data
LYU Yan-xia,WANG Cui-rong,WANG Cong,YU Chang-yong. Online Classification Algorithm for Uncertain Data Stream in Big Data[J]. Journal of Northeastern University(Natural Science), 2016, 37(9): 1245-1249. DOI: 10.12068/j.issn.1005-3026.2016.09.007
Authors:LYU Yan-xia  WANG Cui-rong  WANG Cong  YU Chang-yong
Affiliation:School of Information Science & Engineering, Northeastern University, Shenyang 110819, China.
Abstract:In big data environments, data are often uncertain because of privacy protection, data loss, and similar causes. In a data stream system, data arrive continuously, can be scanned only once, and cannot all be obtained at one time. An incremental classification model is therefore constructed to handle uncertain data stream classification. Building on the VFDT (very fast decision tree) algorithm, the paper presents WBVFDTu, a weighted Bayes classifier based on VFDT for uncertain data streams. Uncertain information can be analyzed quickly and effectively in both the learning stage and the classification stage. In the learning stage, a decision tree model for the uncertain data stream is constructed quickly using the Hoeffding bound. In the classification stage, a weighted Bayes classifier in the tree leaves is used to improve classification performance. Experimental results show that the proposed algorithm learns uncertain data streams very quickly and improves the classification performance of the model.
Keywords:uncertain data stream  weighted Bayes  VFDT(very fast decision tree)  classification algorithm  big data  
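
The abstract states that the learning stage relies on the Hoeffding bound but does not give the details. As a minimal sketch, assuming the split test of standard VFDT rather than the exact WBVFDTu procedure, the bound is ε = sqrt(R² ln(1/δ) / (2n)), and a leaf is split once the gap between the two best split scores exceeds ε. The function names `hoeffding_bound` and `should_split`, and the choice of information gain with range log2(number of classes), are illustrative assumptions and are not taken from the paper.

```python
import math


def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    """Return epsilon such that, with probability 1 - delta, the true mean
    of a variable with the given range lies within epsilon of the mean
    observed over n independent samples."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def should_split(best_gain: float, second_gain: float,
                 num_classes: int, delta: float, n: int) -> bool:
    """Standard VFDT-style test (assumed here): split when the best
    attribute beats the runner-up by more than the Hoeffding bound."""
    # Information gain is bounded by log2(number of classes).
    value_range = math.log2(num_classes)
    return best_gain - second_gain > hoeffding_bound(value_range, delta, n)


if __name__ == "__main__":
    # Hypothetical values: 2 classes, delta = 1e-7, 1000 examples at the leaf.
    print(hoeffding_bound(1.0, 1e-7, 1000))
    print(should_split(0.30, 0.15, 2, 1e-7, 1000))
```

Because ε shrinks as n grows, a leaf that cannot yet separate two close candidates simply waits for more examples, which is what lets a tree of this kind be built in a single pass over the stream.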
This article is indexed in CNKI and other databases.