
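Research and Implementation of Kernel Malicious Code Detection Based on Machine Learning
Cite this article: TIAN Dong-hai, WEI Hang, ZHANG Bo, YU Yu-lei, LI Jia-suo, MA Rui. Research and Implementation of Kernel Malicious Code Detection Based on Machine Learning[J]. Journal of Beijing Institute of Technology (Natural Science Edition), 2020, 40(12): 1295-1301.
Authors: TIAN Dong-hai, WEI Hang, ZHANG Bo, YU Yu-lei, LI Jia-suo, MA Rui
Affiliation: 1. Beijing Key Laboratory of Software Security Engineering Technology, School of Computer Science, Beijing Institute of Technology, Beijing 100081
Funding: National Key R&D Program of China (2016QY07X1404); National Natural Science Foundation of China (61602035); Open Fund of the Shanxi Military and Civilian Integration Software Engineering Technology Research Center
Abstract (translated from the Chinese): As computer science advances, the world depends more and more on computers, computer security matters more and more, and malicious code is the greatest enemy that computer security faces. Because traditional malicious code detection and analysis techniques can no longer meet current needs, this paper proposes using machine learning with new classification features to identify malicious programs and to give them a preliminary family classification. It points out the shortcomings of earlier machine-learning work on malicious code detection and classification and selects features that discriminate better. First, the n-gram algorithm is used to optimize the opcode features in the disassembled code of malicious programs. Then a bag-of-words model and the TF-IDF algorithm are used to optimize the API call features. Finally, the model is implemented in code and trained and tested on a data set. In the experiments, the model using the decision tree algorithm reached a classification accuracy of 87.41%, and the model using the random forest algorithm reached 90.06%. The results show that the proposed features work better than the features previously applied to malicious code detection and classification.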

Keywords: malicious code classification; random forest; decision tree; opcode; API
Received: 2019-10-11

Research and Implementation of Kernel Malicious Code Detection Based on Machine Learning
TIAN Dong-hai, WEI Hang, ZHANG Bo, YU Yu-lei, LI Jia-suo, MA Rui. Research and Implementation of Kernel Malicious Code Detection Based on Machine Learning[J]. Journal of Beijing Institute of Technology (Natural Science Edition), 2020, 40(12): 1295-1301.
Authors: TIAN Dong-hai, WEI Hang, ZHANG Bo, YU Yu-lei, LI Jia-suo, MA Rui
Institution: 1. Beijing Key Laboratory of Software Security Engineering Technology, School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China; 2. Shanxi Military and Civilian Integration Software Engineering Technology Research Center, Taiyuan, Shanxi 030051, China
Abstract: As computer science develops, the world depends more and more on computers, computer security grows in importance, and malicious code is the greatest threat it faces. This paper proposes a machine-learning method that uses new classification features to identify malicious programs and to assign them a preliminary family classification. It points out shortcomings of earlier machine-learning approaches to malicious code detection and classification and selects features that discriminate better. First, the n-gram algorithm is used to optimize the opcode features in the disassembled code of malicious programs. Next, a bag-of-words model and the TF-IDF algorithm are used to optimize the API call features. Finally, the model is implemented and then trained and tested on a data set. In the experiments, the model reaches a classification accuracy of 87.41% with the decision tree algorithm and 90.06% with the random forest algorithm. The results show that the proposed features perform better than features previously used for malicious code detection and classification.
Keywords: malicious code classification; random forest; decision tree; opcode; API
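
The abstract names two feature families, n-grams over opcodes and bag-of-words/TF-IDF over API calls, but this page gives no implementation details. The following is a minimal sketch of both ideas in plain Python, not the authors' code. The function names `opcode_ngrams` and `tfidf`, the toy opcode sequence, the example API names, and the unsmoothed weighting `tf = count / len(doc)`, `idf = log(N / df)` are all assumptions made for illustration.

```python
import math
from collections import Counter


def opcode_ngrams(opcodes, n=2):
    """Count overlapping n-grams in one program's opcode sequence."""
    grams = zip(*(opcodes[i:] for i in range(n)))
    return Counter(" ".join(g) for g in grams)


def tfidf(docs):
    """Bag-of-words TF-IDF over API-call lists, one list per program.

    Assumed weighting: tf = count / len(doc), idf = log(N / df).
    Libraries often add smoothing, so their numbers will differ.
    """
    n_docs = len(docs)
    # Document frequency: in how many programs each API name appears.
    df = Counter(api for doc in docs for api in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)  # bag of words: call order is discarded
        weights.append({
            api: (c / len(doc)) * math.log(n_docs / df[api])
            for api, c in counts.items()
        })
    return weights


# Hypothetical toy data, not taken from the paper's data set.
sample_opcodes = ["push", "mov", "call", "push", "mov", "ret"]
sample_api_calls = [
    ["ZwOpenFile", "ZwReadFile", "ZwClose"],
    ["ZwOpenFile", "ZwWriteFile", "ZwWriteFile", "ZwClose"],
]

bigrams = opcode_ngrams(sample_opcodes, n=2)
api_weights = tfidf(sample_api_calls)
```

In this sketch an API name that occurs in every program gets weight zero, which is the property that makes TF-IDF favor calls that distinguish one sample from another. The resulting counts and weights could then be assembled into feature vectors for a decision tree or random forest classifier, the two algorithms the abstract reports results for.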