Malware Detection Based on Longest Frequent API Sequence



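Citation:	HUANG Kun-Ming, ZHANG Lei, ZHAO Kui, LIU Liang. Malware Detection Based on Longest Frequent API Sequence[J]. Journal of Sichuan University (Natural Science Edition), 2020, 57(4): 681-688.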

Authors:	HUANG Kun-Ming, ZHANG Lei, ZHAO Kui, LIU Liang

Affiliations:	School of Cyber Science and Engineering, Sichuan University, Chengdu 610065 (all four authors)

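Funding:	Provincial Natural Science Foundation; National High Technology Research and Development Program of China (863 Program)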

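Abstract:	Existing malware detection methods based on mining dynamic API sequences ignore the behavioral differences between malware categories, so they mine the sequences that represent malicious behavior poorly and their detection performance is correspondingly low. This paper introduces goal-oriented association mining and proposes a longest frequent sequence mining algorithm whose mined longest frequent sequences serve as features for malware detection. The method works in three steps. First, it extracts the dynamic API sequences of the sample files and preprocesses them. Second, it applies the longest frequent sequence mining algorithm to obtain a set of longest frequent sequences for each of several categories. Finally, it builds a bag-of-words model from the mined sequence sets, uses that model to convert each sample's dynamic API sequence into a vector, and trains a random forest classifier on these vectors to detect malware. Experiments on the dataset provided by Alibaba Cloud for the Aliyun Security Algorithms Challenge yield a detection accuracy of 95.6% and an AUC (Area Under Curve) of 0.99, showing that the proposed method detects malware effectively.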
Keywords:	malware; longest frequent sequence; sequence mining; bag-of-words model; random forest algorithm
Received:	2019/10/17
Revised:	2019/12/31
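
The abstract describes steps two and three only at a high level. The Python sketch below illustrates one possible reading of them; every definition in it is our assumption, not the paper's: a "sequence" is a contiguous run of API calls, it is "frequent" in a category if it appears in at least a fraction `min_support` of that category's samples, "longest" means maximal (not contained in any longer frequent sequence), and the bag-of-words vocabulary is the union of all categories' results. The function names, `min_support`, `max_len`, and the toy API traces are hypothetical, and the preprocessing of step one is omitted because the abstract does not specify it.

```python
from collections import Counter
from typing import Dict, List, Optional, Sequence, Set, Tuple

Pattern = Tuple[str, ...]  # assumption: a pattern is a contiguous run of API names


def ngrams(seq: Sequence[str], n: int) -> Set[Pattern]:
    """Distinct contiguous n-grams of one API-call trace."""
    return {tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)}


def frequent_ngrams(samples: List[List[str]], n: int, min_count: float,
                    prev: Optional[Set[Pattern]]) -> Set[Pattern]:
    """n-grams contained in at least min_count samples (each sample counts once)."""
    counts: Counter = Counter()
    for seq in samples:
        for p in ngrams(seq, n):
            # Apriori-style pruning: a frequent n-gram's (n-1)-prefix and
            # (n-1)-suffix must themselves be frequent.
            if prev is None or (p[:-1] in prev and p[1:] in prev):
                counts[p] += 1
    return {p for p, c in counts.items() if c >= min_count}


def mine_longest_frequent(samples: List[List[str]], min_support: float,
                          max_len: int = 20) -> Set[Pattern]:
    """Maximal frequent contiguous API sequences for one category (hypothetical)."""
    min_count = min_support * len(samples)
    levels: List[Set[Pattern]] = []  # levels[k - 1] holds the frequent k-grams
    frequent = frequent_ngrams(samples, 1, min_count, None)
    while frequent:
        levels.append(frequent)
        if len(levels) == max_len:  # optional cap on pattern length
            break
        frequent = frequent_ngrams(samples, len(levels) + 1, min_count, frequent)
    longest: Set[Pattern] = set()
    for i, level in enumerate(levels):
        longer = levels[i + 1] if i + 1 < len(levels) else set()
        # A frequent k-gram is non-maximal exactly when some frequent
        # (k+1)-gram has it as its prefix or its suffix.
        covered = {q[:-1] for q in longer} | {q[1:] for q in longer}
        longest |= level - covered
    return longest


def count_occurrences(seq: Sequence[str], p: Pattern) -> int:
    """Number of (possibly overlapping) positions where p occurs in seq."""
    n = len(p)
    return sum(1 for i in range(len(seq) - n + 1) if tuple(seq[i:i + n]) == p)


def build_vocabulary(by_category: Dict[str, List[List[str]]],
                     min_support: float) -> List[Pattern]:
    """Union of every category's longest frequent sequences, in a fixed order."""
    vocab: Set[Pattern] = set()
    for samples in by_category.values():
        vocab |= mine_longest_frequent(samples, min_support)
    return sorted(vocab)


def vectorize(seq: Sequence[str], vocab: List[Pattern]) -> List[int]:
    """Bag-of-words vector: how often each vocabulary sequence occurs in seq."""
    return [count_occurrences(seq, p) for p in vocab]


if __name__ == "__main__":
    # Toy traces invented for illustration only.
    traces = {
        "ransomware": [
            ["FindFirstFile", "ReadFile", "CryptEncrypt", "WriteFile"],
            ["FindFirstFile", "ReadFile", "CryptEncrypt", "WriteFile", "DeleteFile"],
        ],
        "benign": [
            ["CreateWindow", "ReadFile", "CloseHandle"],
            ["CreateWindow", "ReadFile", "WriteFile", "CloseHandle"],
        ],
    }
    vocab = build_vocabulary(traces, min_support=1.0)
    print(vectorize(["CreateWindow", "ReadFile", "CryptEncrypt", "CloseHandle"], vocab))
    # -> [1, 1, 0]
```

Growing patterns one element at a time with prefix/suffix pruning means only extensions of already-frequent sequences are counted, which keeps the search small on long traces. The resulting vectors, paired with category labels, could then be passed to any random forest implementation (for example, scikit-learn's `RandomForestClassifier` via its `fit(X, y)` method) to obtain a classifier of the kind the abstract describes.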
This article is indexed in CNKI, Wanfang Data, and other databases.