Malware family classification based on text embedding feature representation
Cite this article: Zhang Tao, Wang Junfeng. Malware family classification based on text embedding feature representation[J]. Journal of Sichuan University (Natural Science Edition), 2019, 56(3): 441-449.
Authors: Zhang Tao, Wang Junfeng
Institution: Sichuan University, Sichuan University
Funding: National Key Research and Development Program of China (2016YFB0800605, 2016QY06X1205); Equipment Pre-Research Joint Fund of the Ministry of Education (6141A02033304, 6141A02011607); Sichuan Province Key Research and Development Program (18ZDYF3867, 18ZDYF2039)
Abstract: Automation, efficiency, and fine granularity are the main challenges currently facing malware detection and classification. Deep learning, which has been applied successfully in image processing, speech recognition, and natural language processing, has to some extent relieved the heavy labor and time costs of traditional analysis methods. This paper therefore proposes mal2vec, an automatic, efficient, and fine-grained malware analysis method. It treats each malware sample as a text rich in behavioral semantic information, whose content consists of the API sequences recorded while the malware executes dynamically, and it trains the classical neural probabilistic model Doc2Vec on the resulting text collection. The experimental results show a clear improvement over the classification results of Rieck et al. [1]. In particular, unlike other deep learning methods, this method can extract the intermediate results of model training and represent them explicitly. This explicit intermediate representation is interpretable and makes it possible to analyze the behavior patterns of malware families at a fine-grained level.

Keywords: Malware; Classification; Text embedding; Neural probabilistic language model (NNLM)
Received: 2018/8/9
Revised: 2018/12/15
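
The idea in the abstract of treating each malware sample as a text made of API calls can be illustrated with a minimal sketch. It is not the authors' implementation: the sample names, family labels, and traces are invented, the helper names (`to_document`, `embed`, `classify`) are hypothetical, and a plain term-frequency vector with cosine similarity stands in for the Doc2Vec embedding so that the example runs with the standard library alone.

```python
from collections import Counter
import math

# Hypothetical dynamic-analysis traces: each sample maps to a family label and
# the ordered API calls seen at run time. All values are invented for illustration.
TRACES = {
    "sample_a": ("family_x", ["CreateFileW", "WriteFile", "RegSetValueExW", "CloseHandle"]),
    "sample_b": ("family_x", ["CreateFileW", "WriteFile", "WriteFile", "CloseHandle"]),
    "sample_c": ("family_y", ["socket", "connect", "send", "recv", "closesocket"]),
    "sample_d": ("family_y", ["socket", "connect", "recv", "send", "closesocket"]),
}


def to_document(api_calls, n=2):
    """Turn an API sequence into text tokens: single calls plus n-grams that keep local order."""
    tokens = list(api_calls)
    tokens += ["_".join(api_calls[i:i + n]) for i in range(len(api_calls) - n + 1)]
    return tokens


def embed(tokens):
    """Assumption: sparse term-frequency counts stand in for a learned Doc2Vec vector."""
    return Counter(tokens)


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * v[token] for token, count in u.items() if token in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def classify(api_calls, labeled):
    """Assign the family of the most similar labeled document (1-nearest neighbor)."""
    query = embed(to_document(api_calls))
    family, _ = max(labeled, key=lambda item: cosine(query, item[1]))
    return family


# Build the labeled "text set" from the traces.
LABELED = [(family, embed(to_document(calls))) for family, calls in TRACES.values()]

# Classify an unseen trace by its nearest labeled neighbor.
prediction = classify(["socket", "connect", "send", "closesocket"], LABELED)
print(prediction)
```

In a setup closer to the one the abstract describes, `embed` would be replaced by vectors inferred from a trained Doc2Vec model, and the nearest-neighbor step by whichever classifier is preferred; both substitutions are assumptions here, not details taken from the paper.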