
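Unknown binary protocol clustering method based on biological information
Cite this article: CONG Pei-Xin, LI Xiao-Hui, WANG Jun-Feng. Unknown binary protocol clustering method based on biological information[J]. Journal of Sichuan University (Natural Science Edition), 2022, 59(3): 032004-76.
Authors: CONG Pei-Xin  LI Xiao-Hui  WANG Jun-Feng
Affiliations: College of Computer Science, Sichuan University; School of Cyber Science and Engineering, Sichuan University; College of Computer Science, Sichuan University
Funding: Key Project of the Basic Strengthening Program (2019-JCJQ-ZD-113)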
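Abstract: Protocol clustering is an important step in protocol reverse engineering. Because binary protocols are more transparent and cover a wider range of protocol types, this paper proposes a binary protocol clustering method based on gene and protein biological information that can cluster large numbers of protocol messages directly from their raw sequences. The method first converts the raw binary messages into a quaternary gene form, uses a fast clustering method that computes k-seed values over pairwise base combinations to generate a distance matrix, and applies UPGMA to compute a minimum-distance spanning tree that yields the initial clusters. Second, the quaternary messages in each cluster are converted into hexadecimal protein chains, which gives the sequences a more semantic representation, and a clustering method based on an improved mBed algorithm clusters them with high precision. Tests on known and unknown protocols in single-protocol and mixed-protocol scenarios show that the method clusters binary protocols efficiently and with high accuracy, and that it has high application value.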

Keywords: unknown protocol  binary protocol  bioinformatics  multiple sequence alignment
Received: 2021/11/30
Revised: 2022/1/17
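
The abstract describes two re-encodings of a raw message: first into a quaternary "gene" form, then into a hexadecimal "protein" chain. The abstract does not give the actual symbol tables, so the sketch below is only an illustration under stated assumptions: each pair of bits maps to one base (00→A, 01→C, 10→G, 11→T), and each pair of bases maps to one of 16 letters borrowed from the amino-acid alphabet. The names `BASES`, `RESIDUES`, `bytes_to_bases`, and `bases_to_residues` are hypothetical.

```python
# Assumed mapping, not taken from the paper: 00->A, 01->C, 10->G, 11->T.
BASES = "ACGT"
# Assumed 16-symbol "protein" alphabet; the paper's own table may differ.
RESIDUES = "ACDEFGHIKLMNPQRS"


def bytes_to_bases(message: bytes) -> str:
    """Encode each byte as four quaternary symbols, most significant bits first."""
    out = []
    for byte in message:
        for shift in (6, 4, 2, 0):
            out.append(BASES[(byte >> shift) & 0b11])
    return "".join(out)


def bases_to_residues(bases: str) -> str:
    """Encode each pair of bases (4 bits) as one of 16 residue symbols."""
    if len(bases) % 2:
        raise ValueError("base sequence must have even length")
    index = {base: i for i, base in enumerate(BASES)}
    return "".join(
        RESIDUES[index[bases[i]] * 4 + index[bases[i + 1]]]
        for i in range(0, len(bases), 2)
    )


if __name__ == "__main__":
    genes = bytes_to_bases(b"\x1b")       # 0b00011011
    print(genes)                          # ACGT
    print(bases_to_residues(genes))       # CN
```

Under these assumptions a message of n bytes becomes 4n bases and then 2n residues, so the second encoding halves the sequence length that the later, more precise clustering stage has to align.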

Unknown binary protocol clustering method based on biological information
CONG Pei-Xin, LI Xiao-Hui, WANG Jun-Feng. Unknown binary protocol clustering method based on biological information[J]. Journal of Sichuan University (Natural Science Edition), 2022, 59(3): 032004-76.
Authors: CONG Pei-Xin  LI Xiao-Hui  WANG Jun-Feng
Institution: College of Computer Science, Sichuan University; School of Cyber Science and Engineering, Sichuan University; College of Computer Science, Sichuan University
Abstract: Protocol clustering is an important step in protocol reverse engineering. Because binary protocols are more transparent and cover a wider range of protocol types, a binary protocol clustering method based on gene and protein biological information is proposed, which can cluster large numbers of protocol messages directly from their raw sequences. The method first converts the raw binary messages into a quaternary gene form, uses a fast clustering method that computes k-seed values over pairwise base combinations to generate a distance matrix, and applies UPGMA to compute a minimum-distance spanning tree that yields the initial clusters. Second, the quaternary messages in each cluster are converted into hexadecimal protein chains, which gives the sequences a more semantic representation, and a clustering method based on an improved mBed algorithm clusters them with high precision. Tests on known and unknown protocols in single-protocol and mixed-protocol scenarios show that the method clusters binary protocols efficiently and with high accuracy, and that it has high application value.
Keywords:Unknown protocol  binary protocol  bioinformatics  multiple sequence alignment
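
The first clustering stage in the abstract builds a distance matrix from short subsequences and then applies UPGMA. The sketch below illustrates that idea only. It is not the paper's k-seed measure: as a stand-in it uses a simple k-mer distance, 1 minus the number of shared k-mers divided by the k-mer count of the shorter sequence. It also assumes that initial clusters are obtained by stopping the UPGMA merges at a distance `threshold`, which the abstract does not state. The function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def kmer_counts(seq: str, k: int) -> Counter:
    """Count every overlapping substring of length k."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def kmer_distance(a: str, b: str, k: int = 2) -> float:
    """Stand-in distance: 1 - shared k-mers / k-mers in the shorter sequence."""
    ca, cb = kmer_counts(a, k), kmer_counts(b, k)
    shared = sum((ca & cb).values())          # multiset intersection
    denom = min(sum(ca.values()), sum(cb.values()))
    return 1.0 - shared / denom if denom else 1.0


def upgma_clusters(seqs, threshold: float, k: int = 2):
    """Average-linkage (UPGMA) merging, stopped at an assumed threshold."""
    clusters = {i: [i] for i in range(len(seqs))}
    dist = {
        frozenset((i, j)): kmer_distance(seqs[i], seqs[j], k)
        for i, j in combinations(range(len(seqs)), 2)
    }
    next_id = len(seqs)
    while len(clusters) > 1:
        pair = min(dist, key=dist.get)        # closest pair of clusters
        if dist[pair] > threshold:
            break
        a, b = tuple(pair)
        na, nb = len(clusters[a]), len(clusters[b])
        # UPGMA update: size-weighted mean of the distances to a and b.
        updated = {
            frozenset((next_id, c)):
                (na * dist[frozenset((a, c))] + nb * dist[frozenset((b, c))])
                / (na + nb)
            for c in clusters if c not in (a, b)
        }
        dist = {p: d for p, d in dist.items() if a not in p and b not in p}
        dist.update(updated)
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)
        next_id += 1
    return sorted(sorted(members) for members in clusters.values())


if __name__ == "__main__":
    messages = ["ACGTACGT", "ACGTACGA", "TTTTGGGG", "TTTTGGGC"]
    print(upgma_clusters(messages, threshold=0.5))  # [[0, 1], [2, 3]]
```

The pure-Python loop keeps the example dependency-free. For real workloads, SciPy's `scipy.cluster.hierarchy.linkage` with `method="average"` performs the same average-linkage merging on a precomputed condensed distance matrix.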