
Network flow traffic analysis algorithm based on Spark log integration and FCM-DNN
Cite this article: LI Teng, GUO Xiaodong, HU Yupeng, LI Zhen. Network flow traffic analysis algorithm based on Spark log integration and FCM-DNN[J]. Journal of Fuzhou University (Natural Science Edition), 2023, 51(5): 677-683.
Authors: LI Teng, GUO Xiaodong, HU Yupeng, LI Zhen
Institutions: Informatization Office, Shandong University, Jinan, Shandong, China; School of Software, Shandong University, Jinan, Shandong, China
Funding: National Natural Science Foundation of China (62276155); Natural Science Foundation of Shandong Province (ZR2021MF040)
Abstract: Network devices categorize traffic at a coarse granularity, so network traffic cannot be classified accurately. To address this problem, a traffic analysis algorithm based on Spark log integration and FCM-DNN is proposed. First, the method uses Spark to consolidate session logs into structured data that can be analyzed. Next, it clusters the behavior data belonging to the same website and extracts a multi-cluster feature set for that website, which compensates for the fact that a single session connection has few feature dimensions and that its features are similar and imbalanced. Finally, it builds a DNN, trains it on the unified cluster features combined with the original features, and optimizes the algorithm in several respects, including the cluster grouping length and the loss function. Simulation experiments show that, for session log data with few features, the algorithm effectively improves website classification accuracy. It also compresses the logs by a factor of 700 while retaining the characteristics of students' online behavior, which reduces storage overhead.
Keywords: Spark (in-memory computing engine); log integration; website behavior clustering; multi-cluster feature generation; DNN (fully connected neural network)
Received: 2023-10-04
Revised: 2023-10-13
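
The clustering and feature-combination steps summarized in the abstract can be illustrated with a small sketch. It assumes that FCM stands for fuzzy c-means (the abstract does not expand the acronym) and that "combining cluster features with original features" can be approximated by appending each session's cluster membership degrees to its original feature vector. The function names `fuzzy_c_means` and `augment_features`, the two toy features, and all parameter values are hypothetical; the paper's actual feature construction, grouping length, and DNN are not specified in the abstract and are not reproduced here.

```python
import math
import random


def fuzzy_c_means(points, n_clusters, m=2.0, n_iter=50, seed=0):
    """Textbook fuzzy c-means; returns (centers, memberships).

    m is the fuzzifier (m > 1). Values here are illustrative assumptions.
    """
    rng = random.Random(seed)
    dim = len(points[0])
    # Random initial memberships, each row normalised to sum to 1.
    memberships = []
    for _ in points:
        row = [rng.random() + 1e-9 for _ in range(n_clusters)]
        total = sum(row)
        memberships.append([u / total for u in row])

    centers = []
    exponent = 2.0 / (m - 1.0)
    for _ in range(n_iter):
        # Centre update: mean of the points weighted by membership ** m.
        centers = []
        for j in range(n_clusters):
            weights = [row[j] ** m for row in memberships]
            total = sum(weights)
            centers.append([
                sum(w * p[d] for w, p in zip(weights, points)) / total
                for d in range(dim)
            ])
        # Membership update from the relative distances to every centre.
        for i, p in enumerate(points):
            # Clamp distances away from zero to avoid division by zero.
            dists = [max(math.dist(p, c), 1e-12) for c in centers]
            memberships[i] = [
                1.0 / sum((dj / dk) ** exponent for dk in dists)
                for dj in dists
            ]
    return centers, memberships


def augment_features(points, memberships):
    """Append each point's membership degrees to its original features."""
    return [list(p) + list(u) for p, u in zip(points, memberships)]


# Hypothetical per-session features for one website:
# (scaled byte count, scaled duration).
sessions = [
    (1.0, 1.1), (0.9, 1.0), (1.1, 0.9),
    (8.0, 8.2), (8.1, 7.9), (7.9, 8.0),
]
centers, memberships = fuzzy_c_means(sessions, n_clusters=2)
augmented = augment_features(sessions, memberships)
```

Each row of `augmented` holds the two original features followed by two membership degrees, so a downstream classifier sees both the raw session and its soft assignment to the website's behavior clusters. Soft memberships are used rather than hard labels because, under the assumption above, they preserve how ambiguous a session is between clusters.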