
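A Homology Detection Technology Based on Improved Edit Distance and LCS
Cite this article:LIU Yun-long. A Homology Detection Technology Based on Improved Edit Distance and LCS[J]. Journal of Beijing Institute of Technology (Natural Science Edition), 2017, 37(2): 168-174.
Author:LIU Yun-long
Affiliation:Research Center for Computer and Microelectronics Industry Development, Ministry of Industry and Information Technology, Beijing 100048
Funding:Electronic Information Industry Development Fund project (MIIT Finance Letter [2011] No. 506)
Abstract:Traditional token-based homology detection algorithms have difficulty locating the structured information of code variants, extract and identify modules poorly, and measure homology with low precision. To address this, a structured-recognition homology detection technology based on an improved edit distance and LCS (longest common sequence) is proposed. In the edit distance calculation, an exchange operator is introduced to improve the precision of homology measurement within modules. In the LCS algorithm, a minimum-size monitoring mechanism for similar-module measurement and a maximum dynamic correlation measure between code lines are introduced, providing the ability to divide code structure boundaries, associate module lines, and extract effective structured information from code. Experiments show that the method is an effective homology detection technology based on structured information: its random-sampling detection results perform well in precision, recall, and F-value, and it is stable.

Keywords:homology detection  edit distance  longest common substring  structured information  code variants
Received:2014/4/21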
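
The abstract's idea of adding an exchange operator to the edit distance can be illustrated with a minimal Python sketch. It assumes the operator swaps two adjacent tokens at unit cost (the optimal string alignment variant of Damerau-Levenshtein distance); the abstract does not define the operator, so the paper's formulation may differ. The names `edit_distance_with_swap` and `similarity` are hypothetical.

```python
from typing import Sequence


def edit_distance_with_swap(a: Sequence[str], b: Sequence[str]) -> int:
    """Edit distance over token sequences with an adjacent-swap operation.

    Assumption: the "exchange operator" is modeled as swapping two adjacent
    tokens at cost 1; the paper may define it differently.
    """
    m, n = len(a), len(b)
    # d[i][j] holds the distance between the prefixes a[:i] and b[:j].
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i  # delete all i tokens
    for j in range(n + 1):
        d[0][j] = j  # insert all j tokens
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # match or substitution
            )
            # Exchange: the last two tokens of each prefix are swapped.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[m][n]


def similarity(a: Sequence[str], b: Sequence[str]) -> float:
    """Normalize the distance to a score in [0, 1]; 1.0 means identical."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance_with_swap(a, b) / longest
```

With this sketch, two token sequences that differ only by one reordered adjacent pair, such as `["a", "b", "c"]` and `["a", "c", "b"]`, are one edit apart rather than two, so a statement-reordering variant is penalized less than under the plain edit distance.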

A Homology Detection Technology Based on Improved Edit Distance and LCS
LIU Yun-long. A Homology Detection Technology Based on Improved Edit Distance and LCS[J]. Journal of Beijing Institute of Technology (Natural Science Edition), 2017, 37(2): 168-174.
Authors:LIU Yun-long
Institution:Research Center for Computer and Microelectronics Industry Development, Beijing 100048, China
Abstract:Traditional token-based algorithms for homology detection have problems with locating structured information, identifying and extracting modules, and measuring homology precisely for code variants. To address these problems, a structured-recognition homology detection technology is proposed, based on an improved edit distance algorithm and an improved longest common sequence (LCS) algorithm. In the edit distance calculation, an exchange operator is introduced to improve the accuracy of homology measurement within modules. In the LCS algorithm, a minimum-size monitoring mechanism for similar modules and a maximum dynamic correlation measure between code lines are introduced, which provide code structure boundary division, module line association, and structured information extraction. Experiments show that the algorithm, which uses structure information for code homology detection, is effective and stable, and that the results of random sampling detection perform comparatively well in precision, recall rate and F values.
Keywords:homology detection  edit distance  longest common sequence  structured information  code variants
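
The minimum-size mechanism for similar modules can be sketched as follows. This is a simplified illustration, not the paper's algorithm: it assumes lines match only when they are exactly equal, it uses a common-substring style dynamic program over lines, and it omits the maximum dynamic correlation measure, which the abstract does not specify. The name `common_blocks` and the parameter `min_size` are hypothetical.

```python
from typing import List, Tuple


def common_blocks(
    a: List[str], b: List[str], min_size: int = 3
) -> List[Tuple[int, int, int]]:
    """Find maximal runs of identical consecutive lines shared by a and b.

    Returns (start_in_a, start_in_b, length) for every run whose length is
    at least min_size. Shorter runs are treated as incidental matches.
    Assumption: exact line equality stands in for a line similarity measure.
    """
    m, n = len(a), len(b)
    # run[i][j] = length of the matching run that ends at a[i-1] and b[j-1].
    run = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                run[i][j] = run[i - 1][j - 1] + 1

    blocks = []
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            length = run[i][j]
            # A run is maximal when the next diagonal cell does not extend it.
            extends = i < m and j < n and run[i + 1][j + 1] > 0
            if length >= min_size and not extends:
                blocks.append((i - length, j - length, length))
    return blocks
```

For example, if three consecutive lines of one file appear unchanged starting at line index 1 of another, `common_blocks` reports `(0, 1, 3)` with `min_size=3` and nothing with `min_size=4`. The threshold is what keeps one-line coincidences, such as a lone closing brace, from being reported as shared modules.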
This article is indexed in CNKI, Wanfang Data, and other databases.