
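A similar-bug identification method based on historical software repositories and abstract syntax trees
Cite as: Dan GONG, Tiantian WANG, Xiaohong SU, Meihan DONG. A similar-bug identification method based on historical software repositories and abstract syntax trees[J]. Systems Engineering and Electronics, 2020, 42(10): 2399-2408.
Authors: Dan GONG  Tiantian WANG  Xiaohong SU  Meihan DONG
Affiliations: 1. School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, Heilongjiang, China; 2. Department of Computer Science and Technology, Harbin Huade University, Harbin 150001, Heilongjiang, China
Funding: National Natural Science Foundation of China (61672191); National Key Research and Development Program of China during the 13th Five-Year Plan (2017YFC0702204)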
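Abstract (translated from the Chinese): During software development, developers often search a historical software repository (HSR) and then copy and paste code in order to reuse it. The HSR also records the bugs of the reused code and how they were fixed, which can help in handling similar bugs. On this basis, a similar-bug identification method based on HSR mining is proposed. First, modules with known bugs are extracted from the HSR by analyzing change logs, and a bug module library is built. Next, an abstract syntax tree (AST) based similar-code detection method identifies code in the software under test that is similar to code in the bug module library; the corresponding bug and fix information stored in the HSR is then used to identify the modules of the software under test that may contain latent bugs. To improve the precision of similar-code identification, the AST-based code feature metrics are also optimized. Experiments on 18 C programs and 164 pairs of clone code show that the proposed method identifies all of the similar code and outperforms existing tools. The effect of code similarity on similar-bug identification is verified on a manually constructed bug module library. Finally, validation on 8 large real-world C projects yields an average bug recall of 94%, which indicates that mining the HSR can effectively support bug understanding for similar code that propagates across projects.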

Keywords: software reuse  historical software repository  clone code  similar bug  abstract syntax tree
Received: 2020-01-29
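
The first step of the method, extracting known-buggy modules from change logs, can be illustrated with a small sketch. The abstract does not say which change-log cues the authors rely on, so the keyword pattern, the `Commit` record, and the `build_bug_module_library` helper below are all hypothetical names chosen for illustration, not the paper's implementation.

```python
import re
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical keyword pattern; the abstract does not state the actual cues.
BUG_FIX_PATTERN = re.compile(r"\b(fix(e[sd])?|bug|defect|fault|patch)\b", re.IGNORECASE)


@dataclass
class Commit:
    """A simplified change-log entry (assumed structure)."""
    commit_id: str
    message: str
    changed_functions: List[str]


def build_bug_module_library(commits: List[Commit]) -> Dict[str, List[str]]:
    """Map each function touched by a bug-fix commit to the ids of those commits."""
    library: Dict[str, List[str]] = {}
    for commit in commits:
        if BUG_FIX_PATTERN.search(commit.message):
            for function_name in commit.changed_functions:
                library.setdefault(function_name, []).append(commit.commit_id)
    return library


commits = [
    Commit("a1", "Fix buffer overflow in parse_header", ["parse_header"]),
    Commit("b2", "Add new CLI option", ["main"]),
    Commit("c3", "bug: off-by-one in copy_block", ["copy_block", "parse_header"]),
]
library = build_bug_module_library(commits)
```

Keeping the commit ids next to each module preserves the link back to the fix information in the repository, which the method uses later to explain a suspected bug.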

Identification method of similar bugs based on historical software repository and abstract syntax tree
Dan GONG, Tiantian WANG, Xiaohong SU, Meihan DONG. Identification method of similar bugs based on historical software repository and abstract syntax tree[J]. Systems Engineering and Electronics, 2020, 42(10): 2399-2408.
Authors:Dan GONG  Tiantian WANG  Xiaohong SU  Meihan DONG
Institution: 1. School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China; 2. Department of Computer Science and Technology, Harbin Huade University, Harbin 150001, China
Abstract: During software development, developers often search a historical software repository (HSR) and then copy and paste the code they need in order to reuse it. The HSR also stores the bugs of the reused code and the corresponding fixes, which can help in dealing with similar bugs. A similar-bug identification method based on HSR mining is therefore proposed. First, modules with known bugs are extracted from the HSR by analyzing the change log, and a bug module library is established. Then, a similar-code detection method based on the abstract syntax tree (AST) is used to identify code in the software under test that is similar to code in the bug module library. With the help of the corresponding bug and fix information stored in the HSR, the modules of the software under test that may contain latent bugs are identified. In addition, to improve the precision of similar-code identification, the AST-based code feature metrics are optimized. Experimental results on 18 C programs and 164 pairs of clone code show that the proposed method identifies all of the similar code and performs better than existing tools. The effect of code similarity on similar-bug identification is verified on a manually built bug module library. Finally, an empirical study is conducted on 8 large real-world C projects. The average bug recall is 94%, which shows that mining the HSR can effectively support bug understanding when similar code spreads across projects.
Keywords:software reuse  historical software repository (HSR)  clone code  similar bug  abstract syntax tree (AST)  
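
As a rough illustration of AST-based similar-code detection, the sketch below counts AST node types to form a feature vector and compares two vectors by cosine similarity. It makes several assumptions: the paper targets C and optimizes its own feature metrics, which the abstract does not specify, whereas this example uses Python's standard `ast` module as a stand-in parser and plain node-type counts as the features. The function names and sample snippets are hypothetical.

```python
import ast
import math
from collections import Counter


def ast_feature_vector(source: str) -> Counter:
    """Count how often each AST node type occurs (identifier names are ignored)."""
    tree = ast.parse(source)
    return Counter(type(node).__name__ for node in ast.walk(tree))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(a[key] * b[key] for key in a.keys() & b.keys())
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# A module from the (hypothetical) bug library: the loop reads one element too many.
buggy = """
def total(xs):
    s = 0
    for i in range(len(xs) + 1):
        s += xs[i]
    return s
"""

# A copy with renamed identifiers in the software under test.
candidate = """
def sum_all(values):
    acc = 0
    for j in range(len(values) + 1):
        acc += values[j]
    return acc
"""

unrelated = """
def greet(name):
    return "hello " + name
"""

clone_score = cosine_similarity(ast_feature_vector(buggy), ast_feature_vector(candidate))
other_score = cosine_similarity(ast_feature_vector(buggy), ast_feature_vector(unrelated))
```

Because the vector records only node types, the renamed copy scores essentially the same as the original, so it would be flagged as a candidate carrier of the same off-by-one bug, while the unrelated function scores lower. Node-type counts alone discard tree shape, which is one reason a real detector would need richer features than this sketch uses.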