Code Clone Detection Based on Token Semantics
Author:
Affiliation:

Clc Number:

Fund Project:

  • Article
  • |
  • Figures
  • |
  • Metrics
  • |
  • Reference
  • |
  • Related
  • |
  • Cited by
  • |
  • Materials
  • |
  • Comments
    Abstract:

    Traditional token-based clone detection methods utilize the serialization characteristics of code strings to quickly detect clones in large code repositories. However, compared with the methods based on the abstract syntax tree (AST) and program dependency graph (PDG), traditional methods can hardly detect code clones with large text differences due to the lack of syntax and semantic information. Therefore, this study proposes a token-based clone detection method with semantic information. First, AST is analyzed, and the semantic information of tokens located at the leaf nodes is abstracted using the AST path. Then, a low-cost index is established on the tokens for function names and type roles to quickly filter valid candidate clone fragments. Finally, the similarity between code blocks is judged using the tokens with semantic information. The experimental results on the public large-scale dataset BigCloneBench reveal that this method significantly outperforms the mainstream methods, including NiCad, Deckard, and CCAligner in Moderately Type-3 and Weakly Type-3/Type-4 clones with low text similarity while requiring less detection time on large code repositories.

    Reference
    Related
    Cited by
Get Citation

王文杰,徐云.基于Token语义构建的代码克隆检测.计算机系统应用,2022,31(11):60-67

Copy
Share
Article Metrics
  • Abstract:
  • PDF:
  • HTML:
  • Cited by:
History
  • Received:February 24,2022
  • Revised:March 28,2022
  • Adopted:
  • Online: July 14,2022
  • Published:
You are the firstVisitors
Copyright: Institute of Software, Chinese Academy of Sciences Beijing ICP No. 05046678-3
Address:4# South Fourth Street, Zhongguancun,Haidian, Beijing,Postal Code:100190
Phone:010-62661041 Fax: Email:csa (a) iscas.ac.cn
Technical Support:Beijing Qinyun Technology Development Co., Ltd.

Beijing Public Network Security No. 11040202500063