As Web applications grow more complex, their security problems have become a serious concern, and Web application security testing is now one of the research priorities in software testing. Vulnerability reports document the security issues of Web applications, support their testing, and help improve their security and quality. However, automatically identifying the key information in a vulnerability report and reproducing the vulnerability remains an open research challenge. To this end, this study proposes an automated approach to understanding vulnerability reports and reproducing vulnerabilities. First, based on the characteristics of vulnerability reports, the study summarizes their syntactic dependency patterns and combines them with dependency parsing to analyze vulnerability descriptions and extract the key information needed to trigger a vulnerability. Second, unlike conventional natural language descriptions, the attack payload of a Web vulnerability is usually an illegal string, mostly in the form of a code fragment. The study therefore designs extraction rules specifically for attack payloads, which makes the extraction of payloads from vulnerability reports more complete. On this basis, considering that vulnerability reports and Web applications describe the same elements in different words with similar meaning, the study proposes a semantic similarity-based method that automatically generates vulnerability reproduction scripts, achieving automatic reproduction of Web application vulnerabilities. To verify the effectiveness of the approach, 400 vulnerability reports are collected from more than 300 Web application projects on the vulnerability collection platform Exploit-db, and their syntactic dependency patterns are summarized. Vulnerability reproduction experiments are then carried out on 26 real vulnerability reports involving 23 open-source Web applications. The results show that the proposed approach can effectively extract the key information from vulnerability reports and generate feasible test scripts that reproduce the vulnerabilities, which reduces manual effort and improves the efficiency of vulnerability reproduction.
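To make the last two steps of the pipeline concrete, the following Python sketch illustrates rule-based payload extraction and the matching of a report phrase to a Web page element. It is a minimal illustration under stated assumptions, not the method evaluated in this study: the three regular expressions, the function names `extract_payloads`, `best_match`, and `build_step`, and the step format are all hypothetical, and the character-level similarity from `difflib` merely stands in for the semantic similarity measure, which the abstract does not specify.

```python
import re
from difflib import SequenceMatcher

# Hypothetical payload rules (assumptions for illustration only):
# attack payloads look like code fragments rather than ordinary words.
PAYLOAD_PATTERNS = [
    # <script>...</script> fragments (typical XSS payload)
    re.compile(r"<script\b[^>]*>.*?</script\s*>", re.IGNORECASE | re.DOTALL),
    # tags carrying an inline event handler, e.g. <img src=x onerror=...>
    re.compile(r"<\w+[^>]*\son\w+\s*=[^>]*>", re.IGNORECASE),
    # quote-based boolean injection, e.g. ' OR 1=1 --
    re.compile(r"'\s*or\s+\w+\s*=\s*\w+(?:\s*(?:--|#))?", re.IGNORECASE),
]


def extract_payloads(text):
    """Return payload-like fragments in order of appearance in the text."""
    found = []
    for pattern in PAYLOAD_PATTERNS:
        for match in pattern.finditer(text):
            found.append((match.start(), match.group(0)))
    found.sort(key=lambda item: item[0])
    return [fragment for _, fragment in found]


def best_match(phrase, labels):
    """Pick the element label most similar to a phrase from the report.

    Character-level similarity is a stand-in for semantic similarity.
    """
    def score(label):
        return SequenceMatcher(None, phrase.lower(), label.lower()).ratio()

    return max(labels, key=score)


def build_step(phrase, labels, payload):
    """Assemble one abstract test-script step (hypothetical format)."""
    return {"action": "type", "target": best_match(phrase, labels), "value": payload}
```

For a report sentence such as "Insert `<script>alert(1)</script>` into the title field", `extract_payloads` isolates the script fragment, and `build_step("title field", ["Post Title", "Body", "Submit"], payload)` targets the "Post Title" element. A real reproduction script would additionally need the dependency-parsed action and target of each step, and a driver to replay the steps in a browser.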