为了提高软件的可靠性, 软件缺陷预测已经成为软件工程领域中一个重要的研究方向. 传统的软件缺陷预测方法主要是设计静态代码度量, 并用机器学习分类器来预测代码的缺陷概率. 但是, 静态代码度量未能充分考虑到潜藏在代码中的语义特征. 根据这种状况, 本文提出了一种基于深度卷积神经网络的软件缺陷预测模型. 首先, 从源代码的抽象语法树中选择合适的结点提取表征向量, 并构建字典将其映射为整数向量以方便输入到卷积神经网络. 然后, 基于GoogLeNet设计卷积神经网络, 利用卷积神经网络的深度挖掘数据的能力, 充分挖掘出特征中的语法语义特征. 另外, 模型使用了随机过采样的方法来处理数据分类不均衡问题, 并在网络中使用丢弃法来防止模型过拟合. 最后, 用Promise上的历史工程数据来测试模型, 并以AUC和F1-measure为指标与其他3种方法进行了比较, 实验结果显示本文提出的模型在软件缺陷预测性能上得到了一定的提升.
In order to improve the reliability of software, software defect prediction has become an important research direction in the field of software engineering. Traditional software defect prediction methods mainly design static code metrics and use machine learning classifiers to predict the defect probability of the code. However, the static code metrics do not fully consider the semantic features hidden in the code. According to this situation, this study proposes a software defect prediction model based on convolutional neural network. First, extract the characterization vectors from the appropriate nodes in the abstract syntax tree of the source code, and construct a dictionary to map them to integer vectors to facilitate input to the convolutional neural network. Then, a convolutional neural network is designed based on GoogLeNet, and the ability of the convolutional neural network to deeply mine data is used to fully mine the grammatical and semantic features of the features. In addition, this model uses the method of random oversampling to deal with the imbalance of data, and uses the method dropout in the network to prevent the model from overfitting. Finally, the historical engineering database on Promise is used to test the model, and AUC and F1-measure are used as indicators to compare with the other three methods. The results show that the proposed model has a certain improvement in software defect prediction performance.