Abstract: Facial expression recognition tends to lose much useful feature information during feature extraction, which prevents it from capturing comprehensive facial expression features. To address these problems, a multi-scale feature fusion network model (DS-EfficientNet) is proposed. The model consists of a deep network and a shallow network. The shallow network extracts the detailed texture information of facial expressions, and the deep network extracts the global information of expressions. An attention mechanism is added to the shallow network to strengthen its extraction of shallow detail information. Finally, the features of the two networks are fused along the channel dimension, so that the fused network can extract richer facial expression information. To reduce the number of model parameters and improve generalization, the fully connected layer is replaced by a global average pooling layer, and batch normalization is added. The proposed method is tested on Fer2013 and CK+, where it reaches recognition accuracies of 73.47% and 98.84%, respectively. Experiments show that the method extracts richer facial expression information and that the model generalizes well.
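The channel-wise fusion and global average pooling steps described in the abstract can be illustrated with a minimal NumPy sketch. This is not the DS-EfficientNet implementation: the function names `fuse_on_channels` and `global_average_pool`, the random inputs, and the feature-map shapes are all hypothetical, and the sketch assumes that fusion is a concatenation along the channel axis of two feature maps with matching spatial size.

```python
import numpy as np


def fuse_on_channels(shallow, deep):
    """Concatenate two (C, H, W) feature maps along the channel axis.

    Assumption: "fusion on channels" is modeled here as concatenation.
    """
    if shallow.shape[1:] != deep.shape[1:]:
        raise ValueError("spatial sizes must match before channel fusion")
    return np.concatenate([shallow, deep], axis=0)


def global_average_pool(features):
    """Reduce a (C, H, W) feature map to a length-C vector of channel means."""
    return features.mean(axis=(1, 2))


# Hypothetical feature maps standing in for the two branch outputs.
rng = np.random.default_rng(0)
shallow = rng.standard_normal((4, 7, 7))  # shallow branch: 4 channels
deep = rng.standard_normal((6, 7, 7))     # deep branch: 6 channels

fused = fuse_on_channels(shallow, deep)   # shape (10, 7, 7)
pooled = global_average_pool(fused)       # shape (10,)
```

Global average pooling itself has no trainable weights and yields one value per channel regardless of the spatial size, which is consistent with the abstract's stated aim of reducing the number of model parameters.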