2022, 31(11):1-9. DOI: 10.15888/j.cnki.csa.008802
Abstract:As new deep learning models, graph neural networks are widely used on graph data and promote various applications, such as recommendation systems, social networks, and knowledge graphs. Most existing heterogeneous graph neural models predefine multiple metapaths to capture composite relationships in heterogeneous graphs. However, some models consider only one metapath during feature aggregation, so they learn only the neighbor structure and ignore the global correlation of multiple metapaths. Others omit intermediate nodes and edges along the metapath, which means that they cannot learn the semantic information in each metapath. To address these limitations, this study proposes a new model named metapath-based graph Transformer neural network (MaGTNN). Specifically, MaGTNN first samples the heterogeneous graph as a metapath-based multi-relation graph and then uses the proposed position encoder and edge encoder to capture the semantic information in a metapath. Subsequently, all the metapath-based neighbor information is aggregated to the target node according to similarity, which is calculated by the improved graph Transformer layer. Extensive experiments on three real-world heterogeneous graph datasets for node classification and node clustering show that MaGTNN achieves more accurate prediction results than state-of-the-art baselines.
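As a rough illustration of the similarity-based aggregation step mentioned in the abstract above, the following Python sketch weights neighbor feature vectors by a softmax over scaled dot products with the target node, in the style of standard Transformer attention. It is not the MaGTNN layer itself: the function name `attention_aggregate`, the absence of learned projections, and the toy vectors are assumptions made for illustration.

```python
import math


def attention_aggregate(target, neighbors):
    """Aggregate neighbor vectors into one vector, weighted by similarity.

    Hypothetical helper: similarity is a scaled dot product with the target,
    turned into weights by a numerically stable softmax.
    """
    dim = len(target)
    scores = [
        sum(t * n for t, n in zip(target, nb)) / math.sqrt(dim)
        for nb in neighbors
    ]
    top = max(scores)  # subtract the maximum before exponentiating
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    aggregated = [
        sum(w * nb[i] for w, nb in zip(weights, neighbors))
        for i in range(dim)
    ]
    return weights, aggregated


if __name__ == "__main__":
    # Toy example: the first neighbor is more similar to the target.
    w, agg = attention_aggregate([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
    print(w, agg)
```

A full graph Transformer layer would also apply learned query, key, and value projections and, per the abstract, position and edge encodings; these are omitted here.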
2022, 31(11):10-20. DOI: 10.15888/j.cnki.csa.008786
Abstract:In the engineering field, operators need to face complex information interfaces with unevenly distributed stimuli and perform related interactive tasks. Visual attention allocation of operators has been proved to be closely related to task performance. However, the potential connection between visual attention allocation by multi-priority stimuli based on different information allocation strategies and task performance in complex interfaces requires further investigation. In this study, task performance and visual behavior of operators under different load conditions are studied on the basis of the experiment of the multi-priority attention allocation strategy. The experimental results indicate that the differential allocation strategy and information priority division improve the task performance, and the visual behavior differs significantly under different allocation strategies and priorities and is affected by mental loads. This conclusion can provide a reference for the design and optimization of human-computer interfaces and thus improve the task performance of operators.
2022, 31(11):21-30. DOI: 10.15888/j.cnki.csa.008822
Abstract:Traditional k-anonymization techniques for privacy protection in social networks face problems such as a single clustering criterion and under-utilization of the data and information in the graph. To solve these problems, this study proposes an anonymization technique measuring the similarity of the node 1-neighbor graph based on the Kullback-Leibler divergence (SNKL). The original graph node set is divided according to the similarity of node 1-neighbor graph distribution, and the graph is modified according to the divided classes so that the modified graph satisfies k-anonymity. On this basis, the anonymous release of the graph is implemented. The experimental results show that compared with the HIGA method, the SNKL method reduces the amount of change in the clustering coefficients by 17.3% on average. Moreover, the overlap ratio between the important nodes of the generated anonymous graph and those of the original graph is maintained at more than 95%. In addition to protecting privacy effectively, the proposed method can significantly reduce the changes brought to the structural information in the original graph.
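The similarity measure named in the abstract above, the Kullback-Leibler divergence, can be sketched in a few lines of Python. The way a node's 1-neighbor graph is turned into a distribution here (a smoothed histogram of neighbor degrees) is an assumption for illustration only; the abstract does not specify the construction, and `degree_histogram` and `eps` are hypothetical names.

```python
import math
from collections import Counter


def degree_histogram(degrees, max_degree, eps=1e-9):
    """Smoothed, normalized histogram over degrees 0..max_degree.

    Assumption: a node's 1-neighbor graph is summarized by the degrees of
    its neighbors; eps avoids zero probabilities in the divergence.
    """
    counts = Counter(degrees)
    raw = [counts.get(d, 0) + eps for d in range(max_degree + 1)]
    total = sum(raw)
    return [x / total for x in raw]


def kl_divergence(p, q):
    """D_KL(p || q) = sum_i p_i * log(p_i / q_i) for discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


if __name__ == "__main__":
    p = degree_histogram([1, 1, 2, 3], max_degree=3)
    q = degree_histogram([1, 2, 2, 2], max_degree=3)
    print(kl_divergence(p, q))
```

Because the divergence is asymmetric, a clustering step would typically need a convention such as averaging `kl_divergence(p, q)` and `kl_divergence(q, p)`; whether SNKL does so is not stated in the abstract.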
2022, 31(11):31-48. DOI: 10.15888/j.cnki.csa.008811
Abstract:Speech emotion recognition (SER) plays an extremely important role in the process of human-computer interaction (HCI), which has attracted much attention in recent years. At present, most SER approaches are mainly trained and tested on a single emotion corpus. In practical applications, however, the training set and testing set may come from different emotion corpora. Due to the huge difference in the distribution of different emotion corpora, the cross-corpus recognition performance achieved by most SER methods is unsatisfactory. To address this issue, many researchers have started focusing on the studies of cross-corpus SER methods in recent years. This study systematically reviews the research status and progress of cross-corpus SER methods in recent years. In particular, the application of the newly developed deep learning techniques on cross-corpus SER tasks is analyzed and summarized. Firstly, the emotion corpora commonly used in SER are introduced. Then, on the basis of deep learning techniques, the research progress of existing cross-corpus SER methods based on hand-designed features and deep features is summarized and compared from the perspectives of supervised, unsupervised, and semi-supervised learning. Finally, the challenges and opportunities in the field of cross-corpus SER are discussed and predicted.
2022, 31(11):49-59. DOI: 10.15888/j.cnki.csa.008847
Abstract:Air pollution is an important factor affecting public health. Air quality prediction is the key to early warning of air pollution and has been a hot research topic in environmental science, statistics, and computer science in recent years. This study reviews the research status and progress of air quality prediction methods, with a special focus on the systematic analysis and summary of recent applications of deep learning methods in air quality prediction. Specifically, the evolution of air quality prediction methods and air pollution datasets are outlined. After the traditional air quality prediction methods are described, the progress of existing deep learning-based air quality prediction methods is analyzed and compared in detail from the perspectives of temporal information, temporal-spatial information, and attention mechanisms. Finally, the development trend of air quality prediction methods is summarized and predicted.
2022, 31(11):60-67. DOI: 10.15888/j.cnki.csa.008783
Abstract:Traditional token-based clone detection methods utilize the serialization characteristics of code strings to quickly detect clones in large code repositories. However, compared with the methods based on the abstract syntax tree (AST) and program dependency graph (PDG), traditional methods can hardly detect code clones with large text differences due to the lack of syntax and semantic information. Therefore, this study proposes a token-based clone detection method with semantic information. First, AST is analyzed, and the semantic information of tokens located at the leaf nodes is abstracted using the AST path. Then, a low-cost index is established on the tokens for function names and type roles to quickly filter valid candidate clone fragments. Finally, the similarity between code blocks is judged using the tokens with semantic information. The experimental results on the public large-scale dataset BigCloneBench reveal that this method significantly outperforms the mainstream methods, including NiCad, Deckard, and CCAligner in Moderately Type-3 and Weakly Type-3/Type-4 clones with low text similarity while requiring less detection time on large code repositories.
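To make the idea of tokens enriched with AST context more concrete, the sketch below uses Python's standard `ast` module as a stand-in: each leaf identifier or literal is abstracted to a pair of its parent node type and its own kind, and two code blocks are compared by multiset overlap. This is only an illustration of the general technique. The abstract does not describe the paper's actual path abstraction, index, or similarity formula, and the names `semantic_tokens` and `similarity` are hypothetical.

```python
import ast
from collections import Counter


def semantic_tokens(source):
    """Count (parent node type, leaf kind) pairs for names and constants.

    Identifier text is discarded, so consistently renamed code yields the
    same token multiset.
    """
    tokens = Counter()
    for parent in ast.walk(ast.parse(source)):
        for child in ast.iter_child_nodes(parent):
            if isinstance(child, ast.Name):
                tokens[(type(parent).__name__, "Name")] += 1
            elif isinstance(child, ast.Constant):
                tokens[(type(parent).__name__, "Const")] += 1
    return tokens


def similarity(a, b):
    """Multiset overlap divided by the size of the larger token bag."""
    larger = max(sum(a.values()), sum(b.values()))
    if larger == 0:
        return 0.0
    return sum((a & b).values()) / larger


if __name__ == "__main__":
    f = "def f(x):\n    return x + 1\n"
    g = "def g(y):\n    return y + 1\n"
    print(similarity(semantic_tokens(f), semantic_tokens(g)))
```

A production detector would first filter candidate pairs with an index, as the abstract describes, before computing any pairwise similarity.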
2022, 31(11):68-78. DOI: 10.15888/j.cnki.csa.008765
Abstract:With the rapid development of the Internet of Things (IoT), the number of IoT devices has grown exponentially, and IoT security has attracted increasing attention. Generally, IoT devices adopt software attestation to verify the integrity of the software environment, so that system integrity tampering caused by the execution of malicious software can be detected in a timely manner. However, existing software attestation performs poorly in the synchronous attestation of massive numbers of IoT devices and is difficult to integrate as an extension of general IoT communication protocols. To address these problems, this study proposes a lightweight asynchronous integrity monitoring scheme. The scheme extends the general message queuing telemetry transport (MQTT) protocol with the security authentication messages of software attestation and asynchronously pushes the integrity information of devices. It improves not only the security of IoT systems but also the efficiency of integrity attestation and verification. The following three security functions are realized: device integrity measurement in a kernel module; lightweight authentication extension of device identity and integrity based on MQTT; and asynchronous integrity monitoring based on the MQTT extension protocol. This scheme can resist common software attestation attacks and MQTT protocol attacks and offers lightweight asynchronous software attestation and a general MQTT security extension. The experimental results on a prototype MQTT-based IoT authentication system show high performance in the integrity measurement of IoT nodes, MQTT connection authentication, and PUBLISH message authentication, which can meet the application requirements of integrity monitoring for massive numbers of IoT devices.
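The general shape of an authenticated integrity report, which a device could push asynchronously as a message payload, can be sketched with Python's standard `hashlib` and `hmac` modules. Everything specific here is an assumption for illustration: the JSON field names, the pre-shared key, the nonce handling, and the function names are hypothetical and do not reproduce the message format or the kernel-level measurement of the scheme in the abstract above. No MQTT client is involved; the sketch covers only payload construction and verification.

```python
import hashlib
import hmac
import json


def measure(data: bytes) -> str:
    """Integrity measurement as a SHA-256 digest of the measured bytes."""
    return hashlib.sha256(data).hexdigest()


def build_report(device_id: str, measurement: str, nonce: str, key: bytes):
    """Serialize a report and tag it with HMAC-SHA256 (hypothetical layout)."""
    body = json.dumps(
        {"device": device_id, "measurement": measurement, "nonce": nonce},
        sort_keys=True,
    ).encode()
    tag = hmac.new(key, body, hashlib.sha256).hexdigest()
    return body, tag


def verify_report(body: bytes, tag: str, key: bytes, expected: str) -> bool:
    """Check the tag in constant time, then compare the measurement."""
    good_tag = hmac.new(key, body, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(tag, good_tag):
        return False
    return json.loads(body)["measurement"] == expected


if __name__ == "__main__":
    key = b"demo-pre-shared-key"  # assumption: key provisioned out of band
    firmware = b"firmware image bytes"
    body, tag = build_report("dev-1", measure(firmware), "n-001", key)
    print(verify_report(body, tag, key, measure(firmware)))
```

In a real deployment the verifier would also track nonces or timestamps to reject replayed reports.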
2022, 31(11):79-90. DOI: 10.15888/j.cnki.csa.008799
Abstract:Multi-modal knowledge graph (MMKG) is a new research hotspot in artificial intelligence in recent years. This study provides a construction method for multi-modal domain knowledge graphs to solve the problem that the domain knowledge system of computer science is large and decentralized. Specifically, a systematic MMKG is constructed by crawling the relevant multi-modal data of computer science. However, the construction of an MMKG needs a lot of manpower and material resources. In response, this study trains a model of joint extraction of entities and relations based on the LEBERT model and relation extraction rules and ultimately implements an MMKG of the computer science domain that can automatically extract relation triples.
2022, 31(11):91-99. DOI: 10.15888/j.cnki.csa.008796
Abstract:A smart city is a new form of urban transformation built on the three-way organic integration of social space, physical space, and information systems. It uses a new generation of information technology to optimize the city system, enhance the city’s quality and comprehensive competitiveness, and achieve sustainable development. In recent years, although the construction of smart cities has received increasing attention, it has not been as smooth as the smart transformation of other fields. The construction process still faces many problems, which restrict urban development to a certain extent. This study takes the urban-scale Happiness Forest Belt building as an example, outlines a smart operation management and control platform developed for this building, and presents a condensed reference technical framework for a smart platform to meet its needs in terms of digitization, visualization, smartness, and open development frameworks. Finally, on the basis of this experience, some suggestions are proposed for constructing an intelligent city operation management and control platform, and its potential critical technical requirements are put forward.
2022, 31(11):100-110. DOI: 10.15888/j.cnki.csa.008798
Abstract:Discipline construction is the core of the development of colleges and universities. With the deepening and strengthening of discipline construction in colleges and universities, the information on discipline construction increases continuously. Nevertheless, the results of discipline construction cannot be effectively managed when they are organized as discrete documents, which is not conducive to subsequent analysis and evaluation. To solve this problem, this study focuses on the construction and further application of discipline construction-oriented knowledge graphs. For this purpose, events are extracted from discipline construction texts by the BERT-BiLSTM-CRF model, and related knowledge is supplemented by a web crawler. Then, the property graph model is selected to store knowledge, and a preliminary discipline construction-oriented knowledge graph is thereby built. Subsequently, this knowledge graph is used to build a visualization system for discipline construction, and the minimum Steiner tree algorithm is adopted for the application of intelligent question answering. Finally, the validity of the proposed method is verified by experimental analysis of the methods of discipline construction-oriented event extraction and intelligent question answering.
2022, 31(11):111-119. DOI: 10.15888/j.cnki.csa.008770
Abstract:To solve the problems of missing feature extraction by convolutional neural network and insufficient multi-feature extraction of a gesture, this study proposes a static gesture recognition method based on a residual double attention module and a cross-level feature fusion module. The designed residual double attention module can enhance the low-level features extracted by a ResNet50 network, effectively learn the key information, update the weight, and improve the attention to high-level features. Then, the cross-level feature fusion module fuses the high-level and low-level features in different stages to enrich the semantic and location information between different levels in the high-level feature map. Finally, the Softmax classifier of the fully connected layer is used to classify and recognize the gesture image. The experiment is carried out on the American sign language (ASL) dataset. The average recognition accuracy is 99.68%, which is 2.52% higher than that of the basic ResNet50 network. The results show that the proposed method can fully extract and reuse gesture features and effectively improve the recognition accuracy of gesture images.
2022, 31(11):120-129. DOI: 10.15888/j.cnki.csa.008791
Abstract:The current operation mode of tower cranes suffers from problems such as high safety risks and low utilization of on-site operators. To solve these problems, a remote control system for tower cranes based on 5G MEC is proposed, which allows multiple tower cranes and multiple clients in different geographical positions to connect freely and supports comprehensive management and control. Through modular design and targeted forwarding control strategies for status data, control data, and media stream data, it ensures that the low latency of 5G communications is realized at the application level, which provides a reference for distributed remote control of multiple devices and clients.
2022, 31(11):130-138. DOI: 10.15888/j.cnki.csa.008780
Abstract:OpenCL is an open, royalty-free heterogeneous computing framework that is widely used on processors of various architectures. HXDSP is a domestic DSP chip independently developed by the 38th Research Institute of China Electronic Technology Corporation. To solve the scheduling difficulties and insufficient hardware utilization of the HXDSP heterogeneous computing platform, this work studies the OpenCL runtime task scheduling system. A method for automatically extracting task graphs at OpenCL runtime is designed, and the classic static scheduling algorithm HEFT is improved by combining the hardware characteristics of HXDSP with the characteristics of the OpenCL execution model. Thus, a heterogeneous dual-granularity earliest finish time (HDGEFT) scheduling algorithm is proposed, and experiments are designed on the HXDSP heterogeneous computing platform for verification. The experimental results reveal that the specially designed scheduling algorithm has great advantages in execution efficiency.
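For readers unfamiliar with HEFT, the baseline algorithm that the abstract above builds on, its first phase orders tasks by "upward rank": a task's average computation cost plus the largest sum of communication cost and upward rank over its successors. The Python sketch below computes only this ranking on a toy task graph. The dictionary-based graph encoding and the cost values are assumptions for illustration, and none of the HDGEFT-specific changes are represented.

```python
from functools import lru_cache


def upward_ranks(avg_cost, comm_cost, successors):
    """Return HEFT upward ranks for every task in a DAG.

    avg_cost:   task -> average computation cost across processors
    comm_cost:  (task, successor) -> average communication cost
    successors: task -> iterable of successor tasks
    """

    @lru_cache(maxsize=None)
    def rank(task):
        tail = max(
            (
                comm_cost.get((task, nxt), 0.0) + rank(nxt)
                for nxt in successors.get(task, ())
            ),
            default=0.0,  # exit tasks have no successors
        )
        return avg_cost[task] + tail

    return {task: rank(task) for task in avg_cost}


if __name__ == "__main__":
    ranks = upward_ranks(
        avg_cost={"a": 2.0, "b": 3.0, "c": 1.0},
        comm_cost={("a", "b"): 1.0, ("a", "c"): 4.0},
        successors={"a": ["b", "c"]},
    )
    # HEFT schedules tasks in order of decreasing upward rank.
    print(sorted(ranks, key=ranks.get, reverse=True))
```

HEFT's second phase then assigns each task, in this order, to the processor that gives the earliest finish time.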
2022, 31(11):139-147. DOI: 10.15888/j.cnki.csa.008794
Abstract:In recent years, industry-university-research cooperation has become an important factor promoting industrial upgrading and economic development. Industry-university-research services are conducive to quickly integrating multiple resources, improving innovation efficiency, and enhancing the comprehensive competitiveness of enterprises. Governments at all levels have also launched various policies to support industry-university-research cooperation. However, a wide variety of enterprises, as an important part of such cooperation, find it difficult to collect and organize befitting supporting policies efficiently during the application process. Therefore, this study employs the technology of artificial intelligence-based text analysis to design and implement a policy matching system for industry-university-research services. This system can analyze and preprocess various policies from different sources, thereby enabling enterprises to quickly obtain supporting policies that match their specific conditions and ultimately saving manpower for the enterprises and improving their efficiency in applying for a project.
2022, 31(11):148-156. DOI: 10.15888/j.cnki.csa.008801
Abstract:Although deep reinforcement learning can solve many complex control problems, it requires a large number of interactions with the environment, which is a major challenge. One of the reasons for this problem is that it is difficult for an agent to extract effective features from a high-dimensional complex input by relying only on the loss of the value function. As a result, the agent has an insufficient understanding of the state and cannot correctly assign value to the state. Therefore, this study proposes a regularized predictive representation learning (RPRL) method that combines forward state prediction with a latent space constraint to help agents understand the environment. The method helps agents to learn and extract state features from high-dimensional visual observations and thus improves the sample efficiency of reinforcement learning. The forward state transfer loss is used as the auxiliary loss so that the features learned by agents contain dynamic information related to environmental transition. At the same time, the state representation of latent space is regularized on the basis of forward prediction, which further helps the agent to learn a smooth and regular representation of the high-dimensional input. In the DeepMind Control (DMControl) environment, the proposed method achieves better performance than other model-based methods and model-free methods with representation learning.
2022, 31(11):157-166. DOI: 10.15888/j.cnki.csa.008793
Abstract:In view of the large number of parameters in the Inception-v3 network, this study proposes an effective gesture image recognition method, which can meet the needs of high-precision gesture recognition with few model parameters. In this study, the Inception module of the original Inception-v3 is redesigned on the basis of the Inception-v3 structure to reduce the number of parameters and the difficulty of learning them, and a residual connection is used to protect the integrity of information while preventing network degradation. The attention mechanism module is introduced to make the model focus on useful information and suppress useless information, and to a certain extent, it also prevents the overfitting of the model. Moreover, feature fusion is carried out between the up-sampled feature and the low-level feature in the model, and the fused feature has better discrimination than the original input feature, which further improves the accuracy of the model. The experimental results indicate that the improved Inception-v3 network has only 1.65 M parameters, and it has higher accuracy and faster convergence speed. In the experiments, the ASL sign language dataset and the Bangladesh sign language dataset are shuffled separately, and the training set and validation set are divided at a ratio of 4:1. The recognition rates of the improved Inception-v3 on the ASL sign language dataset and Bangladesh sign language dataset are 100% and 95.33%, respectively.
2022, 31(11):167-174. DOI: 10.15888/j.cnki.csa.008804
Abstract:To address problems such as haze residue and color distortion in existing dehazing methods, this study takes advantage of the strength of generative adversarial networks in image super-resolution reconstruction and proposes an image dehazing algorithm based on channel attention and conditional generative adversarial network (CGAN-ECA). Specifically, the network is based on the encoder-decoder structure. The generator is designed with the multi-scale residual block (MRBlk) and efficient channel attention (ECA) to expand the receptive field, extract multi-scale features, dynamically adjust the weights of different channels, and improve the utilization rate of features. In addition, the Markovian discriminator (PatchGAN) is used to evaluate images and improve the accuracy in identifying images. At the same time, a content loss is added into the loss function to reduce pixel-level and feature-level losses of dehazed images, retain more image details, and achieve high-quality image dehazing. The test results based on the public dataset RESIDE show that compared with DCP, AOD-Net, DehazeNet, and GCANet models, the proposed model increases the peak signal-to-noise ratio (PSNR) and the structural similarity index (SSIM) by 36.36% and 8.80%, respectively, and color distortion and haze residue are solved. Therefore, CGAN-ECA is an effective method for image dehazing.
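One of the two metrics reported in the abstract above, the peak signal-to-noise ratio, follows the standard definition PSNR = 10 · log10(MAX² / MSE), where MSE is the mean squared error between the reference and restored images. A minimal Python sketch over flat pixel lists is shown below. The flat-list image representation and the default peak value of 255 (8-bit images) are assumptions for illustration, and SSIM is not covered.

```python
import math


def psnr(reference, restored, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel lists."""
    if len(reference) != len(restored) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((r - d) ** 2 for r, d in zip(reference, restored)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


if __name__ == "__main__":
    clear = [52, 60, 61, 200]
    dehazed = [50, 63, 61, 198]
    print(psnr(clear, dehazed))
```

Higher values indicate that the restored image is closer to the reference.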
2022, 31(11):175-183. DOI: 10.15888/j.cnki.csa.008813
Abstract:Different from the laboratory environment, the scenes of facial expression images in real life are complex, and local occlusion, the most common problem, will cause a significant change in the facial appearance. As a result, the global feature extracted by a model contains redundant information unrelated to emotions, which reduces the discrimination of the model. Considering this problem, a facial expression recognition method combining contrastive learning and the channel-spatial attention mechanism is proposed in this study, which learns local salient emotion features and pays attention to the relationship between local features and global features. Firstly, contrastive learning is introduced. A new positive and negative sample selection strategy is designed through a specific data augmentation method, and a large amount of easily accessible unlabeled emotion data is pre-trained to learn the representation with occlusion-aware ability. Then, the representation is transferred to the downstream facial expression recognition task to improve recognition performance. In the downstream task, the expression analysis of each face image is transformed into the emotion detection of multiple local regions. The fine-grained attention maps of different local regions of a face are learned using the channel-spatial attention mechanism, and the weighted features are fused to weaken the noise effect caused by the occlusion content. Finally, the constraint loss for joint training is proposed to optimize the final fusion feature for classification. The experimental results indicate that the proposed method achieves comparable results to existing state-of-the-art methods on both public non-occluded facial expression datasets (RAF-DB and FER2013) and synthetic occluded facial expression datasets.
2022, 31(11):184-191. DOI: 10.15888/j.cnki.csa.008803
Abstract:Considering strong noise interference and difficult shadow detection in high-resolution remote sensing images of high-rise buildings, this study proposes a shadow detection method for remote sensing images of high-rise buildings, which is based on the combination of improved threshold segmentation and residual attention networks. Firstly, a threshold segmentation model is built by the improved maximum inter-class and minimum intra-class threshold segmentation algorithm, and on the basis of the connected domain characteristics and end-point positional constraint relationships between contours, the Euclidean metric algorithm is used to repair the broken contours for the shadow contours. Then, the generative adversarial network (GAN) model is used to expand the misjudgment data set. Finally, the residual network is improved, and the attention mechanism is added to the feature map for global feature fusion. In different scenes, the proposed method is compared with the radiation model, histogram threshold segmentation, color model-based shadow detection method, support vector machine (SVM), visual geometry group (VGG) network, Inception, and classification network of residual networks, and the proposed method has a comprehensive misjudgment rate and missed detection rate of 2.1% and 1.5%, respectively. The results reveal that the proposed algorithm can better complete the segmentation and detection of shadow areas, which is conducive to saving human and material resources and assisting staff with their work such as interpreting remote sensing information and establishing remote sensing archives. The proposed method has practical value.
2022, 31(11):192-198. DOI: 10.15888/j.cnki.csa.008807
Abstract:Different from ordinary object detection tasks, the difficulty of detecting tile surface defects lies in the detection of unconventional size objects, such as small-sized objects and objects with large aspect ratios. To solve these two problems, this study proposes a new type of tile surface defect detection algorithm based on improved Cascade R-CNN. To improve the detection ability for small defects, the model in this study uses the lateral connection structure to fuse the semantic information of the upper and lower layers and applies the dilated convolution with switchable dilation rates to increase the receptive field of the model. To improve the detection ability for defects with large aspect ratios, the proposed model introduces an offset field on the standard convolution to better extract the object feature information. In addition, the model adjusts the size and length of the pre-selected anchor box in the Cascade R-CNN framework. The experimental results show that on the dataset collected from the tile factory, the mean average precision (mAP) of the proposed algorithm reaches 73.5%, which is 9.7% higher than that of the Cascade R-CNN model before improvement. The experimental code of this study is available at: https://github.com/mashibin/Ceramic-tile-defect-detection.
2022, 31(11):199-206. DOI: 10.15888/j.cnki.csa.008760
Abstract:Solving expensive optimization problems often incurs prohibitive computational cost. To reduce the number of real evaluations of the objective function, this study uses the ordinal prediction method in the selection of candidate solutions in evolutionary algorithms. The relative quality of candidate solutions is directly obtained through classification prediction, which avoids the need to establish an accurate surrogate model for the objective function. In addition, a reduction method for the ordinal sample set is designed to reduce the redundancy of the ordinal sample set and improve the training efficiency of the ordinal prediction model. Next, the ordinal prediction is combined with the genetic algorithm. The simulation experiments of the ordinal prediction-assisted genetic algorithm on expensive optimization test functions show that the ordinal prediction method can effectively reduce the computational cost of solving expensive optimization problems.
2022, 31(11):207-214. DOI: 10.15888/j.cnki.csa.008775
Abstract:Although the SemBERT model is an improved version of the BERT model, it has two obvious defects. One is its poor ability to obtain vector representation. The other is that it directly uses conventional features to classify tasks without considering the category of the features. A new feature reorganization network is proposed to address these two defects. This model adds a self-attention mechanism into the SemBERT model and obtains better vector representation with an external feature reorganization mechanism. Feature weights are also reassigned. Experimental data show that the F1 score of the new method on the Microsoft Research Paraphrase Corpus (MRPC) dataset is one percentage point higher than that of the classical SemBERT model. The proposed model has significantly improved performance on small datasets, and it outperforms most of the current outstanding models.
2022, 31(11):215-222. DOI: 10.15888/j.cnki.csa.008773
Abstract:The research on the recognition of abnormal human behavior in video surveillance systems is of great significance. As traditional algorithms are easily affected by the environment and have poor timeliness and accuracy, an abnormal behavior recognition algorithm based on skeleton sequence extraction is proposed. Firstly, the improved YOLOv3 network is used to detect targets and is combined with the RT-MDNet algorithm to track them for target trajectories. Then, the OpenPose model is employed to extract the skeleton sequence of targets in the trajectories. Finally, the spatiotemporal graph convolutional network combined with clustering is applied to recognize the abnormal behavior of the targets. The experimental results indicate that the proposed algorithm has a processing speed of 18.25 fps and recognition accuracy of 94% under a complex background of light changes, which can accurately identify the abnormal behavior of various targets in real time.
2022, 31(11):223-229. DOI: 10.15888/j.cnki.csa.008781
Abstract:To reduce the complexity of the load sequence, the empirical mode decomposition (EMD) method is used to decompose it into different components. To shorten the training time and reduce the cumulative error caused by forecasting the components one by one, the components are reconstructed into high-frequency and low-frequency ones according to their zero-crossing rates. The high-frequency components of the load are forecasted by the temporal convolutional network (TCN) model, whereas the low-frequency ones are forecasted by the extreme learning machine (ELM). The proposed EMD-TCN-ELM model is compared through experiments with three individual models (TCN, ELM, and long short-term memory (LSTM)) and three hybrid models (EMD-TCN, EMD-ELM, and EMD-LSTM), and its mean absolute percentage error (MAPE) is lower by 0.538%, 1.866%, 1.191%, 0.026%, 1.559%, and 0.323%, respectively. The forecasting accuracy of the proposed model is also the highest. Additionally, the proposed model has the shortest training time among the top three models in forecasting accuracy. The above results verify the superiority of the proposed model in load forecasting accuracy and training time.
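The grouping rule and the error metric mentioned in the abstract above can be sketched in plain Python. The zero-crossing rate is taken here as the fraction of adjacent sample pairs whose signs differ, components are split at a threshold and summed within each group, and MAPE uses its standard definition. The threshold value, the function names, and the toy series are assumptions for illustration; the abstract does not give the paper's threshold or the EMD, TCN, and ELM implementations.

```python
def zero_crossing_rate(series):
    """Fraction of adjacent sample pairs that change sign."""
    if len(series) < 2:
        return 0.0
    crossings = sum(1 for a, b in zip(series, series[1:]) if a * b < 0)
    return crossings / (len(series) - 1)


def split_by_zero_crossing(components, threshold):
    """Group components into (high-frequency, low-frequency) lists."""
    high = [c for c in components if zero_crossing_rate(c) >= threshold]
    low = [c for c in components if zero_crossing_rate(c) < threshold]
    return high, low


def reconstruct(components):
    """Sum equal-length components element by element."""
    return [sum(values) for values in zip(*components)]


def mape(actual, predicted):
    """Mean absolute percentage error, in percent (actual values nonzero)."""
    errors = [abs((a - p) / a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(errors) / len(errors)


if __name__ == "__main__":
    comps = [[1.0, -1.0, 1.0, -1.0], [1.0, 2.0, 3.0, 4.0]]
    high, low = split_by_zero_crossing(comps, threshold=0.5)  # hypothetical threshold
    print(reconstruct(high), reconstruct(low))
```

Each reconstructed group would then be passed to its own forecasting model, and the two forecasts summed to obtain the load forecast.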
2022, 31(11):230-237. DOI: 10.15888/j.cnki.csa.008789
Abstract:Traditional fire warning methods have low detection accuracy and cannot give early warnings in time before the fire starts. Therefore, this study proposes an early fire warning algorithm based on deep learning. Firstly, an infrared thermal imager is used to collect infrared images in a specific scenario for dataset construction. Secondly, the improved YOLOv4 algorithm is applied for training, and the network weights are obtained. The convolutional attention module is introduced after the three output feature layers of the backbone network to improve the ability of the network to extract key information. Convolutional layers are added to the backbone network and path aggregation network to improve feature extraction capability. Finally, the proposed intelligent fire detection (IFD) algorithm is employed to process the predicted image and evaluate the fire hazard according to the score. The experimental results reveal that the mAP of the improved YOLOv4 algorithm on the dataset reaches 98.31%, which is 2.7% higher than that of the original YOLOv4 algorithm, and the processing speed is 37.1 f/s; the accuracy of the IFD algorithm is 93%, and its false detection rate is 3.2%. The proposed early fire warning algorithm has the advantages of high detection accuracy and timely warnings before a fire breaks out.
2022, 31(11):238-245. DOI: 10.15888/j.cnki.csa.008820
Abstract:The generation of text adversarial samples is of great significance for studying the vulnerability of deep learning-based natural language processing (NLP) systems and improving the robustness of such systems. This work studies the important steps in the generation of word-level adversarial samples and the search for replacement words. Considering the problems of premature convergence and poor effectiveness of existing algorithms, a text adversarial sample generation method is proposed, which is based on an improved artificial bee colony (ABC) search algorithm. Firstly, the search space of the words to be replaced is obtained by the screening of the sememe annotations of the words in the HowNet database. Then, the improved ABC algorithm is employed to search and locate the replacement words for the generation of high-quality text adversarial samples. Finally, attack tests are conducted on two text classification datasets for a comparison with the current mainstream text classification models based on deep neural networks (DNNs). The results demonstrate that compared with the existing text adversarial sample generation methods, the proposed method can mislead the text classification system with a higher success rate of attack and preserve semantic and grammatical correctness to a larger extent.
2022, 31(11):246-253. DOI: 10.15888/j.cnki.csa.008745
Abstract:The traditional generative model ignores the important clues provided by key words in the process of abstract generation, which leads to the loss of key word information, and the generated abstract cannot agree with the original text well. In this study, an abstract generation method is proposed, which takes the pointer-generator network as the framework and integrates BERT pretraining model and key word information. Firstly, the TextRank algorithm and the sequence model based on the attention mechanism are used to extract key words from the original text, and thus the generated key words can contain more information about the original text. Secondly, the key word attention is added to the attention mechanism of the pointer-generator network to guide the generation of an abstract. In addition, we use the double-pointer copy mechanism to replace the copy mechanism of the pointer-generator network and thus improve the coverage of the copy mechanism. The results on LCSTS data sets reveal that the designed model can contain more key information and improve the accuracy and readability of generated abstracts.
2022, 31(11):254-260. DOI: 10.15888/j.cnki.csa.008768
Abstract:To solve the problem of low accuracy caused by large classification loss in the lightweight target detection algorithm, a method of detecting the location and classification of the target with double detection heads is proposed. In the algorithm, a convolutional head is used to detect the position, and a fully connected head is used to detect the classification. In the classification detection, after the feature map passes through the convolution layer, the feature map of the fused position regression branch is processed through the fully connected layer. A grouped fully connected method is proposed to further reduce the amount of calculation in the fully connected layer. The algorithm is trained on the VOC datasets. The results show that the classification loss of the improved model is significantly reduced, and the detection accuracy of the lightweight target detection algorithm is effectively improved. The accuracy of the algorithm on the VOC test set reaches 70.08% mAP.
2022, 31(11):261-267. DOI: 10.15888/j.cnki.csa.008795
Abstract:Skeleton data is compact and robust to environmental conditions for hand gesture recognition. Recent studies of skeleton-based hand gesture recognition often use deep neural networks to extract spatial and temporal information. However, these methods are likely to have problems such as complicated computation and a large number of model parameters. To solve this problem, this study presents a lightweight and efficient hand gesture recognition model. It uses two spatial geometric features calculated from skeleton sequences and automatically learned motion trajectory features to achieve hand gesture classification with convolutional networks alone as its backbone network. The proposed model has a minimum number of parameters as small as 0.16M and a maximum computational complexity of 0.03 GFLOPs. This method is also evaluated on two public datasets, where it outperforms the other methods that use skeleton modality as input.
2022, 31(11):268-274. DOI: 10.15888/j.cnki.csa.008788
Abstract:Crowd behavior recognition has important application value in public safety and other fields. Existing studies have considered the influence of such factors on crowd behavior as crowd emotions, crowd types, crowd density, and social and cultural backgrounds of crowds separately, but few models comprehensively consider these factors, which limits model performance. This study comprehensively considers the correlation between the physical features, social features, emotional and personality features, and cultural background features of the crowd and the influence of the combination of these factors on crowd behavior. As a result, a crowd behavior recognition model that integrates multiple features and time series is proposed. The model uses two parallel network layers to deal with the influence of multi-feature correlation and time-series dependence on crowd behavior separately. Meanwhile, the network layer fuses the structural causal model (SCM) and the causal graph network (CGN) based on the graph neural network (GNN) to improve the interpretability of the model. The experiments on the motion and emotion dataset (MED) and the comparison with other state-of-the-art models demonstrate that the proposed method can successfully identify crowd behavior and outperform the state-of-the-art methods.
2022, 31(11):275-281. DOI: 10.15888/j.cnki.csa.008818
Abstract:The vehicle routing problem (VRP) exists extensively in the modern logistics industry, which is an NP-hard problem in combinatorial optimization. Affected by factors such as diverse customer demand and road traffic restrictions, VRP becomes more complex, and it can hardly be solved by the traditional combinatorial optimization methods and operations research methods. In this study, a common VRP with time windows is studied. The waiting time of vehicles is reduced by the adjustment to the priority of customers according to the parameters of time windows. On this basis, several common heuristic algorithms are improved, and 56 common VRPs are tested. The experimental results reveal that the improved savings algorithm can produce good results for capacitated VRPs, and the improved insertion method has superior performance in VRPs with time windows. In addition, the improved heuristic algorithms can make the total distance better than the known optimal value when using more vehicles on the four test cases.
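For readers unfamiliar with the savings heuristic mentioned in this abstract, the following is a minimal sketch of the classic Clarke-Wright savings construction for the capacitated case. It is not the improved variant the paper proposes and it ignores time windows; the function name, the data layout (coordinates keyed by customer id), and the Euclidean distance are assumptions made for illustration.

```python
import math


def savings_routes(depot, customers, demand, capacity):
    """Classic parallel savings heuristic (capacity constraint only).

    customers: dict id -> (x, y); demand: dict id -> load; returns a list of routes.
    """
    routes = {i: [i] for i in customers}      # route id -> ordered customer ids
    route_of = {i: i for i in customers}      # customer id -> route id
    load = {i: demand[i] for i in customers}  # route id -> total demand

    # Saving of serving i and j on one route instead of two separate round trips.
    savings = sorted(
        ((math.dist(depot, customers[i]) + math.dist(depot, customers[j])
          - math.dist(customers[i], customers[j]), i, j)
         for i in customers for j in customers if i < j),
        reverse=True,
    )

    for saving, i, j in savings:
        ri, rj = route_of[i], route_of[j]
        if saving <= 0 or ri == rj or load[ri] + load[rj] > capacity:
            continue
        a, b = routes[ri], routes[rj]
        # Merge only when i and j are both endpoints, so they become adjacent.
        if a[-1] == i and b[0] == j:
            merged = a + b
        elif a[0] == i and b[-1] == j:
            merged = b + a
        elif a[0] == i and b[0] == j:
            merged = a[::-1] + b
        elif a[-1] == i and b[-1] == j:
            merged = a + b[::-1]
        else:
            continue
        routes[ri] = merged
        load[ri] += load[rj]
        for c in b:
            route_of[c] = ri
        del routes[rj], load[rj]
    return list(routes.values())
```

A variant for time windows would additionally need a feasibility check on arrival times before each merge; how the paper adjusts customer priority from the time-window parameters is not stated in the abstract and is not modeled here.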
2022, 31(11):282-289. DOI: 10.15888/j.cnki.csa.008831
Abstract:Considering the impact of sudden railway damage on train operation, a single-track railway scheduling model is built on the basis of the train operation scheduling theory, and a two-stage train scheduling algorithm with emergency handling capacity is designed. In the first stage, the running speed of the train in the section is adjusted, and in the second stage, the dwell time of the train is adjusted. Three effective search operators, an adaptive update rule, and the particle swarm optimization algorithm are combined to solve the single-track railway train scheduling problem with the train delay rate as the optimization objective. The proposed algorithm is tested and compared with other algorithms under the same experimental conditions, and the emergency test proves the effectiveness of the proposed algorithm.
2022, 31(11):290-295. DOI: 10.15888/j.cnki.csa.008751
Abstract:Although the attribution explanation method based on Shapley value can quantify the interpretation results more accurately, the excessive computational complexity seriously affects the practicality of this method. In this study, we introduce the k-dimensional (KD) tree to reorganize the predicted data of the model to be explained, insert virtual nodes into the KD tree so that it meets the application conditions of the TreeSHAP algorithm, and then propose the KDSHAP method. This method lifts the restriction that the TreeSHAP algorithm can only explain tree models and broadens the efficiency of the algorithm in calculating Shapley value to the explanation of all black-box models without compromising calculation accuracy. The reliability of the KDSHAP method and its applicability in interpreting high-dimensional input models are analyzed through experimental comparisons.
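The computational cost that motivates this work can be seen in a direct implementation of the Shapley value. The sketch below is the textbook exact formula, which enumerates every coalition and therefore scales exponentially in the number of features; it is not KDSHAP or TreeSHAP, and the function name and the `value` callback are assumptions made for illustration.

```python
from itertools import combinations
from math import factorial


def shapley_values(value, n):
    """Exact Shapley values for n players.

    value: function mapping a frozenset of player indices to a number.
    Cost grows as O(n * 2^(n-1)) calls to value, hence the need for faster methods.
    """
    phi = [0.0] * n
    for i in range(n):
        others = [p for p in range(n) if p != i]
        for k in range(n):
            # Weight of a coalition of size k that does not contain player i.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi
```

For an additive game, where the value of a coalition is the sum of its members' individual contributions, each player's Shapley value equals its own contribution, which makes a convenient sanity check.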
2022, 31(11):296-308. DOI: 10.15888/j.cnki.csa.008779
Abstract:This study mainly analyzes the sentiment of user reviews on hotels by investigating the attitudes of users toward hotel configuration and service to help hotels improve the quality of personalized service. Specifically, a pretraining model based on the BiLSTM neural network is built and compared with traditional machine learning algorithms. The experimental results reveal that the analysis accuracy of support vector machines (SVMs) is more stable compared with that of naive Bayes, while the prediction accuracy using the pretraining model is slightly improved compared with that of the previous two. Moreover, an extended dictionary of sentiment, with the basic dictionary as the main part, is constructed for reviews on hotels, and the weights of negatives are weakened to reduce the impact on the classification of sentences with opposite meanings. The basic dictionary and the extended dictionary are used to classify the sentiment of the same corpus obtained, and the comparison of the results indicates that with the extended dictionary, the accuracy of the positive classification and negative classification is 86% and 84%, respectively. This indicates that the classification effect of the extended dictionary is better than that of the basic dictionary.
2022, 31(11):309-319. DOI: 10.15888/j.cnki.csa.008792
Abstract:In recent years, compute-intensive and time delay-sensitive applications such as AR/VR, online games, and 4K/8K ultra-high-resolution videos have been emerging. Due to the limitations of their hardware conditions, some mobile devices are unable to calculate such applications under the time-delay requirements, and running such applications will consume huge energy and reduce the endurance of mobile devices. To solve this problem, this study proposes an edge computing offloading and resource allocation scheme in a Wi-Fi network with the coordination of multiple access points (APs). Firstly, the genetic algorithm is utilized to determine the task offloading decision of users. Then, the Hungarian algorithm is used to allocate communication resources to users with task offloading. Finally, according to the time-delay limit of task processing, the computing resources of mobile edge computing (MEC) servers are allocated to the users with task offloading. The simulations reveal that the proposed task offloading and resource allocation scheme can effectively reduce the energy consumption of mobile devices on the premise of meeting the time-delay limit of task processing.
2022, 31(11):320-329. DOI: 10.15888/j.cnki.csa.008809
Abstract:In high-performance computing, the huge communication overhead has become one of the main bottlenecks in the improvement of its computing power, and the optimization of communication performance has always been an important challenge. For the communication optimization task, this study proposes a method based on in-network computing technology to reduce the communication overhead. In the Ethernet-based supercomputing environment, this method utilizes the RoCEv2 protocol, programmable switches, and OpenMPI to offload reduction computation to programmable switches, and it supports the two communication modes of Node and Socket. The collective communication benchmark test and the OpenFOAM application test are carried out in a real supercomputing environment. The experimental results indicate that when the number of server nodes reaches a certain scale, compared with the traditional host communication, this method shows better performance improvement in both Node and Socket modes, with the performance in the collective communication benchmark test improved by about 10%–30% and the overall application performance in the application-level test improved by about 1%–5%.
2022, 31(11):330-338. DOI: 10.15888/j.cnki.csa.008787
Abstract:As Bayesian deep learning (BDL) combines the complementary advantages of the Bayesian method and deep learning (DL), it becomes a powerful tool for uncertainty modeling and inference of complex problems. In this study, a BDL framework based on t distribution and the cyclic stochastic gradient Hamiltonian Monte Carlo sampling algorithm is constructed, and a measure of uncertainty is given in view of data uncertainty and model uncertainty. To verify the validity and applicability of the framework, this study constructs corresponding BDL models based on the artificial neural network (ANN), convolutional neural network (CNN), and recurrent neural network (RNN) separately and applies these models to the prediction of 15 global stock indices. The empirical results reveal that 1) the framework is applicable under ANN, CNN, and RNN, and the prediction effect of all indices is excellent; 2) in terms of prediction accuracy and applicability, the BDL models based on t distribution have significant advantages over those based on normal distribution; 3) the MAE under a given uncertainty threshold is better than the original MAE, which indicates that the measure of uncertainty defined in this study is effective and is of great significance to uncertainty modeling. In view of its advantages in forecasting accuracy, ease of extension, and measurement of forecasting uncertainty, the BDL framework has broad application prospects in finance and other fields with complex data characteristics.
2022, 31(11):339-348. DOI: 10.15888/j.cnki.csa.008763
Abstract:To study the electric field effect of conductive droplets with low conductivity, an electrohydrodynamic atomization (EHDA) solver based on the leaky dielectric model and the volume of fluid (VOF) method is designed by the computational fluid dynamics (CFD) software OpenFOAM. The numerical results are compared with Taylor’s analytical values, and the simulation results predict the deformation ways of droplets and the mode of circumfluence inside and outside the droplets. It is found that under the action of an external electric field, the droplets will become “prolate” or “oblate” and form stable circumfluence inside, and they only undergo deformation without any macroscopic motion. As the intensity of the electric field increases, the deformation of the droplets also intensifies. In the case of small deformation, the simulated values are consistent with the analytical values, which verifies the correctness of the numerical method. When the droplet deformation is considerable, the simulation results start to deviate from the theoretical values, which is consistent with the experimental observations. In addition, the effect of the change in conductivity on droplet deformation is also apparent, while the evolution of the dielectric constant ratio has a less pronounced impact on droplet deformation.
2022, 31(11):349-357. DOI: 10.15888/j.cnki.csa.008778
Abstract:Text matching is one of the key techniques in natural language understanding, and its task is to determine the similarity of two texts. In recent years, with the development of pre-trained models, text-matching techniques based on pre-trained language models have been widely used. However, these text matching models still face the challenges of poor generalization ability in a particular domain and weak robustness in semantic matching. Therefore, this study proposes an incremental pre-training and adversarial training method for low-frequency words to improve the effect of the text matching model. The incremental pre-training of low-frequency words in the source domain helps the model migrate to the target domain and enhances the generalization ability of the model. Additionally, various adversarial training methods for low-frequency words are tried to improve the model’s adaptability to word-level perturbations and the robustness of the model. The experimental results on the LCQMC dataset and the text-matching dataset in the real estate domain indicate that incremental pre-training, adversarial training, and the combination of the two approaches can significantly improve the text matching results.
2022, 31(11):358-364. DOI: 10.15888/j.cnki.csa.008774
Abstract:Lattice quantum chromodynamics (Lattice QCD) is an important theory and method to study the interaction between microscopic particles such as quarks and gluons. By discretizing the spacetime into a four-dimensional structured grid and defining the basic field quantities of QCD on the grid, researchers can use numerical simulation to study hadron interactions and properties from first principles. However, the computation in this process is time-consuming, and large-scale parallel computing is required. The fundamental module of the Lattice QCD computation is the Lattice QCD solver, which is the main hot spot of the program at run time. This work studies the realization and optimization of the Lattice QCD solver on a domestic heterogeneous computing platform and proposes a design method for the solver, which implements the BiCGSTAB solver and significantly reduces the number of iterations. With even-odd preconditioning, the study reduces the computing scale of the problem and optimizes the Dslash module’s memory access in terms of the characteristics of a domestic heterogeneous accelerator. Experimental tests show that the optimized solver achieves a speedup of about 30 times over the unoptimized one, which provides a useful reference for the performance optimization of Lattice QCD software on domestic heterogeneous supercomputers.
2022, 31(11):365-372. DOI: 10.15888/j.cnki.csa.008790
Abstract:Monocular depth estimation is a fundamental problem in computer vision, and the patch-match and plane-regularization network (P2Net) is one of the most advanced unsupervised monocular depth estimation methods. Because nearest neighbor interpolation, the upsampling method adopted by the depth prediction network of P2Net, has a relatively simple calculation process, the predicted depth maps are of poor quality. Therefore, a residual upsampling structure based on multiple upsampling algorithms is constructed in this study to replace the upsampling layer of the original network, so as to retain more feature information and preserve the integrity of the object structure. The experimental results on the NYU-Depth V2 dataset reveal that compared with the original network, the improved P2Net based on the transposed convolution, bilinear interpolation, and PixelShuffle can reduce the root mean square error (RMSE) by 2.25%, 2.73%, and 3.05%, respectively. The residual upsampling structure in this study improves the quality of the predicted depth maps and reduces the prediction error.
2022, 31(11):373-379. DOI: 10.15888/j.cnki.csa.008766
Abstract:Probabilistic matrix factorization model, making personalized item recommendations according to a user’s historical interaction information, is one of the classic methods in collaborative filtering. Under the assumption of the traditional matrix factorization model, the similarities among different users cannot be used, and prediction is often inaccurate when outliers occur. A clustering-based probabilistic matrix factorization model with category-related conjugate prior distribution is built with user clustering information. Its parameters are regularized by changing the form of the conjugate prior distribution. Through variational inference, the explicit expressions of variational parameters are theoretically derived, and corresponding rating prediction algorithms are thereby established. Both simulation and real datasets show that the prediction performance of the proposed model is better than that of the benchmark model, and it can provide realistic explanations for users’ rating behavior.
2022, 31(11):380-386. DOI: 10.15888/j.cnki.csa.008797
Abstract:The core of server cache performance is the cache replacement strategy which directly affects the cache hit ratio. Web cache can solve the problems of network congestion and user access delay and improve server performance. A multi-cache replacement strategy based on spectral clustering is proposed because of the low cache hit ratio of traditional cache replacement algorithms. The strategy uses the circular sliding window mechanism to extract multiple temporal features and access attributes of log files and conducts cluster analysis on the filtered data set through spectral clustering to obtain access prediction results. Multi-cache replacement strategy takes into account the local frequency, global frequency, and resource size of the cache object to eliminate the low-value resources and retain the high-value resources. In comparison with traditional replacement algorithms such as LRU, LFU, RC, and FIFO, the experimental results show that the combination of spectral clustering and multi-cache replacement strategy in this study can effectively improve the cache request hit ratio and byte hit ratio.
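The two evaluation metrics named in this abstract, request hit ratio and byte hit ratio, can be made concrete with a minimal simulation of the LRU baseline. This sketch covers only the baseline and the metrics, not the proposed spectral clustering strategy; the function name and the (key, size) request format are assumptions made for illustration.

```python
from collections import OrderedDict


def simulate_lru(requests, capacity_bytes):
    """Replay (key, size) requests through a byte-bounded LRU cache.

    Returns (request hit ratio, byte hit ratio).
    """
    cache = OrderedDict()  # key -> size, ordered from least to most recently used
    used = hits = hit_bytes = total_bytes = 0
    for key, size in requests:
        total_bytes += size
        if key in cache:
            hits += 1
            hit_bytes += size
            cache.move_to_end(key)  # mark as most recently used
            continue
        if size > capacity_bytes:
            continue  # object can never fit, so it is not cached
        while used + size > capacity_bytes:
            _, evicted_size = cache.popitem(last=False)  # evict least recently used
            used -= evicted_size
        cache[key] = size
        used += size
    return hits / len(requests), hit_bytes / total_bytes
```

A value-based policy such as the one the abstract describes would replace the eviction line with a choice driven by a score built from local frequency, global frequency, and resource size; the exact scoring formula is not given in the abstract.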
2022, 31(11):387-392. DOI: 10.15888/j.cnki.csa.008645
Abstract:As the network expands, the exact algorithm of closeness centrality has low efficiency. In this study, a model based on the learning to rank algorithm (RankNet) is proposed to quickly approximate the closeness centrality rank of complex network nodes. Firstly, the study carries out a correlation analysis to obtain important node indicators positively correlated with the closeness centrality and uses them as input features of the model. Subsequently, a subset of nodes in a given network is randomly selected and used as the training samples of the model. The proposed model is verified on a real aviation network dataset and typical complex network models. The experimental results show that the RankNet-based model not only reduces the computational complexity but also maintains high approximation accuracy. In addition, the ranking performance of the proposed model is significantly superior to that of the benchmark model based on regression learning.
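The cost of the exact computation that this abstract sets out to avoid can be seen in a minimal sketch of closeness centrality on an unweighted, connected graph: one breadth-first search is run from every node, so the total work grows roughly with the number of nodes times the number of edges. The function names and the adjacency-list format are assumptions made for illustration, and this is the exact baseline, not the RankNet-based approximation.

```python
from collections import deque


def closeness_centrality(adj):
    """Exact closeness for an unweighted, connected graph.

    adj: dict node -> list of neighbours. One BFS per node.
    """
    n = len(adj)
    result = {}
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total = sum(dist.values())
        # (n - 1) divided by the sum of shortest-path distances to all other nodes.
        result[source] = (n - 1) / total if total > 0 else 0.0
    return result


def rank_by_closeness(adj):
    """Nodes ordered from most to least central."""
    scores = closeness_centrality(adj)
    return sorted(scores, key=scores.get, reverse=True)
```

On a three-node path graph, for example, the middle node reaches both others in one hop and therefore ranks first.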
2022, 31(11):393-399. DOI: 10.15888/j.cnki.csa.008819
Abstract:In the research of bird sound recognition, the selection of sound features has a great impact on the accuracy of recognition and classification. To improve the accuracy of bird sound recognition, this study starts with the problem that the traditional Mel frequency cepstral coefficient (MFCC) characterizes the high-frequency information in bird sound insufficiently. Feature fusion of MFCC based on Fisher criterion and inverted MFCC (IMFCC) is proposed to obtain a new feature parameter MFCC-IMFCC that can be applied to bird sound recognition to improve the characterization of the high-frequency information in bird sound. Meanwhile, the penalty factor C and the kernel parameter g in the support vector machine (SVM) are optimized by a genetic algorithm (GA), and a GA-SVM classification model is trained. Experiments show that under the same conditions, the recognition rate of the MFCC-IMFCC approach is higher than that of the MFCC one.