2023, 32(7):1-10. DOI: 10.15888/j.cnki.csa.009119
Abstract:Source code summarization aims to automatically generate precise natural language summaries of source code, so as to help developers better understand and maintain it. Traditional research methods generate source code summaries by using information retrieval techniques, which select corresponding words from the original source code or adapt summaries of similar code snippets; recent research adopts machine translation methods and generates summaries of code snippets with encoder-decoder neural network models. However, existing summary generation methods have two main problems: on the one hand, neural network-based methods handle the high-frequency words appearing in code snippets well but tend to handle low-frequency words poorly; on the other hand, programming languages are highly structured, so source code cannot simply be treated as serialized text without losing contextual structure information. Therefore, to solve the problem of low-frequency words, a retrieval-based neural machine translation approach is proposed, in which similar code snippets retrieved from the training set are used to enhance the neural network model. In addition, to learn the structured semantic information of code snippets, this study proposes a structure-guided Transformer, which encodes the structural information of code through an attention mechanism. The experimental results show that the model has significant advantages over current state-of-the-art deep learning models for code summarization in processing low-frequency words and structured semantics.
2023, 32(7):11-22. DOI: 10.15888/j.cnki.csa.009153
Abstract:The registration technology of medical three-dimensional (3D) images (such as CT, MRI, etc.) and two-dimensional (2D) images (such as X-ray) has been widely used in clinical diagnosis and surgical planning. The essence of medical image registration is to use an optimization algorithm to find some kind of spatial transformation so that two images are aligned in space and structure. Usually, the registration quality is low in the process of registration due to the problem that the optimization algorithm is not accurate and easy to fall into the local extremum. In order to solve this problem, an improved equilibrium optimizer based on the Logistic-Tent chaos map and Levy flight (LTEO) is proposed. First, in order to solve the problem that the population initialization is easy to be unevenly distributed, and the randomness is too high, the Logistic-Tent chaotic map is introduced to initialize the population, increase the diversity of the population, and make them distribute in the search space as much as possible; second, the iterative function is updated to make the optimization algorithm pay more attention to the global search, improve the convergence speed of the algorithm, and help to find the global optimum solution; third, Levy flight strategy is introduced to disturb the stagnant particles and thus prevent the algorithm from falling into local extremum. Finally, LTEO is used for 2D/3D medical image registration tasks, and the frequent transmission of data in the registration process is optimized to reduce the time consumption of registration. The algorithm is verified by benchmark function tests and clinical registration experiments. The LTEO can effectively improve optimization accuracy and stability and enhance the quality of medical image registration.
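The first and third LTEO steps (chaotic initialization and Levy perturbation) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the piecewise Logistic-Tent map form, the defaults `x0`, `r`, `beta`, and `scale`, and the function names are all assumptions, and the Levy step uses Mantegna's algorithm, which the abstract does not specify.

```python
import math
import random


def logistic_tent_sequence(length, x0=0.37, r=3.7):
    """Return `length` values in [0, 1) from a combined Logistic-Tent map.

    The piecewise map and the defaults x0 and r are illustrative assumptions.
    """
    xs = []
    x = x0
    for _ in range(length):
        if x < 0.5:
            x = (r * x * (1.0 - x) + (4.0 - r) * x / 2.0) % 1.0
        else:
            x = (r * x * (1.0 - x) + (4.0 - r) * (1.0 - x) / 2.0) % 1.0
        xs.append(x)
    return xs


def init_population(pop_size, lower, upper, x0=0.37, r=3.7):
    """Scale the chaotic sequence into the box [lower, upper] per dimension."""
    dim = len(lower)
    chaos = logistic_tent_sequence(pop_size * dim, x0, r)
    return [
        [lower[d] + chaos[i * dim + d] * (upper[d] - lower[d]) for d in range(dim)]
        for i in range(pop_size)
    ]


def levy_step(beta=1.5, rng=random):
    """Draw one heavy-tailed step using Mantegna's algorithm."""
    num = math.gamma(1.0 + beta) * math.sin(math.pi * beta / 2.0)
    den = math.gamma((1.0 + beta) / 2.0) * beta * 2.0 ** ((beta - 1.0) / 2.0)
    sigma_u = (num / den) ** (1.0 / beta)
    u = rng.gauss(0.0, sigma_u)
    v = rng.gauss(0.0, 1.0)
    while v == 0.0:  # avoid division by zero in the (very unlikely) exact-zero draw
        v = rng.gauss(0.0, 1.0)
    return u / abs(v) ** (1.0 / beta)


def levy_perturb(position, lower, upper, scale=0.01, beta=1.5, rng=random):
    """Disturb a stagnant particle with Levy steps, clamped to the search box."""
    moved = []
    for x, lo, hi in zip(position, lower, upper):
        x_new = x + scale * (hi - lo) * levy_step(beta, rng)
        moved.append(min(max(x_new, lo), hi))
    return moved
```

In a registration setting, each particle would hold the parameters of the spatial transformation, and `levy_perturb` would be applied only to particles whose fitness has stopped improving.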
2023, 32(7):23-34. DOI: 10.15888/j.cnki.csa.009186
Abstract:Hyperspectral images have multiple bands and a strong correlation between bands, but their spatial texture and geometric information are poorly expressed. Traditional classification models extract spatial-spectral features insufficiently and are computationally expensive, and their classification performance needs to be improved. To solve this problem, a multi-scale and multi-resolution attention feature fusion convolution network (WTCAN) based on the wavelet transform is proposed. The wavelet transform is applied to decompose the spectral bands four times, and the hierarchical extraction of spectral features reduces the amount of calculation. The network includes a spatial information extraction module and introduces a pyramid attention mechanism. Through a reverse skip connection structure, it uses multiple scales to obtain spatial position features and enhances the expression of spatial texture, which effectively remedies the defects of traditional 2D-CNN feature extraction, such as the single scale and the neglect of spatial texture details. The proposed WTCAN model is evaluated on three hyperspectral datasets with different spatial resolutions: Indian Pines (IP), WHU_Hi_HanChuan (HanChuan), and WHU_Hi_HongHu (HongHu). Compared with the SVM, 2D-CNN, DBMA, DBDA, and HybridSN models, the WTCAN model achieves excellent classification results. The overall classification accuracy on the three datasets reaches 98.41%, 99.64%, and 99.67%, respectively, which can provide a valuable reference for research on the classification of hyperspectral images.
2023, 32(7):35-46. DOI: 10.15888/j.cnki.csa.009193
Abstract:Currently, the physiological signals in the classification of acrophobia emotions mainly involve electroencephalogram (EEG), electrocardiogram (ECG), and skin electromyography (EMG). However, due to the limitations of EEG acquisition and processing as well as the fusion between multimodal signals, a dynamic weighted decision fusion algorithm based on six peripheral physiological signals is proposed. Firstly, the different levels of acrophobia are induced in the subjects through the virtual reality technology, while six peripheral physiological signals (ECG, BVP, EMG, EDA, SKT, and RESP) are recorded. Secondly, the statistical and event-related features of the signals are extracted to construct a dataset of acrophobia emotions. Thirdly, a dynamic weighted decision fusion algorithm is proposed according to the classification performance, modal, and cross-modal information, so as to effectively integrate multi-modal signals to improve the recognition precision. Finally, the experimental results are compared with previous relevant research, and then verified on the open-source WESAD emotion dataset. The conclusions show that multi-modal peripheral physiological signals are conducive to enhancing the classification performance of acrophobia emotions, and the proposed dynamic weighted decision fusion algorithm significantly improves both the classification performance and model robustness.
2023, 32(7):47-56. DOI: 10.15888/j.cnki.csa.009192
Abstract:The dialogue system that introduces structured knowledge has attracted widespread attention as it can generate more fluent and diverse dialogue replies. However, previous studies only focus on entities in structured knowledge, ignoring the relation between entities and the integrity of knowledge. In this study, a knowledge-aware conversation generation (KCG) model based on the graph convolutional network is proposed. The semantic information of the entity and relation is captured by the knowledge encoder and the representation of the entity is enhanced by the graph convolutional network. Then, the knowledge selection module is applied to obtain the knowledge selection probability distribution of the entities and relations related to the dialogue context. Finally, the knowledge selection probability distribution is fused with the vocabulary probability distribution so that the decoder can select the knowledge or words. In this study, the experiments are conducted on DuConv, a Chinese public data set. The results show that KCG is superior to the current baseline model in terms of automatic evaluation metrics and can generate more fluent and informative replies.
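The final fusion step can be illustrated with a small sketch. It assumes a scalar gate that gives the probability of generating from the vocabulary rather than copying from the knowledge; the abstract does not say how KCG weights the two distributions, so the gate, the dictionary representation, and the function name are hypothetical.

```python
def fuse_distributions(p_vocab, p_knowledge, gate):
    """Mix a vocabulary distribution with a knowledge-selection distribution.

    p_vocab and p_knowledge map tokens to probabilities (each sums to 1).
    gate in [0, 1] is an assumed mixing weight, not a detail from the abstract.
    """
    if not 0.0 <= gate <= 1.0:
        raise ValueError("gate must lie in [0, 1]")
    fused = {}
    for token, prob in p_vocab.items():
        fused[token] = fused.get(token, 0.0) + gate * prob
    for token, prob in p_knowledge.items():
        fused[token] = fused.get(token, 0.0) + (1.0 - gate) * prob
    return fused
```

Because both inputs are probability distributions and the gate is a convex weight, the fused result also sums to 1, so the decoder can pick either a word or a knowledge item from a single distribution.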
2023, 32(7):57-64. DOI: 10.15888/j.cnki.csa.009152
Abstract:Performance bugs are defects in codes that slow down program execution. Existing detection tools can only find certain types of performance bugs and require complex program analysis processes. Therefore, they lack generality and need high costs in space and time. Meanwhile, many classical clone detection techniques have been used for general similar code detection, but they can only detect highly similar codes or rely on training datasets, which makes them inapplicable for detecting performance bugs in real-world datasets. To this end, this study proposes a method of using clone detection techniques to find multiple types of performance bugs by constructing code templates with labeled tokens. By labeling tokens with different weights according to their types and frequencies, this method can distinguish tokens’ importance and thus extract key information from codes. Experimental results on real-world projects show that this method can find more types of performance bugs and consume less time than existing tools. Another experiment also proves that this method significantly improves the detection capability of token-based clone detection techniques and is more suitable for finding performance bugs than existing clone detection techniques.
2023, 32(7):65-74. DOI: 10.15888/j.cnki.csa.009057
Abstract:In recent years, due to the rapid development of artificial intelligence in the medical field, the demand for medical images from researchers has been increasing day by day. These medical images often need to be finely annotated before being put into use. Compared with natural images, the data annotation of medical images is more specialized and complex. Therefore, medical images face the problems of low annotation rate and high annotation cost, resulting in the scarcity of labeled samples. Fundus images, as an important medical image, can achieve the screening and primary diagnosis of most ophthalmic diseases such as diabetic retinopathy and glaucoma, but they also face some difficulty in annotation. To address this situation, this study designs and develops an efficient semi-automated annotation system for fundus images, which is innovative in that it can perform semi-automated annotation of multiple eye diseases. Various diseases are predicted based on the fundus images, and the types of prediction results include disease classification and lesion segmentation. The annotator only needs to review and modify the generated prediction results, and this process can greatly reduce the workload of the annotator. In addition, the system includes four modules: user management, project management, image management, and algorithm model management. These four modules enable task assignment in team annotation, visualization of annotation progress data, quick export of annotation results, and other user-friendly functions. The system greatly improves the annotation efficiency and experience of annotators.
2023, 32(7):75-83. DOI: 10.15888/j.cnki.csa.009157
Abstract:In the current research on multi-intention recognition models of natural language processing, information flow is only modeled from intention to slot, and the research on the interactive modeling of information flow from slot to intention is ignored. In addition, the task of intention recognition is easy to be confused, and other intention information is wrongly captured. The quality of contextual semantic feature extraction is poor and needs to be improved. In order to solve these problems, this study optimizes the current advanced typical GL-GIN (global-locally graph interaction network) model, explores the interactive modeling method from slot to intention, and uses the one-way attention layer from slot to intention. Furthermore, the study calculates the attention score from slot to intention, incorporates the attention mechanism, and uses the attention score from slot to intention as the connection weight. As a result, it can propagate and gather intention-related slot information and make the intention focus on the slot information that is relevant to it, so as to realize the bidirectional information flow of the multi-intention recognition model. At the same time, the BERT model is introduced as the coding layer to improve the quality of semantic feature extraction. Experiments show that the effect of this interactive modeling method is significantly improved. Compared with that of the original GL-GIN model, the overall accuracy of the new model on two public datasets (MixATIS and MixSNIPS) is increased by 5.2% and 9%, respectively.
2023, 32(7):84-94. DOI: 10.15888/j.cnki.csa.009167
Abstract:The ground images obtained by the unmanned aerial vehicle (UAV) platform have a high spatial resolution, but they also bring a lot of “interference” to crop classification while providing rich details. In particular, when deep learning models are used for crop recognition, there are problems such as insufficient edge information extraction and misclassification of similarly textured crops, which result in a poor classification effect. Therefore, a model is constructed by the idea of multi-scale attention feature extraction to effectively extract edge information and improve the accuracy of crop classification. The proposed multi-scale attention network (MSAT) obtains crop information on different scales at the same level through multi-scale block embedding. The multi-scale feature map is mapped into multiple sequences that are fed into the factor attention module independently, which enhances the attention to crop contexts and improves the model’s extraction ability of plot edge information. Moreover, the built-in convolutional relative position encoding of the factor attention module enhances the modeling of local information inside the module and the ability to distinguish similarly textured crops. Finally, the thickness information is extracted upon the fusion of local features and global features. The classification results of rice, sugarcane, corn, bananas, and oranges show that the mean intersection over union (MIoU) and overall accuracy (OA) of the MSAT model reach 0.816 and 98.10%, respectively, which verifies that the fine crop classification method based on high-resolution images is feasible, and the equipment cost is low.
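For reference, the two reported metrics can be computed from a confusion matrix as sketched below. The row/column convention (rows are ground truth, columns are predictions) and the function name are assumptions; the abstract does not describe the evaluation code.

```python
def miou_and_oa(confusion):
    """Return (mean IoU, overall accuracy) from a square confusion matrix.

    confusion[i][j] counts pixels of true class i predicted as class j
    (an assumed convention). Classes that never occur are skipped in the mean.
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(n))
    ious = []
    for i in range(n):
        tp = confusion[i][i]
        fn = sum(confusion[i]) - tp
        fp = sum(confusion[r][i] for r in range(n)) - tp
        denom = tp + fp + fn
        if denom > 0:
            ious.append(tp / denom)
    return sum(ious) / len(ious), correct / total
```

MIoU penalizes both false positives and false negatives per class, which is why it can be much lower than OA when classes are imbalanced.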
2023, 32(7):95-104. DOI: 10.15888/j.cnki.csa.009178
Abstract:In the Sloan Digital Sky Survey (SDSS), the current object detection algorithm is inefficient in the detection of small-scale astronomical objects due to interference from large and bright astronomical objects. To address this issue, a small-scale astronomical object detection method based on Mask-GAN and improved YOLOv3 is proposed. The method is executed in two steps. The first step is to mask the interfering astronomical objects. A Mask construction algorithm for interfering astronomical objects is designed, which extracts the interfering objects by adaptive threshold segmentation and connectivity domain analysis, and the Mask is constructed by the method of fusing the features of band regions to avoid halo residue and excluding adjacent objects to avoid segmentation errors. Then, a GAN model is built, which is combined with the Mask of interfering astronomical objects to complete the interference masking task. The second step is to input the processed data into the improved YOLOv3 model for small-scale astronomical object detection. C-EfficientNet with an attention mechanism is built as the backbone network of the improved YOLOv3 to strengthen the feature extraction capability and increase the network’s attention to objects. Meanwhile, four effective feature layers are extended, and the method SAt is proposed to increase the weight of shallow feature maps so that the network can better use high-resolution shallow features with more details to detect small-scale astronomical objects. Experiments and analysis show that the average accuracy of the method in detecting small-scale stars and galaxies on the SDSS astronomical dataset reaches 81.16% and 77.89%, respectively. The proposed detection method outperforms the classic one and has practical application value.
2023, 32(7):105-112. DOI: 10.15888/j.cnki.csa.009168
Abstract:Considering the problems caused by insufficient attention to receptive field scale and inadequate extraction of feature channel information in existing super-resolution reconstruction models for optical remote sensing images, this study proposes a new super-resolution reconstruction model for optical remote sensing images, which is based on multi-scale feature extraction and coordinate attention. On the basis of the deep residual network structure, some cascaded multi-scale feature & coordinate attention blocks (MFCABs) are designed in the high-frequency branch of the network to fully explore the high-frequency features of the input low-resolution images. Firstly, the Inception submodule is introduced into MFCABs to capture spatial features under different receptive fields by convolution kernels of different scales. Secondly, the coordinate attention submodule is added after the Inception submodule, and attention is paid to the channel and coordinate dimensions to obtain a better channel attention effect. Finally, the features extracted by each MFCAB are fused in multiple paths to realize the effective fusion of multi-scale spatial information and multi-channel attention information. In the double and triple magnification of the MFCAB model on the NWPU4500 dataset, the PSNR reaches 34.73 dB and 30.12 dB, respectively, which is 0.66 dB and 0.01 dB higher than EDSR. In the double, triple, and quadruple magnification of the model on the AID1600 dataset, the PSNR reaches 34.71 dB, 30.58 dB, and 28.44 dB, respectively, which is 0.09 dB, 0.03 dB, and 0.04 dB higher than EDSR. The experimental results show that the reconstruction effect of this model on the optical remote sensing image datasets is better than the mainstream super-resolution image reconstruction model.
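The PSNR figures above follow the standard definition, PSNR = 10 log10(MAX^2 / MSE). A minimal sketch over flattened pixel lists is given below; the 8-bit peak value of 255 and the function name are assumptions, since the abstract does not state the pixel range used.

```python
import math


def psnr(reference, reconstructed, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel lists.

    max_value=255.0 assumes 8-bit images, which the abstract does not confirm.
    """
    if not reference or len(reference) != len(reconstructed):
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((a - b) ** 2 for a, b in zip(reference, reconstructed)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)
```

Because the scale is logarithmic, a fixed gain in dB corresponds to a fixed ratio of mean squared errors, independent of the absolute PSNR level.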
2023, 32(7):113-120. DOI: 10.15888/j.cnki.csa.009158
Abstract:Encryption and dynamic port technology make the traditional traffic classification technology fail to meet the performance requirements of online game identification. In this study, an end-to-end traffic classification model based on auto-encoder dimension reduction is proposed to accurately identify online game traffic. First, the original traffic is preprocessed into a one-dimensional session flow quantity of 784 B, and the encoder is used for unsupervised dimension reduction and removing invalid features. Then, the parallel algorithm of the convolutional neural network and LSTM network is explored and constructed to extract and fuse spatial and temporal features of samples after dimension reduction. Finally, the fusion features are used for classification. When tested on the self-built game traffic dataset and the open dataset, the proposed model achieves an accuracy rate of 97.68% in online game traffic identification. Compared with the traditional end-to-end network traffic classification model, the model designed in this study is more lightweight and practical and can be easily deployed on devices with limited resources.
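The preprocessing step can be sketched as follows: each session's raw bytes are cut or padded to a fixed 784-byte vector. Zero padding, scaling to [0, 1], and the function name are assumptions for illustration; the abstract only fixes the length.

```python
def session_to_vector(payload: bytes, length: int = 784):
    """Truncate or zero-pad a session's bytes to `length` and scale to [0, 1].

    Zero padding and division by 255 are assumed conventions.
    """
    clipped = payload[:length]
    padded = clipped + bytes(length - len(clipped))  # bytes(k) yields k zero bytes
    return [b / 255.0 for b in padded]
```

A fixed-length input lets the encoder and the parallel CNN/LSTM branches operate on sessions of any original size.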
2023, 32(7):121-128. DOI: 10.15888/j.cnki.csa.009112
Abstract:Human pose estimation based on deep learning is widely used in pose recognition, human-computer interaction, and other fields. In order to improve the detection accuracy of key points of the human body, many networks adopt a model architecture with increasing calculation amount, parameter amount, and complexity, which is impossible to be directly deployed to low-computing devices. To solve the above issues, this study proposes a lightweight method for multi-branch feature attention fusion. The model is based on the HigherHRNet network for lightweight design and training. Specifically, channel splitting and channel shuffling are adopted to solve the information isolation between feature layers after group convolution; the feature generation method of linear operation is used to address the redundancy between different feature layers; the method of fusing attention information is employed to alleviate the accuracy drop caused by lightweight. The training, testing, visualization, and ablation experiments of the model are completed on the MS COCO dataset. The experimental results show that the lightweight method in this study can significantly reduce the calculation amount of human pose estimation under the premise of ensuring intuitive detection accuracy.
2023, 32(7):129-137. DOI: 10.15888/j.cnki.csa.009176
Abstract:Integrated avionics system is an important feature of the new generation of aircraft, and its reliability and stability play a decisive role in the flight and safety of the entire aircraft. As the avionics system should possess high reliability, a distributed cluster redundancy architecture is proposed, and the corresponding redundancy management scheme is designed to tolerate Byzantine errors that may occur after avionics system failure and effectively improve the reliability and fault tolerance of fault-tolerant computers. The proposed redundancy management scheme is optimized by the two schemes of threshold signature and cluster selection to reduce the communication overhead between redundancy computers in the cluster, avoid affecting the real-time performance of the avionics system, and improve the redundancy management efficiency. Through simulation experiments, the results verify that the distributed cluster redundancy management scheme can effectively improve the reliability of the avionics system and enhance Byzantine resilience. Meanwhile, in an n-redundancy avionics system, the system can still operate correctly as long as the number of Byzantine nodes is less than n/3, and the optimization scheme has lower communication and computing costs.
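The stated tolerance condition (fewer than n/3 Byzantine nodes) can be checked with integer arithmetic, as in the sketch below. The function names are hypothetical.

```python
def max_byzantine_faults(n: int) -> int:
    """Largest f such that f < n / 3, i.e. n >= 3f + 1."""
    if n < 1:
        raise ValueError("n must be positive")
    return (n - 1) // 3


def can_operate(n: int, byzantine: int) -> bool:
    """True when the number of Byzantine nodes is strictly below n / 3."""
    return 3 * byzantine < n
```

For example, a 4-redundancy system tolerates one Byzantine node, while a 3-redundancy system tolerates none.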
2023, 32(7):138-144. DOI: 10.15888/j.cnki.csa.009171
Abstract:An intelligent robot depalletizing system based on visual positioning is designed to solve the problem that the traditional teaching and playback robot can only perform depalletizing tasks with given positions and fixed trajectories and thus is limited to fixed scenes. The system uses the coordinate transformation of the target pixel center to obtain the corresponding world coordinates. For the problem that the eye-in-hand camera may lead to the inaccurate rotation angle of the target obtained by the image processing algorithm due to the deflection of the camera, it is proposed to use the extrinsic parameter coefficient of the camera to compensate for the rotation angle of the target. Moreover, a depalletization strategy is designed, and the communication guides the robot to automatically perform the depalletization task by grabbing from nearest to farthest without manual intervention. The experimental data shows that the system can grab the target with an unknown position in an unknown work scene, with a position error of 1.1 mm and an angle error of 1.2°, and the time to position the stacking layer is about 1.2 s. The system meets precision and efficiency requirements for depalletizing robots in the industrial scenes.
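The coordinate transformation and angle compensation can be sketched under a standard pinhole camera model. Everything specific here is an assumption: the intrinsics `fx`, `fy`, `cx`, `cy`, a known depth to the stacking layer, the camera-to-world convention `P_w = R * P_c + t`, and the use of the rotation matrix's yaw as the compensation term. The abstract does not give the authors' exact formulas.

```python
import math


def pixel_to_world(u, v, depth, fx, fy, cx, cy, rotation, translation):
    """Back-project a pixel at a known depth and map it to world coordinates.

    rotation is a 3x3 camera-to-world matrix (list of rows) and translation
    a 3-vector; both conventions are assumptions for this sketch.
    """
    cam = [(u - cx) * depth / fx, (v - cy) * depth / fy, depth]
    return [
        sum(rotation[i][k] * cam[k] for k in range(3)) + translation[i]
        for i in range(3)
    ]


def compensate_angle(image_angle_deg, rotation):
    """Add the camera's yaw about the z-axis to an angle measured in the image.

    Taking yaw from atan2(R[1][0], R[0][0]) is an illustrative choice.
    """
    yaw = math.degrees(math.atan2(rotation[1][0], rotation[0][0]))
    return image_angle_deg + yaw
```

With an eye-in-hand camera, `rotation` and `translation` change with the robot pose, so they would be recomputed from the hand-eye calibration before each grasp.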
2023, 32(7):145-154. DOI: 10.15888/j.cnki.csa.009127
Abstract:Falls from height are frequent on construction sites, so wearing helmets is necessary. An improved algorithm based on YOLOX-s is proposed to deal with the missed detection of small target samples encountered in helmet-wearing condition detection. First, the 160×160 feature layer in the Neck layer is introduced in the backbone feature extraction network for feature fusion, and a detection head for small targets is added; second, the SIoU loss function is used to calculate the loss value, which makes the loss term considered in the training process of the network more comprehensive, and the varifocal loss function is used to calculate the loss value of the confidence level to further reduce the imbalance of the positive and difficult samples in the training process; finally, the coordinate attention (CA) mechanism is used to enhance the feature representation of the model. The experimental results show that the optimization of the Neck layer, detection layer, and loss function and the introduction of the CA mechanism lead to better convergence and regression performance of the network during the training process. The mAP value of the improved algorithm is 95.57%, which is 17.11% and 3.59% higher than that of YOLOv3 and the original YOLOX-s algorithm, respectively. The detection speed of the improved algorithm is 54.73 frames/s, which meets the real-time detection speed requirement.
2023, 32(7):155-162. DOI: 10.15888/j.cnki.csa.009132
Abstract:Since it is difficult to assign weights to the importance of safety risk factor indicators in the process of safety risk assessment of informatization systems, this study proposes a safety risk assessment model based on improved D-S evidence theory and fusion weight set with a construction site as the application scenario. Firstly, the safety risk assessment process and elements of the construction site are fully studied, and a safety evaluation system for the construction site is established. Secondly, the D-S synthesis algorithm based on weight assignment and matrix analysis is used to improve the analytic hierarchy process (AHP) method, and the entropy weight method based on data is adopted to calculate the subjective and objective weights of each indicator in the index layer of the evaluation system. Thirdly, the improved D-S evidence fusion algorithm is used to synthesize multi-source evidence to obtain the indicator weights, so as to avoid the one-sidedness of a single assignment and get the optimal comprehensive weight. Finally, the comprehensive evaluation index of the construction site is calculated according to the TOPSIS evaluation algorithm. The analysis shows that the safety risk assessment model based on the improved D-S evidence theory and fusion weight set can effectively assess the safety of construction sites, reduce the uncertainty of assessment results, and improve the credibility of risk assessment results.
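Two of the building blocks named above, the entropy weight method and TOPSIS, can be sketched in their textbook forms. This sketch does not cover the AHP step or the improved D-S fusion, and it assumes that all indicators are positive benefit-type values (larger is better); the function names are hypothetical.

```python
import math


def entropy_weights(matrix):
    """Objective weights from the entropy weight method.

    matrix[i][j] is the positive score of alternative i on indicator j.
    Requires at least two alternatives.
    """
    m, n = len(matrix), len(matrix[0])
    diversities = []
    for j in range(n):
        column = [row[j] for row in matrix]
        total = sum(column)
        entropy = 0.0
        for value in column:
            p = value / total
            if p > 0:
                entropy -= p * math.log(p)
        entropy /= math.log(m)
        diversities.append(max(0.0, 1.0 - entropy))
    spread = sum(diversities)
    if spread <= 1e-12:  # all indicators uninformative: fall back to equal weights
        return [1.0 / n] * n
    return [d / spread for d in diversities]


def topsis_scores(matrix, weights):
    """Closeness of each alternative to the ideal solution (1 is best)."""
    n = len(weights)
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) for j in range(n)]
    weighted = [[weights[j] * row[j] / norms[j] for j in range(n)] for row in matrix]
    best = [max(row[j] for row in weighted) for j in range(n)]
    worst = [min(row[j] for row in weighted) for j in range(n)]
    scores = []
    for row in weighted:
        d_best = math.sqrt(sum((row[j] - best[j]) ** 2 for j in range(n)))
        d_worst = math.sqrt(sum((row[j] - worst[j]) ** 2 for j in range(n)))
        total = d_best + d_worst
        scores.append(d_worst / total if total > 0 else 0.5)
    return scores
```

In the model described above, the weights passed to `topsis_scores` would be the fused comprehensive weights rather than the entropy weights alone.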
2023, 32(7):163-170. DOI: 10.15888/j.cnki.csa.009130
Abstract:In order to effectively solve the problem of target tracking drift or loss in the face of large-scale deformation, complete occlusion, background interference, and other complex scenes, a multi-branch Siamese network target tracking algorithm (SiamMB) is proposed. First, the method of enhancing the network robustness of adjacent frame branches is used to improve the discrimination ability of target features in the search frame and strengthen the robustness of the model. Secondly, the spatial attention network is fused, and different weights are applied to the features of different spatial positions. In addition, the features that are beneficial to target tracking in spatial positions are emphasized, so as to improve the discriminability of the model. Finally, evaluation is performed on OTB2015 and VOT2018 datasets, and the results show that the tracking accuracy and success rate of SiamMB reach 91.8% and 71.8%, respectively, which makes SiamMB more competitive than the current mainstream tracking algorithms.
2023, 32(7):171-178. DOI: 10.15888/j.cnki.csa.009177
Abstract:Adding specific perturbations to images can help generate adversarial samples that mislead deep neural networks to output incorrect results. More powerful attack methods can facilitate research on the security and robustness of network models. The attack methods are divided into white-box and black-box attacks, and the transferability of adversarial samples can be used to attack other black-box ones by the results generated by known models. Attacks based on linear integrated gradients (TAIG-S) can generate highly transferable adversarial samples, but they are affected by noise in the linear path, superimposing pixel gradients that are irrelevant to the prediction results, which limits the success rate of attacks. With guided integrated gradients, the proposed Guided-TAIG method uses adaptive adjustment to correct some pixel values with low absolute values on each segment of the integrated path calculation and finds the starting point of the next step within a certain interval, circumventing the accumulation of meaningless gradient noise. The experiments on the ImageNet dataset show that Guided-TAIG outperforms FGSM, C&W, and TAIG-S for white-box attacks on both CNN and Transformer architecture models, produces smaller perturbations, and has better performance for transferable attacks in the black-box mode. This demonstrates the effectiveness of the proposed method.
2023, 32(7):179-187. DOI: 10.15888/j.cnki.csa.009179
Abstract:As YOLOv3, an algorithm widely used in the field of remote sensing target detection, has insufficient feature expression ability for small targets and a poor detection effect, an improved YOLOv3 algorithm for small target detection is proposed. Firstly, the global context (GC) attention mechanism is introduced, and the feature extraction network and feature pyramid networks (FPN) are improved to enhance the small-target feature extraction ability and detection ability of the model. Secondly, single-scale Retinex (SSR) fusion feature enhancement is applied to the dataset to improve the model’s learning effect of small target features. Finally, the adaptive anchor box optimization (AABO) algorithm is adopted to optimize anchors and better match anchors and targets. The experimental results on the remote sensing dataset RSOD show that the mean average precision (mAP) of the proposed algorithm is 92.5%, which is improved by 10.1% compared with that of the classic YOLOv3 algorithm, and the detection effect of small remote sensing targets is significantly improved.
2023, 32(7):188-194. DOI: 10.15888/j.cnki.csa.009089
Abstract:Intelligent protection for rail vehicles involves the tasks of railway track intrusion detection and driving area segmentation. In the field of deep learning, there are algorithms for each task, but they cannot meet the needs of multi-task situations very well. This algorithm uses a lightweight convolution neural network (CNN) as an encoder to extract the feature map and then sends it to two decoders based on one-stage detection network to complete their respective tasks. Semantic features of different levels and scales are fused in the feature map output by the encoder, which performs pixel-level semantic prediction well and improves the detection and segmentation performance significantly. The equipment using this algorithm will master the recognition, detection, judgment, and tracking of new targets, ensuring the traveling safety of rail vehicles.
2023, 32(7):195-201. DOI: 10.15888/j.cnki.csa.009169
Abstract:The processing of device tasks in the industrial Internet requires a large amount of computing resources, and the number of tasks with low latency requirements has increased significantly. Edge computing places computing power and other resources close to where they are needed to provide effective support for task processing. However, because edge computing resources are limited, the requirements of low latency and a high completion rate for device tasks cannot be satisfied at the same time, and determining reasonable offloading decisions and task schedules remains a great challenge. To address this problem, a deep learning-based dynamic priority task scheduling algorithm (DPTSA) is proposed in this study. Firstly, the tasks to be processed are selected according to dynamic priority, and task scheduling decisions are generated through neural networks. Then, a set of feasible solutions is generated through crossover, mutation, and other operations, and the optimal solutions are screened out and stored in the experience buffer. Finally, the neural network parameters are optimized with samples from the experience buffer. The experimental results based on Google’s Borg task scheduling dataset show that DPTSA is superior to the four benchmark algorithms in terms of task waiting time and task completion rate.
2023, 32(7):202-210. DOI: 10.15888/j.cnki.csa.009163
Abstract:To improve the identification accuracy of ordinary convolutional neural networks for tomato leaf disease, a new network based on the multi-scale fusion attention mechanism (MIPSANet) is proposed. A lightweight network is used as the main framework to reduce the number of network parameters. To increase the depth and width of the network, the Inception structure is added to extract multi-scale feature information from the data. Meanwhile, a more elaborate dual attention mechanism, polarized self-attention (PSA), is embedded in the whole model as a plug-and-play module, which improves the expressive power of important feature points. The lightweight PSA modules are also suitable for this model. A fully connected layer is added after the convolution for classification. The proposed MIPSANet is evaluated on the public Kaggle tomato leaves dataset with 30 batches of training, achieving an accuracy rate of 91.05%. The results show that this network is strikingly effective in the classification of tomato leaf diseases compared with other networks, which provides some reference value for the network structure and parameter configuration of classification networks.
2023, 32(7):211-218. DOI: 10.15888/j.cnki.csa.009173
Abstract:In robot visual navigation tasks in indoor environments, the detection of the drivable area is an indispensable part and the basis for realizing autonomous driving. At present, many solutions detect the drivable area by identifying obstacles that appear in the dataset, which lacks flexibility. Therefore, a drivable area detection method for indoor flat ground such as subway stations is proposed in this study to improve practicability. The classic MobileNetV3 network is applied to classify the images collected in front of the robot and determine whether they are ground areas. Because stickers such as landmarks and arrows on the indoor floor affect this classification, it is necessary to further judge the non-ground area and distinguish such stickers from conventional three-dimensional obstacles. In this study, feature point matching between successive frames is adopted to obtain the camera moving distance, and the slope calculated by straight line fitting is used to distinguish between three-dimensional obstacles and plane landmarks. Experiments show that the proposed method can better detect the drivable area in front of the robot and has high practical value.
2023, 32(7):219-225. DOI: 10.15888/j.cnki.csa.009170
Abstract:Due to the differences between development teams and the complexity, uncertainty, and dynamics of development projects, it is difficult to reasonably allocate development tasks. Considering the factors such as the uncertain capability of development teams and the uncertain operation process of development projects, the duration and cost of development projects are evaluated through simulation. The psychological factors of decision-makers are taken into account, and the prospect value of development duration and costs is calculated by the prospect theory. After that, the prospect value is taken as the fitness evaluation index, and a task-allocation optimization algorithm is constructed on the basis of NSGA-III. The case study shows that the optimization algorithms based on NSGA-III, NSGA-II, and MOEA-D can all effectively improve the allocation scheme of development tasks, and the optimization based on NSGA-III is the best.
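As a rough illustration of how a prospect value for duration or cost might be computed, the sketch below uses the value function from Tversky and Kahneman's prospect theory with their commonly cited parameter estimates (alpha = beta = 0.88, loss aversion = 2.25). The abstract does not state which reference point or parameters the study uses, so the function name `prospect_value`, its arguments, and the choice of reference point are assumptions made for illustration only.

```python
def prospect_value(outcome, reference, alpha=0.88, beta=0.88,
                   loss_aversion=2.25, lower_is_better=True):
    """Prospect-theory value of one simulated outcome.

    For duration and cost, a result below the reference point is a gain,
    so the sign is flipped when lower_is_better is True (an assumption).
    """
    gain = reference - outcome if lower_is_better else outcome - reference
    if gain >= 0:
        # Gains are valued with diminishing sensitivity.
        return gain ** alpha
    # Losses are amplified by the loss-aversion coefficient.
    return -loss_aversion * (-gain) ** beta


def mean_prospect_value(outcomes, reference):
    """Average prospect value over a set of simulated runs."""
    return sum(prospect_value(o, reference) for o in outcomes) / len(outcomes)
```

With a reference duration of 100 days, finishing in 90 days yields a smaller positive value than the magnitude of the negative value for finishing in 110 days, which reflects the loss aversion that the decision-maker's psychology introduces into the fitness index.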
2023, 32(7):226-239. DOI: 10.15888/j.cnki.csa.009180
Abstract:As cloud computing rapidly develops, container technology, represented by Docker, has gradually attracted attention. At present, three common container orchestration tools are Kubernetes, Docker Swarm, and Rancher. However, when the total capacity of all working nodes exceeds the limit, the existing container orchestration tools suffer from problems such as long response time and large resource occupation. Therefore, the least space unused (LSD) algorithm and the least recently used and space unused (LRU-SD) algorithm are designed in this study and applied to the three orchestration tools. When the total capacity exceeds the upper limit, the non-working nodes are deleted and new working nodes are added. In practice, the LSD algorithm deletes the working node with the least remaining space, while the LRU-SD algorithm first considers deleting the node that has gone unused the longest. When there are multiple qualified nodes, the working node with the least remaining space is deleted. In the experiments, the impacts of different algorithms on the three container orchestration tools are analyzed and compared in terms of response time, CPU, and memory. The experimental results show that the LSD algorithm, the LRU-SD algorithm, and the LRU algorithm can not only improve the response time of the orchestration tools but also increase the utilization of resources. At the same time, the LRU-SD algorithm is the most effective in improving CPU utilization.
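The two selection rules described above can be sketched as follows. This is a minimal illustration of the victim-selection step only, not the study's implementation: the `Node` record, its field names, and the helper names `pick_lsd` and `pick_lru_sd` are hypothetical, and no orchestration tool API is involved.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    free_space: float  # remaining capacity on the node (hypothetical unit)
    last_used: float   # timestamp of the most recent use


def pick_lsd(nodes):
    """LSD rule: choose the node with the least remaining space."""
    return min(nodes, key=lambda n: n.free_space)


def pick_lru_sd(nodes):
    """LRU-SD rule: choose the node unused for the longest time;
    break ties by the least remaining space."""
    return min(nodes, key=lambda n: (n.last_used, n.free_space))


nodes = [
    Node("a", free_space=4.0, last_used=10.0),
    Node("b", free_space=1.0, last_used=30.0),
    Node("c", free_space=2.0, last_used=10.0),
]
print(pick_lsd(nodes).name)     # b
print(pick_lru_sd(nodes).name)  # c
```

In this example, nodes `a` and `c` are equally old, so LRU-SD falls back to remaining space and selects `c`, whereas LSD ignores recency and selects `b`.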
2023, 32(7):240-250. DOI: 10.15888/j.cnki.csa.009164
Abstract:Stay cables are important load-bearing elements of cable-stayed bridges, and their vibration testing plays a key role in bridge health monitoring. Under ideal laboratory conditions, the traditional vibration detection algorithm with spatial phase can achieve high-accuracy measurement of structural vibration. However, in practical scenarios, environmental factors such as vehicles, wind excitation, and the angle between the cable and the ground can cause large errors in the measurement results. Therefore, the traditional algorithm is not suitable for cable vibration detection in these cases. To address this problem, this study proposes a cable vibration frequency detection algorithm based on directional adaptive complex steerable filters to precisely measure cable vibration in real scenarios. Firstly, the linear characteristics of the cable are used to detect the location of the cable and determine its main vibration direction. Secondly, according to the vibration direction characteristics of the cable, a directional adaptive complex steerable filter is designed to decompose each frame of the video, so as to obtain the phase and amplitude spectra of the same direction at different scales and enhance the phase of the edge region of the cable. Finally, the spatial phase of each frame is averaged, and the phase sequences are arranged in time order to obtain the main frequency of cable vibration by Fourier transform. A comparison with the results of acceleration sensors shows that the proposed algorithm is highly robust and can meet the application requirements of bridge cable vibration measurement in real scenarios.
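The final step (averaging the spatial phase per frame, ordering the averages in time, and reading the main frequency from a Fourier transform) can be sketched as follows. This is an illustration under stated assumptions rather than the study's code: the steerable-filter decomposition is not shown, the per-frame phase values are assumed to be already extracted, and the function names are hypothetical. A plain discrete Fourier transform is used to keep the example dependency-free.

```python
import cmath
import math


def average_phase(frame_phases):
    """Mean spatial phase of one frame (input: phase values of cable-edge pixels)."""
    return sum(frame_phases) / len(frame_phases)


def dominant_frequency(phase_series, fps):
    """Return the frequency (Hz) of the strongest non-DC component
    of a time-ordered phase sequence sampled at fps frames per second."""
    n = len(phase_series)
    mean = sum(phase_series) / n
    centered = [p - mean for p in phase_series]  # remove the constant offset
    best_k, best_mag = 0, 0.0
    for k in range(1, n // 2 + 1):
        # k-th DFT coefficient of the centered sequence
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(centered))
        if abs(coeff) > best_mag:
            best_k, best_mag = k, abs(coeff)
    return best_k * fps / n
```

The frequency resolution of this estimate is `fps / n`, so longer recordings give finer resolution; a production version would typically use an FFT routine instead of the quadratic-time loop shown here.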
2023, 32(7):251-260. DOI: 10.15888/j.cnki.csa.009159
Abstract:Graph neural networks (GNNs) have attracted widespread attention due to their powerful modeling capabilities, and they are often used to solve node classification tasks on graphs. At present, such problems are commonly solved by models built around the graph convolutional network (GCN). However, due to over-fitting and over-smoothing, the node embeddings learned by deep models perform poorly. Therefore, this study proposes a graph convolutional neural residual network (GCNRN) model that combines residual connections and self-attention on top of the GCN kernel to improve the generalization ability of GCN. At the same time, in order to integrate more in-depth information, this study introduces a fusion mechanism that uses the fuzzy integral to fuse multiple classifiers and thereby improves the testing accuracy of the model. In order to verify the superiority of the proposed method, this study uses ogbn-arxiv and commonly used citation datasets to conduct comparative experiments. Compared with many existing models with GCN as the core, the GCNRN model improves node classification accuracy by 2% on average and avoids the traditional over-fitting and over-smoothing phenomena. In addition, the experimental results show that the multi-classifier model with the fusion module based on the fuzzy integral has a better classification effect than the traditional fusion method.
2023, 32(7):261-268. DOI: 10.15888/j.cnki.csa.009175
Abstract:For images of crop diseases and pests, it is difficult to achieve satisfactory accuracy because diseases and pests are highly varied and small targets in the natural environment have similar characteristics. In this study, a pest detection and identification model, namely YOLOv5-EB, which enhances the fusion of local and global feature information in the natural background, is proposed, and experiments are carried out on the published large-scale pest dataset IP102. The results show that the accuracy of the proposed model is improved by five percentage points compared with the YOLOv5 model. First, the MLP in the channel attention of CBAM is replaced with a one-dimensional convolution, which alleviates the tendency of channel attention to ignore the information interaction between channels after global processing. Secondly, the Focus operation is replaced by a 6×6 convolution to enhance the ability to extract pest features. The experimental results show that the average accuracy of YOLOv5-EB reaches 87% in detecting pests, which not only effectively improves the identification performance on crop pest images but also increases the detection speed compared with the Faster R-CNN, EfficientDet, YOLOv3, YOLOv4, and YOLOv5 models. The study reveals that the YOLOv5-EB algorithm meets the accuracy and real-time requirements of target detection for various crop diseases and pests.
2023, 32(7):269-275. DOI: 10.15888/j.cnki.csa.009144
Abstract:Due to the complexity of social media networks, classifying social media accounts with homogeneous information networks of a single nature causes information loss and has a negative impact on the classification results. To solve this problem, this study proposes a social media account classification method based on heterogeneous graph convolutional attention networks (HGCANA). Specifically, a heterogeneous information network of social media is constructed, and the social media features of the network are extracted. After that, the attention mechanism is introduced to classify and identify social media accounts. The HGCANA method is compared with existing methods through experiments, and the results show that it achieves better performance in classifying social media accounts.
2023, 32(7):276-283. DOI: 10.15888/j.cnki.csa.009166
Abstract:Graph neural networks have achieved remarkable performance in semi-supervised node classification tasks. Relevant research has shown that graph neural networks are susceptible to perturbations, and their adversarial robustness has also been studied. However, gradient-based attacks cannot guarantee optimal perturbation. Therefore, an adversarial attack method based on gradient and structure is proposed to enhance the gradient-based perturbation. The method first generates candidate perturbation sets by using first-order optimization of training losses, and then it evaluates the similarity of the candidate sets. Finally, it ranks them according to the evaluation results and selects a fixed-budget modification to achieve the attack. The proposed attack method is evaluated by performing a semi-supervised node classification task on five datasets. Experimental results show that the node classification accuracy decreases significantly when only a small number of perturbations are performed, which indicates that the proposed method significantly outperforms the existing attack methods.
2023, 32(7):284-292. DOI: 10.15888/j.cnki.csa.009139
Abstract:Single target tracking algorithms based on Siamese networks encounter complex scenes such as background clutter, the influence of similar objects, and occlusion, which leads to a decrease in the accuracy and success rate of the tracking system. In response, this study proposes a tracking algorithm combining the coordinate attention mechanism and template update, i.e., MobileNet coordinate attention and updating of template SiamRPN (MCUSiamRPN). On the basis of the SiamRPN algorithm, the improved MobileNetV3 is used as the feature extraction network, and the multi-layer feature information is sent to the coordinate attention module to fuse features and enrich semantic information. An adaptive template updating module is designed, which combines the initial template and the template of the current frame to estimate the best template of the next frame for template information updating. The test results on the OTB100 and UAV123 datasets show that, compared with the benchmark algorithm SiamRPN, the proposed algorithm improves precision by 5.3% and 3.7% and increases the success rate by 3.7% and 5.2%, respectively, which verifies the effectiveness of the developed algorithm.
2023, 32(7):293-298. DOI: 10.15888/j.cnki.csa.009199
Abstract:A Dockerfile defines a set of instructions for building container images, which specify how the containerized applications should be built. Recent studies have shown that there are quite a lot of quality problems in Dockerfiles. This study proposes a new tool, namely Dockerfile Miner (DMiner), to extract implicit rules from high-quality Dockerfiles, and these rules will help to improve the quality of Dockerfiles. DMiner is mainly divided into three modules, which are responsible for the collection and filtering of Dockerfiles, the parsing of Dockerfiles, and the mining and extraction of Dockerfile rules. DMiner parses each Dockerfile into a unified sequential representation and uses a sequential rule mining algorithm to extract rules. This tool expands the existing Dockerfile dataset and extracts nine new rules that have not appeared in other work. A large number of experiments on real datasets show that the tool is effective and efficient.
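To make the idea of a "unified sequential representation" concrete, the sketch below reduces a Dockerfile to its ordered list of instruction keywords and measures how often one instruction is followed by another across a corpus. It is a simplified stand-in, not DMiner's parser or its mining algorithm: the function names are hypothetical, only comments and backslash line continuations are handled, and the support count is the simplest possible measure for an ordered pair.

```python
def instruction_sequence(dockerfile_text):
    """Return the ordered instruction keywords (FROM, RUN, ...) of a Dockerfile."""
    sequence = []
    continuing = False  # True while inside a backslash-continued instruction
    for raw in dockerfile_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if not continuing:
            sequence.append(line.split(None, 1)[0].upper())
        continuing = line.endswith("\\")
    return sequence


def support(sequences, first, second):
    """Fraction of sequences in which `second` occurs after the first `first`."""
    hits = 0
    for seq in sequences:
        if first in seq and second in seq[seq.index(first) + 1:]:
            hits += 1
    return hits / len(sequences)


example = "\n".join([
    "FROM python:3.11-slim",
    "# install system packages",
    "RUN apt-get update && \\",
    "    apt-get install -y curl",
    "COPY . /app",
    'CMD ["python", "app.py"]',
])
print(instruction_sequence(example))  # ['FROM', 'RUN', 'COPY', 'CMD']
```

A real rule miner would work on richer tokens than bare keywords (for example, the commands inside `RUN`), but the same sequence-then-count structure applies.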
2023, 32(7):299-304. DOI: 10.15888/j.cnki.csa.009172
Abstract:The image masking method based on semantic segmentation is often used to solve the interference problem of moving objects in three-dimensional (3D) reconstruction tasks of static scenes. However, a small number of invalid feature points are produced when the mask is used to eliminate moving objects. To solve this problem, a method for eliminating moving objects at the feature point level is proposed. A convolutional neural network is used to obtain the moving target information, and a feature point filtering module is constructed. Then, the moving target information is used to filter and update the feature point list for the complete elimination of the moving target. A ground image dataset and an aerial image dataset, together with the DeepLabV3 and YOLOv4 algorithms, are used to verify the proposed method. The results show that eliminating moving objects at the feature point level in 3D reconstruction can completely remove the moving object without generating additional invalid feature points. Compared with the image masking method, the proposed method shortens the point cloud generation time by 13.36% and reduces the reprojection error by 9.93% on average.
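The filtering step can be pictured with the short sketch below, which drops every feature point that falls inside a detected moving-object region. It assumes the detector output is a list of axis-aligned bounding boxes `(x_min, y_min, x_max, y_max)`, as a YOLO-style detector would provide; a segmentation mask would need a per-pixel lookup instead. The function names are hypothetical and the study's module may differ.

```python
def inside(point, box):
    """True if the (x, y) point lies within the box, borders included."""
    x, y = point
    x_min, y_min, x_max, y_max = box
    return x_min <= x <= x_max and y_min <= y <= y_max


def filter_feature_points(points, moving_boxes):
    """Keep only feature points that lie outside every moving-object box."""
    return [p for p in points
            if not any(inside(p, box) for box in moving_boxes)]


points = [(10, 10), (55, 60), (200, 120), (90, 90)]
moving_boxes = [(50, 50, 100, 100)]  # one detected moving object
print(filter_feature_points(points, moving_boxes))  # [(10, 10), (200, 120)]
```

Because the image itself is left untouched, no artificial mask edge is introduced for the feature extractor to respond to, which is consistent with the claim that no additional invalid feature points are generated.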
2023, 32(7):305-311. DOI: 10.15888/j.cnki.csa.008808
Abstract:PM2.5 is an important indicator for measuring the concentration of air pollutants, and monitoring and predicting its concentration can effectively protect the atmospheric environment and further reduce the harm caused by air pollution. As automatic air quality monitoring stations are constructed on a large scale, the air quality prediction models built by traditional machine learning can no longer meet current needs. This study proposes a Gaussian-attention prediction model based on the multi-head attention mechanism and Gaussian probability estimation and uses the data from a monitoring station in Shenyang for training and testing. Because PM2.5 concentration is affected by other air quality data, this model uses the information alignment of hierarchical time stamps (week, day, and hour) of air quality data as input and extracts the time-series correlation features of different subspaces with the multi-head attention mechanism. More complete and effective feature information is thereby obtained, and prediction results are then acquired by Gaussian likelihood estimation. A comparison with multiple benchmark models is conducted, and the mean squared error (MSE) and mean absolute error (MAE) of the proposed Gaussian-attention prediction model are respectively 21% and 15% lower than those of the DeepAR model. By effectively improving prediction accuracy, the proposed model can accurately predict PM2.5 concentration.
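Gaussian likelihood estimation usually means that the network outputs a mean and a standard deviation for each time step and is trained by minimizing the negative log-likelihood of the observed value. The sketch below shows that loss for scalar predictions. It illustrates the general technique only; the abstract does not give the exact loss used, and the function names are hypothetical.

```python
import math


def gaussian_nll(y, mu, sigma):
    """Negative log-likelihood of observation y under N(mu, sigma^2)."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return 0.5 * math.log(2 * math.pi * sigma ** 2) + (y - mu) ** 2 / (2 * sigma ** 2)


def mean_gaussian_nll(targets, means, sigmas):
    """Average loss over a batch of (target, predicted mean, predicted std) triples."""
    losses = [gaussian_nll(y, m, s) for y, m, s in zip(targets, means, sigmas)]
    return sum(losses) / len(losses)
```

The predicted mean serves as the point forecast of PM2.5 concentration, while the predicted standard deviation gives an uncertainty estimate; the loss penalizes both inaccurate means and over-confident (too small) standard deviations.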