YAN Yue , GUO Xiao-Ran , WANG Tie-Jun , RAO Qiang , WANG Kai-Jie
2023, 32(8):1-18. DOI: 10.15888/j.cnki.csa.009208 CSTR:
Abstract:The question answering (Q&A) system is one of the promising research directions in artificial intelligence and natural language processing. Early Q&A systems could only accept questions and return answers in natural language. In recent years, with the development of multimodal knowledge graphs and multimodal pre-training models, generalized Q&A systems that support information queries across multiple modalities such as text, image, audio, and video have gradually become a new research hotspot, and their multimedia presentation of results is more intuitive and comprehensive. This study classifies Q&A systems into three types according to their changing task objects: dedicated Q&A systems, general Q&A systems, and multimodal Q&A systems. The problems faced in the development of these three types are analyzed, and the key technologies and methods used at each stage are summarized. In addition, industrial applications of Q&A systems are illustrated with examples, and future research directions are discussed.
2023, 32(8):19-30. DOI: 10.15888/j.cnki.csa.009233 CSTR:
Abstract:In the direction of binocular stereo matching in computer vision, deep learning algorithms based on neural networks require scene datasets for training and have poor generalization ability. In order to address these two problems, an iterative optimization algorithm of compatible solutions of deep scenes is proposed based on the ability of neural networks to simulate functions, and the algorithm requires no training on a dataset, with binocular images supervised by each other. The algorithm uses a scene location guessing network to simulate the compatible location space of a deep scene about the current binocular image, and a mutually supervised loss function matched with this network is used to guide the network to iteratively learn on the input binocular image by gradient descent. In addition, the feasible solution in the compatible location space of the deep scene is searched, and the whole process does not require training on the dataset. Comparison experiments with CREStereo, PCW-Net, CFNet, and other algorithms on Middlebury standard dataset images show that this algorithm has an average mismatching rate of 2.52% in non-occluded regions and 7.26% in all regions, which is lower than that of the other algorithms in the comparison experiments.
LIU Fang , LIU Qi , HUANG Mei-Chen , CHANG Li-Juan , WANG Xiao-Hui , ZHAO Ling , TIAN Feng
2023, 32(8):31-41. DOI: 10.15888/j.cnki.csa.009191 CSTR:
Abstract:With the development and cross-integration of big data, 5G, artificial intelligence, CPS, cloud computing, and Internet of Things technologies, the world is developing in the direction of digitization and intelligence. Digital twin is to build a multidimensional virtual model based on the physical entity. The sensor installed on the physical entity feeds back the data in real time, combines the previous historical data and artificial intelligence technology, and finally analyzes and presents them with software. Because digital twin technology can be well integrated with and applied in many advanced concepts, such as industry 4.0, aerospace, smart city, and smart medical, it has become a popular research direction and main driving technology in many industries and has great development space in all walks of life. This study firstly expounds on the basic concept of digital twin technology, sorts out the development of digital twin technology, further clarifies the relationship between digital twin technology and CPS technology, and introduces the research status of digital twin technology. Secondly, it introduces the key technologies of the digital twin, namely multidimensional and multi-scale modeling, twin data management, and virtual presentation. Finally, the application and development direction of digital twin technology in smart factory, smart city, twin medical, and aerospace fields are discussed, and the digital twin application cases of the original stable heating furnace equipment conducted in the smart factory field by the research team in this study are introduced from the perspectives of scheme, characteristics, and key technologies.
GENG Lei , QI Ting-Ting , ZHANG Fang , XIAO Zhi-Tao , LI Yue-Long
2023, 32(8):42-53. DOI: 10.15888/j.cnki.csa.009183 CSTR:
Abstract:The intelligent recognition of infant facial expressions can help caregivers pay closer attention to the physical and mental health of infants. Because infants have smooth facial lines and weakly defined facial features, the inter-class similarity of their facial expressions is higher than that of adults. To address this high inter-class similarity, this study proposes a multi-scale information fusion network. The network is divided into two stages. In the first stage, a fusion module fuses local features with global features in both the spatial and channel domains to enhance the expressive power of the features. In the second stage, a self-adaptive deep center loss estimates the weights of the fused features based on the attention mechanism, thus guiding the center loss and promoting intra-class compactness and inter-class separation of infant expression features. The experimental results show that the multi-scale information fusion network achieves a recognition accuracy of 95.46% on the infant facial expressions dataset, reaching 99.07%, 95.88%, and 95.89% in AUC, recall, and F1 score respectively. This is the best recognition performance among the existing facial expression recognition networks compared. Generalization experiments on a public facial expressions dataset yield an accuracy of 89.87%.
LIN Kai , CHEN Yi-Hua , YU Song-Sen
2023, 32(8):54-66. DOI: 10.15888/j.cnki.csa.009184 CSTR:
Abstract:The existing makeup transfer algorithms are highly effective with rich features, but they seldom take into account the scenarios of the low-resolution input images. When high-resolution images are difficult to obtain, it will be difficult for the existing makeup transfer algorithms to apply and the makeup cannot be fully transferred. In this study, a makeup transfer algorithm applied to low-resolution images is proposed, which uses the feature matrix containing makeup information as prior information and combines the super-resolution network with the makeup transfer network to produce the synergistic effect. The high-resolution makeup transfer results can be delivered even if the input image is a low-resolution one, and the robustness of postures and expressions is improved while the makeup details are fully retained. Since an end-to-end model is adopted to achieve the makeup transfer and super-resolution, a set of joint loss functions are designed, including generative adversarial loss, perceptual loss, cycle consistency loss, makeup loss, and mean square error loss functions. The proposed model attains an advanced level in both qualitative and quantitative experiments on makeup transfer and super-resolution.
CHEN Hai-Hua , HU Zhao-Min , ZHANG Jing-Yao , MA Yue-Hui , WANG Jia-Qi , LIU Zi-Chen
2023, 32(8):67-74. DOI: 10.15888/j.cnki.csa.009141 CSTR:
Abstract:The establishment of a standard growth model for the whole crop life cycle is necessary to guide the acquisition of the best crop “prescription” (accurate decision-making and execution of operations). The intelligent identification technology of crop fertility stages is one of the important technologies to build the standard crop growth model. Under the situation of large-scale crop cultivation in Hulun Buir Dahewan, the traditional method of collecting and identifying crop fertility phenotype data based on manual experience or a single sensor will lead to problems such as limited collection range and low identification efficiency. In order to address the above problems, a series of optimizations are proposed for the overall system. First, in the data collection stage, a complete “air-ground-human” integrated crop phenotype data collection system is proposed in this study. In addition, in the data analysis stage, an improved intelligent identification system of crop fertility stages is proposed based on different crop phenotype data. The proposed identification system can provide real-time and accurate information about the current crop fertility stages, which serves as an excellent basis for establishing a standard growth model for the whole crop life cycle.
DENG Yan , YE Xin-Rong , YU Bin , LUO Hui-Ning
2023, 32(8):75-85. DOI: 10.15888/j.cnki.csa.009188 CSTR:
Abstract:Traditional IoT device management systems are prone to leakage of private data and make it difficult to monitor device operating conditions and trace abnormal events, which can adversely affect individuals, enterprises, and even society. To address these problems, this study proposes a blockchain-based scheme for autonomous control and behavior auditing of IoT devices. Through blockchain deposition technology, the information of devices connected to the system is anchored to the blockchain to manage the whole life cycle of each device. Moreover, based on smart contract technology, an integrated autonomous control process covering data collection, analysis, and remote control of IoT devices is realized. Finally, the scheme exploits the tamper-resistant and traceable properties of blockchain to audit users' behavior. The analysis results show that the proposed scheme offers high security and strong scalability and can serve as a basis for a security management architecture for IoT systems.
LIU Xiang , HU Rui-Min , WANG Hai-Bin
2023, 32(8):86-94. DOI: 10.15888/j.cnki.csa.009182 CSTR:
Abstract:This paper introduces the design and implementation of an AI scheduling engine platform based on Kubernetes. To tackle the complex service configuration, the unbalanced utilization of computing resources across cluster nodes, and the high cost of system operation and maintenance in the current AI scheduling system, this study proposes a Kubernetes-based solution for container scheduling and service management. In line with the requirements of the AI scheduling engine platform, the modules of the platform are designed in terms of function implementation and platform architecture. Because Kubernetes cannot perceive GPU resources, Device Plugin is introduced to collect GPU information on each node in the cluster and report it to the scheduler. In addition, the priority algorithms in the Kubernetes scheduling strategy consider only the resource utilization rate and balance degree of the node itself and disregard how different types of applications differ in their demand for node resources, so a priority algorithm based on the Pearson correlation coefficient (PCC) is put forward. The scheduling of a Pod is determined by calculating how complementary the container's resource demand is to the node's resource utilization, thus keeping the resources of each node balanced after scheduling.
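The complementarity idea in the abstract above can be illustrated with a minimal sketch. It assumes each Pod and node is described by a vector of resource fractions (the resource order, the sample values, and the names `pearson` and `pick_node` are hypothetical, not taken from the paper) and that a lower PCC between demand and current utilization means a more complementary placement.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a constant vector has no shape to correlate with
    return cov / (sx * sy)

def pick_node(pod_demand, node_utilization):
    """Return the node whose utilization profile is least correlated
    (most complementary) with the Pod's demand profile."""
    return min(node_utilization,
               key=lambda name: pearson(pod_demand, node_utilization[name]))

# Assumed resource order: CPU, memory, GPU, network (fractions of capacity).
pod = [0.8, 0.2, 0.1, 0.1]              # CPU-heavy Pod
nodes = {
    "node-a": [0.9, 0.3, 0.2, 0.1],     # CPU already heavily used
    "node-b": [0.2, 0.7, 0.6, 0.5],     # CPU mostly idle
}
best = pick_node(pod, nodes)            # selects "node-b"
```

A real scheduler would also have to filter out nodes without enough free capacity before scoring; this sketch covers only the scoring step.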
MA Pei-Xin , CHENG Yu , HOU Jian , FAN Qing-Lai
2023, 32(8):95-104. DOI: 10.15888/j.cnki.csa.009200 CSTR:
Abstract:Multi-robot collaborative navigation is currently widely used in search and rescue, logistics, and other fields. Cooperative strategy and target navigation are the main challenges faced by multi-robot collaborative navigation. To improve the cooperative navigation ability of multiple mobile robots in an unknown environment, this study proposes a new hierarchical control cooperative navigation (HCCN) strategy. The high-level target decision layer and low-level target navigation layer are applied to assign a target point to each robot, and the global path planning and local path planning algorithms are adopted to guide the agent to reach the assigned target point without collision. Experimental verification is carried out on the Gazebo platform. The results show that the proposed method can effectively solve the sparse reward problem in cooperative navigation, and the training speed can be improved by at least 16.6%. It has better robustness in different scenarios. It is expected to provide theoretical guidance for further research on multi-robot cooperative navigation and be applied to more real scenarios.
ZHANG Zhen-Yan , SU Hai , YU Song-Sen
2023, 32(8):105-115. DOI: 10.15888/j.cnki.csa.009210 CSTR:
Abstract:The current trademark sub-card processing method is to first carry out text detection, then conduct area classification, and finally split and combine different areas to form a trademark sub-card. This step-by-step processing takes a long time, and the accuracy of the final results will decrease due to the superposition of errors. Therefore, this study proposes a multi-task network model TextCls, which can improve the inference speed and accuracy of the detection and classification modules. TextCls consists of a feature extraction network and two task branches of text detection and regional classification. The text detection branch uses the segmentation network to learn the pixel classification map and then employs pixel aggregation to obtain the text boxes. The pixel classification map is mainly used to learn the information of text and background pixels. The regional classification branch subdivides regional features into Chinese, English, and graphics, focusing on learning the characteristics of different types of regions. Through the shared feature extraction network, the two branches continuously learn pixel information and regional features, and finally the precision of the two tasks is improved. To make up for the lack of text detection datasets for trademark images and verify the effectiveness of TextCls, this study collects and labels a text detection dataset trademark_text (https://github.com/kongbailongtian/trademark_text), which consists of 2000 trademark images. The results show that compared with the optimal text detection algorithm, the text detection branch of TextCls increases the accuracy rate from 94.44% to 95.16%, with the harmonic mean F1 score reaching 92.12%; the F1 score of the regional classification branch also increases from 97.09% to 98.18%.
LIU Jia , LIN Xiao , CHEN Da-Peng , XU Chuang , SHI Hao
2023, 32(8):116-125. DOI: 10.15888/j.cnki.csa.009203 CSTR:
Abstract:Currently, most augmented reality and autonomous driving applications use not only the depth information estimated by a depth network but also the pose information estimated by a pose network. Integrating both the pose network and the depth network into an embedded device can be extremely memory-consuming. To address this problem, a method in which the depth and pose networks share a feature extractor is proposed to keep the model lightweight. In addition, a lightweight depth network with a linear structure built on depthwise separable convolution reduces the number of parameters without losing too much detail. Finally, experiments on the KITTI dataset show that, compared with algorithms of the same type, the combined size of the pose and depth network parameters is only 35.33 MB, while the average absolute error of the restored depth map is maintained at 0.129.
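To show why depthwise separable convolution shrinks a model, the parameter counts of a standard convolution and its depthwise separable counterpart can be compared directly. This is a generic calculation with biases omitted; the layer sizes are illustrative and not taken from the paper.

```python
def conv_params(c_in, c_out, k):
    """Weights in a standard k x k convolution (bias omitted)."""
    return k * k * c_in * c_out

def depthwise_separable_params(c_in, c_out, k):
    """Weights in a k x k depthwise convolution followed by a
    1 x 1 pointwise convolution (bias omitted)."""
    return k * k * c_in + c_in * c_out

standard = conv_params(128, 256, 3)                   # 294912
separable = depthwise_separable_params(128, 256, 3)   # 33920
ratio = standard / separable                          # roughly 8.7x fewer weights
```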
CHEN Zong-Nan , YE Yao-Guang , PAN Jia-Hui
2023, 32(8):126-132. DOI: 10.15888/j.cnki.csa.009195 CSTR:
Abstract:The current mainstream image colorization methods include traditional algorithms and deep learning methods. With the development of deep learning models, the grayscale image colorization method based on deep learning can bring better coloring effects, but there is still a loss of details and dull coloring. In order to solve these problems, in this study, the CycleGAN model is applied to the colorization of non-single-category grayscale images, so as to achieve realistic coloring effects on pictures of animals, plants, landscapes, etc. The activation function of the CycleGAN model is improved in terms of model structure, and the PReLU activation function is used in the generator to make the model easier to be trained. This study also uses PatchGAN in the discriminator to improve color details at high resolution in the image. After training on five popular categories of images from the ImageNet dataset, the model’s colorization effect on animals, plants, and landscapes is realistic. In the image evaluation index, the model is 0.603 dB higher than GAN in PSNR, which indicates an improvement of about 2.1%, and it is significantly higher than other models in SSIM, with an improvement of 5.1% in effect. From the perspective of visual perception, the pictures colored by CycleGAN have higher saturation and visual authenticity than models such as VGG and GAN. As a result, the proposed model not only solves the problem of dull coloring but also makes it easier to restore the color details in the picture and avoid the loss of details.
2023, 32(8):133-139. DOI: 10.15888/j.cnki.csa.009206 CSTR:
Abstract:Distributed denial of service (DDoS) attacks are a major threat in the field of network security. As a new type of network architecture, software defined networking (SDN) offers logical centralization and programmability, which provide new ideas for defending against DDoS attacks. This study designs and implements a lightweight DDoS attack detection and mitigation system in SDN. The system uses the entropy detection method and judges abnormality against a dynamic threshold. If the entropy check flags an abnormality, the system applies a more accurate decision tree model for detection. Finally, the controller determines the attack source by calculating the packet symmetry rate of the flow and delivers a blocking flow entry. The experimental results show that the system can respond to DDoS attacks in time, has a high detection success rate, and can effectively contain attacks.
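The first detection stage can be sketched as follows. The abstract does not say which field the entropy is computed over or how the threshold adapts, so this sketch assumes entropy over destination IP addresses per window and a threshold of mean minus `k` standard deviations of recent normal windows; the function names and sample values are hypothetical.

```python
from collections import Counter
from math import log2, sqrt

def shannon_entropy(items):
    """Shannon entropy (bits) of the empirical distribution of items."""
    counts = Counter(items)
    total = len(items)
    return -sum((c / total) * log2(c / total) for c in counts.values())

def dynamic_threshold(history, k=3.0):
    """Assumed rule: mean minus k standard deviations of recent entropies."""
    mean = sum(history) / len(history)
    var = sum((h - mean) ** 2 for h in history) / len(history)
    return mean - k * sqrt(var)

def is_suspicious(window_dst_ips, history, k=3.0):
    """Flag a window whose destination entropy drops below the threshold,
    as happens when traffic concentrates on a single victim."""
    return shannon_entropy(window_dst_ips) < dynamic_threshold(history, k)

history = [2.9, 3.0, 2.8, 3.0]                     # entropies of past normal windows
normal = ["10.0.0.%d" % i for i in range(8)]       # traffic spread over 8 hosts
attack = ["10.0.0.1"] * 7 + ["10.0.0.2"]           # traffic focused on one host

normal_flag = is_suspicious(normal, history)       # False
attack_flag = is_suspicious(attack, history)       # True
```

Only windows flagged here would be passed on to the heavier decision tree stage, which keeps the per-window cost low.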
SHENG Bei-Na , PAN Xu-Dong , ZHANG Mi
2023, 32(8):140-150. DOI: 10.15888/j.cnki.csa.009207 CSTR:
Abstract:Open-sourced datasets accelerate the development of deep learning, while unauthorized data usage frequently happens. To protect the dataset copyright, this study proposes the dataset watermarking algorithm. The watermark is embedded into the dataset before it is released. When the model is trained on this dataset, the watermark is attached to the model, which allows illegal dataset usage to be traced by verifying whether the watermark exists in a suspect model. However, existing dataset watermarking algorithms cannot provide effective and covert black-box verification under small perturbations. Given this problem, the method of embedding the watermark by a style attribute independent of the image content and label is proposed for the first time in this study, and the perturbation on the original dataset is constrained to avoid the modification of labels. The covertness and validity of the watermark are ensured without introducing the inconsistency between the image content and label or extra surrogate model. In the watermark verification stage, only the prediction results of the suspected model are applied to give the judgment via a hypothesis test. The proposed method is compared with the existing five methods on the CIFAR-10 dataset. The experimental results validate the effectiveness and fidelity of the proposed algorithm. Besides, the ablation experiments conducted in this study verify the necessity of the proposed style refinement module and the effectiveness of the proposed algorithm under various hyper-parameter settings and datasets.
YU Song-Sen , ZHANG Ming-Wei , YANG Huan
2023, 32(8):151-161. DOI: 10.15888/j.cnki.csa.009185 CSTR:
Abstract:The existing detection method of ceramic tile surface defects has the problem of insufficient ability to identify small target defects, and the detection speed needs to be improved. Therefore, this study proposes a ceramic tile surface defect detection method based on improved YOLOv5. Firstly, due to the small size of ceramic tile surface defects, the detection abilities of three target detection head branches of YOLOv5s are compared and analyzed. It is found that the effectiveness of the model that removes the large target detection head and retains only the medium and small target detection heads is optimal. Secondly, to further realize the lightweight of the model, the study applies ghost convolution and C3Ghost modules to replace the ordinary convolution and C3 modules of YOLOv5s in the Backbone network, thus reducing the number of model parameters and the calculation amount. Finally, the coordinate attention mechanism module is added at the end of the Backbone and Neck networks of YOLOv5s to solve the problem of no attention preference in the original model. The proposed method is tested on the Tianchi ceramic tile defect detection dataset. The results show that the mean precision of the improved detection model averages 66%, which is 1.8% higher than the original YOLOv5s model. Besides, the size of the model is only 10.14 MB, and the number of parameters and the calculation amount is reduced by 48.7% and 38.7% respectively compared with the original model.
ZHOU Can , YANG Dong , WEI Song-Jie
2023, 32(8):162-170. DOI: 10.15888/j.cnki.csa.009194 CSTR:
Abstract:Current network traffic data are high-dimensional, polymorphic, and massive, which poses a new challenge for intrusion detection. To address the low detection efficiency and lack of lightweight design in traditional intrusion detection models, a lightweight network intrusion detection model incorporating GRU and CNN is proposed. Firstly, redundant features in the dataset are removed by using extremely randomized trees. Secondly, feature extraction is performed by using GRU. To take into account both long-term and short-term dependencies in the data, all hidden layer outputs are treated as sequence feature information for the next step. Then a lightweight CNN model with structures such as inverted residuals, depthwise separable convolution, and dilated convolution is used for spatial feature extraction, and a channel attention mechanism is added to accelerate model convergence. Finally, experiments on the CIC-IDS2017 dataset show that the method has excellent detection performance, as well as few model parameters, small model size, short training time, and short detection time, which makes it suitable for intrusion detection of network traffic.
LI Qi-An , LI Jun , CAO Di , ZHANG Ming
2023, 32(8):171-179. DOI: 10.15888/j.cnki.csa.009161 CSTR:
Abstract:In view of the flash point prediction of constant line aviation kerosene, a soft sensor method based on the grey correlation analysis (GRA) and improved whale optimization algorithm (IWOA) is proposed to optimize the extreme learning machine (ELM). GRA is used to calculate the information correlation degree between each auxiliary variable and the variable to be tested. Auxiliary variables are selected as inputs through the experimental method, and then IWOA is used to find the optimal weight threshold for ELM. In the early stage of the algorithm iteration, the improved Tent chaotic mapping is used to initialize the population to make the population distribution more uniform. The adaptive weight is combined with a random difference variation strategy to improve the optimization ability of the algorithm. The effectiveness of the improved algorithm is verified by eight benchmark test functions, and the improved model is proven to be effective in predicting flash points by the actual flash point data of the constant line aviation kerosene in an atmospheric tower of a refinery.
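Chaotic initialization of the population can be sketched as below. The abstract does not define its improved Tent mapping, so this example substitutes the ordinary skew tent map with an assumed breakpoint `a = 0.7` and an assumed seed value `x0`; it only illustrates how a chaotic sequence in [0, 1] is scaled into the search range.

```python
def tent_sequence(n, x0=0.37, a=0.7):
    """Generate n values in [0, 1] with the skew tent map
    x -> x / a if x < a, else (1 - x) / (1 - a)."""
    xs, x = [], x0
    for _ in range(n):
        x = x / a if x < a else (1 - x) / (1 - a)
        xs.append(x)
    return xs

def init_population(n_individuals, dim, lower, upper, x0=0.37):
    """Scale the chaotic sequence into [lower, upper] for each dimension."""
    chaos = tent_sequence(n_individuals * dim, x0)
    return [[lower + (upper - lower) * chaos[i * dim + d] for d in range(dim)]
            for i in range(n_individuals)]

population = init_population(n_individuals=10, dim=3, lower=-5.0, upper=5.0)
```

The symmetric tent map with slope 2 is avoided here because in binary floating point it collapses to zero after a few dozen iterations.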
2023, 32(8):180-188. DOI: 10.15888/j.cnki.csa.009181 CSTR:
Abstract:Pallet recognition and positioning is one of the critical problems in unmanned forklift trucks. At present, target detection is mostly used for pallet positioning. However, target detection can only recognize the position of the pallet in the image and cannot obtain the spatial information of the pallet. To solve this problem, this study proposes a pallet positioning method based on target and key point detection with monocular vision, which is applied to detect the pallet and calculate the current dip angle and distance of the pallet. Firstly, target detection is carried out on the pallet. Then, the image will be cropped according to the detection result and input into the key points detection network. Through the detection of the key points and the inherent geometric features of the pallet, the edge adaptive adjustment is designed to obtain the high-precision profile information of the pallet. According to the geometric constraints, a method for calculating the dip angle and distance of the pallet based on contour points is proposed, and the RANSAC algorithm is adopted to improve the precision and stability of the calculation results, thus addressing the problem of pallet positioning. Experiments indicate that the average error of the proposed algorithm is less than 5° in the calculation of dip angle and less than 110 mm in the calculation of horizontal distance. It works well for pallet positioning and is of high practical value.
2023, 32(8):189-197. DOI: 10.15888/j.cnki.csa.009160 CSTR:
Abstract:When the basic Q-learning algorithm is applied to path planning, the randomness of action selection makes the early search efficiency of the algorithm low and the planning time-consuming, and even a complete and feasible path cannot be found. Therefore, a path planning algorithm of robots based on improved ant colony optimization (ACO) and dynamic Q-learning fusion is proposed. The pheromone increment mechanism of the elite ant model and sorting ant model is used, and a new pheromone increment updating method is designed to improve the exploration efficiency of robots. The pheromone matrix of the improved ant colony optimization algorithm is used to assign values to the Q table, so as to reduce the ineffective exploration of the robot at the initial stage. In addition, a dynamic selection strategy is designed to improve the convergence speed and the stability of the algorithm. Finally, different simulation experiments are carried out on two-dimensional static grid maps with different obstacle levels. The results show that the proposed method can effectively reduce the number of iterations and optimization time consumption in the optimization process.
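Seeding the Q table from a pheromone matrix can be sketched as follows. The abstract does not give the exact mapping, so normalizing by the largest pheromone value is an assumption, and the function names and the tiny 2-state example are hypothetical. The update rule is the standard one-step Q-learning update.

```python
def init_q_from_pheromone(pheromone, scale=1.0):
    """Build a Q table (states x actions) from a pheromone matrix of the
    same shape, normalized so the strongest trail maps to `scale`."""
    peak = max(max(row) for row in pheromone)
    if peak == 0:
        return [[0.0 for _ in row] for row in pheromone]
    return [[scale * v / peak for v in row] for row in pheromone]

def q_update(q, s, a, reward, s_next, alpha=0.1, gamma=0.9):
    """Standard one-step Q-learning update, applied in place."""
    target = reward + gamma * max(q[s_next])
    q[s][a] += alpha * (target - q[s][a])

pheromone = [[0.2, 0.8],    # state 0: action 1 has the stronger trail
             [0.5, 0.1]]    # state 1: action 0 has the stronger trail
q = init_q_from_pheromone(pheromone)
greedy_action_state0 = max(range(2), key=lambda a: q[0][a])   # 1
q_update(q, s=0, a=1, reward=1.0, s_next=1)
```

With this start, the greedy policy already follows the strongest pheromone trails, so early episodes spend less time on random exploration than with an all-zero table.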
LI Mu , YANG Heng , ZHANG Yi-Lang
2023, 32(8):198-206. DOI: 10.15888/j.cnki.csa.009214 CSTR:
Abstract:The current millimeter-wave radar has a poor detection effect and small detection range when applied to detect multi-person vital signs. In view of these problems, a method for extracting and separating multi-person heart rate and respiration is proposed. First, Capon beam forming technology is used to form null for signals in non-target areas, and phase extraction and phase unwrapping operations are carried out for target areas. Secondly, an adaptive harmonic tracking algorithm is used to filter noise. Finally, the variational mode decomposition method improved by particle swarm optimization and sample entropy (PSO-SE-VMD) is used to decompose the signal to obtain modal components, select appropriate modal components, and extract heart rate and respiration through a short-term autocorrelation algorithm. The experimental results show that the mean square error of heart rate is 5.55 and 3.15 when the included angle is 30° and 60°, respectively, which realizes multi-person detection and effectively improves the detection range.
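The phase unwrapping step mentioned above removes the artificial jumps of 2π that appear when a continuously changing phase is measured modulo 2π. A minimal sketch of the standard procedure (the same idea as `numpy.unwrap`) is shown below; the sample signal is synthetic and the helper names are illustrative.

```python
from math import pi

def wrap(phase):
    """Map a phase value into the interval [-pi, pi)."""
    return (phase + pi) % (2 * pi) - pi

def unwrap(phases):
    """Undo 2*pi jumps between consecutive samples."""
    out = [phases[0]]
    offset = 0.0
    for prev, cur in zip(phases, phases[1:]):
        delta = cur - prev
        if delta > pi:
            offset -= 2 * pi      # measured phase jumped up by one turn
        elif delta < -pi:
            offset += 2 * pi      # measured phase jumped down by one turn
        out.append(cur + offset)
    return out

true_phase = [0.0, 1.5, 3.0, 4.5, 6.0, 7.5]    # steadily increasing phase
measured = [wrap(p) for p in true_phase]       # what the radar phase looks like
recovered = unwrap(measured)                   # matches true_phase again
```

This works as long as the true phase changes by less than π between consecutive samples, which is why the sampling rate matters for vital-sign extraction.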
FAN Hai-Wei , ZHANG Li-Miao , LU Xin-Si-Yu , WANG Shuai
2023, 32(8):207-213. DOI: 10.15888/j.cnki.csa.009209 CSTR:
Abstract:Given the uneven modeling degree between the user and project sides of the recommendation algorithms for knowledge graphs as well as high model complexity, a recommendation algorithm that integrates knowledge graph and lightweight graph convolutional network is proposed. On the user side, neighbor sets are generated based on user similarity, and the interaction records of users and their similar users are iteratively propagated on the knowledge graph for many times to enhance the representation of user features. On the project side, the entity on the knowledge graph is embedded and propagated to mine the project information related to user preferences. Then, the lightweight graph convolutional network is adopted to aggregate neighborhood features to obtain the feature representations of users and projects. At the same time, the attention mechanism is employed to incorporate neighborhood weights into the entities to enhance node embedding representation. Finally, the ratings between the user and the project are predicted. Experiments show that on the Book-Crossing dataset, compared with the optimal baseline, AUC and ACC are improved by 1.8% and 2.3%, respectively. On the Yelp2018 dataset, AUC and ACC are improved by 1.2% and 1.4%, respectively. The results demonstrate that the proposed model has better recommendation performance compared with other benchmark models.
2023, 32(8):214-220. DOI: 10.15888/j.cnki.csa.009202 CSTR:
Abstract:Non-intrusive load decomposition is an important part of the intelligent power consumption system, which can deeply analyze the power consumption information of users and is of great significance to load forecasting, demand side management, and power grid security. This study proposes a non-intrusive load decomposition method based on the improved particle swarm optimization factorial hidden Markov model (IPSO-FHMM). Gaussian mixture model (GMM) is used to cluster the states of individual loads. The total load model is represented by an FHMM. Since the Baum-Welch algorithm tends to converge to the local extremum, the PSO algorithm with linearly decreasing weights is introduced into the parameter training of FHMM. Simulation experiments using the AMPds2 dataset show that the model can effectively improve the decomposition accuracy.
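The linearly decreasing inertia weight can be sketched in a plain PSO loop. The values `w_max = 0.9`, `w_min = 0.4`, `c1 = c2 = 2.0` are common defaults and are assumptions here, and a simple sphere function stands in for the FHMM training objective, which the abstract does not spell out.

```python
import random

def inertia(t, t_max, w_max=0.9, w_min=0.4):
    """Inertia weight that falls linearly from w_max to w_min."""
    return w_max - (w_max - w_min) * t / t_max

def pso_minimize(f, dim, bounds, n_particles=20, t_max=100,
                 c1=2.0, c2=2.0, seed=0):
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [f(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    for t in range(t_max):
        w = inertia(t, t_max)   # large early (explore), small late (exploit)
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = f(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val

def sphere(x):
    """Stand-in objective with its minimum of 0 at the origin."""
    return sum(v * v for v in x)

best, best_val = pso_minimize(sphere, dim=2, bounds=(-5.0, 5.0))
```

In the setting of the abstract, the particle position would encode the FHMM parameters and the objective would be a model-fit measure such as negative log-likelihood.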
2023, 32(8):221-229. DOI: 10.15888/j.cnki.csa.009197 CSTR:
Abstract:In this study, a new target detection method based on YOLOv5s is introduced to make up for the deficiencies of the current mainstream detection methods in terms of detection precision and missed detection of small target helmet wearing. Firstly, a small target detection layer is added to increase the detection precision of the small target helmet. Secondly, the ShuffleAttention mechanism is introduced. The number of ShuffleAttention groups is reduced from 64 to 16 in this study, which is more conducive to the global extraction of the depth and size of the model. Finally, the SA-BiFPN network structure is added to carry out the bidirectional multi-scale feature fusion to extract more effective feature information. Experiments show that compared with the original YOLOv5s algorithm, the average precision of the improved algorithm is increased by 1.7%, reaching 92.5%. The average precision of the algorithms with and without helmets is increased by 1.9% and 1.4% respectively. The proposed detection algorithm is compared with other target detection algorithms. The experimental results show that the SAB-YOLOv5s algorithm model is only 1.5M larger than the original YOLOv5s algorithm model, which is smaller than other algorithm models. It improves the average precision of target detection, reduces the probability of missing and false detection in small target detection, and achieves accurate and lightweight helmet wearing detection.
ZHENG Hong-Bin , SONG Xiao-Ru , LIU Kang
2023, 32(8):230-237. DOI: 10.15888/j.cnki.csa.009204 CSTR:
Abstract:Traffic sign recognition is a key part of autonomous driving technology. Because traffic signs in road scenes are small targets and recognition accuracy is low, an improved YOLOv5 algorithm is proposed. First, the global attention mechanism (GAM) is introduced into the YOLOv5 model to improve the network's ability to capture traffic sign features at different scales. Second, the GIoU loss function used in the YOLOv5 algorithm is replaced with the CIoU loss function, which gives better bounding-box regression, to optimize the model and improve the recognition accuracy of traffic signs. Finally, training is carried out on the Tsinghua-Tencent 100K dataset. The experimental results show that the average accuracy of the improved YOLOv5 algorithm for traffic sign recognition is 93.00%, which is 5.72% higher than that of the original algorithm, indicating that the improved algorithm has better recognition performance.
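The difference between the two losses can be seen by computing GIoU and CIoU for a pair of boxes. The sketch below follows the commonly published definitions (loss is 1 minus the score) and uses made-up boxes in `(x1, y1, x2, y2)` form; it is not the training code from the paper.

```python
from math import atan, pi

def iou_terms(a, b):
    """Return IoU, union area, and the smallest enclosing box of a and b."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    enclose = (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))
    return inter / union, union, enclose

def giou(a, b):
    """IoU minus the fraction of the enclosing box not covered by the union."""
    iou, union, e = iou_terms(a, b)
    e_area = (e[2] - e[0]) * (e[3] - e[1])
    return iou - (e_area - union) / e_area

def ciou(a, b):
    """IoU minus a center-distance penalty and an aspect-ratio penalty."""
    iou, _, e = iou_terms(a, b)
    rho2 = (((a[0] + a[2]) - (b[0] + b[2])) / 2) ** 2 \
         + (((a[1] + a[3]) - (b[1] + b[3])) / 2) ** 2      # squared center distance
    c2 = (e[2] - e[0]) ** 2 + (e[3] - e[1]) ** 2           # squared enclosing diagonal
    v = (4 / pi ** 2) * (atan((b[2] - b[0]) / (b[3] - b[1]))
                         - atan((a[2] - a[0]) / (a[3] - a[1]))) ** 2
    alpha = v / (1 - iou + v) if v > 0 else 0.0
    return iou - rho2 / c2 - alpha * v

target = (0.0, 0.0, 4.0, 4.0)
corner = (0.0, 0.0, 2.0, 2.0)     # prediction inside the target, off-center
center = (1.0, 1.0, 3.0, 3.0)     # prediction inside the target, centered
```

For these boxes `giou(target, corner)` and `giou(target, center)` are both 0.25, so GIoU cannot tell the two predictions apart, while `ciou` rates the centered prediction higher (0.25 versus 0.1875). This is the kind of case where CIoU supplies a more useful regression signal.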
LIU Gao-Hui , WANG Zhuang-Zhuang
2023, 32(8):238-243. DOI: 10.15888/j.cnki.csa.009213 CSTR:
Abstract:In view of the complex networks, heavy computation, and high hardware platform requirements involved in applying deep learning to digital signal modulation recognition, this study proposes a modulation recognition method that applies an improved MobileNetV3 lightweight neural network to signal constellation diagrams. Firstly, the received MPSK and MQAM signals are converted into constellation diagrams, from which grayscale images are extracted and then enhanced. The image dataset of constellation diagrams is then constructed, and the cross-layer structure of ResNet is introduced into the MobileNetV3 network to address the vanishing gradient caused by weights decreasing as the number of network layers increases. Finally, the dataset of constellation diagrams is used to train the weights of the MobileNetV3 lightweight neural network, and the constellation diagrams are then recognized. Based on depthwise separable convolution and neural architecture search (NAS), MobileNetV3 greatly reduces the number of parameters and the training time while maintaining recognition accuracy. For the modulation recognition of simple signals, the lightweight neural network can effectively simplify the network structure and reduce hardware requirements. The simulation results show that the modulated signals (BPSK, QPSK, 8PSK, 16QAM, and 64QAM) are recognized with a recognition rate of 99.76%. Compared with traditional deep learning networks for modulation recognition, the lightweight neural network can significantly reduce the number of network parameters and the computational cost.
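The parameter reduction that this abstract attributes to MobileNetV3's separable convolutions can be illustrated with simple arithmetic. The sketch below compares the weight count of a standard convolution with that of a depthwise convolution followed by a 1x1 pointwise convolution. The function names and the example layer sizes are assumptions chosen for illustration, and bias terms are ignored.

```python
def standard_conv_params(kernel, c_in, c_out):
    """Weights in a standard kernel x kernel convolution (no bias)."""
    return kernel * kernel * c_in * c_out


def depthwise_separable_params(kernel, c_in, c_out):
    """Weights in a depthwise kernel x kernel convolution plus a 1x1 pointwise convolution."""
    depthwise = kernel * kernel * c_in  # one spatial filter per input channel
    pointwise = c_in * c_out            # 1x1 convolution that mixes channels
    return depthwise + pointwise


def reduction_ratio(kernel, c_in, c_out):
    """Separable / standard weight ratio; algebraically equal to 1/c_out + 1/kernel**2."""
    return depthwise_separable_params(kernel, c_in, c_out) / standard_conv_params(kernel, c_in, c_out)
```

For a hypothetical 3x3 layer with 32 input and 64 output channels, the standard form needs 18432 weights and the separable form 2336, roughly one eighth as many.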
2023, 32(8):244-249. DOI: 10.15888/j.cnki.csa.009187 CSTR:
Abstract:To solve the problems of difficult decision-making, multiple interference factors, poor real-time performance, and difficulty in achieving global optimization in maritime search and rescue (SAR) resource scheduling, this study employs an improved non-dominated sorting genetic algorithm II (NSGA-II), taking the Yellow Sea and the Bohai Sea as an example. Firstly, a multi-objective optimization model for maritime SAR resources is built based on AIS and BeiDou data. Secondly, the improved NSGA-II algorithm adopts a normal distribution crossover (NDX) operator to expand the search scope and avoid falling into local optima, and a complete Pareto solution set for the multi-objective problem is obtained. The comprehensive evaluation method TOPSIS is applied to obtain a compromise solution from the Pareto solution set, namely the optimal SAR scheduling scheme. Finally, with constraint factors such as the number of ships and time taken into account, the improved NSGA-II algorithm is compared with the NSGA-II and greedy algorithms. The simulations of the resource scheduling are carried out using the data collected from ships in the Yellow Sea and the Bohai Sea. The results show that the algorithm can effectively solve the problem of maritime SAR resource scheduling optimization.
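The step of picking a compromise solution from a Pareto set with TOPSIS can be sketched as follows. This is a generic textbook TOPSIS with vector normalization, not the authors' code: the function name, the equal weights, and the two-objective example data are assumptions, and the sketch presumes every criterion column has at least one non-zero entry and that the alternatives are not all identical.

```python
import math


def topsis(matrix, weights, benefit):
    """Return the closeness score of each alternative (higher is better).

    matrix  -- rows are alternatives, columns are criteria
    weights -- one weight per criterion
    benefit -- per criterion, True if larger is better, False if smaller is better
    """
    n_criteria = len(weights)
    # Vector-normalize each column, then apply the criterion weights.
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) for j in range(n_criteria)]
    weighted = [[weights[j] * row[j] / norms[j] for j in range(n_criteria)] for row in matrix]
    columns = list(zip(*weighted))
    ideal = [max(col) if benefit[j] else min(col) for j, col in enumerate(columns)]
    anti_ideal = [min(col) if benefit[j] else max(col) for j, col in enumerate(columns)]
    scores = []
    for row in weighted:
        d_pos = math.dist(row, ideal)       # distance to the ideal solution
        d_neg = math.dist(row, anti_ideal)  # distance to the anti-ideal solution
        scores.append(d_neg / (d_pos + d_neg))
    return scores
```

As a usage example, take three hypothetical non-dominated schedules scored on two objectives to be minimized, `[[1, 9], [3, 3], [9, 1]]`, with weights `[0.5, 0.5]` and `benefit=[False, False]`. The balanced schedule `[3, 3]` receives the highest closeness score and would be selected as the compromise.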
2023, 32(8):250-258. DOI: 10.15888/j.cnki.csa.009196 CSTR:
Abstract:Diversion in severe weather is closely related to the designation of forbidden areas and to path planning algorithms. Given the large invalid area in the Graham scanning results when the diversion environment is constructed, this study proposes a delineation method that divides the area into blocks and then applies Graham scanning to the blocks in parallel. To handle the sudden occurrence of severe weather and complex environments, the study proposes a composite-structure dynamic planning method that performs intelligent segmentation and ant colony local search on top of a global path planned by the incremental D*Lite algorithm. The pheromone updating strategy is improved to overcome slow convergence, long running time, and the tendency to fall into local optima. The experimental results show that the shape of the flight forbidden areas designated by Graham parallel scanning based on the divided blocks is closer to reality, and the area is reduced to 48.1% of the original one. D*Lite-ACO, the improved dynamic path planning algorithm that fuses ant colony optimization with D*Lite in a composite structure, takes both the global and local areas into account and controls the replanning range between the current position and the target point. The evaluation metrics of path length, planning time, and iteration range are improved by 1.2%, 40.7%, and 66.7%, respectively.
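For reference, the basic Graham scan that this abstract builds on can be sketched as below. This is the classic single-region convex hull procedure only; the block division and parallel scanning proposed in the study are not reproduced, and the function names are assumptions. Points are expected as (x, y) tuples, and integer coordinates keep the orientation test exact.

```python
from functools import cmp_to_key


def cross(o, a, b):
    """Z-component of (a - o) x (b - o); positive means a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def graham_scan(points):
    """Return the convex hull vertices in counter-clockwise order, starting at the lowest point."""
    pts = list(set(points))
    if len(pts) < 3:
        return sorted(pts)
    # Pivot: the lowest point, breaking ties by the smallest x.
    pivot = min(pts, key=lambda p: (p[1], p[0]))

    def dist2(p):
        return (p[0] - pivot[0]) ** 2 + (p[1] - pivot[1]) ** 2

    def by_angle(a, b):
        # Order by polar angle around the pivot; for equal angles, nearer points first.
        turn = cross(pivot, a, b)
        if turn != 0:
            return -1 if turn > 0 else 1
        return (dist2(a) > dist2(b)) - (dist2(a) < dist2(b))

    rest = sorted((p for p in pts if p != pivot), key=cmp_to_key(by_angle))
    hull = [pivot]
    for p in rest:
        # Drop the last vertex while it would make a clockwise or collinear turn.
        while len(hull) >= 2 and cross(hull[-2], hull[-1], p) <= 0:
            hull.pop()
        hull.append(p)
    return hull
```

A single hull around scattered weather cells also encloses the empty space between them, which is the kind of invalid area the abstract refers to; computing one hull per block is one way to tighten the boundary.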
LI Yue , TANG Dan , XU Yuan-Ping , SUN Min-Jun , CAI Hong-Liang , ZENG Qiong
2023, 32(8):259-268. DOI: 10.15888/j.cnki.csa.009201 CSTR:
Abstract:When information hiding technology is used to transmit information in some one-to-two communication scenarios, there are high requirements for the visual quality of the loaded image and the accuracy of the carrier image. In this study, a reversible information hiding scheme for dual images is proposed based on the modular function and the pixel value difference (PVD). The range table of the PVD is determined by the modular function and the logarithm function, which in turn determines the number of embedded bits per unit area and the coefficient of the modular function. The proposed scheme keeps the ratio of the pixel value modification to the number of embedded bits at no more than 0.5 even as the number of embedded bits keeps increasing. Therefore, compared with other PVD-based schemes, the proposed scheme has greater advantages for images with larger differences between pixel pairs. The experimental results show that the scheme achieves higher PSNR and SSIM than some existing schemes in terms of the quality of the loaded image. In addition, the scheme performs well against the statistical attacks of RS steganalysis and PDH steganalysis, and it avoids the complicated handling of the overflow problem found in most PVD-based information hiding schemes.
2023, 32(8):269-277. DOI: 10.15888/j.cnki.csa.009174 CSTR:
Abstract:In incremental learning, as the number of tasks increases, the knowledge that the model learned on old tasks is catastrophically forgotten after the model is trained on a new task, due to a series of problems such as step-by-step data migration, and the model performance on the old tasks degrades. To address this problem, a class-incremental learning method based on knowledge decoupling is proposed in this study. This method learns the common and unique knowledge of different tasks hierarchically, combines the two kinds of knowledge dynamically, and applies them to the downstream classification tasks. In addition, the mask strategy of the natural language model is used in replay learning, which prompts the model to quickly recall the knowledge of the previous tasks. In class-incremental experiments on the NLP datasets AGNews, Yelp, Amazon, DBPedia, and Yahoo, the proposed method effectively reduces the forgetting of the model and improves the accuracy and other indicators on various tasks.
2023, 32(8):278-285. DOI: 10.15888/j.cnki.csa.009111 CSTR:
Abstract:Text in images is ubiquitous in everyday life, and while it conveys information, it also brings the problem of information leakage. In recent years, text erasure models have solved this problem very well. However, in industrial scenarios where images contain highlights and high-contrast non-character areas, the models are prone to attention drift, neglecting the character areas and producing unsatisfactory text erasure. To overcome this limitation, this study proposes a new attention-based text erasure network. Specifically, an additional feature layer is embedded in the network to score the areas where characters are present in the generated image. At the same time, the study introduces a Gaussian heat map and uses it as the basis for designing a loss function that corrects the model’s attention and guides it to accurate character areas in a supervised manner. In a comparison on four different datasets, the proposed method achieves better erasure results overall. In addition, the method remains highly flexible for the text erasure task when images have complex backgrounds.
CHEN Ke-Xin , QIAO Huan , FANG Ling-Ling
2023, 32(8):286-294. DOI: 10.15888/j.cnki.csa.009190 CSTR:
Abstract:The manual diagnosis and classification of fundus retinal images suffers from low efficiency, missed diagnoses, and misdiagnoses. To this end, a convolutional network model based on the SENet attention mechanism and the gradient boosting decision tree (GBDT) classification method is proposed to help physicians distinguish the fundus screening results of various diseases and reduce the rate of missed and false detection. Based on the deep learning model, the sampling convolutional network is applied to learn three extracted features, namely retinal hemorrhage, optic disc edema, and macular degeneration, and the GBDT method is employed for identification and classification. The real clinical data provided by the Third People’s Hospital of Dalian are used to evaluate the performance of the proposed method. The results show that the average accuracy, precision, and recall rates of the model reach 99.27%, 98.35%, and 98.10%, respectively, and the model has certain practical value in the clinical diagnosis of retinal diseases.
XU Xiao-Ping , ZHANG Yong , LIU Guang-Jun , LIU Long
2023, 32(8):295-302. DOI: 10.15888/j.cnki.csa.009224 CSTR:
Abstract:Dust accumulation is one of the main causes of power loss in photovoltaic modules. In view of the characteristics of dust particles and the high cost of scanning electron microscopy, this study proposes a scheme to identify dust on photovoltaic panels by using an improved ShuffleNetV2 model. On the basis of the ShuffleNetV2 network model, the Mish activation function is first used to integrate better feature information into the neural network; mixed depthwise convolution is then used to ensure the richness of feature extraction. Finally, the coordinate attention module replaces the pointwise convolution at the tail of the right branch of the basic unit in the ShuffleNetV2 model, so as to improve the accuracy and reduce the amount of computation. The experimental results show that the improved ShuffleNetV2 model has higher accuracy and lower complexity than the existing classical models, indicating that the proposed scheme is feasible.
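The Mish activation named in this abstract has the standard definition mish(x) = x * tanh(softplus(x)), where softplus(x) = ln(1 + e^x). A minimal scalar sketch follows; the function names are assumptions, and a real network would use the vectorized version provided by its deep learning framework.

```python
import math


def softplus(x):
    """Numerically stable ln(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def mish(x):
    """Mish activation: x * tanh(softplus(x))."""
    return x * math.tanh(softplus(x))
```

Unlike ReLU, the function is smooth and lets small negative inputs pass through with a small negative output, while for large positive inputs it approaches the identity.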
2023, 32(8):303-311. DOI: 10.15888/j.cnki.csa.009189 CSTR:
Abstract:Data centers consume large amounts of energy, application task loads are random and dynamic, and users require low latency from applications. To address these issues, a container integration method based on the advantage actor-critic (A2C) algorithm is proposed on the basis of the fog computing system architecture to minimize energy consumption and average response time. The method uses checkpoint/recovery technology to migrate containers in real time to achieve resource integration. An end-to-end decision model from the data center system state to container integration is constructed, and an adaptive multi-objective reward function is proposed. The gradient-based backpropagation algorithm is used to accelerate the convergence of the decision model. Simulation results based on real task load datasets show that the proposed method can effectively reduce energy consumption while ensuring service quality.
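The "advantage" in advantage actor-critic can be illustrated with the common one-step estimate A = r + gamma * V(s') - V(s), which the actor uses to weight its policy gradient and the critic uses as its regression error. The sketch below shows only this quantity; the function name, the discount factor default, and the numbers in the usage note are assumptions, and the study's adaptive multi-objective reward is not reproduced.

```python
def one_step_advantage(reward, value_state, value_next_state, gamma=0.99, done=False):
    """One-step temporal-difference advantage estimate.

    reward           -- scalar reward received for the transition
    value_state      -- critic's estimate V(s) for the current state
    value_next_state -- critic's estimate V(s') for the next state
    gamma            -- discount factor
    done             -- True if s' is terminal, in which case V(s') is not bootstrapped
    """
    bootstrap = 0.0 if done else gamma * value_next_state
    return reward + bootstrap - value_state
```

With a reward of 1.0, V(s) = 0.5, V(s') = 2.0, and gamma = 0.9, the estimate is 1.0 + 1.8 - 0.5 = 2.3. A positive value means the action turned out better than the critic expected, so the actor would raise its probability.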