Abstract: Long-term object tracking remains a formidable challenge compared to short-term object tracking. However, existing long-term tracking algorithms often perform poorly when faced with challenges such as targets frequently appearing and disappearing, and drastic changes in target appearance. This study proposes a novel, robust, and real-time long-term tracking framework based on local search modules and global search tracking modules. The local search module utilizes the TransT short-term tracker to generate a series of candidate boxes, and the best candidate box is determined through confidence scoring. A novel global search tracking module is developed for global re-detection, based on the Faster R-CNN model, with the introduction of Non-Local operations and multi-level instance feature fusion modules in the RPN and R-CNN stages, aiming to fully exploit target instance-level features. To improve the performance of the global search tracking module, a dual-template update strategy is designed to enhance the robustness of the tracker. By utilizing templates updated at different time points, the tracker can better adapt to target changes. The target presence is determined based on local or global confidence scores, and the local or global search tracking strategy is selected in the next frame. Additionally, the local search module is capable of estimating the position and size of the target. Moreover, a ranking loss function is introduced for the global search tracker, implicitly learning the similarity between region proposals and the original query target. A large number of experiments are conducted on multiple tracking datasets to comprehensively assess the proposed tracking framework. The results consistently demonstrate that the proposed tracking framework achieves satisfactory performance.
Abstract: The uncertainty of neural networks reflects the predictive confidence of deep learning models, enabling timely human intervention in unreliable decision-making, which is crucial for enhancing system safety. However, existing measurement methods often require significant modifications to the model or training process, leading to high implementation complexity. To address this, this study proposes an uncertainty measurement approach that statistically models and analyzes neuron activation values within a single forward propagation. An improved kernel density estimation technique is employed to construct neuron activation distributions and simulate the normal operating range of each neuron. Subsequently, a neighborhood-weighted density estimation method is utilized to calculate anomaly factors, effectively quantifying deviations of test samples from the neuron activation distribution. Finally, by statistically combining the anomaly factors of each neuron, the cumulative anomaly factor of the sample provides a new perspective for assessing model uncertainty. Experimental results across multiple public datasets and models show, through feature map visualization, that the proposed method effectively distinguishes between in-domain and out-of-domain samples. Moreover, the method performs strongly in out-of-domain detection tasks, with AUROC exceeding other methods across various experimental setups, validating its generality and effectiveness.
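The density-based scoring idea in the abstract above can be illustrated with a minimal sketch. It assumes a plain Gaussian kernel density estimate per neuron and uses the negative log-density as the anomaly factor; the improved kernel density estimation and the neighborhood-weighted estimator of the study are not reproduced here, and all function names and the bandwidth value are hypothetical.

```python
import math

def gaussian_kde(samples, bandwidth):
    """Return a density function fitted to the 1-D activations of one neuron."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        )

    return density

def anomaly_factor(density, activation, eps=1e-12):
    """Negative log-density (an assumed scoring rule): larger means further
    from the activation range observed on training data."""
    return -math.log(density(activation) + eps)

def cumulative_anomaly(densities, activations):
    """Sum the per-neuron anomaly factors of one test sample."""
    return sum(anomaly_factor(d, a) for d, a in zip(densities, activations))

# Training activations of two neurons (made-up illustrative values).
train = [[0.9, 1.0, 1.1, 1.05, 0.95], [2.0, 2.1, 1.9, 2.05, 1.95]]
densities = [gaussian_kde(s, bandwidth=0.1) for s in train]

in_domain = cumulative_anomaly(densities, [1.0, 2.0])
out_of_domain = cumulative_anomaly(densities, [3.0, 0.0])
print(in_domain < out_of_domain)
```

A sample whose activations fall outside the ranges seen during training receives a larger cumulative score, which is the property an out-of-domain detector relies on.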
Abstract: Hierarchical federated learning (HFL) aims to optimize model performance and maintain data privacy through multi-layered collaborative learning. However, its effectiveness relies on effective incentive mechanisms for participating parties and strategies to address information asymmetry. To address these issues, this study proposes a layered incentive mechanism for protecting the privacy of end devices, edge servers, and cloud servers. At the edge-device layer, edge servers act as intermediaries, using the multi-dimensional contract theory to design a variety of contract items. This encourages end devices to participate in HFL using local data without disclosing the costs of data collection, model training, and model transmission. At the cloud-edge layer, the Stackelberg game models the relationship between unit data reward and data size between a cloud server and edge servers and subsequently transforms it into a Markov process, all while maintaining the confidentiality of the edge servers’ unit profit. Then, multi-agent deep reinforcement learning (MADRL) is used to incrementally approach the Stackelberg equilibrium (SE) while ensuring privacy. Experimental results indicate that the proposed incentive mechanism outperforms traditional approaches, yielding an almost 11% increase in cloud server revenue and an approximately 18 times improvement in the cost-effectiveness gained.
Abstract: The current image denoising algorithms based on deep learning are unable to consider the local and global feature information comprehensively, which in turn affects the image denoising effect at the details. To address this problem, this study proposes a hybrid CNN and Transformer image denoising network (HCT-Net). First, CNN and Transformer coupling block (CTB) is proposed to construct a two-branch structure that integrates convolution and channel self-attention to alleviate the high computational overhead caused by relying solely on the Transformer. At the same time, the attention weights are dynamically allocated so that the network focuses on important feature information. Secondly, the self-attention enhanced convolution module (SAConv) is designed to adopt the progressive combination of modules and nonlinear transformations to attenuate the noise signal interference and identify local features under complex noise levels. Experimental results on six benchmark datasets show that HCT-Net has better feature perception ability than some current advanced denoising methods and can suppress high-frequency noise signals to recover the edge and detail information of images.
Abstract: Small target detection is a very challenging task in target detection, and small targets are common in daily life. In video surveillance scenarios, pedestrians’ faces about 20 meters away from the camera can be considered small targets. Because faces may occlude one another and are susceptible to noise, weather, and lighting conditions, existing target detection models perform worse on such small targets than on medium and large targets. To address these issues, this study proposes an improved YOLOv7 model with a high-resolution detection head and a backbone network rebuilt on GhostNetV2. In addition, the PANet structure is replaced by a combination of BiFPN and SA attention modules to enhance the multi-scale feature fusion capability, and the original CIoU loss function is improved by incorporating the Wasserstein distance, reducing the sensitivity of small targets to anchor box position offsets. This study conducts comparative experiments on the public dataset VisDrone2019 and a self-made video surveillance dataset. Results show that the mAP of the proposed method reaches 50.1% on the VisDrone2019 dataset and is 1.6 percentage points higher than existing methods on the self-made video surveillance dataset, effectively improving small target detection while achieving good real-time performance on the GTX1080Ti.
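To show why a Wasserstein-based term is less sensitive to small position offsets than an overlap-based term, the sketch below computes a normalized Wasserstein similarity between two boxes. It assumes each box (cx, cy, w, h) is modelled as a 2-D Gaussian with mean (cx, cy) and covariance diag(w²/4, h²/4), a common formulation in small-object detection; the constant `c` is a hypothetical, dataset-dependent scale, and how the study combines this term with CIoU is not stated in the abstract.

```python
import math

def wasserstein_sq(box_a, box_b):
    """Squared 2-Wasserstein distance between the Gaussians of two boxes.

    Boxes are (cx, cy, w, h). For axis-aligned Gaussians the distance
    reduces to a squared Euclidean distance on (cx, cy, w/2, h/2).
    """
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    return (
        (ax - bx) ** 2
        + (ay - by) ** 2
        + ((aw - bw) / 2.0) ** 2
        + ((ah - bh) / 2.0) ** 2
    )

def normalized_wasserstein(box_a, box_b, c=12.8):
    """Map the distance to a similarity in (0, 1]; c is an assumed constant."""
    return math.exp(-math.sqrt(wasserstein_sq(box_a, box_b)) / c)

# Two 4x4 boxes shifted by 4 pixels no longer overlap (IoU = 0),
# but the Wasserstein similarity still varies smoothly with the offset.
print(normalized_wasserstein((10, 10, 4, 4), (14, 10, 4, 4)))
```

Because the similarity stays non-zero and changes smoothly even when the boxes do not overlap, it can still provide a useful training signal for very small targets.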
Abstract: In semantic segmentation tasks, the downsampling process of the encoder reduces resolution, resulting in the loss of spatial detail in the image. As a result, discontinuous or incorrect segmentation may occur at object edges, which degrades overall segmentation performance. To address these issues, an image semantic segmentation model, EASSNet, based on edge features and attention mechanisms is proposed. Firstly, an edge detection operator is used to calculate the edge map of the original image, and edge features are extracted through pooling downsampling and convolution operations. Next, the edge features are fused into the deep semantic features extracted by the encoder, restoring the spatial detail of the downsampled feature maps, and meaningful information is strengthened through attention mechanisms to improve the accuracy of object edge segmentation and overall semantic segmentation performance. In experiments, EASSNet achieves a mean intersection over union (mIoU) of 85.9% and 76.7% on the PASCAL VOC 2012 and Cityscapes datasets, respectively. Compared with current popular semantic segmentation networks, EASSNet has significant advantages in overall segmentation performance and object edge segmentation.
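For readers unfamiliar with the mean intersection over union metric reported above, the following is a minimal sketch of how it is commonly computed from predicted and ground-truth label maps. The function name and the choice to skip classes absent from both maps are assumptions; benchmark toolkits may aggregate over the whole dataset rather than per image.

```python
def mean_iou(pred, target, num_classes):
    """Mean IoU over classes for two flat lists of class ids of equal length."""
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:  # ignore classes that appear in neither map
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0

# Four pixels, two classes: class 0 has IoU 1/2, class 1 has IoU 2/3.
print(mean_iou([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2))
```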
Abstract: Remote sensing object detection usually faces challenges such as large variations in image scale, small and densely arranged targets, and high aspect ratios, which make it difficult to achieve high-precision oriented object detection. This study proposes a global context attentional feature fusion pyramid network. First, a triple attentional feature fusion module is designed, which can better fuse features with semantic and scale inconsistencies. Then, an intra-layer conditioning method is introduced to improve the module and a global context enhancement network is proposed, which refines deep features containing high-level semantic information to improve the characterization ability. On this basis, a global context attentional feature fusion pyramid network is designed with the idea of global centralized regulation to modulate shallow multi-scale features by using attention-modulated features. Experiments have been conducted on multiple public data sets, and results show that the high-precision evaluation indicators of the proposed network are better than those of the current advanced models.
Abstract: Concrete cracks have negative impacts on the structural load-bearing capacity, durability, and waterproofing. Therefore, early crack detection is of paramount importance. The rapid development of big data and deep learning provides effective methods for intelligent crack detection. To address the issues of imbalanced positive and negative samples, as well as the challenges posed by deep colors and low luminance in crack areas during the crack detection process, this study proposes a crack detection method based on Swin Transformer U-Net (ST-UNet) and target features. This algorithm introduces the CBAM attention mechanism into the network, enabling the network to focus more on the pixel regions in the image that are crucial for crack detection, thereby enhancing the feature representation capability of crack images. The Focal+Dice mixed loss function replaces the single cross-entropy loss function to address the problem of uneven distribution of positive and negative sample images. Additionally, the design of the APSD regularization term optimizes the loss function, addressing the issues of deep colors and low luminance in crack areas and reducing both missed rates and false rates in detection. The results of crack detection show a 22% improvement in IoU and a 17% increase in the Dice index, indicating the effectiveness and feasibility of the algorithm.
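A mixed Focal+Dice loss of the kind mentioned above can be sketched as follows for binary crack masks. This is an illustration under assumptions: the equal weighting, the focal parameters `gamma` and `alpha`, and the smoothing term are hypothetical defaults, and the APSD regularization term of the study is not included because the abstract does not define it.

```python
import math

def focal_loss(probs, labels, gamma=2.0, alpha=0.25, eps=1e-7):
    """Binary focal loss: down-weights pixels that are already well classified."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)        # avoid log(0)
        pt = p if y == 1 else 1.0 - p          # probability of the true class
        at = alpha if y == 1 else 1.0 - alpha  # class balancing weight
        total += -at * (1.0 - pt) ** gamma * math.log(pt)
    return total / len(probs)

def dice_loss(probs, labels, smooth=1.0):
    """Soft Dice loss: measures region overlap, so it is robust to imbalance."""
    inter = sum(p * y for p, y in zip(probs, labels))
    return 1.0 - (2.0 * inter + smooth) / (sum(probs) + sum(labels) + smooth)

def focal_dice_loss(probs, labels, weight=0.5):
    """Weighted sum of the two terms; the weight is an assumed hyperparameter."""
    return weight * focal_loss(probs, labels) + (1.0 - weight) * dice_loss(probs, labels)

labels = [1, 0, 0, 0]  # one crack pixel among background pixels
print(focal_dice_loss([0.9, 0.1, 0.1, 0.1], labels))  # good prediction
print(focal_dice_loss([0.1, 0.9, 0.9, 0.9], labels))  # poor prediction
```

The focal term handles hard pixels individually, while the Dice term scores the overlap of the whole predicted region, which is why the two are often combined when positive pixels are rare.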
Abstract: Single-cell RNA sequencing (scRNA-seq) performs high-throughput sequencing analysis of the transcriptomes at the level of individual cells. Its primary application is to identify cell subpopulations with distinct functions, usually based on cell clustering. However, the high dimensionality, noise, and sparsity of scRNA-seq data make clustering challenging. Traditional clustering methods are inadequate, and most existing single-cell clustering approaches only consider gene expression patterns while ignoring relationships between cells. To address these issues, a self-optimizing single-cell clustering method with contrastive learning and graph neural network (scCLG) is proposed. This method employs an autoencoder to learn cellular feature distribution. First, it begins by constructing a cell-gene graph, which is encoded using a graph neural network to effectively harness information on intercellular relationships. Subgraph sampling and feature masking create augmented views for contrastive learning, further optimizing feature representation. Finally, a self-optimizing strategy is utilized to jointly train the clustering and feature modules, continually refining feature representation and clustering centers for more accurate clustering. Experiments on 10 real scRNA-seq datasets demonstrate that scCLG can learn robust representations of cell features, significantly surpassing other methods in clustering accuracy.
Abstract: Laser point cloud matching is a key factor affecting the accuracy and efficiency of laser SLAM systems. Traditional laser SLAM algorithms cannot effectively distinguish scene structures and result in performance degradation due to poor feature extraction in unstructured scenes. To address this issue, a joint coherent point drift (CPD) adaptive laser SLAM algorithm for complex scenes is proposed, called CPD-LOAM. First, a scene structure identification method combining prejudgment and verification is proposed, in which scene feature variables are introduced to make preliminary judgments on the scene structure. Then, surface curvature is further used to verify the preliminary judgments from the perspective of geometric features, enhancing the accuracy of scene structure identification. In addition, the CPD algorithm is combined for point cloud pre-registration in unstructured scenes, and then the ICP algorithm is used for re-registration to solve the problem of feature degradation in this scene, thereby improving the accuracy and efficiency of point cloud registration. The experimental results show that the proposed scene feature variables and surface curvature can effectively distinguish structure scenes based on the set threshold. The validation results on the public dataset KITTI show that CPD-LOAM reduces the positioning error by 84.47% compared to the LOAM algorithm, and improves the positioning accuracy by 55.88% and 30.52% respectively, compared to the LEGO-LOAM and LIO-SAM algorithms, with higher efficiency and robustness.
Abstract: It is very challenging to estimate the traffic flow before urban road deployment. To solve this problem, this study proposes a new conditional urban traffic generative adversarial network (Curb-GAN) model, which utilizes a conditional generative adversarial network (CGAN) to generate urban traffic flow data. Firstly, the distance relationship and external feature information of each node of the road network are treated as conditions to control the generated results. Secondly, the spatial autocorrelation of the road network is captured by a graph convolutional network (GCN), and the temporal dependence of traffic across time slots is captured by a self-attention (SA) mechanism and a gated recurrent unit (GRU). Finally, the trained generator generates traffic flow data. Extensive experiments on two real spatiotemporal datasets show that the estimation accuracy of the Curb-GAN model is superior to the main baseline methods and that the model can produce more meaningful estimates.
Abstract: As voice conversion technology becomes increasingly prevalent in human-computer interaction, the need for highly expressive speech continues to grow. Currently, voice conversion primarily relies on decoupling acoustic features, emphasizing the decoupling of content and timbre features, but often neglects the emotional features in speech, resulting in insufficient emotional expressiveness in converted audio. To address this problem, this study introduces a novel model for highly expressive voice conversion with multiple mutual information constraints (MMIC-EVC). On top of decoupling content and timbre features, the model incorporates an expressiveness module to capture discourse-level prosody and rhythm features, enabling the conveyance of emotional features. It constrains every encoder to focus on its acoustic embedding by minimizing the variational upper bounds of multiple mutual information between features. Experiments on the CSTR-VCTK and ESD speech datasets indicate that the converted audio of the proposed model achieves a mean opinion score of 3.78 for naturalness and a Mel cepstral distortion of 5.39, significantly outperforming baseline models in the best-worst sensitivity test. The MMIC-EVC model effectively decouples rhythmic and prosodic features, facilitating high expressiveness in voice conversion, and thereby providing a more natural and better user experience in human-computer interaction.
Abstract: Microservices architecture, as an agile and resilient software design paradigm, has been widely applied in the field of software development. However, with the increasing number of microservices, the complexity of the systems rises, and the service quality of the system decreases. Enhancing the quality of online business service under the microservices architecture is a critical challenge. Optimization of service links is a key aspect in addressing this challenge. This work conducts an in-depth study of service links under the microservices architecture and proposes various link analysis methods, including link sampling, link topology generation, strong and weak dependency determination, identification of cyclic calls, and recognition of redundant and ineffective calls. Building upon these methods, the paper implements a series of effective optimization strategies, such as robust testing, disassembling cyclic calls, reducing and merging redundant calls, fault self-healing, and link tracing. These strategies effectively improve the service quality of production and operation system services under the microservices architecture.
Abstract: Few-shot semantic segmentation is a computer vision task that involves segmenting potential object categories in query images with a small number of annotated samples. However, existing methods still face two challenges. The first is prototype bias: prototypes contain too little foreground object information, making it difficult to simulate real category statistics. The second is feature degradation: the model focuses only on the current category rather than on potential categories. This study proposes a new network based on contrastive prototypes and background mining. The main idea of the network is to enable the model to learn more representative prototypes and identify potential categories from the background. Specifically, a class-specific learning branch constructs a large and consistent prototype dictionary and then uses the InfoNCE loss to make the prototypes more discriminative. In parallel, the background mining branch initializes background prototypes and uses an attention mechanism between the constructed background prototypes and the dictionary to mine potential categories. Experimental results on the PASCAL-5i and COCO-20i datasets demonstrate excellent performance of the model. Under the 1-shot setting using the ResNet-50 network, the model achieves 64.9% and 44.2%, an improvement of 4.0% and 1.9%, respectively, compared to the baseline model.
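The InfoNCE loss named above can be sketched in a few lines. The example assumes cosine similarity between L2-normalized vectors and a temperature of 0.1; how the study builds its prototype dictionary and selects positives and negatives is not described in the abstract, so the inputs here are purely illustrative.

```python
import math

def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def _normalize(v):
    norm = math.sqrt(_dot(v, v))
    return [a / norm for a in v]

def info_nce(query, positive, negatives, temperature=0.1):
    """InfoNCE: cross-entropy of picking the positive among all candidates."""
    q = _normalize(query)
    logits = [_dot(q, _normalize(positive)) / temperature]
    logits += [_dot(q, _normalize(n)) / temperature for n in negatives]
    m = max(logits)  # subtract the maximum for a stable log-sum-exp
    log_sum = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_sum - logits[0]

# A query close to its positive prototype yields a small loss.
print(info_nce([1.0, 0.0], [0.9, 0.1], [[0.0, 1.0], [-1.0, 0.0]]))
```

Minimizing this quantity pulls a prototype towards its positive and pushes it away from the negatives, which is what makes the learned prototypes more discriminative.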
Abstract: To address the challenges posed by fixed network architectures and deep network layers, such as incomplete expression of complex scene predictions, high computational costs, and deployment difficulties, this study proposes a new network called wide structure dynamic super-resolution network (W-SDNet). Initially, a residual enhancement block, consisting of shift convolution residual structures, is designed to enhance the capability of extracting hierarchical features for image super-resolution and to reduce computational costs. Next, a wide enhancement module is introduced, employing a dual-branch four-layer parallel structure to extract deep information while using a dynamic network’s gating mechanism to selectively enhance feature expression. This module also utilizes an attention mechanism that integrates edge detection operators to improve the expressiveness of edge details. To prevent interference among components within the wide enhancement block, a refinement block utilizing group convolution and channel splitting is employed. Ultimately, high-quality image reconstruction is achieved through a construction block. Experimental results show that W-SDNet outperforms the existing mainstream algorithms in peak signal-to-noise ratio (PSNR) metrics when zoomed in 4 times on five publicly available test datasets, and the number of parameters in the model is significantly reduced. The results demonstrate the advantages of W-SDNet in terms of complexity, performance, and recovery time of super-resolution reconstruction.
Abstract: Traditional machine learning methods are not ideal in either accuracy or time cost when identifying crop leaf pests and diseases with small samples and multiple categories. To address this problem, this study uses an improved ResNet model to recognize crop pests and diseases. By adding a dropout layer, a new activation function, a maximum pooling layer, and an attention mechanism, the robustness and feature capturing ability of the model are improved, and high recognition accuracy is achieved with fewer model parameters. Firstly, the images obtained from the public dataset Plant Village are preprocessed and enhanced, and the ReLU activation function is replaced by PReLU to solve the problem of dead neurons caused by ReLU outputting zero for inputs less than 0. Then, a dropout layer is added before the global average pooling layer, and a reasonable threshold value is set to effectively avoid overfitting and to enhance the robustness of the model. In addition, a maximum pooling layer is added between the dropout and global average pooling layers, which not only expands the receptive field of neurons but also helps the model obtain the most significant local features of pests and diseases, reduce the noise effect from the image background, and realize secondary feature extraction. Finally, the CBAM attention mechanism is embedded, which lets the model automatically learn the most important information in the input feature maps and weight it along the channel and spatial dimensions to better capture the semantic information in the images. Experimental results show that the improved model achieves an accuracy of 99.15% on the test set with only 9.13M parameters, exceeding the accuracy of Xception, InceptionV3, and the original ResNet by 1.01, 0.68, and 0.59 percentage points, respectively, while reducing the model parameter count. This provides a state-of-the-art deep learning method for crop disease recognition.
Abstract: According to the characteristics of rural household garbage generation, a multi-objective garbage collection and transportation path optimization model is constructed to minimize transportation cost, vehicle delay penalty cost, and environmental penalty cost, considering the variable collection and transportation cycle of domestic waste classification. The solution space is reconstructed with the combination of random choice method and nearest neighbor method, and the simulated annealing algorithm with variable neighborhood is used to solve the model. Through case simulation and comparative analysis, it can be seen that the proposed model and algorithm have good optimization results in terms of total collection and transportation cost and total distance. Based on the analysis, the results in this study are also superior to the optimal solutions of the classical simulated annealing algorithm and variable neighborhood search algorithm. Compared with the traditional fixed cycle collection and transportation scheme, the model established in this study subtracts the environmental pollution cost and modifies the total cost by more than 110.4%, which can effectively solve the problem of garbage collection and transportation path optimization in rural areas.
Abstract: Accurate estimation of tropical cyclone intensity is the basis of effective intensity prediction and is crucial for disaster forecasting. Current tropical cyclone intensity estimation technology based on deep learning shows superior performance, but there is still a problem of insufficient physical information fusion. Therefore, based on the deep learning framework, this study proposes a physical factor fusion for tropical cyclone intensity estimation model (PF-TCIE) to estimate the intensity of tropical cyclones in the northwest Pacific. PF-TCIE consists of a multi-channel satellite cloud image learning branch and a physical information extraction branch. The multi-channel satellite cloud image learning branch is used to extract tropical cyclone cloud system features, and the physical information extraction branch is used to extract physical factor features to constrain the learning of cloud system features. The data used in this article include Himawari-8 satellite data and ERA-5 reanalysis data. Experimental results show that after introducing multiple channels, the root mean squared error (RMSE) of the model is reduced by 3.7% compared with a single channel. At the same time, the introduction of physical information further reduces the model error by 8.5%. The RMSE of PF-TCIE finally reaches 4.83 m/s, which is better than most deep learning methods.
Abstract: Linear wireless sensor networks (LWSNs) are widely used to monitor key infrastructure with a linear topology, such as railways and natural gas pipelines. The reliability of these networks is critical, and coverage is an important indicator of it. Currently, most methods for evaluating the coverage of LWSNs are based on a 0/1 disk sensing model, but in practice, the monitoring reliability of sensors follows a probability distribution as the coverage radius increases. Therefore, a reliability analysis method based on a probabilistic sensing model is proposed, which can calculate the effective sensing range based on the physical parameters of sensors, thereby improving the accuracy of evaluation. To reduce the size of the system state space, a binary decision tree is used to construct the LWSN system state set. In this study, the failure probability of nodes is assumed to follow a Weibull distribution, and simulation experiments are conducted for different communication radii and sensing ranges. The results show that this method can effectively evaluate the reliability of LWSNs and that the evaluation is more accurate than that obtained with the 0/1 disk sensing model.
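The contrast between the 0/1 disk model and a probabilistic sensing model, together with Weibull-distributed node failures, can be sketched as follows. The Weibull reliability function is standard; the exponentially decaying detection probability beyond a certain radius is only one common choice and is an assumption here, since the abstract does not give the study's sensing function. All parameter values are hypothetical.

```python
import math

def weibull_reliability(t, scale, shape):
    """Probability that a node is still working at time t under a Weibull law."""
    return math.exp(-((t / scale) ** shape))

def disk_sensing(distance, radius):
    """0/1 disk model: certain detection inside the radius, none outside."""
    return 1.0 if distance <= radius else 0.0

def probabilistic_sensing(distance, certain_radius, max_radius, decay=0.5):
    """Assumed model: certain detection up to certain_radius, then an
    exponentially decaying probability until max_radius, zero beyond it."""
    if distance <= certain_radius:
        return 1.0
    if distance > max_radius:
        return 0.0
    return math.exp(-decay * (distance - certain_radius))

# A point 6 m from the sensor: missed by the disk model with a 5 m radius,
# but detected with some probability under the probabilistic model.
d = 6.0
print(disk_sensing(d, radius=5.0))
print(probabilistic_sensing(d, certain_radius=5.0, max_radius=10.0))
# Probability that the node is alive and detects the event at t = 500 hours.
print(weibull_reliability(500.0, scale=1000.0, shape=1.5)
      * probabilistic_sensing(d, 5.0, 10.0))
```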
Abstract: Due to the uncertainty of objects in remote sensing images and significant differences in feature information between different images, existing super-resolution methods yield poor reconstruction results. Therefore, this study proposes an NG-MAT model that combines the Swin Transformer and the N-gram model to achieve super-resolution of remote sensing images. Firstly, multiple attention modules are connected in parallel on the branch of the original Transformer to extract global feature information for activating more pixels. Secondly, the N-gram model from natural language processing is applied to the field of image processing, utilizing a trigram N-gram model to enhance information interaction between windows. The proposed method achieves peak signal-to-noise ratios of 34.68 dB, 31.03 dB, and 28.99 dB at amplification factors of 2, 3, and 4, respectively, and structural similarity indices of 0.9266, 0.8444, and 0.7734 at the same amplification factors on the selected dataset. Experimental results demonstrate that the proposed method outperforms other similar methods in various metrics.
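As a reminder of what the peak signal-to-noise ratio figures above measure, the sketch below computes PSNR from the mean squared error between a reference image and a reconstruction. It assumes 8-bit pixel values given as flat lists; evaluation protocols in super-resolution papers often add steps such as border cropping or luminance-only comparison, which are not shown.

```python
import math

def psnr(reference, restored, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized images."""
    mse = sum((a - b) ** 2 for a, b in zip(reference, restored)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)

# Every pixel is off by exactly one grey level, so the MSE is 1.
print(psnr([0, 255, 128, 64], [1, 254, 129, 63]))
```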
Abstract: In the extraction of roads from high-resolution remote sensing images, problems such as local disconnections and the loss of details are common due to the complex backgrounds and the presence of trees and buildings covering the roads during the image formation process. To solve these problems, this study proposes a road extraction model called MSDANet, based on a multi-scale difference aggregation mechanism. The model has an encoder-decoder structure, using the Res2Net module as the backbone network of the encoder to obtain information with fine-grained and multi-scale features from the images and to expand the receptive field for feature extraction. Additionally, a gated axial guidance module, in conjunction with road morphological features, is applied to highlight the representation of road features and improve the connectivity of long-distance roads in road extraction. Furthermore, a multi-scale difference aggregation module is used between the encoder and decoder to extract and aggregate the different information between shallow and deep features. The aggregated features are then fused with the decoded features through a feature fusion module to facilitate the decoder to accurately restore road features. The proposed method has been evaluated on two high-resolution remote sensing datasets: DeepGlobe and CHN6-CUG. The results show that the F1 score of the MSDANet model is 80.37% and 78.17% respectively, and the IoU is 67.18% and 64.17% respectively. It indicates that the proposed model outperforms the comparison models.
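The F1 score and IoU reported above are closely related for binary segmentation. When both are computed from the same true positive, false positive, and false negative counts, IoU = F1 / (2 − F1). The sketch below checks this identity on made-up counts; whether the study aggregates its metrics in exactly this way is an assumption.

```python
def f1_and_iou(tp, fp, fn):
    """F1 score and IoU of the positive (road) class from pixel counts."""
    f1 = 2.0 * tp / (2.0 * tp + fp + fn)
    iou = tp / (tp + fp + fn)
    return f1, iou

def iou_from_f1(f1):
    """Convert F1 to IoU when both come from the same counts."""
    return f1 / (2.0 - f1)

f1, iou = f1_and_iou(tp=80, fp=20, fn=20)
print(f1, iou, iou_from_f1(f1))
print(iou_from_f1(0.8037), iou_from_f1(0.7817))
```

Applying the conversion to the reported F1 scores gives values that agree with the reported IoU figures to within rounding, which suggests the two metrics were computed from the same counts.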
Abstract: This study constructs a named entity recognition (NER) model suitable for the bone-sign interpretations of Han Chang’an City to solve the problem of the inability to classify some bone-sign interpretations due to the lack of key content. The original text of the bone-sign interpretations of Han Chang’an City is used as the dataset, and the begin, inside, outside, end (BIOE) annotation method is utilized to annotate the bone-sign interpretation entities. A multi-feature fusion network (MFFN) model is proposed, which not only considers the structural features of individual characters but also integrates the structural features of character-word combinations to enhance the model’s comprehension of the bone-sign interpretations. The experimental results demonstrate that the MFFN model can better identify the named entities of the bone-sign interpretations of Han Chang’an City and classify the bone-sign interpretations, outperforming existing NER models. This model provides historians and researchers with richer and more precise data support.
Abstract: Knowledge distillation (KD) is a technique that transfers knowledge from a complex model (teacher model) to a simpler model (student model). While many popular distillation methods currently focus on intermediate feature layers, response-based knowledge distillation (RKD) has regained its position among the SOTA models after decoupled knowledge distillation (DKD) was introduced. RKD leverages strong consistency constraints to split classic knowledge distillation into two parts, addressing the issue of high coupling. However, this approach overlooks the significant representation gap caused by the disparity in teacher-student network architectures, leading to the problem where smaller student models cannot effectively learn knowledge from teacher models. To solve this problem, this study proposes a diffusion model to narrow the representation gap between teacher and student models. This model transfers teacher features to train a lightweight diffusion model, which is then used to denoise the student model, thus reducing the representation gap between teacher and student models. Extensive experiments demonstrate that the proposed model achieves significant improvements over baseline models on CIFAR-100 and ImageNet datasets, maintaining good performance even when there is a large gap in teacher-student network architectures.
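For context, the classic response-based distillation objective that the methods above build on can be sketched as a temperature-scaled KL divergence between teacher and student outputs. This illustrates only the standard loss; the decoupled variant and the diffusion-based feature denoising proposed in the study are not reproduced, and the temperature value is a hypothetical default.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax of temperature-scaled logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kd_loss(student_logits, teacher_logits, temperature=4.0):
    """KL(teacher || student) on softened outputs, scaled by T squared."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2

print(kd_loss([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))  # student matches teacher
print(kd_loss([3.0, 2.0, 1.0], [1.0, 2.0, 3.0]))  # student disagrees
```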
Abstract: To solve the problems of poor degradation awareness, easy detail loss, and ineffective color cast correction caused by existing underwater image enhancement algorithms, this study proposes a degradation-aware underwater image enhancement network based on wavelet transform. This model mainly contains a degradation representation extraction network based on contrastive learning and an underwater image enhancement network based on multiple-level wavelet transform. Firstly, the degradation representation extraction network uses an encoder and contrastive learning method to extract unique degradation representations from each underwater image. Secondly, a three-level wavelet transform module is built under the principle of multi-level wavelet transform enhancement algorithm, aiming to conduct multi-scale detail and color enhancement in the frequency domain. Lastly, a multiple-level wavelet transform enhancement network is built with three-level wavelet transform modules, and the extracted degradation representations are introduced into this network for better implementing multiple-level wavelet transform enhancement with perceived degradation information. Experimental results show that the proposed algorithm outperforms existing algorithms in color correction and detail enhancement in terms of sharply enhanced results, i.e. structural similarity is improved by 16%, peak signal-to-noise ratio is improved by 9%, and underwater image quality is improved by 14%, making it suitable for underwater image enhancement tasks.
Abstract: To protect data privacy in blockchain cross-chain transactions, this study proposes a privacy protection scheme based on homomorphic encryption. The scheme improves the homomorphic encryption algorithm to support floating-point operations while retaining the additive homomorphic property of the original algorithm, and it supports any number of addition operations to realize the privacy protection of cross-chain transaction amounts. To prevent security threats to transactions posed by mismanagement or loss of the private key with homomorphic encryption, a private key sharing mechanism based on Shamir’s secret sharing algorithm is introduced into the scheme. This mechanism prevents untrustworthy nodes from sending malicious values to recover the private key by adding ECDSA digital signatures to verify the private key share. It also considers the dynamic update of the private key share after a node drops or leaves to prevent node collusion. Security analysis and experimental verification show that the proposed scheme can effectively protect privacy in cross-chain transactions.
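The private key sharing mechanism above relies on Shamir's secret sharing. A minimal sketch of the basic (t, n) threshold scheme over a prime field is given below. The field modulus, function names, and parameters are assumptions chosen for illustration; the ECDSA share verification and dynamic share update described in the abstract are not modeled here.

```python
import random

# Assumed field modulus for the sketch: the Mersenne prime 2^127 - 1.
PRIME = 2 ** 127 - 1


def split_secret(secret, n_shares, threshold, rng=None):
    """Split `secret` into n_shares points of a random polynomial of
    degree threshold - 1 whose constant term is the secret."""
    if not 0 <= secret < PRIME:
        raise ValueError("secret must lie in [0, PRIME)")
    rng = rng or random.SystemRandom()
    coeffs = [secret] + [rng.randrange(PRIME) for _ in range(threshold - 1)]
    shares = []
    for x in range(1, n_shares + 1):
        y = 0
        for c in reversed(coeffs):  # Horner evaluation modulo PRIME
            y = (y * x + c) % PRIME
        shares.append((x, y))
    return shares


def recover_secret(shares):
    """Lagrange interpolation at x = 0 from at least `threshold` shares."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        # pow(den, -1, PRIME) is the modular inverse (Python 3.8+).
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret
```

Any `threshold` distinct shares reconstruct the secret, for example `recover_secret(split_secret(42, 5, 3)[:3])`.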
Abstract: Dimensionality reduction plays a crucial role in machine learning and pattern recognition. The existing projection-based methods tend to solely utilize distance information or representation relationships among data points to maintain the data structure, which makes it difficult to effectively capture the nonlinear features and complex correlations of data manifolds in high-dimensional space. To address this issue, this study proposes a method: enhanced locality preserving projection with latent sparse representation learning (LPP_SRL). The method not only utilizes distance information to preserve the local structure of the data but also leverages multiple local linear representations to unveil the global nonlinear structure of the data. Moreover, to establish a connection between projection learning and sparse self-representation, this paper employs a novel strategy by replacing the dictionary in sparse self-representation with reconstructed samples from the low-dimensional representation. This approach effectively filters out irrelevant features and noise, thereby better preserving the principal components in the original feature space. Extensive experiments conducted on multiple publicly available benchmark datasets have demonstrated the effectiveness and superiority of the proposed method.
Abstract: This study proposes a novel identification method for OFDM emitters to address the issue of low classification accuracy in traditional methods for specific emitter identification, where subtle fingerprint features of OFDM emitters are affected by data signal components and channel noise. Considering the subcarrier spectrum of the short preamble, this method utilizes the fixed frequency boundary-based empirical wavelet transform (FFB-EWT) and a deep residual network. Initially, the short preamble of OFDM signals is extracted to define fixed boundary conditions based on the frequency intervals of the subcarriers in the short preamble. The boundary values in the frequency domain are then applied to FFB-EWT for signal decomposition to remove the subcarrier components containing preamble information. Subsequently, the signal-to-noise ratio of fingerprint features is enhanced by accumulating the null subcarrier components of adjacent frames. Next, a dual-channel residual network called ResNet18, integrated with a non-local attention module and a channel attention module, is used for feature extraction from IQ data inputs, with classification performed via the Softmax function. Finally, the Oracle public dataset is chosen to validate the feasibility of the method. Experimental results demonstrate that the FFB-EWT method achieves accuracy rates of 98.17% and 89.33% for identifying six different emitters under 6 dB and 0 dB conditions, respectively, proving the effectiveness of the method in environments with low signal-to-noise ratios.
Abstract: This study proposes an improved lightweight pavement disease detection model called pavement disease-YOLOv5s (PD-YOLOv5s) to address the problem of low detection accuracy in pavement disease detection due to diverse disease forms, large-scale differences, and similar background grayscale values. Firstly, the model applies a three-dimensional parameter-free attention mechanism called SimAM to effectively enhance the feature extraction ability of the model in complex environments without increasing the number of model parameters. Secondly, the model integrates the residual block Res2NetBlock to expand its receptive field and improve its feature fusion at a finer granularity level. Finally, the SPD-GSConv module is constructed for downsampling to effectively capture target features of different scales and integrate the extracted features into the model to perform pavement disease classification detection. Experimental results on real pavement disease datasets show that the mean precision of the PD-YOLOv5s model is improved by 4.7% compared to that of the original YOLOv5s model. The parameters of the proposed model are reduced to 6.78M, and the detection speed reaches 53.97 f/s. The PD-YOLOv5s model has superior detection performance while reducing network computing costs, making it valuable for engineering applications in pavement disease detection.
Abstract: In the task of few-shot open-set recognition (FSOSR), effectively distinguishing closed-set from open-set samples presents a notable challenge, especially in cases of sample scarcity. Current approaches exhibit uncertainty in describing boundaries for known class distributions, leading to insufficient discrimination between closed-set and open-set spaces. To tackle this issue, this study introduces a novel method for FSOSR leveraging feature decoupling and openness learning. The primary objective is to employ a feature decoupling module to compel the model to decouple class-specific features and open-set features, thereby accentuating the disparity between unknown and known classes. To achieve effective feature decoupling, an openness learning loss is introduced to facilitate the acquisition of open-set features. By integrating similarity metric values and anti-openness scores as the optimization target, the model is steered towards learning more discriminative feature representations. Experimental results on the public datasets miniImageNet and tieredImageNet demonstrate that the proposed method substantially enhances the detection rate of unknown class samples while accurately classifying known classes.
Abstract: Cross-modality person re-identification is widely used in intelligent safety monitoring systems, aiming to match visible light images and infrared images of the same person. Due to the inherent modality differences between visible and infrared modalities, cross-modality person re-identification poses significant challenges in practical applications. To alleviate modality differences, researchers have proposed many effective solutions. However, existing methods extract different modality features without corresponding modality information, resulting in insufficient discriminability of the features. To improve the discriminability of the features extracted from models, this study proposes a cross-modality person re-identification method based on attention feature fusion. By designing an efficient feature extraction network and attention feature fusion module, and optimizing multiple loss functions, the fusion and alignment of different modality information can be achieved, thereby promoting the model matching accuracy for persons. Experimental results show that this method achieves great performance on multiple datasets.
Abstract: This study proposes a federated learning algorithm for transient stability in a distributed power system and a Byzantine node detection algorithm to assess the transient stability of various regions in a distributed smart grid and address potential network attacks. In the federated learning framework, each regional power grid independently uses neural networks to assess its transient stability, while the central server integrates the training gradients, provides feedback, and updates them. To improve the security of the framework, the model constructed in this study clusters the updated gradients of each regional power grid to identify outliers, which refer to regional power grids that are under attack, so as to detect Byzantine nodes. Considering the high-dimensional characteristics of gradients, direct clustering will lead to inaccurate distance measurement. Therefore, an autoencoder is trained online to reduce the dimension of the gradients. Density clustering is then performed on the lower-dimensional gradients to select a small number of nodes as a set of Byzantine nodes and permanently eliminate the gradients provided by Byzantine nodes. An example of electromechanical transient simulation for angle stability is used for verification. The results show that this method addresses network attacks while assessing the transient stability of the power system. Compared with other methods, this method significantly improves the average accuracy and stability, effectively preventing fluctuations in assessment accuracy.
Abstract: Sleep staging is highly important for sleep monitoring and sleep quality assessment. High-precision sleep staging can assist physicians in correctly evaluating sleep quality during clinical diagnosis. Although existing studies on automatic sleep staging have achieved relatively reliable accuracy, there are still problems that need to be solved: (1) How can sleep features be extracted from patients more comprehensively? (2) How can effective rules for sleep state transition be obtained from the captured sleep features? (3) How can multimodal data be effectively utilized to improve classification accuracy? To solve the above problems, this study proposes an automatic sleep staging network based on multi-head self-attention. To extract the modal characteristics of EEG and EOG in sleep stages separately, this network uses a parallel two-stream convolutional neural network structure to process the original EEG and EOG data separately. In addition, the model uses a contextual learning module, which consists of a multi-head self-attention module and a residual network, to capture the multifaceted features of the sequences and to learn the correlation and significance between the sequences. Finally, the model utilizes unidirectional LSTM to learn the transition rules for sleep stages. The results of the sleep staging experiments show that the model proposed in this paper achieves an overall accuracy of 85.7% on the Sleep-EDF dataset, with an MF1 score of 80.6%. Moreover, its accuracy and robustness are better than those of the existing automatic sleep staging methods. This indicates that the proposed model is valuable for automatic sleep staging research.
Abstract: This study proposes a multi-modal deep-level high-confidence fusion tracking algorithm in response to the tracking failure issues caused by changes in target appearance and environment in single-target tracking applications. First, a high-dimensional multi-modal model is constructed utilizing the target’s color model combined with a shape model based on bilinear interpolation HOG features. Then, candidate targets are searched using particle filtering. The challenge posed by model fusion is addressed by scrupulously quantifying a range of confidences in shape and color models. This is followed by the introduction of a high-confidence fusion criterion, which enables a deeply-adaptive, weighted, and balanced fusion with different confidence levels in the multi-modal model. To counter the issue of static model update parameters, a nonlinear, graded balanced update strategy is designed. Upon testing on the OTB-2015 dataset, this algorithm’s average CLE and OS metrics demonstrated superior performance compared to all reference algorithms, with values of 30.57 and 0.609, respectively. Moreover, with an FPS of 15.67, the algorithm fulfills the real-time operation requirements inherent in tracking algorithms under most conditions. Notably, in some common specific scenarios, the accuracy and success rate of the algorithm also outperform the top-tier algorithms in most cases.
Abstract: Ancient Chinese texts are rich in historical and cultural information. Studying entity relationship extraction of such texts and constructing related knowledge graphs play an important role in cultural inheritance. Given the large number of rare Chinese characters, semantic fuzziness, and ambiguity in ancient Chinese texts, the entity relation joint extraction model based on the BERT-ancient-Chinese pre-trained model (JEBAC) is proposed. First, a model that integrates the BERT-ancient-Chinese pre-trained model with a BiLSTM neural network and an attention mechanism (BACBA) identifies all subject and object entities in sentences, providing a basis for joint extraction of relation and object entities. Next, the normalized coding vector of the subject entity is added to the embedding vector of the whole sentence to better understand the semantic features of the subject entity in the sentence. Finally, by combining the sentence vector carrying the characteristics of the subject entity with the prompt information of the object entity, the relationship and object entity in the sentence are jointly extracted by BACBA to obtain all triple information (subject entity, relationship, and object entity) in the sentence. The proposed method is compared with existing methods on the Chinese entity relation extraction dataset DuIE2.0 and on the small-sample classical Chinese entity relation extraction dataset C-CLUE from CCKS 2021. Experimental results show that the proposed method is more effective in extraction performance, with F1 values up to 79.2% and 55.5%, respectively.
Abstract: With the development of GPS positioning technology and mobile Internet, various location-based services (LBS) applications have accumulated a large amount of spatio-textual data with location and text markup. These data are widely used in location selection decision-making scenarios such as marketing and urban planning. The goal of spatio-textual location selection is to mine the optimal locations from a given candidate set to build new facilities to influence the largest number of spatio-textual objects, such as people or vehicles, where the closer the spatial location and the more similar the text, the greater the influence. However, existing solutions not only fail to consider prevalent peer competition in real life but also ignore user evaluation factors for facilities. To make more reasonable location selection decisions in a peer competition environment combined with user ratings, this study proposes a more rational spatio-textual location selection problem, CoSTUR. To solve the limitation in traditional models where objects can only be influenced by a single facility, a threshold that makes a trade-off between the certainty and quantity of facility influence on objects is introduced, which also models the real-world situation in which multiple facilities could simultaneously influence a specific user. Based on the classical competitive equalization model, quantification of competition among facilities with different ratings is achieved. To reduce the high computational cost for large volumes of data, a novel spatio-textual index structure, TaR-tree, is constructed and two pruning strategies based on influence range are designed with a combination of thresholds to achieve two branch-and-bound solutions for spatial connectivity and range queries. 
Experimental results on real and synthetic datasets demonstrate that the computational efficiency can be improved by nearly one order of magnitude compared to baseline algorithms, verifying the effectiveness of the proposed method.
Abstract: In recent years, image segmentation applications based on convolutional neural networks (CNNs) have been quite extensive, and great progress has been made in feature extraction. However, with convolutional layers increasingly deep, the receptive field is continually enlarged, which makes the model lose local feature information and affects model performance. Using graph convolution network (GCN) to process information on graph data structures preserves local features without losing local information as the layers deepen. This study focuses on combining U-Net (a kind of symmetric full convolutional networks) feature extraction based on CNN structure with GCN-based image segmentation to extract global and local, shallow, and deep multi-scale feature sets for multimodal glioma MR sequence image segmentation. The process can be divided into two stages. Firstly, U-Net is used to extract features from brain multimodal glioma MR sequence images, and multiple pooling layers are used to realize multi-scale feature extraction and up-sampling for feature fusion, in which the bottom layer outputs lower-level features, and the top layer outputs more abstract high-level features. Secondly, the feature map data obtained by U-Net is converted into the graph structure data required by GCN by dilating neighborhood and sparsification, and the image segmentation problem is converted into the graph node classification problem. Lastly, the graph structure data is classified by cosine similarity. Experimental results achieved segmentation accuracy of 0.996 and sensitivity of 0.892 on the BraTS 2018 public database. 
Compared with other deep learning models, this method, by fusing multi-scale features and using GCN to establish topological connections between high and low level features, ensures that local information is not lost to achieve better segmentation results, which meets the needs of analyzing clinical glioma MR images, and then effectively improves the diagnostic accuracy of gliomas.
Abstract: The knowledge tracing task aims to accurately track students’ knowledge status in real time and predict students’ future performance by analyzing their historical learning data. This study proposes a deep memory network knowledge tracing model incorporating knowledge point-relationships (HRGKT) to address the problem that current research has neglected complex higher-order relationships in the knowledge points covered by the questions. Firstly, HRGKT uses the knowledge point relationship graph to define the relationship information between nodes in the graph, which represents the rich information between knowledge points. GAT is used to obtain higher-order relationships between them. Then, forgetting exists in the learning process, and HRGKT considers four factors affecting knowledge forgetting to track students’ knowledge status more accurately. Finally, based on the experimental comparison results on real online education datasets, HRGKT performs more accurately in tracing students’ knowledge mastery status and has better prediction performance than current knowledge tracing models.
Abstract: Sparse mobile crowdsensing (MCS) is an emerging paradigm that collects data from a subset of sensing areas and then infers data from other areas. However, there is a shortage or uneven distribution of workers when sparse MCS is applied. Therefore, with a limited budget, it is important to prioritize the involvement of the more important workers in data collection. Additionally, many sparse MCS applications require timely data. Consequently, this study considers data freshness, with age of information (AoI) serving as a freshness metric. To address these challenges, a simplified AoI-aware sensing and inference (SASI) framework is proposed in this study. This framework aims to optimize AoI and inference accuracy by selecting suitable workers for data collection under budget constraints and accurately capturing spatiotemporal relationships in sensed data for inference. Moreover, limited budgets and worker availability may result in a reduced volume of data. Thus, methods for streamlining data inference models are also proposed to enhance inference efficiency. Experiments have substantiated the superiority of this framework in practice.
Abstract: In computer vision segmentation, the Transformer-based image segmentation model needs a large amount of image data to achieve the best performance. However, medical image data are very scarce compared with natural images. Convolution, with its higher inductive bias, is more suitable for medical images. To combine the long-range representation learning of Transformer with the inductive bias of CNN, a residual ConvNeXt module is designed to simulate the design structure of Transformer in this research. The module, composed of depthwise convolution and pointwise convolution, is used to extract feature information, which greatly reduces the number of parameters. The receptive field and feature channel are effectively scaled and expanded to enrich the feature information. In addition, an asymmetric 3D U-shaped network called ASUNet is proposed for the segmentation of brain tumor images. In the asymmetric U-shaped structure, the output features of the last two encoders are connected by residual connection to expand the number of channels. Finally, deep supervision is used in the process of upsampling, which promotes the recovery of semantic information. Experimental results on the BraTS 2020 and FeTS 2021 datasets show that the Dice scores of ET, WT, and TC reach 77.08%, 90.83%, 83.41%, and 75.63%, 90.45%, 84.21%, respectively. Comparative experiments show that ASUNet can fully compete with Transformer-based models in terms of accuracy while maintaining the simplicity and efficiency of standard convolutional neural networks.
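The Dice scores reported above measure the overlap between a predicted mask and a reference mask. A minimal sketch of the metric on flattened binary masks follows; the function name and the convention of returning 1.0 for two empty masks are assumptions made for this example, not details taken from the abstract.

```python
def dice_score(pred_mask, true_mask):
    """Dice coefficient 2|A ∩ B| / (|A| + |B|) for flat binary masks."""
    if len(pred_mask) != len(true_mask):
        raise ValueError("masks must have the same number of voxels")
    intersection = sum(1 for p, t in zip(pred_mask, true_mask) if p and t)
    total = sum(1 for p in pred_mask if p) + sum(1 for t in true_mask if t)
    if total == 0:
        # Assumed convention: two empty masks count as a perfect match.
        return 1.0
    return 2.0 * intersection / total
```

For a 3D volume, the masks would first be flattened; one score is then computed per tumor sub-region (ET, WT, TC).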
Abstract: Implicit feedback data plays a crucial role in recommender systems, but it often suffers from sparsity and biases, including exposure bias and conformity bias. Existing debiasing methods tend to address only one type of bias, which can impact personalized recommendation effectiveness, or require an expensive debiased dataset as auxiliary information for multiple debiasing. To address this issue, a collaborative filtering recommendation algorithm specifically designed for sparse implicit feedback data, which can simultaneously debias exposure bias and conformity bias, is proposed. The algorithm utilizes the proposed dual inverse propensity weighting method and a contrastive learning auxiliary task to remove the two biases contained in the implicit feedback data, which are input into dual-tower autoencoders so that the complete algorithm can estimate users’ preference probability to items. Experimental results demonstrate that the proposed algorithm outperforms comparative algorithms in terms of normalized discounted cumulative gain (NDCG@K), mean average precision (MAP@K), and recall (Recall@K) on publicly available debiased datasets such as Coat and Yahoo!R3.
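As background for the weighting idea above, plain inverse propensity weighting can be sketched as below. This shows only the standard single-propensity estimator, not the dual weighting scheme proposed in the abstract; the function name, the clipping floor, and the input layout are assumptions made for illustration.

```python
def ipw_loss(losses, observed, propensities, floor=1e-3):
    """Inverse-propensity-weighted average loss.

    losses[i]       : per-interaction loss value
    observed[i]     : truthy if the interaction was exposed and logged
    propensities[i] : estimated probability that it was exposed
    floor           : hypothetical clipping value that limits the
                      variance caused by very small propensities
    """
    if not (len(losses) == len(observed) == len(propensities)):
        raise ValueError("inputs must have equal length")
    total = 0.0
    for loss, seen, p in zip(losses, observed, propensities):
        if seen:
            total += loss / max(p, floor)
    return total / len(losses)
```

Interactions that were unlikely to be exposed receive larger weights, which compensates for their under-representation in the logged data.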
Abstract: While natural language generation (NLG)-based large language models, represented by ChatGPT, perform well in various natural language processing tasks, their performance in sequence recognition tasks, such as named entity recognition, is somewhat inferior to that of bidirectional encoder representations from Transformer (BERT)-based deep learning models. To address this issue, this study first transforms the existing Chinese named entity recognition problem into a machine reading comprehension problem. A new named entity recognition method based on in-context learning and fine-tuning is proposed, thereby enabling NLG-based language models to achieve good results in named entity recognition without changing base model pre-training parameters. Additionally, since named entities are generated by the model rather than classified from original data, there are no boundary issues. To verify the effectiveness of the new framework on named entity recognition tasks, experiments are conducted on some Chinese named entity recognition datasets. On the Resume and Weibo datasets, the F1 scores reach 96.04% and 67.87% respectively, a gain of 0.4 and 2.7 percentage points over the state-of-the-art models, confirming that the new framework can effectively utilize the text generation advantages of NLG-based language models to complete named entity recognition tasks.
Abstract: Considering the characteristics of the adjacent container terminals in the same region, such as similar logistics functions, overlapping cargo hinterlands, severe disorderly competition, and low resource utilization rates, this study focuses on the problem of multiple container terminal tactical berth and yard incorporate integrative scheduling (MCT-TBY-IIS), where the terminals are managed by the same organization and located adjacent to each other. Based on computational logistics, the MCT-TBY-IIS problem is decomposed into two subproblems of moderate coupling: the multi-terminal dynamic and continuous berth allocation problem (MDC-BAP) and the multi-terminal periodic and rolling yard allocation problem (MPR-YAP). This decomposition is achieved by using the multiple knapsack problem, as well as considering berth depth constraints and export containers with transferable terminal options. Subsequently, the hierarchical nesting-oriented two-stage improved imperialist competitive algorithm (HNO-TSI-ICA) is customized to optimize MCT-TBY-IIS under the guidance of problem-oriented exploration. Lastly, with typical examples of multi-terminal joint operations in the southeast coastal region in China, a combination of two algorithms is selected and applied to HNO-TSI-ICA for solving the MCT-TBY-IIS problem: the prosperity and destruction-oriented improved imperialist competitive algorithm with double assimilation, and the binary imperialist competitive algorithm for the 0-1 knapsack problem. Moreover, the structure of the target cost of the storage yard operation subsystem is stable and not affected by the port load or the length of the planning period. Notably, the horizontal transportation cost of containers in the export container area makes the largest contribution to the sub-target cost of storage yard operations, maintaining a stable proportion of 83%. 
Through the modeling and optimization of MCT-TBY-IIS, it is found that the multi-terminal cooperative operation mode has great potential to help the neighboring multiple terminals in the same organization reduce costs, increase efficiency, and improve the utilization rate of core resources.
Abstract: Cross-project defect prediction (CPDP) has emerged as a crucial research area in software engineering and data mining. Using defective code from other data-rich projects to build prediction models solves the problem of insufficient data during model construction. However, the distribution difference between the code files of source and target projects results in poor cross-project prediction. Most studies adopt the domain adaptation methods to solve this problem, but the existing methods only focus on the influence of conditional or marginal distribution on domain adaptation, ignoring its dynamics. On the other hand, they fail to choose appropriate pseudo-labels. Based on the above two aspects, this study proposes a cross-project defect prediction method based on dynamic distribution alignment and pseudo-label learning (DPLD). Specifically, the proposed method reduces the marginal and conditional distribution differences between projects in the domain alignment and category alignment modules, respectively, by means of the adversarial domain adaptation method. Additionally, it dynamically and quantitatively characterizes the relative importance of the two distributions using dynamic distribution factors. Furthermore, this study proposes a pseudo-label learning method to enhance the accuracy of pseudo-labels as real labels through the geometric similarity between data. Experiments conducted on the PROMISE dataset show that DPLD achieves average improvements of 22.98% and 15.21% in terms of F-measure and AUC, respectively. These results demonstrate the effectiveness of the DPLD method in reducing distribution differences between projects and improving the performance of cross-project defect prediction.
Abstract: To solve the flow shop scheduling problem with limited buffers and machine processing gears (FSSP_LBMPG), this research establishes a mathematical programming model for green flow shops with limited buffers. The model has two objective functions: the minimized values of maximum completion time and processing energy consumption. With buffer capacity as a constraint, the processing speed and energy consumption are coordinated by reasonably selecting machine processing gears. Based on the characteristics of the problem model, an improved dandelion optimization algorithm (IDOA) is proposed. The algorithm first designs a DOA double-layer real-valued encoding mechanism to represent the solution to the problem according to the characteristics of the scheduling problem. By introducing an initialization mechanism, the quality and efficiency of the initial solution are improved. During algorithm iteration, a real-valued crossover strategy and a variable neighborhood search strategy are designed to compensate for the poor local search ability of the original dandelion algorithm and enhance the development capabilities of the improved algorithm. Comparative experiments on designed cases show that the proposed improved algorithm effectively enhances the performance of the original algorithm, thereby verifying the effectiveness and robustness of the improved algorithm.
Abstract: Due to the small inter-class differences and large intra-class differences of fine-grained images, the key to fine-grained image classification tasks is to find subtle differences between categories. Recently, Vision Transformer-based networks mostly focus on mining the most prominent discriminative region features in images. There are two problems with this. Firstly, the network ignores mining classification clues from other discriminative regions, which can easily confuse similar categories. Secondly, the structural relationships of images are ignored, resulting in inaccurate extraction of category features. To solve the above problems, this study proposes two modules: dynamic adaptive modulation and structural relationship learning. The dynamic adaptive modulation module forces the network to search for multiple discriminative regions, and then the structural relationship learning module is used to construct structural relationships between discriminative regions. Finally, the graph convolutional network is used to fuse semantic and structural information to obtain predicted classification results. The proposed method achieves testing accuracy of 92.9% and 93.0% on the CUB-200-2011 dataset and NA-Birds dataset, respectively, which is superior to existing state-of-the-art networks.
Abstract: The original artificial fish swarm algorithm (AFSA) has weak global search ability and poor robustness and is prone to falling into local extrema. Given these problems, an adaptive and differential mutation artificial fish swarm algorithm (ADMAFSA) is proposed. Firstly, it utilizes an adaptive vision field and step length strategy to improve the fine search ability of individuals in better areas of the population and enhance the optimization accuracy of the algorithm. Secondly, to explore potential better areas, the opposition-based learning mechanism is introduced into the random behavior of artificial fish swarms. Thereby, the algorithm can get better global searching ability and avoid premature convergence. Finally, inspired by the differential evolution algorithm, a mutation operation is applied to poorly performing artificial fish to increase the diversity of the fish swarm and reduce the possibility of the algorithm falling into a local extremum. To validate the performance of the improved algorithm, the proposed algorithm is tested with six benchmark test functions and eight CEC2019 functions. The experimental results indicate that, compared to other AFSA variants and novel intelligent algorithms, ADMAFSA demonstrates improvements in terms of optimization accuracy and robustness. Furthermore, the optimization effectiveness of the improved algorithm is further demonstrated on a gear train design problem.
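The opposition-based learning mechanism mentioned above is commonly defined by reflecting a candidate through the center of the search bounds. A minimal sketch of that generic rule for a minimization problem is shown below; the function names and the greedy keep-the-better selection are assumptions for illustration and may differ from how ADMAFSA applies it within the random behavior.

```python
def opposite_point(position, lower, upper):
    """Opposite of x in [lower, upper], per dimension: lower + upper - x."""
    return [lo + hi - x for x, lo, hi in zip(position, lower, upper)]


def obl_step(position, lower, upper, objective):
    """Keep whichever of the point and its opposite has the lower
    objective value (hypothetical greedy selection, minimization)."""
    candidate = opposite_point(position, lower, upper)
    if objective(candidate) < objective(position):
        return candidate
    return position


def sphere(x):
    """Simple benchmark objective used only for this example."""
    return sum(v * v for v in x)
```

For instance, with bounds [0, 10] in each dimension, the opposite of [9, 8] is [1, 2], which has a lower sphere value and would therefore replace the original position.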
Abstract: The lattice Boltzmann method (LBM) is a computational fluid dynamics (CFD) method based on molecular kinetic theory. Improving the parallel computing capability of LBM is an important research topic in the high-performance computing field. Targeting the SW26010Pro processor, this article achieves multi-level parallelism of LBM through optimization methods such as domain decomposition, data reconstruction, double buffering, and vectorization. With these optimizations, a grid of 56 million cells is tested, and the results show that, compared to message passing interface (MPI) level parallelism, the average speedup of the collision process reaches 61.737, and that of the migration process reaches 17.3. In addition, a strong scaling test is conducted on the lid-driven cavity flow case with a grid size of 1200×1200×1200. Taking 62,000 computing cores as the baseline, the parallel efficiency at one million cores exceeds 60.5%.
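The speedup and strong-scaling efficiency figures quoted above follow standard definitions, sketched below in Python. The sketch assumes that the 62,000-core run serves as the baseline for the efficiency figure; the timings in the usage example are hypothetical and are not taken from the paper.

```python
def speedup(reference_time, optimized_time):
    """Speedup of an optimized run relative to a reference run (same problem)."""
    return reference_time / optimized_time


def strong_scaling_efficiency(base_cores, base_time, cores, time):
    """Parallel efficiency at `cores`, relative to a baseline run.

    Strong scaling keeps the problem size fixed, so the ideal time at
    `cores` is the baseline time scaled by base_cores / cores.
    """
    ideal_time = base_time * base_cores / cores
    return ideal_time / time


# Hypothetical timings, for illustration only.
eff = strong_scaling_efficiency(base_cores=62_000, base_time=100.0,
                                cores=1_000_000, time=10.0)
```

With these made-up timings the ideal time at one million cores would be 6.2 s, so a measured 10 s corresponds to an efficiency of 0.62.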
Abstract: As point cloud acquisition technology develops and the demand for 3D applications increases, real-world scenarios require continuous and dynamic updating of the point cloud analysis network with streaming data. This study proposes a class-incremental learning method for 3D point cloud objects based on dual feature enhancement, which uses incremental learning to adapt point cloud object classification to scenarios where objects of new categories keep emerging in newly acquired data. By studying the characteristics of point cloud data and old class information, the method introduces a discriminative local enhancement module and a knowledge injection network to alleviate the new class bias problem in class-incremental learning. Specifically, the discriminative local enhancement module characterizes the various local structural characteristics of 3D point cloud objects by perceiving expressive local features. Subsequently, the importance weight of each local structure is obtained based on its global information, enhancing the perception of differential local features and improving the differentiation of new and old class features. Furthermore, the knowledge injection network injects old knowledge from the old model into the feature learning process of the new model. The enhanced hybrid features can effectively mitigate the increased new class bias caused by the lack of old class information. Extensive experiments under incremental learning settings on the 3D point cloud datasets ModelNet40, ScanObjectNN, ScanNet, and ShapeNet show that, compared with existing state-of-the-art methods, the proposed method improves the average incremental accuracy by 2.03%, 2.18%, 1.65%, and 1.28% on the four datasets, respectively.
Abstract: The image segmentation of surface defects on solid oxide fuel cells (SOFC) is of great significance for the quality inspection of monolithic SOFC. To address the blurred edges and complex backgrounds of surface defect images of monolithic SOFC, this study proposes a self-attention fusion method for SOFC surface defect image segmentation. Firstly, a multi-channel self-attention module is proposed to enhance the inter-channel correlation and improve the channel representation. Secondly, a multi-scale attention fusion module is utilized to further improve the network’s ability to extract defect features at different scales. Finally, a triplet joint loss function is proposed to supervise the training process. Experiments show that the proposed method can effectively extract surface defects of monolithic SOFC while improving network segmentation performance.
Abstract: This study addresses the issues of group user authorization management and integrity verification for shared medical data. First, to prevent group users from overstepping their authority, authorization identifiers are introduced. Medical data owners use authorization identifiers to allocate different access rights to group users according to user identities. The mathematical construction of the authorization identifiers effectively ensures that they cannot be forged. Second, to record revoked users and deprive them of access rights, a revoked user list based on a skip list is introduced. Because a skip list supports fast lookup and insertion, the overhead of revoking a user is only O(log n). Afterward, the concrete process and mathematical design of shared data integrity verification are improved. Finally, security analysis and simulation experiments demonstrate the security and efficiency of the scheme.
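To make the O(log n) revocation cost concrete, here is a minimal Python sketch of a skip-list-backed revocation list. It is a generic textbook skip list, not the scheme's actual data structure; the class name `RevocationSkipList`, its methods, and the string user identifiers are hypothetical.

```python
import random


class _Node:
    __slots__ = ("key", "forward")

    def __init__(self, key, level):
        self.key = key
        self.forward = [None] * level  # next node on each level


class RevocationSkipList:
    """Ordered set of revoked user IDs with expected O(log n) insert and lookup."""

    MAX_LEVEL = 16
    P = 0.5  # probability of promoting a node one level higher

    def __init__(self, seed=None):
        self._rng = random.Random(seed)
        self._head = _Node(None, self.MAX_LEVEL)
        self._level = 1

    def _random_level(self):
        level = 1
        while level < self.MAX_LEVEL and self._rng.random() < self.P:
            level += 1
        return level

    def _predecessors(self, key):
        """For each level, the last node whose key is strictly less than `key`."""
        update = [self._head] * self.MAX_LEVEL
        node = self._head
        for i in range(self._level - 1, -1, -1):
            while node.forward[i] is not None and node.forward[i].key < key:
                node = node.forward[i]
            update[i] = node
        return update

    def revoke(self, user_id):
        """Insert user_id; return False if it was already revoked."""
        update = self._predecessors(user_id)
        nxt = update[0].forward[0]
        if nxt is not None and nxt.key == user_id:
            return False
        level = self._random_level()
        self._level = max(self._level, level)
        node = _Node(user_id, level)
        for i in range(level):
            node.forward[i] = update[i].forward[i]
            update[i].forward[i] = node
        return True

    def is_revoked(self, user_id):
        node = self._predecessors(user_id)[0].forward[0]
        return node is not None and node.key == user_id


# Usage: revoke one (hypothetical) user, then check access for two users.
revoked = RevocationSkipList(seed=1)
revoked.revoke("user-17")
blocked = revoked.is_revoked("user-17")
allowed = not revoked.is_revoked("user-03")
```

The logarithmic bound is an expected-case guarantee that comes from the random level assignment; a balanced tree would give the same bound in the worst case at the cost of rebalancing logic.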
Abstract: Clustering algorithms based on the minimum spanning tree (MST) can identify clusters with arbitrary shapes, but they are limited in how efficiently they construct the minimum spanning tree and identify invalid edges, and they are easily influenced by noise points. This study proposes an MST clustering algorithm based on local density peaks and label propagation (DPMST), which combines the MST algorithm with the ability of the density peaks clustering algorithm to find local density peaks and exclude noise points. The DPMST algorithm adopts a shared-neighbors-based distance between local density peaks and uses the neighborhood information between local density peaks to efficiently construct minimum spanning trees and identify invalid edges, enabling the discovery of clusters with complex structures. Label propagation is used to strengthen strong labels and weaken weak labels in order to correct wrong labels, which improves the quality of clustering results, especially for points in border regions, and helps reveal complex manifolds. The experimental results on several synthetic and real-world datasets show that the DPMST algorithm outperforms the classical clustering algorithms DPC, MST, K-means, DBSCAN, AP, SC, and BIRCH.
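For readers unfamiliar with the baseline that DPMST builds on, the sketch below shows classical MST clustering in Python: build a Euclidean minimum spanning tree with Prim's algorithm, then cut the k-1 longest edges. It does not implement the density-peak, shared-neighbor, or label-propagation parts of DPMST; the function names and sample points are hypothetical.

```python
import math


def mst_edges(points):
    """Prim's algorithm on the complete Euclidean graph.

    Returns the n-1 tree edges as (weight, u, v) tuples.
    """
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n   # cheapest known connection cost to the tree
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=best.__getitem__)
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((best[u], parent[u], u))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v], parent[v] = d, u
    return edges


def mst_clusters(points, k):
    """Cut the k-1 longest MST edges and label the resulting components."""
    kept = sorted(mst_edges(points))[: len(points) - k]
    root = list(range(len(points)))

    def find(i):
        while root[i] != i:
            root[i] = root[root[i]]  # path halving
            i = root[i]
        return i

    for _, u, v in kept:
        root[find(u)] = find(v)
    labels = {}
    return [labels.setdefault(find(i), len(labels)) for i in range(len(points))]


# Usage: two well-separated groups of three points each.
pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
print(mst_clusters(pts, k=2))  # prints [0, 0, 0, 1, 1, 1]
```

Because this baseline treats every long edge as a cluster boundary, a single noise point far from the data creates its own cluster, which is the sensitivity the abstract refers to.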
Abstract: Semantic segmentation of remote sensing images plays a crucial role in environmental detection, land cover classification, and urban planning. Convolutional neural networks and their improved models are the mainstream methods for semantic segmentation of remote sensing images. However, these methods focus more on learning local contextual features and cannot effectively model the global distribution relationship between different objects, thereby restricting the segmentation performance of the model. To address this issue, this study constructs a global semantic relationship learning module based on convolutional neural networks, which fully learns the symbiotic relationships between different objects and effectively enhances the model’s representation ability. In addition, a multi-scale learning module is constructed to integrate global semantic relationships of different scales, given the scale differences of the objects to be segmented in the same scene. To evaluate the performance of the model, sufficient experiments are conducted on two commonly used remote sensing image datasets, Vaihingen and Potsdam. The experimental results show that the proposed method can achieve higher segmentation performance than existing models based on convolutional neural networks.
Abstract: In solid oncology, DNA amplification often appears as diffraction-limited blobs in fluorescence microscopy images of interphase nuclei processed with fluorescence in situ hybridization (FISH) technology. Imaging conditions limit image quality, resulting in a low signal-to-noise ratio, serious background interference, and interference from non-blob structures. Suitable blob detection methods that provide objective and quantitative data can help doctors diagnose cancer. The proposed algorithm first uses three-layer wavelet multiscale summation to denoise the fluorescence image, then uses the multiscale Laplacian of Gaussian operator to enhance the blob areas, and finally suppresses the non-blob areas through unilateral second-order Gaussian kernels in four directions to complete blob detection. Experimental results show that for 83 images in the self-built database, the average F-score reaches 0.96, and the average running time is less than 0.5 s.
Abstract: In the task of 3D human pose estimation, the complex topology formed by the connections between human joints presents a challenge. Modeling this structure with a graph convolutional network makes it possible to effectively capture the connections between local joints. Although non-adjacent joints lack direct physical connections, Transformer encoders can establish contextual relationships between them, which is crucial for better human pose inference because of the biomechanical constraints on human motion and pose and the synergistic interaction of human joints. Balancing model performance against the number of parameters is of particular importance for large-scale models. To tackle these challenges, a multi-layer spatial feature fusion network model (MLSFFN) based on graph convolution and Transformer is designed. This model fuses local and global spatial features with a relatively small number of parameters. Experimental results demonstrate that the proposed method achieves a mean per joint position error (MPJPE) of 49.9 mm on the Human3.6M dataset with only 2.1M parameters. Moreover, the model demonstrates robust generalization capability.
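The MPJPE metric reported above is the average Euclidean distance between predicted and ground-truth joint positions. The Python sketch below computes it for a single pose; the function name and the toy coordinates are hypothetical, and any alignment applied before the comparison (for example root-joint centering) is assumed to have been done already.

```python
import math


def mpjpe(predicted, ground_truth):
    """Mean per joint position error for one pose.

    Both arguments are sequences of (x, y, z) joint coordinates in the same
    unit (for example millimetres) and in the same joint order.
    """
    if not predicted or len(predicted) != len(ground_truth):
        raise ValueError("poses must be non-empty and have the same number of joints")
    total = sum(math.dist(p, g) for p, g in zip(predicted, ground_truth))
    return total / len(predicted)


# Usage with a toy two-joint pose: errors of 0 and 5 give a mean of 2.5.
error = mpjpe([(0, 0, 0), (3, 4, 0)], [(0, 0, 0), (0, 0, 0)])
```

For a whole dataset the per-pose values are averaged again over all frames.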
Abstract: Effective segmentation of clouds and their shadows is a critical issue in the field of remote sensing image processing. It plays a significant role in surface feature extraction, climate detection, atmospheric correction, and more. However, clouds and cloud shadows in remote sensing images have complex features: they are diverse and irregularly distributed, and their fuzzy boundary information is easily disturbed by the background, which makes accurate feature extraction challenging. Moreover, there are few networks specifically designed for this task. To address these issues, this study proposes a dual-path network combining Vision Transformer (ViT) and D-UNet. The network is divided into two branches: one is a convolutional local feature extraction module based on the dilated convolution module of D-UNet, which introduces multi-scale atrous spatial pyramid pooling (ASPP) to extract multi-dimensional features; the other branch captures global context semantics through the Vision Transformer, enhancing feature extraction. Finally, upsampling is performed by a feature fusion decoder. The model achieves superior performance on both a self-built dataset of clouds and cloud shadows and the publicly available HRC_WHU dataset, reaching MIoU values of 92.05% and 85.37%, respectively, and leading the second-best model by 0.52% and 0.44%.
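The MIoU metric used above averages the per-class intersection over union. A minimal Python sketch of the standard definition follows; it operates on flattened label lists, the function name is hypothetical, and classes that appear in neither the prediction nor the ground truth are skipped, which is one common convention rather than necessarily the one used in the paper.

```python
def mean_iou(pred, target, num_classes):
    """Mean intersection over union from flattened integer label sequences."""
    # conf[t][p] counts pixels with true class t that were predicted as class p.
    conf = [[0] * num_classes for _ in range(num_classes)]
    for p, t in zip(pred, target):
        conf[t][p] += 1

    ious = []
    for c in range(num_classes):
        tp = conf[c][c]
        fp = sum(conf[r][c] for r in range(num_classes)) - tp
        fn = sum(conf[c]) - tp
        union = tp + fp + fn
        if union:  # skip classes absent from both prediction and target
            ious.append(tp / union)
    if not ious:
        raise ValueError("no class is present in pred or target")
    return sum(ious) / len(ious)


# Usage on six pixels and three classes (for example background, cloud, shadow).
target = [0, 0, 1, 1, 2, 2]
pred = [0, 1, 1, 1, 2, 0]
score = mean_iou(pred, target, num_classes=3)  # per-class IoU: 1/3, 2/3, 1/2
```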
Abstract: The task of camouflaged object detection involves locating and identifying camouflaged objects in complex scenes. While deep neural network-based methods have been applied to this task, many of them struggle to fully utilize multi-level features of the target for extracting rich semantic information in complex scenes with interference, often relying solely on fixed-size features to identify camouflaged objects. To address this challenge, this study proposes a camouflaged object detection network based on multi-scale and neighbor-level feature fusion. This network comprises two innovative designs: the multi-scale feature perception module and the two-stage neighbor-level interaction module. The former aims to capture rich local-global contrast information in complex scenes by combining multi-scale features. The latter integrates features from adjacent layers to exploit cross-layer correlations and transfer valuable contextual information from the encoder to the decoder network. The proposed method has been evaluated on three public datasets: CHAMELEON, CAMO-Test, and COD10K-Test, and compared with the current mainstream methods. The experimental results demonstrate that the proposed method outperforms the current mainstream methods, achieving excellent performance across all metrics.
Abstract: Synthetic aperture radar (SAR) and optical image fusion aims to leverage the imaging complementarity of satellite sensors to generate more comprehensive geomorphological information. However, existing network models often exhibit low imaging accuracy during the fusion process due to the heterogeneity in data distribution of each single satellite sensor and differences in imaging physical mechanisms. This study proposes DNAP-Fusion, a novel SAR and optical image fusion network that incorporates dual non-local attention perception. The proposed method utilizes a dual non-local perceptual attention module to extract structural information from SAR images and texture details from optical images within a multi-level image pyramid with a gradually decreasing spatial scale. It then fuses their complementary features in both spatial and channel dimensions. Subsequently, the fused features are injected into the upsampled optical image through image reconstruction, resulting in the final fusion outcome. Additionally, before network training, image encapsulation decisions are employed to enhance the commonality between objects in SAR and optical images within the same scene. Qualitative and quantitative experimental results demonstrate that the proposed method outperforms state-of-the-art (SOTA) multisensor fusion methods. Among the objective evaluation indices, the correlation coefficient (CC) is 0.9906, and the peak signal-to-noise ratio (PSNR) is 32.1560 dB. Moreover, the proposed method effectively fuses the complementary features of SAR and optical images, offering a valuable idea and method for enhancing the accuracy and effectiveness of remote sensing image fusion.
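The two objective indices quoted above have standard definitions, sketched below in Python for flattened pixel sequences. The function names are hypothetical, and the 8-bit peak value of 255 is an assumption; the paper may compute the indices per band or with a different dynamic range.

```python
import math


def psnr(reference, fused, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized pixel sequences."""
    mse = sum((r - f) ** 2 for r, f in zip(reference, fused)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(peak ** 2 / mse)


def correlation_coefficient(a, b):
    """Pearson correlation coefficient between two pixel sequences."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


# Usage on toy data: a perfectly linear relation gives CC = 1.
cc = correlation_coefficient([1, 2, 3, 4], [2, 4, 6, 8])
quality = psnr([100, 100], [110, 90])  # MSE = 100
```

Higher values are better for both indices: CC approaches 1 as the fused image tracks the reference, and PSNR grows as the mean squared error shrinks.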
Abstract: This study proposes a deep learning model for short-term precipitation forecasting, called MSF-Net, to address the limitations of traditional methods. This model integrates multi-source data, including GPM historical precipitation data, ERA5 meteorological data, radar data, and DEM data. A meteorological feature extraction module is employed to learn the meteorological features of the multi-source data. An attention fusion prediction module is used to achieve feature fusion and short-term precipitation forecasting. The precipitation forecasting results of MSF-Net are compared with those of various artificial intelligence methods. Experimental results indicate that MSF-Net achieves optimal threat score (TS) and bias score (Bias). This suggests that it can enhance the effectiveness of data-driven precipitation forecasting within a 6-hour prediction horizon.
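The threat score (TS) and bias score (Bias) mentioned above are computed from a contingency table of forecast and observed precipitation events. The Python sketch below uses the standard definitions; the function names, the 1.0 mm threshold, and the sample values are hypothetical and not taken from the paper.

```python
def contingency(forecast, observed, threshold):
    """Count hits, misses, and false alarms for events defined by value >= threshold."""
    hits = misses = false_alarms = 0
    for f, o in zip(forecast, observed):
        f_event, o_event = f >= threshold, o >= threshold
        if f_event and o_event:
            hits += 1
        elif o_event:
            misses += 1
        elif f_event:
            false_alarms += 1
    return hits, misses, false_alarms


def threat_score(hits, misses, false_alarms):
    """TS = hits / (hits + misses + false alarms); 1 is a perfect forecast."""
    return hits / (hits + misses + false_alarms)


def bias_score(hits, misses, false_alarms):
    """Bias = forecast events / observed events; 1 means no over- or under-forecasting."""
    return (hits + false_alarms) / (hits + misses)


# Usage with made-up precipitation amounts (mm) at five grid points.
forecast = [0.0, 1.2, 3.0, 0.5, 2.0]
observed = [0.2, 1.5, 0.0, 1.1, 2.5]
h, m, fa = contingency(forecast, observed, threshold=1.0)  # 2 hits, 1 miss, 1 false alarm
ts, bias = threat_score(h, m, fa), bias_score(h, m, fa)    # 0.5 and 1.0
```

Both functions assume at least one forecast or observed event; with no events at all the denominators are zero and the scores are undefined.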
Abstract: Gansu painted pottery has the most complete spatial and temporal sequence among all kinds of painted pottery cultures in China. However, no study has been specifically designed for the style transfer of Gansu painted pottery. To promote the excellent traditional Chinese culture, this research constructs the first Gansu painted pottery dataset and proposes a new geometric style transfer method. The method generates a neural distortion field that deforms Gansu painted pottery into the geometric style of the target object while maintaining the texture of the pottery. Two modules are incorporated into the network structure, namely position embedding and feature enhancement, to improve the quality of feature encoding. Shape consistency loss and a smooth regularization term are introduced to the loss function to prevent the details of the painted pottery from mutating and improve the deformation effect. The experimental results show that the model can achieve large-scale geometric style transfer between Gansu painted pottery and objects from different classes, maintaining the details of the pottery and providing new visual experiences.
Abstract: The malicious use of facial recognition technology may lead to personal information leakage, posing a significant threat to individual privacy security. Safeguarding facial privacy through universal adversarial attacks holds crucial research significance. However, existing universal adversarial attack algorithms primarily focus on image classification tasks. When applied to facial recognition models, they often encounter challenges such as low attack success rates and noticeable perturbation generation. To address these challenges, this study proposes a universal adversarial attack method for face recognition based on commonality gradients. This method optimizes universal adversarial perturbation through the common gradient of the adversarial perturbations of multiple face images and uses dominant feature loss to improve the attack capability of the perturbation. Combined with the multi-stage training strategy, it achieves a balance between attack effect and visual quality. Experiments on public datasets prove that the method outperforms methods such as Cos-UAP and SGA in the attack performance on facial recognition models, and the generated adversarial samples have better visual effects, indicating the effectiveness of the proposed method.
Copyright: Institute of Software, Chinese Academy of Sciences Beijing ICP No. 05046678-3
Address: 4# South Fourth Street, Zhongguancun, Haidian, Beijing, Postal Code: 100190
Phone: 010-62661041 Fax: Email: csa (a) iscas.ac.cn