Abstract: Training deep neural networks (DNNs) for mission-critical scenarios requires ever more resources. This incentivizes model stealing through cloud prediction APIs, which violates the intellectual property rights of model owners. To trace illegal model copies that have been made public, DNN model fingerprinting offers a promising copyright verification option for model owners who want to protect model integrity. However, existing fingerprinting schemes rely mainly on output-level traces (e.g., mis-prediction behavior on special inputs), which limits their stealthiness during fingerprint verification. This study proposes a novel task-agnostic fingerprinting scheme based on the saliency-map traces of model predictions. The scheme introduces a constrained manipulation objective on saliency maps to construct clean-label, natural-looking fingerprint samples, which significantly improves the stealthiness of model fingerprints. Extensive evaluation on three typical tasks shows that the scheme substantially improves fingerprint effectiveness over existing schemes while keeping the fingerprints highly stealthy.
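The scheme builds fingerprints on saliency-map traces rather than on output labels. The sketch below makes some assumptions. It uses a toy sigmoid model and estimates a gradient-magnitude saliency map by central finite differences. It then compares the saliency maps of an owner model, a lightly modified copy, and an unrelated model using cosine similarity. The model, its weights, and the cosine-similarity verification score are all hypothetical. The paper's constrained manipulation objective is not reproduced.

```python
import math

def toy_model(x, w):
    """Hypothetical scalar classifier score: sigmoid of a weighted sum."""
    z = sum(wi * xi for wi, xi in zip(w, x))
    return 1.0 / (1.0 + math.exp(-z))

def saliency(model, x, eps=1e-5):
    """Approximate |d model / d x_i| for every input feature by central differences."""
    sal = []
    for i in range(len(x)):
        xp = list(x); xp[i] += eps
        xm = list(x); xm[i] -= eps
        sal.append(abs(model(xp) - model(xm)) / (2 * eps))
    return sal

def cosine(a, b):
    """Cosine similarity between two saliency maps (hypothetical verification score)."""
    dot = sum(p * q for p, q in zip(a, b))
    return dot / (math.sqrt(sum(p * p for p in a)) * math.sqrt(sum(q * q for q in b)))

x = [0.5, -1.0, 2.0]                                        # a fingerprint sample
owner = lambda v: toy_model(v, [1.0, 0.2, -0.5])
suspect = lambda v: toy_model(v, [1.1, 0.2, -0.45])         # slightly fine-tuned copy
unrelated = lambda v: toy_model(v, [-0.3, 1.5, 0.1])        # independently trained model

s_owner = saliency(owner, x)
cos_suspect = cosine(s_owner, saliency(suspect, x))
cos_unrelated = cosine(s_owner, saliency(unrelated, x))
print(round(cos_suspect, 3), round(cos_unrelated, 3))
```

A stolen copy keeps a saliency pattern close to the owner's, so its score stays near 1. An independent model does not. A real verifier would aggregate scores over many fingerprint samples before deciding.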
Abstract: Multimodal sentiment analysis aims to assess users’ sentiment by analyzing the videos they upload to social platforms. Current research on multimodal sentiment analysis focuses primarily on designing complex multimodal fusion networks that learn the consistency information among modalities, which improves model performance to some extent. However, most studies overlook the complementary role of the difference information among modalities, and this leads to biased sentiment analysis. This study proposes DERL (dual encoder representation learning), a multimodal sentiment analysis model that learns modality-invariant and modality-specific representations through a dual encoder structure. Specifically, a cross-modal interaction encoder based on a hierarchical attention mechanism learns modality-invariant representations across all modalities to capture consistency information. An intra-modal encoder based on a self-attention mechanism learns modality-specific representations within each modality to capture difference information. Furthermore, two gated network units are designed to enhance and filter the encoded features, so that modality-invariant and modality-specific representations are combined more effectively. Finally, during fusion, the latent shared sentiment among the different multimodal representations is captured for sentiment prediction by reducing the L2 distance between them. Experimental results on two public datasets, CMU-MOSI and CMU-MOSEI, show that the model outperforms a range of baselines.
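The abstract does not give the gate equations, so the following is a minimal sketch under stated assumptions. The gate is assumed to be an element-wise sigmoid that blends a modality-invariant vector with a modality-specific one as a convex combination. The alignment term is assumed to be the mean squared L2 distance over all pairs of multimodal representations. The weights and function names are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def gated_fusion(invariant, specific, w_inv, w_spec, bias):
    """Hypothetical gate per dimension: g = sigmoid(a*h_inv + c*h_spec + b),
    output = g * h_inv + (1 - g) * h_spec (a convex blend of the two inputs)."""
    fused = []
    for hi, hs, a, c, b in zip(invariant, specific, w_inv, w_spec, bias):
        g = sigmoid(a * hi + c * hs + b)
        fused.append(g * hi + (1.0 - g) * hs)
    return fused

def l2_distance(u, v):
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(u, v)))

def alignment_loss(reps):
    """Mean squared L2 distance over all pairs of representations (assumed form)."""
    pairs = [(a, b) for i, a in enumerate(reps) for b in reps[i + 1:]]
    return sum(l2_distance(a, b) ** 2 for a, b in pairs) / len(pairs)

h_inv = [0.8, -0.2, 0.5]     # modality-invariant features (toy values)
h_spec = [0.1, 0.4, -0.3]    # modality-specific features (toy values)
fused = gated_fusion(h_inv, h_spec, [1.0] * 3, [1.0] * 3, [0.0] * 3)
print(fused, alignment_loss([fused, h_inv, h_spec]))
```

Because the gate output lies in (0, 1), each fused value stays between the two inputs. Minimizing the alignment loss during training pulls the representations toward a shared sentiment space.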
Abstract: Liver cancer is a malignant tumor that originates from liver cells. Its diagnosis has long been a difficult medical problem and a research hotspot across many fields. Early diagnosis can reduce liver cancer mortality. Histopathological image examination is the gold standard for tumor diagnosis because the images show the cells and tissue structures of tissue sections. From these images, clinicians can determine cell types, tissue structures, and the number and morphology of abnormal cells, and can evaluate the specific condition of the tumor. This study focuses on the application of convolutional neural networks in algorithms for diagnosing liver cancer from pathological images, including liver tumor detection, image segmentation, and preoperative prediction. For each convolutional neural network algorithm, the design ideas, improvement goals, and methods are described in detail to give researchers clearer references. Additionally, the advantages and disadvantages of convolutional neural network algorithms in diagnosis are summarized and analyzed, and potential future research hotspots and challenges are discussed.
Abstract: The multi-client brain tumor classification method based on the convolutional block attention module (CBAM) extracts tumor-region details from MRI images inadequately. In addition, under the federated learning framework, its channel attention and spatial attention interfere with each other, and classification accuracy on medical tumor data from multiple sites is low. To address these problems, this study proposes a brain tumor classification method that combines the federated learning framework with an enhanced CBAM-ResNet18 network. The method uses federated learning to train collaboratively on brain tumor data from multiple sources. It replaces the ReLU activation function with Leaky ReLU to mitigate the dying-neuron problem. The channel attention module within CBAM is changed from reducing and then restoring the channel dimension to increasing and then reducing it, which significantly enhances the network’s ability to extract image details. Furthermore, the channel attention and spatial attention modules in CBAM are rearranged from a cascade structure into a parallel structure, so the network’s feature extraction capability no longer depends on the order of processing. A publicly available brain tumor MRI dataset from Kaggle is used in the study. The results show that FL-CBAM-DIPC-ResNet performs remarkably well. It achieves an accuracy, precision, recall, and F1 score of 97.78%, 97.68%, 97.61%, and 97.63%, respectively, which are 6.54%, 4.78%, 6.80%, and 7.00% higher than those of the baseline model. These findings confirm that the proposed method overcomes data islands and enables data fusion from multiple sources. They also show that it outperforms most existing mainstream models.
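The abstract does not state the aggregation rule used on the server. The sketch below assumes standard FedAvg, in which each client's parameters are weighted by its number of local samples. Parameters are shown as flat lists. A Leaky ReLU helper is included for reference, using a negative slope of 0.01. That slope is a common default and is an assumption here, since the paper's value is not given.

```python
def fedavg(client_weights, client_sizes):
    """FedAvg aggregation (assumed): average each parameter across clients,
    weighting client k by n_k / sum(n)."""
    total = sum(client_sizes)
    n_params = len(client_weights[0])
    global_w = [0.0] * n_params
    for w, n in zip(client_weights, client_sizes):
        for i in range(n_params):
            global_w[i] += w[i] * n / total
    return global_w

def leaky_relu(x, negative_slope=0.01):
    """Leaky ReLU keeps a small gradient for x < 0, so units cannot 'die' as with ReLU."""
    return x if x >= 0 else negative_slope * x

# Two hypothetical hospitals: the second holds three times as much MRI data.
print(fedavg([[1.0, 2.0], [3.0, 4.0]], [1, 3]))
print(leaky_relu(-2.0), leaky_relu(1.5))
```

Only model parameters leave each site. This is how federated training sidesteps data islands without pooling raw images.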
Abstract: Visually impaired people are a vulnerable group in society and face many obstacles when traveling independently. Providing them with safe and reliable assistive equipment reflects the progress of social civilization. This study introduces the key technologies for obstacle detection and recognition, along with the path planning algorithms used to help visually impaired people travel. It focuses on path planning algorithms applied after obstacle detection. It compares the application characteristics and scenarios of various technologies and discusses research progress on related methods in assistive devices for the visually impaired. In addition, it summarizes the current application status of multi-technology integration in intelligent assistive equipment. On this basis, and in light of advances in technologies such as artificial intelligence and embedded devices, it discusses future development directions for travel aids for the visually impaired.
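A* is a classic example of the path planning step that follows obstacle detection. The following minimal sketch runs it on a 4-connected occupancy grid, where 0 is free and 1 is an obstacle, using the Manhattan-distance heuristic. The grid and coordinates are hypothetical. The reviewed devices may use other planners or other map representations.

```python
import heapq

def astar(grid, start, goal):
    """A* on a 4-connected grid (0 = free, 1 = obstacle) with unit move cost.
    Manhattan distance never overestimates the remaining cost here, so the path is shortest."""
    rows, cols = len(grid), len(grid[0])

    def h(cell):
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    open_heap = [(h(start), 0, start)]      # entries are (f = g + h, g, cell)
    came_from = {start: None}
    g_score = {start: 0}
    while open_heap:
        _, g, cell = heapq.heappop(open_heap)
        if cell == goal:                    # walk parent links back to the start
            path = []
            while cell is not None:
                path.append(cell)
                cell = came_from[cell]
            return path[::-1]
        if g > g_score[cell]:
            continue                        # stale heap entry, a better route was found
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0:
                ng = g + 1
                if ng < g_score.get((nr, nc), float("inf")):
                    g_score[(nr, nc)] = ng
                    came_from[(nr, nc)] = cell
                    heapq.heappush(open_heap, (ng + h((nr, nc)), ng, (nr, nc)))
    return None                             # goal unreachable

grid = [
    [0, 0, 0, 0],
    [1, 1, 1, 0],   # a detected obstacle, e.g. a wall with a gap on the right
    [0, 0, 0, 0],
]
path = astar(grid, (0, 0), (2, 0))
print(path)
```

In a wearable aid, the grid would be rebuilt from the latest obstacle detections, and the planner would be re-run as the user moves.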
Abstract: Inaccurate phase estimation in single-channel speech enhancement degrades the quality of the enhanced speech. To address this, this study proposes a speech enhancement method based on a deep complex axial self-attention convolutional recurrent network (DCACRN), which enhances the amplitude and phase information of speech simultaneously in the complex domain. First, an encoder based on a complex convolutional network extracts complex features from the input speech signal. A convolutional hopping module is then introduced to map the features into a high-dimensional space for feature fusion, which strengthens information interaction and gradient flow. Next, an encoder-decoder structure based on the axial self-attention mechanism is designed to enhance the model’s temporal modeling and feature extraction capabilities. Finally, the decoder reconstructs the speech signal, and a hybrid loss function is used to optimize the network and improve the quality of the enhanced speech. Experiments on the public Valentini and DNS Challenge datasets show that the proposed method improves both perceptual evaluation of speech quality (PESQ) and short-time objective intelligibility (STOI) compared with other models. On the non-reverberant dataset, PESQ improves by 12.8% over DCTCRN and 3.9% over DCCRN, which validates the effectiveness of the proposed model for speech enhancement.
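Working in the complex domain lets one operation correct amplitude and phase together. The abstract does not say whether DCACRN predicts a mask or the spectrum directly. This sketch therefore assumes a DCCRN-style complex ratio mask applied to a few short-time Fourier transform (STFT) bins; the bin and mask values are made up. Multiplying complex numbers multiplies their magnitudes and adds their phases. A real-valued magnitude mask, by contrast, leaves the noisy phase untouched.

```python
import cmath

def apply_complex_mask(noisy_bins, mask_bins):
    """Complex ratio mask (assumed): enhanced = mask * noisy for every time-frequency bin.
    |enhanced| = |mask| * |noisy| and phase(enhanced) = phase(mask) + phase(noisy)."""
    return [m * y for m, y in zip(mask_bins, noisy_bins)]

noisy = [cmath.rect(2.0, 0.3), cmath.rect(1.0, -1.2)]   # toy noisy STFT bins
mask = [cmath.rect(0.5, 0.1), cmath.rect(0.8, 0.4)]     # toy predicted complex mask

enhanced = apply_complex_mask(noisy, mask)
mag_only = [0.5 * y for y in noisy]                     # real mask: phase stays noisy

for e in enhanced:
    print(round(abs(e), 3), round(cmath.phase(e), 3))
print(round(cmath.phase(mag_only[0]), 3))
```

This is why phase-aware complex networks can fix the phase errors that magnitude-only enhancement leaves behind.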
Abstract: Mixed-sample data augmentation methods focus only on the model’s forward representation of the category an image belongs to. They ignore the reverse judgment of whether the image belongs to a specific category. Because image categories are described from only one direction, model performance suffers. To address this problem, this study proposes an image data augmentation method with inverse target interference. To prevent overfitting, the method first modifies the original image to increase the diversity of background and target images. Second, it adopts the idea of reverse learning. The network learns to correctly identify the category of the original image while also fully learning the attributes of the filled-in image content that do not belong to that category. This increases the network’s confidence in identifying the original image’s category. Finally, to verify the method’s effectiveness, the study performs extensive experiments with different network models on five datasets, including CIFAR-10 and CIFAR-100. Experimental results show that, compared with other state-of-the-art data augmentation methods, the proposed method significantly improves the model’s learning effect and generalization ability in complex settings.
Abstract: Accurate segmentation of colon polyps is important for removing abnormal tissue and reducing the risk that polyps develop into colon cancer. Current colon polyp segmentation models suffer from a high misclassification rate and low segmentation accuracy on polyp images. To achieve accurate segmentation, this study proposes MGW-Net, a colon polyp segmentation model that combines multi-scale gated convolution and window attention. First, an improved multi-scale gated convolution module (MGCM) replaces the U-Net convolutional block to fully extract colon polyp image information. Second, the study builds a multi-information fusion enhancement module (MFEM) that combines improved dilated convolution with hybrid enhanced residual window attention. This module optimizes feature fusion at the skip connections, reducing information loss there and making full use of information at the bottom of the network. On the CVC-ClinicDB and Kvasir-SEG datasets, MGW-Net achieves Dice similarity coefficients of 93.8% and 92.7% and mean intersection over union (mIoU) values of 89.4% and 87.9%, respectively. Experiments on the CVC-ColonDB, CVC-300, and ETIS datasets show that MGW-Net generalizes well. This confirms that it effectively improves the accuracy and robustness of colon polyp segmentation.
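The two reported metrics can be computed from binary masks. A minimal sketch on flattened 0/1 masks follows; the masks are toy values. The dataset-level "mean" figures are usually averages of these per-image scores. For a single image, the two metrics are linked by IoU = Dice / (2 - Dice).

```python
def dice_and_iou(pred, target):
    """Dice = 2|A∩B| / (|A| + |B|) and IoU = |A∩B| / |A∪B| for flat binary masks.
    Two empty masks count as a perfect match (score 1.0)."""
    inter = sum(1 for p, t in zip(pred, target) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    union = total - inter
    dice = 2 * inter / total if total else 1.0
    iou = inter / union if union else 1.0
    return dice, iou

pred = [1, 1, 1, 0, 0, 0]     # predicted polyp pixels
target = [0, 1, 1, 1, 0, 0]   # ground-truth polyp pixels
d, j = dice_and_iou(pred, target)
print(d, j)
```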
Abstract: In anti-external-force damage inspection of transmission lines, current lightweight object detection algorithms deployed at the edge have insufficient detection accuracy and slow inference. To solve these problems, this study proposes Fast-YOLOv5, an anti-external-force damage detection algorithm for the power grid that incorporates a sparse convolutional network (SCN) with global context enhancement. Based on YOLOv5, the FasterNet+ network is designed as a new feature extraction network that maintains detection accuracy while speeding up inference and reducing computational complexity. In the bottleneck layer, an ECAFN module with efficient channel attention is designed. It improves detection by adaptively recalibrating channel-wise feature responses and efficiently capturing cross-channel interaction information, while further reducing the number of parameters and the computational cost. The model’s detection layer is replaced with the context-enhanced SCN, which strengthens foreground-focused features and improves prediction by capturing global context information. Experimental results show that, compared with the original model, the improved model’s accuracy increases by 1.9% and its detection speed doubles to 56.2 FPS. The number of parameters and the computational cost decrease by 50% and 53%, respectively, which better meets the requirements for efficient transmission line detection.
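The abstract does not detail ECAFN. The sketch below therefore shows only the standard efficient channel attention (ECA) block that the term usually refers to, and treats that as an assumption. Each channel is global-average-pooled, and a 1-D convolution of size k runs across neighbouring channels with no dimensionality reduction. A sigmoid then produces per-channel weights that rescale the feature map. The adaptive kernel-size rule with gamma = 2 and b = 1 follows the ECA-Net convention. The toy kernel weights are hypothetical.

```python
import math

def eca_kernel_size(channels, gamma=2, b=1):
    """ECA-Net adaptive kernel size: |log2(C)/gamma + b/gamma|, rounded to an odd integer."""
    t = int(abs(math.log2(channels) / gamma + b / gamma))
    return t if t % 2 == 1 else t + 1

def eca(feature_map, weights):
    """feature_map: C channels, each a 2-D list; weights: odd-length 1-D kernel.
    Returns (rescaled feature map, per-channel attention weights)."""
    C, k = len(feature_map), len(weights)
    pad = k // 2
    # 1) global average pooling per channel
    pooled = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in feature_map]
    attn = []
    for c in range(C):
        # 2) 1-D convolution across neighbouring channels (zero padding at the ends)
        s = sum(weights[j] * pooled[c + j - pad]
                for j in range(k) if 0 <= c + j - pad < C)
        attn.append(1.0 / (1.0 + math.exp(-s)))        # 3) sigmoid gate
    # 4) rescale every value of channel c by attn[c]
    out = [[[v * attn[c] for v in row] for row in feature_map[c]] for c in range(C)]
    return out, attn

fmap = [[[1.0, 2.0], [3.0, 4.0]], [[0.0, 0.0], [0.0, 0.0]]]
out, attn = eca(fmap, [0.0, 1.0, 0.0])
print(eca_kernel_size(64), eca_kernel_size(256), attn)
```

The block adds only k weights, regardless of the channel count. This matches the abstract's point that channel attention can be added while still cutting parameters.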
Abstract: Existing remote sensing image super-resolution reconstruction models are inadequate at capturing long-range feature similarity and multi-scale feature correlation. To address this, this study proposes a novel remote sensing image super-resolution reconstruction algorithm based on a cross-scale hybrid attention mechanism. First, a global layer attention (GLA) mechanism uses layer-wise attention to weight and merge global features from different levels. This models long-range dependencies between low-resolution and high-resolution image features. Concurrently, a cross-scale local attention (CSLA) mechanism identifies and integrates local patches in multi-scale low-resolution feature maps that correspond to the high-resolution image, which enhances the model’s ability to restore image details. Finally, a local information-aware loss function guides the reconstruction, further improving the visual quality and detail preservation of the reconstructed images. Experiments on the UC-Merced dataset show that the proposed method outperforms most mainstream methods in average PSNR/SSIM across three magnification factors. Its visual results also show better quality and detail preservation.
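PSNR, one of the two reported metrics, follows directly from the mean squared error. A minimal sketch over flattened 8-bit pixel lists follows. A peak value of 255 is assumed; evaluation protocols sometimes compute PSNR on the Y channel only or crop borders, and this sketch does neither.

```python
import math

def psnr(reference, reconstructed, max_value=255.0):
    """PSNR = 10 * log10(MAX^2 / MSE) in dB; identical images give +inf."""
    mse = sum((r - x) ** 2 for r, x in zip(reference, reconstructed)) / len(reference)
    if mse == 0:
        return float("inf")
    return 10.0 * math.log10(max_value ** 2 / mse)

hr = [52, 60, 61, 59]          # toy high-resolution reference pixels
sr = [50, 61, 63, 58]          # toy super-resolved pixels
print(round(psnr(hr, sr), 2))
```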
Abstract: Under poor lighting, the distribution of grayscale values in calligraphic document images varies considerably. This lowers contrast in low-light areas and degrades the morphological texture of the strokes. Traditional methods typically rely on local statistics such as the mean, variance, and entropy while paying little attention to morphological texture, which makes them insensitive to features in low-contrast areas. To address these issues, this study proposes a binarization method for degraded calligraphic documents called the clustering segmentation-based side-window filter (CS-SWF). First, the method uses a multi-dimensional SWF to describe pixel blocks with similar morphological features. Then, guided by multiple correction rules, it uses downsampling to extract low-dimensional information and correct feature regions. Finally, the clustered blocks in the feature map are classified to obtain the binarization result. The proposed method is compared with existing methods using F-measure (FM), peak signal-to-noise ratio (PSNR), and distance reciprocal distortion (DRD) as metrics. Experimental results on a self-constructed dataset of 100 degraded handwritten document images show that the proposed method is more stable in low-contrast dark regions. It also outperforms the comparison algorithms in accuracy and robustness.
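CS-SWF builds on the side-window filter (SWF). The following minimal sketch shows the basic box-kernel SWF on a grayscale image stored as nested lists. For each pixel, the filter computes the mean of eight side windows: the left, right, upper, and lower halves and the four quadrants anchored at the pixel. It keeps the mean closest to the pixel's original value. The abstract does not specify the multi-dimensional variant, the correction rules, or the clustering, so none of these are modeled. The radius `r` is an illustrative default.

```python
def side_window_mean(img, r=1):
    """Box side-window filter: at each pixel keep the side-window mean closest to
    the original value, so stroke edges are preserved instead of blurred."""
    h, w = len(img), len(img[0])
    # (row_lo, row_hi, col_lo, col_hi) offsets of the eight side windows
    windows = [
        (-r, r, -r, 0), (-r, r, 0, r),   # left, right
        (-r, 0, -r, r), (0, r, -r, r),   # up, down
        (-r, 0, -r, 0), (-r, 0, 0, r),   # north-west, north-east
        (0, r, -r, 0), (0, r, 0, r),     # south-west, south-east
    ]
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            best = None
            for r0, r1, c0, c1 in windows:
                # every window contains (i, j), so vals is never empty after clipping
                vals = [img[y][x]
                        for y in range(max(0, i + r0), min(h, i + r1 + 1))
                        for x in range(max(0, j + c0), min(w, j + c1 + 1))]
                m = sum(vals) / len(vals)
                if best is None or abs(m - img[i][j]) < abs(best - img[i][j]):
                    best = m
            out[i][j] = best
    return out

step = [[0, 0, 10, 10] for _ in range(4)]     # a sharp stroke edge
spike = [[0, 0, 0], [0, 9, 0], [0, 0, 0]]     # an isolated noise pixel
print(side_window_mean(step))
print(side_window_mean(spike)[1][1])
```

A plain 3 x 3 box filter would smear the step edge. The side-window version leaves the edge intact but still damps the isolated spike.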
Abstract: To prevent and reduce wildland-urban interface (WUI) fires, this study mines the key causal factors of WUI fires and clarifies how these factors act on one another. First, causal factors are extracted from WUI fire accident cases using the proposed mining technique, and the Apriori algorithm is used to obtain association rules between them. Then, complex network theory is applied to construct a WUI fire causal factor network, compute its topological parameters, and analyze its characteristics. Finally, a risk index for WUI fire causal chains is introduced to identify high-risk edges, and chain-breaking measures are proposed. The results show that the WUI fire causal factor network has small-world characteristics. High temperature, strong wind, and drought have a strong influence on the other causal factors. Burning waste, plant fire, emergency response speed, arson, and strong wind play important roles in the transitions between causal factors and should be better controlled. The highest-risk edge in the network is burning waste → plant fire. This risk chain can be cut by enacting regulations such as prohibiting unauthorized waste burning, which enables the prevention and active control of WUI fires.
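The Apriori step can be sketched with the standard library. The algorithm grows frequent itemsets level by level. It prunes any candidate that has an infrequent subset, then derives rules whose confidence is sup(A ∪ B) / sup(A). The four "cases" below are made-up transactions for illustration only, and the thresholds are arbitrary.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {itemset: support} for every itemset with support >= min_support."""
    n = len(transactions)
    sets = [frozenset(t) for t in transactions]

    def support(itemset):
        return sum(1 for t in sets if itemset <= t) / n

    items = {i for t in sets for i in t}
    current = {frozenset([i]) for i in items if support(frozenset([i])) >= min_support}
    frequent, k = {}, 1
    while current:
        for s in current:
            frequent[s] = support(s)
        k += 1
        # join step: merge frequent (k-1)-itemsets into k-item candidates
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # prune step: all (k-1)-subsets must be frequent (Apriori property)
        candidates = {c for c in candidates
                      if all(frozenset(s) in current for s in combinations(c, k - 1))}
        current = {c for c in candidates if support(c) >= min_support}
    return frequent

def rules(frequent, min_confidence):
    """List (antecedent, consequent, confidence) with confidence = sup(A∪B) / sup(A)."""
    out = []
    for itemset, sup in frequent.items():
        for r in range(1, len(itemset)):
            for ante in map(frozenset, combinations(itemset, r)):
                conf = sup / frequent[ante]
                if conf >= min_confidence:
                    out.append((ante, itemset - ante, conf))
    return out

cases = [  # hypothetical accident cases, each a set of causal factors
    {"burning waste", "plant fire", "strong wind"},
    {"burning waste", "plant fire"},
    {"high temperature", "drought", "plant fire"},
    {"burning waste", "plant fire", "drought"},
]
freq = apriori(cases, min_support=0.5)
for a, c, conf in rules(freq, min_confidence=0.8):
    print(set(a), "->", set(c), conf)
```

In the study's pipeline, rules such as these become directed edges of the causal factor network. Topological analysis is then applied to that network.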
Abstract: Multi-organ medical image segmentation can facilitate clinical diagnosis. In this task, CNNs extract global features poorly, Transformers extract local features poorly, and Transformers also suffer from quadratic computational complexity. To address these issues, this study proposes a multi-level feature interaction Transformer model. The model uses a CNN to extract local features, which a Swin Transformer then transforms into global features. Multi-level local and global features are generated through down-sampling, and the local and global features at each level interact with and enhance each other. After enhancement at each level, the features are cross-fused by multi-level feature fusion modules. The fused features then pass through up-sampling and segmentation heads to produce segmentation masks. The model is evaluated on the Synapse and ACDC datasets. It achieves an average Dice similarity coefficient (DSC) of 80.16% and an average 95th-percentile Hausdorff distance (HD95) of 19.20 mm, outperforming representative models such as LGNet and RFE-UNet. The proposed model is effective for multi-organ medical image segmentation.
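The quadratic-complexity problem, and why a Swin Transformer eases it, can be shown with the cost formulas from the Swin Transformer paper. That paper ignores softmax in these counts. For an h × w token map with C channels, global multi-head self-attention costs 4hwC² + 2(hw)²C. Attention restricted to M × M windows costs 4hwC² + 2M²hwC, which is linear in hw. The feature-map size below is only an example.

```python
def msa_cost(h, w, c):
    """Global multi-head self-attention: 4hwC^2 + 2(hw)^2 C (quadratic in hw)."""
    return 4 * h * w * c ** 2 + 2 * (h * w) ** 2 * c

def wmsa_cost(h, w, c, m):
    """Window self-attention with M x M windows: 4hwC^2 + 2 M^2 hwC (linear in hw)."""
    return 4 * h * w * c ** 2 + 2 * m ** 2 * h * w * c

h = w = 56   # example token grid
c, m = 96, 7
print(msa_cost(h, w, c) / wmsa_cost(h, w, c, m))
```

Doubling the side length of the token map multiplies the attention term of global attention by 16. The same change multiplies the attention term of window attention by only 4.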
Abstract: Videos captured in low-light environments often suffer from low contrast, heavy noise, and unclear details. These problems seriously affect computer vision tasks such as object detection and segmentation. Most existing low-light video enhancement methods are built on convolutional neural networks. Because convolution cannot fully exploit long-range dependencies between pixels, the generated videos often lose detail and show color distortion in some regions. To address these problems, this study proposes a Siamese low-light video enhancement network that couples local and global features. The model extracts local features of video frames with a deformable convolution-based local feature extraction module and captures their global features with a lightweight self-attention module. Finally, a feature fusion module fuses the extracted local and global features, guiding the model to generate enhanced videos with more realistic colors and details. Experimental results show that the proposed method effectively improves the brightness of low-light videos and generates videos with richer colors and details. It also outperforms recently proposed methods on metrics such as peak signal-to-noise ratio and structural similarity.
Abstract: Synthetic aperture radar (SAR) images provide an important source of time-series data for land cover classification. Existing time-series matching algorithms can fully exploit the similarity among time-series features to obtain satisfactory classification results. In this study, the classic time-series matching algorithm time-weighted dynamic time warping (TWDTW), which considers both shape similarity and phenological differences, is introduced to guide SAR-based land cover classification. The traditional TWDTW algorithm matches only a single feature along the time series, so this study proposes a multi-feature fusion-based TWDTW (Mult-TWDTW) algorithm. The method extracts three features: the backscattering coefficient, interferometric coherence, and the dual-polarization radar vegetation index (DpRVI). The Mult-TWDTW model is designed by fusing these features within the TWDTW algorithm. To verify its effectiveness, land cover classification is performed in the Danjiangkou area using time-series data from the Sentinel-1A satellite. Mult-TWDTW is compared with the multi-layer perceptron (MLP), one-dimensional convolutional neural network (1D-CNN), K-means, and support vector machine (SVM) algorithms, as well as with single-feature TWDTW. The experimental results show that Mult-TWDTW achieves the best classification results, with an overall accuracy of 95.09% and a Kappa coefficient of 91.76. In summary, Mult-TWDTW effectively fuses the information of multiple features and enhances the potential of time-series matching algorithms for classifying multiple land cover types.
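The core of a multi-feature TWDTW can be sketched as ordinary dynamic time warping with two changes. First, the local cost is a weighted sum of per-feature absolute differences. Second, it adds a logistic time weight that penalizes matching dates that are far apart. Several parts of this sketch are assumptions: equal feature weights, alpha = 0.1, beta = 50 days, full-sequence alignment instead of open-boundary matching, and the toy backscatter/coherence/DpRVI profiles. The abstract does not specify how Mult-TWDTW actually fuses the features.

```python
import math

def time_weight(t1, t2, alpha=0.1, beta=50.0):
    """Logistic time weight: small for close dates, rising steeply past beta days."""
    return 1.0 / (1.0 + math.exp(-alpha * (abs(t1 - t2) - beta)))

def multi_feature_twdtw(query, ref, q_days, r_days, feature_weights):
    """DTW with local cost = sum_k w_k |q_k - r_k| + time weight.
    query/ref: lists of time steps, each a tuple of feature values."""
    n, m = len(query), len(ref)
    INF = float("inf")
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            feat = sum(wk * abs(a - b)
                       for wk, a, b in zip(feature_weights, query[i - 1], ref[j - 1]))
            cost = feat + time_weight(q_days[i - 1], r_days[j - 1])
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]

days = [10, 40, 70, 100]   # acquisition day-of-year (toy)
# (backscatter dB, coherence, DpRVI): hypothetical reference patterns
crop = [(-12, 0.3, 0.4), (-10, 0.25, 0.6), (-9, 0.2, 0.7), (-11, 0.3, 0.5)]
water = [(-22, 0.1, 0.05)] * 4
pixel = [(-12.5, 0.3, 0.42), (-10.2, 0.26, 0.58), (-9.1, 0.21, 0.69), (-11.3, 0.3, 0.52)]

w = [1.0, 1.0, 1.0]
d_crop = multi_feature_twdtw(pixel, crop, days, days, w)
d_water = multi_feature_twdtw(pixel, water, days, days, w)
print("crop" if d_crop < d_water else "water", round(d_crop, 3), round(d_water, 3))
```

Each pixel is assigned the class of the reference pattern with the smallest distance. In practice the features would be normalized first, so that backscatter in dB does not dominate the weighted sum.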
Abstract: In the digital era, more and more people prefer shopping on e-commerce platforms. As agricultural product e-commerce platforms develop, consumers find it hard to discover suitable products among numerous choices. To enhance user satisfaction and purchase intent, these platforms need to recommend appropriate products based on user preferences. Agricultural recommendation involves features such as season, region, user interests, and product attributes, and modeling the interactions among these features can better capture user demands. This study introduces a new model, fine-grained feature interaction selection networks (FgFisNet). By adding fine-grained interaction layers and feature interaction selection layers, the model learns feature interactions effectively using both the inner product and the Hadamard product. During training, it automatically identifies important feature interactions and eliminates redundant ones. It then feeds the important interactions, together with first-order features, into a deep neural network to obtain the final click-through rate (CTR) prediction. Extensive experiments on a real dataset from agricultural e-commerce demonstrate that the proposed FgFisNet method achieves significant economic benefits.
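The two interaction types named in the abstract differ in output shape. An inner product reduces a pair of field embeddings to a scalar, while a Hadamard product keeps a vector. The following minimal sketch computes both for every field pair and then keeps the top-k pairs. The embeddings and importance scores are hypothetical. In FgFisNet the scores would be learned during training, and the selection mechanism is not detailed in the abstract.

```python
def inner_product(u, v):
    return sum(a * b for a, b in zip(u, v))

def hadamard(u, v):
    return [a * b for a, b in zip(u, v)]

def pairwise_interactions(embeddings):
    """For each field pair (i, j): (scalar inner product, element-wise Hadamard vector)."""
    names = sorted(embeddings)
    out = {}
    for a in range(len(names)):
        for b in range(a + 1, len(names)):
            u, v = embeddings[names[a]], embeddings[names[b]]
            out[(names[a], names[b])] = (inner_product(u, v), hadamard(u, v))
    return out

def select_top_k(interactions, scores, k):
    """Keep the k field pairs with the highest (hypothetical, learned) importance scores."""
    ranked = sorted(interactions, key=lambda pair: scores[pair], reverse=True)
    return {pair: interactions[pair] for pair in ranked[:k]}

emb = {  # toy 3-dimensional field embeddings
    "season": [1.0, 0.0, 2.0],
    "region": [0.5, 1.0, 0.0],
    "category": [2.0, 1.0, 1.0],
}
inter = pairwise_interactions(emb)
scores = {("category", "region"): 0.9, ("category", "season"): 0.1, ("region", "season"): 0.5}
kept = select_top_k(inter, scores, k=2)
print(kept)
```

The kept interactions, concatenated with the first-order embeddings, would form the input to the deep network that outputs the CTR.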
Abstract: Existing image quality assessment methods rarely make full use of the color coding mechanisms of the human retina and visual cortex, and they do not fully consider how color information affects image quality. To address these problems, this study proposes an objective model for assessing the color harmony of color fused images produced from visible-light (low-light) and infrared images, based on multiple visual features. The model incorporates more color information into image quality assessment by jointly considering several visual features of the human eye. These are visual color contrast, color information fluctuation, and high-level visual content. Through feature fusion and support vector regression training, the model performs objective assessment of the color harmony of color fused images. Experimental comparisons and analyses are conducted on fused image databases covering three typical scenes. The results show that, compared with eight existing objective image quality assessment methods, the proposed method is more consistent with human subjective perception and achieves higher prediction accuracy.
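The abstract does not name its consistency metrics. The Pearson (PLCC) and Spearman rank-order (SROCC) correlations between predicted scores and subjective scores are the usual choices in image quality assessment, so this sketch assumes them. SROCC is the Pearson correlation of the rank vectors, with tied values sharing their average rank. Many studies also fit a nonlinear logistic mapping before computing PLCC; that step is omitted here.

```python
import math

def pearson(x, y):
    """PLCC: linear correlation between predicted and subjective scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def spearman(x, y):
    """SROCC: Pearson correlation of the rank vectors (monotonic agreement)."""
    return pearson(ranks(x), ranks(y))

mos = [1, 2, 3, 4]          # toy subjective mean opinion scores
pred = [1, 4, 9, 16]        # toy model predictions: monotonic but not linear
print(round(pearson(mos, pred), 3), spearman(mos, pred))
```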
Abstract: Network function virtualization (NFV) enables network services, instantiated as service function chains (SFCs), to share the underlying network, which alleviates the rigidity of traditional network architectures. However, the large number of service requests in the network poses new challenges for multi-domain SFC orchestration. First, each domain keeps its intra-domain resource information and internal policies private, which makes multi-domain SFC orchestration more complicated. Second, multi-domain SFC orchestration requires determining the optimal set of candidate orchestration domains. Previous studies rarely considered inter-domain load balance, and this lowered the service acceptance rate. In addition, orchestrating service requests across network domains imposes more stringent requirements on service cost and response time. To address these challenges, this study proposes a method for constructing domain-level graphs that meets the privacy requirements of multi-domain networks. It then proposes a domain weight calculation method based on inter-domain load balance for selecting SFC orchestration domains. Finally, it proposes an orchestration algorithm that accounts for the cost and response time requirements of multi-domain networks. Experimental results show that the proposed algorithm effectively balances average service cost against acceptance rate and also reduces the average service response time.
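The following minimal sketch shows how a load-aware domain weight can steer domain selection on a domain-level graph. The graph exposes only which domains are adjacent, not their internal topology. It is a hypothetical illustration, not the paper's method. The weight formula, unit cost inflated by utilization, and the four-domain topology are assumptions. Dijkstra's algorithm then picks the chain of domains with the lowest total weight.

```python
import heapq

def domain_weight(load, capacity, cost_per_unit, beta=1.0):
    """Hypothetical weight: unit cost scaled by (1 + beta * utilization),
    so heavily loaded domains look more expensive and are avoided."""
    return cost_per_unit * (1.0 + beta * load / capacity)

def cheapest_domain_path(links, weights, src, dst):
    """Dijkstra where entering a domain adds that domain's weight.
    links only lists inter-domain adjacency, hiding intra-domain details."""
    dist = {src: weights[src]}
    prev = {src: None}
    heap = [(weights[src], src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            path = []
            while u is not None:
                path.append(u)
                u = prev[u]
            return path[::-1], d
        if d > dist[u]:
            continue                      # stale entry
        for v in links.get(u, []):
            nd = d + weights[v]
            if nd < dist.get(v, float("inf")):
                dist[v], prev[v] = nd, u
                heapq.heappush(heap, (nd, v))
    return None, float("inf")

links = {"A": ["B", "C"], "B": ["A", "D"], "C": ["A", "D"], "D": ["B", "C"]}
weights = {
    "A": domain_weight(10, 100, 1.0),
    "B": domain_weight(90, 100, 1.0),   # cheap per unit but nearly saturated
    "C": domain_weight(10, 100, 1.2),   # pricier per unit but lightly loaded
    "D": domain_weight(0, 100, 1.0),
}
path, cost = cheapest_domain_path(links, weights, "A", "D")
print(path, round(cost, 2))
```

With the load term included, the lightly loaded domain C is chosen over the nominally cheaper but saturated domain B. Spreading load in this way is what helps keep the acceptance rate up.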
Abstract: This study proposes an algorithm named DPCP-CROSS-JOIN for fast co-spatiotemporal relationship join queries over large-scale trajectory data in environments with insufficient cluster computing resources. The algorithm discretizes continuous trajectory data by segmenting and cross-coding the temporal fields and applying spatial grid coding. It then stores the data in two-level partitions by date and grid region code. Through cross “equivalent” join queries, it achieves 3-level indexing and 4-level acceleration of spatiotemporal join queries. As a result, the time complexity of co-spatiotemporal relationship join queries among n·n objects is reduced from O(n²) to O(n log n). When Hive and Tez are used on a Hadoop cluster for join queries of large-scale trajectory data, the algorithm improves join query efficiency by up to 30.66 times. Because it uses time-slice and grid codes as the join condition, the algorithm avoids evaluating complex expressions in real time during the join. Moreover, replacing complex-expression joins with “equivalent” joins improves the parallelism of MapReduce tasks and raises the utilization of cluster storage and computing resources. Similar tasks on larger trajectory datasets, which are almost impossible to complete with general optimization methods, can still be completed by the proposed algorithm within a few minutes. The experimental results suggest that the algorithm is efficient and stable. It is especially suitable for co-spatiotemporal relationship join queries of large-scale trajectory data when computing resources are insufficient. It can also serve as an atomic algorithm for searching accompanying spatiotemporal trajectories and for determining the intimacy of relationships among objects. It can be widely applied in fields such as national security and social order maintenance, crime prevention and combat, and urban and rural planning support.
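The following minimal sketch shows why discrete codes make the join cheap. It assumes projected coordinates in meters, 100 m grid cells, and 60 s time slices, all of which are hypothetical parameters. Each point becomes an integer key, one side is hash-indexed, and the other side probes that index with exact-match keys, so no distance expression is evaluated during the join. As a further assumption, "cross-coding" is read here as expanding each probe key to adjacent slices and cells, so that neighbours split by a boundary still match. Candidate pairs would then be checked with exact distance and time tests. An in-memory hash join stands in for Hive/Tez partitions.

```python
from collections import defaultdict

def encode(x, y, t, cell=100, slice_s=60):
    """Discretize a point into integer (time slice, grid column, grid row) codes."""
    return (t // slice_s, x // cell, y // cell)

def neighbour_keys(key):
    """Assumed cross-coding: the key plus its adjacent time slices and grid cells."""
    ts, gx, gy = key
    return [(ts + dt, gx + dx, gy + dy)
            for dt in (-1, 0, 1) for dx in (-1, 0, 1) for dy in (-1, 0, 1)]

def co_occurrence_join(points_a, points_b):
    """Equi-join on discrete keys. points: (object_id, x_m, y_m, t_s).
    Returns candidate (id_a, id_b) pairs that were near each other in space and time."""
    index = defaultdict(set)
    for oid, x, y, t in points_a:
        index[encode(x, y, t)].add(oid)
    pairs = set()
    for oid, x, y, t in points_b:
        for k in neighbour_keys(encode(x, y, t)):
            for other in index.get(k, ()):
                if other != oid:
                    pairs.add((other, oid))
    return pairs

a = [("a1", 150, 150, 30)]
b = [("b1", 210, 160, 70), ("b2", 5000, 5000, 30)]
print(co_occurrence_join(a, b))
```

Here a1 and b1 fall in different time slices and grid columns but still meet through the expanded keys, while b2 is never compared with a1. The hash join is expected near-linear. The O(n log n) bound in the abstract corresponds to sort-based equi-joins.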
Abstract: Selecting appropriate optimizers for a federated learning environment is an effective way to improve model performance, especially when the data are highly heterogeneous. This study mainly investigates the FedAvg and FedALA algorithms and proposes an improved version called pFedALA. By allowing clients to continue local training while they wait, pFedALA reduces the resource waste caused by synchronization demands. The roles of the optimizers in these three algorithms are then analyzed in detail. The study compares how optimizers such as stochastic gradient descent (SGD), Adam, averaged SGD (ASGD), and AdaGrad handle non-independent and identically distributed (Non-IID) and imbalanced data on the MNIST and CIFAR-10 datasets. Special attention is given to two data settings: practical heterogeneity based on the Dirichlet distribution and extreme heterogeneity. The experimental results suggest the following: 1) pFedALA outperforms FedALA, with an average test accuracy about 1% higher; 2) optimizers commonly used in traditional single-machine deep learning perform very differently in a federated learning environment. Compared with other mainstream optimizers, SGD, ASGD, and AdaGrad appear more adaptable and robust in federated learning.
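The Dirichlet-based "practical heterogeneity" setting is commonly simulated per class. For each class, client proportions are drawn from Dir(alpha), and that class's samples are split accordingly. The following minimal sketch assumes this common scheme, since the paper's partition code is not given. A Dirichlet draw is obtained by normalizing independent Gamma(alpha, 1) samples. Smaller alpha yields more skewed, more Non-IID clients.

```python
import random

def dirichlet_partition(labels, n_clients, alpha, seed=0):
    """Split sample indices across clients; per class, shares ~ Dirichlet(alpha)."""
    rng = random.Random(seed)
    by_class = {}
    for idx, y in enumerate(labels):
        by_class.setdefault(y, []).append(idx)
    clients = [[] for _ in range(n_clients)]
    for idxs in by_class.values():
        rng.shuffle(idxs)
        # Dirichlet sample = independent Gamma(alpha, 1) draws, normalized to sum to 1
        g = [rng.gammavariate(alpha, 1.0) for _ in range(n_clients)]
        total = sum(g)
        props = [v / total for v in g] if total > 0 else [1.0 / n_clients] * n_clients
        # turn cumulative proportions into cut points over this class's indices
        cuts, acc = [], 0.0
        for p in props[:-1]:
            acc += p
            cuts.append(int(round(acc * len(idxs))))
        start = 0
        for c, end in enumerate(cuts + [len(idxs)]):
            clients[c].extend(idxs[start:end])
            start = end
    return clients

labels = [i % 10 for i in range(1000)]            # toy 10-class label list
parts = dirichlet_partition(labels, n_clients=5, alpha=0.5)
print([len(p) for p in parts])
```

The "extreme heterogeneity" setting would instead be built by hand, for example by giving each client only one or two classes.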
Abstract: High-resolution aerial images contain many highly similar scene categories. Classic deep learning classification methods run inefficiently because feature extraction generates redundant floating-point operations. FasterNet improves efficiency through partial convolution, but this weakens feature extraction and thus lowers the model’s classification accuracy. To address these problems, this study proposes a hybrid-structure classification method that integrates FasterNet with an attention mechanism. Specifically, a “cross-shaped convolution module” partially extracts scene features to improve the model’s efficiency. Then, a dual-branch attention mechanism that integrates coordinate attention and channel attention enables the model to extract features better. Finally, a residual connection links the “cross-shaped convolution module” and the dual-branch attention module so that more task-related features are learned during training. This reduces computational cost and improves efficiency as well as classification accuracy. The experimental results show that, compared with existing deep learning-based classification models, the proposed method has a short inference time and high accuracy. It has 19M parameters and an average inference time of 7.1 ms per image. Its classification accuracy on the public datasets NWPU-RESISC45, EuroSAT, VArcGIS (10%), and VArcGIS (20%) is 96.12%, 98.64%, 95.42%, and 97.87%, respectively, which is 2.06%, 0.77%, 1.34%, and 0.65% higher than that of FasterNet.
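The efficiency argument rests on partial convolution (PConv), introduced in FasterNet. Only the first c_p channels are convolved, and the rest pass through unchanged. The abstract does not specify the "cross-shaped convolution module", so this sketch covers only the PConv cost it builds on. It counts multiply-accumulates for a k × k convolution with stride 1 and 'same' padding. A partial ratio of 1/4 is assumed, a commonly used FasterNet setting.

```python
def conv_macs(h, w, k, c_in, c_out):
    """Multiply-accumulates of a k x k convolution, stride 1, 'same' padding."""
    return h * w * k * k * c_in * c_out

def pconv_macs(h, w, k, c, ratio=0.25):
    """Partial convolution: convolve only c_p = ratio * c channels, pass the rest through."""
    cp = int(c * ratio)
    return conv_macs(h, w, k, cp, cp)

full = conv_macs(56, 56, 3, 64, 64)
partial = pconv_macs(56, 56, 3, 64)
print(full / partial)
```

With a ratio of 1/4, PConv needs (1/4)² = 1/16 of the operations of a regular convolution. This saving is what the abstract trades against the weaker feature extraction it then compensates for with attention.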
Abstract: This study addresses problems faced by the traditional U-Net in semantic segmentation of street scene images. These include low segmentation accuracy for objects in multi-scale categories and poor correlation of image context features. To this end, it proposes AS-UNet, an improved U-Net semantic segmentation network for accurate segmentation of street scene images. First, the spatial and channel squeeze & excitation block (scSE) attention module is integrated into U-Net. This guides the network to focus on segmentation-relevant semantic categories in both the channel and spatial dimensions and to extract more effective semantic information. Second, the atrous spatial pyramid pooling (ASPP) multi-scale feature fusion module is embedded into U-Net. It aggregates multi-scale feature maps for feature enhancement and captures the global context of the image. Finally, the cross-entropy loss and Dice loss are combined to address class imbalance in street scenes, which further improves segmentation accuracy. Experimental results show that the mean intersection over union (MIoU) of AS-UNet on the Cityscapes and CamVid datasets increases by 3.9% and 3.0%, respectively, compared with the traditional U-Net. The improved network significantly improves the segmentation of street scene images.
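A combined cross-entropy and Dice loss can be sketched on per-pixel class probabilities. Cross-entropy is averaged over pixels, so frequent classes dominate it. Soft Dice is averaged over classes, so a rare class such as a traffic sign counts as much as the road. The equal weighting of the two terms and the smoothing constant `eps` are assumptions, since the abstract gives neither.

```python
import math

def ce_dice_loss(probs, targets, w_ce=1.0, w_dice=1.0, eps=1e-6):
    """probs: per-pixel lists of class probabilities; targets: per-pixel class indices.
    Loss = w_ce * mean cross-entropy + w_dice * (1 - mean soft Dice over classes)."""
    n, n_classes = len(targets), len(probs[0])
    ce = -sum(math.log(max(p[t], eps)) for p, t in zip(probs, targets)) / n
    dice_scores = []
    for c in range(n_classes):
        inter = sum(p[c] for p, t in zip(probs, targets) if t == c)
        pred_sum = sum(p[c] for p in probs)
        true_sum = sum(1 for t in targets if t == c)
        dice_scores.append((2 * inter + eps) / (pred_sum + true_sum + eps))
    dice = sum(dice_scores) / n_classes
    return w_ce * ce + w_dice * (1.0 - dice)

perfect = ce_dice_loss([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]], [0, 1, 0])
imperfect = ce_dice_loss([[0.6, 0.4], [0.3, 0.7], [0.8, 0.2]], [0, 1, 0])
print(perfect, round(imperfect, 4))
```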
Abstract: The convolutional neural network (CNN), an important component of U-Net baseline networks in medical image segmentation, mainly models the relationships among local features. Transformer is a visual model that effectively captures long-range dependencies among features. Previous studies show that combining Transformer with CNNs can improve the accuracy of medical image segmentation to a certain extent. However, labeled medical image data are scarce, whereas training a Transformer requires large amounts of data, and Transformer models are also time-consuming to train and have many parameters. In light of these considerations, this paper proposes LM-UNet, a novel medical image segmentation model based on a hybrid multi-layer perceptron (MLP) network, which combines a multi-scale hybrid MLP with a CNN on the basis of the UNeXt model. This model effectively strengthens the connection between local and global information and the fusion of features. Experiments on multiple datasets show significantly improved segmentation performance: on the International Skin Imaging Collaboration (ISIC) 2018 dataset, LM-UNet achieves an average Dice coefficient of 92.58% and an average intersection over union (IoU) of 86.52%, which are 3% and 3.5% higher than those of UNeXt, respectively. On the Osteoarthritis Initiative-Zuse Institute Berlin two-dimensional (OAI-ZIB 2D) and Breast Ultrasound Image (BUSI) datasets, its average Dice coefficients are 2.5% and 1.0% higher than those of UNeXt, respectively. In summary, LM-UNet not only improves the accuracy of medical image segmentation but also generalizes better.
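For reference, the two metrics reported above can be computed from a binary prediction mask and a ground-truth mask as follows. This is a minimal sketch assuming masks flattened to 0/1 lists; it is not the authors' evaluation code. For a single image, Dice equals 2·IoU/(1 + IoU), so averages taken over a dataset need not satisfy this identity exactly.

```python
def dice_and_iou(pred, target):
    """Dice coefficient and IoU for binary masks given as flat 0/1 lists."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of pixels")
    tp = sum(1 for p, t in zip(pred, target) if p and t)      # true positives
    fp = sum(1 for p, t in zip(pred, target) if p and not t)  # false positives
    fn = sum(1 for p, t in zip(pred, target) if t and not p)  # false negatives
    if tp + fp + fn == 0:
        # Both masks are empty: treat as perfect agreement.
        return 1.0, 1.0
    dice = 2 * tp / (2 * tp + fp + fn)
    iou = tp / (tp + fp + fn)
    return dice, iou
```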
Abstract: Accurate wind power prediction is of great significance for improving the efficiency and safety of power systems, but the intermittency and randomness of wind energy make it difficult. Therefore, an improved Informer-based wind power prediction model, PCI-Informer (PATCH-CNN-IRFFN-Informer), is proposed. The sequence data are divided into subsequence-level patches for feature extraction and integration, which improves the model's ability to process sequence data. A multi-scale causal convolution self-attention mechanism fuses local features at multiple scales, enhancing the model's understanding and modeling of local information. An inverted residual feed-forward network (IRFFN) is introduced to enhance the model's ability to extract and preserve local structural information. Experiments on data from a wind farm show that, compared with mainstream prediction models, PCI-Informer achieves better prediction performance at different prediction horizons, reducing MAE by 11.1% on average relative to Informer and thus improving short-term wind power prediction accuracy.
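Subsequence-level patching, as described above, can be sketched as slicing the input window into fixed-length, possibly overlapping segments, each of which then becomes one token for the encoder. The patch length and stride below are hypothetical (the abstract does not give PCI-Informer's values), and the helpers show the MAE metric and the relative reduction behind the reported 11.1% figure.

```python
def make_patches(series, patch_len, stride):
    """Split a 1-D sequence into (possibly overlapping) subsequence-level patches."""
    if patch_len <= 0 or stride <= 0:
        raise ValueError("patch_len and stride must be positive")
    # Each patch starts `stride` steps after the previous one; a trailing
    # remainder shorter than patch_len is dropped.
    return [series[i:i + patch_len]
            for i in range(0, len(series) - patch_len + 1, stride)]


def mae(y_true, y_pred):
    """Mean absolute error between two equal-length sequences."""
    return sum(abs(a - b) for a, b in zip(y_true, y_pred)) / len(y_true)


def relative_reduction(baseline_error, new_error):
    """Fractional error reduction of a new model relative to a baseline."""
    return (baseline_error - new_error) / baseline_error
```

With a stride smaller than the patch length, neighboring patches overlap, so local patterns that straddle a patch boundary still appear intact in at least one token.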
Abstract: GSNet relies on graspness to distinguish graspable areas in cluttered scenes, which significantly improves the accuracy of robotic grasp pose detection in such scenes. However, GSNet uses only a fixed-size cylinder to determine the grasp pose parameters and ignores the influence of features at different scales on grasp pose estimation. To address this problem, this study proposes a multi-scale cylinder attention feature fusion module (Ms-CAFF), which contains two core components: an attention fusion module and a gating unit. Ms-CAFF replaces the original feature extraction method in GSNet and uses an attention mechanism to effectively integrate the geometric features inside four cylinders of different sizes, thereby enhancing the network's ability to perceive geometric features at different scales. Experimental results on GraspNet-1Billion, a grasp pose detection dataset of large-scale cluttered scenes, show that introducing these modules increases the accuracy of the network's grasp poses by up to 10.30% and 6.65%. In addition, this study applies the network in real-world experiments to verify the method's effectiveness in real scenes.
Abstract: Statistical inference on network data has become a hot topic in statistical research in recent years. The independence assumption among samples in traditional models often fails to meet the analytical demands of modern network-linked data. This work studies the individual effect of each network node in network-linked data and, based on the idea of a fusion penalty, shrinks the individual effects of connected nodes toward each other. Knockoff variables are constructed to imitate the structure of the original covariates while being independent of the response given those covariates. Using knockoff variables, this study proposes a general variable selection framework for network-linked data (NLKF). It proves that NLKF controls the false discovery rate (FDR) at the target level and has higher statistical power than Lasso-based variable selection. When the covariance of the original data is unknown, NLKF with an estimated covariance matrix still has good statistical properties. Finally, an application in financial engineering is presented, using 200 factors for more than 4 000 stocks in the A-share market and a stock network constructed from Shenyin Wanguo's first-level industry classification.
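The FDR guarantee mentioned above comes from the knockoff filter's data-dependent threshold. The sketch below implements the standard knockoff+ threshold of Barber and Candès; it assumes the statistics W_j have already been computed (how NLKF derives them from its fusion-penalized model is not specified here), where a large positive W_j means variable j beat its knockoff copy.

```python
import math


def knockoff_plus_threshold(w, q):
    """Knockoff+ threshold for target FDR level q.

    Returns the smallest t > 0 among the nonzero |W_j| such that
    (1 + #{j: W_j <= -t}) / max(1, #{j: W_j >= t}) <= q, or inf if none exists.
    """
    # Candidate thresholds are the nonzero magnitudes |W_j|, in increasing order.
    for t in sorted({abs(x) for x in w if x != 0}):
        # Negative statistics estimate false discoveries; the +1 gives knockoff+.
        false_est = 1 + sum(1 for x in w if x <= -t)
        selected = max(1, sum(1 for x in w if x >= t))
        if false_est / selected <= q:
            return t
    return math.inf


def knockoff_select(w, q):
    """Indices of variables selected by the knockoff+ filter at FDR level q."""
    t = knockoff_plus_threshold(w, q)
    return [j for j, x in enumerate(w) if x >= t]
```

The design relies on the sign symmetry of W_j for null variables: a null is as likely to lose to its knockoff as to beat it, so the count of large negative statistics estimates how many large positive ones are false.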
Abstract: As an important direction in artificial intelligence, spiking neural networks have received extensive attention in neuromorphic engineering and brain-inspired computing. To address the poor generalization and the high memory and time consumption of spiking neural networks, this study proposes a spiking-neural-network-based classification method for spatio-temporal interactive images. Specifically, a temporal efficient training algorithm is introduced to compensate for the momentum loss in gradient descent. Then, the spatial learning through time algorithm is integrated to improve the network's ability to process information efficiently. Finally, a spatial attention mechanism is added to enable the network to better capture important features in the spatial dimension. The experimental results show that on CIFAR10, DVS Gesture, and CIFAR10-DVS, training memory usage is reduced by 46.68%, 48.52%, and 10.46%, respectively, and training speed is increased by 2.80, 1.31, and 2.76 times, respectively. These results indicate that the proposed method effectively improves network performance while maintaining accuracy.
Abstract: Abstractive neural networks have made significant progress in text summarization. However, owing to its flexibility, abstractive summarization is prone to generating summaries of poor fidelity that may even deviate from the semantic essence of the source documents. To address this issue, this study proposes two methods to improve summary fidelity. In Method 1, since entities play an important role in summaries and usually come from the source document, the model is allowed to copy entities from the source document, which ensures that generated entities match those in the source and prevents inconsistent entities. In Method 2, to better prevent the generated summary from semantically deviating from the source text, key entities and key tokens are used as two types of guiding information at different levels of granularity during summary generation. The proposed methods are evaluated with the ROUGE metric on two widely used summarization datasets, CNNDM and XSum. The experimental results demonstrate that both methods significantly improve model performance. Furthermore, the experiments show that the entity copy mechanism can, to some extent, correct the semantic noise introduced by the guiding information.
Abstract: Existing Siamese-network object trackers fuse template features and search features only once, which makes the object features on the fused feature map relatively coarse and hinders precise localization. In this study, a serial cross-correlation module is designed. It uses an existing cross-correlation method to perform multiple cross-correlation operations between the template features and the search features, thereby enhancing the object features on the fused feature map, improving the accuracy of the subsequent classification and regression results, and balancing speed and accuracy with fewer parameters. The experimental results show that the proposed method achieves good results on four mainstream tracking datasets.
Abstract: This study explores the complex process of opinion formation in social networks, focusing on how consensus is reached in decentralized environments. A novel opinion classification strategy, termed “the second confidence interval”, is proposed to improve the traditional DeGroot consensus model, and two distinct opinion dynamics models are developed: the far attack inbreeding (FAI) model and the outbred recent attack (ORA) model. These models comprehensively consider the degree to which individuals accept and emphasize surrounding opinions. In addition, through an in-depth analysis of neighborhood opinions in social networks, the individual model is set up comprehensively, covering factors such as private opinions, expressed opinions, obstinacy, and preferences. The results indicate that under specific parameter settings, both FAI and ORA reach consensus faster than the original DeGroot model. Specifically, ORA converges in about 700 steps, while the convergence speed of FAI gradually approaches that of ORA as parameter values increase. Compared with the baseline model, ORA exhibits smaller variations in converged opinion values, no more than 3.5%, whereas FAI is more volatile. These findings deepen the understanding of public opinion formation in social networks, highlight the significance of opinion dynamics within individual neighborhoods for consensus formation, and offer new perspectives and directions for future studies in this field.
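As a baseline for the models above, the classical DeGroot update replaces each agent's opinion with a weighted average of its neighbors' opinions, x(t+1) = W x(t), where W is row-stochastic. The sketch below shows only this baseline; the FAI and ORA weighting rules built on “the second confidence interval” are not specified in the abstract and are not reproduced.

```python
def degroot(weights, opinions, tol=1e-6, max_steps=10_000):
    """Iterate the DeGroot update x(t+1) = W x(t) until opinions stop changing.

    weights:  row-stochastic trust matrix (each row sums to 1)
    opinions: initial opinion vector
    Returns (final opinions, number of steps taken).
    """
    x = list(opinions)
    n = len(x)
    for step in range(1, max_steps + 1):
        # Each agent adopts the trust-weighted average of all opinions.
        new_x = [sum(weights[i][j] * x[j] for j in range(n)) for i in range(n)]
        if max(abs(a - b) for a, b in zip(new_x, x)) < tol:
            return new_x, step
        x = new_x
    return x, max_steps
```

For a strongly connected, aperiodic trust network the opinions converge to a common value π·x(0), where π is the left eigenvector of W with eigenvalue 1. For example, with W = [[0.5, 0.5, 0], [0.25, 0.5, 0.25], [0, 0.5, 0.5]] we have π = (1/4, 1/2, 1/4), so the initial opinions (0, 0.5, 1) converge to 0.5.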
Abstract: Training local models on resource-constrained edge devices is challenging in federated learning systems, because limitations in computing, storage, and energy consumption constrain the scale and effectiveness of the model. Traditional federated pruning methods prune the model during federated training, but they cannot adapt pruning to the environment and may remove important parameters, which degrades model performance. This study proposes a distributed model pruning method based on federated reinforcement learning to solve this problem. Firstly, the model pruning process is abstracted as a Markov decision process, and the DQN algorithm is used to construct a universal reinforcement pruning model that dynamically adjusts the pruning rate and improves model generalization. Secondly, an aggregation method for sparse models is designed to reinforce and generalize the pruning method, optimize the model structure, and reduce its complexity. Finally, the method is compared with different baselines on multiple public datasets. The experimental results show that the proposed method maintains model effectiveness while reducing model complexity.
Abstract: Traditional models for predicting the corrosion rates of industrial pipelines often rely on manual experience for feature extraction and have insufficient generalization ability. To address this issue, this study combines the convolutional neural network (CNN) with the long short-term memory (LSTM) network and optimizes the combination with the cuckoo search (CS) algorithm, yielding the CNN-LSTM-CS model for predicting the corrosion rates of industrial pipelines. Specifically, the collected pipeline corrosion dataset is pre-processed by normalization. Then, the CNN extracts deep features of the factors affecting pipeline corrosion rates, and a CNN-LSTM prediction model is constructed by training the LSTM network. Finally, the CS algorithm optimizes the parameters of the prediction model, reducing the prediction error and yielding accurate corrosion rate predictions. The experimental results show that, compared with several typical corrosion rate prediction methods, the proposed method achieves higher prediction accuracy, providing a new approach for predicting the corrosion rates of industrial pipelines.
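The CS step above searches the prediction model's parameter space for the setting with the lowest error. The sketch below is a generic cuckoo search (Lévy flights generated with Mantegna's algorithm, plus abandonment of a fraction `pa` of nests) that minimizes an arbitrary function `f` over a box. In the paper's setting, `f` would be the CNN-LSTM validation error; here it is left to the caller. The defaults (`pa=0.25`, `beta=1.5`, `alpha=0.01`) are common choices, assumed rather than taken from the paper.

```python
import math
import random


def levy_step(beta, rng):
    """One Lévy-flight step length drawn with Mantegna's algorithm."""
    sigma_u = (math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
               / (math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))) ** (1 / beta)
    u = rng.gauss(0.0, sigma_u)
    v = rng.gauss(0.0, 1.0)
    return u / max(abs(v), 1e-12) ** (1 / beta)


def cuckoo_search(f, bounds, n_nests=15, pa=0.25, alpha=0.01, beta=1.5,
                  iterations=200, seed=0):
    """Minimize f over a box; returns (best point, best value, best-so-far history)."""
    rng = random.Random(seed)

    def clip(x):
        return [min(max(xi, lo), hi) for xi, (lo, hi) in zip(x, bounds)]

    nests = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_nests)]
    fitness = [f(x) for x in nests]
    i_best = min(range(n_nests), key=fitness.__getitem__)
    best, best_f = nests[i_best][:], fitness[i_best]
    history = [best_f]

    for _ in range(iterations):
        # Global search: Lévy flight from each nest, scaled by its distance to the best.
        for i in range(n_nests):
            cand = clip([xi + alpha * levy_step(beta, rng) * (xi - bi)
                         for xi, bi in zip(nests[i], best)])
            fc = f(cand)
            j = rng.randrange(n_nests)  # the new egg replaces a random nest if better
            if fc < fitness[j]:
                nests[j], fitness[j] = cand, fc
        # Local search: each nest is discovered with probability pa and rebuilt
        # by a random walk along the difference of two randomly chosen nests.
        perm1 = rng.sample(range(n_nests), n_nests)
        perm2 = rng.sample(range(n_nests), n_nests)
        for i in range(n_nests):
            if rng.random() < pa:
                r = rng.random()
                cand = clip([xi + r * (a - b) for xi, a, b in
                             zip(nests[i], nests[perm1[i]], nests[perm2[i]])])
                fc = f(cand)
                if fc < fitness[i]:  # greedy: keep the better nest
                    nests[i], fitness[i] = cand, fc
        i_best = min(range(n_nests), key=fitness.__getitem__)
        if fitness[i_best] < best_f:
            best, best_f = nests[i_best][:], fitness[i_best]
        history.append(best_f)
    return best, best_f, history
```

The heavy-tailed Lévy steps occasionally produce long jumps that help escape local minima, while the abandonment step refines solutions locally; both updates are greedy, so the best-so-far error never increases.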
Abstract: During peer evaluation, evaluators may give inaccurate scores because of strategic evaluation. Taking into account the evaluators' social interest (SI) relations, this study proposes a prediction method named graph attention network-social interest relation-oriented attention network (GAT-SIROAN) that integrates SI and the GAT. The method consists of a weighted network, SIROAN, which represents the relations between evaluators and solutions, and a GAT that predicts peer evaluation scores. In SIROAN, the interrupted time-series analysis (ITSA) method is applied to define two characteristics of each evaluator, self-evaluation ability and peer evaluation ability, and these two characteristics are compared to obtain the SI factors and relations among evaluators. In the score prediction stage, to account for the importance of each node, a self-attention mechanism computes attention coefficients at the nodes, which improves prediction ability. Network parameters are learned by minimizing the root mean square error (RMSE) to obtain more accurate predicted peer evaluation scores. GAT-SIROAN is compared experimentally with five baseline methods, namely mean, median, PeerRank, RankwithTA, and GCN-SOIN, on real datasets. The results show that GAT-SIROAN outperforms all these baselines in terms of RMSE.
Abstract: MonteCloPi is an anytime subgroup discovery algorithm based on Monte Carlo tree search (MCTS). It builds an asymmetric best-first search tree to discover a diverse, high-quality pattern set using MCTS policies, but it is limited to binary targets. To remove this limitation, this study exploits the characteristics of numerical targets to extend MonteCloPi to numerical targets. The study selects an appropriate C value for the upper confidence bound (UCB) formula, dynamically adjusts the expansion weight of each sample, prunes the search tree, and uses the adaptive top-k-mean-update policy. Finally, experimental results on the UCI datasets and the National Health and Nutrition Examination Survey (NHANES) audiometry datasets show that the proposed algorithm outperforms other algorithms in discovering diverse, high-quality pattern sets and in the interpretability of the best subgroup.
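The C value mentioned above is the exploration constant in the UCB1 rule that MCTS uses to choose which child node to descend into. The sketch below assumes rewards (here, subgroup quality scores) normalized to [0, 1] and represents each child as a hypothetical `(total_reward, visits)` pair. The quality measure for numerical targets used by the extended MonteCloPi is not reproduced.

```python
import math


def ucb_score(total_reward, visits, parent_visits, c):
    """UCB1 score: mean reward plus an exploration bonus weighted by c."""
    if visits == 0:
        return math.inf  # always try unvisited children first
    return total_reward / visits + c * math.sqrt(math.log(parent_visits) / visits)


def select_child(children, parent_visits, c=math.sqrt(2)):
    """children: list of (total_reward, visits) pairs; returns the index to descend into."""
    return max(range(len(children)),
               key=lambda i: ucb_score(children[i][0], children[i][1], parent_visits, c))
```

Larger C values favor rarely visited children; with C = 0 the rule becomes purely greedy on mean reward. Because the appropriate balance depends on the scale of the rewards, C has to be revisited when the quality measure changes, as it does when moving from binary to numerical targets.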
Abstract: In short-text intent recognition, convolutional neural networks (CNN) have attracted considerable attention because they excel at extracting local information. However, they have difficulty capturing the global features of short-text corpora. To address this issue, this study combines the strengths of TextCNN and BiGRU-att in a dual-channel short-text intent recognition model, which leverages both local and global features to better recognize the intent of short texts and compensates for the CNN's inadequacy in capturing overall text features. The AB-CNN-BGRU-att model first uses ALBERT, a multi-layer bidirectional Transformer, to vectorize the input text, and then feeds these vectors separately into TextCNN and BiGRU networks to extract local and global features, respectively. The two types of features are fused, passed through fully connected layers, and fed into a Softmax function to produce the intent labels. The experimental results demonstrate that on the THUCNews_Title dataset, AB-CNN-BGRU-att achieves an accuracy (Acc) of 96.68% and an F1 score of 96.67%, outperforming other commonly used intent recognition models.
Abstract: This study analyzes the multivariate, nonlinear, and strongly coupled characteristics of permanent magnet synchronous motors (PMSM) in industrial applications, as well as the difficult parameter tuning, response delay, poor robustness, and limited adaptability of traditional PID control. A novel approach that combines the twin delayed deep deterministic policy gradient (TD3) algorithm with PID control is proposed to optimize PID parameter tuning for more accurate motor speed control. In this method, bidirectional long short-term memory networks (BiLSTM) are integrated into the Actor and Critic networks, significantly enhancing their ability to process time-series data describing the PMSM's dynamic behavior. This enables the system to accurately capture the current state and predict future trends, achieving more precise and adaptive self-tuning of PID parameters. Moreover, entropy regularization and curiosity-driven exploration further increase policy diversity, preventing premature convergence to suboptimal policies and encouraging in-depth exploration of unknown environments. To validate the method, a PMSM simulation model is built, and the proposed BiLSTM-TD3-ICE method is compared with the traditional TD3 and the classical Ziegler-Nichols (Z-N) method. The experimental results demonstrate the significant advantages of the proposed strategy in control performance.
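The quantity the TD3 agent tunes is the gain triple (Kp, Ki, Kd) of a conventional PID loop. The sketch below shows a discrete positional PID controller with a `set_gains` hook through which an external tuner could update the gains at each step, plus a hypothetical first-order plant standing in for the PMSM speed dynamics. The agent itself (BiLSTM actor and critic, entropy and curiosity terms) is not sketched, and the plant is an assumption, not the authors' PMSM model.

```python
import math


class PID:
    """Discrete positional PID: u = Kp*e + Ki*sum(e*dt) + Kd*(de/dt)."""

    def __init__(self, kp, ki, kd, dt, u_min=-math.inf, u_max=math.inf):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.dt = dt
        self.u_min, self.u_max = u_min, u_max
        self.integral = 0.0
        self.prev_error = 0.0

    def set_gains(self, kp, ki, kd):
        # Hook for an external tuner (e.g. an RL agent) to adjust gains online.
        self.kp, self.ki, self.kd = kp, ki, kd

    def step(self, setpoint, measurement):
        error = setpoint - measurement
        self.integral += error * self.dt
        derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        u = self.kp * error + self.ki * self.integral + self.kd * derivative
        # Saturate the output (this minimal version has no anti-windup).
        return min(max(u, self.u_min), self.u_max)


def simulate_first_order(pid, setpoint, tau=0.5, gain=1.0, steps=2000):
    """Hypothetical plant tau*dy/dt = -y + gain*u, integrated with forward Euler."""
    y = 0.0
    for _ in range(steps):
        u = pid.step(setpoint, y)
        y += pid.dt * (-y + gain * u) / tau
    return y
```

For example, `simulate_first_order(PID(kp=2.0, ki=5.0, kd=0.0, dt=0.01), setpoint=1.0)` settles at the setpoint: the integral term removes the steady-state error that a purely proportional controller would leave. In an RL-tuned loop, the agent would observe the tracking error history and call `set_gains` before each `step`.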
Abstract: Optical coherence tomography (OCT) is a new type of non-contact, high-resolution ophthalmic diagnostic method that has become an important reference for doctors in the clinical diagnosis of eye diseases. Because early detection and clinical diagnosis of retinopathy are crucial, the time-consuming and laborious manual classification of diseases needs to be replaced. To this end, this study proposes a multi-class recognition method for retinal OCT images based on an improved MobileNetV2 neural network. The method uses feature fusion to process images and designs an attention enhancement mechanism to improve the network, greatly improving the classification accuracy of OCT images. Compared with the original algorithm, classification performance is significantly improved: the accuracy, recall, precision, and F1 score of the proposed model reach 98.3%, 98.44%, 98.94%, and 98.69%, respectively, exceeding the accuracy of manual classification. Such methods speed up diagnosis, reduce the burden on doctors, and improve diagnostic quality in practice, and they also provide a new direction for ophthalmic medical research.
Abstract: To address data sparsity in session recommendation systems, this study introduces a self-supervised graph convolution session recommendation model based on the attention mechanism (ATSGCN). The model first constructs three distinct views from the session sequences, namely a hypergraph view, an item view, and a session view, to represent the high-order and low-order connections of sessions. Secondly, the hypergraph view uses hypergraph convolutional networks to capture high-order relationships among items within a session. The item view and the session view use graph convolutional networks and attention mechanisms, respectively, to capture low-order connection details within local session data at the item and session levels. Finally, self-supervised learning maximizes the mutual information between the session representations learned by the two encoders, which effectively improves recommendation performance. Comparative experiments on the Nowplaying and Diginetica public datasets demonstrate that the proposed model outperforms the baseline models.
Abstract: Spatiotemporal forecasting has extensive applications in domains such as pollution management, transportation, energy, and meteorology. Predicting PM2.5 concentration, a quintessential spatiotemporal forecasting task, requires analyzing and exploiting the spatiotemporal dependencies in air quality data. Existing spatiotemporal graph neural networks (ST-GNNs) build adjacency matrices either from predefined heuristic rules or from trainable parameters, and both approaches struggle to represent the true relationships between stations accurately. This study introduces the adaptive hierarchical graph convolutional neural network (AHGCNN) to address these issues in PM2.5 prediction. Firstly, a hierarchical mapping graph convolutional architecture is introduced that uses distinct self-learned adjacency matrices at different hierarchical levels, efficiently uncovering the unique spatiotemporal dependencies among monitoring stations. Secondly, an attention-based aggregation mechanism connects the adjacency matrices across hierarchical levels, accelerating convergence. Finally, the hidden spatial states are fused with a gated recurrent unit (GRU), forming a unified predictive framework that captures multi-level spatial and temporal dependencies simultaneously and outputs the predictions. In the experiments, the proposed model is compared with seven mainstream models. The results indicate that the model effectively captures the spatiotemporal dependencies between air monitoring stations and improves prediction accuracy.
Abstract: To address the scarcity of samples and the varying sizes of surface defects on steel strips in industrial scenarios, this study proposes a surface defect detection network for steel strips that is readily applicable to few-shot situations. Specifically, based on the you only look once version 5 small (YOLOv5s) framework, a multi-scale path aggregation network with an attention mechanism is designed as the neck of the model to enhance its ability to predict defect objects at multiple scales. Then, a self-adaptive decoupled detection structure is proposed to alleviate the conflict between classification and localization tasks in few-shot scenarios. Finally, a bounding box regression loss function fused with the Wasserstein distance is presented to improve the model's accuracy in detecting small defect objects. Experiments show that the proposed model outperforms other few-shot object detection models on the few-shot dataset of steel strip surface defects, indicating that it is more suitable for few-shot defect detection tasks in industrial environments.
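One published way to fuse a Wasserstein term into box regression for small objects is the normalized Gaussian Wasserstein distance (NWD). NWD models each box (cx, cy, w, h) as a 2-D Gaussian with mean (cx, cy) and covariance diag(w²/4, h²/4). The sketch below assumes this formulation and a hypothetical linear blend with an IoU loss. The abstract does not say which Wasserstein variant, normalizing constant `c`, or blend `ratio` the authors use, so both parameter values here are placeholders.

```python
import math


def box_iou(a, b):
    """IoU of two boxes given as (cx, cy, w, h)."""
    ax1, ay1, ax2, ay2 = a[0] - a[2] / 2, a[1] - a[3] / 2, a[0] + a[2] / 2, a[1] + a[3] / 2
    bx1, by1, bx2, by2 = b[0] - b[2] / 2, b[1] - b[3] / 2, b[0] + b[2] / 2, b[1] + b[3] / 2
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def nwd(a, b, c):
    """Normalized Gaussian Wasserstein distance between two (cx, cy, w, h) boxes."""
    # Squared 2-Wasserstein distance between N((cx, cy), diag(w^2/4, h^2/4)) Gaussians.
    w2_sq = ((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2
             + ((a[2] - b[2]) / 2) ** 2 + ((a[3] - b[3]) / 2) ** 2)
    return math.exp(-math.sqrt(w2_sq) / c)


def fused_box_loss(pred, gt, ratio=0.5, c=12.8):
    """Hypothetical blend of IoU loss and NWD loss; ratio and c are assumptions."""
    return (1 - ratio) * (1 - box_iou(pred, gt)) + ratio * (1 - nwd(pred, gt, c))
```

The motivation for such a blend is that IoU drops to zero as soon as two tiny boxes stop overlapping, regardless of how far apart they are. NWD, by contrast, still decreases smoothly with center distance and therefore keeps providing a useful training signal.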
Abstract: Previous deep-learning methods for precipitation nowcasting try to model the spatiotemporal evolution of radar echoes in a unified architecture. However, these methods may struggle to fully capture complex spatiotemporal relationships. This study proposes a two-stage precipitation nowcasting network based on the Halo attention mechanism, which divides the spatiotemporal evolution of precipitation into two stages: motion trend prediction and spatial appearance reconstruction. Firstly, a learnable optical flow module models the motion trend of radar echoes and generates coarse predictions. Secondly, a feature reconstruction module models the spatial appearance changes in historical radar echo sequences and refines the spatial appearance of the coarse predictions, generating fine-grained radar echo maps. Experimental results on the CIKM dataset demonstrate that the proposed method outperforms mainstream methods. The average Heidke skill score and critical success index improve by 4.60% and 3.63%, reaching 0.48 and 0.45, respectively. The structural similarity index improves by 4.84%, reaching 0.52, and the mean squared error decreases by 6.13%, reaching 70.23.
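The Heidke skill score (HSS) and critical success index (CSI) above are computed from a 2×2 contingency table obtained by thresholding predicted and observed radar echo maps. The sketch assumes flat lists of echo intensities and a caller-chosen threshold; the thresholds and averaging used for the CIKM results are not given in the abstract.

```python
def contingency(pred, obs, threshold):
    """Count (hits, false alarms, misses, correct negatives) at an intensity threshold."""
    a = b = c = d = 0
    for p, o in zip(pred, obs):
        p_event, o_event = p >= threshold, o >= threshold
        if p_event and o_event:
            a += 1  # hit
        elif p_event:
            b += 1  # false alarm
        elif o_event:
            c += 1  # miss
        else:
            d += 1  # correct negative
    return a, b, c, d


def csi_and_hss(hits, false_alarms, misses, correct_negatives):
    """CSI = a/(a+b+c); HSS = 2(ad - bc) / [(a+c)(c+d) + (a+b)(b+d)]."""
    a, b, c, d = hits, false_alarms, misses, correct_negatives
    csi = a / (a + b + c) if (a + b + c) else 0.0
    denom = (a + c) * (c + d) + (a + b) * (b + d)
    hss = 2 * (a * d - b * c) / denom if denom else 0.0
    return csi, hss
```

CSI ignores correct negatives, so it focuses on how well rain events themselves are captured. HSS instead measures accuracy relative to random chance, which makes it less sensitive to how rare heavy echoes are in the dataset.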
Abstract: Unlike appearance-based methods, whose input may introduce background noise, skeleton-based gait representation methods take key joints as input and can thus avoid this noise interference. However, most skeleton-based representation methods ignore the prior knowledge of human body structure or tend to focus on local features. This study proposes a skeleton-based gait recognition framework, GaitBody, to capture more distinctive features from gait sequences. Firstly, a temporal multi-scale convolution module with a large kernel size learns multi-granularity temporal information. Secondly, topology information of the human body is introduced into a self-attention mechanism to exploit spatial representations. Moreover, to make full use of temporal information, the most salient temporal information is generated and introduced into the self-attention mechanism. Experiments on the CASIA-B and OUMVLP-Pose datasets show that the method achieves state-of-the-art performance in skeleton-based gait recognition, and ablation studies confirm the effectiveness of the proposed modules.
Abstract: This study proposes a two-stage path planning method for a mobile robot performing inner-wall operations in multi-room environments. In the first stage, to address sensor failure caused by dust or fog during wall operations and incomplete path planning when a room has many exits, the study proposes a wall-following path planning method with automatic start-point selection, which generates wall-following paths offline from grid maps. In the second stage, to address dynamic obstacle avoidance during point-to-point path planning, it proposes a point-to-point path planning method based on the prioritized experience replay soft actor critic (PSAC) algorithm, which introduces a prioritized experience replay strategy into soft actor critic (SAC) to achieve dynamic obstacle avoidance. Comparison experiments on wall-following path planning and dynamic obstacle avoidance verify the effectiveness of the proposed method for indoor wall-following path planning and point-to-point path planning.
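The prioritized experience replay strategy mentioned above samples transition i with probability P(i) = p_i^α / Σ_k p_k^α, where p_i = |δ_i| + ε grows with the TD error δ_i. It corrects the resulting sampling bias with importance-sampling weights (N·P(i))^(-β). The sketch below is the proportional variant and uses a linear scan instead of the sum-tree found in efficient implementations. The α, β, and ε defaults are commonly quoted values, assumed here rather than taken from PSAC.

```python
import random


class PrioritizedReplay:
    """Proportional prioritized replay buffer (linear-time sketch, no sum-tree)."""

    def __init__(self, capacity, alpha=0.6, eps=1e-6, seed=0):
        self.capacity = capacity
        self.alpha = alpha
        self.eps = eps
        self.data = []
        self.priorities = []
        self.pos = 0
        self.rng = random.Random(seed)

    def add(self, transition):
        # New transitions get the current max priority so each is replayed at least once.
        p = max(self.priorities, default=1.0)
        if len(self.data) < self.capacity:
            self.data.append(transition)
            self.priorities.append(p)
        else:
            self.data[self.pos] = transition  # overwrite the oldest entry
            self.priorities[self.pos] = p
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size, beta=0.4):
        scaled = [p ** self.alpha for p in self.priorities]
        total = sum(scaled)
        probs = [s / total for s in scaled]
        n = len(self.data)
        idx = self.rng.choices(range(n), weights=probs, k=batch_size)
        # Importance-sampling weights, normalized by the largest possible weight.
        max_w = (n * min(probs)) ** (-beta)
        weights = [(n * probs[i]) ** (-beta) / max_w for i in idx]
        return idx, [self.data[i] for i in idx], weights

    def update_priorities(self, indices, td_errors):
        for i, delta in zip(indices, td_errors):
            self.priorities[i] = abs(delta) + self.eps
```

In a SAC update, the critic's squared TD errors would be multiplied by `weights` before averaging. The new |δ| values would then be passed back through `update_priorities`.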
Abstract: A new short-term power load forecasting method is proposed to address complex, non-stationary load data and large prediction errors. Firstly, the maximum information coefficient (MIC) is used to analyze the correlation of feature variables and to select the variables relevant to the power load sequence. Because the variational mode decomposition (VMD) method is susceptible to subjective parameter choices, the rime optimization algorithm (RIME) is employed to optimize VMD, which then decomposes the original power load sequence. Then, the long- and short-term time-series network (LSTNet) is improved as the prediction model by replacing its recurrent LSTM layer with BiLSTM and incorporating the convolutional block attention module (CBAM). Comparative and ablation experiments demonstrate that RIME-VMD reduces the root mean square error (RMSE) of the LSTM, GRU, and LSTNet models by more than 20%, significantly improving their prediction accuracy, and that it can be adapted to different prediction models. Compared with LSTM, GRU, and LSTNet, the proposed BLSTNet-CBAM model reduces the RMSE by 35.54%, 6.78%, and 1.46%, respectively, improving the accuracy of short-term power load forecasting.
Abstract: Current multi-modal emotion analysis of videos does not adequately consider how modality representation learning affects modality fusion and the final classification results. To this end, this study proposes a multi-modal emotion analysis model that integrates cross-modal representation learning. Firstly, BERT and LSTM are used to extract internal information from the text, audio, and visual modalities separately, followed by cross-modal representation learning to obtain more informative unimodal features. In the modality fusion stage, a gating mechanism is incorporated into an improved version of the traditional Transformer fusion mechanism to control the information flow more precisely. Experimental results on the public CMU-MOSI and CMU-MOSEI datasets demonstrate that the model improves accuracy and F1 score over traditional models, validating its effectiveness.
Abstract: UAV images contain many small targets against complex backgrounds, which easily leads to high false detection rates in target detection. To solve these problems, this study proposes a small target detection algorithm for UAV images based on high-order depthwise separable convolution. Firstly, by combining the CSPNet structure and the ConvMixer network, the study uses depthwise separable convolution kernels to obtain gradient combination information and introduces a C3 module with recursive gated convolution to improve the model's high-order spatial interaction ability and enhance the network's sensitivity to small targets. Secondly, the detection head is decoupled into two heads that output classification and localization information separately, which accelerates model convergence. Finally, the EIoU bounding box loss function is used to improve the accuracy of the predicted boxes. Experimental results on the VisDrone2019 dataset show that the model reaches a detection accuracy of 35.1% with significantly reduced missed and false detection rates, so it can be effectively applied to small target detection in UAV images. The model's generalization ability is tested on the DOTA 1.0 and HRSID datasets, and the results show that the model is robust.
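The EIoU loss mentioned above extends the IoU loss with three penalties: the squared distance between box centers, and the squared differences in width and in height, each normalized by the corresponding size of the smallest enclosing box. A minimal sketch for a single pair of boxes in (x1, y1, x2, y2) format follows. It implements the published EIoU definition and omits the focal weighting that is sometimes paired with it; batching and gradients, which a detector would handle with tensors, are left out.

```python
def eiou_loss(pred, gt, eps=1e-9):
    """EIoU loss for boxes given as (x1, y1, x2, y2) with x2 > x1 and y2 > y1."""
    px1, py1, px2, py2 = pred
    gx1, gy1, gx2, gy2 = gt
    pw, ph = px2 - px1, py2 - py1
    gw, gh = gx2 - gx1, gy2 - gy1
    # Plain IoU.
    iw = max(0.0, min(px2, gx2) - max(px1, gx1))
    ih = max(0.0, min(py2, gy2) - max(py1, gy1))
    inter = iw * ih
    iou = inter / (pw * ph + gw * gh - inter + eps)
    # Width, height, and squared diagonal of the smallest enclosing box.
    cw = max(px2, gx2) - min(px1, gx1)
    ch = max(py2, gy2) - min(py1, gy1)
    c2 = cw ** 2 + ch ** 2
    # Squared distance between box centers.
    rho2 = ((px1 + px2 - gx1 - gx2) / 2) ** 2 + ((py1 + py2 - gy1 - gy2) / 2) ** 2
    return (1 - iou
            + rho2 / (c2 + eps)                 # center-distance penalty
            + (pw - gw) ** 2 / (cw ** 2 + eps)  # width penalty
            + (ph - gh) ** 2 / (ch ** 2 + eps)) # height penalty
```

Penalizing width and height separately, rather than through an aspect-ratio term, gives the regressor a direct signal for each side length, which matters for small UAV targets where a one-pixel error is a large fraction of the box.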
Abstract: With the continuous evolution of computer technology, process simulation, which uses simulation models to mimic business process behavior, is being employed ever more widely across industries. It can also be used to predict and optimize system performance, assess the impact of decisions, provide a basis for managers' decision-making, and reduce experimental cost and time. Currently, how to efficiently build trustworthy simulation models has attracted widespread attention. This study reviews, summarizes, and analyzes the literature on methods for building business process simulation models. It presents the processes, advantages, disadvantages, and progress of process model-based, system dynamics-based, and deep learning-based simulation modeling approaches. Finally, the challenges and future directions of process simulation are discussed to provide references for future research in this field.
Abstract: The security of electric energy plays an important role in national security. With the development of power 5G communication, a large number of power terminals require positioning. The traditional global positioning system (GPS) is vulnerable to spoofing, so effectively improving GPS security has become an urgent problem. This study proposes a base-station-assisted GPS spoofing detection algorithm for power 5G terminals. It uses highly secure base station positioning to verify GPS positioning that may be spoofed and introduces a consistency factor (CF) to measure the consistency between GPS positioning and base station positioning. If CF is greater than a threshold, the GPS positioning is classified as spoofed; otherwise, it is judged as normal. The experimental results show that the algorithm achieves an accuracy of 99.98%, higher than that of traditional machine-learning-based classification algorithms, and that it is also faster than those algorithms.
Copyright: Institute of Software, Chinese Academy of Sciences Beijing ICP No. 05046678-3