Abstract: In relation extraction tasks, dependency trees or syntactic trees are usually built to obtain deeper and richer structural information. Graph neural networks, as powerful representation learning methods for graph-structured data, can better model such complex structures. This study surveys relation extraction methods based on graph neural networks, aiming to provide a deep understanding of the latest research progress and trends in this field. Firstly, it briefly introduces the classification and structure of relational graph neural networks; it then elaborates on the core technologies and application scenarios of relation extraction methods based on graph neural networks, including sentence-level and document-level methods and joint entity-relation extraction methods. The advantages, disadvantages, and performance of each method are analyzed and compared, and possible future research directions and challenges are discussed.
Abstract: Retinal blood vessel image segmentation provides valuable diagnostic assistance for various eye diseases such as glaucoma and diabetic retinopathy. Currently, deep learning, with its powerful ability to discover abstract features, is expected to meet the need to extract feature information from retinal blood vessel images for automatic segmentation, and it has become a research hotspot in this field. To better grasp the research progress in this field, this study summarizes the relevant datasets and evaluation metrics and elaborates on the application of deep learning in retinal blood vessel image segmentation. It focuses on the basic ideas, network structures, and improvements of various segmentation methods, analyzes the limitations and challenges faced by existing retinal blood vessel image segmentation methods, and looks ahead to future research directions in this field.
Abstract: With the development of the Internet and connectivity technology, the data generated by sensors is becoming increasingly complex. Deep learning methods have made great progress in anomaly detection for high-dimensional data. The graph deviation network (GDN) learns the relationships between sensor nodes to predict anomalies and has achieved promising results. Since the GDN model fails to handle the time dependence and instability of abnormal data, an external attention autoencoder based on GDN (AEEA-GDN) is proposed to extract features more deeply. In addition, an adaptive learning mechanism is introduced during model training to help the network better adapt to changes in abnormal data. Experimental results on three real-world sensor datasets show that the AEEA-GDN model detects anomalies more accurately than baseline methods and has better overall performance.
Abstract: Detecting outliers is crucial for practical applications in large and high-dimensional datasets. Outlier detection is the process of identifying data points that deviate from the typical data distribution. This process primarily involves density estimation. Substantial advancements are achieved by models like the deep autoencoder Gaussian mixture model, which initially reduces dimensionality and subsequently estimates density. However, it introduces noise into the low-dimensional latent space and faces limitations in optimizing the density estimation module, such as the requirement to ensure positive definiteness of the covariance matrix. To overcome these constraints, this study introduces the deep autoencoder normalizing flow (DANF) for unsupervised outlier detection. The model employs deep autoencoders to produce low-dimensional latent space representations and reconstruction errors for individual input samples. These outputs are subsequently fed into a normalizing flow (NF) for transformation into a Gaussian distribution. Experimental results on several widely recognized benchmark datasets reveal that the DANF model consistently surpasses state-of-the-art outlier detection methods. The most notable improvement is a remarkable 26.43% increase in the F1-score evaluation metric.
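The scoring step of such flow-based detectors can be illustrated with a minimal sketch: once a normalizing flow has mapped a sample's representation to an (approximately) standard Gaussian, the log-density under that Gaussian serves directly as a normality score, and low-density samples are flagged as outliers. This is a generic illustration of the idea, not the DANF implementation; the flow transform itself is omitted.

```python
import math

def gaussian_log_density(z):
    # log N(z; 0, I): log-density of a latent vector under a standard Gaussian.
    d = len(z)
    return -0.5 * (d * math.log(2 * math.pi) + sum(v * v for v in z))

def outlier_score(z):
    # Higher score = less likely under the Gaussian = more anomalous.
    return -gaussian_log_density(z)
```

A point far from the origin in the transformed space (e.g. `[3.0, 3.0]`) receives a higher outlier score than one near it, which is what thresholding for detection relies on.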
Abstract: Link prediction mines potential future relationships between nodes from known network topology and node attributes. It is an effective means of predicting missing links and identifying false links, and it has practical significance for studying the structural evolution of social networks. Traditional link prediction methods are based on the similarity of either node information or path information. However, the former considers a single index, which limits prediction accuracy, while the latter is unsuitable for large-scale networks due to its excessive computational complexity. Through an analysis of network topology, this study proposes a social network link prediction method based on the interacting degree of nodes (IDN). The method first introduces the concept of node efficiency, based on the path characteristics between nodes in the network, which improves the accuracy of link prediction between nodes without common neighbors. To further explore the relevant attributes of common neighbors, the method analyzes the topology of common neighbors between nodes and innovatively integrates path characteristics with local information to define the IDN of a social network, which accurately captures the degree of similarity between nodes and thus enhances link prediction ability. Finally, this study validates the IDN method on six real network datasets. The experimental results show that, compared with current mainstream algorithms, the proposed method achieves better prediction performance on both the AUC and Precision evaluation indexes, with average improvements of 22% and 54%, respectively. Therefore, the proposed node interacting degree is highly feasible and effective for link prediction.
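The node-efficiency idea can be sketched in a few lines: efficiency is commonly defined as the reciprocal of the shortest-path length, so even node pairs without common neighbors receive a nonzero similarity whenever a path exists. The sketch below illustrates this general notion only; it is not the paper's full IDN computation.

```python
from collections import deque

def shortest_path_length(adj, u, v):
    # BFS shortest-path length in an unweighted graph; None if unreachable.
    if u == v:
        return 0
    seen = {u}
    queue = deque([(u, 0)])
    while queue:
        node, dist = queue.popleft()
        for nb in adj[node]:
            if nb == v:
                return dist + 1
            if nb not in seen:
                seen.add(nb)
                queue.append((nb, dist + 1))
    return None

def node_efficiency(adj, u, v):
    # Efficiency = 1 / shortest-path length; 0 for disconnected pairs.
    d = shortest_path_length(adj, u, v)
    return 0.0 if d is None or d == 0 else 1.0 / d

# Illustrative path graph 1 - 2 - 3 - 4 (adjacency sets).
path_graph = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
```

Here nodes 1 and 4 share no common neighbors, yet their efficiency is 1/3, giving the predictor a usable signal.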
Abstract: Session-based recommendation algorithms statically model only a single user preference and fail to capture preference fluctuations caused by the environment, which reduces recommendation accuracy. Therefore, this study proposes a session recommendation method that integrates dual-branch dynamic preferences. Firstly, a heterogeneous hypergraph is used to model different types of information, and a dual-branch aggregation mechanism is designed to acquire and integrate the information in the heterogeneous hypergraph and learn the relationships among multiple types of nodes; a price-embedded enhancer is then used to strengthen the relationship between categories and prices. Secondly, a two-layer preference encoder is designed: a multi-scale temporal Transformer extracts the user's dynamic price preference, while a soft attention mechanism and reverse position encoding learn the user's dynamic interest preference. Finally, a gating mechanism integrates the user's multiple types of dynamic preferences to make recommendations. Experiments on two datasets, Cosmetics and Diginetica-buy, show significant improvements in the Precision and MRR evaluation metrics compared with other algorithms.
Abstract: Unlike appearance-based methods, whose input may introduce background noise, skeleton-based gait representation methods take key joints as input, which avoids such noise interference. Meanwhile, most skeleton-based representation methods ignore the significance of prior knowledge of human body structure or tend to focus on local features. This study proposes a skeleton-based gait recognition framework, GaitBody, to capture more distinctive features from gait sequences. Firstly, the study leverages a temporal multi-scale convolution module with a large kernel size to learn multi-granularity temporal information. Secondly, it introduces topology information of the human body into a self-attention mechanism to exploit spatial representations. Moreover, to make full use of temporal information, the most salient temporal information is generated and introduced into the self-attention mechanism. Experiments on the CASIA-B and OUMVLP-Pose datasets show that the method achieves state-of-the-art performance in skeleton-based gait recognition, and ablation studies show the effectiveness of the proposed modules.
Abstract: A new method for short-term power load forecasting is proposed to address issues such as complex and non-stationary load data and large prediction errors. Firstly, this study utilizes the maximal information coefficient (MIC) to analyze the correlation of feature variables and selects those relevant to the power load sequence. Meanwhile, since the variational mode decomposition (VMD) method is susceptible to subjective factors, the study employs the rime optimization algorithm (RIME) to optimize VMD and decompose the original power load sequence. Then, the long- and short-term time-series network (LSTNet) is improved as the prediction model by replacing the recurrent LSTM layer with BiLSTM and incorporating the convolutional block attention module (CBAM). Comparative and ablation experiments demonstrate that RIME-VMD reduces the root mean square error (RMSE) of the LSTM, GRU, and LSTNet models by more than 20%, significantly improving their prediction accuracy, and can be adapted to different prediction models. Compared with LSTM, GRU, and LSTNet, the proposed BLSTNet-CBAM model reduces the RMSE by 35.54%, 6.78%, and 1.46%, respectively, improving the accuracy of short-term power load forecasting.
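The comparisons above rest on the RMSE metric and relative reductions of it; both are standard formulas and can be computed as follows (a generic sketch, not tied to the models in the abstract):

```python
import math

def rmse(y_true, y_pred):
    # Root mean square error between actual and forecast load values.
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def pct_reduction(baseline, improved):
    # Relative RMSE reduction in percent, as quoted above (e.g. 35.54%).
    return 100.0 * (baseline - improved) / baseline
```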
Abstract: In current multi-modal emotion analysis of videos, the influence of modality representation learning on modality fusion and final classification results has not been adequately considered. To this end, this study proposes a multi-modal emotion analysis model that integrates cross-modal representation learning. Firstly, the study utilizes BERT and LSTM to extract internal information from the text, audio, and visual modalities separately, followed by cross-modal representation learning to obtain more information-rich unimodal features. In the modal fusion stage, a gating mechanism is fused into an improved traditional Transformer fusion mechanism to control the information flow more accurately. Experimental results on the publicly available CMU-MOSI and CMU-MOSEI datasets demonstrate that the accuracy and F1-score of the model are improved compared with traditional models, validating its effectiveness.
Abstract: This study proposes a two-stage path planning method for the inner-wall operation of a mobile robot in multi-room environments. In the first stage, to cope with sensor failures caused by dust or fog during wall operations and with incomplete path planning when a room has many exits, the study proposes a wall-following path planning method with automatic start-point selection, which generates wall-following paths offline based on grid maps. In the second stage, to handle dynamic obstacle avoidance during point-to-point path planning, it proposes a point-to-point path planning method based on the prioritized experience replay soft actor-critic (PSAC) algorithm, which introduces a prioritized experience replay strategy into the soft actor-critic (SAC) algorithm to achieve dynamic obstacle avoidance. Comparison experiments on wall-following path planning and dynamic obstacle avoidance verify the effectiveness of the proposed method in indoor wall-following and point-to-point path planning.
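The prioritized experience replay idea that PSAC adds to SAC can be sketched minimally: transitions are sampled with probability proportional to a power of their TD error, so informative experiences are replayed more often. Below is such a sketch; the capacity, the exponent alpha, and the simple oldest-first eviction are illustrative choices, not the paper's settings, and importance-sampling weights are omitted.

```python
import random

class PrioritizedReplayBuffer:
    """Minimal proportional prioritized replay (no sum-tree, no IS weights)."""

    def __init__(self, capacity, alpha=0.6):
        self.capacity = capacity
        self.alpha = alpha            # how strongly TD errors skew sampling
        self.data = []
        self.priorities = []

    def add(self, transition, td_error=1.0):
        if len(self.data) >= self.capacity:   # evict oldest when full
            self.data.pop(0)
            self.priorities.pop(0)
        self.data.append(transition)
        self.priorities.append((abs(td_error) + 1e-6) ** self.alpha)

    def sample(self, k, rng=random):
        # Sample k transitions with probability proportional to priority.
        total = sum(self.priorities)
        weights = [p / total for p in self.priorities]
        return rng.choices(self.data, weights=weights, k=k)
```

A production implementation would typically use a sum-tree for O(log n) updates and correct the sampling bias with importance weights.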
Abstract: Recently, reinforcement learning techniques have achieved success in sequence recommendation systems, as they can learn effective recommendation strategies from long-term user feedback signals. However, the design of the model’s reward function faces the challenge of low discriminability. This limits the model’s ability to learn the value differences between different user feedback signals, leading to suboptimal recommendation strategies. Existing studies mainly ensure discriminability of the reward function by adjusting decay factors, but this relies on expert prior knowledge and lacks a theoretical foundation. In order to more reasonably design the reward function and enhance its discriminability, this study analyzes the recommendation system based on counterfactual reasoning and proposes a sequence recommendation algorithm CAL4Rec based on counterfactual discriminability enhancement. Firstly, the proposed method uses structural causal graphs to describe the sequence recommendation process and creatively defines causally identifiable value reward discriminability using causal graphs. Secondly, this method uses a counterfactual generative adversarial self-supervised learning process to optimize the recommendation strategy network and learn the user’s true preferences. Extensive comparative and ablation experiments were conducted on a series of sequence recommendation benchmark datasets for CAL4Rec, and the experimental results show that CAL4Rec’s improvement is effective for various network implementation structures (average 2.34%).
Abstract: Missing data affects data quality, which may lead to inaccurate results and reduce the reliability of models. Missing value filling reduces bias and facilitates subsequent analysis. Most missing value filling algorithms assume a weak correlation or even no correlation between multiple missing values and give little consideration to the correlation between missing values and to the order of filling. Filling missing values independently in the sales domain under-utilizes the information carried by the missing values, which greatly affects filling accuracy. To address these problems, this study takes the sales field as its research objective and explores an updating mechanism for multiple missing values based on the multidimensional characteristics of sales behavior and the spatial distribution characteristics of the output values of different models. The work then studies an incremental filling method for multiple missing values in sales data: based on feature correlation, it orders the missing features and fuses already-filled data as an information element to incrementally fill subsequent missing values. The algorithm takes into account both the generalization of the model and the information correlation between missing data, and it combines multi-model fusion to effectively fill multiple missing values. Finally, the effectiveness of the proposed algorithm is verified by extensive experimental comparisons on a real chain-drugstore sales dataset.
Abstract: Images generated by deep-learning-based low-light image enhancement algorithms generally suffer from problems such as amplified noise and loss of detail. Moreover, the performance of end-to-end deep learning algorithms largely depends on the feature extraction ability of the backbone network, so exploring more effective backbone structures can improve the performance of low-light enhancement tasks. This study proposes an image enhancement algorithm based on a composite backbone network fusion strategy, which integrates backbone networks from different image enhancement algorithms to improve the overall network's feature extraction ability. The algorithm fuses feature information from the different backbones layer by layer and guides the composite features into the decoder. It then fully utilizes different upsampling methods to stack the fused backbone features, ultimately generating images under normal lighting conditions. Quantitative and qualitative comparisons with existing mainstream algorithms show that the proposed method significantly improves the brightness of low-light images while preserving their detailed features. On objective indicators, it achieves a peak signal-to-noise ratio of 24.35 dB and a structural similarity of 0.871 on the LOL-V2 dataset, effectively alleviating the problems of amplified noise and detail loss after image enhancement.
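The 24.35 dB figure refers to peak signal-to-noise ratio, which for 8-bit images is computed from the mean squared error as sketched below (the standard formula, not the paper's evaluation code):

```python
import math

def psnr(ref, out, max_val=255.0):
    # Peak signal-to-noise ratio between flattened 8-bit pixel sequences.
    mse = sum((r - o) ** 2 for r, o in zip(ref, out)) / len(ref)
    if mse == 0:
        return float("inf")   # identical images
    return 10.0 * math.log10(max_val ** 2 / mse)
```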
Abstract: Leakage tolerance allows a scheme to leak some secret information while remaining secure, enhancing the robustness of a signature scheme; it is suitable for most settings where equipment and communication lines cannot be perfectly protected. A short signature is generally only half the length of an ordinary signature, which can greatly reduce the communication data volume of narrowband real-time interactive systems. This study proposes a short signature scheme in which the signature key is associated with the information to be signed, and the scheme tolerates partial leakage. The efficiency and security of the scheme are analyzed, and its security is proved under a leakage-tolerant oracle. The experimental results show that the scheme performs well and is suitable for applications with limited transmission bandwidth.
Abstract: The variability in size, shape, color, and texture, along with the blurred demarcation of the bowel wall, makes colon polyp segmentation significantly challenging. In single-branch networks, continuous sampling loses detail information and lacks interaction between different feature levels, leading to poor segmentation results. To address this problem, this study proposes a two-branch colon polyp segmentation network based on local-global feature interaction. The network utilizes a dual-branch structure consisting of a CNN and a Transformer, systematically capturing the precise local details and the global semantic features of the polyp at each layer. To make full use of the complementary nature of feature information at different levels and scales, and to let deep semantic features guide and enhance shallow detailed features, the study designs a feature cooperative interaction module that dynamically senses and aggregates cross-level feature interaction information. To enhance the features of the polyp lesion region while reducing background noise, a feature enhancement module utilizes spatial and channel attention mechanisms. Additionally, a skip-connection mechanism in conjunction with attention gates further highlights boundary information, improving segmentation accuracy in edge regions. Experiments show that the proposed network achieves better mDice and mIoU scores than the baseline network on multiple polyp segmentation datasets, with higher segmentation accuracy and stability.
Abstract: In the era of big data, the number of algorithms used for data processing is exploding. Current management methods for large numbers of algorithms usually classify and label them or store task flows composed of algorithms on a task-by-task basis, while paying insufficient attention to the topological relationships between algorithms in the task set. With the accumulation of domain knowledge and task flows, the dependencies between algorithms become increasingly important. Based on the requirements of massive algorithm management, this study proposes a management method that splits branched dependencies into unbranched dependencies. By searching for topological relationships through pointers in an index-free adjacency graph database, it avoids join operations and has innate advantages in managing algorithm dependencies. In addition, this study proposes connection points to highlight the reusability of algorithm modules; these are used to represent dependency edges in the graph model. The positions of algorithm modules in different task flows can be distinguished, so that an algorithm module reused by multiple tasks needs to be represented by only one node in the graph. Finally, the proposed algorithm relationship management method is validated on specific projects and shown to have significant advantages in scenarios where the number of algorithms is large and algorithm modules are highly reusable.
Abstract: The multi-client brain tumor classification method based on the convolutional block attention module (CBAM) extracts tumor region details from MRI images inadequately, its channel attention and spatial attention interfere with each other under the federated learning framework, and its accuracy in classifying medical tumor data from multiple sources is low. To address these problems, this study proposes a brain tumor classification method that combines the federated learning framework with an enhanced CBAM-ResNet18 network. The method leverages federated learning to work collaboratively with brain tumor data from multiple sources. It replaces the ReLU activation function with Leaky ReLU to mitigate neuron death. The channel attention module within CBAM is changed from dimension reduction followed by dimension increase to dimension increase followed by dimension reduction, significantly enhancing the network's ability to extract image details. Furthermore, the channel attention module and spatial attention module in CBAM are shifted from a cascade structure to a parallel structure, so that the network's feature extraction capability is unaffected by the order of processing. A publicly available brain tumor MRI dataset from Kaggle is used in the study. The results demonstrate that FL-CBAM-DIPC-ResNet achieves remarkable performance, with accuracy, precision, recall, and F1-score of 97.78%, 97.68%, 97.61%, and 97.63%, respectively, which are 6.54%, 4.78%, 6.80%, and 7.00% higher than those of the baseline model.
These experimental findings validate that the proposed method not only overcomes data islands and enables data fusion from multiple sources but also outperforms the majority of existing mainstream models in terms of performance.
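The accuracy, precision, recall, and F1 figures quoted above follow the standard confusion-matrix definitions, sketched here for reference (not the authors' evaluation code):

```python
def classification_metrics(tp, fp, fn, tn):
    # Standard binary-classification metrics from confusion-matrix counts.
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, precision, recall, f1
```

For multi-class tumor labels, these per-class values are typically macro- or micro-averaged.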
Abstract: Residential demand forecasting is affected by multiple factors and is non-linear. To address this issue, the study modifies the original neighborhood rough set (NRS) and then combines it with extreme learning machines (ELMs) to forecast residential demands. Specifically, the modified NRS (MNRS) algorithm constructs a neighborhood relationship matrix based on the neighborhood radii and standard deviations of different conditional attributes, thereby overcoming the failure of the original NRS algorithm to set the optimal neighborhood value for different conditional attributes. Then, the Pearson correlation coefficient is introduced into output attribute importance ranking to overcome the influence among conditional attributes, and the minimal redundant attribute-based reduction set is obtained to serve as the indicator system for residential demand forecasting. Finally, the residential demand indicator system is input into the ELM model to output an accurate forecasted value. Experimental results show that the MNRS-ELM forecasting model not only effectively reduces the operational complexity but also achieves higher prediction accuracy.
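The Pearson correlation coefficient used above for attribute importance ranking is the standard formula; a minimal version:

```python
import math

def pearson(x, y):
    # Pearson correlation coefficient between two equal-length sequences.
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)
```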
Abstract: Relation extraction methods based on distant supervision can reduce the cost of manually annotating datasets and have been widely used in constructing domain knowledge graphs. However, existing distantly supervised relation extraction methods are not domain-specific and neglect the use of domain entity feature information. To solve these problems, this study proposes PCNN-EFMA, a relation extraction model that integrates entity features and multiple types of attention mechanisms. The model adopts distant supervision and multi-instance techniques and is no longer limited by manual annotation. Meanwhile, to reduce the impact of noise in distant supervision, the model uses two types of attention: sentence attention and inter-bag attention. In addition, it integrates entity feature information into the word embedding layer and sentence attention, enhancing the model's feature selection ability. Experiments show that the model's PR curve is better on the domain dataset, and its average accuracy on P@N exceeds that of the PCNN-ATT model.
Abstract: To address the problems of noise interference and missed detection of small objects in water surface object detection, this study proposes an improved You Only Look Once version 8 (YOLOv8) algorithm for water surface small object detection, namely, YOLOv8-WSSOD. Specifically, to reduce the noise interference caused by the complex water surface environment during downsampling in the backbone network, the study proposes the C2f-BiFormer (C2fBF) module, constructed on BiFormer's bi-level routing attention mechanism, to retain fine-grained contextual feature information during feature extraction. Then, to address the missed detection of small objects on the water surface, a smaller detection head is added to enhance the network's sensitivity to small objects. At the neck end, the ghost-shuffle convolution (GSConv) and Slim-neck structures are used to reduce the model's complexity while maintaining precision. Finally, the limitations of the complete intersection over union (CIoU) loss function are overcome by the minimum point distance-based IoU (MPDIoU) loss function to improve the model's detection precision. The experimental results show that, compared with the original YOLOv8 algorithm, the proposed algorithm increases the mean average precision mAP@0.5 and mAP@0.5:0.95 on small objects on the water surface by 4.6% and 2.2%, respectively. Furthermore, the modified algorithm, achieving a detection speed of 86 f/s, is readily usable for fast and accurate detection of small objects on the water surface.
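The MPDIoU criterion mentioned above augments IoU with normalized squared distances between the two boxes' top-left and bottom-right corners; the corresponding loss is 1 minus this value. A sketch of the metric, assuming `(x1, y1, x2, y2)` boxes and known image width and height (a generic rendering of the formula, not the paper's code):

```python
def mpdiou(box_a, box_b, img_w, img_h):
    # Boxes as (x1, y1, x2, y2). MPDIoU = IoU - d1^2/D - d2^2/D,
    # where d1, d2 are top-left / bottom-right corner distances and
    # D = img_w^2 + img_h^2 normalizes them.
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    ix1, iy1 = max(ax1, bx1), max(ay1, by1)
    ix2, iy2 = min(ax2, bx2), min(ay2, by2)
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    iou = inter / union if union else 0.0
    norm = img_w ** 2 + img_h ** 2
    d1_sq = (ax1 - bx1) ** 2 + (ay1 - by1) ** 2
    d2_sq = (ax2 - bx2) ** 2 + (ay2 - by2) ** 2
    return iou - d1_sq / norm - d2_sq / norm
```

Unlike plain IoU, the corner-distance terms keep the gradient informative even for non-overlapping boxes.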
Abstract: Faced with large-scale image defects and irregular damage areas, existing image restoration methods often produce results with structural inconsistencies and blurry texture details. This study proposes MSFGAN, an image restoration algorithm using a generated edge map and multi-scale feature fusion (a multi-scale feature network model based on edge conditions). The model adopts a two-stage network design, using the edge map as a restoration condition to constrain the structure of the restoration results. Firstly, the Canny operator extracts the edges of the image to be restored, and a complete edge map is generated. Then, the complete edge map is combined with the damaged image for restoration. To address common issues in image restoration algorithms, an attention mechanism multi-fusion convolution block (AM block) is proposed, integrating an attention mechanism for feature extraction and fusion of damaged images. Skip connections are introduced in the decoder part of the restoration network to fuse high-level semantics and low-level features, achieving high-quality detail and texture restoration. Test results on the CelebA and Places2 datasets show that MSFGAN improves restoration quality compared with current methods: at 20%–30% mask ratios, the average SSIM improvement is 0.0791 and the average PSNR improvement is 1.535 dB. Ablation experiments validate the effectiveness of the proposed optimizations and innovations in image restoration tasks.
Abstract: Crowd density detection algorithms based on deep learning have made great progress, but there is still much room for improving their accuracy and robustness in real, complex scenes. Factors such as inconsistent object scales and background interference in complex scenes make crowd density detection a challenging task. To address this problem, this study proposes a crowd density detection network based on multi-scale feature fusion. The network first uses images of different resolutions to interactively extract coarse- and fine-grained crowd features and introduces a multi-level feature fusion mechanism to make full use of multi-level scale information. Secondly, the study utilizes spatial and channel attention mechanisms to highlight the weight of crowd characteristics, focus on crowds of interest, reduce background interference, and generate high-quality density maps. Experimental results show that the proposed multi-scale feature fusion network achieves better accuracy and robustness than representative crowd density detection methods on multiple typical public datasets.
Abstract: In recent years, unstructured road segmentation has become one of the important research directions in the field of computer vision. Most existing methods are suitable for structured road segmentation and cannot meet the accuracy and real-time requirements of unstructured road segmentation. To address the above issues, this study improves the short-term dense concatenate (STDC) network by introducing residual connections to better integrate multi-scale semantic information. Additionally, it proposes a position attention-aware spatial pyramid pooling (PA-ASPP) module to enhance the network’s position awareness ability for specific regions such as roads. Experiments are conducted on two datasets, RUGD and RELLIS-3D, and the proposed method achieves a mean intersection over union (MIoU) of 50.78% and 49.96% on the test sets of the two datasets, respectively.
Abstract: In recent years, underwater acoustic target recognition has received considerable attention. However, due to the time-varying and space-varying nature of the underwater acoustic channel, as well as the complex and variable characteristics of underwater target sound sources, underwater acoustic signal recognition faces significant challenges. Traditional recognition methods struggle to capture sufficient representation information of the targets and lack robustness against noise, resulting in suboptimal recognition performance. To address these issues, this study proposes an underwater acoustic signal recognition method based on the multi-branch external attention network (MEANet), which can effectively extract features and perform recognition in complex marine environments. MEANet consists of a multi-branch backbone network, channel and spatial attention modules, and external attention modules. Firstly, the input data is fed through multiple parallel branches of the backbone network to extract features at different levels from the acoustic signals. Secondly, the channel and spatial attention modules weight the channel and spatial dimensions of the signals. Finally, the external attention module integrates external memory units and additional computations to guide feature extraction and prediction, significantly improving the recognition rate and robustness of the model. Experimental results demonstrate that the proposed MEANet achieves a recognition rate of 98.84% on the ShipsEar dataset, outperforming other comparative algorithms.
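External attention replaces self-attention's query-key product with a small learnable memory shared across all samples. A pure-Python sketch of the core computation (for brevity it uses a single softmax over the memory dimension, whereas the original external-attention formulation uses double normalization; the memory matrices here are placeholders, not learned weights):

```python
import math

def matmul(A, B):
    # Naive matrix product of two list-of-list matrices.
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def softmax(row):
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    s = sum(exps)
    return [v / s for v in exps]

def external_attention(F, Mk, Mv):
    # F: n x d input features; Mk, Mv: m x d external memory (key/value) units.
    scores = matmul(F, [list(col) for col in zip(*Mk)])   # n x m similarities
    attn = [softmax(row) for row in scores]               # normalize over memory slots
    return matmul(attn, Mv)                               # n x d output
```

Because the memory is shared across samples, its size m is fixed and small, making the cost linear in the number of input features rather than quadratic as in self-attention.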
Abstract: As the resources of edge servers are limited, designing a reasonable resource management and task scheduling scheme is an important research problem. To improve the utility of system services, this study proposes a joint resource allocation and computation offloading strategy. Firstly, the optimal matching of communication and computing resources is obtained by binary search and the Lagrange multiplier method. Then, the offloading decision is made by a whale optimization algorithm integrated with multiple strategies: a nonlinear exponential-power strategy for adjusting the convergence factor, an adaptive weight strategy balancing the exploration and exploitation stages, and a wandering strategy combining triangle walk and Lévy flight. Besides, the study introduces a penalty function into the fitness evaluation to satisfy the user-access constraint. Finally, it formulates a V-shaped transfer function to make binary offloading decisions. Simulation results show that, compared with other benchmark schemes on various evaluation indicators, the proposed strategy effectively increases network throughput and significantly improves system utility.
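A V-shaped transfer function maps each component of a whale's continuous position to a flip probability for the 0/1 offloading decision. A common choice, used here purely for illustration (the paper's exact function may differ), is V(x) = |tanh(x)|:

```python
import math
import random

def v_transfer(x):
    # V-shaped transfer function: maps a real value to a probability in [0, 1].
    return abs(math.tanh(x))

def binary_offload_decision(position, rng=random.Random(0)):
    # Component i is offloaded (1) with probability v_transfer(position[i]);
    # the fixed-seed rng keeps the sketch reproducible.
    return [1 if rng.random() < v_transfer(x) else 0 for x in position]
```

Position components near zero thus mostly yield local execution (0), while large-magnitude components almost always yield offloading (1).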
Abstract: Existing scene text recognizers are prone to being troubled by blurred text images, leading to poor performance in practical applications. Therefore, several scene text image super-resolution (STISR) models have been proposed as pre-processors for text recognizers to improve the quality of input images. However, real-world training samples for the STISR task are difficult to collect. In addition, existing STISR models only learn to transform low-resolution (LR) text images into high-resolution (HR) text images while ignoring the blurring patterns from HR to LR images. This study proposes a blurring pattern aware module (BPAM), which learns blurring patterns from existing real-world HR-LR pairs and transfers them to other HR images to generate LR images with different degrees of degradation. Therefore, the proposed BPAM can produce massive HR-LR pairs for STISR models to compensate for the deficiency of training data, significantly improving performance. The experimental results show that when equipped with the proposed BPAM, the performance of SOTA STISR methods can be further improved. For instance, the SOTA method TG achieves a 5.8% improvement in recognition accuracy when CRNN is used for evaluation.
Abstract: The visually impaired are a vulnerable group in society and face many obstacles when traveling independently. Providing safe and reliable auxiliary equipment for the visually impaired reflects the progress of social civilization. This study introduces the key technologies for obstacle detection and identification and the related path planning algorithms for assisting visually impaired travel. The study mainly analyzes path planning algorithms applied after obstacle detection, comprehensively compares the application characteristics and scenarios of various technologies, and discusses the research progress of related methods in visually impaired assistive devices. In addition, it summarizes the current application status of multi-technology integration in intelligent assistance equipment. On this basis, combined with the advancement of technologies such as artificial intelligence and embedded devices, the future development direction of travel assistance equipment for the visually impaired is discussed.
Abstract: Currently, the application of blockchain in the supply chain is receiving increasing attention from the industry.
However, due to the presence of a large number of complex transactions in the supply chain, selecting trustworthy primary nodes poses a challenge. Therefore, based on machine learning classification algorithms and PBFT (practical Byzantine fault tolerance), this study proposes a blockchain PBFT optimization method applied to the supply chain. The integrated framework for the supply chain and blockchain is analyzed, and K-nearest neighbors (K-NN) is applied to optimize the primary node selection rules of the PBFT consensus algorithm based on the features of the nodes participating in supply chain consensus. Experimental results show that trust evaluation classification of consensus nodes can effectively address efficiency issues caused by view switching, thereby improving the consensus performance of the blockchain in terms of throughput, latency, fault tolerance, and other aspects. The proposed method is practical and provides ideas for the application of blockchain in other industries.
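The K-NN trust classification described above can be sketched as a majority vote over the k nearest labeled nodes in feature space. This is a minimal illustration, not the paper's implementation; the node features (latency, historical success rate) and trust labels are hypothetical:

```python
import math
from collections import Counter

def knn_classify(query, samples, labels, k=3):
    """Majority-vote K-NN: label a candidate node by the labels of its
    k nearest labeled nodes under Euclidean distance."""
    dists = sorted((math.dist(query, s), lab) for s, lab in zip(samples, labels))
    top = [lab for _, lab in dists[:k]]
    return Counter(top).most_common(1)[0][0]

# Hypothetical node features: (response latency, historical success rate)
nodes = [(0.1, 0.98), (0.2, 0.95), (0.9, 0.40), (0.8, 0.35), (0.15, 0.97)]
trust = ["trusted", "trusted", "untrusted", "untrusted", "trusted"]

# Classify a candidate primary node by its three nearest neighbors
label = knn_classify((0.12, 0.96), nodes, trust, k=3)
```

A node classified as "trusted" would then be eligible for primary node selection, filtering out candidates likely to trigger view switching.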
Abstract: Controllable text summarization models can generate summaries that conform to user preferences. Previous summarization models focus on controlling a single attribute alone rather than a combination of multiple attributes. When multiple control attributes must be satisfied, the traditional Seq2Seq multi-attribute controllable text summarization model cannot integrate all control attributes, accurately reproduce key information in the texts, or handle out-of-vocabulary words. Therefore, this study proposes a model based on an extended Transformer and a pointer generator network (PGN). The extended Transformer expands the Transformer's single-encoder single-decoder form into a dual encoder that extracts dual text semantic information and a single decoder that can fuse guidance signal features. Then the PGN either copies words from the source text or generates new summary words from the vocabulary, solving the OOV (out of vocabulary) problem that often occurs in summarization tasks. Additionally, to efficiently encode position information, the model utilizes relative position representations in the attention layer to introduce the sequence information of the texts. The model can be leveraged to control many important summary attributes, including length, topic, and specificity. Experiments on the public dataset MACSum show that compared with previous methods, the proposed model performs better at ensuring summary quality while conforming more closely to the attribute requirements given by users.
Abstract: Marine organisms are diverse; organisms of the same phylum show strong inter-class similarity, while organisms of different phyla differ greatly. Therefore, this study proposes a multi-hierarchical classification method for marine organisms, which utilizes the similarity among species to help the network learn biological prior knowledge. Additionally, this study designs a C-MBConv module and improves the EfficientNetV2 network architecture by combining it with the multi-hierarchical classification method; the improved architecture is called CM-EfficientNetV2. The experiments show that CM-EfficientNetV2 achieves higher accuracy than the original EfficientNetV2, with an accuracy improvement of 1.5% on the intertidal marine biology dataset of the Nanji Islands and 2% on CIFAR-100.
Abstract: With the development of the Internet of Things (IoT), efficient consensus algorithms are the key to applying blockchain technology to the IoT. This study proposes an improved PBFT consensus algorithm, the binary K-means practical Byzantine fault tolerance algorithm (BK-PBFT), to address the high communication overhead, the lack of consideration for consensus power consumption, and the high consensus latency in IoT scenarios. Firstly, it obtains the geographic coordinates of the nodes, calculates their comprehensive evaluation values, and divides the nodes into a two-layer multi-center cluster structure by the binary K-means algorithm. Then, PBFT consensus is performed on the blocks first in the lower-layer clusters and then in the upper-layer cluster. Finally, the clusters validate and store the blocks to complete the consensus. Additionally, this study proves that the algorithm achieves the minimum number of communication rounds when nodes are evenly distributed across clusters and derives the optimal number of clusters under the minimum communication overhead. The analysis and simulation results show that the proposed algorithm can effectively reduce communication overhead, consensus power consumption, and consensus latency.
Abstract: Traffic flow prediction is an important method for achieving urban traffic optimization in intelligent transportation systems. Accurate traffic flow prediction holds significant importance for traffic management and guidance. However, due to the high spatiotemporal dependence, the traffic flow exhibits complex nonlinear characteristics. Existing methods mainly consider the local spatiotemporal features of nodes in the road network, overlooking the long-term spatiotemporal characteristics of all nodes in the network. To fully explore the complex spatiotemporal dependencies in traffic flow data, this study proposes a Transformer-based traffic flow prediction model called multi-spatiotemporal self-attention Transformer (MSTTF). This model embeds temporal and spatial information through position encoding in the embedding layer and integrates various self-attention mechanisms, including adjacent spatial self-attention, similar spatial self-attention, temporal self-attention, and spatiotemporal self-attention, to uncover potential spatiotemporal dependencies in the data. The predictions are made in the output layer. The results demonstrate that the MSTTF model achieves an average reduction of 10.36% in MAE compared to the traditional spatiotemporal Transformer model. Particularly, when compared to the state-of-the-art PDFormer model, the MSTTF model achieves an average MAE reduction of 1.24%, indicating superior predictive performance.
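The various self-attention mechanisms combined in such Transformer-based predictors all build on the same scaled dot-product attention primitive. As a generic sketch (the MSTTF-specific spatial/temporal variants differ in how queries, keys, and masks are constructed, which is not shown here):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Standard scaled dot-product attention: softmax(Q K^T / sqrt(d)) V."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

# Self-attention over a toy sequence: 4 time steps, feature dimension 8
rng = np.random.default_rng(0)
X = rng.standard_normal((4, 8))
out = scaled_dot_product_attention(X, X, X)
```

Temporal self-attention would attend over time steps of one node, while spatial variants attend over nodes at one time step; both reuse this same computation.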
Abstract: To address the difficulty that neural networks have in obtaining enough information to correctly classify images from a small amount of labeled data, this study proposes a new relational network, SDM-RNET, which combines stochastic depth networks and multi-scale convolution. First, a stochastic depth network is introduced into the model's embedding module to deepen the model. Then, in the feature extraction stage, multi-scale depthwise separable convolution replaces ordinary convolution for feature fusion. After the backbone network, deep and shallow feature fusion is applied to obtain richer image features before the model finally learns to predict image categories. Compared with other few-shot image classification methods on the mini-ImageNet, RP2K, and Omniglot datasets, the proposed method achieves the highest accuracy on 5-way 1-shot and 5-way 5-shot classification tasks.
Abstract: Multimodal sentiment analysis aims to assess users’ sentiment by analyzing the videos they upload on social platforms. The current research on multimodal sentiment analysis primarily focuses on designing complex multimodal fusion networks to learn the consistency information among modalities, which enhances the model’s performance to some extent. However, most of the research overlooks the complementary role played by the difference information among modalities, resulting in sentiment analysis biases. This study proposes a multimodal sentiment analysis model called DERL (dual encoder representation learning) based on dual encoder representation learning. This model learns modality-invariant representations and modality-specific representations by a dual encoder structure. Specifically, a cross-modal interaction encoder based on a hierarchical attention mechanism is employed to learn the modality-invariant representations of all modalities to obtain consistency information. Additionally, an intra-modal encoder based on a self-attention mechanism is adopted to learn the modality-specific representations within each modality and thus capture difference information. Furthermore, two gate network units are designed to enhance and filter the encoded features and enable a better combination of modality-invariant and modality-specific representations. Finally, during fusion, potential similar sentiment between different multimodal representations is captured for sentiment prediction by reducing the L2 distance among them. Experimental results on two publicly available datasets CMU-MOSI and CMU-MOSEI show that this model outperforms a range of baselines.
Abstract: UAV images contain many small targets against complex backgrounds, which easily leads to a high false detection rate in target detection. To solve these problems, this study proposes a small target detection algorithm for UAV images based on high-order depthwise separable convolution. Firstly, by combining the CSPNet structure with the ConvMixer network, the study utilizes depthwise separable convolution kernels to obtain richer gradient combination information and introduces a recursively gated convolution C3 module to improve the high-order spatial interaction ability of the model and enhance the sensitivity of the network to small targets. Secondly, the detection head is decoupled into two heads that separately output the classification and position information of the feature map, accelerating model convergence. Finally, the EIoU border loss function is leveraged to improve the accuracy of the detection boxes. The experimental results on the VisDrone2019 dataset show that the detection accuracy of the model reaches 35.1%, and the missed and false detection rates are significantly reduced, so the model can be effectively applied to small target detection in UAV images. The generalization ability of the model is tested on the DOTA 1.0 and HRSID datasets, and the experimental results show that the model has good robustness.
Abstract: Ciphertext-policy attribute-based encryption (CP-ABE) can provide fine-grained access control while guaranteeing data privacy. Considering that existing CP-ABE-based access control schemes cannot effectively protect critical data in edge computing, this study proposes a blockchain-based lightweight access control scheme over ciphertext (BLAC) in edge computing. In BLAC, a lightweight CP-ABE algorithm based on elliptic curve cryptography is designed, and fast elliptic curve scalar multiplication is adopted to realize encryption and decryption. Additionally, most of the encryption and decryption operations are securely offloaded to edge servers, enabling user devices with limited computing power to efficiently complete fine-grained access control over ciphertext data. Meanwhile, a distributed key management method based on blockchain is designed, which enables multiple edge servers to collaboratively distribute private keys to users via the blockchain. Security analysis and performance evaluation show that BLAC guarantees data confidentiality, resists collusion attacks, and supports forward security, while achieving high user-side computational efficiency and low server-side decryption and storage overhead.
Abstract: Liver cancer is a malignant tumor that originates from liver cells, and its diagnosis has long been both a difficult medical problem and a research hotspot in various fields. Early diagnosis of liver cancer can reduce its mortality rate. Histopathological image examination is the gold standard for oncology diagnosis, as the images display the cells and structures of tissue slices and can be employed to determine cell types, tissue structures, and the number and morphology of abnormal cells, as well as to evaluate the specific condition of the tumor. This study focuses on the application of convolutional neural networks in liver cancer diagnosis algorithms for pathological images, including liver tumor detection, image segmentation, and preoperative prediction. The design ideas, improvement goals, and methods of each convolutional neural network algorithm are elaborated in detail to provide clearer reference ideas for researchers. Additionally, the advantages and disadvantages of convolutional neural network algorithms in diagnosis are summarized and analyzed, and potential research hotspots and related difficulties in the future are discussed.
Abstract: Aiming at the inaccurate 3D human pose predictions caused by factors such as occlusion and pose complexity, this study proposes an improved 3D human pose estimation algorithm to obtain accurate 3D human poses and enhance estimation performance. It adopts the graph attention block from the spatio-temporal graph attention convolutional network to construct the entire network. On this basis, the network structure of the global multi-head graph attention part is improved to facilitate better information propagation and fusion among nodes and to capture semantic information not explicitly represented in the graph. Kinematic constraints are introduced as well, and a bone length loss is added on top of the MPJPE loss. By modeling local and global spatial node information, the model learns the kinematic constraints of human skeletal movements, including local kinematic connections, symmetry, and global poses. Empirical results show that the improved model effectively enhances the performance of human pose estimation. Compared to the original model on the Human3.6M dataset, it achieves a 1.8% improvement in mean per joint position error (MPJPE) and a 1.3% improvement in Procrustes-aligned MPJPE (P-MPJPE) after rigid alignment of predicted and true joints.
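The combination of MPJPE with an added bone length term can be sketched as follows. This is a minimal NumPy illustration under assumed conventions (joint arrays of shape [joints, 3], a hypothetical bone list, and a hypothetical weighting factor lam), not the paper's exact loss:

```python
import numpy as np

def mpjpe(pred, gt):
    """Mean per joint position error: average Euclidean distance per joint."""
    return float(np.linalg.norm(pred - gt, axis=-1).mean())

def bone_length_loss(pred, gt, bones):
    """Penalize deviation of predicted bone lengths from ground-truth lengths."""
    diffs = [abs(np.linalg.norm(pred[i] - pred[j]) -
                 np.linalg.norm(gt[i] - gt[j])) for i, j in bones]
    return float(np.mean(diffs))

def total_loss(pred, gt, bones, lam=0.1):
    """MPJPE plus a weighted bone length term (lam is a hypothetical weight)."""
    return mpjpe(pred, gt) + lam * bone_length_loss(pred, gt, bones)

# Toy 3-joint chain (hip -> knee -> ankle); bones given as joint index pairs
gt   = np.array([[0.0, 0.0, 0.0], [0.0, -0.4, 0.0], [0.0, -0.8, 0.0]])
pred = np.array([[0.0, 0.0, 0.0], [0.0, -0.5, 0.0], [0.0, -0.9, 0.0]])
loss = total_loss(pred, gt, bones=[(0, 1), (1, 2)])
```

The bone term rewards predictions whose limb lengths match the skeleton even when individual joints are displaced, which is how kinematic constraints enter the objective.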
Abstract: Training deep neural networks (DNNs) in mission-critical scenarios consumes increasingly more resources, which stimulates model stealing from cloud prediction APIs and violates the intellectual property rights of model owners. To trace illegal public model copies, DNN model fingerprinting provides a promising copyright verification option for model owners who want to preserve model integrity. However, existing fingerprinting schemes are mainly based on output-level traces (e.g., mis-prediction behavior on special inputs), which limits their stealthiness during fingerprint verification. This study proposes a novel task-agnostic fingerprinting scheme based on saliency map traces of model predictions. The proposed scheme puts forward a constrained manipulation objective over saliency maps to construct clean-label and natural fingerprint samples, thus significantly improving the stealthiness of model fingerprints. According to extensive evaluation results on three typical tasks, the scheme substantially enhances the fingerprint effectiveness of existing schemes while keeping the model fingerprints highly stealthy.
Abstract: Inaccurate phase estimation in single-channel speech enhancement tasks causes poor quality of the enhanced speech. To this end, this study proposes a speech enhancement method based on a deep complex axial self-attention convolutional recurrent network (DCACRN), which enhances both amplitude and phase information in the complex domain. Firstly, a complex convolutional network-based encoder extracts complex features from the input speech signal, and a convolutional skip module is introduced to map the features into a high-dimensional space for feature fusion, which enhances information interaction and gradient flow. Then an encoder-decoder structure based on the axial self-attention mechanism is designed to enhance the model's temporal modeling and feature extraction abilities. Finally, the speech signal is reconstructed by the decoder, and a hybrid loss function is adopted to optimize the network model and improve the quality of the enhanced speech. The experiments are conducted on the public datasets Valentini and DNS Challenge, and the results show that the proposed method improves both the perceptual evaluation of speech quality (PESQ) and short-time objective intelligibility (STOI) metrics compared to other models. On the non-reverberant dataset, PESQ is improved by 12.8% over DCTCRN and 3.9% over DCCRN, which validates the effectiveness of the proposed model in speech enhancement tasks.
Abstract: At present, most recognition of students' classroom behavior is based on single-frame images and ignores behavioral coherence, so video information cannot be fully utilized to accurately depict students' classroom behavior. Therefore, this study proposes an improved YOWO algorithm that effectively employs video information to identify students' classroom behavior. First, teaching videos are collected from real classroom teaching at a university, and an AVA-format video dataset containing five types of students' classroom behavior is produced. Second, the temporal shift module (TSM) is adopted to enhance the model's ability to obtain temporal context information. Finally, a non-local operation module is utilized to improve the model's ability to extract key location information. The experimental results show that optimizing the YOWO model yields better recognition performance. On the classroom behavior dataset, the mAP of the improved algorithm is 95.7%, 4.6% higher than that of the original YOWO algorithm. The number of parameters in the model is reduced by 32.3% to 81.97×10⁶, and the computation amount is decreased by 9.6% to 22.6 GFLOPs. The detection speed is 24.03 f/s, an increase of about 3 f/s.
Abstract: By directly processing each view of the original data, multi-view subspace clustering algorithms typically obtain latent subspace representation matrices. However, these methods often underestimate the influence of redundant data, making it challenging to capture accurate clustering structure in the latent subspace representation. Furthermore, the K-means algorithm used to produce the clustering results easily neglects the local structure of the data within the subspaces, leading to unstable results. To address these problems, this study proposes a multi-view subspace clustering method that acquires high-quality subspace representations. Specifically, the study initially obtains a robust representation through a feature decomposition method. Then, it constructs a joint latent subspace representation for multiple views. Next, it uses spectral rotation to obtain clustering results and employs orthogonal constraints on the partition matrix to reconstruct the subspaces, thereby enhancing clustering performance. Finally, an iterative optimization algorithm is applied to solve the relevant optimization problems. Experiments are conducted on five benchmark datasets, and the results demonstrate that the proposed algorithm is more effective than recent multi-view clustering algorithms.
Abstract: Gait recognition is the process of identifying individuals based on their walking patterns. Currently, most gait recognition methods employ shallow neural networks for feature extraction, which perform well on indoor gait datasets but poorly on the newly released outdoor gait datasets. To address the complex challenges posed by outdoor gait datasets, this study proposes a deep gait recognition model based on video residual neural networks. In the feature extraction phase, a deep 3D convolutional neural network (3D CNN) is constructed from the proposed video residual blocks to extract the spatio-temporal dynamic features of the entire gait sequence. Subsequently, temporal pooling and horizontal pyramid mapping are introduced to reduce the feature resolution of the sampled data and extract local gait features. The training process is driven by a joint loss function, and finally the loss functions are balanced and the feature space is adjusted by BNNeck. The experiments are conducted on three publicly available gait datasets, covering both indoor (CASIA-B) and outdoor (GREW, Gait3D) settings. The experimental results verify that the model outperforms other models in accuracy and convergence speed on outdoor gait datasets.
Abstract: Mobile edge computing and ultra-dense network technologies have obvious advantages in improving the computing power of mobile devices and enhancing network capacity. However, in scenarios where the two converge, how to effectively reduce co-channel interference among base stations and reduce the delay and energy consumption of task transmission is an important research topic. Therefore, this study designs a distributed wireless resource management algorithm based on multi-base-station game equilibrium. The wireless resource management problem among small base stations is transformed into a game problem, and a reward-driven strategy selection algorithm is proposed. The base stations iteratively update the selection probabilities of their strategies, which finally optimizes sub-channel allocation and transmission power regulation. Simulation results show that the proposed algorithm has advantages in improving channel utilization and reducing the latency and energy consumption of task transmission.
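A reward-driven probability update of the kind described above can be sketched in the style of a learning automaton. This is an assumed linear reward-inaction rule, not the paper's exact update; the strategy set and learning rate are hypothetical:

```python
def update_probs(probs, chosen, reward, lr=0.1):
    """Reward-driven update (linear reward-inaction style): if the chosen
    strategy earned a reward, shift probability mass toward it; otherwise
    leave the distribution unchanged."""
    if not reward:
        return probs[:]
    new = [p * (1 - lr) for p in probs]  # shrink all probabilities
    new[chosen] += lr                    # move the freed mass to the winner
    return new

# Three candidate (sub-channel, power) strategies, initially uniform
probs = [1 / 3] * 3
for _ in range(50):  # suppose strategy 1 keeps earning rewards
    probs = update_probs(probs, chosen=1, reward=True)
```

After repeated rewards the distribution concentrates on the rewarded strategy, which is how the iterative updates converge toward an equilibrium allocation.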
Abstract: With the continuous evolution of computer technology, process simulation, which utilizes simulation models to mimic business process behavior, is becoming increasingly widely employed in various industries. It can be adopted to predict and optimize system performance, assess the impact of decisions, provide a decision-making basis for managers, and reduce experimental cost and time. Currently, how to efficiently develop a trustworthy simulation model has attracted widespread attention. This study traces, summarizes, and analyzes the relevant literature on methods for building business process simulation models. Meanwhile, the processes, advantages, disadvantages, and progress of process model-based, system dynamics-based, and deep learning-based simulation modeling approaches are presented. Finally, the challenges and future directions of process simulation are discussed to provide references for future research in this field.
Abstract: The security of electric energy plays an important role in national security. With the development of power 5G communication, a large number of power terminals have positioning demands. The traditional global positioning system (GPS) is vulnerable to spoofing, so how to effectively improve the security of GPS positioning has become an urgent problem. This study proposes a base-station-assisted GPS spoofing detection algorithm for power 5G terminals. It uses the highly secure base station positioning to verify GPS positioning that may be spoofed and introduces a consistency factor (CF) to measure the consistency between the GPS positioning and the base station positioning. If the CF is greater than a threshold, the GPS positioning is classified as spoofed; otherwise, it is judged as normal. The experimental results show that the accuracy of the algorithm is 99.98%, higher than that of traditional machine-learning-based classification algorithms. In addition, the proposed scheme is also faster than those algorithms.
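The threshold test described above can be sketched as follows. The CF is assumed here to be the planar distance between the two position fixes, and the 50 m threshold is hypothetical; the paper's exact CF definition may differ:

```python
import math

def consistency_factor(gps_pos, bs_pos):
    """A simple consistency factor: Euclidean distance (in meters) between
    the GPS fix and the base-station fix in a local planar frame.
    (Assumed definition; the paper's exact CF may differ.)"""
    return math.dist(gps_pos, bs_pos)

def is_spoofed(gps_pos, bs_pos, threshold=50.0):
    """Flag the GPS fix as spoofed when the CF exceeds the threshold."""
    return consistency_factor(gps_pos, bs_pos) > threshold

# Consistent fixes: CF is about 11 m, below the threshold
normal = is_spoofed((100.0, 200.0), (110.0, 195.0))
# Widely divergent fixes: CF is roughly 943 m, above the threshold
spoofed = is_spoofed((100.0, 200.0), (900.0, 700.0))
```

Because the base station fix is hard to spoof, a large divergence from the GPS fix is treated as evidence of GPS spoofing rather than ordinary positioning noise.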
Copyright: Institute of Software, Chinese Academy of Sciences Beijing ICP No. 05046678-3