    2024,33(4):1-12, DOI: 10.15888/j.cnki.csa.009459
    Abstract:
    Training deep neural networks (DNNs) in mission-critical scenarios consumes ever more resources, which motivates attackers to steal models through cloud prediction APIs and thereby infringe the intellectual property rights of model owners. To trace illegal public copies of a model, DNN model fingerprinting offers owners a promising means of copyright verification that preserves model integrity. However, existing fingerprinting schemes rely mainly on output-level traces (e.g., misprediction behavior on special inputs), which limits stealthiness during fingerprint verification. This study proposes a novel task-agnostic fingerprinting scheme based on saliency map traces of model predictions. The scheme formulates a constrained manipulation objective on saliency maps to construct clean-label, natural-looking fingerprint samples, thus significantly improving the stealthiness of model fingerprints. Extensive evaluation on three typical tasks shows that the scheme substantially improves on the fingerprint effectiveness of existing schemes while keeping the fingerprints highly stealthy.
    2024,33(4):13-25, DOI: 10.15888/j.cnki.csa.009461
    Abstract:
    Multimodal sentiment analysis aims to assess users’ sentiment by analyzing the videos they upload on social platforms. Current research on multimodal sentiment analysis primarily focuses on designing complex multimodal fusion networks to learn the consistency information among modalities, which enhances model performance to some extent. However, most of this research overlooks the complementary role played by the difference information among modalities, resulting in biased sentiment analysis. This study proposes a multimodal sentiment analysis model based on dual encoder representation learning (DERL). The model learns modality-invariant and modality-specific representations through a dual encoder structure. Specifically, a cross-modal interaction encoder based on a hierarchical attention mechanism learns the modality-invariant representations of all modalities to obtain consistency information. Additionally, an intra-modal encoder based on a self-attention mechanism learns the modality-specific representations within each modality and thus captures difference information. Furthermore, two gate network units are designed to enhance and filter the encoded features and enable a better combination of modality-invariant and modality-specific representations. Finally, during fusion, potentially similar sentiment across different multimodal representations is captured for sentiment prediction by reducing the L2 distance among them. Experimental results on two publicly available datasets, CMU-MOSI and CMU-MOSEI, show that the model outperforms a range of baselines.
    2024,33(4):26-38, DOI: 10.15888/j.cnki.csa.009466
    Abstract:
    Liver cancer is a malignant liver tumor that originates from liver cells, and its diagnosis has always been a difficult medical problem and a research hotspot in various fields. Early diagnosis can reduce the mortality rate of liver cancer. Histopathological image examination is the gold standard for oncology diagnosis because the images display the cells and tissue structures of tissue slices, which can be used to determine cell types, tissue structures, and the number and morphology of abnormal cells, and to evaluate the specific condition of the tumor. This study reviews the application of convolutional neural networks to liver cancer diagnosis algorithms for pathological images, including liver tumor detection, image segmentation, and preoperative prediction. The design ideas, improvement goals, and methods of each convolutional neural network algorithm are elaborated in detail to give researchers a clearer reference. Additionally, the advantages and disadvantages of convolutional neural network algorithms in diagnosis are summarized and analyzed, and potential future research hotspots and the related difficulties are discussed.
    2024,33(4):39-49, DOI: 10.15888/j.cnki.csa.009469
    Abstract:
    Multi-client brain tumor classification methods based on the convolutional block attention module (CBAM) extract tumor region details from MRI images inadequately, and their channel attention and spatial attention interfere with each other under the federated learning framework. In addition, their accuracy in classifying medical tumor data from multiple sources is low. To address these problems, this study proposes a brain tumor classification method that combines the federated learning framework with an enhanced CBAM-ResNet18 network. The method exploits federated learning to train collaboratively on brain tumor data from multiple sources. It replaces the ReLU activation function with Leaky ReLU to mitigate neuron death. The channel attention module within CBAM is changed from a dimension reduction followed by a dimension increase to a dimension increase followed by a dimension reduction, which significantly enhances the network’s ability to extract image details. Furthermore, the channel attention module and spatial attention module in CBAM are rearranged from a cascade structure to a parallel structure, so that the network’s feature extraction capability is unaffected by the order of processing. A publicly available brain tumor MRI dataset from Kaggle is used in the study. The results show that the resulting model, FL-CBAM-DIPC-ResNet, achieves accuracy, precision, recall, and F1 score of 97.78%, 97.68%, 97.61%, and 97.63%, respectively, which are 6.54%, 4.78%, 6.80%, and 7.00% higher than those of the baseline model. These experimental findings validate that the proposed method not only overcomes data islands and enables data fusion from multiple sources but also outperforms the majority of existing mainstream models.
    2024,33(4):50-59, DOI: 10.15888/j.cnki.csa.009486
    Abstract:
    The visually impaired are a vulnerable group in society and face many obstacles when traveling independently. Providing them with safe and reliable assistive equipment reflects the progress of social civilization. This study introduces the key technologies for obstacle detection and identification and the path planning algorithms used to assist visually impaired travel. It mainly analyzes path planning algorithms applied after obstacle detection, comprehensively compares the application characteristics and scenarios of the various technologies, and discusses the research progress of related methods in assistive devices for the visually impaired. In addition, it summarizes the current state of multi-technology integration in intelligent assistive equipment. On this basis, and in light of advances in technologies such as artificial intelligence and embedded devices, the study outlines future development directions for travel aids for the visually impaired.
    2024,33(4):60-68, DOI: 10.15888/j.cnki.csa.009458
    Abstract:
    Inaccurate phase estimation in single-channel speech enhancement tasks degrades the quality of the enhanced speech. To this end, this study proposes a speech enhancement method based on a deep complex axial self-attention convolutional recurrent network (DCACRN), which enhances speech amplitude information and phase information simultaneously in the complex domain. Firstly, a complex convolutional network-based encoder is employed to extract complex features from the input speech signal, and a convolutional hopping module is introduced to map the features into a high-dimensional space for feature fusion, which enhances information interaction and gradient flow. Then an encoder-decoder structure based on the axial self-attention mechanism is designed to enhance the model’s temporal modeling ability and feature extraction ability. Finally, the speech signals are reconstructed by the decoder, and a hybrid loss function is adopted to optimize the network model and improve the quality of the enhanced speech signals. The experiments are conducted on the public datasets Valentini and DNS Challenge, and the results show that the proposed method improves both the perceptual evaluation of speech quality (PESQ) and short-time objective intelligibility (STOI) metrics compared to other models. On the non-reverberant dataset, PESQ is improved by 12.8% over DCTCRN and 3.9% over DCCRN, which validates the effectiveness of the proposed model in speech enhancement tasks.
    Available online:  April 19, 2024 , DOI: 10.15888/j.cnki.csa.009507
    Abstract:
    Mixed-sample data augmentation methods focus only on the model’s forward representation of the category to which an image belongs while ignoring the reverse determination of whether the image belongs to a specific category. To address this one-sided description of image categories, which degrades model performance, this study proposes an image data augmentation method with inverse target interference. To prevent overfitting of the network model, the method first modifies the original image to increase the diversity of background and target images. Secondly, the idea of reverse learning is adopted so that the network model correctly identifies the category of the original image while fully learning the attributes of the populated image that do not belong to that category, which increases the model’s confidence in identifying the category of the original image. Finally, to verify the method’s effectiveness, the study runs extensive experiments with different network models on five datasets, including CIFAR-10 and CIFAR-100. Experimental results show that, compared to other state-of-the-art data augmentation methods, the proposed method significantly enhances the model’s learning effect and generalization ability in complex settings.
    Available online:  April 19, 2024 , DOI: 10.15888/j.cnki.csa.009509
    Abstract:
    Accurate segmentation of colon polyps is important for removing abnormal tissue and reducing the risk of polyps developing into colon cancer. Current colon polyp segmentation models suffer from a high misjudgment rate and low segmentation accuracy on polyp images. To achieve accurate segmentation of polyp images, this study proposes a colon polyp segmentation model (MGW-Net) that combines multi-scale gated convolution and window attention. Firstly, it designs an improved multi-scale gated convolution module (MGCM) to replace the U-Net convolutional block and fully extract colon polyp image information. Secondly, to reduce the information loss at the skip connection and make full use of the information at the bottom of the network, the study builds a multi-information fusion enhancement module (MFEM) that combines improved dilated convolution with hybrid enhanced residual window attention to optimize feature fusion at the skip connection. Experimental results on the CVC-ClinicDB and Kvasir-SEG datasets show that the similarity coefficients of MGW-Net are 93.8% and 92.7%, and the mean intersection over union values are 89.4% and 87.9%, respectively. Experimental results on the CVC-ColonDB, CVC-300, and ETIS datasets show that MGW-Net generalizes well, which verifies that MGW-Net can effectively improve the accuracy and robustness of colon polyp segmentation.
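The two overlap metrics reported for MGW-Net can be illustrated with a minimal sketch. It assumes the "similarity coefficient" is the Dice coefficient and that masks are binary; the function name `dice_and_iou` and the toy masks are hypothetical and not taken from the paper.

```python
def dice_and_iou(pred, target):
    """Return (Dice, IoU) for two binary masks given as equal-sized 2D lists of 0/1."""
    flat_pred = [p for row in pred for p in row]
    flat_target = [t for row in target for t in row]
    # Pixels marked as polyp in both masks.
    intersection = sum(p * t for p, t in zip(flat_pred, flat_target))
    pred_area = sum(flat_pred)
    target_area = sum(flat_target)
    union = pred_area + target_area - intersection
    if union == 0:
        # Assumption: two empty masks count as perfect agreement.
        return 1.0, 1.0
    dice = 2 * intersection / (pred_area + target_area)
    iou = intersection / union
    return dice, iou


# Toy 3x3 masks (hypothetical data): 3 predicted pixels, 3 true pixels, 2 shared.
pred = [[0, 1, 1],
        [0, 1, 0],
        [0, 0, 0]]
target = [[0, 1, 0],
          [0, 1, 1],
          [0, 0, 0]]
dice, iou = dice_and_iou(pred, target)
print(dice, iou)
```

A dataset-level score such as the mean intersection over union would then be the average of the per-image IoU values, under the same assumptions.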
    Available online:  April 19, 2024 , DOI: 10.15888/j.cnki.csa.009514
    Abstract:
    In the anti-external force damage inspection of transmission lines, the lightweight target detection algorithms currently deployed at the edge suffer from insufficient detection accuracy and slow inference. To solve these problems, this study proposes Fast-YOLOv5, a detection algorithm for anti-external force damage detection in the power grid that uses a sparse convolutional network (SCN) with global context enhancement. Based on the YOLOv5 algorithm, the FasterNet+ network is designed as a new feature extraction network, which maintains detection accuracy, improves the inference speed of the model, and reduces computational complexity. In the bottleneck layer of the algorithm, an ECAFN module with efficient channel attention is designed, which improves detection by adaptively calibrating the feature response along the channel direction, efficiently obtaining cross-channel interaction information, and further reducing the number of parameters and the amount of computation. The study replaces the detection layer of the model with the context-enhanced SCN to strengthen foreground features and improve the prediction ability of the model by capturing global context information. The experimental results show that, compared with the original model, the accuracy of the improved model increases by 1.9% and the detection speed doubles, reaching 56.2 FPS. The number of parameters and the amount of computation are reduced by 50% and 53%, respectively, which better meets the requirements for efficient detection on transmission lines.
    Available online:  April 19, 2024 , DOI: 10.15888/j.cnki.csa.009525
    Abstract:
    To address the inadequacy of existing remote sensing image super-resolution reconstruction models in long-term feature similarity and multi-scale feature relevance, this study proposes a novel remote sensing image super-resolution reconstruction algorithm based on a cross-scale hybrid attention mechanism. Initially, the study introduces a global layer attention (GLA) mechanism that uses layer-wise attention to weight and merge global features across different levels, thereby modeling the extended dependency between low-resolution and high-resolution image features. Concurrently, it designs a cross-scale local attention (CSLA) mechanism to identify and integrate local information patches in multi-scale low-resolution feature maps that correspond to high-resolution images, enhancing the model’s ability to restore image details. Finally, the study proposes a local information-aware loss function to guide the image reconstruction process, further improving the visual quality and detail preservation of the reconstructed images. Experiments on the UC-Merced dataset demonstrate that the proposed method outperforms most mainstream methods in terms of average PSNR/SSIM across three magnification factors and exhibits superior quality and detail preservation in visual results.
    Available online:  April 19, 2024 , DOI: 10.15888/j.cnki.csa.009526
    Abstract:
    The distribution of grayscale values in calligraphic character document images varies significantly under poor lighting conditions, resulting in lower image contrast in low-light areas and degradation of the morphological texture features of the strokes. Traditional methods typically focus on local information such as mean, variance, and entropy while giving less consideration to morphological texture, which makes them insensitive to the features of low-contrast areas. To address these issues, this study proposes a binarization method called clustering segmentation-based side-window filter (CS-SWF), specifically designed for degraded calligraphic documents. Firstly, the method uses a multi-dimensional SWF to describe pixel chunks with similar morphological features. Then, with multiple correction rules, it uses downsampling to extract low-dimensional information and correct feature regions. Finally, the clustered blocks in the feature map are classified to obtain the binarization results. To evaluate its performance, the proposed method is compared with existing methods using F-measure (FM), peak signal-to-noise ratio (PSNR), and distance reciprocal distortion (DRD) as indicators. Experimental results on a self-constructed dataset of 100 handwritten degraded document images demonstrate that the proposed binarization method is more stable in low-contrast dark regions and outperforms the comparison algorithms in terms of accuracy and robustness.
    Available online:  April 19, 2024 , DOI: 10.15888/j.cnki.csa.009527
    Abstract:
    To prevent and reduce the occurrence of wildland-urban interface (WUI) fires, this study mines the key causal factors of WUI fires and clarifies the mechanisms by which the causal factors act on one another. First, the study obtains the causal factors from WUI fire accident cases with the proposed mining technique and uses the Apriori algorithm to obtain association rules between the causal factors. Then it uses complex network theory to construct the WUI fire causal factor network, calculate the topological parameters of the network, and analyze the characteristics of the WUI fire causal network. Finally, the study introduces a risk index for WUI fire causal chains, mines the high-risk edges, and proposes chain-breaking measures. The results show that the WUI fire causal factor network has a small-world characteristic, and that high temperature, strong wind, and drought have a greater influence on other causal factors. Burning waste, plant fire, emergency response speed, human arson, and strong wind play important roles in the conversion between different causal factors and should be controlled more tightly. The highest-risk edge in the network is burning waste → plant fire, and this risk chain can be cut off by enacting regulations such as a prohibition on unauthorized waste burning, to achieve the prevention and active control of WUI fires.
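Association rules of the kind Apriori produces are scored by support and confidence. The following minimal sketch computes both for the rule burning waste → plant fire; the five accident cases are invented for illustration, and the sketch omits Apriori's level-wise candidate generation and pruning.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimated P(consequent | antecedent): support of the union over support of the antecedent."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)


# Hypothetical accident cases, each reduced to a set of causal factors.
cases = [
    {"burning waste", "strong wind", "plant fire"},
    {"burning waste", "plant fire"},
    {"drought", "high temperature", "plant fire"},
    {"burning waste", "strong wind"},
    {"human arson", "strong wind"},
]

rule_support = support(cases, {"burning waste", "plant fire"})
rule_confidence = confidence(cases, {"burning waste"}, {"plant fire"})
print(rule_support, rule_confidence)
```

In a network built from such rules, each rule that passes the chosen support and confidence thresholds would become a directed edge between causal factors; how the study weights those edges is not stated in the abstract.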
    Available online:  April 19, 2024 , DOI: 10.15888/j.cnki.csa.009528
    Abstract:
    Multi-organ medical image segmentation can facilitate clinical diagnosis. This study proposes a multi-level feature interaction Transformer model to address the weak global feature extraction capability of CNNs, the weak local feature extraction capability of Transformers, and the quadratic computational complexity of Transformers in multi-organ medical image segmentation. The proposed model employs a CNN to extract local features, which are then transformed into global features through a Swin Transformer. Multi-level local and global features are generated through down-sampling, and the local and global features at each level undergo interaction and enhancement. After the enhancement at each level, the features are cross-fused by multi-level feature fusion modules. The fused features then pass through up-sampling and segmentation heads to produce segmentation masks. The proposed model is evaluated on the Synapse and ACDC datasets, achieving an average Dice similarity coefficient (DSC) of 80.16% and an average 95th percentile Hausdorff distance (HD95) of 19.20 mm. These results outperform representative models such as LGNet and RFE-UNet. The proposed model is effective for multi-organ medical image segmentation.
    Available online:  April 19, 2024 , DOI: 10.15888/j.cnki.csa.009533
    Abstract:
    Videos captured in low illumination environments often carry problems such as low contrast, high noise, and unclear details, which seriously affect computer vision tasks such as target detection and segmentation. Most of the existing low-light video enhancement methods are constructed based on convolutional neural networks. Since convolution cannot make full use of the long-range dependencies between pixels, the generated video often suffers from loss of details and color distortion in some regions. To address the above problems, this study proposes a Siamese low-light video enhancement network coupling local and global features. The model obtains local features of video frames through a deformable convolution-based local feature extraction module and designs a lightweight self-attention module to capture the global features of video frames. Finally, the extracted local and global features are fused by a feature fusion module, which guides the model to generate enhanced videos with more realistic colors and details. The experimental results show that the proposed method can effectively improve the brightness of low-light videos and generate videos with richer colors and details. It also outperforms the methods proposed in recent years in evaluation metrics such as peak signal-to-noise ratio and structural similarity.
    Available online:  April 07, 2024 , DOI: 10.15888/j.cnki.csa.009518
    Abstract:
    Synthetic aperture radar (SAR) images provide an important time-series data source for land cover classification. Existing time-series matching algorithms can fully exploit the similarity among time-series features to obtain satisfactory classification results. In this study, the classic time-series matching algorithm named time-weighted dynamic time warping (TWDTW), which comprehensively considers shape similarity and phenological differences, is introduced to guide SAR-based land cover classification. To solve the problem that the traditional TWDTW algorithm only considers the similarity matching of a single feature on the time series, this study proposes a multi-feature fusion-based TWDTW (Mult-TWDTW) algorithm. In the proposed method, three features, namely, the backscattering coefficient, interferometric coherence, and the dual-polarization radar vegetation index (DpRVI), are extracted, and the Mult-TWDTW model is designed by fusing multiple features based on the TWDTW algorithm. To verify the effectiveness of the proposed method, the study implements land cover classification in the Danjiangkou area using time-series data obtained from the Sentinel-1A satellite. Then, the Mult-TWDTW algorithm is compared with the multi-layer perceptron (MLP), one-dimensional convolutional neural network (1D-CNN), K-means, and support vector machine (SVM) algorithms as well as the TWDTW algorithm using a single feature. The experimental results show that the Mult-TWDTW algorithm obtains the best classification results, with its overall accuracy and Kappa coefficient reaching 95.09% and 91.76, respectively. In summary, the Mult-TWDTW algorithm effectively fuses the information of multiple features and can enhance the potential of time-series matching algorithms in the classification of multiple types of land cover.
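The idea behind TWDTW, adding a time penalty to the usual DTW matching cost so that observations far apart in the season are discouraged from aligning, can be sketched as follows. This is a single-feature illustration, not the Mult-TWDTW algorithm of the study: the logistic weight parameters (`steepness`, `midpoint`), the absolute-difference cost, the whole-sequence (closed-boundary) alignment, and the sample series are all assumptions.

```python
import math


def logistic_time_weight(day_gap, steepness=0.1, midpoint=50.0):
    """Penalty in (0, 1) that grows with the gap in days between two observations."""
    return 1.0 / (1.0 + math.exp(-steepness * (day_gap - midpoint)))


def twdtw_distance(pattern, series):
    """Time-weighted DTW between two lists of (day_of_year, value) pairs."""
    n, m = len(pattern), len(series)
    inf = float("inf")
    # acc[i][j] holds the cheapest cost of aligning the first i and first j samples.
    acc = [[inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            day_p, val_p = pattern[i - 1]
            day_s, val_s = series[j - 1]
            # Shape cost plus time penalty for this pair of samples.
            cost = abs(val_p - val_s) + logistic_time_weight(abs(day_p - day_s))
            acc[i][j] = cost + min(acc[i - 1][j], acc[i][j - 1], acc[i - 1][j - 1])
    return acc[n][m]


# Hypothetical seasonal profile and a copy shifted by 120 days.
pattern = [(10, 0.1), (100, 0.5), (190, 0.9), (280, 0.3)]
shifted = [(day + 120, value) for day, value in pattern]

print(twdtw_distance(pattern, pattern))
print(twdtw_distance(pattern, shifted))
```

The shifted copy has the same shape but scores a larger distance because of the time penalty, which is how phenological differences enter the match. A multi-feature variant would presumably combine per-feature costs inside `cost`, but the fusion rule used by Mult-TWDTW is not given in the abstract.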
    Available online:  April 07, 2024 , DOI: 10.15888/j.cnki.csa.009519
    Abstract:
    In the digital era, an increasing number of people prefer shopping on e-commerce platforms. As agricultural product e-commerce platforms grow, consumers find it challenging to discover suitable products among numerous choices. To enhance user satisfaction and purchase intent, these platforms need to recommend appropriate products based on user preferences. Given the variety of agricultural features such as season, region, user interests, and product attributes, feature interactions can better capture user demands. This study introduces a new model, fine-grained feature interaction selection networks (FgFisNet). By introducing fine-grained interaction layers and feature interaction selection layers, the model effectively learns feature interactions using both the inner product and the Hadamard product. During training, it automatically identifies important feature interactions, eliminates redundant ones, and feeds the significant feature interactions and first-order features into a deep neural network to obtain the final click-through rate (CTR) prediction. Extensive experiments on a real dataset from agricultural e-commerce demonstrate significant economic benefits achieved by the proposed FgFisNet method.
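The two interaction operators named in the abstract differ in what they return: the inner product collapses a pair of feature embeddings into one scalar, while the Hadamard product keeps one interaction value per embedding dimension. A minimal sketch, with invented three-dimensional embeddings for a season feature and a region feature:

```python
def inner_product(u, v):
    """Scalar interaction: sum of element-wise products."""
    return sum(a * b for a, b in zip(u, v))


def hadamard_product(u, v):
    """Vector interaction: element-wise products, one per embedding dimension."""
    return [a * b for a, b in zip(u, v)]


# Hypothetical embeddings; real ones would be learned during training.
season = [0.5, -1.0, 2.0]
region = [1.0, 0.5, 0.25]

print(inner_product(season, region))
print(hadamard_product(season, region))
```

The inner product is the sum of the Hadamard product's entries, so the element-wise form retains finer-grained information. How FgFisNet selects among such interactions is not described in the abstract and is not sketched here.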
    Available online:  April 07, 2024 , DOI: 10.15888/j.cnki.csa.009515
    Abstract:
    The currently available quality assessment methods for images rarely fully utilize the color coding mechanisms of the retina of human eyes and the visual cortex and fail to fully consider the influence of color information on image quality. In this study, an objective assessment model for the color harmony of visible light (dim-light) and infrared color fused images based on multiple visual features is proposed to address the above problems. This model incorporates more color information into image quality assessment by considering a variety of visual features of human eyes comprehensively, including the feature of visual contrast colors, the feature of color information fluctuation, and the feature of advanced visual content. Through feature fusion and support vector regression training, it achieves the objective assessment of the color harmony of color fused images. Experimental comparisons and analyses are conducted using databases of fused images in three typical scenes. The experimental results show that compared with the existing eight methods of objective image quality assessment, the proposed method is more consistent with the subjective perception of human eyes and has higher prediction accuracy.
    Available online:  April 07, 2024 , DOI: 10.15888/j.cnki.csa.009516
    Abstract:
    The emergence of network function virtualization (NFV) technology enables network services instantiated as service function chains (SFCs) to share the underlying network, alleviating the rigidity of traditional network architectures. However, the large number of service requests in the network brings new challenges to multi-domain SFC orchestration. For one thing, the privacy of intra-domain resource information and internal network policies makes multi-domain SFC orchestration more complicated. For another, multi-domain SFC orchestration requires determining the optimal set of candidate orchestration domains. Nevertheless, previous studies rarely considered inter-domain load balance, which negatively affected the service acceptance rate. In addition, orchestrating service requests across network domains places more stringent requirements on the cost and response time of the service. To address these challenges, this study proposes a construction method for domain-level graphs to meet the privacy requirement of multi-domain networks. Then, a method for calculating domain weights based on inter-domain load balance is proposed to select SFC orchestration domains. Finally, the study proposes an orchestration algorithm that considers the cost and response time requirements of multi-domain networks. The experimental results show that the proposed algorithm effectively trades off the average service cost and the acceptance rate and also optimizes the average service response time.
    Available online:  April 07, 2024 , DOI: 10.15888/j.cnki.csa.009517
    Abstract:
    This study proposes an algorithm named DPCP-CROSS-JOIN for fast co-spatiotemporal relationship join queries over large-scale trajectory data when cluster computing resources are insufficient. The algorithm discretizes continuous trajectory data by segmenting and cross-coding the temporal fields and grid-coding the spatial fields, and then stores the data in two-level partitions keyed by date and grid region code. It achieves 3-level indexing and 4-level acceleration for spatiotemporal join queries through cross “equivalent” joins (equi-joins). As a result, the time complexity of co-spatiotemporal relationship join queries among n × n objects is reduced from O(n²) to O(n log n). The algorithm improves the efficiency of join queries by up to 30.66 times when Hive and Tez are used on a Hadoop cluster to join large-scale trajectory data. Because it uses time-slice and grid codes as the join condition, it avoids evaluating complex expressions in real time during the join. Moreover, replacing joins on complex expressions with “equivalent” joins improves the parallelism of MapReduce tasks and raises the utilization of cluster storage and computing resources. Similar tasks on larger trajectory datasets that are almost impossible to accomplish with general optimization methods can still be completed by the proposed algorithm within a few minutes. The experimental results suggest that the proposed algorithm is efficient and stable and is especially suitable for co-spatiotemporal relationship join queries over large-scale trajectory data under insufficient computing resources. It can also serve as an atomic algorithm for searching accompanying spatiotemporal trajectories and determining the intimacy of relationships among objects. It can be widely applied in fields such as national security and social order maintenance, crime prevention and control, and urban and rural planning support.
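The core trick described here, replacing a distance-and-time predicate with an equality match on discrete codes, can be sketched in a few lines. This is an illustration under stated assumptions, not DPCP-CROSS-JOIN itself: the slot length, cell size, function names, and sample points are hypothetical, and the cross-coding of time slices and the two-level partitioning are omitted, so pairs that straddle a slot or cell boundary are missed.

```python
from collections import defaultdict
from itertools import combinations


def cell_key(t_seconds, x, y, slot_seconds=300, cell_size=100.0):
    """Discretize a trajectory point into a (time slot, grid column, grid row) code."""
    return (int(t_seconds // slot_seconds), int(x // cell_size), int(y // cell_size))


def co_located_pairs(points, slot_seconds=300, cell_size=100.0):
    """Return pairs of object ids that share at least one (time slot, grid cell) code.

    `points` is an iterable of (object_id, t_seconds, x, y). Grouping by the code is
    the in-memory analogue of an equi-join on the code columns.
    """
    buckets = defaultdict(set)
    for obj_id, t, x, y in points:
        buckets[cell_key(t, x, y, slot_seconds, cell_size)].add(obj_id)
    pairs = set()
    for ids in buckets.values():
        # Only objects inside the same bucket are compared, never all n x n pairs.
        for a, b in combinations(sorted(ids), 2):
            pairs.add((a, b))
    return pairs


# Hypothetical trajectory points: (object id, time in seconds, x in metres, y in metres).
points = [
    ("A", 10, 50.0, 50.0),
    ("B", 200, 80.0, 20.0),    # same slot and cell as A
    ("C", 10, 500.0, 500.0),   # same slot, different cell
    ("D", 400, 60.0, 60.0),    # same cell, later slot
]
print(co_located_pairs(points))
```

Because the join condition is a plain equality on the code, a SQL engine can hash-partition on it instead of evaluating a distance expression for every candidate pair, which is the effect the abstract attributes to the “equivalent” join.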
    Available online:  April 07, 2024 , DOI: 10.15888/j.cnki.csa.009511
    Abstract:
    Selecting appropriate optimizers for a federated learning environment is an effective way to improve model performance, especially when the data is highly heterogeneous. In this study, the FedAvg and FedALA algorithms are mainly investigated, and an improved version called pFedALA is proposed. pFedALA effectively reduces the resource waste caused by synchronization demands by allowing clients to continue local training during waiting periods. Then, the roles of the optimizers in these three algorithms are analyzed in detail, and the performance of various optimizers such as stochastic gradient descent (SGD), Adam, averaged SGD (ASGD), and AdaGrad in handling non-independent and identically distributed (Non-IID) and imbalanced data is compared by testing them on the MNIST and CIFAR-10 datasets. Special attention is given to two data settings: practical heterogeneity based on the Dirichlet distribution and extreme heterogeneity. The experimental results suggest the following observations: 1) The pFedALA algorithm outperforms the FedALA algorithm, with an average test accuracy approximately 1% higher than that of FedALA; 2) Optimizers commonly used in traditional single-machine deep learning environments deliver significantly different performance in a federated learning environment. Compared with other mainstream optimizers, the SGD, ASGD, and AdaGrad optimizers appear to be more adaptable and robust in the federated learning environment.
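For orientation, the server-side step of FedAvg, the baseline the study starts from, is a weighted average of client parameters in which each client's weight is its share of the training samples. The sketch below shows only that step; the flat parameter lists and client sizes are invented, and the adaptive aggregation of FedALA and the waiting-period training of pFedALA are not modeled.

```python
def fedavg(client_params, client_sizes):
    """Average client parameter vectors, weighting each by its number of local samples."""
    total = sum(client_sizes)
    n_params = len(client_params[0])
    return [
        sum(params[k] * size for params, size in zip(client_params, client_sizes)) / total
        for k in range(n_params)
    ]


# Two hypothetical clients with imbalanced data: 1 sample versus 3 samples.
client_params = [[1.0, 2.0], [3.0, 4.0]]
client_sizes = [1, 3]
global_params = fedavg(client_params, client_sizes)
print(global_params)
```

With imbalanced clients the global model is pulled toward the larger client, which is one reason the choice of local optimizer can matter under Non-IID data.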
    Available online:  April 07, 2024 , DOI: 10.15888/j.cnki.csa.009512
    Abstract:
    The scenes in high-resolution aerial images are of many highly similar categories. The classic classification method based on deep learning offers low operational efficiency because of the redundant floating-point operations generated in the feature extraction process. FasterNet improves the operational efficiency through partial convolution but reduces the feature extraction ability and hence the classification accuracy of the model. To address the above problems, this study proposes a hybrid structure classification method integrating FasterNet and the attention mechanism. Specifically, the “cross-shaped convolution module” is used to partially extract scene features and thereby improve the operational efficiency of the model. Then, a dual-branch attention mechanism that integrates coordinate attention and channel attention is used to enable the model to better extract features. Finally, a residual connection is made between the “cross-shaped convolution module” and the dual-branch attention module so that more task-related features can be obtained from network training, thereby reducing operational costs and improving operational efficiency in addition to improving classification accuracy. The experimental results show that compared with the existing classification models based on deep learning, the proposed method has a short inference time and high accuracy. Its number of parameters is 19M, and its average inference time for one image is 7.1 ms. The classification accuracy of the proposed method on the public datasets NWPU-RESISC45, EuroSAT, VArcGIS (10%), and VArcGIS (20%) is 96.12%, 98.64%, 95.42%, and 97.87%, respectively, which is 2.06%, 0.77%, 1.34%, and 0.65% higher than that of the FasterNet model, respectively.
    Available online:  April 07, 2024 , DOI: 10.15888/j.cnki.csa.009513
    Abstract:
    This study aims to solve the problems faced by traditional U-Net network in the semantic segmentation task of street scene images, such as the low accuracy of object segmentation under multi-scale categories and the poor correlation of image context features. To this end, it proposes an improved U-Net semantic segmentation network AS-UNet to achieve accurate segmentation of street scene images. Firstly, the spatial and channel squeeze & excitation block (scSE) attention mechanism module is integrated into the U-Net network to guide the convolutional neural network to focus on semantic categories related to segmentation tasks in both channel and space dimensions, to extract more effective semantic information. Secondly, to obtain the global context information of the image, the multi-scale feature map is aggregated for feature enhancement, and the atrous spatial pyramid pooling (ASPP) multi-scale feature fusion module is embedded into the U-Net network. Finally, the cross-entropy loss function and Dice loss function are combined to solve the problem of unbalanced target categories in street scenes, and the accuracy of segmentation is further improved. The experimental results show that the mean intersection over union (MIoU) of the AS-UNet network model in the Cityscapes and CamVid datasets increases by 3.9% and 3.0%, respectively, compared with the traditional U-Net network. The improved network model significantly improves the segmentation effect of street scene images.
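    The combination of cross-entropy and Dice loss mentioned in the abstract above can be sketched as follows. This is a simplified binary, per-pixel version under stated assumptions: the equal weighting (`ce_weight`, `dice_weight`) and the smoothing constant are hypothetical, and the abstract does not say how AS-UNet weights the two terms or handles multiple classes.

```python
import math


def cross_entropy(probs, targets, eps=1e-7):
    """Mean binary cross-entropy over pixels; probabilities are clamped for stability."""
    total = 0.0
    for p, t in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)
        total += -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
    return total / len(probs)


def dice_loss(probs, targets, smooth=1.0):
    """Soft Dice loss: 1 minus the (smoothed) overlap ratio."""
    intersection = sum(p * t for p, t in zip(probs, targets))
    denominator = sum(probs) + sum(targets)
    return 1.0 - (2.0 * intersection + smooth) / (denominator + smooth)


def combined_loss(probs, targets, ce_weight=0.5, dice_weight=0.5):
    """Weighted sum of the two losses (the weights are assumed, not from the paper)."""
    return ce_weight * cross_entropy(probs, targets) + dice_weight * dice_loss(probs, targets)
```

    The Dice term depends on overlap rather than on the number of pixels per class, which is why it is commonly paired with cross-entropy when target categories are unbalanced.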
    Available online:  April 01, 2024 , DOI: 10.15888/j.cnki.csa.009510
    Abstract:
    Convolutional neural network (CNN), as an important part of U-Net baseline networks in the field of medical image segmentation, is mainly used to deal with the relationships among local feature information. Transformer is a visual model that can effectively strengthen the long-distance dependency among feature information. Previous studies show that Transformer can be combined with CNNs to improve the accuracy of medical image segmentation to a certain extent. However, labeled data in medical images are rarely available while a large amount of data is required to train the Transformer model, exposing the Transformer model to the challenges of high time consumption and a large number of parameters. Given these considerations, this paper proposes a novel medical image segmentation model based on a hybrid multi-layer perceptron (MLP) network by combining the multi-scale hybrid MLP with a CNN based on the UNeXt model, namely, the LM-UNet model. This model can effectively enhance the connection between local and global information and strengthen the fusion between feature information. Experiments on multiple datasets reveal significantly improved segmentation performance of the LM-UNet model on the International Skin Imaging Collaboration (ISIC) 2018 dataset manifested as an average Dice coefficient of 92.58% and an average intersection over union (IoU) coefficient of 86.52%, which are 3% and 3.5% higher than those of the UNeXt model, respectively. The segmentation effects of the proposed model on the Osteoarthritis Initiative-Zuse Institute Berlin two-dimensional (OAI-ZIB 2D) and the Breast Ultrasound Image (BUSI) datasets are also substantially superior, represented as average Dice coefficients 2.5% and 1.0% higher than those of the UNeXt counterpart, respectively. In summary, the LM-UNet model not only improves the accuracy of medical image segmentation but also provides better generalization performance.
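    The two metrics reported in the abstract above, the Dice coefficient and IoU, can be computed for a single binary mask as in the sketch below. The flat 0/1 mask representation and the convention of returning 1.0 for two empty masks are assumptions made for illustration; the paper's evaluation code is not described.

```python
def dice_and_iou(pred, target):
    """Return (Dice, IoU) for two binary masks given as flat sequences of 0/1."""
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    pred_area = sum(1 for p in pred if p)
    target_area = sum(1 for t in target if t)
    union = pred_area + target_area - intersection
    # Assumed convention: two empty masks count as a perfect match.
    dice = 2.0 * intersection / (pred_area + target_area) if (pred_area + target_area) else 1.0
    iou = intersection / union if union else 1.0
    return dice, iou


# Hypothetical 4-pixel masks.
dice, iou = dice_and_iou([1, 1, 0, 0], [1, 0, 1, 0])
```

    For a single mask the two are tied by Dice = 2·IoU / (1 + IoU); averages over a dataset need not satisfy this identity exactly, because the mean of a nonlinear function differs from the function of the mean.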
    Available online:  April 01, 2024 , DOI: 10.15888/j.cnki.csa.009499
    Abstract:
    Accurately predicting wind power is of great significance for improving the efficiency and safety of the power system, while the intermittence and randomness of wind energy make it difficult to predict wind power accurately. Therefore, an improved wind power prediction model based on Informer, namely PCI-Informer (PATCH-CNN-IRFFN-Informer) is proposed. The sequence data is divided into subsequence-level patches for feature extraction and integration, which improves the model’s ability to process sequence data and its effectiveness. Multiple-scale causal convolution self-attention mechanism is used to achieve multi-scale local feature fusion, which enhances the model’s understanding and modeling ability of local information. The inverse residual feedforward network (IRFFN) is introduced to enhance the model’s ability to extract and preserve local structural information. Experiment verification is conducted using data from a wind farm, and the results show that compared with mainstream prediction models, the PCI-Informer model achieves better prediction performance at different prediction time steps, with an average reduction of 11.1% in MAE compared with the Informer model, effectively improving the short-term wind power prediction accuracy.
    Available online:  April 01, 2024 , DOI: 10.15888/j.cnki.csa.009500
    Abstract:
    GSNet relies on graspness to distinguish graspable areas in cluttered scenes, which significantly improves the accuracy of robot grasping pose detection in cluttered scenes. However, GSNet only uses a fixed-size cylinder to determine the grasping pose parameters and ignores the influence of features of different sizes on grasping pose estimation. To address this problem, this study proposes a multi-scale cylinder attention feature fusion module (Ms-CAFF), which contains two core modules: the attention fusion module and the gating unit. It replaces the original feature extraction method in GSNet and uses an attention mechanism to effectively integrate the geometric features inside four cylinders of different sizes, thereby enhancing the network’s ability to perceive geometric features at different scales. The experimental results on GraspNet-1Billion, a grasping pose detection dataset for large-scale cluttered scenes, show that after the introduction of the modules, the accuracy of the network’s grasping poses is increased by up to 10.30% and 6.65%. At the same time, this study applies the network to actual experiments to verify the effectiveness of the method in real scenes.
    Available online:  April 01, 2024 , DOI: 10.15888/j.cnki.csa.009504
    Abstract:
    The statistical inference of network data has become a hot topic in statistical research in recent years. The independence assumption among sample data in traditional models often fails to meet the analytical demands of modern network-linked data. This work studies the independent effect of each network node in the network-linked data, and based on the idea of fusion penalty, the independent effect of the associated nodes is converged. Knockoff variables construct covariates independent of the target variable by imitating the structure of the original variable. With the help of Knockoff variables, this study proposes a general method framework for variable selection for network-linked data (NLKF). The study proves that NLKF can control the false discovery rate (FDR) at the target level and has higher statistical power than the Lasso variable selection method. When the covariance of the original data is unknown, the covariance matrix using the estimation still has good statistical properties. Finally, combining the 200 factor samples of more than 4 000 stocks in the A-share market and their network relations constructed by Shenyin Wanguo’s first-level industry classification, an example of the application in the field of financial engineering is given.
    Available online:  April 01, 2024 , DOI: 10.15888/j.cnki.csa.009505
    Abstract:
    As one of the important development directions of artificial intelligence, spiking neural networks have received extensive attention in the fields of neuromorphic engineering and brain-inspired computing. To solve the problems of poor generalization as well as large memory and time consumption in spiking neural networks, this study proposes a classification method based on spiking neural networks for spatio-temporal interactive images. Specifically, a temporal efficient training algorithm is introduced to compensate for the loss of momentum in the gradient descent process. Then, the spatial learning through time algorithm is integrated to improve the ability of the network to process information efficiently. Finally, the spatial attention mechanism is added to enable the network to better capture important features in the spatial dimension. The experimental results show that the training memory occupation on the three datasets of CIFAR10, DVS Gesture, and CIFAR10-DVS is reduced by 46.68%, 48.52%, and 10.46%, respectively, and the training speed is increased by 2.80 times, 1.31 times, and 2.76 times, respectively. These results indicate that the proposed method improves network performance effectively on the premise of maintaining accuracy.
    Available online:  April 01, 2024 , DOI: 10.15888/j.cnki.csa.009508
    Abstract:
    Abstractive neural networks have made significant progress and demonstrated remarkable achievements in the field of text summarization. However, because of its flexibility, abstractive summarization is highly likely to generate summaries of poor fidelity that may even deviate from the semantic essence of the source documents. To address this issue, this study proposes two methods to improve the fidelity of summaries. For Method 1, since entities play an important role in summaries and are usually derived from the original documents, the paper suggests allowing the model to copy entities from the source document to ensure that the generated entities match those in the source document and thereby prevent the generation of inconsistent entities. For Method 2, to better prevent the generated summary from deviating from the original text semantically, the study uses key entities and key tokens as two types of guiding information at different levels of granularity in the summary generation process. The performance of the proposed methods is evaluated using the ROUGE metric on two widely used text summarization datasets, namely, CNNDM and XSum. The experimental results demonstrate that both methods have significantly improved the performance of the model. Furthermore, the experiments also prove that the entity copy mechanism can, to some extent, use guiding information to correct introduced semantic noise.
    Available online:  April 01, 2024 , DOI: 10.15888/j.cnki.csa.009487
    Abstract:
    Existing Siamese network object tracking techniques perform only one fusion operation of template features and search features, which makes the object features on the fused feature map relatively coarse and unfavorable to the tracker’s precise positioning. In this study, a serial mutual correlation module is designed. It aims to use the existing mutual correlation method to enhance the object features on the fused feature map by performing multiple mutual correlation operations on the template features and the search features, so as to improve the accuracy of the subsequent classification and regression results and strike a balance between speed and accuracy with fewer parameters. The experimental results show that the proposed method achieves good results on four mainstream tracking datasets.
    Available online:  April 01, 2024 , DOI: 10.15888/j.cnki.csa.009488
    Abstract:
    This study is dedicated to exploring the complex process of opinion formation in social networks, with a particular focus on the mechanisms of consensus achievement in decentralized environments. A novel opinion classification strategy, termed “the second confidence interval” is proposed to improve the traditional DeGroot consensus model, and two distinct opinion dynamics models are developed: the far attack inbreeding (FAI) model and the outbred recent attack (ORA) model. These models comprehensively consider the degree of individual acceptance and emphasis on surrounding opinions. In addition, through an in-depth analysis of neighborhood opinions in social networks, a comprehensive setup of the individual model is carried out, covering multiple factors such as private opinions, expressed opinions, obstinacy, and preferences. The results indicate that under specific parameter settings, both the FAI and ORA models can reach a consensus more rapidly than the original DeGroot model. Specifically, the ORA model converges at around 700 steps, while the convergence speed of the FAI model gradually approaches that of the ORA model with increasing parameter values. Compared with the baseline model, the ORA model exhibits smaller variations in converged opinion values, no more than 3.5%, whereas the FAI model demonstrates greater volatility. These findings not only deepen people’s understanding of the public opinion formation mechanisms in social networks but also highlight the significance of opinion dynamics within individual neighborhoods in the consensus formation process, offering new perspectives and research directions for future studies in this field.
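    For readers unfamiliar with the baseline in the abstract above, the classical DeGroot update replaces each agent's opinion with a weighted average of its neighbours' opinions, x(t+1) = W·x(t), where W is row-stochastic. The sketch below covers only this baseline; it does not implement the FAI or ORA models or the second confidence interval, and the trust matrix, initial opinions, and stopping tolerance are hypothetical.

```python
def degroot_step(weights, opinions):
    """One DeGroot update: each agent takes a weighted average of all opinions."""
    return [sum(w * x for w, x in zip(row, opinions)) for row in weights]


def run_degroot(weights, opinions, tol=1e-6, max_steps=10_000):
    """Iterate until no opinion changes by more than tol; return (opinions, steps)."""
    for step in range(1, max_steps + 1):
        updated = degroot_step(weights, opinions)
        if max(abs(a - b) for a, b in zip(updated, opinions)) < tol:
            return updated, step
        opinions = updated
    return opinions, max_steps


# Hypothetical 3-agent line network; every row of the trust matrix sums to 1.
trust = [
    [0.50, 0.50, 0.00],
    [0.25, 0.50, 0.25],
    [0.00, 0.50, 0.50],
]
final_opinions, steps = run_degroot(trust, [0.0, 0.2, 1.0])
```

    In this example the opinions converge to a common value weighted by each agent's influence. The number of steps needed to reach the tolerance is the kind of convergence speed that the abstract compares across models.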
    Available online:  April 01, 2024 , DOI: 10.15888/j.cnki.csa.009489
    Abstract:
    There are challenges in training local models at resource-constrained edges in federated learning systems. The limitations in computing, storage, energy consumption, and other aspects constantly affect the scale and effectiveness of the model. Traditional federated pruning methods prune the model during the federated training process, but they fail to prune models adaptively according to the environment and may remove some important parameters, resulting in poor performance of models. This study proposes a distributed model pruning method based on federated reinforcement learning to solve this problem. Firstly, the model pruning process is abstracted, and a Markov decision process is established. DQN algorithm is used to construct a universal reinforcement pruning model, so as to dynamically adjust the pruning rate and improve model generalization performance. Secondly, an aggregation method for sparse models is designed to reinforce and generalize pruning methods, optimize the structure of the model, and reduce its complexity. Finally, this method is compared with different baselines on multiple publicly available datasets. The experimental results show that the proposed method maintains model effectiveness while reducing model complexity.
    Available online:  April 01, 2024 , DOI: 10.15888/j.cnki.csa.009493
    Abstract:
    The traditional prediction models for the corrosion rates of industrial pipelines often have the problems of dependence of feature extraction on artificial experience and insufficient generalization ability. To address this issue, this study combines the convolutional neural network (CNN) with the long short-term memory (LSTM) network and proposes a network model based on the cuckoo search (CS) optimization algorithm, namely, the CNN-LSTM-CS model, to predict the corrosion rates of industrial pipelines. Specifically, the collected pipeline corrosion dataset is pre-processed by normalization. Then, the CNN is used to extract information on the deep features of factors affecting the corrosion rates of the pipelines, and a CNN-LSTM prediction model is constructed by training the LSTM network. Finally, the CS algorithm is used to optimize the parameters of the prediction model, thereby reducing the prediction error and predicting the corrosion rate accurately. The experimental results show that compared with several typical prediction methods for the corrosion rate, the method proposed has higher prediction accuracy and provides a new approach for predicting the corrosion rates of industrial pipelines.
    Available online:  April 01, 2024 , DOI: 10.15888/j.cnki.csa.009494
    Abstract:
    During peer evaluation, evaluators may give inaccurate evaluation scores as a result of strategic evaluation. Taking into account the evaluators’ social interest (SI) relations, this study proposes a prediction method named graph attention network-social interest relation-oriented attention network (GAT-SIROAN) that integrates SI and the GAT. This method consists of a weighted network SIROAN that represents the evaluators’ relations with the solutions and a GAT that is used to predict peer evaluation scores. In the SIROAN, the interrupted time-series analysis (ITSA) method is applied to define the evaluators’ two characteristics: the self-evaluation ability and the peer evaluation ability, and these two characteristics are compared to obtain the SI factors and relations among the evaluators. In the score prediction stage, considering the importance of each node, this study uses a self-attention mechanism to calculate the attention coefficients at the nodes, thereby improving the prediction ability. Network parameters are learned by minimizing the root mean square error (RMSE) to obtain more accurate predicted peer evaluation scores. The GAT-SIROAN method is compared experimentally with five baseline methods, namely, the mean, median, PeerRank, RankwithTA, and GCN-SOIN methods, on real datasets. The results show that the GAT-SIROAN method outperforms all the above baseline methods in the RMSE.
    Available online:  April 01, 2024 , DOI: 10.15888/j.cnki.csa.009496
    Abstract:
    MonteCloPi is an anytime subgroup discovery algorithm based on Monte Carlo tree search (MCTS). It aims to build an asymmetric best-first search tree to discover a diverse pattern set with high quality by MCTS policies, while it is limited to a binary target. To this end, this study combines the characteristics of the numerical target to extend the MonteCloPi algorithm to the numerical target. The study selects the appropriate C value for the upper confidence bound (UCB) formula, adjusts the expansion weight of each sample dynamically as well as prunes the search tree, and uses the adaptive top-k-mean-update policy. Finally, the experimental results on the UCI datasets and the National Health and Nutrition Examination Survey (NHANES) audiometry datasets show that the proposed algorithm outperforms other algorithms in terms of discovering diverse pattern sets with high quality and the interpretability of the best subgroup.
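    The role of the C value in the UCB formula mentioned above can be illustrated with a common UCT-style score: mean reward plus an exploration bonus scaled by C. Several UCB variants exist and the abstract does not state which one is used, so the exact form, the default `c`, and the child statistics below are assumptions.

```python
import math


def ucb_score(mean_reward, child_visits, parent_visits, c=0.5):
    """UCT-style score: exploitation term plus a C-scaled exploration bonus."""
    if child_visits == 0:
        return math.inf  # unvisited children are tried first
    return mean_reward + c * math.sqrt(math.log(parent_visits) / child_visits)


def select_child(children, parent_visits, c):
    """children: list of (name, mean_reward, visits); return the best-scoring name."""
    return max(children, key=lambda ch: ucb_score(ch[1], ch[2], parent_visits, c))[0]


# Hypothetical node statistics: "a" is well explored, "b" has few visits.
children = [("a", 0.6, 50), ("b", 0.5, 5)]
greedy_choice = select_child(children, parent_visits=55, c=0.0)
exploratory_choice = select_child(children, parent_visits=55, c=1.0)
```

    With C = 0 the search always follows the best observed mean; a larger C shifts the choice toward rarely visited branches. With a numerical target the reward scale changes, so C plausibly has to be chosen to match it.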
    Available online:  April 01, 2024 , DOI: 10.15888/j.cnki.csa.009497
    Abstract:
    In the field of short-text intent recognition, convolutional neural networks (CNN) have garnered considerable attention due to their outstanding performance in extracting local information. Nevertheless, their limitations arise from the difficulty in capturing the global features of short-text corpora. To address this issue, this study combines the strengths of TextCNN and BiGRU-att to propose a dual-channel short-text intent recognition model, aiming to better recognize the intent of short texts by leveraging both local and global features, thereby compensating for the model’s inadequacies in capturing overall text features. The AB-CNN-BGRU-att model initially utilizes an ALBERT multi-layer bidirectional Transformer structure to vectorize the input text and subsequently feeds these vectors separately into TextCNN and BiGRU network models to extract local and global features, respectively. The fusion of these two types of features, followed by passing through fully connected layers and inputting into the Softmax function, yields the intent labels. The experimental results demonstrate that on the THUCNews_Title dataset, the proposed AB-CNN-BGRU-att algorithm achieves an accuracy (Acc) of 96.68% and an F1 score of 96.67%, exhibiting superior performance compared with other commonly used intent recognition models.
    Available online:  April 01, 2024 , DOI: 10.15888/j.cnki.csa.009498
    Abstract:
    This study analyzes the multivariate, nonlinear, and strong coupling characteristics of permanent magnet synchronous motors (PMSM) in industrial applications, as well as the difficulties in their parameter adjustment, response delay, poor robustness, and adaptability issues encountered with traditional PID control. A novel approach combining a twin delayed deep deterministic policy gradient (TD3) algorithm with PID control is proposed to optimize PID parameter adjustment for more accurate motor speed control. In this method, bidirectional long short-term memory networks (BiLSTM) are integrated into the Actor and Critic networks, significantly enhancing the processing capability for time-series data of PMSM’s dynamic behavior. This enables the system to accurately capture the current state and predict future trends, achieving more precise and adaptive self-tuning of PID parameters. Moreover, the integration of entropy regularization and curiosity-driven exploration methods further enhances the diversity of the strategy, preventing premature convergence to suboptimal strategies and encouraging in-depth exploration of unknown environments. To validate the effectiveness of the proposed method, a simulation model of a PMSM is designed, and the proposed BiLSTM-TD3-ICE method is compared with the traditional TD3 and the classical Ziegler-Nichols (Z-N) method. The experimental results demonstrate the significant advantages of the proposed strategy in control performance.
    Available online:  March 22, 2024 , DOI: 10.15888/j.cnki.csa.009482
    Abstract:
    Optical coherence tomography (OCT) is a new type of ophthalmic diagnosis method with non-contact, high resolution, and other characteristics, which has been used as an important reference for doctors to clinically diagnose ophthalmic diseases. As early detection and clinical diagnosis of retinopathy are crucial, it is necessary to change the time-consuming and laborious status quo of the manual classification of diseases. To this end, this study proposes a multi-classification recognition method for retinal OCT images based on an improved MobileNetV2 neural network. This method uses feature fusion technology to process images and designs an attention increase mechanism to improve the network model, greatly improving the classification accuracy of OCT images. Compared with the original algorithm, the classification effect has been significantly improved, and the classification accuracy, recall, precision, and F1 score of the proposed model reach 98.3%, 98.44%, 98.94%, and 98.69%, respectively, exceeding the accuracy of manual classification. Such methods not only speed up the diagnostic process, reduce the burden on doctors, and improve the quality of diagnosis in actual diagnosis, but also provide a new direction for ophthalmic medical research.
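    The figures reported above can be sanity-checked with the standard definition of F1 as the harmonic mean of precision and recall. Taking 98.94% as precision and 98.44% as recall is an assumption about which reported value is which; under that reading the result is consistent with the reported F1 value of 98.69%.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2.0 * precision * recall / (precision + recall)


# Values from the abstract, under the assumed precision/recall assignment.
f1 = f1_score(precision=0.9894, recall=0.9844)
```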
    Available online:  March 22, 2024 , DOI: 10.15888/j.cnki.csa.009483
    Abstract:
    To address the challenge of data sparsity within session recommendation systems, this study introduces a self-supervised graph convolution session recommendation model based on the attention mechanism (ATSGCN). Firstly, the model constructs the session sequence into three distinct views: the hypergraph view, item view, and session view, showing the high-order and low-order connection relationships of the session. Secondly, the hypergraph view employs hypergraph convolutional networks to capture higher-order pairwise relationships among items within a session. The item view and session view employ graph convolutional networks and attention mechanisms respectively to capture lower-order connection details within local session data at both item and session levels. Finally, self-supervised learning is adopted to maximize the mutual information between the session representations learned by the two encoders, thereby effectively improving recommendation performance. Comparative experiments on the Nowplaying and Diginetica public datasets demonstrate the superior performance of the proposed model over the baseline model.
    Available online:  March 22, 2024 , DOI: 10.15888/j.cnki.csa.009484
    Abstract:
    Spatiotemporal forecasting finds extensive applications in domains such as pollution management, transportation, energy, and meteorology. Predicting PM2.5 concentration, as a quintessential spatiotemporal forecasting task, necessitates the analysis and utilization of spatiotemporal dependencies within air quality data. Existing studies on spatiotemporal graph neural networks (ST-GNNs) either employ predefined heuristic rules or trainable parameters for adjacency matrices, posing challenges in accurately representing authentic inter-station relationships. This study introduces the adaptive hierarchical graph convolutional neural network (AHGCNN) to address these issues concerning PM2.5 prediction. Firstly, a hierarchical mapping graph convolutional architecture is introduced, employing distinct self-learning adjacency matrices at different hierarchical levels, efficiently uncovering unique spatiotemporal dependencies among various monitoring stations. Secondly, an attention-based aggregation mechanism is employed to connect adjacency matrices across different hierarchical levels, expediting the convergence process. Finally, the hidden spatial states are fused with gated recurrent unit (GRU), forming a unified predictive framework capable of concurrently capturing multi-level spatial and temporal dependencies, ultimately delivering the prediction results. In the experiments, the proposed model is comparatively analyzed with seven mainstream models. The results indicate that the model can effectively capture the spatiotemporal dependencies between air monitoring stations, improving predictive accuracy.
    Available online:  March 22, 2024 , DOI: 10.15888/j.cnki.csa.009506
    Abstract:
    To address the problems of few shots and varying sizes in the surface defects on steel strips in industrial scenarios, this study proposes a detection network for surface defects on steel strips readily applicable to few-shot situations. Specifically, the algorithm is based on the you only look once version 5 small (YOLOv5s) framework and a multi-scale path aggregation network with an attention mechanism is designed to serve as the neck of the model and thereby enhance the ability of the model to predict the defect objects on multiple scales. Then, a self-adaptive decoupled detection structure is proposed to alleviate the contradiction among classification and positioning tasks in few-shot scenarios. Finally, a bounding box regression loss function fused with the Wasserstein distance is presented to improve the accuracy of the model in detecting small defect objects. Experiments show that the proposed model outperforms other few-shot object detection models on the few-shot dataset of surface defects on steel strips, indicating that it is more suitable for few-shot defect detection tasks in industrial environments.
    Available online:  March 22, 2024 , DOI: 10.15888/j.cnki.csa.009502
    Abstract:
    The previous methods for precipitation nowcasting based on deep learning try to model the spatiotemporal evolution of radar echoes in a unified architecture. However, these methods may face difficulty in capturing the complex spatiotemporal relationships completely. This study proposes a two-stage precipitation nowcasting network based on the Halo attention mechanism. This network divides the spatiotemporal evolution process of precipitation nowcasting into two stages: motion trend prediction and spatial appearance reconstruction. Firstly, a learnable optical flow module models the motion trend of radar echoes and generates coarse prediction results. Secondly, a feature reconstruction module models the spatial appearance changes in the historical radar echo sequences and refines the spatial appearance of the coarse-grained prediction results, generating fine-grained radar echo maps. The experimental results on the CIKM dataset demonstrate that the proposed method outperforms mainstream methods. The average Heidke skill score and critical success index are improved by 4.60% and 3.63%, reaching 0.48 and 0.45, respectively. The structural similarity index is improved by 4.84%, reaching 0.52, and the mean squared error is reduced by 6.13%, reaching 70.23.
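    The two skill metrics quoted in the abstract above, the critical success index (CSI) and the Heidke skill score (HSS), are both derived from a 2×2 contingency table of thresholded forecasts against observations. The sketch below uses the standard definitions; the counts are hypothetical, and the rain-rate thresholds used in the paper are not given in the abstract.

```python
def csi(hits, misses, false_alarms):
    """Critical success index: hits over all forecast-or-observed events."""
    return hits / (hits + misses + false_alarms)


def hss(hits, misses, false_alarms, correct_negatives):
    """Heidke skill score: accuracy relative to the accuracy expected by chance."""
    a, b, c, d = hits, false_alarms, misses, correct_negatives
    numerator = 2.0 * (a * d - b * c)
    denominator = (a + c) * (c + d) + (a + b) * (b + d)
    return numerator / denominator


# Hypothetical pixel counts for one radar echo map at one threshold.
example_csi = csi(hits=50, misses=20, false_alarms=30)
example_hss = hss(hits=50, misses=20, false_alarms=30, correct_negatives=900)
```

    CSI ignores correct negatives, so it is not inflated by large rain-free areas. HSS equals 1 for a perfect forecast and 0 for a forecast no better than chance.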
    Available online:  March 15, 2024 , DOI: 10.15888/j.cnki.csa.009485
    Abstract:
    Unlike appearance-based methods whose input may bring in some background noises, skeleton-based gait representation methods take key joints as input, which can neglect the noise interference. Meanwhile, most of the skeleton-based representation methods ignore the significance of the prior knowledge of human body structure or tend to focus on the local features. This study proposes a skeleton-based gait recognition framework, GaitBody, to capture more distinctive features from the gait sequences. Firstly, the study leverages a temporal multi-scale convolution module with a large kernel size to learn the multi-granularity temporal information. Secondly, it introduces topology information of the human body into a self-attention mechanism to exploit the spatial representations. Moreover, to make full use of temporal information, the most salient temporal information is generated and introduced into the self-attention mechanism. Experiments on the CASIA-B and OUMVLP-Pose datasets show that the method achieves state-of-the-art performance in skeleton-based gait recognition, and ablation studies show the effectiveness of the proposed modules.
    Available online:  March 15, 2024 , DOI: 10.15888/j.cnki.csa.009495
    Abstract:
    This study proposes a two-stage path planning method for the inner wall operation task of a mobile robot in multi-room environments. In the first stage, to handle sensor failure caused by dust or fog in the environment during wall operation and incomplete path planning when a room has many exits, the study proposes a wall-following path planning method with automatic start-point selection, which generates the wall-following paths offline from grid maps. In the second stage, for the dynamic obstacle avoidance problem during point-to-point path planning, it proposes a point-to-point path planning method based on the prioritized experience replay soft actor critic (PSAC) algorithm, which introduces the prioritized experience replay strategy into the soft actor critic (SAC) algorithm to achieve dynamic obstacle avoidance. Comparison experiments on wall-following path planning and dynamic obstacle avoidance are designed to verify the effectiveness of the proposed method in indoor wall-following path planning and point-to-point path planning.
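    The prioritized experience replay idea referred to above can be sketched with the widely used proportional variant: transitions are sampled with probability proportional to priority raised to a power α, and importance-sampling weights correct the resulting bias. How PSAC wires this into SAC is not described in the abstract; the function names and the `alpha` and `beta` defaults below are assumptions.

```python
import random


def sampling_probabilities(priorities, alpha=0.6):
    """P(i) = p_i^alpha / sum_k p_k^alpha; alpha=0 gives uniform sampling."""
    scaled = [p ** alpha for p in priorities]
    total = sum(scaled)
    return [s / total for s in scaled]


def importance_weights(probs, beta=0.4):
    """w_i = (N * P(i))^(-beta), normalised so that the largest weight is 1."""
    n = len(probs)
    weights = [(n * p) ** (-beta) for p in probs]
    max_weight = max(weights)
    return [w / max_weight for w in weights]


def sample_batch(buffer, priorities, batch_size, alpha=0.6, rng=random):
    """Draw transitions (with replacement) according to their priorities."""
    probs = sampling_probabilities(priorities, alpha)
    indices = rng.choices(range(len(buffer)), weights=probs, k=batch_size)
    return [buffer[i] for i in indices], indices
```

    Priorities are typically set from the magnitude of the temporal-difference error, so transitions the critic predicts poorly, such as near-collisions with moving obstacles, would be replayed more often.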
    Available online:  March 15, 2024 , DOI: 10.15888/j.cnki.csa.009490
    Abstract:
    A new method for short-term power load forecasting is proposed to address issues such as complex and non-stationary load data, as well as large prediction errors. Firstly, this study utilizes the maximum information coefficient (MIC) to analyze the correlation of feature variables and selects relevant variables related to power load sequences. At the same time, as the variational mode decomposition (VMD) method is susceptible to subjective factors, the study employs the rime optimization algorithm (RIME) to optimize VMD and decompose the original power load sequence. Then, the long and short-term time series network (LSTNet) is improved as the prediction model by replacing the recursive LSTM layer with BiLSTM and incorporating the convolutional block attention mechanism (CBAM). Comparative experiments and ablation experiments demonstrate that RIME-VMD reduces the root mean square error (RMSE) of the LSTM, GRU, and LSTNet models by more than 20%, significantly improving the prediction accuracy of the models, and can be adapted to different prediction models. Compared with LSTM, GRU, and LSTNet, the proposed BLSTNet-CBAM model reduces the RMSE by 35.54%, 6.78%, and 1.46% respectively, improving the accuracy of short-term power load forecasting.
    Available online:  March 15, 2024 , DOI: 10.15888/j.cnki.csa.009492
    Abstract:
    Current multi-modal emotion analysis of videos does not adequately consider the influence of modality representation learning on modality fusion and the final classification results. To this end, this study proposes a multi-modal emotion analysis model that integrates cross-modal representation learning. First, the study uses BERT and LSTM to extract internal information from the text, audio, and visual modalities separately, followed by cross-modal representation learning to obtain more information-rich unimodal features. In the modal fusion stage, the study integrates a gating mechanism to improve the traditional Transformer fusion mechanism and control the information flow more accurately. Experimental results on the publicly available CMU-MOSI and CMU-MOSEI datasets demonstrate that the accuracy and F1 score of this model are improved compared with traditional models, validating its effectiveness.
    Available online:  January 30, 2024 , DOI: 10.15888/j.cnki.csa.009471
    Abstract:
    UAV images contain many small targets against complex backgrounds, which easily leads to a high false detection rate in target detection. To solve these problems, this study proposes a high-order depthwise separable small target detection algorithm for UAV images. First, by combining the CSPNet structure and the ConvMixer network, the study uses depthwise separable convolution kernels to obtain gradient combination information and introduces a recursively gated convolution C3 module to improve the higher-order spatial interaction ability of the model and enhance the sensitivity of the network to small targets. Second, the detection head is decoupled into two heads that output the classification and position information of the feature map separately, accelerating model convergence. Finally, the bounding-box loss function EIoU is used to improve the accuracy of the predicted boxes. Experimental results on the VisDrone2019 dataset show that the detection accuracy of the model reaches 35.1% and that the missed and false detection rates are significantly reduced, so the model can be effectively applied to small target detection in UAV images. The generalization ability of the model is tested on the DOTA 1.0 and HRSID datasets, and the results show that the model has good robustness.
    Available online:  November 28, 2023 , DOI: 10.15888/j.cnki.csa.009377
    Abstract:
    With the continuous evolution of computer technology, process simulation, which uses simulation models to mimic business process behavior, is increasingly widely employed in various industries. It can be adopted to predict and optimize system performance, assess the impact of decisions, provide a decision-making basis for managers, and reduce experimental cost and time. How to efficiently develop a trustworthy simulation model has attracted widespread attention. This study traces, summarizes, and analyzes the literature on methods for building business process simulation models. It presents the processes, advantages, disadvantages, and progress of process model-based, system dynamics-based, and deep learning-based simulation modeling approaches. Finally, the challenges and future directions of process simulation are discussed to provide references for future research in this field.
    Available online:  March 31, 2022 , DOI: 10.15888/j.cnki.csa.008603
    [Abstract] (560) [HTML] (8) [PDF 1.10 M] (6918)
    Abstract:
    The security of electric energy plays an important role in national security. With the development of power 5G communication, a large number of power terminals require positioning. The traditional global positioning system (GPS) is vulnerable to spoofing, and how to improve the security of GPS effectively has become an urgent problem. This study proposes a base station-assisted GPS spoofing detection algorithm for power 5G terminals. It uses highly secure base station positioning to verify GPS positioning that may be spoofed and introduces the consistency factor (CF) to measure the consistency between GPS positioning and base station positioning. If CF is greater than a threshold, the GPS positioning is classified as spoofed; otherwise, it is judged as normal. The experimental results show that the accuracy of the algorithm is 99.98%, higher than that of traditional classification algorithms based on machine learning. In addition, the scheme is faster than those algorithms.
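    The threshold rule in this abstract can be sketched as follows. The abstract does not define how CF is computed, so the formula here (distance between the GPS fix and the base station fix, divided by an assumed base station positioning error) is purely a hypothetical stand-in, as are the function names and default values.

```python
import math


def consistency_factor(gps_xy, bs_xy, bs_error_m):
    """Hypothetical CF: distance between the two fixes (metres, local planar
    coordinates) in units of the base station's expected positioning error.
    This is an assumption for illustration, not the paper's formula."""
    dx = gps_xy[0] - bs_xy[0]
    dy = gps_xy[1] - bs_xy[1]
    return math.hypot(dx, dy) / bs_error_m


def is_spoofed(gps_xy, bs_xy, bs_error_m=50.0, threshold=3.0):
    """Classify the GPS fix as spoofed when CF exceeds the threshold,
    following the decision rule stated in the abstract."""
    return consistency_factor(gps_xy, bs_xy, bs_error_m) > threshold
```

    With these assumed defaults, a GPS fix 50 m from the base station fix gives CF = 1 and is judged normal, while a fix 500 m away gives CF = 10 and is flagged as spoofed.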
  • Full-text download ranking (overall, annual, per issue)
    Abstract click ranking (overall, annual, per issue)

    2000,9(2):38-41, DOI:
    [Abstract] (12490) [HTML] (0) [PDF ] (19919)
    Abstract:
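    This paper discusses in detail how VRML can be combined with other data access technologies to achieve real-time interaction with databases, and briefly describes the syntax and technical requirements of the relevant specifications. The techniques used are secure and reliable, perform well in practice, and make the system easy to port.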
    1993,2(8):41-42, DOI:
    [Abstract] (9309) [HTML] (0) [PDF ] (29641)
    Abstract:
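    This paper presents the author's experience in recent years of using the NU utility software to remove viruses from the disk boot sector and the hard disk master boot sector and to repair disks with damaged boot sectors. Practice has shown the approach to be simple and effective.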
    1995,4(5):2-5, DOI:
    [Abstract] (9045) [HTML] (0) [PDF ] (11918)
    Abstract:
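    This paper briefly introduces the definition, overview, and significance of the customs EDI automated clearance system, and gives a practice-oriented analysis of the legal issues involved in the business operation model of this EDI application system, the adoption of the EDIFACT international standard, network and software technology issues, and project management issues.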
    2016,25(8):1-7, DOI: 10.15888/j.cnki.csa.005283
    [Abstract] (8407) [HTML] () [PDF 1167952] (35330)
    Abstract:
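    Since 2006, deep neural networks have achieved great success in big data processing and artificial intelligence fields such as image and speech recognition and autonomous driving, and unsupervised learning methods, used as pre-training methods for deep neural networks, have played a very important role in that success. This paper therefore introduces and analyzes unsupervised learning methods in deep learning. It summarizes two commonly used classes of methods, namely deterministic autoencoder methods and learning methods such as contrastive divergence based on probabilistic restricted Boltzmann machines, describes their application in deep learning systems, and finally summarizes the problems and challenges facing unsupervised learning and discusses its outlook.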
    2008,17(5):122-126, DOI:
    [Abstract] (7480) [HTML] (0) [PDF ] (45604)
    Abstract:
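    With the rapid development of the Internet, network resources are becoming ever richer, and how to extract information from the network has become crucial. Retrieval of Deep Web information, which accounts for 80% of network resources, is a particularly difficult problem that deserves close attention. To support further research on Deep Web crawler technology, this paper gives a comprehensive and detailed introduction to Deep Web crawlers. It first describes the definition and research goals of Deep Web crawlers, then reviews and analyzes recent research progress in China and abroad. On this basis, it looks ahead to research trends in Deep Web crawlers, laying a foundation for further research.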
    2011,20(11):80-85, DOI:
    [Abstract] (7470) [HTML] () [PDF 863160] (39966)
    Abstract:
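    Based on a study of current mainstream video transcoding schemes, this paper proposes a distributed transcoding system. The system stores video in HDFS (Hadoop Distributed File System) and performs distributed transcoding using the MapReduce approach and FFMPEG. It discusses in detail the segmentation strategy for distributed video storage and the effect of segment size on access time, and defines the metadata formats for video storage and conversion. A distributed transcoding scheme based on the MapReduce programming framework is proposed, in which the Mapper side performs transcoding and the Reducer side merges the video. Experimental data show how transcoding time varies with video segment size and the number of transcoding machines. ...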
    1999,8(7):43-46, DOI:
    [Abstract] (7067) [HTML] (0) [PDF ] (21567)
    Abstract:
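    Representing a large color space with fewer colors has long been a research topic. This paper discusses halftoning and dithering techniques in detail, extends them to the practical true-color space, and gives algorithms for implementing them.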
    2007,16(9):22-25, DOI:
    [Abstract] (6330) [HTML] (0) [PDF ] (4688)
    Abstract:
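    Based on the actual security status of a legacy logistics system, this paper analyzes the shortcomings of object-oriented programming in handling crosscutting concerns and core concerns, points out the advantages of aspect-oriented programming for separating concerns in a system, analyzes AspectJ, a concrete implementation of aspect-oriented programming, and proposes a method for evolving the IC card security of a legacy logistics system with AspectJ.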
    2012,21(3):260-264, DOI:
    [Abstract] (6257) [HTML] () [PDF 336300] (42547)
    Abstract:
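    The core problem of open platforms is user authentication and authorization. OAuth is currently the internationally accepted authorization method; its key feature is that an application can request access to a user's protected resources without the user entering a username and password in the third-party application. The latest version, OAuth2.0, has a simpler and more secure authentication and authorization flow. This paper studies the working principle of OAuth2.0, analyzes the workflow for refreshing access tokens, and gives a server-side design for OAuth2.0 along with a concrete application example.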
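    For readers unfamiliar with the OAuth2.0 refresh step, the sketch below builds the form-encoded body that a client sends to a token endpoint when exchanging a refresh token for a new access token. The `grant_type`, `refresh_token`, and `scope` fields follow the OAuth 2.0 specification (RFC 6749, section 6); the function name is hypothetical, and sending `client_id` in the body is an assumption that fits public clients (confidential clients typically authenticate with HTTP Basic instead). It is not the server-side design from the paper.

```python
from urllib.parse import urlencode, parse_qs


def build_refresh_request(refresh_token, client_id, scope=None):
    """Return the application/x-www-form-urlencoded body of a refresh request.

    Hypothetical helper; the body would be POSTed to the provider's token
    endpoint, whose URL depends on the deployment.
    """
    params = {
        "grant_type": "refresh_token",   # fixed value defined by RFC 6749
        "refresh_token": refresh_token,  # token issued with the earlier access token
        "client_id": client_id,          # assumption: public client identified in the body
    }
    if scope:
        # Optional: must not exceed the scope originally granted.
        params["scope"] = scope
    return urlencode(params)
```

    A successful response from the server is a JSON document containing a new `access_token` and, optionally, a new `refresh_token`.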
    2011,20(7):184-187,120, DOI:
    [Abstract] (6068) [HTML] () [PDF 731903] (30397)
    Abstract:
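    To meet the practical requirements of smart homes, environmental monitoring, and similar applications, a wireless sensor node for long-distance communication is designed. The system uses the CC2530, a second-generation system-on-chip that integrates the RF transceiver and controller, as its core module, with an external CC2591 RF front-end power amplifier module. The software is based on the ZigBee2006 protocol stack and implements the application-layer functions on top of the generic ZStack modules. The paper describes the construction of a wireless data acquisition network based on the ZigBee protocol and gives the hardware schematics and software flowcharts of the sensor and coordinator nodes. Experiments show that the node performs well and communicates reliably, with a markedly longer communication range than TI's first-generation products.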
    2004,13(10):7-9, DOI:
    [Abstract] (5832) [HTML] (0) [PDF ] (9568)
    Abstract:
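    This paper introduces the composition of a vehicle monitoring system and studies how to use the Rockwell GPS OEM board and the WISMOQUIKQ2406B module in the hardware and software design of the mobile unit, as well as the design of the monitoring center's GIS software. It focuses on how the Q2406B module, which embeds TCP/IP protocol processing, connects to the Internet through AT commands and exchanges TCP data with the monitoring center.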
    2008,17(1):113-116, DOI:
    [Abstract] (5746) [HTML] (0) [PDF ] (47404)
    Abstract:
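    Sorting is an important operation in computer programming. This paper discusses an improvement to the quicksort algorithm in C, namely an implementation that combines quicksort with straight insertion sort. When implementing large numbers of internal sorting applications in C, the goal is to find a simple, effective, and fast algorithm. The paper focuses on the process of improving quicksort, from its basic performance characteristics to basic algorithmic improvements, and arrives at the best improved algorithm through repeated analysis and experiments.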
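    The general idea of combining quicksort with straight insertion sort can be sketched as below: quicksort partitions large ranges, and once a range shrinks to a small cutoff it is finished with insertion sort. This is a generic textbook-style sketch in Python rather than the paper's C implementation; the cutoff of 16 and the middle-element pivot are assumptions.

```python
def insertion_sort(a, lo, hi):
    """Sort a[lo..hi] in place by straight insertion."""
    for i in range(lo + 1, hi + 1):
        key = a[i]
        j = i - 1
        while j >= lo and a[j] > key:
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = key


def hybrid_quicksort(a, lo=0, hi=None, cutoff=16):
    """Sort list a in place: quicksort on large ranges, insertion sort on small ones."""
    if hi is None:
        hi = len(a) - 1
    while lo < hi:
        if hi - lo + 1 <= cutoff:
            insertion_sort(a, lo, hi)  # small range: insertion sort is cheaper
            return
        pivot = a[(lo + hi) // 2]      # assumed pivot choice: middle element
        i, j = lo, hi
        while i <= j:
            while a[i] < pivot:
                i += 1
            while a[j] > pivot:
                j -= 1
            if i <= j:
                a[i], a[j] = a[j], a[i]
                i += 1
                j -= 1
        # Recurse into the smaller part and loop on the larger one,
        # which bounds the recursion depth at O(log n).
        if j - lo < hi - i:
            hybrid_quicksort(a, lo, j, cutoff)
            lo = i
        else:
            hybrid_quicksort(a, i, hi, cutoff)
            hi = j
```

    Calling `hybrid_quicksort(values)` sorts the list `values` in place; a suitable cutoff is usually found by measurement on the target machine.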
    2008,17(8):87-89, DOI:
    [Abstract] (5681) [HTML] (0) [PDF ] (39369)
    Abstract:
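    With the wide adoption of object-oriented software development and the demand for test automation, model-based software testing has gradually gained recognition and acceptance among software developers and testers. Model-based software testing is one of the main testing methods in the coding phase; it offers high test efficiency and is effective at eliminating faults with complex logic. However, false positives, missed faults, and fault mechanisms require further study. This paper analyzes and classifies the main test models, gives a preliminary analysis of parameters such as fault density, and finally proposes a model-based software testing process.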
    2008,17(8):2-5, DOI:
    [Abstract] (5607) [HTML] (0) [PDF ] (30275)
    Abstract:
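    This paper presents the design and implementation of a single sign-on system for an enterprise information portal. The system, built on the Java EE architecture, combines credential encryption with Web Services to provide unified authentication and access control for portal users. The paper describes in detail the overall structure, design ideas, working principle, and implementation of the system, which has been successfully applied in the information portal platforms of the radio and television industry in several provinces and cities.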
    2004,13(8):58-59, DOI:
    [Abstract] (5538) [HTML] (0) [PDF ] (25988)
    Abstract:
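    This paper introduces several methods in Visual C++6.0 for moving the focus between multiple text boxes in a dialog with the Enter key, and proposes an improved method.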
    2009,18(3):164-167, DOI:
    [Abstract] (5472) [HTML] (0) [PDF ] (26906)
    Abstract:
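    This paper introduces a method based on DWGDirectX for displaying and manipulating DWG files and adding simple entities without relying on the AutoCAD platform, and analyzes and implements the method.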
    2009,18(5):182-185, DOI:
    [Abstract] (5455) [HTML] (0) [PDF ] (31186)
    Abstract:
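    DICOM is the international standard for medical image storage and transmission, and DCMTK is a free, open-source development toolkit for the DICOM standard. Interpreting the DICOM file format and solving the problem of displaying DICOM medical images is the basis of medical image processing and is important for research on medical imaging technology. This paper interprets the DICOM file format, introduces the principle of window adjustment (windowing), and implements medical image display and windowing with VC++ and DCMTK.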
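    The windowing principle mentioned in this abstract maps a chosen range of stored pixel values (defined by a window center and window width) onto the display gray scale. The sketch below uses a simplified linear mapping to illustrate the idea; it is not the paper's VC++/DCMTK code, the function name is hypothetical, and the exact formula in the DICOM standard includes small offset terms that are omitted here.

```python
def apply_window(pixels, center, width, out_max=255):
    """Map raw pixel values to 0..out_max using a linear window.

    Simplified sketch: values at or below center - width/2 become 0, values at
    or above center + width/2 become out_max, and values in between are
    scaled linearly.
    """
    lower = center - width / 2.0
    upper = center + width / 2.0
    result = []
    for v in pixels:
        if v <= lower:
            result.append(0)
        elif v >= upper:
            result.append(out_max)
        else:
            result.append(int(round((v - lower) / width * out_max)))
    return result
```

    For example, with an assumed center of 40 and width of 400, values below -160 render black and values above 240 render white, so the gray levels are spent on the range in between.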
    2019,28(6):1-12, DOI: 10.15888/j.cnki.csa.006915
    [Abstract] (5446) [HTML] (16078) [PDF 672566] (13083)
    Abstract:
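    A knowledge graph is a knowledge base that represents concepts and entities in the objective world and the relationships among them in the form of a graph, and is one of the foundational technologies for intelligent services such as semantic search, intelligent question answering, and decision support. At present, the meaning of the term knowledge graph is not yet clear, and because of incomplete documentation, the usage and reuse rates of existing knowledge graphs are low. This paper therefore gives a definition of the knowledge graph and clarifies its relationship to related concepts such as ontology. An ontology is the schema layer and logical foundation of a knowledge graph, and a knowledge graph is an instantiation of an ontology; results from ontology research can serve as a basis for knowledge graph research and promote the faster development and wider application of knowledge graphs. The paper lists and analyzes the main existing general-purpose and industry knowledge graphs in China and abroad, together with their construction, storage, and retrieval methods, in order to improve their usage and reuse rates. Finally, it points out future research directions for knowledge graphs.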
    2010,19(10):42-46, DOI:
    [Abstract] (5396) [HTML] () [PDF 1301305] (20406)
    Abstract:
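    Taking into account the system requirements of a virtual laboratory based on component assembly technology, this paper analyzes the business processing model of a workflow-driven dynamic virtual laboratory, introduces a model for integrating a lightweight J2EE framework (SSH) with workflow systems (Shark and JaWE), and proposes a method for designing and implementing a workflow-driven dynamic virtual laboratory under the lightweight J2EE framework. It gives the implementation mechanism for virtual experiment projects, the management of data flow and control flow, and a method for dynamically assembling experiment workflows. Finally, an application example demonstrates the effectiveness of the method.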
  • Full-text download ranking (overall, annual, per issue)
    Abstract click ranking (overall, annual, per issue)

    2007,16(10):48-51, DOI:
    [Abstract] (4652) [HTML] (0) [PDF 0.00 Byte] (86080)
    Abstract:
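    This paper studies the HDF data format and function library. Taking raster images as the main example, it describes in detail how to read and process raster data with VC++.net and VC#.net, and then displays the image by plotting points from the resulting pixel matrix. The work was carried out in the context of the National Meteorological Center's development of Micaps3.0 (Meteorological Information Comprehensive Analysis and Processing System).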
    2002,11(12):67-68, DOI:
    [Abstract] (3765) [HTML] (0) [PDF 0.00 Byte] (57438)
    Abstract:
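    This paper introduces methods for developing real-time data acquisition with VisualC++6.0 under Windows 2000, a non-real-time operating system. The data acquisition card used is Advantech's PCL-818L. Using the API functions in the DLLs of the PCL-818L card, it proposes three methods for high-speed real-time data acquisition and discusses their advantages and disadvantages.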
    2008,17(1):113-116, DOI:
    [Abstract] (5746) [HTML] (0) [PDF 0.00 Byte] (47404)
    Abstract:
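    Sorting is an important operation in computer programming. This paper discusses an improvement to the quicksort algorithm in C, namely an implementation that combines quicksort with straight insertion sort. When implementing large numbers of internal sorting applications in C, the goal is to find a simple, effective, and fast algorithm. The paper focuses on the process of improving quicksort, from its basic performance characteristics to basic algorithmic improvements, and arrives at the best improved algorithm through repeated analysis and experiments.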
    2008,17(5):122-126, DOI:
    [Abstract] (7480) [HTML] (0) [PDF 0.00 Byte] (45604)
    Abstract:
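    With the rapid development of the Internet, network resources are becoming ever richer, and how to extract information from the network has become crucial. Retrieval of Deep Web information, which accounts for 80% of network resources, is a particularly difficult problem that deserves close attention. To support further research on Deep Web crawler technology, this paper gives a comprehensive and detailed introduction to Deep Web crawlers. It first describes the definition and research goals of Deep Web crawlers, then reviews and analyzes recent research progress in China and abroad. On this basis, it looks ahead to research trends in Deep Web crawlers, laying a foundation for further research.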

Copyright: Institute of Software, Chinese Academy of Sciences Beijing ICP No. 05046678-3
Address: 4# South Fourth Street, Zhongguancun, Haidian, Beijing, Postal Code: 100190
Phone:010-62661041 Fax: Email:csa (a) iscas.ac.cn
Technical Support:Beijing Qinyun Technology Development Co., Ltd.

Beijing Public Network Security No. 11040202500063