Abstract: Single-cell RNA sequencing (scRNA-seq) performs high-throughput sequencing analysis of transcriptomes at the level of individual cells. Its primary application is to identify cell subpopulations with distinct functions, usually based on cell clustering. However, the high dimensionality, noise, and sparsity of scRNA-seq data make clustering challenging. Traditional clustering methods are inadequate, and most existing single-cell clustering approaches only consider gene expression patterns while ignoring relationships between cells. To address these issues, a self-optimizing single-cell clustering method with contrastive learning and graph neural network (scCLG) is proposed. This method employs an autoencoder to learn the cellular feature distribution. First, a cell-gene graph is constructed and encoded with a graph neural network to effectively harness information on intercellular relationships. Then, subgraph sampling and feature masking create augmented views for contrastive learning, further optimizing the feature representation. Finally, a self-optimizing strategy is utilized to jointly train the clustering and feature modules, continually refining the feature representation and clustering centers for more accurate clustering. Experiments on 10 real scRNA-seq datasets demonstrate that scCLG can learn robust representations of cell features, significantly surpassing other methods in clustering accuracy.
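The augmentation step of scCLG's contrastive stage (subgraph sampling plus feature masking) can be sketched in a few lines. This is a minimal illustration, not the paper's implementation; the function names and the mask/keep rates are assumptions.

```python
import random

def mask_features(x, mask_rate=0.3, seed=None):
    """One augmented view: randomly zero out a fraction of gene-expression features."""
    rng = random.Random(seed)
    return [0.0 if rng.random() < mask_rate else v for v in x]

def sample_subgraph(edges, keep_rate=0.8, seed=None):
    """Second augmented view: keep a random subset of cell-gene edges."""
    rng = random.Random(seed)
    return [e for e in edges if rng.random() < keep_rate]
```

Contrastive training would then pull the two views of the same cell together in embedding space while pushing apart views of different cells.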
Abstract: Dimensionality reduction plays a crucial role in machine learning and pattern recognition. The existing projection-based methods tend to solely utilize distance information or representation relationships among data points to maintain the data structure, which makes it difficult to effectively capture the nonlinear features and complex correlations of data manifolds in high-dimensional space. To address this issue, this study proposes a method: enhanced locality preserving projection with latent sparse representation learning (LPP_SRL). The method not only utilizes distance information to preserve the local structure of the data but also leverages multiple local linear representations to unveil the global nonlinear structure of the data. Moreover, to establish a connection between projection learning and sparse self-representation, this study employs a novel strategy by replacing the dictionary in sparse self-representation with reconstructed samples from the low-dimensional representation. This approach effectively filters out irrelevant features and noise, thereby better preserving the principal components in the original feature space. Extensive experiments conducted on multiple publicly available benchmark datasets have demonstrated the effectiveness and superiority of the proposed method.
Abstract: Accurate estimation of tropical cyclone intensity is the basis of effective intensity prediction and is crucial for disaster forecasting. Current tropical cyclone intensity estimation technology based on deep learning shows superior performance, but there is still a problem of insufficient physical information fusion. Therefore, based on the deep learning framework, this study proposes a physical factor fusion for tropical cyclone intensity estimation model (PF-TCIE) to estimate the intensity of tropical cyclones in the northwest Pacific. PF-TCIE consists of a multi-channel satellite cloud image learning branch and a physical information extraction branch. The multi-channel satellite cloud image learning branch is used to extract tropical cyclone cloud system features, and the physical information extraction branch is used to extract physical factor features to constrain the learning of cloud system features. The data used in this study include Himawari-8 satellite data and ERA5 reanalysis data. Experimental results show that after introducing multiple channels, the root mean squared error (RMSE) of the model is reduced by 3.7% compared with a single channel. At the same time, the introduction of physical information further reduces the model error by 8.5%. The RMSE of PF-TCIE finally reaches 4.83 m/s, which is better than that of most deep learning methods.
Abstract: This study constructs a named entity recognition (NER) model suitable for the bone-sign interpretations of Han Chang’an City to solve the problem of the inability to classify some bone-sign interpretations due to the lack of key content. The original text of the bone-sign interpretations of Han Chang’an City is used as the dataset, and the begin, inside, outside, end (BIOE) annotation method is utilized to annotate the bone-sign interpretation entities. A multi-feature fusion network (MFFN) model is proposed, which not only considers the structural features of individual characters but also integrates the structural features of character-word combinations to enhance the model’s comprehension of the bone-sign interpretations. The experimental results demonstrate that the MFFN model can better identify the named entities of the bone-sign interpretations of Han Chang’an City and classify the bone-sign interpretations, outperforming existing NER models. This model provides historians and researchers with richer and more precise data support.
Abstract: In the task of few-shot open-set recognition (FSOSR), effectively distinguishing closed-set from open-set samples presents a notable challenge, especially in cases of sample scarcity. Current approaches exhibit uncertainty in describing boundaries for known class distributions, leading to insufficient discrimination between closed-set and open-set spaces. To tackle this issue, this study introduces a novel method for FSOSR leveraging feature decoupling and openness learning. The primary objective is to employ a feature decoupling module to compel the model to decouple class-specific features and open-set features, thereby accentuating the disparity between unknown and known classes. To achieve effective feature decoupling, an openness learning loss is introduced to facilitate the acquisition of open-set features. By integrating similarity metric values and anti-openness scores as the optimization target, the model is steered towards learning more discriminative feature representations. Experimental results on the public datasets miniImageNet and tieredImageNet demonstrate that the proposed method substantially enhances the detection rate of unknown class samples while accurately classifying known classes.
Abstract: Knowledge distillation (KD) is a technique that transfers knowledge from a complex model (teacher model) to a simpler model (student model). While many popular distillation methods currently focus on intermediate feature layers, response-based knowledge distillation (RKD) has regained its position among the SOTA models after decoupled knowledge distillation (DKD) was introduced. DKD leverages strong consistency constraints to split classic knowledge distillation into two parts, addressing the issue of high coupling. However, this approach overlooks the significant representation gap caused by the disparity in teacher-student network architectures, which prevents smaller student models from effectively learning knowledge from teacher models. To solve this problem, this study proposes a diffusion model to narrow the representation gap between teacher and student models. Teacher features are transferred to train a lightweight diffusion model, which is then used to denoise the student features, thus reducing the representation gap between teacher and student models. Extensive experiments demonstrate that the proposed model achieves significant improvements over baseline models on the CIFAR-100 and ImageNet datasets, maintaining good performance even when there is a large gap in teacher-student network architectures.
Abstract: Effective detection of damage and foreign matter on transmission lines is very important for intelligent circuit inspection. However, it is difficult to collect data from different power companies to train a unified detection model due to the data island problem. Therefore, this study proposes a circuit defect detection method based on federated transfer learning by combining federated transfer learning and object detection algorithms. Specifically, a high-performance detection model is selected as the basic detection model, whose initial weights are frozen. The model adaptively learns from the data of different clients by using the low-rank decomposition of the weight matrix and inserting an adapter layer, greatly reducing the number of trainable parameters. An adaptive weight screening method is also proposed to accurately determine the low-rank decomposition of the weight layer and the insertion position of the adapter layer of the model. Through simple adaptive learning, the model can effectively adapt to the data distributions from different power companies. Experimental verification on a power dataset that closely resembles real-world conditions shows that the proposed model can adapt to different distributed detection scenarios under the premise of ensuring the security and privacy of customer data.
Abstract: This study proposes a CEEMDAN-SBiGRU combined prediction model with an optimized multi-head attention mechanism to enhance the precision of short-term power load forecasting and fully explore the complex correlations of power load data. The model improves two modules: feature extraction and feature fusion. First, the complete ensemble empirical mode decomposition with adaptive noise (CEEMDAN) is utilized to decompose the power load data into multiple intrinsic mode functions (IMFs) and a residual signal (RES), and a denoising autoencoder (DAE) is introduced to extract potential features from the data affected by meteorological factors, workday types, and temperature changes. Second, the extracted intricate features are fed into the stacked bidirectional gated recurrent unit (SBiGRU) module to obtain hidden states. Finally, the obtained hidden states are input into the optimized multi-head attention (OMHA) mechanism module, which incorporates a residual mechanism and layer normalization, to accurately assign higher weights to important features and solve the problem of noise interference. The experimental results indicate that the CEEMDAN-SBiGRU-OMHA combined model achieves higher accuracy.
Abstract: Alzheimer’s disease poses a significant public health challenge in the global aging society. One of its main clinical symptoms is the gradual decline in cognitive abilities. A crucial topic in Alzheimer’s disease research is to establish models that link cognitive performance with neuroimaging data to identify neuroimaging biomarkers associated with cognitive abilities. However, neuroimaging data often exhibit high dimensionality, heavy-tailed distributions, and outliers. These characteristics not only reduce the accuracy and stability of models but also pose challenges for result interpretation. To address these issues, this study uses sparse quantile regression to model and perform feature selection on data from the Alzheimer’s disease neuroimaging initiative (ADNI). This study also explores the distribution characteristics of cognitive scores at different quantiles and identifies specific brain regions associated with cognitive abilities. Experimental results demonstrate that sparse quantile regression successfully identifies the brain regions relevant to cognitive abilities at different quantiles. This research shows the potential of applying sparse quantile regression in neuroimaging data analysis and provides a novel perspective and approach for neuroimaging research.
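The objective behind sparse quantile regression combines the pinball (check) loss at a quantile τ with an L1 penalty on the coefficients, which is what drives feature selection. A minimal sketch, with illustrative (non-ADNI) variable names:

```python
def pinball_loss(y_true, y_pred, tau):
    """Average check loss at quantile tau: tau*r for residual r >= 0, (tau-1)*r otherwise."""
    total = 0.0
    for y, f in zip(y_true, y_pred):
        r = y - f
        total += tau * r if r >= 0 else (tau - 1.0) * r
    return total / len(y_true)

def sparse_quantile_objective(y, X, beta, tau, lam):
    """Pinball loss of a linear model plus an L1 (lasso) penalty that zeroes out
    coefficients of irrelevant brain-region features."""
    preds = [sum(b * x for b, x in zip(beta, row)) for row in X]
    return pinball_loss(y, preds, tau) + lam * sum(abs(b) for b in beta)
```

Minimizing this objective at several values of τ yields the quantile-specific coefficient patterns the abstract describes.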
Abstract: Point cloud segmentation is a crucial step in 3D visual guidance and scene understanding, whose quality directly affects the quality of 3D measurement or imaging. To improve segmentation accuracy and solve the out-of-bounds problem, this study proposes a point cloud segmentation algorithm for 3D vision guidance. This algorithm generates initial supervoxel data and extracts boundary points based on the spatial position, curvature, and normal vectors of the point cloud. Boundary refinement is then performed, which redistributes boundary points to optimize the supervoxels by calculating the similarity measure between boundary points and neighboring supervoxels. Ultimately, candidate fragments are obtained based on region growing and merged according to their concavity and convexity to achieve object-level segmentation. Visualization and quantitative comparison show that this algorithm effectively solves the out-of-bounds problem and accurately segments complex point cloud models. The segmentation accuracy is 89.04% and the recall rate is 87.38%.
Abstract: Model obfuscation refers to the equivalent transformation of a neural network into another form, which is an efficient and low-cost technique for protecting neural networks. To expose the flaws of model obfuscation, researchers have proposed model deobfuscation techniques in the hope of improving obfuscation methods. However, model deobfuscation techniques have not been fully explored and suffer from limited applicability and effectiveness. Therefore, this study proposes a model deobfuscation method based on neural machine translation (NMT). This method models the deobfuscation task as a seq2seq task. It provides a more detailed sequential representation of the obfuscated model, identifies and processes the obfuscated information in the weight parameters, and utilizes an NMT-based model for deobfuscation translation. The experimental results demonstrate that this method addresses the shortcomings of existing methods, effectively capturing obfuscation features and restoring model architectures. It can serve as a general solution to model deobfuscation.
Abstract: To address the issue of nonlinear radial distortion present in multimodal remote sensing images, this study proposes a method for matching multimodal remote sensing images that integrates phase symmetry features with rank-based local self-similarity. Initially, the local phase information of the images is utilized to construct a phase symmetry map, upon which feature extraction is performed using the features from the accelerated segment test (FAST) algorithm. Subsequently, a new feature descriptor named RPCLSS is constructed, which combines rank-based local self-similarity and phase congruency. Finally, the fast sample consensus (FSC) algorithm is employed to eliminate mismatched points. Comparative experiments are conducted on publicly available multi-source remote sensing image datasets, comparing the proposed method against five existing advanced matching methods. The results reveal that the proposed method outperforms these state-of-the-art methods in terms of the number of correct matching points, matching precision, and matching correctness.
Abstract: Due to the lack of cooperative information, non-cooperative spacecraft cannot obtain pose data directly from sensors. Therefore, a pose recognition network based on inverse synthetic aperture radar (ISAR) images is proposed. Compared with images taken by space photography satellites and simulation data, ISAR images are easier and cheaper to obtain, but they suffer from problems such as low resolution and incomplete panel imaging. Therefore, in image preprocessing, the network uses an adjusted YOLOX-tiny as a spacecraft cropping network to prevent the data marked in the image from affecting subsequent network training, so that the network focuses only on the region where the spacecraft is located. An enhanced Lee filter is used to remove image noise and improve image quality. In the backbone network, an STN module is added to make the network attend to the most relevant regions, and the U-Net is designed as a dense residual block structure combined with a CBAM module to reduce feature loss during sampling and improve model accuracy. In addition, multi-head self-attention is introduced to capture more global information. The experimental results show that the minimum, maximum, and average errors of this model are improved compared with some mainstream models, with the errors reduced by 0.5–0.6 on average. All this proves that the network has better pose recognition ability.
Abstract: Current progressive secret image sharing schemes do not consider cheating attacks by dishonest participants, allowing them to use false shadow images for cheating attacks. To ensure successful progressive reconstruction, this study divides the bit plane of pixels into two parts and uses the Lagrange interpolation algorithm along with visual cryptography schemes to address this issue. The sliding window of the pixel bit plane is determined by a pseudo-random number, and authentication information is embedded into the sliding window through a filtering operation to achieve authentication capability. Additionally, different strategies for bit plane division produce different progressive reconstruction effects, enabling more flexible progressive reconstruction. Theoretical analysis and experimental results both demonstrate the effectiveness of the proposed scheme.
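The Lagrange-interpolation part of such a (k, n)-threshold scheme works as in classic Shamir secret sharing over a prime field. The sketch below is illustrative only: the field size and fixed polynomial coefficients are assumptions, and the bit-plane division, visual cryptography, and authentication steps of the abstract above are omitted.

```python
PRIME = 257  # smallest prime above 255, so every 8-bit pixel value fits in the field

def make_shares(secret, n, coeffs):
    """Evaluate f(x) = secret + c1*x + ... + c_{k-1}*x^{k-1} (mod PRIME) at x = 1..n.
    Any k = len(coeffs) + 1 shares suffice to recover the secret."""
    shares = []
    for x in range(1, n + 1):
        y = secret
        for i, c in enumerate(coeffs, start=1):
            y = (y + c * pow(x, i, PRIME)) % PRIME
        shares.append((x, y))
    return shares

def reconstruct(shares):
    """Lagrange interpolation at x = 0 recovers the constant term, i.e. the secret."""
    secret = 0
    for j, (xj, yj) in enumerate(shares):
        num, den = 1, 1
        for m, (xm, _) in enumerate(shares):
            if m != j:
                num = (num * (-xm)) % PRIME
                den = (den * (xj - xm)) % PRIME
        # modular inverse via Fermat's little theorem
        secret = (secret + yj * num * pow(den, PRIME - 2, PRIME)) % PRIME
    return secret
```

In a progressive scheme, different pixel bit planes are shared with different thresholds so that image quality improves gradually as more shadow images are collected.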
Abstract: Ensuring the precise maintenance and stable operation of mineral processing equipment has always been an important challenge for mining-related enterprises, while developing predictive maintenance systems for equipment has become a crucial means to reduce maintenance costs and improve production efficiency. This study analyzes the functional requirements of predictive maintenance systems, designs the architecture and overall functional structure of a predictive maintenance system based on a micro-service architecture, and elaborates on the key technologies of the system. Moreover, the study proposes an evaluation model for equipment health status based on a multi-scale CNN fusion attention mechanism, as well as a prediction model for current trend fusion based on CNN and BiLSTM, to support the construction of the predictive maintenance system. The completed system has been applied at Ansteel Group Guanbaoshan Mining Co. Ltd., where the proposed model undergoes testing. The results show that the proposed model outperforms existing models with its high accuracy and robustness. The developed system can provide precise equipment maintenance plans, reduce equipment maintenance costs, and improve enterprise production efficiency.
Abstract: To improve the detection accuracy and speed of deep reinforcement learning object detection models, modifications are made to traditional models. To address inadequate feature extraction, a VGG16 feature extraction module integrated with a channel attention mechanism is introduced as the state input for reinforcement learning, enabling a more comprehensive capture of key information in images. To address inaccurate evaluation caused by relying solely on the intersection over union as a reward, an improved reward mechanism that also considers the distance between the center points and the aspect ratios of the ground truth box and the predicted box is employed, making the reward more reasonable. To accelerate the convergence of the training process and enhance the objectivity of the agent’s evaluation of current states and actions, the Dueling DQN algorithm is used for training. Experiments on the PASCAL VOC2007 and PASCAL VOC2012 datasets show that the detection model needs only 4–10 candidate boxes to detect the target. Compared with Caicedo-RL, the accuracy is improved by 9.8%, and the mean intersection over union between the predicted and ground truth boxes is increased by 5.6%.
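A reward of the kind described in the abstract above (IoU plus a normalized center-distance penalty, in the spirit of DIoU; the aspect-ratio term of CIoU could be added analogously) might look like this sketch. The function names and box format are illustrative assumptions:

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def diou_reward(pred, gt):
    """IoU minus the squared center distance normalized by the squared diagonal
    of the smallest enclosing box (DIoU-style)."""
    cx_p, cy_p = (pred[0] + pred[2]) / 2, (pred[1] + pred[3]) / 2
    cx_g, cy_g = (gt[0] + gt[2]) / 2, (gt[1] + gt[3]) / 2
    d2 = (cx_p - cx_g) ** 2 + (cy_p - cy_g) ** 2
    ex1, ey1 = min(pred[0], gt[0]), min(pred[1], gt[1])
    ex2, ey2 = max(pred[2], gt[2]), max(pred[3], gt[3])
    c2 = (ex2 - ex1) ** 2 + (ey2 - ey1) ** 2
    return iou(pred, gt) - d2 / c2
```

Unlike plain IoU, this reward still gives the agent a useful gradient signal when boxes do not overlap at all, since the center-distance term keeps decreasing as the boxes approach.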
Abstract: The narrow spectral bands of hyperspectral images (HSI) provide rich information for many visual tasks, but also pose challenges for feature extraction. Despite various deep learning methods proposed by researchers, the advantages of these architectures are not fully combined. Therefore, this study proposes a high-frequency enhanced dual-branch hyperspectral image super-resolution network (HFEDB-Net) that effectively extracts spatial and spectral information of HSI by integrating the image spatial feature extraction advantage of convolutional neural network (CNN) with the adaptive capability and long-distance dependency extraction advantage of Transformers. HFEDB-Net consists of a high-frequency information enhancement branch and a backbone branch. In the high-frequency information enhancement branch, the high-frequency information of low-resolution and high-resolution HSI is extracted by using Laplacian pyramids, and the results serve as the input and label for the high-frequency branch. A spectral-enhanced Transformer is employed as the feature extraction method for this branch. In the backbone branch, a CNN with channel attention is utilized to extract spatial features and spectral information comprehensively. Finally, the results from both branches are combined through CNN to obtain the final reconstructed image. Additionally, the attention mechanism and encoder layers of the Transformer are respectively improved by using multi-head attention and multi-scale strategies to better extract spatial and spectral information from HSI. Experimental results demonstrate that HFEDB-Net outperforms current state-of-the-art methods in terms of quantitative evaluation metrics and visual effects on two public datasets.
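The high-frequency extraction idea in the abstract above rests on the fact that a Laplacian-pyramid level is the difference between an image and its smoothed version. A minimal single-band sketch follows; a 3×3 box blur stands in here for the Gaussian blur-and-downsample of a real Laplacian pyramid, and the per-band loop over HSI channels is omitted:

```python
def box_blur(img):
    """3x3 mean blur with edge replication, on a 2D list of floats."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            s = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy = min(max(y + dy, 0), h - 1)  # replicate edges
                    xx = min(max(x + dx, 0), w - 1)
                    s += img[yy][xx]
            out[y][x] = s / 9.0
    return out

def high_frequency(img):
    """One Laplacian-pyramid-style level: the image minus its blurred version,
    which keeps edges and texture and discards smooth regions."""
    blur = box_blur(img)
    return [[img[y][x] - blur[y][x] for x in range(len(img[0]))]
            for y in range(len(img))]
```

In HFEDB-Net, maps like these (computed from the low- and high-resolution HSI) serve as the input and label of the high-frequency enhancement branch.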
Abstract: In recent years, the exacerbation of traffic congestion has sparked widespread interest in the research on traffic signal control algorithms. Current studies indicate that methods based on deep reinforcement learning (DRL) exhibit promising performance in simulated environments. However, challenges persist in their practical application, including substantial requirements for data and computational resources, as well as difficulties in achieving coordination between intersections. To address these challenges, this study proposes a novel traffic signal control algorithm based on a contextual multi-armed bandit model. In contrast to conventional algorithms, the proposed algorithm achieves efficient coordination between intersections by extracting the main arteries from the road network. Moreover, it employs a contextual multi-armed bandit model to facilitate rapid and effective traffic signal control. Finally, through extensive experimentation on both real and synthetic datasets, the superiority of the proposed algorithm over previous algorithms is empirically demonstrated.
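A contextual multi-armed bandit for signal control can be sketched as a per-context epsilon-greedy learner over signal phases. This tabular form and the context/phase names are illustrative assumptions; the actual model in the abstract above may use richer context features:

```python
import random

class ContextualBandit:
    """Per-context epsilon-greedy bandit: one running mean reward per (context, action)."""

    def __init__(self, actions, epsilon=0.1, seed=0):
        self.actions = actions          # e.g. candidate signal phases
        self.epsilon = epsilon          # exploration rate
        self.rng = random.Random(seed)
        self.counts = {}                # (context, action) -> pull count
        self.values = {}                # (context, action) -> running mean reward

    def select(self, context):
        """Explore with probability epsilon, otherwise pick the best-known action."""
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.actions)
        return max(self.actions, key=lambda a: self.values.get((context, a), 0.0))

    def update(self, context, action, reward):
        """Incrementally update the running mean reward for (context, action)."""
        key = (context, action)
        n = self.counts.get(key, 0) + 1
        self.counts[key] = n
        mean = self.values.get(key, 0.0)
        self.values[key] = mean + (reward - mean) / n
```

The context could encode traffic state (e.g. "rush hour" on an extracted arterial), the action a signal phase, and the reward a negative queue length; bandit updates are far cheaper than training a DRL agent.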
Abstract: Cancer driver genes play a crucial role in the formation and progression of cancer. Accurate identification of cancer driver genes contributes to a deeper understanding of the mechanisms underlying cancer development and advances precision medicine. To address the heterogeneity and complexity challenges in the current field of cancer driver gene identification, this study presents the design and implementation of a cancer driver gene identification system, ACGAI, based on graph autoencoder and LightGBM. The system initially employs unsupervised learning with a graph autoencoder to grasp the complex topological structure of the biomolecular network. Subsequently, the generated embedding representations are concatenated with original gene features, forming gene-enhanced features input into LightGBM. After training, the system outputs predictive scores for each gene on the biomolecular network, achieving accurate identification of cancer driver genes. Finally, the system utilizes Web technology to create a user-friendly and highly interactive visualization interface, enabling cancer driver gene identification in the context of gene set analysis and providing biological interpretation for the identification results. Through rigorous testing, the system exhibits superior identification performance compared to other methods, demonstrating its effectiveness in identifying cancer driver genes.
Abstract: With the development of global economic integration, cross-border trade has become an important driving force for global economic development. However, it is facing issues such as data security, information silos, and information asymmetry. Based on this, this study proposes a blockchain-based scheme for data sharing and access control in cross-border trade. The scheme uses a collaborative storage mechanism of blockchain and the InterPlanetary File System (IPFS) to effectively reduce the storage load of the blockchain. In addition, a dual-key regression model combined with the time dimension is adopted to encrypt and store data, as well as to assign access permissions by setting different time periods, which limits unnecessary access by data users outside a certain time span. Finally, corresponding smart contracts are designed to achieve efficient management of the entire life cycle flow of data, improving sharing efficiency. The experimental results show that the proposed scheme can achieve secure data sharing in cross-border trade and fine-grained access control for users.
Abstract: Face image generation requires high realism and controllability. This study proposes a new algorithm for face image generation that is jointly controlled by text and facial key points. The text constrains face generation at the semantic level, while facial key points enable the model to control the generation of facial features, expressions, and details based on the given facial information. The proposed algorithm improves the existing diffusion model and introduces additional components: a text processing model (CM), a keypoint control network (KCN), and an autoencoder network (ACN). Specifically, the diffusion model is a noise inference algorithm based on diffusion theory; the CM is designed based on an attention mechanism to encode and store text information; the KCN receives the location information of key points, enhancing the controllability of face generation; and the ACN alleviates the generation pressure of the diffusion model and reduces the time required to generate samples. In addition, to adapt to face image generation, this research constructs a dataset containing 30,000 face images. Given input text and a facial keypoint image, the model extracts feature information from the text and keypoint information from the image, generating a highly realistic and controllable target face image. Compared with mainstream methods, the proposed algorithm improves the FID index by about 5%–23% and the IS index by about 3%–14%, which proves its superiority.
Abstract: Heart rate and saturation of peripheral capillary oxygenation (SpO2) are very important physiological indicators of human health. In recent years, non-contact heart rate and SpO2 detection methods based on imaging photoplethysmography (IPPG) have gradually become a research focus because they are convenient and easy to apply. The main work of this survey is as follows. First, it introduces the background and research significance of non-contact detection methods. Second, the research status and future improvement directions are summarized and clarified from two aspects: target region detection and region of interest (ROI) selection. Third, the detection methods for heart rate and SpO2 are summarized from three aspects: traditional methods, signal processing combined with deep learning methods, and end-to-end methods; the datasets used in the deep learning methods and the detection performance reported on each dataset are also sorted out. Finally, the survey points out the problems that remain to be solved and future research directions in this field.
Abstract: In imbalanced datasets, the presence of noise and class overlapping often leads to poor performance of traditional classifiers, resulting in minority class samples being difficult to classify accurately. To improve classification performance, a method for handling imbalanced data based on shared nearest neighbor density peak clustering and ensemble filtering mechanism is proposed. This method first uses the shared nearest neighbor density peak clustering algorithm to adaptively divide the minority class samples into multiple clusters. Then, based on the density and size within the clusters, oversampling weights are allocated to each cluster. During the synthesis within clusters, the local sparsity and clustering coefficient of the samples are considered to select neighboring samples and determine the weight range of linear interpolation, thus avoiding the generation of new samples in the majority class aggregation area. Finally, an ensemble filtering mechanism is introduced to eliminate noise and hard-to-learn boundary samples to regulate the decision boundary and improve the quality of generated samples. Compared with 5 oversampling methods, this algorithm performs better overall on 8 public datasets.
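The constrained linear-interpolation step of the method above (restricting the interpolation weight range so synthetic minority samples stay clear of majority-class regions) can be sketched as follows. The weight bounds are the tunable part; neighbor selection by local sparsity and clustering coefficient is omitted, and the function name is an illustrative assumption:

```python
import random

def interpolate_sample(x, neighbor, w_low=0.0, w_high=1.0, seed=None):
    """Generate one synthetic minority sample on the segment between x and a
    neighboring minority sample. Narrowing (w_low, w_high) keeps the new point
    close to x and away from majority-class aggregation areas."""
    rng = random.Random(seed)
    w = rng.uniform(w_low, w_high)
    return [xi + w * (ni - xi) for xi, ni in zip(x, neighbor)]
```

Classic SMOTE corresponds to w_low=0 and w_high=1; the method above effectively tightens this range per cluster based on local density.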
Abstract: Currently, multimodal sentiment analysis tasks suffer from problems such as insufficient single-modal feature extraction and unstable data fusion methods. This study proposes a method that optimizes modal features using interpolation to solve these problems. First, interpolation-optimized BERT and GRU models are applied to extract features, and both models are used to mine text, audio, and video information. Second, an improved attention mechanism is used to fuse the text, audio, and video information, achieving more stable modal fusion. The method is tested on the MOSI and MOSEI datasets. The experimental results show that optimizing modal features with interpolation can improve the accuracy of multimodal sentiment analysis tasks, which verifies the effectiveness of interpolation.
Abstract: Light-weight image fusion algorithms are very important for human eye observation and machine recognition. By studying the importance of visual saliency in infrared and visible image fusion, a visual saliency map (VSM)-guided MSDNet fusion network is optimized and designed based on the SDNet fusion network. First, the structure and channel numbers of SDNet are reduced to accelerate training and inference, and the learning ability of the light-weight model is enhanced by structural parameterization and reverse parameterization techniques. Then, for model training, a loss function guided by the VSM is used to achieve self-supervised training of the model. Finally, at the end of training, the image reconstruction branch is removed, and the final light-weight model is obtained by fusing the convolution parameters. Experiments show that the light-weight network not only ensures image fusion quality but also greatly improves speed, making it possible to port the model to mobile terminals.
Abstract: Primary healthcare providers lack the ability to assess the risk of vaccinating children with certain illnesses. Developing a risk prediction model for pediatric vaccination that leverages the experience of healthcare professionals in tertiary hospitals is a viable way to assist primary healthcare providers in swiftly identifying high-risk pediatric patients. This study proposes an intelligent method for vaccine recommendations based on a knowledge graph. First, a method for medical named entity recognition called ELECTRA-BiGRU-CRF, based on pre-trained language models, is proposed for named entity extraction from outpatient electronic medical records. Second, a vaccination ontology is designed, with relationships and attributes defined, to construct a Chinese childhood vaccination knowledge graph based on Neo4j. Finally, a method for vaccine recommendations guided by significant categories using pre-trained language models is proposed based on the constructed knowledge graph. Experimental results indicate that the proposed methods can provide diagnostic assistance to physicians and offer support for deciding whether vaccines can be administered to children with certain illnesses.
Abstract: To address the problems of insufficient light, low contrast, and information loss in images captured by imaging devices at night or in low-light environments, an improved dark-image enhancement network named RelightGAN is designed based on generative adversarial networks (GANs). It contains two discriminators and one generator, and the generator is jointly constrained by two sets of adversarial losses and cyclic losses to generate a better illumination layer. To enhance the recovery of image details during network training, a residual network is introduced to solve the gradient vanishing problem. At the same time, a hybrid attention mechanism (CBAM) structure is introduced to increase the generator’s attention to important information and structures in the image, enhancing the network’s expressive capability. Compared with images enhanced by other dark-image enhancement networks, those enhanced by RelightGAN achieve a 12.81% higher peak signal-to-noise ratio (PSNR) and a 5.95% higher structural similarity (SSIM). Experimental results show that the RelightGAN network combines the advantages of traditional algorithms and neural networks, improving dark-scene images and image visibility.
Abstract: Vertical federated learning improves the value of data utilization by combining local data features from multiple parties and jointly training the target model without leaking data privacy, and it has received widespread attention from companies and institutions in the industry. During training, the intermediate embeddings uploaded by clients and the gradients returned by the server incur a huge amount of communication, so communication cost becomes a key bottleneck limiting the practical application of vertical federated learning. Consequently, current research focuses on designing effective algorithms to reduce communication volume and improve communication efficiency. To this end, this study proposes an efficient compression algorithm based on bidirectional compression of embeddings and gradients. For the embedding representations uploaded by clients, an improved sparsification method combined with a cache-reuse mechanism is employed; for the gradient information distributed by the server, a mechanism combining discrete quantization and Huffman coding is used. Experimental results show that the proposed algorithm reduces communication volume by about 85%, improves communication efficiency, and shortens overall training time while maintaining almost the same accuracy as the uncompressed scenario.
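The two compression directions can be illustrated with a toy sketch: magnitude-based top-k sparsification for the client-to-server uplink and uniform discrete quantization for the server-to-client downlink. All names are illustrative assumptions, and the paper's cache-reuse and Huffman-coding steps are omitted:

```python
def topk_sparsify(vec, k):
    """Keep only the k largest-magnitude entries; zero the rest.

    Ties at the threshold may keep a few extra entries (fine for a sketch).
    """
    if k >= len(vec):
        return list(vec)
    threshold = sorted((abs(v) for v in vec), reverse=True)[k - 1]
    return [v if abs(v) >= threshold else 0.0 for v in vec]

def quantize(vec, levels=16):
    """Uniform discrete quantization: floats -> small integer codes."""
    lo, hi = min(vec), max(vec)
    if hi == lo:
        return [0] * len(vec), lo, 0.0
    step = (hi - lo) / (levels - 1)
    return [round((v - lo) / step) for v in vec], lo, step

def dequantize(codes, lo, step):
    """Recover approximate float values from integer codes."""
    return [lo + c * step for c in codes]
```

The integer codes produced by `quantize` have a skewed distribution in practice, which is what makes the subsequent entropy coding (Huffman, in the paper) effective.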
Abstract: Most existing anomaly detection methods focus on algorithm efficiency and accuracy while overlooking the interpretability of the detected anomalous objects. Counterfactual explanation, a research hotspot in interpretable machine learning, aims to explain model decisions by perturbing the features of the instance under study to generate counterfactual examples. In practical applications, there may be causal relationships among features, yet most existing counterfactual-based interpretability methods concentrate on generating more diverse counterfactual examples while ignoring these causal relationships, and may consequently produce unreasonable counterfactual explanations. To address this issue, this study proposes an algorithm for interpreting anomalies via reasonable counterfactuals (IARC) that takes causal constraints into account. When generating counterfactual explanations, the proposed method incorporates the causality between features into the objective function to evaluate the feasibility of each perturbation and employs an improved genetic algorithm for optimization, thereby generating rational counterfactual explanations. Additionally, a novel metric is introduced to quantify the degree of contradiction in the generated counterfactual explanations. Comparative experiments and detailed case studies on multiple real-world datasets benchmark the proposed method against several state-of-the-art methods. The results demonstrate that the proposed method generates highly rational counterfactual explanations for anomalous objects.
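One way to fold causal feasibility into a genetic algorithm's objective, in the spirit described above, is to penalize candidate counterfactuals that change a feature's causal parents without adjusting the feature itself. The sketch below is a hypothetical fitness function, not the paper's actual objective; `causal_parents`, `lam`, and the penalty form are all assumptions:

```python
def counterfactual_fitness(x, x_cf, predict, causal_parents, lam=1.0):
    """Lower is better. Combines proximity, causal feasibility, and validity.

    causal_parents maps a feature index to the indices of its causal parents.
    """
    # Proximity: stay close to the original instance.
    distance = sum(abs(a - b) for a, b in zip(x, x_cf))
    # Causal penalty: a parent changed but its child was left untouched.
    violations = sum(
        1
        for child, parents in causal_parents.items()
        if any(x[p] != x_cf[p] for p in parents) and x[child] == x_cf[child]
    )
    # Validity: the counterfactual must actually flip the model's decision.
    validity_penalty = 0.0 if predict(x_cf) != predict(x) else 1e6
    return distance + lam * violations + validity_penalty
```

A genetic algorithm would then evolve a population of candidate `x_cf` vectors under this fitness, so that causally inconsistent perturbations are selected against rather than hard-forbidden.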
Abstract: The partially linear model, an important type of semiparametric regression model, is widely used across various fields thanks to its flexible adaptability in the analysis of complex data structures. In the era of big data, however, research on and application of this model face multiple challenges, the most critical being computing speed and data storage. This study considers data streams continuously observed in the form of data blocks and proposes an online estimation method for the parameters of the linear part and the unknown function of the nonlinear part of the partially linear model. The method enables real-time estimation using only the current data block and previously computed summary statistics. To verify its effectiveness, numerical simulations vary the size of a single data block and the total sample size of the data stream, comparing the bias, standard error, and mean squared error of the online estimation method against the traditional one. The experiments demonstrate that, compared to the traditional method, the proposed approach offers rapid computation without revisiting historical data, while remaining close to the traditional method in mean squared error. Finally, using data from the China General Social Survey (CGSS), this study applies the online estimation method to analyze the factors influencing the quality of life of the working-age population in China. The results indicate that full-time work of 30 to 60 hours per week contributes positively to quality of life, providing a valuable reference for policy formulation.
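For the linear part, the "current block plus summary statistics" idea can be illustrated with block-wise online least squares: only running sums, never raw observations, are carried between blocks. A minimal one-predictor sketch (the class name and simple-regression setting are illustrative assumptions; the paper's estimator for the full partially linear model is more involved):

```python
class OnlineOLS:
    """Block-wise online least squares for y = intercept + slope * x."""

    def __init__(self):
        self.n = 0
        self.sx = self.sy = self.sxx = self.sxy = 0.0

    def update(self, xs, ys):
        # Fold a new data block into the running summary statistics;
        # the raw block can be discarded afterwards.
        for x, y in zip(xs, ys):
            self.n += 1
            self.sx += x
            self.sy += y
            self.sxx += x * x
            self.sxy += x * y

    def estimate(self):
        # Solve the 2x2 normal equations from the summaries alone.
        denom = self.n * self.sxx - self.sx ** 2
        slope = (self.n * self.sxy - self.sx * self.sy) / denom
        intercept = (self.sy - slope * self.sx) / self.n
        return intercept, slope
```

Because the sufficient statistics are additive across blocks, the estimate after any number of blocks matches the full-data least squares fit, with storage independent of the stream length.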
Abstract: Action recognition is an important technology in computer vision and, depending on the input data, can be categorized into video-based and skeleton-based action recognition. 3D skeleton data avoids the influence of illumination, occlusion, and other factors, yielding more accurate action descriptions, so human action recognition based on 3D skeletons has attracted increasing attention. Methods for 3D-skeleton-based human action recognition can be divided into end-to-end black-box methods and pattern-recognition-based white-box methods. Black-box deep learning methods involve a large number of parameters and can learn classification knowledge from large amounts of data, but they are difficult to explain and provide only an overall recognition result. Compared with black-box methods, white-box methods have an explainable recognition process and adjustable algorithms. Nevertheless, some white-box methods focus only on algorithmic improvements, using formulas to represent and classify actions without reflecting the differences and connections among actions. Therefore, this study designs a white-box method with a visible classification process. The method uses a tree structure to organize action data hierarchically, constructing an individual classification hierarchy according to the differences within the same action and an action classification hierarchy according to the discrepancies among different actions. Various measurement algorithms can be incorporated into the system; this study selects the nearest neighbor and dynamic time warping algorithms for the experiments. The advantage of a hierarchical structure is that various kinds of knowledge can be implanted into it according to different requirements, so that actions can be classified from different perspectives. In the experiments, key posture knowledge and human body structure knowledge are implanted into the hierarchy, and the hierarchical structure changes dynamically as knowledge is implanted.
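Dynamic time warping, one of the two measurement algorithms selected above, compares sequences of different lengths by finding the cheapest monotonic alignment between them. A standard sketch for 1-D sequences (skeleton data would substitute a joint-wise pose distance for `abs`):

```python
def dtw_distance(a, b):
    """Classic dynamic time warping distance between two 1-D sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    # d[i][j]: cost of the best alignment of a[:i] with b[:j].
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # Extend the cheaper of: insertion, deletion, or match.
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]
```

Paired with a nearest-neighbor rule, an unlabeled action sequence is assigned the class of the stored exemplar with the smallest DTW distance, which keeps the classification process fully inspectable.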
Abstract: Intelligent tongue diagnosis is of great significance in assisting doctors with medical treatment. At present, intelligent tongue diagnosis focuses mainly on the prediction and classification of single tongue image features, making it difficult to provide substantial help in the diagnostic process. To make up for this deficiency, accurate prediction and classification are studied at the level of tongue image syndromes to assist doctors in diagnosing diseases. TUNet is used to segment the tongue, and a parallel residual network integrating a multi-attention mechanism, PMANet, is proposed to classify tongue image syndromes. The pixel accuracy (PA), mean intersection over union (MIoU), and Dice coefficient of TUNet reach 99.7%, 98.4%, and 99.2%, respectively, improvements of 3.2%, 9.0%, and 4.8% over the baseline U-Net. In tongue image syndrome classification, PMANet's total parameter count is 12.34M, slightly higher than that of EfficientNet, while its total floating-point operation count is 1.021G, significantly lower than that of all compared networks. With lower parameter and floating-point operation counts, the classification accuracy of PMANet reaches 95.7%, achieving a balance among precision, parameter count, and computation. This method provides support for research on intelligent tongue diagnosis and is expected to promote the modernization of traditional Chinese medicine (TCM) tongue diagnosis.
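The three segmentation metrics reported for TUNet are all derived from the same confusion counts over mask pixels. A minimal sketch for binary masks given as flat 0/1 lists (the function name is illustrative):

```python
def binary_seg_metrics(pred, truth):
    """Pixel accuracy, IoU, and Dice for binary masks (flat 0/1 lists)."""
    tp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 1)
    tn = sum(1 for p, t in zip(pred, truth) if p == 0 and t == 0)
    fp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(pred, truth) if p == 0 and t == 1)
    pa = (tp + tn) / len(pred)                          # pixel accuracy
    iou = tp / (tp + fp + fn) if tp + fp + fn else 1.0  # intersection over union
    dice = 2 * tp / (2 * tp + fp + fn) if tp else 0.0   # Dice coefficient
    return pa, iou, dice
```

MIoU is the IoU averaged over classes (here, tongue and background), which is why it is the strictest of the three and shows the largest improvement over the baseline.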
Abstract: To address low accuracy and susceptibility to interference from external factors in unconstrained environments, a gaze estimation method with convolution and attention double-branch parallel feature cross-fusion is proposed to enhance feature fusion and network performance. First, the Mobile-Former network is enhanced by introducing a linear attention mechanism and partial convolution, effectively improving feature extraction capability while reducing computing costs. In addition, a ResNet50 head-pose feature estimation branch, pre-trained on the 300W-LP dataset, is added to improve gaze estimation accuracy, and a Sigmoid function is used as a gating unit to screen effective features. Finally, facial images are input into the neural network for feature extraction and fusion, and the 3D gaze direction is output. The model is evaluated on the MPIIFaceGaze and Gaze360 datasets, where the average angle errors of the proposed method are 3.70° and 10.82°, respectively. Compared with other mainstream 3D gaze estimation methods, the network model is verified to estimate the 3D gaze direction accurately while reducing computational complexity.
Abstract: Traditional sleep staging models are difficult to deploy on devices with limited computing power due to their high computational resource requirements. In this study, a lightweight sleep analysis system based on single-channel EEG signals is developed, which deploys a GhostNet-optimized neural network model named GhostSleepNet to assess sleep staging and sleep quality. Users only need to wear an EEG headband and connect it to the system to achieve high-accuracy sleep staging in a home environment. In this system, convolutional neural networks (CNNs) extract higher-order features, GhostNet maintains the accuracy of the CNN-extracted features while reducing model parameters to improve computational efficiency, and a gated recurrent unit (GRU) captures long-term dependencies and cyclic changes in sleep data. Verification on the five-class task of the Sleep-EDF dataset shows that the sleep staging accuracy of GhostSleepNet reaches 84.17%, 3%–5% lower than that of traditional sleep staging models, while the FLOP count is only 5 041 111 040 and the computational complexity decreases by 20%–45%, contributing to the development of sleep staging on mobile devices.
Copyright: Institute of Software, Chinese Academy of Sciences Beijing ICP No. 05046678-3
Address: 4# South Fourth Street, Zhongguancun, Haidian, Beijing. Postal Code: 100190
Phone: 010-62661041 Fax: Email: csa(a)iscas.ac.cn
Technical Support:Beijing Qinyun Technology Development Co., Ltd.