
Table of Contents

    10 September 2025, Volume 45 Issue 9
    Artificial intelligence
    Review of open set domain adaptation
    Chuang WANG, Lu YU, Jianwei CHEN, Cheng PAN, Wenbo DU
    2025, 45(9):  2727-2736.  DOI: 10.11772/j.issn.1001-9081.2024091277

    As a critical technique in transfer learning, domain adaptation effectively addresses the problem of differing distributions between training and test datasets. However, traditional domain adaptation methods are typically limited to scenarios where the target-domain and source-domain datasets have the same number and types of categories, an assumption that is often difficult to satisfy in practical applications. Open Set Domain Adaptation (OSDA) emerged to address this challenge. In order to fill the gap in this field and provide a reference for related research, OSDA methods that have emerged in recent years were summarized and analyzed. Firstly, the related concepts and basic structure were introduced. Secondly, typical methods were sorted out and analyzed from three stages: data augmentation-oriented, feature extraction-oriented, and classifier-oriented. Finally, future development directions of OSDA were prospected.

    Survey of statistical heterogeneity in federated learning
    Hao YU, Jing FAN, Yihang SUN, Hua DONG, Enkang XI
    2025, 45(9):  2737-2746.  DOI: 10.11772/j.issn.1001-9081.2024091316

    Federated learning is a distributed machine learning framework that emphasizes privacy protection. However, it faces significant challenges in addressing statistical heterogeneity. Statistical heterogeneity arises from differences in data distribution across participating nodes, which may lead to problems such as biased model updates, performance degradation of the global model, and unstable convergence. Aiming at the above problems, firstly, the main issues caused by statistical heterogeneity were analyzed in detail, including inconsistent feature distributions, imbalanced label distributions, asymmetric data sizes, and varying data quality. Secondly, a systematic review of the existing solutions to statistical heterogeneity in federated learning was provided, covering local correction, clustering methods, client selection optimization, aggregation strategy adjustment, data sharing, knowledge distillation, and decoupling optimization, with an evaluation of their advantages, disadvantages, and applicable scenarios. Finally, future research directions were discussed, such as device computing capacity awareness, model heterogeneity adaptation, optimization of privacy security mechanisms, and enhancement of cross-task transferability, thereby providing references for addressing statistical heterogeneity in practical applications.

    Learning behavior boosted knowledge tracing model
    Wei ZHANG, Zhongwei GONG, Zhixin LI, Peihua LUO, Lingling SONG
    2025, 45(9):  2747-2754.  DOI: 10.11772/j.issn.1001-9081.2024081153

    The existing Knowledge Tracing (KT) models fail to utilize information about learning behaviors effectively and ignore the differences in the contributions of different learning behaviors to question-answering performance. For this reason, a Learning Behavior Boosted Knowledge Tracing (LBBKT) model was proposed. In this model, a Gated Residual Network (GRN) was used to encode students’ learning behavior features into four context vectors and embed them into the model, thereby making full use of the learning behavior information (answering speed, number of attempts, and prompts) to better model students’ learning processes. In addition, the students’ learning behavior features were weighted selectively by a variable selection network, and the interference of irrelevant features was suppressed through the GRN, so as to enhance the influence of relevant features on students’ question-answering performance and fully account for the differential contributions of different learning behaviors. Experimental results on several public datasets show that the LBBKT model outperforms the comparative KT models significantly in terms of prediction accuracy.
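
    A minimal PyTorch sketch of a gated residual block in the spirit of the GRN described above (dense layers, a GLU gate, a residual connection, and layer normalization); the class name and feature dimensions are illustrative assumptions, not the authors’ implementation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GatedResidualNetwork(nn.Module):
    """Gated residual block: dense -> ELU -> dense -> GLU gate -> residual + LayerNorm."""
    def __init__(self, d_in: int, d_hidden: int):
        super().__init__()
        self.fc1 = nn.Linear(d_in, d_hidden)
        self.fc2 = nn.Linear(d_hidden, d_in)
        self.gate = nn.Linear(d_in, 2 * d_in)  # produces value and gate halves for GLU
        self.norm = nn.LayerNorm(d_in)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.fc2(F.elu(self.fc1(x)))
        h = F.glu(self.gate(h), dim=-1)   # gate suppresses irrelevant feature directions
        return self.norm(x + h)           # residual keeps the raw behavior signal

# encode three behavior features (answering speed, attempts, prompts) into a context vector
grn = GatedResidualNetwork(d_in=3, d_hidden=16)
context = grn(torch.randn(32, 3))  # (batch, 3)
```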

    Dual imputation based incomplete multi-view metric learning
    Penghuan QU, Wei WEI, Jing YAN, Feng WANG
    2025, 45(9):  2755-2763.  DOI: 10.11772/j.issn.1001-9081.2024081232

    In practical applications, multi-view metric learning has become an effective method for handling multi-view data. However, the incompleteness of multi-view data poses significant challenges for multi-view metric learning. Although some methods have attempted to address the incomplete multi-view issue, they still have the following shortcomings: 1) most existing methods rely on the k-Nearest Neighbors (kNN) of the available samples to fill in missing data, and tend to ignore the unique characteristics of samples or views; 2) they only use the available sample representations to calculate neighbors, and thus cannot fully express the neighbor relationships between samples. To address these issues, a Dual imputation based Incomplete Multi-View Metric Learning method (DIMVML) was proposed. Firstly, latent features of each view were extracted using a deep autoencoder, and missing samples were then filled in by combining the distribution information of samples and the difference information between views. Secondly, the results were fused according to the quality of the completed samples to obtain higher-quality completion results. Finally, intra-view and inter-view relationships were optimized through a loss function. Experimental results show that in clustering experiments, the proposed method achieves superior accuracy and F1 score on the HandWritten, Caltech101-7, Leaves, and YouTubeFace10 datasets compared to advanced multi-view methods such as Subgraph Propagation and Contrastive Calibration (SPCC) and Latent Heterogeneous Graph Network (LHGN); in classification experiments, the proposed method outperforms other multi-view methods significantly in accuracy on the CUB, ORL, and HandWritten datasets.

    Emotion recognition method compatible with missing modal reasoning
    Bing YIN, Zhenhua LING, Yin LIN, Changfeng XI, Ying LIU
    2025, 45(9):  2764-2772.  DOI: 10.11772/j.issn.1001-9081.2024091262

    Aiming at the model compatibility problem caused by modality absence in real complex scenes, an emotion recognition method supporting input from any available modality was proposed. Firstly, during the pre-training and fine-tuning stages, a modality-random-dropout training strategy was adopted to ensure model compatibility during reasoning. Secondly, a spatio-temporal masking strategy and a feature fusion strategy based on a cross-modal attention mechanism were proposed, so as to reduce the risk of over-fitting and enhance cross-modal feature fusion. Finally, to solve the noisy label problem caused by inconsistent emotion labels across modalities, an adaptive denoising strategy based on multi-prototype clustering was proposed, in which class centers were set for different modalities and noisy labels were removed by checking the consistency between the clustering categories of each modality’s features and their labels. Experimental results show that on a self-built dataset, compared with the baseline Audio-Visual Hidden unit Bidirectional Encoder Representation from Transformers (AV-HuBERT), the proposed method improves the Weighted Average Recall (WAR) by 6.98 percentage points under modality-aligned reasoning, 4.09 percentage points when the video modality is absent, and 33.05 percentage points when the audio modality is absent; compared with AV-HuBERT on the public video dataset DFEW, the proposed method achieves the highest WAR, reaching 68.94%.
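
    The modality-random-dropout strategy can be illustrated with a short sketch: during training, one modality is randomly zeroed out so that the model learns to reason from whichever inputs remain. A minimal PyTorch example, with hypothetical feature shapes and drop probability:

```python
import random
import torch

def modality_random_dropout(audio: torch.Tensor, video: torch.Tensor, p_drop: float = 0.5):
    """With probability p_drop, zero out one randomly chosen modality
    so the model stays usable when that modality is absent at inference."""
    if random.random() < p_drop:
        if random.random() < 0.5:
            audio = torch.zeros_like(audio)  # simulate a missing audio stream
        else:
            video = torch.zeros_like(video)  # simulate a missing video stream
    return audio, video

# toy batch: 8 clips, 100 audio frames x 128 dims, 50 video frames x 512 dims
audio, video = modality_random_dropout(torch.randn(8, 100, 128), torch.randn(8, 50, 512))
```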

    Multi-scale feature fusion sentiment classification based on bidirectional cross attention
    Yiming LIANG, Jing FAN, Wenze CHAI
    2025, 45(9):  2773-2782.  DOI: 10.11772/j.issn.1001-9081.2024081193

    Aiming at the limitations of existing sentiment classification models in deep sentiment understanding, the unidirectional constraints of traditional attention mechanisms, and the class imbalance problem in Natural Language Processing (NLP), a sentiment classification model M-BCA (Multi-scale BERT features with Bidirectional Cross Attention) was proposed that integrates multi-scale BERT (Bidirectional Encoder Representations from Transformers) features and a bidirectional cross attention mechanism. Firstly, multi-scale features were extracted from BERT’s lower, middle, and upper layers to capture the surface information, syntactic information, and deep semantic information of sentence texts. Secondly, a three-channel Gated Recurrent Unit (GRU) was used to further extract deep semantic features, thereby enhancing the model’s text understanding ability. Finally, in order to promote interaction and learning between features of different scales, a bidirectional cross attention mechanism was introduced. Additionally, to address the imbalanced data issue, a data augmentation strategy was designed, and a mixed loss function was adopted to optimize the model’s learning of minority class samples. Experimental results indicate that M-BCA achieves excellent performance in fine-grained sentiment classification tasks and performs significantly better than most baseline models on imbalanced multi-class sentiment datasets. Moreover, M-BCA has outstanding performance in classifying minority class samples, particularly on the NLPCC 2014 and Online_Shopping_10_Cats datasets, where its Macro-Recall for minority classes surpasses that of all comparison models. It can be seen that this model achieves remarkable performance improvements in fine-grained sentiment classification tasks and is suitable for handling imbalanced datasets.
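
    Extracting features from BERT’s lower, middle, and upper layers can be done directly from the hidden states exposed by Hugging Face Transformers; a minimal sketch, in which the specific layer indices are an illustrative choice rather than the paper’s configuration:

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased", output_hidden_states=True)

inputs = tokenizer("The plot was slow but the ending was worth it.", return_tensors="pt")
with torch.no_grad():
    hidden_states = model(**inputs).hidden_states  # tuple: embedding layer + 12 encoder layers

low, middle, high = hidden_states[3], hidden_states[7], hidden_states[12]
# low ~ surface information, middle ~ syntactic information, high ~ deep semantics
print(low.shape, middle.shape, high.shape)  # each (1, seq_len, 768)
```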

    Judgment document summarization method combining large language model and dynamic prompts
    Binbin ZHANG, Yongbin QIN, Ruizhang HUANG, Yanping CHEN
    2025, 45(9):  2783-2789.  DOI: 10.11772/j.issn.1001-9081.2024091393

    In view of the complex case structure, redundant case facts, and wide distribution of case information in judgment documents, it is difficult for existing Large Language Models (LLMs) to focus on structural information effectively, and they may generate factual errors, resulting in missing structural information and factual inconsistency. To this end, a judgment document summarization method combining LLMs and dynamic prompts, named DPCM (Dynamic Prompt Correction Method), was proposed. Firstly, an LLM was used for one-shot learning to generate a judgment document summary. Secondly, the high-dimensional similarity between the original text and the summary was calculated to detect possible missing structure or factual inconsistency in the summary. If a problem was found, the faulty summary was spliced with the original text, prompt words were added, and one-shot learning was performed again to generate a corrected summary, followed by another similarity test; if the problem persisted, the generation and detection process was repeated. Finally, through this iterative process, the prompt words were adjusted dynamically to optimize the generated summary gradually. Experimental results on the CAIL2020 public judicial summarization dataset show that compared with Least-To-Most Prompting, Zero-Shot Reasoners, Self_Consistency_Cot and other methods, the proposed method achieves improvements in the Rouge-1, Rouge-2, Rouge-L, BERTScore, and FactCC (Factual Consistency) indicators.
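
    The generate-detect-correct loop of DPCM can be sketched in a few lines of Python; here `generate` stands for any LLM call and `similarity` for any semantic similarity scorer, both hypothetical placeholders, and the threshold and round limit are illustrative:

```python
def dpcm_summarize(document, generate, similarity, threshold=0.85, max_rounds=3):
    """Iteratively regenerate a summary until it is judged consistent with the source."""
    prompt = f"Summarize the following judgment document:\n{document}"
    summary = generate(prompt)
    for _ in range(max_rounds):
        if similarity(document, summary) >= threshold:
            break  # structure and facts judged consistent enough
        # splice the faulty summary back into the prompt with corrective instructions
        prompt = ("The summary below misses structural information or contradicts the facts.\n"
                  f"Document:\n{document}\n"
                  f"Faulty summary:\n{summary}\n"
                  "Rewrite the summary, preserving the case structure and all facts.")
        summary = generate(prompt)
    return summary
```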

    Named entity recognition for sensitive information based on data augmentation and residual networks
    Li LI, Han SONG, Peihe LIU, Hanlin CHEN
    2025, 45(9):  2790-2797.  DOI: 10.11772/j.issn.1001-9081.2024081143

    Named Entity Recognition (NER) for sensitive information is a key technology for privacy protection. However, the existing NER methods face challenges in the sensitive information domain due to the scarcity of relevant datasets, and traditional techniques suffer from low accuracy and poor portability. To address these issues, firstly, a sensitive information NER dataset, SenResume, was constructed by crawling and manually annotating text corpora containing sensitive information from the Internet. Secondly, a data augmentation model, Entity-based Masked Language Modeling (E-MLM), was proposed, which uses the whole-word masking technique to generate new data samples and expand the dataset, thereby enhancing data diversity. Thirdly, a RoBERTa-ResBiLSTM-CRF model was introduced, in which the Robustly optimized Bidirectional Encoder Representations from Transformers approach with Whole Word Masking (RoBERTa-WWM) extracts contextual features to generate high-quality word vector representations, while ResBiLSTM (Residual Bidirectional Long Short-Term Memory) enhances text features. Finally, a multi-layer residual network was applied to improve training efficiency and model stability, and a Conditional Random Field (CRF) was used for global decoding to enhance the accuracy of sequence labeling. Experimental results demonstrate that E-MLM improves dataset quality significantly, and the proposed NER model achieves the best performance on both the original and 1x augmented datasets, with F1 scores of 96.16% and 97.84%, respectively. It can be seen that the introduction of E-MLM and residual networks contributes to improving the accuracy of sensitive information NER.

    Nested named entity recognition model for wind power equipment based on differential boundary enhancement
    Dengran REN, Shuying WANG
    2025, 45(9):  2798-2805.  DOI: 10.11772/j.issn.1001-9081.2024081159

    Aiming at the high degree of entity nesting and the long-text characteristics in the wind power field, a nested Named Entity Recognition model based on Differential Boundary Enhancement (DBE-NER) was proposed. Firstly, a semantic encoder module was used to obtain feature representations that fuse entity head and tail words, entity types, and relative distances, thereby enhancing the model’s ability to capture nested semantic features. Secondly, an efficient differential semantic encoding module was designed to resolve the fuzziness of nested entity boundaries. Thirdly, a Grouped Dilated Attention Network (GDAN) was used to improve the model’s effectiveness in recognizing long-text entities, nested entities, and nested boundaries. Finally, the feature score matrix was input into a span decoder to obtain the positions and categories of entities. Experimental results indicate that the F1 score of DBE-NER is improved by 0.92% and 1.07% compared to those of the DiFiNet (Differentiation and Filtration Network) and CNN-NER (Convolutional Neural Network for Named Entity Recognition) models on the WPEF dataset, a manually annotated dataset from a large wind power energy enterprise, and the F1 scores of DBE-NER are also increased on various public datasets.

    Data science and technology
    Multivariate time series prediction method combining local and global correlation
    Xiang WANG, Zhixiang CHEN, Guojun MAO
    2025, 45(9):  2806-2816.  DOI: 10.11772/j.issn.1001-9081.2024091267

    Concerning the insufficient integration of local and global dependencies in existing time series models, a multivariate time series prediction method integrating local and global correlations, namely PatchLG (Patch-integrated Local-Global correlation method), was proposed. The method is based on three key components: 1) segmenting the time series into multiple patches, thereby preserving the locality of the series while making it easier for the model to capture global dependencies; 2) using depthwise separable convolution and a self-attention mechanism to model local and global correlations; 3) decomposing the time series into trend and seasonal components, predicting the two simultaneously, and combining their prediction results to obtain the final result. Experimental results on seven benchmark datasets demonstrate that PatchLG achieves average improvements of 3.0% and 2.9% in Mean-Square Error (MSE) and Mean Absolute Error (MAE), respectively, compared to the best baseline method PatchTST (Patch Time Series Transformer), with low actual running time and memory usage, validating the effectiveness of PatchLG in time series prediction.
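
    Two of the components, patching and trend-seasonal decomposition, are easy to illustrate; a minimal PyTorch sketch with illustrative patch length, stride, and moving-average window (not the paper’s hyperparameters):

```python
import torch
import torch.nn.functional as F

def decompose(x: torch.Tensor, kernel: int = 25):
    """Split a series (batch, length, vars) into a moving-average trend and a seasonal residual."""
    pad = kernel // 2
    padded = F.pad(x.transpose(1, 2), (pad, pad), mode="replicate")
    trend = F.avg_pool1d(padded, kernel, stride=1).transpose(1, 2)
    return trend, x - trend

def make_patches(x: torch.Tensor, patch_len: int = 16, stride: int = 8):
    """Slice each variable's series into overlapping patches: (batch, vars, n_patches, patch_len)."""
    return x.transpose(1, 2).unfold(-1, patch_len, stride)

series = torch.randn(4, 96, 7)      # 4 samples, 96 time steps, 7 variables
trend, seasonal = decompose(series)
patches = make_patches(seasonal)
print(trend.shape, patches.shape)   # (4, 96, 7) and (4, 7, 11, 16)
```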

    Genetic algorithm-based community hiding method in attribute networks
    Bohan ZHANG, Le LYU, Junchang JING, Dong LIU
    2025, 45(9):  2817-2826.  DOI: 10.11772/j.issn.1001-9081.2024081158

    To counteract community detection algorithms and thereby protect node privacy, community hiding methods have garnered increasing attention. However, current mainstream community hiding algorithms focus only on the network’s topological structure and neglect the influence of node attributes on community structure, leading to poor performance on attribute networks. In response to these issues, an Attribute network Community hiding method based on a Genetic algorithm (ACG) was proposed. In this method, the network topological structure and node attributes were integrated, with the core idea of finding the optimal edge hiding strategy by optimizing a fitness function. In ACG, while minimizing the hiding cost, maximizing modularity and attribute similarity was adopted as a dual metric to select and perturb the set of edges with the greatest impact on community structure, thereby attacking community detection algorithms for attribute networks effectively. Experimental results demonstrate that without changing the total number of edges or the attribute information, the proposed method counters mainstream attribute community detection methods effectively; compared with other community hiding methods, ACG has advantages in counteracting classic community detection algorithms on five attribute networks.

    Knowledge-aware recommendation model combining denoising strategy and multi-view contrastive learning
    Chao LIU, Yanhua YU
    2025, 45(9):  2827-2837.  DOI: 10.11772/j.issn.1001-9081.2024081225

    A knowledge-aware recommendation model called Fusion of Denoising Strategies and Multi-View Contrastive learning (FDSMVC) was proposed to address the issues of poor noise reduction, inadequate extraction of semantic information between items, and imbalanced utilization of information in Knowledge Graph (KG)-based recommendation models. Firstly, noise reduction was performed on the user-item interaction graph and the knowledge graph by dropping edges selectively and by masking low-weight triplets with a weighted function, respectively. Secondly, randomized Singular Value Decomposition (SVD), cosine similarity, k-Nearest Neighbors (kNN) sparsification, and a path-based graph attention network were used to construct a collaborative view, a semantic view between items, and a structural view. Thirdly, intra-graph, local, and global contrastive learning was applied to the multiple views. Finally, a multi-task strategy was applied to optimize the recommendation task and the contrastive learning task jointly, yielding the probability of user-item interactions. Experimental results show that on five real-world datasets (Book-Crossing, MovieLens-1M, Last.FM, Alibaba-iFashion, and Yelp2018), compared to the best baseline model, the FDSMVC model achieves improvements of 1.06%-2.04% in Area Under the Curve (AUC) and 1.52%-2.06% in F1 score, and also outperforms the best baseline model in Recall@K.

    Multi-source heterogeneous data analysis method combining deep learning and tensor decomposition
    Hongjun ZHANG, Gaojun PAN, Hao YE, Yubin LU, Yiheng MIAO
    2025, 45(9):  2838-2847.  DOI: 10.11772/j.issn.1001-9081.2024081178

    In the dynamic field of consumer electronics, understanding user behavior is crucial for product innovation and improving user satisfaction. Therefore, a multi-clustering method combining deep learning with tensor decomposition was proposed to address the challenges of data analysis and mining. Firstly, high-level features were extracted from complex heterogeneous datasets; for example, for datasets from the various sensors and user interactions of modern devices, deep neural networks were used to encapsulate the diverse features of the data sources. Secondly, tensor decomposition techniques were applied to feature extraction and clustering analysis, treating each data source as a different modality within a data tensor to reveal its latent structure and patterns. Finally, experiments were carried out on a dataset obtained in collaboration with an e-commerce platform, covering tens of thousands of customers. Empirical results demonstrate that the proposed tensor decomposition algorithm integrated with a Convolutional Neural Network (CNN) performs well on consumer electronics-related datasets, with all accuracies over 0.7 and strong scores in key metrics such as purity, Adjusted Rand Index (ARI), and Normalized Mutual Information (NMI), confirming the effectiveness of the proposed method in capturing the intrinsic structure and similarity of the data; compared with existing methods such as the Dynamic Multi-Clustering Routine (DMCR), Deep Multi-Modal Clustering (DMMC), and FAST-CNN, the proposed method shows significant advantages on multiple evaluation metrics, verifying its superiority in accuracy and stability and its strength in uncovering underlying data principles and the interrelationships between heterogeneous data.

    Cyber security
    Research advances in blockchain consensus mechanisms and improvement algorithms
    Wei GAO, Lihua LIU, Bintao HE, Fang’an DENG
    2025, 45(9):  2848-2864.  DOI: 10.11772/j.issn.1001-9081.2024101420

    The consensus mechanism is the core of blockchain technology, and consensus algorithms are the concrete technical means of realizing it. The consensus mechanism ensures the consistency and correctness of the blockchain database and is crucial to system properties such as security, scalability, and throughput. Therefore, firstly, from the perspective of the underlying storage of blockchain technology, consensus algorithms were divided into two categories, chain-based and graph-based, and the working principles, optimization strategies, and typical representative algorithms of each category were classified and reviewed. Then, in view of the complex application background of blockchain, the mainstream improved algorithms of chain-structure and graph-structure consensus algorithms were sorted out comprehensively, and the main line of consensus algorithm development was outlined; in particular, the algorithms were compared in depth in terms of security, and their advantages, disadvantages, and potential security risks were pointed out. Finally, from multiple dimensions such as security, scalability, fairness, and incentive strategy, the challenges faced by current blockchain consensus algorithms were discussed in depth, and their development trends were prospected, so as to provide a theoretical reference for researchers.

    Blockchain covert communication method based on contract call concealment
    Wei SHE, Tianxiang MA, Haige FENG, Zhao TIAN, Wei LIU
    2025, 45(9):  2865-2872.  DOI: 10.11772/j.issn.1001-9081.2024091282

    To address the insufficient concealment, long extraction time for hidden information, low hiding capacity, and single application scenarios of existing blockchain covert communication schemes, a blockchain covert communication method based on Ethereum smart contracts and InterPlanetary File System (IPFS) technology was proposed. Firstly, IPFS was used to store long ciphertexts, and a combination of on-chain and off-chain storage was employed to compensate for the low efficiency and high cost of blockchain storage. Secondly, by adopting the concepts of derivation relationships and codebooks, the separation of secret information and communication information was achieved: the on-chain data was the index information of the ciphertext hash rather than the ciphertext hash itself, thereby further enhancing security. Thirdly, a suitable smart contract was customized based on the index information, and the index information was disguised as normal contract call parameters to ensure both concealment and security; moreover, the relatively large storage capacity of the data field in contract call transactions further increased the information embedding capacity of transactions. Finally, group encryption technology was introduced into the blockchain covert communication model to support multi-user interactive scenarios. Experimental results show that the proposed method improves time efficiency and embedding capacity, enhances concealment and security greatly, and can be applied to interaction scenarios with multiple receiving users.

    Blockchain-based identity management system for internet of things
    Sheping ZHAI, Pengju ZHU, Rui YANG, Jiayiteng LIU
    2025, 45(9):  2873-2881.  DOI: 10.11772/j.issn.1001-9081.2024081231

    In the current Internet of Things (IoT) environment, Decentralized IDentifier (DID) management methods face multiple challenges, including linkage attacks, privacy leakage, and regulatory conflicts, so a new scheme is urgently needed that can protect user privacy while meeting regulatory requirements. A DID scheme was proposed to address these issues. In the scheme, an identity system combining a main identifier with multiple pseudonymous identifiers was adopted; a dual-credential model integrating plaintext credentials and encrypted credentials was designed; and commitment and Zero-Knowledge Proof (ZKP) technologies were used to ensure the security of sensitive attributes and identity data. Furthermore, pseudonym mechanisms were applied to achieve unlinkability of identity information and defend against Sybil attacks effectively. Experimental results show that compared to schemes such as WeIdentity, the proposed scheme reduces the credential generation time and proof generation time by 23% and 19%, respectively, demonstrating significant performance advantages. It can be seen that the proposed DID scheme enhances user identity privacy protection, reduces identity leakage risks, and balances privacy protection with regulatory requirements, providing a solution for DID management in IoT environments.

    Verifiable searchable encryption scheme of fine-grained result by designated tester in cloud
    Runyu YAN, Rui GUO, Yongbo YAN, Guangjun LIU
    2025, 45(9):  2882-2892.  DOI: 10.11772/j.issn.1001-9081.2024081223

    In response to the issue that the semi-trusted nature of cloud servers in searchable encryption may result in incorrect or incomplete search results, a verifiable searchable encryption scheme with fine-grained results by a designated tester in the cloud was proposed. In this scheme, data users were allowed to query keywords on encrypted datasets to retrieve files, and a verification mechanism was incorporated to ensure data privacy protection and the reliability of search results in the cloud. By introducing a Merkle Hash Tree (MHT) with Rank values and a Counting Bloom Filter (CBF), the correctness of the dataset was verified, accurate results were filtered out in a fine-grained way together with the number of qualified files not returned, the integrity of the dataset was ensured, and dynamic updating of the dataset was implemented; the semantic security of selected keywords was proved under the random oracle model. Simulation results demonstrate that compared to traditional certificateless verifiable searchable encryption schemes, the proposed scheme has lower computational overhead and higher execution efficiency in practical applications.
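
    The Counting Bloom Filter (CBF) used for verification replaces the bits of an ordinary Bloom filter with counters, which is what makes deletions, and hence dynamic dataset updates, possible. A minimal sketch with illustrative sizes (not the paper’s parameters):

```python
import hashlib

class CountingBloomFilter:
    """Counters instead of bits: supports delete, so the index can be updated dynamically."""
    def __init__(self, size: int = 1024, n_hashes: int = 4):
        self.size, self.n_hashes = size, n_hashes
        self.counters = [0] * size

    def _positions(self, item: str):
        for i in range(self.n_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).hexdigest()
            yield int(digest, 16) % self.size

    def add(self, item: str):           # insert a keyword
        for p in self._positions(item):
            self.counters[p] += 1

    def remove(self, item: str):        # delete on dataset update
        for p in self._positions(item):
            self.counters[p] = max(0, self.counters[p] - 1)

    def __contains__(self, item: str):  # false positives possible, false negatives not
        return all(self.counters[p] > 0 for p in self._positions(item))

cbf = CountingBloomFilter()
cbf.add("contract")
print("contract" in cbf)   # True
cbf.remove("contract")
print("contract" in cbf)   # False
```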

    Pseudo random number generator based on LSTM and separable self-attention mechanism
    Yilin DENG, Fajiang YU
    2025, 45(9):  2893-2901.  DOI: 10.11772/j.issn.1001-9081.2024091345

    To address the poor quality and slow generation of pseudo random numbers produced by Generative Adversarial Networks (GANs), a model LSA-WGAN-GP (Wasserstein GAN with Gradient Penalty based on LSTM and separable SA), based on Long Short-Term Memory (LSTM) and a separable Self-Attention (SA) mechanism, was proposed. In the model, the data were expanded from one-dimensional to two-dimensional space and the data representation was improved, thereby enabling the extraction of deeper-level features, and an innovative LSA (LSTM and separable Self-Attention) module was introduced that integrates LSTM and the SA mechanism to enhance the irreversibility and unpredictability of the generated pseudo random numbers significantly. Additionally, the network structure was simplified, reducing the model’s parameters effectively and improving the generation speed. Experimental results demonstrate that pseudo random numbers generated by LSA-WGAN-GP pass the National Institute of Standards and Technology (NIST) tests with a 100% success rate; compared to WGAN-GP (Wasserstein GAN with Gradient Penalty) and GAN, LSA-WGAN-GP improves the P-values and pass rates of the frequency and universal test items; in terms of generation speed, LSA-WGAN-GP generates pseudo random numbers 164% and 975% faster than WGAN-GP and GAN, respectively. It can be seen that the proposed model ensures the quality of the generated pseudo random numbers with fewer parameters and a higher generation speed.
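
    The idea of chaining an LSTM with self-attention can be sketched as follows; standard multi-head attention stands in for the paper’s separable self-attention, and the sizes are illustrative:

```python
import torch
import torch.nn as nn

class LSABlock(nn.Module):
    """LSTM followed by self-attention over the sequence, a simplified stand-in for the LSA module."""
    def __init__(self, d_model: int = 32):
        super().__init__()
        self.lstm = nn.LSTM(d_model, d_model, batch_first=True)
        self.attn = nn.MultiheadAttention(d_model, num_heads=1, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h, _ = self.lstm(x)            # local sequential structure
        out, _ = self.attn(h, h, h)    # global dependencies across the sequence
        return out

y = LSABlock()(torch.randn(4, 16, 32))  # 4 sequences of 16 steps -> (4, 16, 32)
```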

    Advanced computing
    Cloud-edge collaborative data storage and retrieval architecture for industrial scenarios
    Xuecheng QIN, Chunyan LIU, Bao LI, Yunlong ZHAO
    2025, 45(9):  2902-2912.  DOI: 10.11772/j.issn.1001-9081.2024070993

    Facing distributed storage and cross-domain circulation of data across business domains in industrial scenarios, a cloud-edge collaborative data storage and retrieval architecture was proposed to address the problems of numerous and complex business systems, huge amounts of data, and the inability of some data to be uploaded to the cloud, aiming to achieve unified storage and efficient cross-domain circulation of large-scale data. In this architecture, a data encoding rule based on the RDF (Resource Description Framework) graph model and a multi-level efficient data storage strategy based on the S-tree (Signature-tree) were designed to ensure that data that cannot be uploaded to the cloud are stored on edge servers, while data that can be uploaded are stored on cloud servers. Besides, an efficient collaborative retrieval method based on the CECI-tree (Cloud-Edge Collaboration Index-tree) was proposed, which improves the efficiency of data retrieval effectively through the cloud-edge collaborative indexing mechanism. Experimental results comparing the proposed architecture with methods such as RDF-3X and GRIN show that it performs better in terms of running efficiency and CPU utilization.

    PCIe bus transmission bandwidth optimization in embedded heterogeneous intelligent computing system
    Xubang YU, Jiwen WU, Hong XIA, Hao MO, Erhu ZHAO
    2025, 45(9):  2913-2918.  DOI: 10.11772/j.issn.1001-9081.2024091299

    In recent years, with the development of Artificial Intelligence (AI) technology, deep learning algorithms and specialized AI processor chips have been applied more and more widely to edge and device-side data signal processing systems. A key technical challenge is how to achieve high-bandwidth, low-latency data transmission between heterogeneous processors while endowing the system with high-level intelligent computing capabilities. Therefore, an embedded heterogeneous intelligent computing system was designed, integrating a Cambricon MLU220 chip, a domestic Feiteng FT2000/4 CPU, and a Xilinx XC7K325T Field Programmable Gate Array (FPGA). High-speed interconnection and data transmission between the system’s heterogeneous processors were realized through the PCIe (Peripheral Component Interconnect express) bus. In addition, a PCIe bus Scatter-Gather DMA (Direct Memory Access) transmission optimization technique under Linux was proposed, which improves the PCIe data transmission bandwidth between the CPU and FPGA effectively through a prefetch technique based on double buffering and interrupt handling based on work queues. Image transmission tests show that when 10 grayscale images of 2 048×1 024 size are transferred between the CPU and FPGA via a PCIe 2.0 X4 bus, the system achieves read/write speeds of 1 610 MB/s and 1 655 MB/s in dual-channel DMA mode, reaching 81% and 83% of the theoretical PCIe 2.0 X4 bus bandwidth, respectively. These results verify the practicality and advancement of the designed system.

    Attribute reduction of fuzzy relation decision systems with two universes
    Xu LI, Zhanwei CHEN, Ruibo DONG, Juan LI
    2025, 45(9):  2919-2925.  DOI: 10.11772/j.issn.1001-9081.2024091312

    Aiming at the reduction problem in fuzzy relation decision systems, a fuzzy relation decision system with two universes and its attribute reduction concept were proposed by combining the framework of rough set theory with two universes. Firstly, the binary relations induced by conditional attributes and decision attributes were defined as fuzzy relations over different universes, leading to the introduction of the fuzzy relation decision system with two universes. Secondly, to gain a deeper understanding of the essence of reduction, the concept of approximate reduction in the fuzzy relation decision system with two universes was proposed. Thirdly, based on the definition of approximate reduction, a discernibility matrix corresponding to approximate reduction was designed and constructed, and through proofs concerning the discernibility matrix, discernibility matrix-based approximate reduction algorithms, LRFT and URFT, were proposed. Finally, the feasibility and effectiveness of the proposed algorithms were verified through experiments comparing the classification accuracy of datasets before and after reduction.

    Association rule extraction method based on triadic fuzzy linguistic formal context
    Huaizhe ZHAO, Zheng YANG, Li ZOU, Yi LIU
    2025, 45(9):  2926-2933.  DOI: 10.11772/j.issn.1001-9081.2024081152

    Handling complex data in uncertain environments has long been a concern. In order to deal with multidimensional data in fuzzy linguistic environments and mine the rules contained between attributes described by linguistic values in different domains, an association rule extraction method based on the triadic fuzzy linguistic formal context was proposed. Firstly, the triadic fuzzy linguistic formal context was developed by combining a linguistic term set with triadic concept analysis theory. Subsequently, triadic fuzzy linguistic concepts were defined on the basis of derivation operators, and an incremental construction idea was applied to develop a knowledge discovery algorithm based on the triadic fuzzy linguistic formal context, thereby acquiring conceptual knowledge with semantic information under fuzzy triadic relations; as a result, the relationships between concepts were depicted by constructing a triadic fuzzy linguistic diagram. Finally, an association rule extraction method based on triadic fuzzy linguistic concepts was introduced to explore correlations between attributes, yielding semantic rules with conditional constraints. Experimental results on real datasets from different domains show that the proposed method handles multidimensional data effectively in fuzzy linguistic environments, acquires conceptual knowledge with semantic information, and mines semantic rules with high credibility.

    Novel message passing network for neural Boolean satisfiability problem solver
    Yonghao LIANG, Jinlong LI
    2025, 45(9):  2934-2940.  DOI: 10.11772/j.issn.1001-9081.2024091362

    In order to optimize the structure of the Message Passing Neural Network (MPNN), reduce the number of iterations in the solving process, and improve the performance of end-to-end neural Boolean SATisfiability problem (SAT) solvers, a More and Deeper Message Passing Network (MDMPN) was proposed. In this network, to pass more messages, an overall message passing module was introduced, realizing the transmission of additional overall messages from literal nodes to clause nodes in each message passing iteration. At the same time, to pass deeper messages, a message jumping module was incorporated to transmit messages from literal nodes to their second-order neighbors. To assess the performance and generalizability of MDMPN, it was applied to the state-of-the-art neural SAT solver QuerySAT and the basic neural SAT solver NeuroSAT. Experimental results on a dataset of difficult random 3-SAT problems with 600 variables and an iteration upper limit of 212 show that QuerySAT with MDMPN outperforms standard QuerySAT with an accuracy improvement of 46.12 percentage points, and NeuroSAT with MDMPN outperforms standard NeuroSAT with an accuracy improvement of 35.69 percentage points.

    Multimedia computing and computer simulation
    Dynamic dictionary learning based spatio-spectral fusion for noisy hyper-spectral images
    Jing YANG, Jianbin ZHAO, Lu CHEN, Haotian CHI, Tao YAN, Bin CHEN
    2025, 45(9):  2941-2948.  DOI: 10.11772/j.issn.1001-9081.2025040411

    Traditional Hyper-Spectral Image (HSI) spatio-spectral fusion algorithms usually use a static spectral dictionary, in which dictionary learning and image fusion are two separate processes, giving poor performance on noisy spatio-spectral fusion tasks. To address this problem, a noisy HSI spatio-spectral fusion algorithm based on Dynamic Dictionary Learning (DDL) was proposed, which adopts an iterative strategy that updates dictionary atoms dynamically during the fusion process, so that spatio-spectral fusion and noise removal are completed collaboratively. Firstly, coarse denoising was performed on the input HSI and the denoising result was used to initialize the spectral dictionary. Secondly, the sparse representation technique was employed to fuse the two input images with the initialized dictionary, producing an intermediate fused image. Thirdly, the intermediate fused image was fed back to the dictionary learning module to update the dictionary atoms continuously, forming a dynamic spectral dictionary. Finally, by iterating the above process, the final output image was obtained. Simulation results on three remote sensing HSI datasets show that the proposed algorithm can remove noise effectively while improving the spatial resolution of the images, and experimental results on real noisy image bands indicate that the proposed algorithm improves the visual quality of the fused images effectively. On the Cuprite Mine dataset, with a Gaussian noise variance of 0.15 and an amplification factor of 8, the Peak Signal-to-Noise Ratio (PSNR) of the proposed algorithm is increased by 32.48% and 10.72% compared to those of the Generalized Tensor Nuclear Norm (GTNN) method and the AL-NSSR method (which denoises first and then fuses), respectively.

    SAR and visible image fusion based on residual Swin Transformer
    Jin LI, Liqun LIU
    2025, 45(9):  2949-2956.  DOI: 10.11772/j.issn.1001-9081.2024081166

    In research on the fusion of Synthetic Aperture Radar (SAR) and visible images, existing methods usually face the challenges of large modal differences, information loss, and high computational complexity. Therefore, an SAR and visible image fusion algorithm based on a residual Swin Transformer module was proposed. Firstly, Swin Transformer was used as the backbone to extract global features, and a fully attentional feature-encoding backbone network was used to model long-range dependencies. Secondly, in order to improve the fusion effect, three different fusion strategies were designed: a feature fusion strategy based on the L1 norm of the sequence matrix, a fusion strategy based on the image pyramid, and an additive fusion strategy. Thirdly, the final fusion result was obtained by weighted averaging of the three results, which adjusts pixel values and reduces the noise of the SAR image effectively, better retains the clear details and structural information of the visible image, and fuses the surface feature information of the SAR and visible images at different scales. Finally, extensive experiments were carried out on the SEN1-2 dataset, the QXS-SAROPT dataset, and OSdataset. Experimental results show that compared with algorithms such as IFCNN (a general image fusion framework based on convolutional neural network) and MDLatLRR (Multi-level Decomposition based on Latent Low-Rank Representation), the proposed algorithm produces better subjective visual effects with significant improvement in most objective evaluation indicators, and has excellent noise suppression and image fidelity capabilities while retaining source image features.
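
    A training-free L1-norm fusion rule of the kind mentioned in the first strategy weights each source by the activity of its encoded features; a minimal PyTorch sketch with hypothetical feature shapes:

```python
import torch

def l1_fusion(feat_a: torch.Tensor, feat_b: torch.Tensor) -> torch.Tensor:
    """Fuse two feature maps (C, H, W) with soft per-pixel weights from their L1 activity."""
    act_a = feat_a.abs().sum(dim=0, keepdim=True)   # (1, H, W) activity of source A
    act_b = feat_b.abs().sum(dim=0, keepdim=True)   # (1, H, W) activity of source B
    w_a = act_a / (act_a + act_b + 1e-8)            # normalized weight for source A
    return w_a * feat_a + (1 - w_a) * feat_b

fused = l1_fusion(torch.randn(64, 128, 128), torch.randn(64, 128, 128))  # (64, 128, 128)
```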

    Speech enhancement network driven by complex frequency attention and multi-scale frequency enhancement
    Jinggang LYU, Shaorui PENG, Shuo GAO, Jin ZHOU
    2025, 45(9):  2957-2965.  DOI: 10.11772/j.issn.1001-9081.2025030268

    Current speech enhancement methods take complex spectrum signals as the target signals, while the training networks are usually real-valued, so the real and imaginary parts of the signals are processed in parallel during training, which reduces the accuracy of feature extraction and leads to insufficient extraction of semantic features in the complex frequency domain. To address these issues, a complex domain network based on Complex Frequency Attention and Multi-Scale Frequency Domain Enhancement (CFAFE) was proposed for speech enhancement on the basis of the U-Net architecture. Firstly, the Short-Time Fourier Transform (STFT) was used to convert the noisy speech time-series signal to the complex frequency domain. Secondly, for the complex frequency domain features, a complex domain multi-scale frequency enhancement module was designed, and a local feature mining module for noisy speech under complex frequency domain conditions was constructed, so as to improve the abilities to suppress frequency-domain interference and recognize the expected signal features. Thirdly, a self-attention algorithm based on the complex frequency domain was designed on the basis of the ViT (Vision Transformer), so as to achieve parallel complex frequency domain feature enhancement. Finally, comparative and ablation experiments were conducted on the benchmark dataset VoiceBank+Demand, and transfer generalization experiments were carried out on the Timit dataset with Noise92 noise added. Experimental results show that on the VoiceBank+Demand dataset, the proposed network outperforms the Deep Complex Convolution Recurrent Network (DCCRN) by 16.6%, 10.9%, 44.4%, and 14.1% in terms of the Perceptual Evaluation of Speech Quality (PESQ), MOS prediction of the signal distortion (CSIG), MOS prediction of the intrusiveness of background noise (CBAK), and MOS prediction of the overall effect (COVL) indicators, respectively; on the Timit+Noise92 dataset, compared with the DCCRN model under -5 dB Signal-to-Noise Ratio (SNR) babble noise conditions, the proposed network increases PESQ and STOI (Short-Time Objective Intelligibility) by 29.8% and 5.2%, respectively.

    Noise and semantic prior guided low-light image enhancement algorithm
    Xuejin WANG, Leilei HUANG, Zhenhui ZHONG
    2025, 45(9):  2966-2974.  DOI: 10.11772/j.issn.1001-9081.2024081187

    Brightness, noise, and contrast are distributed non-uniformly in low-light images, but existing Low-Light Image Enhancement (LLIE) algorithms fail to fully exploit these characteristics, so issues such as detail loss, color distortion, and visual discontinuity may occur, degrading the visual quality of the images. To address these problems, a noise and semantic prior guided LLIE algorithm was proposed that adaptively considers the characteristics of different regions in low-light images and their semantic information. Specifically, a novel Image block Classification based Global Feature Extraction network (ICGFE) was designed to extract global features, an Information Compensation based Local Feature Extraction network (ICLFE) was introduced to extract local features, and a noise prior-guided feature fusion strategy was proposed to perform adaptive enhancement on image regions with different characteristics. Furthermore, a new semantic prior-guided color loss function was presented to maintain the consistency of instance colors. Experimental results on the public dataset LOL (LOw-Light dataset) show that the proposed algorithm improves the Peak Signal-to-Noise Ratio (PSNR) by 1.9%-89.1% and achieves good Structural SIMilarity (SSIM) results compared to algorithms such as Retinex and DeepUPE (Underexposed Photo Enhancement using Deep illumination estimation). It can be seen that the proposed algorithm enhances image regions with different characteristics adaptively and has significant advantages in color restoration, detail and texture reconstruction, and noise suppression.

    Semi-supervised image dehazing algorithm based on teacher-student learning
    Panfeng JING, Yudong LIANG, Chaowei LI, Junru GUO, Jinyu GUO
    2025, 45(9):  2975-2983.  DOI: 10.11772/j.issn.1001-9081.2024091382

    Image dehazing is a hot topic in the field of computer vision. Acquiring large-scale, high-quality paired datasets from the real world is costly and challenging; consequently, existing methods train fully supervised deep learning models on synthetic data, which may lead to poor real-world performance. To bridge the gap between the synthetic and real domains, a semi-supervised image dehazing algorithm based on teacher-student learning was introduced. In this algorithm, a semi-supervised teacher-student learning scheme with an Exponential Moving Average (EMA) strategy was used to update the teacher model, and end-to-end dehazing learning was performed, thereby addressing the domain shift between synthetic and real data significantly and enhancing the generalization of the model in real hazy scenarios. Experimental results demonstrate that the proposed algorithm achieves superior performance on two synthetic hazy image datasets, SOTS (Synthetic Objective Testing Set) and Haze4K, as well as on the real-world hazy image dataset URHI (Unannotated Real-world Hazy Images), while also delivering better dehazing visual effects.
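
    The EMA teacher update at the heart of the algorithm is compact enough to show directly; a minimal PyTorch sketch in which the decay value and the stand-in network are illustrative:

```python
import copy
import torch

@torch.no_grad()
def ema_update(teacher: torch.nn.Module, student: torch.nn.Module, decay: float = 0.999):
    """teacher <- decay * teacher + (1 - decay) * student, parameter by parameter."""
    for t_p, s_p in zip(teacher.parameters(), student.parameters()):
        t_p.mul_(decay).add_(s_p, alpha=1 - decay)

student = torch.nn.Conv2d(3, 16, 3, padding=1)  # stand-in for a dehazing network
teacher = copy.deepcopy(student)                # the teacher is never back-propagated
for p in teacher.parameters():
    p.requires_grad_(False)

# call once after every student optimizer step:
ema_update(teacher, student)
```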

    Few-shot object detection algorithm based on new category feature enhancement and metric mechanism
    Jiaxiang ZHANG, Xiaoming LI, Jiahui ZHANG
    2025, 45(9):  2984-2992.  DOI: 10.11772/j.issn.1001-9081.2024081146

    To address the issues that existing few-shot object detection models have low sensitivity to the feature parameters of new categories and difficulty in accurately distinguishing category-related from category-unrelated parameters, leading to unclear feature boundaries and category confusion, a Few-Shot Object Detection algorithm based on new category Feature Enhancement and a Metric Mechanism (FEMM-FSOD) was proposed. Firstly, a Cross-Domain Parameter perception Module (CDPM) was introduced to improve the neck network, reconstructing the re-weighting of channel and spatial features, and dilated convolution was combined with cross-stage information transfer and feature fusion to provide rich gradient information and enhance the sensitivity to new category parameters. Meanwhile, an Integrated Correlated Multi-Feature module (ICMF) was constructed before Region of Interest Pooling (RoI Pooling) to establish correlations between features and dynamically optimize the fusion of relevant features, thereby enhancing salient features. The introduction of CDPM and ICMF enhances the feature representation of new categories effectively, alleviating feature boundary ambiguity. Additionally, to further reduce category confusion, an orthogonal loss function based on the metric mechanism, Coherence-Separation Loss (CohSep Loss), was proposed in the detection head to achieve intra-class feature aggregation and inter-class feature separation by measuring feature vector similarity. Experimental results show that compared to the baseline algorithm TFA (Two-stage Fine-tuning Approach), on the PASCAL VOC dataset, the proposed algorithm improves the mAP50 (mean Average Precision (mAP) of new categories with a threshold of 0.50) across 15 few-shot instance-number settings by 5.3 percentage points; on the COCO dataset, the proposed algorithm improves the mAP (mAP of new categories with thresholds from 0.50 to 0.95) under the 10-shot and 30-shot settings by 3.6 and 5.2 percentage points, respectively, achieving higher accuracy in few-shot object detection.

    Contextual semantic representation and pixel relationship correction for few-shot object detection
    Lili WEI, Lirong YAN, Xiaofen TANG
    2025, 45(9):  2993-3002.  DOI: 10.11772/j.issn.1001-9081.2024081227

    In few-shot object detection, since supporting samples are scarce and the available class information is insufficient, it is particularly important to utilize the feature information of the limited samples effectively. By enriching the usable semantic information in both supporting and query samples, a more comprehensive matching between query features and supporting features can be achieved, which helps the model understand the target class in few-shot scenarios and thus accomplish the object detection task effectively. Therefore, a model based on spatial context and pixel relationships was proposed. A spatial context module was designed to help each pixel construct a local context region, thereby providing the center pixel with the semantics of the pixels in that region and enriching the image feature information. In addition, to address the problem that spatial context easily introduces noisy information, a pixel context relationship module was designed, which uses the original feature knowledge in the image to explore relationships between pixels and construct intra- and inter-class relationship maps, so as to correct this defect of the spatial context module. Experimental results demonstrate that with the three standard splits of the PASCAL VOC dataset, the proposed model improves the Average Precision (AP50) by 2.7, 2.0, and 1.3 percentage points, respectively, under the extremely sparse 1-shot setting compared to VFA (Variational Feature Aggregation); on the MS COCO dataset, under the 10-shot and 30-shot settings, the proposed model improves the AP by 0.4 and 0.6 percentage points, respectively, compared to VFA, and the AP50 by 11.4 and 8.7 percentage points, respectively, compared to Meta FR-CNN (Meta Faster R-CNN). It can be seen that the proposed method improves the model’s ability to recognize samples of new classes by using the limited feature information more effectively, which has reference value for improving the generalization ability of models in special scenarios where only very few samples are available.

    Point cloud classification and segmentation network based on dual attention mechanism and multi-scale fusion
    Weigang LI, Jiale SHAO, Zhiqiang TIAN
    2025, 45(9):  3003-3010.  DOI: 10.11772/j.issn.1001-9081.2024091254

    Existing networks have difficulty in learning the local geometric shape information of point clouds effectively, and suffer from problems such as the inability to focus on important feature structures and insufficient feature fusion. Therefore, a point cloud classification and segmentation network based on a Dual Attention Mechanism (DAM) and multi-scale fusion was proposed. Firstly, in the feature extraction stage, the geometric positions and weights of the convolution kernels were adjusted dynamically using Geometric Adaptive Convolution (GAC), so that the network adapts to the local geometric structure of the point cloud data and captures local features more effectively. Secondly, to further improve feature expression, the DAM was introduced to learn and adjust the weights of feature channels and spatial information automatically, thereby enhancing the feature representation of key points. Finally, feature information at different scales was concatenated and fused effectively to enrich the final feature representation and improve the classification and segmentation accuracy of the network. Experimental results on the ModelNet40, ShapeNet, and S3DIS datasets show that the proposed network increases the Overall Accuracy (OA) and mean Intersection over Union (mIoU) compared with PointNet++ and DGCNN (Dynamic Graph Convolutional Neural Network), improving the performance of point cloud classification and segmentation effectively.
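
    A dual attention block of the channel-then-spatial kind can be sketched for point features of shape (batch, channels, points); the layer sizes are illustrative, not the paper’s configuration:

```python
import torch
import torch.nn as nn

class DualAttention(nn.Module):
    """Channel attention (which channels matter) then spatial attention (which points matter)."""
    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        self.channel_mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels), nn.Sigmoid())
        self.point_conv = nn.Sequential(nn.Conv1d(channels, 1, kernel_size=1), nn.Sigmoid())

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        ch_w = self.channel_mlp(x.mean(dim=-1)).unsqueeze(-1)  # (B, C, 1) channel weights
        x = x * ch_w                                           # re-weight feature channels
        sp_w = self.point_conv(x)                              # (B, 1, N) per-point weights
        return x * sp_w                                        # emphasize key points

out = DualAttention(64)(torch.randn(2, 64, 1024))  # 2 clouds, 64-dim features, 1024 points
```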

    Adversarial sample embedded attention U-Net for 3D medical image segmentation
    Zhixiong XU, Bo LI, Xiaoyong BIAN, Qiren HU
    2025, 45(9):  3011-3016.  DOI: 10.11772/j.issn.1001-9081.2024081134
    Abstract ( )   HTML ( )   PDF (1665KB) ( )  
    Figures and Tables | References | Related Articles | Metrics

    Computed Tomography (CT) and Magnetic Resonance Imaging (MRI) images are widely used in deep-learning-based medical image segmentation. However, traditional segmentation methods are affected by blurred tumor boundaries and structural complexity, and ignore the discriminative information that adversarial samples can provide to a segmentation model, making it hard to achieve optimal segmentation results. To address these issues, a 3D medical image segmentation model with adversarial-sample-embedded attention U-Net learning was proposed. In the model, adversarial samples were constructed through sample transformation and tumor feature information was extracted from medical images using the adversarial-sample-embedded attention U-Net; low-dimensional feature screening and high-dimensional feature fusion modules were introduced to purify the distinguishable tumor features; and the entire network was trained with a combined loss function based on cross-entropy loss, Dice loss, and contrastive loss to obtain a segmentation model rich in discriminative features. Experimental results show that on the Nerve Sheath Tumor (NST) and Automated Cardiac Diagnosis Challenge (ACDC) datasets, the Dice Similarity Coefficients (DSCs) of the proposed method reach 88.14% and 91.75%, respectively, which are improvements of 1.26 and 2.48 percentage points over those of the no-new-U-Net (nnU-Net) method. It can be seen that the proposed method effectively improves the performance of 3D medical image segmentation with blurred tumor boundaries.
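
    One plausible form of the combined loss described above is sketched below for binary 3D segmentation; the equal loss weights and the omission of the contrastive term are assumptions, not the authors’ exact formulation.

        import torch
        import torch.nn.functional as F

        def dice_loss(logits, target, eps=1e-6):
            """Soft Dice loss; logits and target are (B, 1, D, H, W), target in {0, 1}."""
            prob = torch.sigmoid(logits)
            inter = (prob * target).sum(dim=(1, 2, 3, 4))
            union = prob.sum(dim=(1, 2, 3, 4)) + target.sum(dim=(1, 2, 3, 4))
            return (1 - (2 * inter + eps) / (union + eps)).mean()

        def combined_loss(logits, target, w_ce=1.0, w_dice=1.0):
            """Cross-entropy plus Dice; a contrastive term would be added similarly."""
            ce = F.binary_cross_entropy_with_logits(logits, target.float())
            return w_ce * ce + w_dice * dice_loss(logits, target)

        logits = torch.randn(2, 1, 16, 64, 64)
        target = (torch.rand_like(logits) > 0.5).float()
        print(combined_loss(logits, target).item())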

    Medical image segmentation network with content-guided multi-angle feature fusion
    Fang WANG, Jing HU, Rui ZHANG, Wenting FAN
    2025, 45(9):  3017-3025.  DOI: 10.11772/j.issn.1001-9081.2024081188
    Abstract ( )   HTML ( )   PDF (2148KB) ( )  
    Figures and Tables | References | Related Articles | Metrics

    To address the lack of guidance from traditional image segmentation algorithms for Convolutional Neural Networks (CNNs) in the current field of medical image segmentation, a medical image segmentation Network with Content-Guided Multi-Angle Feature Fusion (CGMAFF-Net) was proposed. Firstly, grayscale images and Otsu threshold segmentation maps were passed through a Transformer-based micro U-shaped feature extraction module to generate lesion-region guidance maps, which were weighted onto the original medical images via Adaptive Combination Weighting (ACW) for initial guidance. Then, a Residual Network (ResNet) was employed to extract downsampled features from the weighted medical images, and a Multi-Angle Feature Fusion (MAFF) module was used to fuse the feature maps at 1/16 and 1/8 scales. Finally, Reverse Attention (RA) was applied to upsample and restore the feature map size gradually, so as to predict the key lesion regions. Experimental results on the CVC-ClinicDB, Kvasir-SEG, and ISIC 2018 datasets demonstrate that, compared to MSRAformer, the best-performing existing multi-scale spatial reverse attention segmentation network, CGMAFF-Net increases the mean Intersection over Union (mIoU) by 0.97, 0.78, and 0.11 percentage points, respectively; compared to the classic U-Net, CGMAFF-Net improves the mIoU by 2.66, 8.94, and 1.69 percentage points, respectively, fully verifying its effectiveness and advancement.
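
    As a hedged sketch of the Otsu-based guidance step, the following NumPy snippet computes a classic Otsu threshold and blends the resulting mask onto the image; the ACW step is reduced here to a single assumed blending factor alpha, and otsu_threshold and guided_input are hypothetical names, not the paper’s functions.

        import numpy as np

        def otsu_threshold(gray):
            """Classic Otsu: pick the threshold maximizing between-class variance."""
            hist, _ = np.histogram(gray, bins=256, range=(0, 256))
            p = hist.astype(np.float64) / hist.sum()
            omega = np.cumsum(p)                    # class-0 probability
            mu = np.cumsum(p * np.arange(256))      # cumulative mean
            mu_t = mu[-1]
            denom = omega * (1 - omega)
            denom[denom == 0] = np.nan
            sigma_b2 = (mu_t * omega - mu) ** 2 / denom
            return int(np.nanargmax(sigma_b2))

        def guided_input(gray, alpha=0.3):
            """Blend the Otsu mask onto the image as a rough lesion-region guide."""
            mask = (gray >= otsu_threshold(gray)).astype(np.float64)
            return (1 - alpha) * gray / 255.0 + alpha * mask

        img = (np.random.rand(128, 128) * 255).astype(np.uint8)
        print(guided_input(img).shape)  # (128, 128)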

    Frontier and comprehensive applications
    Trajectory tracking algorithm for mobile robots based on geometric model predictive control
    Songjian GU, Fuxiang WU, Xiangyang GAO, Mengjie YANG, Yibing ZHAN, Jun CHENG
    2025, 45(9):  3026-3035.  DOI: 10.11772/j.issn.1001-9081.2024091273
    Abstract ( )   HTML ( )   PDF (3256KB) ( )  
    Figures and Tables | References | Related Articles | Metrics

    To address the pose deviations faced by Wheeled Mobile Robots (WMRs) during trajectory tracking due to inaccurate positioning and unknown disturbances, an Enhanced Particle Swarm Optimization Mixer (EPSO-Mixer) algorithm based on Geometric Model Predictive Control (GMPC) was proposed to enhance the trajectory tracking performance of WMRs. Firstly, an Enhanced Particle Swarm Optimization (EPSO) algorithm was built on Particle Swarm Optimization (PSO) to accelerate convergence and improve optimization capability. Secondly, EPSO was used to improve GMPC by selecting optimal tracking parameters according to the current deviation level, so as to reduce trajectory tracking errors effectively. Finally, by integrating the Multi-Layer Perceptron Mixer (MLP-Mixer) architecture, the EPSO-Mixer algorithm was formed, further enhancing the search for the global optimum and generating more adaptive control strategies. Simulation results show that, compared with nonlinear model predictive control and classic GMPC algorithms, EPSO-Mixer GMPC improves the trajectory tracking performance of WMRs under pose deviation effectively, reducing errors by 8.0%-82.3% and mitigating vibration during motion significantly. These results indicate that the EPSO-Mixer algorithm provides more effective control strategies, reducing the complexity and time cost of parameter tuning and enhancing the adaptability of trajectory tracking control significantly.
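
    For readers unfamiliar with PSO, the baseline loop that EPSO builds on can be sketched as below; the toy cost function standing in for closed-loop tracking error, and all hyperparameters, are assumptions rather than the paper’s EPSO.

        import numpy as np

        def pso(cost, dim, n=30, iters=100, lo=0.0, hi=2.0,
                w=0.7, c1=1.5, c2=1.5, seed=0):
            """Plain PSO minimizing `cost`; EPSO layers convergence tricks on top."""
            rng = np.random.default_rng(seed)
            x = rng.uniform(lo, hi, (n, dim))   # particle positions (e.g. MPC weights)
            v = np.zeros((n, dim))
            pbest = x.copy()
            pcost = np.apply_along_axis(cost, 1, x)
            g = pbest[pcost.argmin()].copy()
            for _ in range(iters):
                r1, r2 = rng.random((n, dim)), rng.random((n, dim))
                v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (g - x)
                x = np.clip(x + v, lo, hi)
                c = np.apply_along_axis(cost, 1, x)
                better = c < pcost
                pbest[better], pcost[better] = x[better], c[better]
                g = pbest[pcost.argmin()].copy()
            return g, pcost.min()

        # Toy cost standing in for the closed-loop tracking error of the controller.
        best, err = pso(lambda q: np.sum((q - 1.2) ** 2), dim=3)
        print(best, err)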

    Action recognition algorithm for ADHD patients using skeleton and 3D heatmap
    Chao SHI, Yuxin ZHOU, Qian FU, Wanyu TANG, Ling HE, Yuanyuan LI
    2025, 45(9):  3036-3044.  DOI: 10.11772/j.issn.1001-9081.2024091304
    Abstract ( )   HTML ( )   PDF (2932KB) ( )  
    Figures and Tables | References | Related Articles | Metrics

    Attention Deficit Hyperactivity Disorder (ADHD) is a neurodevelopmental disorder common in childhood, characterized by inattention, hyperactivity, and impulsivity, and often accompanied by specific motion patterns. Traditional action recognition algorithms suffer from low recognition accuracy and slow response when handling these specific actions. To address these issues, an action recognition algorithm for ADHD patients using skeleton data and 3D heatmaps was proposed, in which the spatial relationships between joints were represented precisely using Gaussian distributions, preserving spatio-temporal information effectively. To overcome the limitations of single-modal data, a multimodal integration method based on skeletons and 3D heatmaps was introduced, and the output features of a Short 3D-CNN (3D Convolutional Neural Network) and an Adaptive Graph Convolutional Network (AGCN) were fused to fully exploit the advantages of both modalities, thereby improving action recognition performance. Experimental results on the ADHD patient dataset collected by the Mental Health Center of West China Hospital, Sichuan University, show that the proposed algorithm achieves a Top-1 recognition accuracy of 0.860 4 and a Top-5 recognition accuracy of 0.987 3 on eight different types of actions. Additionally, an automatic ADHD classification algorithm based on action types was proposed, which classified ADHD actions into head and facial, trunk, and limb action types, achieving a recognition accuracy of 75% with a response time of 5 s. Compared with two-stream AGCN (2s-AGCN) and PoseConv3D, the proposed algorithm demonstrates higher recognition accuracy in complex action scenarios, providing a new technical approach for personalized ADHD intervention.
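
    The Gaussian joint representation can be illustrated with a minimal NumPy sketch that renders each 2D joint per frame as a Gaussian blob in a (joint, time, height, width) volume; the function name, grid sizes, and sigma are hypothetical, and this is not the paper’s exact heatmap pipeline.

        import numpy as np

        def skeleton_to_heatmap(joints, T, H, W, sigma=2.0):
            """joints: (T, K, 2) array of (y, x); returns a (K, T, H, W) volume."""
            yy = np.arange(H)[:, None]
            xx = np.arange(W)[None, :]
            K = joints.shape[1]
            vol = np.zeros((K, T, H, W))
            for t in range(T):
                for k in range(K):
                    jy, jx = joints[t, k]
                    # One Gaussian blob per joint per frame.
                    vol[k, t] = np.exp(-((yy - jy) ** 2 + (xx - jx) ** 2)
                                       / (2 * sigma ** 2))
            return vol

        joints = np.random.rand(16, 17, 2) * [64, 64]   # 16 frames, 17 joints
        print(skeleton_to_heatmap(joints, 16, 64, 64).shape)  # (17, 16, 64, 64)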

    Sleep apnea detection based on universal wristband
    Jinyang HUANG, Fengqi CUI, Changxiu MA, Wendong FAN, Meng LI, Jingyu LI, Xiao SUN, Linsheng HUANG, Zhi LIU
    2025, 45(9):  3045-3056.  DOI: 10.11772/j.issn.1001-9081.2024081234
    Abstract ( )   HTML ( )   PDF (2441KB) ( )  
    Figures and Tables | References | Related Articles | Metrics

    Sleep apnea seriously affects quality of life and health. PolySomnoGraphy (PSG) is the “gold standard” for diagnosing sleep apnea, but it is expensive and inconvenient for long-term monitoring. Therefore, a new method based on a universal smart wristband was proposed to detect sleep apnea conveniently. In this method, the heart rate, blood oxygen saturation, and sleep state data collected by the wristband were analyzed, and an adaptive physiological data reconstruction method together with a data interpolation method was used for noise filtering; in feature engineering, continuous physiological variables and categorical variables were fused to extract sleep state features deeply; in the classification module, a lightweight Gated Recurrent Unit (GRU) model was used to simplify training and reduce the risk of overfitting. Experimental results show that the proposed method obtains 93.68% accuracy and 93.97% recall on a 23-person dataset. Correlation analysis confirms blood oxygen saturation, body mass index, and age as key features for sleep apnea detection. Compared with PSG, the proposed method is more suitable for long-term monitoring in a home environment.
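
    A lightweight GRU classifier of the kind described might look like the following minimal PyTorch sketch; the three input features, hidden size, class count, and the class name ApneaGRU are assumptions, not the paper’s configuration.

        import torch
        import torch.nn as nn

        class ApneaGRU(nn.Module):
            """Small GRU over per-step physiological features (heart rate, SpO2, sleep state)."""
            def __init__(self, n_features=3, hidden=32, n_classes=2):
                super().__init__()
                self.gru = nn.GRU(n_features, hidden, batch_first=True)
                self.head = nn.Linear(hidden, n_classes)

            def forward(self, x):   # x: (B, T, n_features)
                _, h = self.gru(x)  # h: (1, B, hidden), last hidden state
                return self.head(h.squeeze(0))

        x = torch.randn(4, 120, 3)   # 4 sequences of 120 time steps
        print(ApneaGRU()(x).shape)   # torch.Size([4, 2])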

    Customer churn prediction model integrating hierarchical graph neural network and specific feature learning
    Yanqun LU, Yiyi ZHAO
    2025, 45(9):  3057-3066.  DOI: 10.11772/j.issn.1001-9081.2025020202
    Abstract ( )   HTML ( )   PDF (3097KB) ( )  
    Figures and Tables | References | Related Articles | Metrics

    To address the severity of customer churn in the inclusive finance field and the shortcomings of existing customer retention models in prediction accuracy and interpretability, a customer churn prediction model integrating Hierarchical Graph Neural Network (HGNN) and Specific Feature Learning (SFL), named HGNN-SFLN (HGNN-SFL Network), was proposed to enhance the model’s prediction capability and its understanding of feature interactions. Firstly, to address data imbalance, a hybrid sampling strategy was introduced, and feature-level weighted adjustments were applied to different feature categories to ensure the effective utilization of all data types. Secondly, a hierarchical graph was utilized to strengthen the correlations between different features, and an SFL module based on a self-attention mechanism was constructed to improve the model’s ability to process categorical features and analyze feature interactions. Through this module, the model was enabled to identify key features accurately and capture the complex interactions between them effectively, thereby optimizing the prediction decision-making process. Experimental results demonstrate that the proposed model achieves optimal results on multiple real-world financial datasets in key indicators such as Area Under Curve (AUC), compared to mainstream models such as LightGBM (Light Gradient Boosting Machine) and Deep Neural Network (DNN). Furthermore, the proposed model has significant advantages over the comparison models in accurately identifying critical churn-related features and effectively capturing complex feature interactions.
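
    The idea of letting categorical feature embeddings interact through self-attention, as in the SFL module, can be sketched minimally as follows; the class name, field cardinalities, and mean pooling are hypothetical choices, not the HGNN-SFLN implementation.

        import torch
        import torch.nn as nn

        class CategoricalSelfAttention(nn.Module):
            """Embed categorical fields and let self-attention model their interactions."""
            def __init__(self, cardinalities, dim=16, heads=2):
                super().__init__()
                self.embeds = nn.ModuleList(nn.Embedding(c, dim) for c in cardinalities)
                self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
                self.head = nn.Linear(dim, 1)

            def forward(self, cats):  # cats: (B, n_fields) int64 category indices
                tokens = torch.stack(
                    [emb(cats[:, i]) for i, emb in enumerate(self.embeds)],
                    dim=1)                            # (B, n_fields, dim)
                out, _ = self.attn(tokens, tokens, tokens)
                return self.head(out.mean(dim=1))     # churn logit

        cats = torch.randint(0, 5, (8, 4))
        model = CategoricalSelfAttention([5, 5, 5, 5])
        print(model(cats).shape)  # torch.Size([8, 1])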
