Journal of Computer Applications

Federated learning fairness algorithm based on personalized submodel and K-means clustering

Zhongrui JING, Xuebin CHEN, Yinlong JIAN, Qi ZHONG, Zhenbo ZHANG

2025, 45(12): 3747-3756. DOI: 10.11772/j.issn.1001-9081.2024121794

Traditional Federated Learning （FL） does not consider collaborative fairness， leading to a mismatch between the reward obtained by a client and its actual contribution. To address this issue， a Federated learning fairness algorithm based on Personalized Submodel and K-means clustering （FedPSK） was proposed. Firstly， the neurons in the neural network were clustered according to their activation patterns， and only the importance of the cluster center neurons was evaluated； the scores of the cluster center neurons were then used to represent the scores of the other neurons in the same cluster， which reduced the time consumption of neuron evaluation. Then， the number of neurons included in each client submodel and their labels were selected through a hierarchical selection method， and a submodel with a complete neural network structure was constructed for each client. Finally， collaborative fairness was achieved by distributing the submodels to the clients. Experimental results on different datasets show that FedPSK improves the correlation coefficient used to measure fairness by 2.70% compared with FedSAC （Federated learning framework with dynamic Submodel Allocation for Collaborative fairness）， and reduces the time overhead by at least 84.12% compared with FedSAC. It can be seen that FedPSK improves the fairness of the FL algorithm and greatly reduces the time overhead of algorithm execution， verifying the efficiency of the proposed algorithm.
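
The idea of clustering neurons and scoring only one representative per cluster can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `kmeans` and `score_neurons`, the deterministic initialisation, and the `importance` callback (a stand-in for whatever neuron-importance measure is actually used) are all hypothetical.

```python
import math

def kmeans(points, k, iters=20):
    """Plain K-means with a deterministic start (the first k points)."""
    centroids = [list(p) for p in points[:k]]
    assign = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        assign = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # Update step: centroid becomes the mean of its members
        # (an empty cluster keeps its previous centroid).
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, assign

def score_neurons(activations, k, importance):
    """Score one representative neuron per cluster and share that score.

    activations: one activation-pattern vector per neuron.
    importance:  hypothetical callback mapping a neuron index to a score.
    Returns (scores for all neurons, number of importance evaluations).
    """
    centroids, assign = kmeans(activations, k)
    scores = [0.0] * len(activations)
    evaluated = 0
    for c in range(k):
        members = [i for i, a in enumerate(assign) if a == c]
        if not members:
            continue
        # The member closest to the centroid acts as the cluster center neuron.
        rep = min(members, key=lambda i: math.dist(activations[i], centroids[c]))
        rep_score = importance(rep)
        evaluated += 1
        for i in members:
            scores[i] = rep_score
    return scores, evaluated
```

With `k` clusters, `importance` is called at most `k` times instead of once per neuron, which is where the time saving described in the abstract would come from.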

Multi-label classification method integrating external semantic knowledge

Jincai YANG, Qixu BAN, Xusheng YANG, Xianjun SHEN

2025, 45(12): 3757-3763. DOI: 10.11772/j.issn.1001-9081.2024121814

Text classification is regarded as a crucial task in Natural Language Processing （NLP） field， with multi-label classification becoming a challenge due to large label space. To address this issue， a multi-label classification method integrating external semantic knowledge was proposed， named HSGIN（Heterogeneous Semantic Gated Interaction Network）， using values markers in children’s books as a case study. Firstly， text features were extracted through SBERT （Sentence Embeddings from Siamese BERT （Bidirectional Encoder Representations from Transformers）） and Bidirectional Long Short-Term Memory （Bi-LSTM） network. Then， entities and relations in the Knowledge Graph （KG） were modeled jointly using a Heterogeneous Graph Transformer （HGT）， and label features were extracted using the prior knowledge and semantic associations. Finally， the attention mechanism was employed to fuse text features and label features， generating distinct label feature representations. These embeddings were fed into a Gated Graph Neural Network （GGNN） to capture semantic dependencies and interaction patterns among labels for prediction. Experimental results show that compared with the existing state-of-the-art comparison method BERT， the proposed method achieves increases of 2.66， 0.47， and 1.16 percentage points in precision， recall， and F1 score， respectively. The above experimental results verify the effectiveness of the proposed method. At the same time， precise analysis of values markers in children’s books helps choose healthy books for children.

Knowledge graph constrained question answering model based on hierarchical reinforcement learning

Haoxiang XU, Dunhui YU, Yichen DENG, Kui XIAO

2025, 45(12): 3764-3770. DOI: 10.11772/j.issn.1001-9081.2024121806

To address the issues of ignoring constraint information and the curse of dimensionality in long-path reasoning in Knowledge Graph Question Answering （KGQA）， a Knowledge Graph constrained Question Answering model based on Hierarchical Reinforcement Learning （HRL）（KGQA-HRL） was proposed. Firstly， the concept of HRL was integrated deeply， triples in the knowledge graph were decomposed， and a high-level policy and a low-level policy were designed， so as to mitigate the risk of the curse of dimensionality in reasoning paths. Secondly， to improve the accuracy of path selection， an attention-based action selection strategy and an entity selection strategy incorporating constraint information were introduced， thereby narrowing the search space for reasoning effectively. Thirdly， a question update phase was embedded between the action selection and entity selection strategies， thereby enabling a secondary update of the question at each hop. Finally， in the entity selection strategy， a constraint set was constructed and constraint scores were calculated， so as to incorporate constraint information from the question and enhance the accuracy of entity selection. Experimental results on four KGQA benchmark datasets demonstrate that the KGQA-HRL model achieves the best accuracy on all datasets， with an average improvement of 2.9% over the previous best model， reinforcement learning based COnstrained PAth Reasoning （COPAR）. At the same time， the KGQA-HRL model performs particularly well on complex three-hop query tasks （3.6% improvement on the PQ （PathQuestion） dataset and 2.5% improvement on the MetaQA dataset）， validating its good reasoning capability.

Chinese semantic error recognition model based on hierarchical information enhancement

Yuqi ZHANG, Ying SHA

2025, 45(12): 3771-3778. DOI: 10.11772/j.issn.1001-9081.2024111694

The semantic errors in Chinese differ from simple spelling and grammatical errors， as they are more inconspicuous and complex. Chinese Semantic Error Recognition （CSER） aims to determine whether a Chinese sentence contains semantic errors. As a prerequisite task for semantic review， the performance of recognition model is crucial for semantic error correction. To address the issue of CSER models ignoring the differences between syntactic structure and contextual structure when integrating syntactic information， a Hierarchical Information Enhancement Graph Convolutional Network （HIE-GCN） model was proposed to embed the hierarchical information of nodes in the syntactic tree into the context encoder， thereby reducing the gap between syntactic structure and contextual structure. Firstly， a traversal algorithm was used to extract the hierarchical information of nodes in the syntactic tree. Secondly， the hierarchical information was embedded into the BERT （Bidirectional Encoder Representations from Transformers） model to generate character features， the Graph Convolutional Network （GCN） was adopted to utilize these character features for the nodes in the graph， and after graph convolution calculation， the feature vector of the entire sentence was obtained. Finally， a fully connected layer was used for one-class or multi-class semantic error recognition. Results of semantic error recognition and correction experiments conducted on the FCGEC （Fine-grained corpus for Chinese Grammatical Error Correction） and NaCGEC （Native Chinese Grammatical Error Correction） datasets show that， on the FCGEC dataset， in the recognition task， compared with the baseline model： HIE-GCN improves the accuracy by at least 0.10 percentage points and the F1 score by at least 0.13 percentage points in the one-class error recognition； in the multi-class error recognition， the accuracy is improved by at least 1.05 percentage points and the F1 score is improved by at least 0.53 percentage points. 
Ablation experimental results verify the effectiveness of hierarchical information embedding. Compared with Large Language Models （LLMs） such as GPT and Qwen， the proposed model’s overall performance in recognition is significantly higher. In the correction experiment， compared to the sequence-to-sequence direct error correction model， the recognition-correction two-stage pipeline improves the correction precision by 8.01 percentage points. It is also found that in the correction process of LLM GLM4， providing the model with hints on the sentence’s error type increases the correction precision by 4.62 percentage points.

Aspect sentiment triplet extraction model with multi-view linguistic features and sentiment lexicon

Zhengyue ZHANG, Juhong PENG, Zixu DING, Xinyu FAN, Changyu HU

2025, 45(12): 3779-3785. DOI: 10.11772/j.issn.1001-9081.2024121863

In Natural Language Processing （NLP） tasks， Aspect Sentiment Triplet Extraction （ASTE） aims to identify the relationships among aspect terms， opinion terms， and sentiment polarity in text， serving as a crucial step in realizing fine-grained sentiment analysis. In current mainstream methods， end-to-end models generally suffer from insufficient understanding of linguistic features and poor handling of the sparsity in sentiment expressions， which limits models’ accuracy and robustness. At the same time， pipeline models are prone to error propagation. To address these issues， an ASTE model with Multi-View Linguistic Features and Sentiment Lexicon （MVLF-SL） was proposed. In this model， multi-view linguistic features were utilized to enhance the model’s ability to understand context and implicit semantics， while additional prior knowledge of sentiment was provided by a sentiment lexicon. Firstly， Graph Convolutional Network （GCN） was used to represent multi-view linguistic features and obtain enhanced linguistic features. Secondly， a dynamic fusion strategy was adopted to integrate the enhanced linguistic features with the sentiment lexicon. Thirdly， multi-layer GCN was employed to enhance the feature representations of aspect and opinion terms by incorporating adjacency relations and node features. Finally， a Boundary-Driven Table-Filling （BDTF） method， improved with a Biaffine Attention （BA） mechanism， was used for decoding and extracting the triplets. Experimental results on four subsets （14res， 14lap， 15res， and 16res） of the ASTE-DATA-V2 dataset show that compared with the BDTF model， MVLF-SL has the F1-scores improved by 0.57， 2.08， 2.20， and 1.74 percentage points， respectively. It can be seen that the proposed model achieves better performance in ASTE， and fully utilizes linguistic features and external sentiment knowledge.

Multi-sentiment analysis of network public opinion and review text in food safety based on topics and blogs

Xingchen LYU, Weijun LIN, Hongxing HUANG

2025, 45(12): 3786-3795. DOI: 10.11772/j.issn.1001-9081.2024111712

To address the issues that the sentiments of review texts in food safety network public opinions are various and depend on the topic and blog information， a Multi-Sentiment analysis model for review texts integrating topics and blogs named TBR-MSAM （Topic Blog Review-Multi-Sentiment Analysis Model） was proposed. Firstly， a Topic Blog Review-Feature Extraction （TBR-FE） module was constructed by using RoBERTa （Robustly optimized BERT （Bidirectional Encoder Representations from Transformers） pretraining approach） and deep learning models， and was used to extract contextual features of topic， blog and review information respectively. Secondly， a Topic Blog Review-Interactive Attention Feature Fusion （TBR-IAFF） module was built to conduct pairwise interactions between topic-review and blog-review to obtain interaction features and allocate weights reasonably， thereby exploring the complex relationships among topics， blogs and reviews. Thirdly， a Topic Blog Review-Cross Feature Fusion （TBR-CFF） module was constructed to conduct in-depth feature fusion on multiple pieces of information， thereby exploring users’ potential sentimental features. Finally， Softmax was used to classify the four sentiment polarities of review texts in food safety network public opinions. Experimental results on three constructed food safety network public opinion datasets show that compared to the optimal baseline model without topic and blog information， TBR-MSAM improves Macro-F1 and accuracy by at least 5.0 and 5.8 percentage points， respectively； compared to the optimal baseline model with topic and blog information， TBR-MSAM improves Macro-F1 and accuracy by at least 0.2 and 1.1 percentage points， respectively； compared to the optimal baseline model with topic， blog， and review text information， TBR-MSAM improves Macro-F1 and accuracy by at least 11.7 and 10.0 percentage points， respectively. The above results verify the effectiveness of TBR-MSAM in the multi-sentiment classification task for food safety network public opinion.

Multimodal named entity recognition under causal intervention

Jiana MENG, Chenhao BAI, Di ZHAO, Bolin WANG, Linlin GAO

2025, 45(12): 3796-3803. DOI: 10.11772/j.issn.1001-9081.2024111681

Multimodal Named Entity Recognition （MNER） task aims to recognize entities with specific meanings from the joint data of text and images. However， current methods have shortcomings in dealing with the two problems of data bias and modality gap. The data bias can cause harmful biases to mislead the attention module to focus on false correlations in the training data， thereby damaging generalization ability of the model； and the modality gap will hinder the establishment of correct semantic alignment between text and image， thereby affecting performance of the model. A method of Multimodal Named Entity Recognition under Causal intervention （CMNER） was proposed to solve these two problems. In the method， causal intervention theory was utilized to use backdoor intervention in the text modality to deal with observable confounding factors， and use frontdoor causal intervention in the image modality to deal with confounding factors that cannot be observed directly， so as to mitigate the harmful effects of data bias. At the same time， the Mutual Information （MI） correlation theory was combined to shorten the semantic “distance” between text and image. The entity recognition performance of the proposed method was verified in the multimodal domain. Experimental results on the Twitter-2015 and Twitter-2017 datasets show that CMNER method has the F1-scores reached 76.00% and 88.60%， respectively. Compared with the sub-optimal method， they are improved by 0.58 and 0.53 percentage points， respectively， achieving the optimal level. It can be seen that CMNER method can alleviate data bias and reduce modality gap effectively， thereby enhancing the performance of MNER tasks.

CnnPRL： progressive representation learning method for speech emotion recognition

Yonghong FAN, Heming HUANG

2025, 45(12): 3804-3812. DOI: 10.11772/j.issn.1001-9081.2024111704

Speech Emotion Recognition （SER） aims to equip computers with the ability to identify emotional states in speech signals accurately， and how to represent emotional features in speech efficiently has always been a key research focus in SER. Currently， most research efforts are dedicated to leveraging deep learning methods to learn optimal features directly from raw speech or spectrogram data. This approach is good at extracting comprehensive representation information， but it may overlook more refined information within specific features and cannot guarantee feature interpretability. Therefore， a Progressive Representation Learning method for SER based on Convolutional neural network （CnnPRL） was proposed to extract interpretable fine-grained emotional features progressively by using Convolutional Neural Network （CNN） based on acoustic features of speech. Firstly， interpretable shallow features were extracted manually and the optimal feature set was selected. Secondly， a cascaded CNN and dynamic fusion structure were proposed to refine the shallow features and learn deep emotional representations. Finally， a parallel heterogeneous CNN was constructed to extract complementary features at different scales， and a fusion module was used to achieve multi-feature fusion， capture multi-granularity features， and integrate deep emotional information from different feature scales. While keeping time complexity in check， experimental results on the datasets IEMOCAP （Interactive EMOtional dyadic motion CAPture database）， CASIA （Institute of Automation， Chinese Academy of Sciences）， and EMODB （Berlin EMOtional DataBase） demonstrate that compared with comparison methods such as SpeechFormer++， TLFMRF （Two-Layer Fuzzy Multiple Random Forest） and TIM-Net （Temporal-aware bI-direction Multi-scale Network）， CnnPRL improves the Weighted Average Recall （WAR） by at least 0.86， 2.92， and 1.46 percentage points， respectively， proving the effectiveness of CnnPRL. Ablation experimental results demonstrate that each module of CnnPRL contributes to the overall performance improvement of the model.

Open-set source cell-phone identification method based on feature interaction and representation enhancement

Feng YUE, Yang PENG, Zhaopin SU, Guofu ZHANG, Chensi LIAN, Bo YANG, Zhen FANG

2025, 45(12): 3813-3819. DOI: 10.11772/j.issn.1001-9081.2024121815

Multimedia forensics based on cell-phone speech has always been a key research hotspot. However， existing speech-based cell-phone identification methods are all confined to the closed-set mode， in which the training set and the test set share the same category set. As a result， recognition accuracy cannot be guaranteed for cell-phones of unknown categories， which makes the existing methods difficult to apply to unknown cell-phones. Therefore， an Open-set Source Cell-phone Identification method based on Feature interaction and representation enhancement （FireOSCI） was proposed. Firstly， a global information extraction block named GlobalBlock was designed on the basis of the multi-head attention block Fastformer to better capture the global information from the whole speech sample and obtain rich device feature information. Secondly， a local feature extraction block named LocalBlocks was presented on the basis of SE-Res2Block （Squeeze-Excitation Res2Block） to enhance the features related to cell-phone information and suppress the features unrelated to source cell-phone identification. Thirdly， an attention-based feature fusion mechanism was designed to fuse global features with multi-layer local features deeply. Finally， a source cell-phone confirmation network was designed on the basis of attention pooling to improve the recognition accuracy in open-set mode. Comparison experimental results on a cell-phone speech dataset with 13 different cell-phone brands and 86 different cell-phone models show that the proposed method can identify cell-phones of unknown categories， and provides a technical solution that can serve as a reference for the open-set recognition of speech-based source cell-phones.

Scientific document summarization model based on multi-graph neural network and graph contrastive learning

Hongyan ZHAO, Lihua GUO, Chunxia LIU, Riyun WANG

2025, 45(12): 3820-3828. DOI: 10.11772/j.issn.1001-9081.2024121751

Long document summarization faces challenges such as capturing inter-sentence relationships， handling long-range dependencies， and encoding and extracting document information efficiently， and has always been a difficult task in the field of natural language processing. At the same time， scientific documents， characterized by multiple chapters and paragraphs with complex hierarchical structures， further increase the difficulty of the summarization task. To address these issues， a scientific document Summarization model based on Multi-Graph Neural Network （GNN） and Graph Contrastive Learning （GCL）（MGCSum） was proposed. Firstly， for a given input document， intra-sentence and inter-sentence relationships were modeled by using homogeneous GNN and heterogeneous GNN， respectively， so as to generate initial sentence representations. Secondly， these sentence representations were fed into a multi-head HyperGraph ATtention network （HGAT）， where self-attention mechanisms were used to fully capture relationships between nodes and edges， thereby updating and learning inter-sentence representations. Thirdly， a GCL module was introduced to enhance global topic awareness， thereby improving the semantic consistency and discriminability of sentence representations. Finally， a Multi-Layer Perceptron （MLP） and a normalization layer were applied to calculate a score for determining whether a sentence should be selected for the summary. Experimental results on the PubMed and ArXiv datasets indicate that MGCSum outperforms most baseline models. Specifically， on the PubMed dataset， MGCSum achieves ROUGE-1， ROUGE-2， and ROUGE-L of 48.97%， 23.15%， and 44.09%， respectively， which are improvements of 0.20， 0.71， and 0.26 percentage points， respectively， over the existing state-of-the-art model HAESum （Hierarchical Attention graph for Extractive document Summarization）. It can be seen that by integrating multi-GNN and GCL， MGCSum captures hierarchical structural information and inter-sentence relationships more effectively， enhancing the accuracy and semantic consistency of summarization， and demonstrating its advantages in scientific document summarization tasks.

Metaphor detection model based on linguistic multi-incongruity

Tianlong ZHENG, Rui DONG, Yating YANG, Bo MA, Lei WANG, Xi ZHOU

2025, 45(12): 3829-3838. DOI: 10.11772/j.issn.1001-9081.2024121797

A metaphor detection model based on linguistic multi-incongruity was proposed to tackle the metaphor occurrence problem caused by the incongruity between the target sentence meaning and the core meaning of the target word in a specific context where a target word has multiple semantic meanings （polysemy）， which is ignored by the existing metaphor detection research. Firstly， in the feature encoding module， two separate encoders were employed to encode the feature information such as the target sentence meaning， the core meaning of the target word， and its contextual meaning. Then， in the multi-incongruity modeling module， three linguistic methods — Selectional Preference Violation （SPV）， Metaphor Identification Procedure （MIP）， and Semantics Usage Comparison （SUC） — were utilized to conduct unified modeling of incongruity features. Finally， metaphor detection was performed through a metaphor identification module. Furthermore， to validate Chinese metaphor detection performance， a Chinese word-level metaphor detection dataset named META-ZH was constructed through a data annotation method of combining LoRA （Low-Rank Adaptation） fine-tuned Large Language Model （LLM） with manual correction. Experimental results show that the proposed model achieves F1 values improvement of 0.8， 1.3， 1.5， and 2.3 percentage points， respectively， compared to the optimal baseline model on the VUA All， VUA Verb， MOH-X， and META-ZH metaphor detection datasets. It can be seen that the proposed model enhances performance in metaphor detection by fully utilizing linguistic multi-incongruity.

CovMW-net： robust text matching method based on meta-weight network

Dongwei ZHANG, Zheng YE, Jun GE

2025, 45(12): 3839-3846. DOI: 10.11772/j.issn.1001-9081.2024121841

In text matching tasks， the complexity and diversity of textual data often lead to a lack of robustness during training. Traditional methods for addressing this problem， such as data augmentation and regularization， can be effective， but are often only applicable to specific types of noise or disturbances， and require a lot of computational resources. Therefore， a method based on Meta-Weight network （MW-net）， namely Meta-Weight network improved by the Covariance matrix （CovMW-net）， was proposed. Firstly， the weight parameters and loss functions were adjusted through adaptive learning， thereby realizing rapid and reasonable weight distribution. Then， by controlling the weights of samples， the impacts of samples on training effects were magnified or diminished， and ultimately the training robustness was enhanced. The meta-learning framework of MW-net was inherited by CovMW-net， thereby saving computational resources. At the same time， covariance matrices were incorporated into CovMW-net： deep features were extracted for the samples in each category， and the covariance matrices of these features were calculated to measure minority class data， thereby mitigating the negative impacts of long-tail distributions caused by random sampling from meta-datasets in MW-net. Experimental results on the Clothing1M dataset show that CovMW-net outperforms the original method MW-net by 0.86 percentage points in accuracy and outperforms all comparative methods. In addition， on the Large-scale Chinese Question Matching Corpus （LCQMC） and Baidu Question-answer matching dataset （BQ）， CovMW-net improves accuracy over the baseline by 4 to 6 percentage points in most cases. It can be seen that CovMW-net is effective in dealing with biases in meta-datasets and is feasible for application in research on the robustness of text matching.

Multi-label text classification method of power customer service work orders integrating feature enhancement and contrastive learning

Jing ZHOU, Zhenyang TANG, Hui DONG, Xin LIU

2025, 45(12): 3847-3854. DOI: 10.11772/j.issn.1001-9081.2024121747

Multi-Label Text Classification （MLTC） of power customer service work orders has important significance for enhancing service efficiency and user satisfaction. Aiming at the problems of insufficient modeling of label relationships and class imbalance in MLTC of power customer service work orders， an MLTC method of power customer service work orders integrating feature enhancement and contrastive learning was proposed. Firstly， the text features of customer service work orders were extracted by using a pre-trained language model. Then， an innovative text feature enhancement method was developed by integrating global encoding module of multi-head attention mechanism and local encoding module of Convolutional Neural Network （CNN）. Finally， an MLTC framework of contrastive learning enhanced K-Nearest Neighbor （KNN） algorithm was introduced， positive samples were generated by using the R-Drop （Regularized Dropout） method， while negative samples were reweighted， and supervised contrastive learning loss function was incorporated during training to enhance the quality of neighbors retrieved by the KNN mechanism during inference， thereby effectively mitigating the negative impact of sample imbalance. Experimental results indicate that the proposed method achieves a micro-averaged F1 score of 92.17% on the power customer service work order dataset， surpassing the BERT （Bidirectional Encoder Representations from Transformers） model by 1.62 percentage points. Additionally， the proposed method achieves the micro-averaged F1 scores of 75.2% and 88.5%， respectively， on the public MLTC datasets AAPD and RCV1-V2， demonstrating the application value of improving work order processing accuracy and the service effectiveness in complex MLTC tasks.
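
For readers unfamiliar with the metric reported above， the micro-averaged F1 score for multi-label predictions can be computed as in the sketch below. The function name `micro_f1` and the example labels are illustrative assumptions and do not come from the paper or its dataset.

```python
def micro_f1(y_true, y_pred):
    """Micro-averaged F1 for multi-label data.

    y_true, y_pred: sequences of label collections, one per sample.
    True positives, false positives and false negatives are pooled over
    all samples and labels before precision and recall are combined.
    """
    tp = fp = fn = 0
    for true_labels, pred_labels in zip(y_true, y_pred):
        t, p = set(true_labels), set(pred_labels)
        tp += len(t & p)   # labels predicted and correct
        fp += len(p - t)   # labels predicted but wrong
        fn += len(t - p)   # labels missed
    denom = 2 * tp + fp + fn
    # F1 = 2TP / (2TP + FP + FN); defined as 0.0 when nothing is labelled.
    return 2 * tp / denom if denom else 0.0
```

Because counts are pooled before averaging， frequent labels dominate the micro-averaged score， which is worth keeping in mind when class imbalance is the problem being studied.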

High-frequency enhanced time series prediction model based on multi-layer perceptron

Changsheng ZHU, Chen YANG, Wenfang FENG, Peiwen YUAN

2025, 45(12): 3855-3863. DOI: 10.11772/j.issn.1001-9081.2024121818

The prediction quality of simple linear models for time series forecasting often surpasses that of deep models such as Transformers. However， on datasets with a large number of channels， deep models， particularly Multi-Layer Perceptron （MLP）， can outperform simple linear models. Aiming at the differences in error power spectrum between simple linear models and MLPs in time series forecasting， a High-frequency enhanced time series prediction model based on multi-layer perceptron， named HiFNet （High-Frequency Network）， was proposed. Firstly， the fitting capability of MLPs within low-frequency bands was utilized. Then， the Adaptive Series Decomposition （ASD） module and the grouped linear layer were adopted to address the overfitting of MLPs in high-frequency bands and the failure of the channel independence strategy to handle channel redundancy effectively， thereby enhancing the robustness of MLPs in high-frequency bands. Finally， experiments were conducted on HiFNet using standard datasets in the fields of meteorology， power， and transportation. The results demonstrate that the Mean Squared Error （MSE） of HiFNet is reduced by up to 23.6%， 10.0%， 35.1%， and 6.5%， respectively， compared to those of NLinear， RLinear， SegRNN （Segment Recurrent Neural Network）， and PatchTST （Patch Time Series Transformer）. At the same time， the grouped linear layer alleviates the impact of channel redundancy by learning low-rank representations related to channels.
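
The general idea of splitting a series into a smooth low-frequency part and a high-frequency remainder can be illustrated with a fixed-window moving average. This is only a stand-in for intuition: the abstract does not describe how the ASD module works， so the functions `moving_average` and `decompose` and the fixed `window` parameter below are assumptions and should not be read as the authors' method.

```python
def moving_average(series, window):
    """Centred moving average; edges are padded by repeating end values."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    padded = [series[0]] * half + list(series) + [series[-1]] * half
    return [sum(padded[i:i + window]) / window for i in range(len(series))]

def decompose(series, window=3):
    """Split a series into a smooth trend and the residual around it."""
    trend = moving_average(series, window)           # low-frequency part
    residual = [x - t for x, t in zip(series, trend)]  # high-frequency part
    return trend, residual
```

A model could then treat the two components separately， for example by fitting the trend and the residual with different layers， and add the two forecasts back together.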

Anomaly detection method based on cumulative probability fluctuation and automated clustering

Jun ZENG, Yinghua TONG, Defang WANG

2025, 45(12): 3864-3871. DOI: 10.11772/j.issn.1001-9081.2024121792

With the increasing complexity of multidimensional data features， the existing anomaly detection methods have limitations in capturing feature distribution. At the same time， traditional clustering and statistical methods encounter greater challenges in parameter selection. Together， these problems limit the improvement of detection performance. To address this issue， an anomaly detection method based on cumulative probability fluctuation and automated clustering was proposed. Firstly， the cumulative probability fluctuation of the features was calculated to represent the Gaussian mixture distribution of the features， and the features were compressed and transformed according to the cumulative probability fluctuation values. Secondly， deep reinforcement learning was employed to search for the optimal clustering parameters of Density-Based Spatial Clustering of Applications with Noise （DBSCAN）， and the compressed and transformed dataset was clustered. Finally， the clustering results of the data were combined with the cumulative probability fluctuation values of the data features to determine whether data points were anomalous. Experimental results show that the average precision， recall， F1-score， and Area Under ROC （Receiver Operating Characteristic） Curve （AUC） of the proposed method on six experimental datasets are 36.39%， 2.73%， 14.90%， and 4.84% higher， respectively， than those of the best performing method among the comparison methods. It can be seen that the proposed method improves the comprehensive performance of anomaly detection for data with multi-dimensional complex features effectively without selecting parameters manually.
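
To show why DBSCAN is a natural fit for anomaly detection and why its two parameters matter， here is a minimal pure-Python sketch of the standard algorithm. It is not the proposed method: the parameters `eps` and `min_pts` are fixed by hand here， whereas the abstract describes searching for them with deep reinforcement learning， and the cumulative probability fluctuation step is omitted entirely. The function name `dbscan` is an assumption for illustration.

```python
import math

def dbscan(points, eps, min_pts):
    """Return one label per point: a cluster id (0, 1, ...) or -1 for noise."""
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)

    def neighbours(i):
        # All points within eps of point i (including i itself).
        return [j for j, q in enumerate(points)
                if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE          # may later be claimed as a border point
            continue
        cluster += 1                   # i is a core point: start a new cluster
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster    # border point reached from a core point
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            nj = neighbours(j)
            if len(nj) >= min_pts:     # j is also a core point: keep expanding
                queue.extend(nj)
    return labels
```

Points left with label -1 belong to no dense region and are the natural anomaly candidates. Changing `eps` or `min_pts` changes which points end up as noise， which is the sensitivity that motivates automating the parameter search.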

Review of blockchain technology applications in carbon emission trading system

Jingfeng WEI, Zhongyuan YAO, Shuosen MA, Chao WANG, Shangkun GUO, Ziqiang ZHU, Xueming SI

2025, 45(12): 3872-3880. DOI: 10.11772/j.issn.1001-9081.2024121837

To address insufficient transaction transparency, poor data security, and low market efficiency in Carbon Emission Trading (CET), optimization methods of blockchain technology in CET systems were reviewed. Firstly, the specific applications of blockchain technology in CET were summarized, including key technologies and main contributions. Secondly, the challenges faced by carbon emission management and trading, such as data privacy protection and system scalability, were analyzed in depth, and optimization methods of blockchain technology were explored. Thirdly, the integrated application of blockchain and Internet of Things (IoT) technologies was studied to achieve real-time collection and secure storage of carbon emission data. Finally, the application of game theory in the formulation of CET strategies and the innovative practices of blockchain technology in carbon asset management were discussed. The research results show that blockchain technology significantly improves the data transparency and security of CET systems, reduces human intervention, lowers transaction costs, and enhances market efficiency. However, the systems still face challenges such as performance, privacy protection, and regulatory adaptation. In conclusion, the application of blockchain technology in CET systems not only improves the systems' transparency, security, and efficiency, but also provides reliable technical support for achieving global carbon emission reduction goals.

Blockchain coin mixing scheme based on one-time ring signature

Yilin CHEN, Xiaoyu LI

2025, 45(12): 3881-3887. DOI: 10.11772/j.issn.1001-9081.2024121768

Current blockchain coin mixing systems have difficulty resisting duplicate transfer attacks by users, information leakage by the mixing center, and forged transfer attacks while protecting the privacy of user transactions. To solve these problems, a blockchain coin mixing scheme based on one-time ring signature was proposed. Firstly, the user deposited funds into the mixing center and requested to join a ring group. Then, after being verified by the mixing center, the user applied for a transfer using a one-time ring signature. Finally, the mixing center verified the signature and the transfer instruction, and executed the transfer. With a one-time ring signature, the mixing center can confirm that the signature comes from the user ring group but cannot tell which user produced it, and the signature can be verified only once. Therefore, while the user's privacy is protected, the user cannot send the transfer instruction repeatedly, and the mixing center cannot forge the transfer instruction. In addition, hybrid encryption was used in the communication between the user and the mixing center, which effectively prevented third-party attackers from breaching the sending and verification of signatures and from obtaining private transaction information. Experimental results show that the average response time of the proposed scheme increases linearly with the number of users, by about 10 ms for every ten additional users, with no sharp decline in system performance or system paralysis as the number of users grows. The scheme can therefore support multiple users in the coin mixing network in completing transfers efficiently and smoothly while protecting transaction privacy. With the same number of users, the response time of the proposed scheme is about 60 ms shorter than that of CoinJoin and about 80 ms shorter than that of CoinShuffle, and is close to those of Blindcoin and Blindmixing, while the scheme has the advantages of simple implementation and high security. It can be seen that the proposed scheme has practical value in protecting the privacy and property security of blockchain users.
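The "verified only once" property described above can be illustrated with simple bookkeeping on the mixing center's side. The sketch below is not a ring signature implementation: the class `MixingCenter`, its method `accept`, and the use of a SHA-256 digest as a stand-in for the signature's one-time tag are all assumptions made for illustration, and the cryptographic verification itself is omitted.

```python
import hashlib


class MixingCenter:
    """Toy model: a signature that was already accepted is rejected on replay."""

    def __init__(self):
        self.used_tags = set()  # one-time tags of signatures already processed
        self.executed = []      # transfer instructions that were carried out

    def accept(self, signature: bytes, instruction: str) -> bool:
        # Hypothetical stand-in for the one-time tag carried by a real signature.
        tag = hashlib.sha256(signature).hexdigest()
        if tag in self.used_tags:
            return False  # duplicate transfer attempt
        self.used_tags.add(tag)
        self.executed.append(instruction)
        return True


if __name__ == "__main__":
    center = MixingCenter()
    print(center.accept(b"sig-1", "pay 1 coin to address A"))  # first use
    print(center.accept(b"sig-1", "pay 1 coin to address A"))  # replayed
```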

Encrypted traffic classification method based on federated prototypical incremental learning

Ruilong CHEN, Peng YI, Tao HU, Youjun BU

2025, 45(12): 3888-3895. DOI: 10.11772/j.issn.1001-9081.2024111702

Deep learning has been widely applied to encrypted traffic classification, but it still faces challenges such as user data privacy protection and sustainable learning capability. To address these issues, a Federated Prototypical Incremental learning method for Encrypted Traffic Classification (FPI-ETC) was proposed. During the local model training phase on the client, the Softmax classifier of the local model was replaced with a prototypical classifier to mitigate the prediction bias caused by the Softmax classifier. In the new task phase, the old class prototype vectors were utilized by the clients to generate multiple examples of the old classes, thereby preventing the local model from forgetting previously learned knowledge; the class prototype vectors uploaded by the clients were weighted and aggregated by the server to update the class prototypes iteratively. Experimental results indicate that when the number of tasks is 5 and the client sampling rate is 0.6, the final global accuracy of FPI-ETC on the ISCX VPN-nonVPN dataset is 9.93 to 33.45 percentage points higher than those of the existing methods, and the final global accuracy of FPI-ETC on the USTC-TFC2016 dataset is 5.06 to 10.92 percentage points higher than those of the existing methods, verifying that FPI-ETC can effectively address the catastrophic forgetting problem in dynamically updated encrypted network environments.
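A prototypical classifier and server-side prototype aggregation can be sketched as follows. This is a generic illustration rather than the FPI-ETC implementation: the function names are hypothetical, the prototype is taken as the mean embedding of a class, prediction uses the nearest prototype in Euclidean distance, and weighting by client sample count is an assumption, since the abstract does not state the weights.

```python
from math import dist


def class_prototype(embeddings):
    """Mean of the embedding vectors belonging to one class."""
    n = len(embeddings)
    return [sum(col) / n for col in zip(*embeddings)]


def predict(embedding, prototypes):
    """Return the label whose prototype is closest to the embedding."""
    return min(prototypes, key=lambda label: dist(embedding, prototypes[label]))


def aggregate(client_prototypes):
    """Weighted average of one class's prototypes.

    client_prototypes: list of (prototype, sample_count) pairs, one per client.
    """
    total = sum(count for _, count in client_prototypes)
    dim = len(client_prototypes[0][0])
    return [
        sum(proto[d] * count for proto, count in client_prototypes) / total
        for d in range(dim)
    ]


if __name__ == "__main__":
    protos = {"vpn": [1.0, 1.0], "non-vpn": [5.0, 5.0]}
    print(predict([0.9, 1.2], protos))
    print(aggregate([([0.0, 0.0], 1), ([4.0, 4.0], 3)]))
```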

Cyber anti-mapping method based on adaptive perturbation

Chengyi WANG, Lei XU, Jinyin CHEN, Hongjun QIU

2025, 45(12): 3896-3908. DOI: 10.11772/j.issn.1001-9081.2024121733

Intelligent cyber mapping methods based on Deep Reinforcement Learning (DRL) model the cyber mapping process as a Markov Decision Process (MDP) and train attacking agents through error-driven learning to identify critical network paths and obtain network topology information. However, traditional cyber anti-mapping methods are usually based on fixed rules, which makes it difficult for them to cope with the dynamic behavioral strategies of DRL agents during the mapping process. Therefore, a cyber anti-mapping method based on adaptive perturbation, named AIP (Adaptive Interference Perturbation), was proposed to defend against intelligent cyber mapping attacks. Firstly, the traffic conditions were predicted by using historical traffic sequence information, the gradient information was calculated according to the differences between the predicted conditions and real traffic data, and the gradient information was used to generate adversarial perturbations, which were injected back into the original traffic samples to produce adversarial examples. Then, a feature reconstruction method combining traffic posture and routing state was adopted to optimize the sparse dictionary dynamically through iteration, thereby realizing sparse transformation of traffic data. Finally, the sparse adversarial traffic was used as the observable traffic information of the network topology, and the defense performance of the AIP method was evaluated by analyzing the changes in the link-weight distribution assigned by the mapping agent and the variations in network latency. Experimental results show that compared with traditional perturbation defense methods such as Fast Gradient Sign Method (FGSM) and Random Attack (RA), AIP significantly increases the attacker's susceptibility to perturbations when the network traffic intensity exceeds 25%, resulting in larger changes in the link weights of the network topology and a noticeable impact on network delay. Furthermore, compared with Static Honeypot Deployment (SHD) and Dynamic Honeypot Deployment based on Q-Learning (DHD-Q) methods, the delay trends show that AIP continuously confuses attackers and makes it difficult for them to identify critical network paths, which keeps network delays within a controlled range and achieves better defense efficiency and stability.
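For orientation, the FGSM baseline named in this abstract perturbs each input feature by a fixed amount in the direction of the loss gradient's sign. The sketch below shows that baseline only, not AIP, under stated assumptions: a hypothetical linear traffic predictor with weights `w`, a squared-error loss between the prediction and the real value, and illustrative function names.

```python
def sign(v):
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def squared_error(x, w, y_true):
    """Loss of an assumed linear predictor y = w . x against the real value."""
    y_pred = sum(wi * xi for wi, xi in zip(w, x))
    return (y_pred - y_true) ** 2


def fgsm_perturb(x, w, y_true, eps):
    """x_adv = x + eps * sign(dLoss/dx) for the squared-error loss above."""
    y_pred = sum(wi * xi for wi, xi in zip(w, x))
    grad = [2.0 * (y_pred - y_true) * wi for wi in w]  # gradient w.r.t. x
    return [xi + eps * sign(g) for xi, g in zip(x, grad)]


if __name__ == "__main__":
    x, w = [1.0, 2.0], [0.5, -1.0]
    x_adv = fgsm_perturb(x, w, y_true=0.0, eps=0.1)
    print(x_adv, squared_error(x, w, 0.0), squared_error(x_adv, w, 0.0))
```

The perturbed sample has a larger prediction error than the original, which is the effect a gradient-based perturbation is meant to produce.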

Intrusion detection method with multi-stage fusion for internet of medical things

Haoqun ZHENG, Lizhi CAI, Kang YANG, Xiaoyu WANG

2025, 45(12): 3909-3915. DOI: 10.11772/j.issn.1001-9081.2024121844

To address the problems that intrusion detection methods for the Internet of Medical Things (IoMT) rely on balanced data samples, that misuse detection based on supervised learning cannot cope with unknown attacks, and that anomaly detection based on unsupervised learning has a high false alarm rate, an intrusion detection method with multi-stage fusion for IoMT was proposed. Firstly, a feature extraction method that added header information and payload to the bidirectional flow features was adopted to reduce the dependence on balanced data samples. Then, a three-stage intrusion detection framework was designed by combining supervised and unsupervised learning methods. In the framework, the unsupervised AutoEncoder (AE) model was used to filter benign traffic and detect unknown attacks, and the supervised hybrid model of Convolutional Neural Network (CNN), Gated Recurrent Unit (GRU), and Attention mechanism (Attention) was used to detect known attacks and reduce false alarms, so as to improve the detection performance. Experimental results show that the Multi-stage fusion for IoMT Intrusion Detection System (MTIDS) constructed by the proposed method achieves 99.96% detection accuracy and 93.78% F1 value on the CICIoMT2024 and CICIoT2023 datasets, which are higher than those of intrusion detection models using a single supervised or unsupervised learning method, such as AE. Specifically, MTIDS improves accuracy by 0.82 percentage points and F1 value by 5.58 percentage points compared with the best comparison model AE, which validates the accuracy of the proposed method in detecting known and unknown attacks.

Workflow task optimization and energy-efficient offloading method for computing power network

Lin WEI, Shihao ZHANG, Mengyang HE

2025, 45(12): 3916-3924. DOI: 10.11772/j.issn.1001-9081.2024111676

In Computing Power Network (CPN), User Devices (UEs) with limited computational capability and battery capacity rely on external computing nodes for collaborative task processing. Existing studies mainly focus on direct Workflow Task (WT) offloading, but face the following key challenges: 1) long waiting latency and high energy consumption caused by task dependencies; 2) the high-power mode that the UE must maintain persistently when precursor task data needs to be cached on the UE; 3) the increased complexity of offloading decisions caused by uncertain resource states in the dynamic CPN environment; 4) the difficulty of balancing the conflicting objectives of task completion time and energy consumption. To address these challenges, a Dynamic Optimization and Offloading for Workflow Task (DOOWT) algorithm was developed to improve energy efficiency. In the algorithm, the Workflow Structure Optimization (WSO) algorithm was utilized to rearrange the task graph, so as to reduce task waiting latencies and thereby lower overall energy consumption; a Dynamic-Based Task Offloading (DBTO) algorithm based on Deep Deterministic Policy Gradient (DDPG) was employed to adjust offloading strategies dynamically, thereby enhancing computational performance and resource utilization in CPN. Experimental results demonstrate that compared with conventional methods such as Random unloading (Random), the proposed method reduces the WT waiting latency by 60%, shortens the average WT completion latency by 79%, and decreases the overall energy consumption by 82%. It can be seen that this method provides theoretical and technical support for the optimization and scheduling of energy consumption-sensitive tasks.

Combined prediction model optimized by transit search algorithm

Jun YAO, Ming LIU

2025, 45(12): 3925-3930. DOI: 10.11772/j.issn.1001-9081.2024121756

To address resource wastage and performance challenges in cloud platform resource scheduling, especially the low accuracy of cloud resource prediction caused by the difficulty of selecting hyperparameters manually for Long Short-Term Memory (LSTM) network models, a combined prediction model optimized by the Transit Search (TS) algorithm, named TS-ARIMA-LSTM, was proposed. The combined prediction model integrates the AutoRegressive Integrated Moving Average (ARIMA) model with the LSTM model. Firstly, the TS algorithm was used to optimize the hyperparameters of the LSTM model, including the neuron counts in three layers and the transmission delays. Then, the optimized LSTM model was used for preliminary prediction, and the ARIMA model was used to correct the error of the LSTM prediction. Finally, the prediction results of the ARIMA and LSTM models were combined to obtain the final prediction value. Experimental results on the public Alibaba Cloud dataset Cluster-trace-v2018 show that the proposed model optimized by the TS algorithm significantly improves the prediction accuracy compared with the traditional single prediction models ARIMA and LSTM, as well as the combined prediction model ARIMA-LSTM. Specifically, compared with ARIMA-LSTM, the best-performing baseline model, the proposed model decreases the Mean Square Error (MSE) by 49.72%, the Root Mean Square Error (RMSE) by 29.24%, and the Mean Absolute Error (MAE) by 33.94%. It can be seen that the proposed model achieves high prediction accuracy in cloud resource prediction, offering a new pathway for improving cloud platform task scheduling strategies.
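The three error metrics reported here, and the idea of correcting one model's forecast with another model's estimate of its error, can be sketched in a few lines. The additive combination in `combine` is an assumption about how the two outputs are merged, the input numbers are made up for illustration, and no LSTM or ARIMA model is actually fitted.

```python
from math import sqrt


def mse(actual, predicted):
    """Mean Square Error."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root Mean Square Error."""
    return sqrt(mse(actual, predicted))


def mae(actual, predicted):
    """Mean Absolute Error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def combine(base_forecast, predicted_residual):
    """Assumed additive correction: final = base forecast + forecast of its error."""
    return [b + r for b, r in zip(base_forecast, predicted_residual)]


if __name__ == "__main__":
    actual = [1.0, 2.0, 3.0, 4.0]
    base = [1.0, 2.0, 3.0, 6.0]       # stand-in for a preliminary forecast
    residual = [0.0, 0.0, 0.0, -1.0]  # stand-in for a predicted error term
    final = combine(base, residual)
    print(mse(actual, base), mse(actual, final))
```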

Extremely large-scale MIMO channel estimation in hybrid field based on adaptive gradient matching pursuit algorithm

Zhanjun LIU, Yunpeng SONG, Shengbao WANG

2025, 45(12): 3931-3938. DOI: 10.11772/j.issn.1001-9081.2024121805

To address the high complexity and low accuracy of hybrid-field channel estimation in eXtremely Large-scale Multiple-Input Multiple-Output (XL-MIMO) systems in 6th Generation wireless communication technology (6G) networks, an Adaptive Gradient Matching Pursuit (AGMP) algorithm was proposed. Firstly, the angular-domain transformation matrix was used to estimate far-field components, and the polar-domain transformation matrix was used to estimate near-field components, thereby transforming the channel estimation problem into a sparse reconstruction problem. Then, during the component estimation process, the Least Mean Square (LMS) algorithm was combined with an adaptive gradient search strategy to optimize path component estimation through dynamic adjustment of step-size parameters, and the Minimum Mean Squared Error (MMSE) target was approximated iteratively, thereby optimizing the channel estimation process. Finally, the complete hybrid-field channel was reconstructed by using angular-domain and polar-domain transformation matrices, thereby achieving accurate hybrid-field channel estimation. Simulation results demonstrate that in low Signal-to-Noise Ratio (SNR) environments, the proposed algorithm improves the achievable rate by approximately 20.46% compared with the Orthogonal Matching Pursuit (OMP) algorithm. Furthermore, as the number of User Equipment (UE) antennas increases, the Normalized Mean Squared Error (NMSE) of the proposed algorithm is approximately 1.2 dB lower than that of the OMP algorithm in multi-antenna environments. It can be seen that the proposed algorithm achieves superior estimation performance in low-SNR and multi-antenna UE environments.
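The LMS update with a dynamically adjusted step size can be illustrated with the well-known normalized LMS (NLMS) rule, in which the step shrinks as the input power grows. This is a swapped-in, real-valued textbook variant used as a sketch; it is not the AGMP step-size rule, which the abstract does not specify, and the helper `make_samples` only generates noise-free toy data.

```python
import random


def nlms(samples, dim, mu=0.5, eps=1e-8):
    """Normalized LMS: w <- w + mu / (eps + |x|^2) * e * x.

    samples: iterable of (x, d), with x a list of length dim and d the target.
    """
    w = [0.0] * dim
    for x, d in samples:
        y = sum(wi * xi for wi, xi in zip(w, x))
        e = d - y                                     # instantaneous error
        step = mu / (eps + sum(xi * xi for xi in x))  # adapts to input power
        w = [wi + step * e * xi for wi, xi in zip(w, x)]
    return w


def make_samples(true_w, n, seed=0):
    """Noise-free toy data generated from a known weight vector."""
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        x = [rng.uniform(-1.0, 1.0) for _ in true_w]
        out.append((x, sum(t * xi for t, xi in zip(true_w, x))))
    return out


if __name__ == "__main__":
    print(nlms(make_samples([2.0, -1.0], 500), dim=2))
```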

Low-latency neighbor selection scheme for blockchain networks based on multi-stage propagation

Gongli LI, Xiaodi CHEN, Lu LI

2025, 45(12): 3939-3946. DOI: 10.11772/j.issn.1001-9081.2024111678

Blockchain relies on an unstructured Peer-to-Peer (P2P) overlay network for the propagation of transactions and blocks. In this network structure, propagation is delayed and the long-tail propagation problem is significant, which leads to inconsistencies in the information stored by nodes, namely blockchain forks. Forks not only waste computational resources in the entire blockchain network, but also introduce a series of security issues. To reduce propagation delays in blockchain networks, a Neighbor Selection scheme based on Multi-stage Propagation (NSMP) was proposed to optimize the network topology by selecting neighbor nodes. Firstly, the nodes' outbound neighbors were divided into strong and weak propagators based on two factors, propagation ability and proximity, and different neighbor selection schemes were applied at different stages of network propagation, thereby reducing propagation hops and shortening propagation time. At the same time, the long-tail propagation problems in both existing and default schemes were further solved. Finally, the propagation ability of nodes was quantified by a fitting function based on node local characteristics, the proximity information of the nodes was quantified using the Ping protocol, and the designed scheme was tested through simulation experiments using the network simulator SimBlock. Experimental results show that NSMP reduces the fork rate by 52.17% compared with the default scheme, demonstrating the feasibility and effectiveness of NSMP. In addition, the optimal parameter setting for the distribution of neighbor node proximity was determined from the simulation data.

Secure and reliable service function chain deployment based on encoder-decoder structured reinforcement learning

Xiang KUANG, Zhen MA, Wanchun ZHU, Zhi ZHANG, Yunfei CUI

2025, 45(12): 3947-3956. DOI: 10.11772/j.issn.1001-9081.2024111677

In order to allocate limited network resources in cloud computing efficiently to ensure Quality of Service (QoS) while improving resource utilization and management efficiency, a security and reliability driven Encoder-Decoder-based Deep Reinforcement Learning (ED-DRL) method was proposed for Service Function Chain (SFC) deployment. In the method, the SFC deployment was regarded as a Markov Decision Process (MDP), a Graph ATtention network (GAT) encoder and a Gated Recurrent Unit (GRU) decoder were employed to extract network topology features and inter-node dependencies effectively, and an Asynchronous Advantage Actor-Critic (A3C) algorithm was combined to generate SFC deployment strategies dynamically. To address security and reliability requirements, the reward function was redesigned to guide the policy network in selecting optimal resources. Simulation results demonstrate that ED-DRL achieves an acceptance rate of 70.7% and an average revenue of 0.0635, outperforming comparison methods such as Continuous-Decision scheme relying on Reinforcement Learning (CDRL).

Point cloud filtering based on adaptive feature extraction and feature fusion

Weigang LI, Dong WANG, Yongqiang WANG, Jinling LI

2025, 45(12): 3957-3963. DOI: 10.11772/j.issn.1001-9081.2024111617

To address the limitations of existing filtering methods in dealing with point cloud models with highly complex geometric structures, where feature blurring artifacts degrade filtering performance, a point cloud filtering network based on adaptive feature extraction and a feature fusion strategy, named PFRNet (Point cloud Feature Regularization fusion Network), was proposed. Firstly, the feature information between different neighborhoods was learned by the adaptive spatial feature extractor, so as to capture the local neighborhood features of different dimensions and reduce the loss of local details. Then, the global bilinear response was introduced from the local information of the point cloud through the local feature regularized fusion, and it was regularized and fused to weaken the common features of the point cloud and enhance the sharp features. Finally, the self-correlation attention decoder was used to enhance the connection between different neighborhoods in the decoding process, improve the global perception ability of the model, and better extract local geometric features. Experimental results show that compared with Pointfilter, PFRNet reduces Chamfer Distance (CD) and Mean Square Error (MSE) by 7.45% and 4.99%, respectively. Visualization results further confirm that PFRNet generates a point cloud model closer to the real one than other methods.

Hyperspectral band selection algorithm based on Mahalanobis distance and Gibbs-Markov random field spatial filtering

Bo YUAN, Xiantong HUANG

2025, 45(12): 3964-3970. DOI: 10.11772/j.issn.1001-9081.2024121732

To address the limited classification accuracy caused by insufficient mining of regular texture features in band selection for hyperspectral remote sensing images of crop planting areas, a band selection algorithm based on Mahalanobis distance and Gibbs-Markov Random Field (GMRF) spatial filtering was proposed. Firstly, for the regular texture features commonly found in crop planting areas, spatial filtering of hyperspectral images was performed by establishing a GMRF model, which retained and strengthened the texture features while reducing noise and redundant information, and enhanced the differences between ground object features. Then, a category separability metric was established on the basis of Mahalanobis distance combined with the ratio method, the contribution value of each band to the metric was calculated, and the bands were ranked according to the contribution values, so that the specified number of top-ranked bands were selected as the output of the algorithm. The Indian Pines hyperspectral dataset, which contains a large number of crop planting areas, was used for band selection and maximum likelihood classification experiments. The results show that compared with the best performance indexes of the three reference algorithms (genetic algorithm, successive projections algorithm, and density peak clustering algorithm), the average correlation, overall classification accuracy, and Kappa coefficient of the proposed algorithm are improved by 3.37%, 2.90%, and 6.70%, respectively. It can be seen that the proposed algorithm effectively integrates crop spatial texture and spectral covariance features, providing a feature selection scheme with clear physical interpretation for crop classification and growth monitoring in precision agriculture.
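The Mahalanobis distance underlying the separability metric scales each difference by the inverse covariance, so that correlated or high-variance bands count for less. The following sketch covers only the two-band case with a closed-form 2x2 inverse; the function name is illustrative, and the paper's ratio-based separability metric and per-band contribution values are not reproduced.

```python
from math import sqrt


def mahalanobis_2d(x, mean, cov):
    """Mahalanobis distance of a two-band pixel x from a class mean.

    cov is the 2x2 class covariance matrix given as ((a, b), (c, d)).
    """
    (a, b), (c, d) = cov
    det = a * d - b * c
    if det == 0:
        raise ValueError("covariance matrix is singular")
    dx, dy = x[0] - mean[0], x[1] - mean[1]
    # The inverse of [[a, b], [c, d]] is (1 / det) * [[d, -b], [-c, a]].
    quad = (dx * (d * dx - b * dy) + dy * (-c * dx + a * dy)) / det
    return sqrt(quad)


if __name__ == "__main__":
    # With a larger variance in the first band, the same offset counts for less.
    print(mahalanobis_2d((2.0, 0.0), (0.0, 0.0), ((4.0, 0.0), (0.0, 1.0))))
```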

Light-adaptive image fusion algorithm based on gradient enhancement and text guidance

Chao WEI, Wei YE, Guangjian SHENG, Lei ZHANG

2025, 45(12): 3970-3977. DOI: 10.11772/j.issn.1001-9081.2024111619

A light-adaptive image fusion algorithm based on gradient enhancement and text guidance was developed to address the limitations of existing fusion algorithms, which cause loss of detailed information, edge degradation, and unclear salient features under complex lighting environments. Firstly, a feature extraction module based on gradient enhancement and linear spatial equations was constructed to extract global features with linear computational complexity while enhancing edge gradient information. Secondly, scene description text was embedded to guide the fusion network to generate fused images of different styles in different lighting environments, so that the robustness of the fusion algorithm in complex lighting environments was improved. Finally, a Gradient Enhanced Fusion Module (GEFM) based on cross-attention mechanism was designed to achieve gradient enhancement and fusion of multimodal information. Experimental results on three benchmark datasets, TNO, MSRS (MultiSpectral Road Scenarios), and LLVIP (Low-Light Visible-Infrared Paired), demonstrate that the proposed algorithm outperforms comparative algorithms such as LRRNet (Low-Rank Representation Network), CAMF (Class Activation Mapping Fusion), DATFuse (Dual Attention Transformer Fusion), UMF-CMGR (Unsupervised Misaligned Fusion via Cross-Modality image Generation and Registration), and GANMcC (GAN with Multi-classification Constraints) in five quantitative metrics. Specifically, the Spatial Frequency (SF) and Visual Information Fidelity (VIF) metrics were improved by 22%, 59%, 61% and 31%, 53%, 37%, respectively. The algorithm effectively reduces edge blurring and ensures that fused images maintain high clarity and contrast under different lighting environments.
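Spatial Frequency (SF) summarizes how much intensity changes between neighboring pixels, combining a row frequency and a column frequency as SF = sqrt(RF^2 + CF^2). The sketch below assumes the common definition in which both squared-difference sums are divided by the total pixel count; the function name and the toy images are illustrative.

```python
from math import sqrt


def spatial_frequency(img):
    """Spatial Frequency of a grayscale image given as a list of rows."""
    rows, cols = len(img), len(img[0])
    # Row frequency: squared differences between horizontal neighbors.
    rf2 = sum(
        (img[i][j] - img[i][j - 1]) ** 2
        for i in range(rows) for j in range(1, cols)
    ) / (rows * cols)
    # Column frequency: squared differences between vertical neighbors.
    cf2 = sum(
        (img[i][j] - img[i - 1][j]) ** 2
        for i in range(1, rows) for j in range(cols)
    ) / (rows * cols)
    return sqrt(rf2 + cf2)


if __name__ == "__main__":
    flat = [[5, 5], [5, 5]]
    edged = [[0, 1], [0, 1]]
    print(spatial_frequency(flat), spatial_frequency(edged))
```

A flat image scores zero and sharper edges raise the score, which is why SF is read as an indicator of preserved detail.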

Information compensation-based panoramic image super-resolution reconstruction network

Yu FAN, Chunyi CHEN, Xiaojuan HU, Yanfeng LI, Haiyang YU, Ripei ZHANG, Yunbiao LIU

2025, 45(12): 3978-3986. DOI: 10.11772/j.issn.1001-9081.2024111684

Panoramic images suffer from severe geometric distortions due to their unique projection format. The existing 2D image super-resolution networks fail to account for the geometric distortion characteristics of panoramic images, making them unsuitable for super-resolution reconstruction of such images. Unlike 2D super-resolution networks, panoramic image super-resolution models must focus on the feature differences across different latitude regions and address issues such as insufficient feature capture at different scales and insufficient learning of contextual information. To address these issues, an Information Compensation-based Panoramic image Super-resolution reconstruction network (ICPSnet) was proposed. Firstly, based on the geometric characteristics of panoramic images, a position awareness mechanism was introduced to calculate the position weight of each pixel in the latitude direction, thereby enhancing the model's attention to different latitude regions. Secondly, to address insufficient feature extraction at diverse scales, a Cross-Scale Collaborative Attention (CSCA) module was designed, which utilized a multi-kernel convolutional attention mechanism with different receptive fields to obtain rich cross-scale features. Additionally, to improve the quality of the reconstructed image, an Information Compensation (IC) block was designed to enhance the network's ability to learn contextual information by improving the Atrous Spatial Pyramid Pooling (ASPP). Experimental results on two benchmark datasets, ODI-SR and SUN360, show that when the amplification factor is 4 and 8, ICPSnet improves the Weighted-to-Spherically-uniform Peak Signal-to-Noise Ratio (WS-PSNR) by 0.14 dB, 0.64 dB, and 0.25 dB, 0.26 dB, respectively, compared with the current state-of-the-art OSRT (Omnidirectional image Super-Resolution Transformer). It can be seen that compared with other networks, ICPSnet has superior visual performance, with reconstructed images better representing the texture details of high-latitude regions.
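WS-PSNR weights each pixel's squared error by the cosine of its latitude in the equirectangular projection, so that the over-sampled rows near the poles count for less. The following is a minimal sketch of that standard metric for single-channel images, assuming the usual weight cos((i + 0.5 - H/2) * pi / H) for row i of an image with H rows; it illustrates the evaluation metric only, not the position awareness mechanism inside ICPSnet, and the function names are illustrative.

```python
from math import cos, inf, log10, pi


def latitude_weights(height):
    """Per-row weights for an equirectangular image with `height` rows."""
    return [cos((i + 0.5 - height / 2) * pi / height) for i in range(height)]


def ws_psnr(ref, rec, max_value=255.0):
    """WS-PSNR between two equally sized grayscale images (lists of rows)."""
    weights = latitude_weights(len(ref))
    num = sum(
        weights[i] * (a - b) ** 2
        for i in range(len(ref)) for a, b in zip(ref[i], rec[i])
    )
    den = sum(weights[i] * len(ref[i]) for i in range(len(ref)))
    wmse = num / den  # latitude-weighted mean squared error
    return inf if wmse == 0 else 10 * log10(max_value ** 2 / wmse)


if __name__ == "__main__":
    ref = [[10, 10], [10, 10], [10, 10], [10, 10]]
    rec = [[12, 12], [10, 10], [10, 10], [10, 10]]  # error only in the top row
    print(latitude_weights(4), ws_psnr(ref, rec))
```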

High-quality sonar image generation method based on multi-scale feature fusion

Jing HUANG, Xin PENG, Wenhao LI, Kai HU, Teng WANG, Yamin HUANG, Yuanqiao WEN

2025, 45(12): 3987-3994. DOI: 10.11772/j.issn.1001-9081.2024121742

Due to the inherent characteristics of sonar imaging principles and the interference of complex underwater environments, underwater sonar images generally suffer from insufficient resolution and missing target details. To address these issues, a high-quality sonar image generation method based on multi-scale feature fusion was proposed. Firstly, the Residual Dense Blocks (RDBs) were used to extract image features at the shallow level, thereby capturing basic texture and contour information and establishing the spatial layout of the image. Secondly, a Multi-Scale Attention feature extraction module (MSA) was designed to focus on key features at different scales adaptively and, through the attention mechanism, further enhance the expression of key features while suppressing redundant information. Finally, a discriminator network was constructed using a pixel-by-pixel discrimination strategy based on spectral normalization, which improved the reconstruction of complex object contours and details. Experimental results on an underwater sonar image dataset show that the proposed method achieves relative improvements of 6.7% and 5.4% in the Peak Signal-to-Noise Ratio (PSNR) and Structural SIMilarity (SSIM) metrics, respectively, compared with the existing representative method ESRGAN (Enhanced Super-Resolution Generative Adversarial Network). It can be seen that the proposed method effectively improves the generation performance on the underwater sonar image dataset.

No-reference image quality assessment algorithm based on saliency features and cross-attention mechanism

Yang DENG, Tao ZHAO, Kai SUN, Tong TONG, Qinquan GAO

2025, 45(12): 3995-4003. DOI: 10.11772/j.issn.1001-9081.2024121866

Image data in actual business scenarios is usually rich in content and complex in distortion, which poses a great challenge to the generalization of objective Image Quality Assessment (IQA) algorithms. To solve this problem, a No-Reference IQA (NR-IQA) algorithm was proposed, which is mainly composed of three sub-networks: Feature Extraction Network (FEN), Feature Fusion Network (FFN), and Adaptive Prediction Network (APN). Firstly, the global view, local patch, and saliency view of the sample were input into the FEN together, and the global distortion, local distortion, and saliency features were extracted by Swin Transformer. Then, the cascaded Transformer encoder was used to fuse the global distortion features and local distortion features, and the potential correlation patterns between the two were explored. Inspired by the human visual attention mechanism, the saliency features were used in the FFN to activate the attention module, so that the module paid additional attention to visually salient regions, thereby improving the semantic parsing ability of the algorithm. Finally, the prediction score was calculated by the dynamically constructed MultiLayer Perceptron (MLP) regression network. Experimental results on mainstream synthetic and real-world distortion datasets show that compared with the DSMix (Distortion-induced Sensitivity map-guided Mixed augmentation) algorithm, the proposed algorithm improves the Spearman Rank-order Correlation Coefficient (SRCC) by 4.3% on the TID2013 dataset, and the Pearson Linear Correlation Coefficient (PLCC) by 1.4% on the KonIQ dataset. The proposed algorithm also demonstrates excellent generalization ability and interpretability, can effectively deal with complex distortions in business scenarios, and can make adaptive predictions according to the individual characteristics of the sample.
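The two correlation metrics reported here measure different things: PLCC captures linear agreement between predicted and subjective scores, while SRCC captures agreement in rank order only. A minimal sketch of both follows, computing SRCC as the Pearson correlation of average ranks (ties share the mean rank); the function names and the sample scores are illustrative.

```python
from math import sqrt


def plcc(x, y):
    """Pearson Linear Correlation Coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def average_ranks(values):
    """1-based ranks; tied values receive the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def srcc(x, y):
    """Spearman Rank-order Correlation Coefficient."""
    return plcc(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    predicted = [1.0, 2.0, 3.0, 4.0]
    subjective = [1.0, 4.0, 9.0, 16.0]  # monotonic but not linear
    print(srcc(predicted, subjective), plcc(predicted, subjective))
```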

HG-YOLO： lightweight and high-precision enhancement pose detection network

Jiali CUI, Yongji LIU, Zihe LI, Han ZHENG

2025, 45(12): 4004-4011. DOI: 10.11772/j.issn.1001-9081.2024121819

In human pose detection tasks， the existing deep learning networks have the problems of insufficient detection precision， complex network parameters and high computational cost， which seriously limit their applications. To solve these problems， a lightweight and high-precision enhancement pose detection network HG-YOLO （High-precision and Ghost YOLO） was proposed. Aiming at the problem of insufficient detection precision， the Transformer-based detection network RT-DETR （Real-Time DEtection TRansformer） was integrated into the backbone part of HG-YOLO， and the Large Separable Kernel Attention （LSKA） module was embedded into the backbone， which improved the feature extraction ability of the network to cope with complex scenarios without increasing the memory occupation and computational complexity， thus improving the human pose detection precision. Aiming at the problem of complex network parameters and high computational cost， the lightweight Ghost convolution module was introduced to replace some of the standard convolutions， and furthermore， a shared convolution detection head was designed in the detection head part of HG-YOLO， which reduced the convolution computation through the parameter and weight sharing mechanism， thus reducing the number of parameters and the computational complexity of the network. Experimental results on the COCO （Common Objects in Context） 2017-Keypoints dataset and the CrowdPose dataset show that compared to the benchmark YOLOv8-Pose network， HG-YOLO reduces the parameters by 32% and the floating-point operations by 18%； and when the scale is s （small）， HG-YOLO improves the AP50 （Average Precision at OKS （Object Keypoint Similarity） of 0.50） by 0.8 percentage points on the COCO 2017-Keypoints dataset， and improves the AP by 2.9 percentage points on the CrowdPose dataset. It can be seen that HG-YOLO is not only lightweight but also has high detection precision， making it an excellent network model in the field of human pose detection.

Hand pose estimation based on mask prompts and attention

Jianhua REN, Jiahui CAO, Di JIA

2025, 45(12): 4012-4020. DOI: 10.11772/j.issn.1001-9081.2024111715

Hand pose estimation is an important research direction in computer vision. Traditional methods are susceptible to complex background interference， while deep learning methods， despite being more robust， still face difficulties in multi-hand scenarios and fine-grained detail recognition. Therefore， a hand pose estimation method based on mask prompts and attention mechanisms， named HMCA （Hand Mask Prompts and Attention）， was proposed. Firstly， hand mask maps， generated via object detection and semantic segmentation， were used to suppress background noise and provide prior information. Secondly， a Parallel Attention Block （PAB） and a Multi-path Residual Block （MRB） were designed to extract multi-scale features， thereby enhancing complex hand pose recognition ability， reducing computational complexity， and preventing gradient vanishing. Thirdly， the hand mask maps were utilized to guide the model to focus on hand regions， thereby addressing issues such as multi-hand and occlusion. Finally， a penalty term was incorporated into the regression loss to constrain keypoint prediction and accelerate model convergence. Experimental results show that the proposed method achieves the best performance among the compared methods on both the Area Under the Curve （AUC） and the Mean Per Joint Position Error （MPJPE） under varying thresholds in single-hand， multi-hand， and occlusion scenarios. On the RHD （Rendered Handpose Dataset）， an AUC of 93.22% and an MPJPE of 2.15 are achieved under varying thresholds； on the CMU Panoptic dataset， an AUC of 91.38% and a mean hand keypoint error of 2.06 are reported under varying thresholds.

Multi-scale small target detection algorithm for UAV perspective based on channel-prior multi-scale cross-axis attention-YOLO

Hailin XIAO, Bo TIAN, Bin HU, Xiangting KONG, Yuanyuan WU, Renyu MA, Zhongshan ZHANG

2025, 45(12): 4021-4029. DOI: 10.11772/j.issn.1001-9081.2024121811

To address the low accuracy of small target detection from the Unmanned Aerial Vehicle （UAV） perspective， a multi-scale small target detection algorithm from UAV perspective based on Channel-Prior-Multi-Scale cross-axis attention-YOLO （CPMS-YOLO） was proposed. Firstly， a multi-scale attention module named CPMS （Channel-Prior Multi-Scale cross-axis attention） was incorporated into the backbone network， and the module was designed to better extract and emphasize useful features in complex backgrounds. With this module， the algorithm was able to learn the location details of the region of interest more easily and improve the feature extraction ability for small targets at different scales in complex backgrounds. Secondly， the backbone network and feature fusion network were restructured by adding a feature layer rich in small target semantic information， and the fusion module MultiSEAM （Multi-scale Separated and Enhancement Attention Module） was adopted so that contextual feature information complemented each other， thereby detecting and recognizing small targets better. Thirdly， a C2f-RFE （C2f-Receptive Field Enhancement） module was designed to improve the deep C2f （Faster Implementation of CSP Bottleneck with 2 convolutions） module in the Neck network， so as to expand the receptive field of the feature map， thereby realizing more accurate， faster， and multi-angle localization of target features， and thus enhancing small target detection ability. Finally， a loss function named WIoUv3 （Wise-IoU （Intersection over Union） v3） was introduced to optimize the loss weights of small targets dynamically， so as to address the disparity between positive and negative samples in the bounding box regression task， thereby further improving the detection ability for small targets. Experimental results on the public dataset VisDrone2019 show that compared to the baseline algorithm YOLOv8s， the proposed algorithm improves the precision， recall， mAP50 （mean Average Precision at IoU threshold of 50%）， and mAP50-95 （mean Average Precision at IoU thresholds from 50% to 95%） by 5.9， 5.8， 6.3， and 3.6 percentage points， respectively. It can be seen that the multi-scale small target detection algorithm for UAV perspective based on CPMS-YOLO can capture and recognize small targets more accurately.

Pneumonia X-ray image classification model by MV2-Transformer with feature fusion

Jinru PING, Ziwen SUN

2025, 45(12): 4030-4036. DOI: 10.11772/j.issn.1001-9081.2024121745

In response to the difficulty of extracting features from lesion areas in pneumonia X-ray images and the insufficiently lightweight design of the existing models， a Feature Fusion MV2-Transformer （FFMV2-Transformer） pneumonia X-ray image classification model was proposed. Firstly， the lightweight network MobileNetV2 （Mobile Network Version 2） was employed as the backbone network， with the Coordinate Attention （CA） mechanism embedded in the inverted residual bottleneck blocks， so as to enhance the model’s ability to extract features from lesion areas by embedding positional information into channel information. Secondly， a Local and Global Feature Fusion Module （LGFFM） was proposed to combine local features extracted by convolutional layers with global features captured by Transformer， thereby enabling the model to capture detailed and holistic information of lesion areas simultaneously， and further improving the model’s semantic feature extraction capabilities. Finally， a Cross-layer Feature Fusion Module （CFFM） was proposed to combine the spatial information from shallow features enhanced by the spatial attention mechanism with the semantic information from deep features enhanced by the channel attention mechanism， thereby obtaining rich contextual information. To verify the model’s effectiveness， ablation experiments and comparison experiments were conducted on a pneumonia X-ray dataset. The results show that compared to the MobileViT （Mobile Vision Transformer） model， the FFMV2-Transformer model achieves improvements of 1.09， 0.31， 1.91， 1.08 and 0.40 percentage points in accuracy， precision， recall， F1-score and AUC （Area Under ROC （Receiver Operating Characteristic） Curve）， respectively. It can be seen that the FFMV2-Transformer model extracts lesion area features from pneumonia X-ray images effectively while keeping the model lightweight.

Medical image segmentation network based on improved TransUNet with efficient channel attention

Ming DENG, Jinfan XU, Hongxiang XIAO, Xiaolan XIE

2025, 45(12): 4037-4044. DOI: 10.11772/j.issn.1001-9081.2024111673

Medical image segmentation plays a crucial role in clinical applications such as computer-aided diagnosis and surgical navigation， aiming to extract different organs and lesions from complex medical images accurately. However， the existing U-shaped network architecture suffers from problems such as high information redundancy in skip connections and high computational complexity. To address these challenges， a lightweight medical image segmentation network named ES-TransUNet （Efficient channel attention and Simple-TransUNet） was proposed. In the network， the Criss-Cross Attention （CCA） mechanism was introduced in the encoder to capture long-range dependencies and the multi-head attention structure in Transformer was optimized， so as to lighten the model. A dynamic upsampling （Dysample） module was introduced in the decoder to improve upsampling efficiency. At the same time， in order to reduce the information redundancy in skip connections， the Simple COntextual Transformer （SCOT） block was introduced to filter out redundant features. Experimental results on the Synapse multi-organ segmentation and ACDC datasets demonstrate that ES-TransUNet achieves improvements of 2.37 and 1.57 percentage points， respectively， in Dice Similarity Coefficient （DSC） compared to TransUNet； and reduces the Hausdorff Distance （HD） by approximately 9.69 on the Synapse dataset. Additionally， the results of comparing the proposed network with state-of-the-art medical segmentation models indicate that ES-TransUNet maintains high segmentation accuracy while reducing model parameters and computational complexity significantly， and improves inference efficiency. It can be seen that ES-TransUNet better satisfies the practical requirements of real-time medical image segmentation.

Skin lesion image segmentation based on dual-path attention mechanism and multi-scale information fusion

Sihao WANG, Duzhen ZHANG, Changchang YANG

2025, 45(12): 4045-4054. DOI: 10.11772/j.issn.1001-9081.2024111669

To address issues of blurred skin lesion boundaries， hair interference， and varying lesion sizes， a skin lesion segmentation network based on dual-path attention mechanism and multi-scale information fusion was proposed. Firstly， a residual gated attention module based on depthwise separable convolution， named DGConv （Depthwise Gate Convolution）， was designed in the encoder to capture local lesion information. Secondly， a Multi-scale Contextual relationship Extraction Module （MCEM） was introduced at the bottleneck， which employed horizontal average pooling and vertical average pooling to model contextual information， and integrated multi-scale features captured by a residual dilated convolution pyramid module to further enhance global lesion information understanding. Thirdly， a dual-path attention module was designed at the skip connections to refine lesion details， and a Multi-Scale feature Fusion Enhancement （MSFE） module was utilized to enrich the feature details transmitted in the current stage by fusing multi-stage information. Finally， a feature Fusion Module （FM） was designed in the decoder to solve the receptive field mismatch problem at the same stage， progressively combining encoder outputs and feature information transferred by skip connections to obtain the final segmentation results. Experiments on the ISIC2017 （International Skin Imaging Collaboration） and ISIC2018 datasets demonstrated that the proposed network outperformed the second-best networks in skin lesion segmentation， with Dice improvements of 0.09 and 1.09 percentage points， respectively， and Intersection over Union （IoU） improvements of 0.14 and 1.76 percentage points， respectively. Compared to the classical U-Net， the Dice was improved by 5.13 and 3.84 percentage points， respectively， and the IoU was improved by 7.74 and 6.04 percentage points， respectively， demonstrating the advantage and effectiveness of the proposed network.
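
As a reading aid for the Dice and IoU figures quoted in this abstract, the following is a minimal sketch of how the two overlap metrics are conventionally computed on binary masks. It is not the authors' evaluation code; the function name `dice_and_iou`, the flattened-list mask representation, and the choice to score two empty masks as 1.0 are assumptions made for illustration only.

```python
def dice_and_iou(pred, target):
    """Return (Dice, IoU) for two flattened binary masks of equal length.

    pred and target are sequences of 0/1 values, one per pixel.
    """
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of pixels")
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    pred_area = sum(1 for p in pred if p)
    target_area = sum(1 for t in target if t)
    union = pred_area + target_area - intersection
    if union == 0:
        # Both masks are empty; treating this as a perfect match is a
        # convention assumed here, not something stated in the abstract.
        return 1.0, 1.0
    dice = 2 * intersection / (pred_area + target_area)
    iou = intersection / union
    return dice, iou

# Toy 4-pixel masks (made-up values): one pixel overlaps.
dice, iou = dice_and_iou([1, 1, 0, 0], [1, 0, 1, 0])
```

For a single pair of masks the two metrics are related by Dice = 2·IoU / (1 + IoU), so Dice is never smaller than IoU. Averages over a dataset need not satisfy this identity exactly, so gains in the two metrics can differ in size.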

Task allocation of unmanned aerial vehicle for rural last-mile delivery based on reinforcement learning

Xiaojuan CHEN, Wei ZHANG

2025, 45(12): 4055-4063. DOI: 10.11772/j.issn.1001-9081.2024111670

The difficulty， long delivery time， and high cost of last-mile delivery in rural areas make efficient and accurate last-mile delivery scheduling solutions particularly important. Aiming at the task allocation problem of multiple logistics Unmanned Aerial Vehicles （UAVs） in rural distribution scenarios， a multi-objective UAV task allocation model was established by considering the payload capacity and the maximum flight distance of UAVs comprehensively， with the goals of minimizing the flight distance and the number of dispatched UAVs without violating time windows. Firstly， based on reinforcement learning， to address the problem of high dimensionality in task allocation， an encoder and attention mechanism were introduced to simplify the state space effectively. Secondly， the global-local search strategy was combined to explore the solution space while avoiding getting stuck in the local optimum， thereby improving the quality of the solution. Finally， further analysis was conducted on the parameter weight settings， and the optimal combination of weight coefficients for sub-objective functions was obtained through experiments. Simulation results show that， compared to the Hybrid Q-learning network based Method （HQM）， Adaptive Large Neighborhood Search algorithm （ALNS）， Q-learning algorithm （Q-learning）， and Genetic Algorithm （GA）， the proposed algorithm SG-HQM （Sine and Gaussian HQM） reduces the obtained final path length by 8.35%， 9.88%， 10.29%， and 12.48%， respectively.

Multi-stage distribution adaptation model for cross-subject motor imagery EEG decoding

Min HE, Tianjian LUO

2025, 45(12): 4064-4072. DOI: 10.11772/j.issn.1001-9081.2024111698

Motor Imagery ElectroEncephaloGraph （MI-EEG） signals play a significant role in non-invasive Brain-Computer Interface （BCI） and have been utilized in clinical rehabilitation training widely. As one of the subjective paradigms， MI-EEG has high sample collection costs and large individual differences with complex time variability and low signal-to-noise ratio， so that constructing cross-subject MI-EEG decoding models has become a critical research focus. However， most of the recent cross-subject decoding models adopt the single-stage adversarial learning strategy， and only consider learning deep representations by minimizing marginal and conditional distribution differences， which constrains the MI-EEG decoding performance seriously. Therefore， a Multi-Stage Distribution Adaptation （MSDA） model was proposed for cross-subject MI-EEG decoding. Firstly， sample covariance was employed to align marginal distribution differences between subjects. Secondly， marginal distribution-invariant deep representations were obtained through a pre-trained feature extractor and domain discriminator. Finally， a joint distribution-invariant mapping of deep representations was constructed using L²-distance， and this mapping and the classifiers were trained alternately to learn joint distribution-invariant deep representations， which were used for cross-subject MI-EEG decoding. In the MSDA model， distribution adaptation between subjects was conducted in three stages， including sample’s marginal distribution， deep representations’ marginal distribution and deep representations’ joint distribution， thereby addressing the challenge of single-stage distribution adaptation effectively. Experimental results on the BCI Competition IV-2a and BCI Competition IV-2b public datasets demonstrate that the MSDA model surpasses the latest decoding models in both accuracy and Kappa coefficient. These results indicate that the MSDA model enhances the learning ability of cross-subject domain-invariant deep representations， which offers a new option for building MI-BCI.
