Journal of Computer Applications

DU-FastGAN: lightweight generative adversarial network based on dynamic-upsample

Guoyu XU, Xiaolong YAN, Yidan ZHANG

2025, 45(10): 3067-3073. DOI: 10.11772/j.issn.1001-9081.2024101535

In recent years, Generative Adversarial Networks (GANs) have been widely used for data augmentation, which effectively addresses the problem of insufficient training samples and is of considerable research significance for model training. However, existing GAN models for data augmentation place high demands on datasets and converge unstably, which can lead to distortion and deformation in the generated images. Therefore, a lightweight GAN based on dynamic upsampling, DU-FastGAN (Dynamic-Upsample-FastGAN), was proposed for data augmentation. Firstly, a generator was constructed with a dynamic-upsample module, which enables the generator to apply upsampling methods of different granularities according to the size of the current feature map, thereby reconstructing textures and enhancing both the overall structure and the local detail of the synthesized images. Secondly, to help the model better capture the global information flow of images, a weight information skip connection module was proposed to reduce the disturbance of convolution and pooling operations on features, thereby improving the model's ability to learn different features and making the details of the generated images more realistic. Finally, a feature loss function was introduced to improve generation quality by calculating the relative distance between corresponding feature maps during the sampling process. Experimental results show that, compared with methods such as FastGAN, MixDL (Mixup-based Distance Learning), and RCL-master (Reverse Contrastive Learning-master), DU-FastGAN achieves a maximum reduction of 23.47% in FID (Fréchet Inception Distance) on 10 small datasets, effectively reducing distortion and deformation in the generated images and improving their quality. At the same time, DU-FastGAN keeps overhead low, with model training time within 600 min.
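
For reference, FID is conventionally computed from the means and covariances of Inception features extracted from real and generated images. The symbols below ($\mu_r$, $\Sigma_r$ for real images and $\mu_g$, $\Sigma_g$ for generated images) are the usual notation and are not taken from the abstract:

```latex
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2
  + \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g - 2\left( \Sigma_r \Sigma_g \right)^{1/2} \right)
```

A lower FID indicates that the generated feature distribution is closer to the real one, so the reported reduction corresponds to an improvement in image quality.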

Multimodal adversarial example generation method for Chinese text classification

Yongping WANG, Yao LIU, Xiaolin ZHANG, Jingyu WANG, Lixin LIU

2025, 45(10): 3074-3082. DOI: 10.11772/j.issn.1001-9081.2024091307

Existing adversarial example generation methods for Chinese text rely on a single method for locating important words and a single transformation strategy, which makes it difficult to improve the attack success rate and the quality of adversarial examples. To address this problem, a multimodal adversarial example generation method for Chinese text classification was proposed from the perspectives of the morphology, pronunciation, and semantics of Chinese characters. In the word importance calculation stage, the mask model and the model output were used to obtain confidence probabilities, the degree of dispersion of the predicted words was calculated as the sensitivity of the position, and the two were combined to determine the perturbation priority. In the adversarial transformation stage, a multimodal attack strategy combining the phonological and semantic features of Chinese characters was designed to generate adversarial examples, with candidate examples generated by the lexicon, a Convolutional Neural Network (CNN)-based character pattern similarity comparison model, and a Masked Language Model (MLM). Experimental results show that the proposed method achieves an attack success rate of 33.2% to 65.8% against robust BERT (Bidirectional Encoder Representations from Transformers) and RoBERTa (Robustly optimized BERT pretraining approach) models. In addition, the generated adversarial examples can improve the robustness of the model through adversarial training.

Federated class-incremental learning method of label semantic embedding with multi-head self-attention

Hu WANG, Xiaofeng WANG, Ke LI, Yunjie MA

2025, 45(10): 3083-3090. DOI: 10.11772/j.issn.1001-9081.2024101458

Catastrophic forgetting poses a significant challenge to Federated Class-Incremental Learning (FCIL), degrading performance on continuous tasks. To address this issue, an FCIL method of Label Semantic Embedding (LSE) with Multi-Head Self-Attention (MHSA), named ATTLSE (ATTention Label Semantic Embedding), was proposed. Firstly, an LSE with MHSA was integrated with a generator. Secondly, during the stage of Data-Free Knowledge Distillation (DFKD), the generator with MHSA was used to produce more meaningful data samples, which guided the training of client models and reduced the influence of catastrophic forgetting in FCIL. Experiments were carried out on the CIFAR-100 and Tiny_ImageNet datasets. The results demonstrate that the average accuracy of ATTLSE is improved by 0.06 to 6.45 percentage points compared with the LANDER (Label Text Centered Data-Free Knowledge Transfer) method, alleviating to a certain extent the catastrophic forgetting problem of continuous tasks in FCIL.

Learning method of residual network for heavy-tailed noisy image classification

Zhiyu GONG, Shitong WANG

2025, 45(10): 3091-3100. DOI: 10.11772/j.issn.1001-9081.2024101407

To address the drop in image classification accuracy of Residual Network (ResNet) caused by unknown heavy-tailed noise, a Multi-distribution Heavy-Tailed Noise Adaptive ResNet (MHTNA-ResNet) model was proposed. Firstly, to reduce the impact of heavy-tailed noise on final predictions, a Multi-distribution Heavy-Tailed Noise Adaptive layer (MHTNA) was designed, which created noise templates from various heavy-tailed distributions to perturb clean training data, thereby enabling ResNet to learn to recognize heavy-tailed noisy images through training. Secondly, MHTNA was trained adaptively: updated noise template parameters were solved by maximum likelihood estimation, and the noise templates were regenerated according to these parameters, ensuring that the noise always follows a heavy-tailed distribution. Finally, during testing, MHTNA was removed and heavy-tailed noise attacks were applied to the test images to evaluate the model's noise resistance. Experimental results demonstrate that, compared with the PRIME model, the proposed model improves classification accuracy by an average of 3.86, 7.10, and 5.46 percentage points on the CIFAR10, CIFAR100, and MINI-ImageNet datasets, respectively, under heavy-tailed noise attacks. It can be seen that the proposed model effectively improves ResNet's robustness against heavy-tailed noise interference.

Unsupervised contrastive learning for Chinese with mutual information and prompt learning

Peng HUANG, Jiayu LIN, Zuhong LIANG

2025, 45(10): 3101-3110. DOI: 10.11772/j.issn.1001-9081.2024101464

Unsupervised contrastive learning for Chinese faces multiple challenges: first, the structure of Chinese sentences is highly flexible and semantically ambiguous, which makes it difficult for models to capture deep semantic features accurately; second, on small-scale datasets, the feature-expression ability of contrastive learning models is insufficient, and effective semantic representations are hard to learn fully; third, the data augmentation process may introduce redundant noise, further increasing the instability of training. Together, these issues limit the performance of models in Chinese semantic understanding. To solve these problems, an unsupervised contrastive learning method for Chinese with Mutual Information (MI) and Prompt Learning (CMIPL) was proposed. Firstly, a prompt-learning data augmentation approach was adopted to construct the sample pairs required for contrastive learning, so that all text information and word order were preserved, text diversity was increased, the input structure of samples was standardized, and prompt templates were provided for input samples as context to guide the model to learn fine-grained semantics more deeply. Secondly, based on the output representation of the pre-trained language model, a prompt template denoising method was used to remove the redundant noise introduced by data augmentation. Finally, the structural information of positive samples was incorporated into model training: the MI of the attention tensors of the augmented views was calculated, and this attention MI was introduced into the loss function. By minimizing the loss function, the attention distribution of the model was optimized and the structural alignment of the augmented views was maximized, enabling the model to better narrow the distance between positive pairs. Comparison experiments were conducted on few-shot data constructed from three public Chinese text similarity datasets: ATEC, BQ, and PAWSX. The results show that the proposed method has the best average performance, especially when the training data size is small. With 1% and 10% sample sizes, compared with the baseline contrastive learning model SimCSE (Simple Contrastive learning of Sentence Embeddings), CMIPL increases the average accuracy and the Spearman's rank correlation coefficient (SR) by 3.45, 4.07 and 1.64, 2.61 percentage points, respectively, verifying the effectiveness of CMIPL in unsupervised few-shot contrastive learning for Chinese.
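
The Spearman's rank correlation coefficient reported above can be illustrated with a short standard-library sketch. It is only the textbook definition (Pearson correlation of the rank vectors, with tied values sharing an average rank) and not the authors' evaluation script; the function names are hypothetical.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation computed on the rank vectors."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(xs)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5
```

In a text similarity setting, `xs` would be the model's similarity scores and `ys` the human-annotated scores for the same sentence pairs.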

Nested named entity recognition by contrastive learning with boundary information

Jintao FAN, Yanping CHEN, Caiwei YANG, Chuan LIN

2025, 45(10): 3111-3120. DOI: 10.11772/j.issn.1001-9081.2024101525

Existing Contrastive Learning (CL) methods for nested Named Entity Recognition (NER) tasks have two major drawbacks: 1) candidate entities obtained by greedy enumeration lack contextual semantics and boundary information; 2) unnecessary noise and invalid information increase the computational burden and weaken contrastive learning performance. To address these drawbacks, a two-stage NER framework was proposed. In the first stage, candidate entity boundaries were generated by the boundary recognition model, and candidate entities were integrated by the boundary integration module to minimize unnecessary negative candidates. Attention cues were inserted on both sides of the candidate entities to generate corresponding candidate entity texts, allowing the model to perceive contextual semantics and boundary information. In the second stage, a bi-encoder framework mapped candidate entity texts and entity label annotations into the same vector representation space through contrastive learning, with the comparison objects being sentences with attention cues rather than candidate entities. In addition, a classification parameter matrix with label semantics was designed to enrich the model's understanding of candidate entities. Experimental results show that, compared with the Binder method, the proposed method improves the F1 value by 1.22, 3.42, and 2.31 percentage points on three nested datasets, GENIA, ACE2005, and ACE2004, respectively, which verifies the effectiveness of the proposed method for nested NER tasks.
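
The step of inserting attention cues around a candidate entity can be sketched as follows. This is a minimal illustration under stated assumptions: the cue tokens `<e>` and `</e>`, the half-open span convention, and the function names are hypothetical, since the abstract does not specify them.

```python
def insert_cues(tokens, start, end, open_cue="<e>", close_cue="</e>"):
    """Wrap tokens[start:end] in cue markers while keeping the whole sentence as context."""
    if not 0 <= start < end <= len(tokens):
        raise ValueError("span must be non-empty and lie inside the sentence")
    return tokens[:start] + [open_cue] + tokens[start:end] + [close_cue] + tokens[end:]


def candidate_texts(tokens, spans):
    """Build one cue-marked sentence per candidate span; nested spans yield separate texts."""
    return [" ".join(insert_cues(tokens, start, end)) for start, end in spans]
```

Because each candidate produces its own marked copy of the sentence, nested candidates such as an inner and an outer span over the same tokens do not interfere with each other.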

Entity-relation extraction strategy in Chinese open-domains based on large language model

Yonggang GONG, Shuhan CHEN, Xiaoqin LIAN, Qiansheng LI, Hongming MO, Hongyu LIU

2025, 45(10): 3121-3130. DOI: 10.11772/j.issn.1001-9081.2024101536

Large Language Models (LLMs) show unstable extraction performance in Entity-Relation Extraction (ERE) tasks in Chinese open domains, and have low precision in recognizing texts and annotated categories in certain specific fields. Therefore, a Chinese open-domain entity-relation extraction strategy based on LLM, called Multi-Level Dialog Strategy for Large Language Model (MLDS-LLM), was proposed. In the strategy, the superior semantic understanding and transfer learning capabilities of LLMs were used to achieve entity-relation extraction through multi-turn dialogues of different tasks. Firstly, structured summaries were generated by the LLM based on the structured logic of open-domain text and a Chain-of-Thought (CoT) mechanism, thereby avoiding relational and factual hallucinations generated by the model as well as its inability to consider subsequent information. Then, the limitations of the context window were reduced through a text simplification strategy and the introduction of a replaceable vocabulary. Finally, multi-level prompt templates were constructed on the basis of the structured summaries and simplified texts, the influence of the temperature parameter on ERE was explored using the LLaMA-2-70B model, and the Precision, Recall, F1 value (F1), and Exact Match (EM) values of entity-relation extraction by the LLaMA-2-70B model were tested before and after applying the proposed strategy. Experimental results demonstrate that the proposed strategy enhances the performance of the LLM in Named Entity Recognition (NER) and Relation Extraction (RE) on five Chinese datasets from different domains, such as CL-NE-DS, DiaKG, and CCKS2021. Particularly on the DiaKG and IEPA datasets, which are highly specialized and on which the model's zero-shot test results are poor, compared with the few-shot prompt test, the model improves NER precision by 9.3 and 6.7 percentage points, respectively, with EM values increased by 2.7 and 2.2 percentage points, respectively, and improves RE precision by 12.2 and 16.0 percentage points, respectively, with F1 values increased by 10.7 and 10.0 percentage points, respectively, showing that the proposed strategy effectively enhances the performance of the LLM in ERE and mitigates the problem of unstable model performance.

Distribution adaptation and dynamic curriculum pseudo-label framework for semi-supervised fire detection

Lei WANG, Jie HU, Bo PENG

2025, 45(10): 3131-3137. DOI: 10.11772/j.issn.1001-9081.2024101452

To address the challenges in semi-supervised object detection caused by the lack of fire image labels and the complexity and diversity of backgrounds, a Distribution Adaptation and Dynamic Curriculum Pseudo-Label framework for Semi-supervised Fire Detection (DADCPL-SFD) was proposed, which consisted of four parts: a teacher-student Mutual Learning (ML) framework, Soft Label (SL), distribution adaptation, and dynamic curriculum pseudo-label. Firstly, the semi-supervised learning paradigm of the teacher-student mutual learning framework was adopted to replace the fully supervised learning paradigm of YOLOv5-l for scenarios with few data labels. Secondly, soft labels were used to obtain more effective pseudo-label positive examples and optimize the semi-supervised learning process. Thirdly, a distribution adaptation loss was introduced to reduce the data distribution difference between the source domain and the target domain, thereby ensuring consistent model performance across different domains. Finally, a dynamic curriculum pseudo-label strategy, inspired by the concept of curriculum learning, was designed to adjust the threshold dynamically according to the pseudo-label generation condition in different training periods, so as to filter more reasonable pseudo-labels. Experimental results on the Dataset for Fire and Smoke detection (DFS) at various supervision ratios (1%, 2%, 5%, and 10%) show that, compared with supervised learning, the proposed framework improves the mean Average Precision (mAP) by an average of 5.32 percentage points, and the Average Precision (AP) at an Intersection over Union (IoU) threshold of 0.5 by an average of 11.87 percentage points, demonstrating the efficiency and accuracy of DADCPL-SFD.
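
One way to picture a threshold that follows the pseudo-label generation condition is sketched below. This is not the DADCPL-SFD rule, which the abstract does not spell out: the interpolation formula, the bounds `low` and `high`, and the detection dictionary layout are all assumptions made for illustration.

```python
def dynamic_threshold(scores, low=0.5, high=0.9):
    """Interpolate between low and high by the share of scores that already reach high.

    Assumed behavior: early in training few predictions are confident, so the
    threshold stays near `low`; as more predictions become confident, it rises.
    """
    if not scores:
        return low
    confident_share = sum(s >= high for s in scores) / len(scores)
    return low + (high - low) * confident_share


def filter_pseudo_labels(detections, threshold):
    """Keep only the teacher detections whose confidence reaches the threshold."""
    return [d for d in detections if d["score"] >= threshold]
```

A typical loop would recompute the threshold from the teacher's scores on each batch of unlabeled images and pass the surviving detections to the student as pseudo-labels.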

Analysis and prediction model of Weibo public opinion heat by integrating BERT and X-means algorithm

Zhangtao JIANG, Xin LI, Shihao ZHANG, Xinyang ZHAO

2025, 45(10): 3138-3145. DOI: 10.11772/j.issn.1001-9081.2024091371

In public opinion discovery and prediction on social media platforms such as Weibo, "fake hotspots" created by internet trolls affect analysis accuracy. To reflect Weibo public opinion heat accurately, a Weibo public opinion heat analysis and prediction model integrating BERT (Bidirectional Encoder Representations from Transformers) and the X-means algorithm, called BXpre, was proposed, which integrates attribute features of the participating users and time-domain features of the heat changes, thereby improving the accuracy of heat prediction. Firstly, Weibo original posts and interaction user data were preprocessed, and a fine-tuned StructBERT model was used to classify these data, determining the relevance between interaction users and the original posts. This relevance was used as a reference value for calculating users' contribution weights to the heat growth of the posts. Secondly, interaction users were clustered according to their features by the X-means algorithm, and trolls were filtered based on the resulting cluster states. After that, a weight penalty mechanism targeting troll samples was introduced, and a Weibo heat index model was constructed by combining label relevance. Finally, the cosine similarity between the second derivative of the prior heat value over time and that of the real data was calculated to predict future changes in Weibo heat. Experimental results show that the Weibo public opinion heat rankings produced by BXpre are closer to the real data under different user scales. Under mixed-scale test conditions, BXpre reaches a prediction correlation index of 90.88%, which is 12.71, 14.80, and 11.30 percentage points higher than three traditional methods based on the LSTM (Long Short-Term Memory) network, the XGBoost (eXtreme Gradient Boosting) algorithm, and TDR (Temporal Difference Ranking), respectively, and 9.76 and 11.95 percentage points higher than ChatGPT and Wenxin Yiyan, respectively.
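
The final step, comparing the curvature of two heat curves by cosine similarity, can be sketched with finite differences. This is an illustrative reading rather than the BXpre implementation: approximating the second derivative by a discrete second difference on equally spaced samples is an assumption, and the function names are hypothetical.

```python
import math


def second_difference(series):
    """Discrete second derivative x[t+1] - 2*x[t] + x[t-1] at each interior point."""
    return [series[t + 1] - 2 * series[t] + series[t - 1] for t in range(1, len(series) - 1)]


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def curvature_similarity(prior_heat, observed_heat):
    """Compare how two heat curves accelerate over the same time steps."""
    return cosine_similarity(second_difference(prior_heat), second_difference(observed_heat))
```

Because cosine similarity ignores scale, two curves that accelerate in the same pattern score close to 1 even when one topic is much hotter than the other.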

Multimodal harmful content detection method based on weakly supervised modality semantic enhancement

Jinwen LIU, Lei WANG, Bo MA, Rui DONG, Yating YANG, Ahtamjan Ahmat, Xinyue WANG

2025, 45(10): 3146-3153. DOI: 10.11772/j.issn.1001-9081.2024101453

The proliferation of multimodal harmful content on social media severely harms public interests and disrupts social order, highlighting the urgent need for effective methods to detect such content. Existing studies rely on pre-trained models to extract and fuse multimodal features, often neglecting the limitations of general semantics in harmful content detection tasks and failing to consider the complex, dynamic combinations of harmful content. Therefore, a multimodal harmful content detection method based on weakly Supervised modality semantic enhancement (weak-S) was proposed. In the proposed method, weakly supervised modality information was introduced to facilitate the harmful semantic alignment of multimodal features, and a low-rank bilinear pooling-based multimodal gated integration mechanism was designed to differentiate the contributions of various information. Experimental results show that the proposed method improves the F1 value by 2.2 and 3.2 percentage points on the Harm-P and MultiOFF datasets, respectively, outperforming SOTA (State-Of-The-Art) models and validating the significance of weakly supervised modality semantics in multimodal harmful content detection. Additionally, the proposed method shows improved generalization performance on multimodal exaggeration detection tasks.

Image caption method based on Swin Transformer and multi-scale feature fusion

Ziyi WANG, Weijun LI, Xueyang LIU, Jianping DING, Shixia LIU, Yilei SU

2025, 45(10): 3154-3160. DOI: 10.11772/j.issn.1001-9081.2024101478

Image caption methods based on Transformer use multi-head attention to calculate attention weights over the entire input sequence and lack hierarchical feature extraction capabilities. Additionally, two-stage image caption methods limit model performance. To address the above issues, an image caption method based on Swin Transformer and Multi-Scale feature Fusion (STMSF) was proposed. In the encoder of this method, Agent Attention was used to maintain global context modeling capability while improving computational efficiency. In the decoder of this method, Multi-Scale Cross Attention (MSCA) was proposed to combine cross-attention and depthwise separable convolution, which obtained multi-scale features and fused multi-modal features better. Experimental results on the MSCOCO dataset show that, compared with the SCD-Net (Semantic-Conditional Diffusion Network) method, STMSF improves the BLEU4 (BiLingual Evaluation Understudy with 4-grams) and CIDEr (Consensus-based Image Description Evaluation) metrics by 1.1 and 5.3 percentage points, respectively. The comparison and ablation experimental results show that the proposed single-stage STMSF effectively improves model performance and generates high-quality image captions.

Spatio-temporal context network for 3D human pose estimation based on graph attention

Zhengdong ZENG, Ming ZHAO

2025, 45(10): 3161-3169. DOI: 10.11772/j.issn.1001-9081.2024101489

According to recent research on human pose estimation, making full use of latent 2D pose space information to acquire representative features can produce more accurate 3D pose results. Therefore, a spatio-temporal context network based on the graph attention mechanism was proposed, which includes a Temporal Context Network with Shifted windows (STCN), an Extremity-Guided global graph ATtention mechanism network (EGAT), and a Pose Grammar-based local graph attention Convolution Network (PGCN). Firstly, STCN was used to transform the 2D joint positions in a long sequence into latent features of human pose in a single sequence, which aggregated and utilized long-range and short-range human pose information effectively and reduced the computational cost significantly. Secondly, EGAT was presented for computing global spatial context effectively, so that human extremities were treated as "traffic hubs", and bridges were established for information exchange between them and other nodes. Thirdly, the graph attention mechanism was utilized for adaptive weight assignment to perform global context computation on human joints. Finally, PGCN was designed to utilize a Graph Convolution Network (GCN) for computing and modeling local spatial context, thereby emphasizing the motion consistency of symmetrical human joints and the motion correlation structure of human bones. The proposed model was evaluated on two complex benchmark datasets: Human3.6M and HumanEva-I. Experimental results demonstrate that the proposed model has superior performance. Specifically, when the input frame length is 81, the proposed model achieves a Mean Per Joint Position Error (MPJPE) of 43.5 mm on the Human3.6M dataset, a 10.5% reduction compared with the state-of-the-art algorithm MCFNet (Multi-scale Cross Fusion Network), indicating higher accuracy.
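
MPJPE, the metric quoted above, is the average Euclidean distance between predicted and ground-truth joint positions. The sketch below shows the basic single-pose computation under the assumption that both poses are already expressed in the same coordinate frame; any root alignment used in benchmark protocols is omitted, and the function name is hypothetical.

```python
import math


def mpjpe(predicted, target):
    """Mean Euclidean distance between matching 3D joints, in the units of the inputs."""
    if len(predicted) != len(target) or not predicted:
        raise ValueError("poses must be non-empty and have the same number of joints")
    return sum(math.dist(p, t) for p, t in zip(predicted, target)) / len(predicted)
```

With joint coordinates in millimetres the result is directly comparable to figures such as 43.5 mm, and dataset-level scores are obtained by averaging over all frames.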

Human parsing method with aggregation of generalized contextual features

Jiaqi YUAN, Rong HUANG, Aihua DONG, Shubo ZHOU, Hao LIU

2025, 45(10): 3170-3178. DOI: 10.11772/j.issn.1001-9081.2024101527

Human parsing aims to achieve fine-grained part segmentation in human images. Some human parsing methods enhance part representations by aggregating contextual features, but the scope of such contextual aggregation is often limited. To address this issue, a human parsing method based on generalized contextual feature aggregation was proposed. In this method, guided by prior knowledge of human topological structure, contextual features were aggregated globally from the current image, and the aggregation scope was extended to other images. This extended scope was defined as the generalized context. For the current image, a Cross-Stripe Attention Module (CSAM) was designed to aggregate global contextual features within the image. In this module, the human topological structure prior within the image was described through the part distribution, which guided the aggregation of contextual features along horizontal and vertical stripes. For other images, a Region-aware Batch Attention Module (RBAM) was designed to aggregate inter-image contextual features at batch level. Due to the constraints of human topological structure, the positional deviations of similar parts across human images in a batch fall within a certain range. This enabled RBAM to learn the spatial offsets between similar parts of different human images, and based on these offsets, features were aggregated from similar part regions in other images along the batch dimension. Quantitative comparison results show that the proposed method improves the mean Intersection over Union (mIoU) by 0.43 percentage points compared with Dual-Task Mutual Learning (DTML) on the LIP (Look Into Person) dataset. Visualization results demonstrate that the proposed method aggregates global features of the current image and part features of other images from the generalized context.
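
The mIoU metric used in the comparison can be sketched for flattened label maps as follows. This is a minimal illustration, not the benchmark's evaluation code: skipping classes that appear in neither the prediction nor the ground truth is an assumed convention, and the function name is hypothetical.

```python
def mean_iou(pred, truth, num_classes):
    """Average per-class intersection over union for two flat lists of class labels."""
    ious = []
    for c in range(num_classes):
        intersection = sum(p == c and t == c for p, t in zip(pred, truth))
        union = sum(p == c or t == c for p, t in zip(pred, truth))
        if union:  # assumed convention: ignore classes absent from both label maps
            ious.append(intersection / union)
    return sum(ious) / len(ious) if ious else 0.0
```

For a human parsing model, `pred` and `truth` would hold one part label per pixel, and `num_classes` would be the number of part categories including background.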

Lightweight human pose estimation based on merge state space model

Zhuoran LI, Hua LI, Tong WANG, Chaozhe JIANG

2025, 45(10): 3179-3186. DOI: 10.11772/j.issn.1001-9081.2024091351

In the field of Human Pose Estimation (HPE), heatmap-based methods suffer from large quantization error, high computational complexity, and the need to post-process the heatmap. To address the above issues, with the coordinate regression method SimCC as the baseline, a lightweight HPE model based on a Merge State Space Model (MSSM), namely Lite-SimCC, was proposed. Firstly, ShuffleNet V2 was adopted as the backbone network to replace the original HRNet (High-Resolution Net), which simplified the structure to a single branch and made the model lightweight. Secondly, to reduce the loss of precision, a large kernel convolution was introduced to extract global feature information. Thirdly, an MSSM was designed to handle both local and full long sequence features, so as to enhance the representational ability of the key points. Finally, a soft-label based loss function was proposed to replace the traditional one-hot loss calculation method. Experimental results show that, compared with the baseline method SimCC, Lite-SimCC reduces the parameters by 87.1% and improves the Average Precision (AP) by 1.4% on the COCO2017 test set. Results on the MPII dataset further show that Lite-SimCC effectively reduces model parameters while maintaining detection precision.
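
The difference between a one-hot target and a soft-label target for coordinate classification can be sketched as below. The abstract does not state the form of the soft label, so the Gaussian shape, the `sigma` value, and all function names here are assumptions chosen for illustration.

```python
import math


def one_hot(num_bins, target_bin):
    """Traditional target: all probability mass on the single correct coordinate bin."""
    return [1.0 if i == target_bin else 0.0 for i in range(num_bins)]


def gaussian_soft_label(num_bins, target_bin, sigma=1.5):
    """Assumed soft target: a normalized Gaussian centred on the correct bin."""
    weights = [math.exp(-((i - target_bin) ** 2) / (2 * sigma ** 2)) for i in range(num_bins)]
    total = sum(weights)
    return [w / total for w in weights]


def cross_entropy(target, predicted, eps=1e-12):
    """Cross-entropy between a target distribution and predicted bin probabilities."""
    return -sum(t * math.log(p + eps) for t, p in zip(target, predicted))
```

With the soft target, a prediction that lands one bin away from the true coordinate is penalized less than one that is far off, whereas the one-hot target treats both as equally wrong.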

Data augmentation method for abnormal elevator passenger behaviors based on dynamic graph convolutional network

Shixiong KUANG, Junbo YAO, Jiawei LU, Qibing WANG, Gang XIAO

2025, 45(10): 3187-3194. DOI: 10.11772/j.issn.1001-9081.2024101445

Low accuracy and poor generalization in recognizing abnormal behaviors of elevator passengers are caused by the lack of sufficiently diverse abnormal behavior data. To address these issues, a Dynamic Graph Convolutional Network-based Behavior data Augmentation (DGCN-BA) method was proposed. Firstly, a dynamic graph convolutional network was constructed to capture spatial relationships and motion correlations among different human joints in the behaviors of elevator passengers. Secondly, these features were utilized to enhance pose data, thereby generating richer and more reasonable pose sequences. Finally, the pose sequences were used to construct human actions in a virtual elevator scene, and a large amount of abnormal behavior video data for elevator passengers was generated. To validate the effectiveness of DGCN-BA, experiments were conducted on the public datasets Human3.6M, 3DHP, and MuPoTS-3D, and on a self-constructed dataset. Experimental results show that, compared with the data augmentation methods JMDA (Joint Mixing Data Augmentation) and DDPMs (Denoising Diffusion Probabilistic Models), DGCN-BA reduces the Mean Per Joint Position Error (MPJPE) on the Human3.6M dataset by 2.9 mm and 1.5 mm, respectively. It can be seen that DGCN-BA completes pose estimation tasks more effectively, generates diverse and reasonable abnormal behavior data, and significantly improves the recognition of video-based abnormal behaviors of elevator passengers.

Attribute-based entity alignment algorithm for decentralized data storage in large-scale institutions

Zeyi CAO, Yan CHANG, Renxin LAI, Shibin ZHANG, Zhi QIN, Lili YAN, Xuejian ZHANG, Yuanhao DI

2025, 45(10): 3195-3202. DOI: 10.11772/j.issn.1001-9081.2024091388

The data entities stored in large-scale decentralized institutions suffer from data redundancy, missing information, and inconsistency, and therefore require integration through entity alignment. Most existing entity alignment methods rely on structural information of entities and perform alignment through subgraph matching. However, the lack of structural information in decentralized data storage leads to poor alignment results. To address this issue and support the identification of important data, an attribute-based entity alignment model built on a single-layer graph neural network was proposed. Firstly, a single-layer graph neural network was utilized to avoid interference from secondary neighbor node information. Secondly, an attribute weighting method based on information entropy was designed to distinguish the importance of the attributes quickly in the initial stage. Finally, an attention mechanism-based encoder was constructed to represent the importance of different attributes in alignment from both local and global perspectives, thereby providing a more comprehensive representation of entity information. Experimental results indicate that on two decentralized storage datasets, the proposed model improves Hits@1 by 5.24 and 2.03 percentage points, respectively, compared with the suboptimal models, demonstrating superior alignment performance of the proposed model over other entity alignment methods.
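
Entropy-based attribute weighting can be sketched as follows. This is an illustrative reading, not the paper's formula: it assumes that attributes whose values vary more across entities (higher Shannon entropy) are more useful for telling entities apart and therefore receive larger weights, and the function names and record layout are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(values):
    """Shannon entropy (in bits) of the empirical distribution of attribute values."""
    counts = Counter(values)
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def attribute_weights(records):
    """Weight each attribute by its share of the total entropy (assumed scheme).

    `records` is a non-empty list of dicts that all share the same keys.
    """
    attributes = list(records[0].keys())
    entropies = {a: shannon_entropy([r[a] for r in records]) for a in attributes}
    total = sum(entropies.values())
    if total == 0:  # every attribute is constant: fall back to uniform weights
        return {a: 1.0 / len(attributes) for a in attributes}
    return {a: h / total for a, h in entropies.items()}
```

In this scheme an attribute that holds the same value for every entity gets weight zero, while a near-unique attribute such as an identifier dominates the initial weighting.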

Multi-target node hiding method based on permanence

Le LYU, Bohan ZHANG, Junchang JING, Dong LIU

2025, 45(10): 3203-3213. DOI: 10.11772/j.issn.1001-9081.2024091314

Although community detection can reveal the underlying structural characteristics of a network and the relationships between nodes in depth, it also raises privacy leakage issues. Community hiding methods can resist community detection algorithms effectively, thereby protecting the privacy of network node information. However, most traditional community hiding methods focus only on protecting a single target or community in the network, and a method that can hide an arbitrary set of targets is lacking. To solve the above problems, a Based on Permanence-loss Maximization for multiple target Nodes Hiding (BPMNH) method was proposed. In the method, the set of target nodes to be hidden could be configured freely, and a permanence loss maximization scheme was provided adaptively according to the network scale, thereby hiding multiple target nodes in different communities with minimal network topology disturbance cost. Experimental results on eight datasets such as Karate show that BPMNH is better than three baseline methods such as Modularity Based Attack (MBA) in terms of hiding effect, network structure, and comprehensive deception effect, validating the superiority of the proposed method in multi-target node hiding.

Time series anomaly detection based on frequency domain enhanced graph variational learning

Yuhe XIA, Xiaodong WANG, Qixue HE

2025, 45(10): 3214-3220. DOI: 10.11772/j.issn.1001-9081.2024101438

Time series anomaly detection is an important research topic in time series analysis. Because multivariate time series in real industrial scenarios have complex spatio-temporal dependencies and randomness, many existing anomaly detection methods that model only a single type of dependency cannot learn data features effectively. In addition, ignoring frequency domain information leads to incomplete feature representation. To address these problems, a time series anomaly detection model based on a frequency domain enhanced graph variational learning network, FeGvL (Frequency-domain enhancement Graph-variational Learning), was proposed. Firstly, after the block operation, the dependency in the time dimension was modeled by self-attention. Secondly, the graph relationship features after frequency domain enhancement were mapped to the latent space. Finally, a graph aggregation attention network was used to extract features between entities, and the temporal dependency was combined with them to achieve generalized variational reconstruction. Experimental results on the public datasets PSM (Pooled Server Metrics), SWaT (Secure Water Treatment), and WADI (WAter DIstribution) show that the F1 value of FeGvL is higher than those of seven advanced anomaly detection methods, including GDN (Graph Deviation Network), TranAD (Transformer-based Anomaly Detection), and GReLeN (Graph relational Learning Network), and that the average F1 value of FeGvL is 1.7 percentage points higher than that of the second-best model, GReLeN. These results indicate that the proposed method captures spatio-temporal dependencies effectively, provides representation capability, and achieves high anomaly detection accuracy.

Survey of federated learning based on differential privacy

Shufen ZHANG, Benjian TANG, Zikun TIAN, Xiaoyang QING

2025, 45(10): 3221-3230. DOI: 10.11772/j.issn.1001-9081.2024101505

With the rapid development of artificial intelligence, the risk of user privacy disclosure is becoming increasingly serious. Differential privacy is a key privacy protection technology that prevents personal information leakage by introducing noise into data, while Federated Learning (FL) allows models to be trained jointly without exchanging data, thereby protecting data security. In recent years, differential privacy and FL have been used together to exploit their respective advantages: differential privacy ensures privacy protection while data is in use, and FL improves the generalization ability and efficiency of the model through distributed training. Focusing on the privacy security problem of FL, firstly, the latest research progress of FL based on differential privacy was summarized and compared systematically, covering different differential privacy mechanisms, FL algorithms, and application scenarios. Secondly, special attention was paid to the ways differential privacy is applied in FL, including data aggregation, gradient descent, and model training, and the advantages and disadvantages of the various techniques were analyzed. Finally, the existing challenges and development directions of this field were summarized in detail.
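
One common way to introduce differential privacy noise at the aggregation step of FL is to clip each client update to a fixed L2 norm and add Gaussian noise to the sum before averaging. The sketch below shows that general pattern; it is not drawn from any specific method covered by the survey, and the names `clip_by_l2_norm`, `dp_federated_average`, `clip_norm`, and `noise_multiplier` are illustrative assumptions. It also does no privacy accounting (no epsilon or delta is computed).

```python
import math
import random


def clip_by_l2_norm(update, clip_norm):
    """Scale `update` down so its L2 norm is at most `clip_norm`."""
    norm = math.sqrt(sum(x * x for x in update))
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    return [x * scale for x in update]


def dp_federated_average(client_updates, clip_norm, noise_multiplier, rng):
    """Average clipped client updates after adding Gaussian noise to their sum.

    The noise standard deviation is noise_multiplier * clip_norm, which is the
    usual calibration to the per-client sensitivity of the clipped sum.
    """
    clipped = [clip_by_l2_norm(u, clip_norm) for u in client_updates]
    n_clients = len(clipped)
    dim = len(clipped[0])
    sigma = noise_multiplier * clip_norm
    noisy_sum = [
        sum(u[i] for u in clipped) + rng.gauss(0.0, sigma) for i in range(dim)
    ]
    return [s / n_clients for s in noisy_sum]


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the demo is repeatable
    updates = [[3.0, 4.0], [0.1, -0.2], [-1.0, 0.5]]
    print(dp_federated_average(updates, clip_norm=1.0, noise_multiplier=1.1, rng=rng))
```

Clipping bounds how much any single client can change the sum, which is what lets the added noise mask an individual contribution; a larger `noise_multiplier` gives stronger protection at the cost of a noisier aggregate.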

Low-latency DDoS attack detection based on hybrid feature selection

Lixia XIE, Jiamin WANG, Hongyu YANG, Ze HU, Xiang CHENG

2025, 45(10): 3231-3240. DOI: 10.11772/j.issn.1001-9081.2024101457

Many Distributed Denial of Service (DDoS) attack detection methods focus on improving model performance but ignore the influence of traffic sample distribution and feature dimension on detection performance, so the model learns redundant information. To address network traffic class imbalance and feature redundancy, a Hybrid Feature Selection method based on Multiple Evaluation Criteria (HFS-MEC) was proposed. Firstly, the Pearson Correlation Coefficient (PCC) and Mutual Information (MI) were considered together to select correlated features. Then, a Sequential Backward Selection (SBS) algorithm based on the Variance Inflation Factor (VIF) was designed to reduce feature redundancy and further reduce the feature dimension. At the same time, to balance detection performance and computation time, a Low-latency DDoS attack detection model based on the Simple Recurrent Unit (SRU), L-DDoS-SRU, was designed. Experiments were carried out on the CICIDS2017 and CICDDoS2019 datasets. The results show that HFS-MEC reduces the feature dimensions from 78 and 88 to 31 and 41, respectively; on the CICDDoS2019 dataset, L-DDoS-SRU reduces the detection time to only 40.34 seconds with a recall of 99.38%, which is 8.47% higher than that of Long Short-Term Memory (LSTM) and 9.76% higher than that of the Gated Recurrent Unit (GRU). These results verify that the proposed method improves detection performance and reduces detection time effectively.
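
The correlation-based part of the first selection stage can be sketched as a simple filter. This is only an illustration of ranking features by the absolute Pearson Correlation Coefficient with the label; it leaves out the Mutual Information criterion and the VIF-based backward selection, and the helper names, the feature names, and the threshold value are assumptions rather than details from the paper.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences.

    Returns 0.0 when either sequence is constant (correlation is undefined).
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def select_relevant_features(features, labels, threshold):
    """Keep the names of features whose |PCC| with the labels meets `threshold`."""
    return sorted(
        name for name, column in features.items()
        if abs(pearson(column, labels)) >= threshold
    )


if __name__ == "__main__":
    features = {
        "pkt_rate": [1, 2, 3, 4],     # rises with the attack label
        "const_flag": [5, 5, 5, 5],   # constant, so uninformative
        "jitter": [1, -1, 1, -1],     # uncorrelated with the label
    }
    labels = [0, 0, 1, 1]
    print(select_relevant_features(features, labels, threshold=0.5))
```

A filter like this is cheap because each feature is scored independently; removing features that are redundant with each other needs a second step such as the VIF-based selection described in the abstract.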

TDRFuzzer: fuzzing method for industrial control protocols based on adaptive dynamic interval strategy

Xuejun ZONG, Bing HAN, Guogang WANG, Bowei NING, Kan HE, Lian LIAN

2025, 45(10): 3241-3251. DOI: 10.11772/j.issn.1001-9081.2024091331

To address the low Test Case Acceptance Rate (TCAR) and lack of diversity when fuzzing is applied to Industrial Control Protocols (ICPs), a fuzzing method for ICPs based on an adaptive dynamic interval strategy was proposed. A Recurrent Neural Network (RNN) was combined with the self-attention mechanism of Transformer to construct a protocol feature extraction model: the RNN extracted local features of the data through a sliding window, and the self-attention mechanism performed global feature extraction, so as to ensure the TCAR. Residual connections were added between the attention blocks to transfer the weight scores and improve computational efficiency. A dynamic interval strategy was generated to adjust the sampling range of the model at any time step, so as to increase the diversity of the test cases. During testing, a field adaptive importance function was constructed to locate the key mutation fields. Based on this method, a fuzzing framework, TDRFuzzer, was designed and evaluated experimentally on three industrial protocols: Modbus TCP, S7comm, and Ethernet/IP. The results show that compared with three models, GANFuzzer, WGANFuzzer, and PeachFuzzer, TDRFuzzer increases the TCAR significantly and increases the Vulnerability Detection Rate (VDR) by 0.073, 0.035, and 0.150 percentage points, respectively. This indicates that TDRFuzzer has a stronger vulnerability mining capability for ICPs.

Bayesian membership inference attacks for generative adversarial networks

You SHANG, Xianghua MIAO

2025, 45(10): 3252-3258. DOI: 10.11772/j.issn.1001-9081.2024101523

Currently, the relationship between the accuracy of Membership Inference Attacks (MIAs) on Generative Adversarial Networks (GANs) and the generalization ability of the generative model itself is disputed, so effective attack methods are difficult to apply widely, which limits the improvement of generative models. To solve this problem, a Bayesian Estimation (BE)-based gray-box MIA scheme was proposed to match parameters efficiently in gray-box scenarios for optimal attacks. Firstly, training frameworks for the target and shadow models were designed under black-box conditions to obtain the parameter knowledge required by the attack model. Then, the attack model was trained by combining and using this parameter information to update the objective function continuously. Finally, the trained attack model was applied to MIA. Experimental results show that the attack accuracy of the BE-based gray-box attack scheme is on average 15.89% and 21.64% higher than those of the existing white-box and black-box attack schemes, respectively. These results demonstrate a direct link between parameter exposure and Attack Success Rate (ASR), and provide a direction for developing defensive strategies in this field.

MATOS: UAV swarm assisted moving-aware adaptive-parallel computing task offloading system

Jian SUN, Wei ZHANG, Baoquan MA, Zhuiwei WU, Xiaohuan YANG, Tao WU

2025, 45(10): 3259-3269. DOI: 10.11772/j.issn.1001-9081.2024101431

An Unmanned Aerial Vehicle swarm (UAV swarm) integrated with 5G networks serves as a flying swarm platform that carries computational resources, providing additional computing power for Mobile Edge Computing (MEC) networks. For semi-connected networks, which face challenges such as insufficient infrastructure computing power, massive task data, unevenly distributed mobile Internet of Things (IoT) devices, and complex communication scenarios using Orthogonal Frequency Division Multiple Access (OFDMA) technology, a Moving-aware Adaptive-parallel Task Offloading System (MATOS) was proposed. The system comprises a ground equipment layer, an Unmanned Aerial Vehicle (UAV) layer, and an edge computing layer, and aims to reduce task offloading latency and energy consumption, thereby raising the task offloading success rate. In the proposed system, the UAV swarm was used as an Airborne Base Station (ABS) to handle task offloading and relay services. Firstly, to improve the task transmission quality between ground devices and the UAV swarm, a collaborative task collection mechanism was proposed by combining task attributes with the idea of mobility awareness of regional service devices. Secondly, an Adaptive-parallel Genetic Ant Colony Optimization (AGACO) task offloading mechanism was proposed, and UAV swarm trajectory planning was integrated to achieve load balancing for the ABS and reduce task offloading latency. Finally, the task offloading success rate was improved by jointly optimizing UAV swarm trajectory planning, task offloading latency, and task offloading energy consumption. Experimental results show that MATOS reduces flight energy consumption by up to 40% compared with the energy efficient edge cloud architecture hieRarchical cloudlEt-baSed aERial Vehicle systEm (RESERVE), the Smart and Trusted Multi-UAV Task Offloading system (STMTO), the UAV Edge Computing IoT Network (UECIN), the Multi-UAV Assisted Offloading System (MAOS), and Mobility-aware Online Task Offloading (MOTO); compared with RESERVE, MATOS reduces task offloading latency by up to 38.8% and task offloading energy consumption by up to 44.1%, which verifies the superiority of MATOS.

Task-based assistive robot path planning in nursing home scenarios

Yu WANG, Mingyue ZHAO, Xiaolin ZHOU

2025, 45(10): 3270-3276. DOI: 10.11772/j.issn.1001-9081.2024101534

Global population aging is becoming increasingly severe, and elderly care services face a manpower shortage, which urgently calls for robot technology with intelligent decision-making capabilities. To solve the autonomous path planning problem of assistive robots under a multi-task mechanism in nursing home scenarios, an improved Soft Actor-Critic (SAC) reinforcement learning decision-making algorithm was proposed. Firstly, an obstacle contour reconstruction method based on virtual circles was introduced, which reduced the complexity of environmental modeling and improved radar detection efficiency. Then, because reinforcement learning algorithms have difficulty optimizing policies from scratch when solving complex tasks in a continuous state space, the Whale Optimization Algorithm (WOA) was integrated with the SAC algorithm to obtain the WOA-SAC algorithm. At the same time, an auxiliary supervision mechanism was constructed to provide directional guidance for the learning process, which improved decision-making capability and accelerated convergence significantly. Finally, task planning was conducted on the basis of the daily needs of the elderly, and model training was completed in environments composed of fixed tasks with static and dynamic obstacles as well as emergent random tasks. Simulation results demonstrate that, compared with the traditional SAC algorithm, the WOA-SAC algorithm reduces the average path length by 10.42%, increases the success rate by 6.66%, and decreases the average step size by 29.63%. These results show that WOA-SAC significantly improves the learning efficiency and decision-making capability of the SAC algorithm and addresses autonomous path planning under multi-task mechanisms effectively.

Flow-based lightweight high-quality text-to-speech conversion method

Lianqing WEN, Ye TAO, Yunlong TIAN, Li NIU, Hongxia SUN

2025, 45(10): 3277-3283. DOI: 10.11772/j.issn.1001-9081.2024091244

The development of Non-AutoRegressive Text-To-Speech (NAR-TTS) models has made it possible to synthesize high-quality speech rapidly. However, the prosody of the synthesized speech still needs improvement, and the one-to-many mapping between text units and speech makes it difficult to generate Mel spectra with rich prosody and high quality. In addition, the neural networks in existing NAR-TTS models contain redundancy. To address these issues, a high-quality, lightweight NAR-TTS method based on flows, named AirSpeech, was proposed. Firstly, the texts were analyzed to obtain speech feature encodings of different granularities. Secondly, attention mechanism-based techniques were used to align these feature encodings, thereby extracting prosodic information from the mixed encoding. In this process, Long-Short Range Attention (LSRA) mechanisms and single network technology were used to make feature extraction lightweight. Finally, a flow-based decoder was designed, which reduced the model's parameters and peak memory significantly, and the introduction of an Affine Coupling Layer (ACL) made the decoded Mel spectra more detailed and natural. Experimental results indicate that AirSpeech outperforms the BVAE-TTS and PortaSpeech methods in terms of the Structural SIMilarity (SSIM) and Mean Opinion Score (MOS) metrics, achieving a balance between high quality of the synthesized speech and a lightweight model.
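
The affine coupling layer mentioned above is the standard invertible building block of flow-based models: half of the input passes through unchanged and parameterizes an elementwise scale and shift of the other half. The sketch below shows the generic construction on plain Python lists, not the AirSpeech decoder itself; `scale_fn` and `shift_fn` are toy stand-ins for the small networks a real model would learn, and all names are assumptions.

```python
import math


def coupling_forward(x, scale_fn, shift_fn):
    """Affine coupling: y_a = x_a, y_b = x_b * exp(s(x_a)) + t(x_a).

    Returns the output and the log-determinant of the Jacobian, which is
    simply the sum of the log-scales. `x` must have even length.
    """
    half = len(x) // 2
    x_a, x_b = x[:half], x[half:]
    s, t = scale_fn(x_a), shift_fn(x_a)
    y_b = [b * math.exp(si) + ti for b, si, ti in zip(x_b, s, t)]
    return x_a + y_b, sum(s)


def coupling_inverse(y, scale_fn, shift_fn):
    """Exact inverse: x_b = (y_b - t(y_a)) * exp(-s(y_a))."""
    half = len(y) // 2
    y_a, y_b = y[:half], y[half:]
    s, t = scale_fn(y_a), shift_fn(y_a)
    x_b = [(b - ti) * math.exp(-si) for b, si, ti in zip(y_b, s, t)]
    return y_a + x_b


def toy_scale(x_a):
    """Toy log-scale function; tanh keeps the scales bounded."""
    return [0.5 * math.tanh(a) for a in x_a]


def toy_shift(x_a):
    """Toy shift function."""
    return [2.0 * a for a in x_a]


if __name__ == "__main__":
    x = [0.5, -1.0, 2.0, 3.0]
    y, log_det = coupling_forward(x, toy_scale, toy_shift)
    print(y, log_det)
    print(coupling_inverse(y, toy_scale, toy_shift))  # recovers x up to rounding
```

Because the first half is passed through untouched, the inverse can recompute the same scale and shift without ever inverting `scale_fn` or `shift_fn`, which is why these functions can be arbitrary networks.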

Monaural speech enhancement with heterogeneous dual-branch decoding based on multi-view attention

Gengzangcuomao, Heming HUANG

2025, 45(10): 3284-3293. DOI: 10.11772/j.issn.1001-9081.2024101463

To address insufficient acoustic feature extraction, channel information loss, and amplitude-phase compensation difficulties in mainstream encoder-decoder structures, a monaural speech enhancement model with heterogeneous dual-branch decoding, HDBMV (Heterogeneous Dual-Branch with Multi-View), was proposed by combining speech features from different dimensions. In the model, the performance of monaural speech enhancement was improved through an Information Fusion Encoder (IFE), a Time-Frequency Residual Conformer (TFRC) module, a Multi-View Attention (MVA) module, and a Heterogeneous Dual-Branch Decoder (HDBD). Firstly, amplitude and multiple features were processed jointly by the IFE, which captured both global dependencies and local correlations to generate compact feature representations. Secondly, the TFRC module was used to capture correlations along both the time and frequency dimensions effectively while reducing computational complexity. Thirdly, the MVA module was used to reconstruct information across both the channel and time-frequency domains, further enhancing the model's ability to represent information in multiple views and at multiple levels. Finally, the HDBD was used to process amplitude features and refine multiple features separately, thereby solving the amplitude-phase compensation problem and improving decoding robustness. Experimental results show that HDBMV achieves Perceptual Evaluation of Speech Quality (PESQ) scores of 3.00, 3.12, and 2.09 and Short-Time Objective Intelligibility (STOI) scores of 0.96, 0.97, and 0.81 on the public dataset VoiceBank+DEMAND, the large-scale dataset DNS Challenge 2020, and the self-built Tibetan dataset BodSpeDB, respectively. These results show that HDBMV obtains the best speech enhancement performance and strong generalization ability with the smallest number of parameters and high computational efficiency.

Multi-stage point cloud completion network based on adaptive neighborhood feature fusion

Weigang LI, Wenjie CAO, Jinling LI

2025, 45(10): 3294-3301. DOI: 10.11772/j.issn.1001-9081.2024101437

Point cloud completion aims to reconstruct a high-quality complete point cloud from incomplete point cloud data. However, most existing point cloud completion networks are limited in capturing local features and reconstructing details, so the generated point clouds perform poorly in local detail and completion accuracy. To address these issues, a multi-stage point cloud completion Network based on Adaptive Neighborhood Feature Fusion (ANFF-Net) was proposed. Firstly, the feature extractor adjusted the neighborhood selection of key points adaptively to suit different point cloud shapes, so as to capture spatial relationships between points with different semantics effectively and reduce the loss of local details. Then, the feature expander used a local perception Transformer to further expand the local feature information of neighboring points, thereby improving the network's ability to recover details. Finally, the point cloud generator applied a cross-attention mechanism to propagate local feature information of the incomplete point cloud selectively and used a folding module to refine local regions gradually, thereby significantly improving detail retention in the completed point cloud and generating more consistent geometric details. Experimental results show that ANFF-Net improves the average completion accuracy by 9.68% compared with ProxyFormer on the ShapeNet55 dataset and achieves good completion performance on the PCN and KITTI datasets. Visualization results indicate that the point clouds generated by ANFF-Net have finer granularity and are closer to the ground truth in shape.

Self-supervised point cloud anomaly detection method based on point cloud reconstruction

Jianfeng YANG, Bin CHEN, Yuxuan LI

2025, 45(10): 3302-3310. DOI: 10.11772/j.issn.1001-9081.2024091347

As industrial production environments become increasingly complex, the demand for industrial anomaly detection on three-dimensional point clouds is growing. Although two-dimensional anomaly detection methods based on pre-trained networks are highly effective, the generalization ability of pre-trained three-dimensional point cloud networks is limited, so point cloud anomaly detection methods of this kind perform poorly. To improve the performance of three-dimensional point cloud anomaly detection, Point-ReAD, an anomaly detection method based on point cloud reconstruction, was proposed. The proposed method consists of an anomaly simulation module, a point cloud reconstruction network, and an anomaly discrimination module. Specifically, during the training phase, the anomaly simulation module created anomalous point clouds from normal point cloud maps, with the normal point clouds serving as self-supervised signals to guide the learning of the reconstruction network. In the point cloud reconstruction network, a Group Attention Module (GAM) was designed to integrate the complex structural information of point clouds, thereby capturing their geometric and semantic features effectively. In the inference phase, the tested point clouds were input to the reconstruction network to generate reconstructed point clouds, and the anomaly discrimination module located anomalies accurately by comparing the point clouds before and after reconstruction. Experimental results show that Point-ReAD achieves a PC-AUROC (PointCloud level Area Under the Receiver Operator characteristic Curve) of 95.49% and a point-level AUPRO (Area Under the Per-Region Overlap) of 94.66% on the MVTec 3D-AD dataset, which are 0.89 and 1.27 percentage points higher, respectively, than those of the second-best method, 3DRÆM (3D Discriminatively trained Reconstruction Anomaly Embedding Model).

Self-supervised image denoising based on blind-ring network and random recovery mask

Zhenyuan LIANG, Songlin JIANG, Songhao ZHU

2025, 45(10): 3311-3319. DOI: 10.11772/j.issn.1001-9081.2024091383

Existing self-supervised image denoising methods based on blind-spot networks often suffer from severe loss of image information because of limitations in the network structure. To solve this problem, firstly, a self-supervised image denoising method was proposed that improved the traditional blind-spot network into a Blind-Ring Network (BRN), so as to further reduce the spatial correlation of the noise. Then, to address the image information loss caused by traditional mask strategies, a Random Recovery Mask (RRM) strategy was proposed, which reduced the information loss while enhancing the detail in the denoising results. Finally, a dual constraint loss function was proposed to prevent over-fitting of the model while preserving important information in the image effectively. Experimental results show that on the SIDD validation dataset, compared with the second-best self-supervised image denoising method based on BRN, the proposed method improves the Peak Signal-to-Noise Ratio (PSNR) by 0.17 dB and the Structural SIMilarity (SSIM) by 0.007, and reduces the Image Patch Perceptual Similarity (IPPS) by 0.006, verifying its superior denoising performance.
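
For readers unfamiliar with the PSNR figures quoted above, the metric is 10·log10(MAX² / MSE), where MSE is the mean squared error between the reference and the denoised image and MAX is the largest possible pixel value. A minimal sketch on flat pixel lists follows; the function name `psnr` and the default of 255 for 8-bit images are choices made here, not details from the paper.

```python
import math


def psnr(reference, estimate, max_value=255.0):
    """Peak Signal-to-Noise Ratio in dB between two equal-length pixel lists.

    Returns math.inf when the two inputs are identical (zero error).
    """
    if not reference or len(reference) != len(estimate):
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0:
        return math.inf
    return 10.0 * math.log10(max_value ** 2 / mse)


if __name__ == "__main__":
    clean = [100, 100, 100, 100]
    denoised = [110, 90, 110, 90]  # every pixel is off by 10, so MSE = 100
    print(round(psnr(clean, denoised), 2))
```

Because the scale is logarithmic, a gain of 0.17 dB corresponds to roughly a 4% reduction in mean squared error.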

Adaptive face recognition in low light scenarios based on feature fusion

Shumin WANG, Shenlin LI, Xiangling ZHOU

2025, 45(10): 3320-3327. DOI: 10.11772/j.issn.1001-9081.2024101517

Images in real-world scenarios are easily affected by external lighting conditions or camera parameters, resulting in low overall brightness, poor visual quality, and heavy noise. These problems make subsequent face recognition difficult and create engineering challenges. Therefore, an adaptive low-light face recognition network based on feature fusion, named LLANet (Low Light Adaptive face recognition Network), was proposed with four parts: a decomposition subnet, a restoration subnet, an adjustment subnet, and a backbone network. Low-light and normal-light images were used as inputs. Firstly, based on Retinex theory, the input low-light and normal-light images were decomposed into the corresponding illumination and reflection maps. The illumination map was input into the adjustment subnet, where an attention mechanism was introduced to focus on lighting features, thereby improving low-light image enhancement and ensuring the quality of the enhanced images. At the same time, the reflection map was input into the restoration subnet for detail restoration and noise reduction, thereby addressing the degradation and noise of the reflection map in low-light images. The output features of the adjustment and restoration subnets were then fused to obtain the enhanced feature map. Then, to accomplish the downstream face recognition task while preventing overfitting to lighting features and inaccurate face feature extraction, a weighted feature fusion strategy was adopted to combine the original face features extracted by the backbone network with the enhanced feature map, resulting in a feature map with richer information. Finally, an Adversarial Data Augmentation (ADA) strategy was introduced to generate more hard samples during training, thereby addressing the ill-posed problem and reducing the influence of alignment errors caused by low-light images during the face detection phase, which further improved network performance. Experimental results on the CASIA-FaceV5, SoF, and YaleB low-light face datasets demonstrate that LLANet reaches recognition rates of 94.67%, 98.22%, and 97.24%, respectively, which are 2.14, 1.58, and 2.10 percentage points higher than those of ARoFace (Alignment Robust Face) on the three datasets. These results show that LLANet achieves high recognition accuracy in low-light scenarios.

Camouflaged object detection by boundary mining and background guidance

Zhonghua LI, Gengxin ZHONG, Ping FAN, Hengliang ZHU

2025, 45(10): 3328-3335. DOI: 10.11772/j.issn.1001-9081.2024091324

Because a camouflaged object is highly similar to the background, it is easily confused with background features, which makes it difficult to distinguish boundary information and extract object features. Current mainstream Camouflaged Object Detection (COD) algorithms mainly study the camouflaged object itself and its boundaries while ignoring the relationship between the image background and the object, so their detection results are not ideal in complex scenes. To explore the potential connection between background and object, a camouflaged object detection algorithm that mines boundaries and background, called I2DNet (Indirect to Direct Network), was proposed. The algorithm consists of five parts: in the encoder, the initial raw data was processed; in the Boundary-guided feature Extracting and Mining Framework (BEMF), more refined boundary features were extracted through feature processing and feature mining; in the Latent-feature Exploring Framework based on Background guidance (LEFB), more salient features were explored through multi-scale convolution, and an attention-based Hybrid Attention Module (HAM) was designed to enhance the selection of background features; in the Information Supplement Module (ISM), the detailed information lost during feature processing was restored; in the Multi-task Co-segmentation Decoder (MCD), the features extracted by different tasks and modules were fused efficiently and the final prediction results were output. Experimental results show that the proposed algorithm outperforms 15 other state-of-the-art models on three widely used datasets; on the CAMO dataset in particular, the mean absolute error of the proposed algorithm drops to 0.042.

Gravity data denoising method based on multilevel wavelet residual network

Yali XUE, Zhongmin XU, Shihao LIU

2025, 45(10): 3336-3341. DOI: 10.11772/j.issn.1001-9081.2024101545

To reduce the influence of interference noise on measured gravity data and further improve the accuracy of gravity data processing, a gravity data denoising method based on the Multilevel Wavelet Residual Network (MWRNet) was proposed, which combines the wavelet transform with a neural network to remove noise components from gravity data. Firstly, the gravity data was decomposed by the wavelet transform, and then a neural network was used for noise extraction, with a Residual Channel Attention (RCA) module introduced to enhance the network's noise extraction ability. The proposed method was tested on simulated and measured data, and experimental results show that it performs better than other gravity data denoising algorithms. Specifically, at a noise level of 50, the proposed method improves the Peak Signal-to-Noise Ratio (PSNR) and Structure SIMilarity (SSIM) by more than 21.8% and 9.3%, respectively, compared with the traditional denoising algorithm BM3D (Block-Matching and 3D filtering). PSNR and SSIM are also improved compared with the deep learning-based denoising algorithms DnCNN (Denoising Convolutional Neural Network) and MWCNN (Multi-level Wavelet Convolutional Neural Network).
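
The wavelet decomposition step can be illustrated with a single-level transform on a one-dimensional signal. The abstract does not state which wavelet MWRNet uses, so the sketch below assumes the Haar wavelet purely because it is the simplest case; the function names are hypothetical, and the neural network that extracts noise from the sub-bands is not shown.

```python
import math


def haar_forward(signal):
    """Single-level orthonormal Haar transform of an even-length signal.

    Returns (approximation, detail): scaled pairwise sums and differences.
    """
    if len(signal) % 2 != 0:
        raise ValueError("signal length must be even")
    root2 = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / root2 for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / root2 for i in range(0, len(signal), 2)]
    return approx, detail


def haar_inverse(approx, detail):
    """Reconstruct the original signal from its Haar sub-bands."""
    root2 = math.sqrt(2.0)
    signal = []
    for a, d in zip(approx, detail):
        signal.extend([(a + d) / root2, (a - d) / root2])
    return signal


if __name__ == "__main__":
    data = [4.0, 2.0, 5.0, 5.0]
    approx, detail = haar_forward(data)
    print(approx, detail)
    print(haar_inverse(approx, detail))  # matches data up to rounding
```

The approximation band holds the smooth trend and the detail band holds rapid local changes, where much of the noise tends to concentrate; applying the forward step again to the approximation band gives the next decomposition level.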

Pavement defect detection algorithm with enhanced morphological perception

Jiahui ZHANG, Xiaoming LI, Jiaxiang ZHANG

2025, 45(10): 3342-3352. DOI: 10.11772/j.issn.1001-9081.2024101511

To address the low detection accuracy and high missed detection rate caused by the narrow, elongated shape, multi-scale nature, and long-range dependencies of pavement defects, a pavement defect detection algorithm with enhanced morphological perception, improved from YOLOv8_n, was proposed. Firstly, an Edge-Enhancement Focus Module (EEFM) was introduced in the backbone fusion stage, in which a strip pooling kernel was used to capture directional and position-aware information, thereby enhancing edge details in deep features and improving the representation of elongated features. Secondly, a Dual Chain Feature Redistribution Pyramid Network (DCFRPN) was designed to reconstruct the fusion method, so as to provide multi-scale features with extensive perception and rich localization information and improve the fusion of multi-scale defects. Additionally, a Morphological Aware Task Interaction Detection Head (MATIDH) was constructed to enhance the interaction between the classification and localization tasks, adjusting data representation dynamically and integrating multi-scale strip convolutions to optimize the classification and regression of elongated defects. Finally, a PWIoU (Penalized Weighted Intersection over Union) loss function was proposed to allocate gradient gains dynamically to prediction boxes of different qualities, thereby optimizing bounding box regression. Experimental results show that on the RDD2022 dataset, compared with YOLOv8_n, the proposed algorithm improves precision and recall by 3.5 and 2.3 percentage points, respectively, and increases the mean Average Precision (mAP) at 50% Intersection over Union (IoU) by 3.2 percentage points, verifying its effectiveness.
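
Both the PWIoU loss and the mAP metric above build on plain Intersection over Union between a predicted box and a ground-truth box. The sketch below computes only that base quantity for axis-aligned boxes given as `(x1, y1, x2, y2)`; the penalty and weighting terms of PWIoU are not described in the abstract and are not reproduced here. The function name and box format are assumptions.

```python
def box_iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes (x1, y1, x2, y2).

    Returns 0.0 when the boxes do not overlap or both have zero area.
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    intersection = inter_w * inter_h
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


if __name__ == "__main__":
    prediction = (0.0, 0.0, 2.0, 2.0)
    ground_truth = (1.0, 1.0, 3.0, 3.0)
    # Overlap area is 1 and union area is 7.
    print(box_iou(prediction, ground_truth))
```

For long, thin defects such as cracks, a small positional offset removes most of the overlap, so plain IoU falls quickly; this sensitivity is one reason detectors for elongated objects often modify the IoU-based loss.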

Concrete pavement crack detection network with progressive context interaction and attention mechanism

Xuehui YIN, Linlin FU, Shangbo ZHOU

2025, 45(10): 3353-3362. DOI: 10.11772/j.issn.1001-9081.2024101486

Automated crack detection is crucial for maintaining concrete pavement and ensuring road quality and safety. To address the pixel information loss caused by excessive down-sampling in existing deep learning-based crack detection methods, a concrete crack detection network based on progressive context interaction and attention mechanisms was proposed. Firstly, with an optimized UNet++ as the backbone, asymmetric convolution blocks were applied to enhance feature extraction. Secondly, a Progressive Context Interaction Mechanism (PCIM) was introduced to capture and fuse the multi-scale features of adjacent feature maps efficiently. Thirdly, in the feature enhancement phase, the Attention Combination (AC) approach was used to improve feature representation. Finally, in the feature fusion phase, a Multi-Semantic Attention Dynamic Fusion Module (MADFM) was used to improve detail recovery and retention. Test results on three public datasets show that the proposed network outperforms DeepCrack, CrackFormer, and PAF-Net (Progressive and Adaptive feature Fusion Network). Specifically, compared with these three methods, the proposed network improves the F-score by 1.33, 5.07, and 3.93 percentage points, respectively, on the DeepCrack test set; by 3.04, 4.35, and 0.82 percentage points, respectively, on the Crack500 test set; and by 3.03, 6.00, and 4.73 percentage points, respectively, on the CFD test set. These results verify that the proposed network detects cracks more accurately and is robust across different test sets.

Few-shot insulator defect detection method based on transfer learning

Hong ZHANG, Kangkang XIE, Xia NING, Wanying SONG

2025, 45(10): 3363-3370. DOI: 10.11772/j.issn.1001-9081.2024091322

In order to solve the problem that deep learning defect detection methods require many labeled samples for training and insulator defect samples are difficult to obtain， a few-shot insulator defect detection method based on transfer learning was proposed. Firstly， an Efficient Multi-scale Attention （EMA） mechanism was added to the backbone network to enhance the model’s ability to represent target features. Secondly， a hierarchical sampling-based Region Proposal Network （RPN） was constructed to select anchors in the feature pyramid uniformly， thereby improving the model’s ability to capture new class objects at different scales. Finally， the classification heads were decoupled， and the positive and negative samples were processed by the positive and negative heads， respectively， so that the model was able to adapt to the new class of objects more effectively. Experimental results show that compared with the baseline method TFA （Two-stage Fine-tuning Approach）， on public dataset PASCAL VOC， the proposed method improves the mean Average Precision （mAP）（with IoU （Intersection over Union） of 0.5） of new class by 9.5 percentage points on average； on the insulator defect dataset， the proposed method has the mAP₅₀ in detection tasks of 1-shot， 5-shot， 10-shot， 20-shot， and 30-shot increased by 15.8， 12.2， 17.4， 7.3 and 7.1 percentage points， respectively.
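The mAP at an IoU of 0.5 counts a predicted box as correct only if it overlaps a ground-truth box by at least that ratio. The sketch below shows the overlap test alone, assuming boxes in `(x1, y1, x2, y2)` corner format; the helper names `box_iou` and `is_match` are hypothetical, and the full mAP computation (ranking by confidence, per-class averaging) is not shown.

```python
def box_iou(a, b):
    """Intersection over Union of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def is_match(pred_box, gt_box, threshold=0.5):
    """A prediction matches a ground-truth box when IoU reaches the threshold."""
    return box_iou(pred_box, gt_box) >= threshold


# Hypothetical boxes: the first pair overlaps by one third, the second by two thirds.
loose = box_iou((0, 0, 2, 2), (1, 0, 3, 2))
tight = box_iou((0, 0, 2, 2), (0, 0, 2, 3))
```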

Uncertainty-aware unsupervised medical image registration model based on evidential deep learning

Yiming WANG, Shiyuan LI, Nanqing LIAO, Qingfeng CHEN

2025, 45(10): 3371-3380. DOI: 10.11772/j.issn.1001-9081.2024101442

Uncertainty quantification in medical image registration is crucial for doctors to evaluate risk in real-world clinical applications. Recently， deep unsupervised learning-based medical image registration models have achieved promising results， but methods for estimating appearance uncertainty during registration are lacking， which affects registration accuracy and trustworthiness. In addition， in real-time application scenarios， medical image registration models need to combine high accuracy with fast inference， which the existing models struggle to achieve. To address these issues， an uncertainty-aware unsupervised medical image registration model based on Evidential Deep Learning （EDL）， named EvidentialMorph， was proposed to apply EDL， an uncertainty quantification approach without additional computational cost， to unsupervised medical image registration. Firstly， the Deformation Vector Field （DVF） was learned through a registration backbone network module with a U-net architecture. Then， the Normal-Inverse Gamma （NIG） distribution of the registered image was learned through an improved Spatial Transformer Network （STN） module， the evidential STN module， so that the registered image and its appearance uncertainty were calculated directly. Experiments were carried out on the Hippocampus， LPBA40， and IBSR18 Magnetic Resonance Imaging （MRI） datasets. The results show that in registration accuracy， EvidentialMorph improves the Dice Similarity Coefficient （DSC） and Normalized Cross-Correlation （NCC） coefficient by up to 3.31% and 2.75%， respectively， over the CLMorph model； and EvidentialMorph reduces inference time by 85 ms. The above results verify that EvidentialMorph obtains effective uncertainty quantification quickly and improves registration accuracy， offering potential for real-time medical image registration scenarios.
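To make the role of the NIG distribution concrete, the sketch below derives a prediction and two uncertainty terms from NIG parameters, using the closed-form moments of the standard deep evidential regression formulation. Whether EvidentialMorph uses exactly these expressions is not stated in the abstract; the function name `nig_moments` and the parameter names `gamma`, `nu`, `alpha`, `beta` are assumptions for this example.

```python
def nig_moments(gamma, nu, alpha, beta):
    """Prediction and uncertainties from Normal-Inverse Gamma parameters.

    Assumes the usual evidential regression constraints: nu > 0, alpha > 1, beta > 0.
    """
    if nu <= 0 or alpha <= 1 or beta <= 0:
        raise ValueError("requires nu > 0, alpha > 1, beta > 0")
    prediction = gamma                       # expected value of the mean
    aleatoric = beta / (alpha - 1)           # expected data noise variance
    epistemic = beta / (nu * (alpha - 1))    # variance of the mean (model uncertainty)
    return prediction, aleatoric, epistemic


# Hypothetical parameters for a single voxel intensity.
voxel = nig_moments(gamma=0.5, nu=2.0, alpha=3.0, beta=4.0)
```

Because all three quantities come from a single forward pass that outputs the four parameters, no sampling or ensembling is needed, which is consistent with the abstract's remark that EDL adds no extra computational cost.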

7T ultra-high field magnetic resonance parallel imaging algorithm based on residual complex convolution network

Zhaoyao GAO, Zhan ZHANG, Liangliang HU, Guangyu XU, Sheng ZHOU, Yuxin HU, Zijie LIN, Chao ZHOU

2025, 45(10): 3381-3389. DOI: 10.11772/j.issn.1001-9081.2024101501

Parallel imaging techniques can help solve problems of radiofrequency energy deposition and image inhomogeneity， reduce scan time， lower motion artifacts， and accelerate data acquisition in ultra-high field Magnetic Resonance Imaging （MRI）. To enhance feature extraction ability for complex-valued MRI data and reduce wrap-around artifacts caused by under-sampling in parallel imaging， a Residual Complex convolution scan-specific Robust Artificial-neural-networks for K-space Interpolation （RCRAKI） algorithm was proposed. In the algorithm， the raw under-sampled MRI scan data was taken as input， and the advantages of both linear and nonlinear reconstruction methods were combined with a residual structure. In the residual connection part， convolution was used to create a linear reconstruction baseline， while multiple layers of complex convolution were utilized in the main path to compensate for baseline defects， ultimately reconstructing Magnetic Resonance （MR） images with fewer artifacts. Experiments were conducted on data acquired from a 7T ultra-high field MR device developed by the Institute of Energy of Hefei Comprehensive National Science Center， and RCRAKI was compared with residual scan-specific Robust Artificial-neural-networks for K-space Interpolation （rRAKI） with 40 Automatic Calibration Signals （ACSs） and a speedup ratio of 8 for mouse imaging quality across different anatomical planes. Experimental results show that in the sagittal plane， the proposed algorithm has the Normalized Root Mean Squared Error （NRMSE） decreased by 59.74%， the Structural SIMilarity （SSIM） increased by 0.45%， and the Peak Signal-to-Noise Ratio （PSNR） increased by 13.04%； in the axial plane， the proposed algorithm has the NRMSE decreased by 7.97%， the SSIM improved slightly （by 0.005%）， and the PSNR increased by 1.09%； in the coronal plane， the proposed algorithm has the NRMSE decreased by 35.03%， the PSNR increased by 5.60%， and the SSIM increased by 0.98%. It can be seen that RCRAKI performs well on all the anatomical planes of the MRI data， can reduce the influence of noise amplification at a high speedup ratio， and can reconstruct MR images with clearer details.
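For readers unfamiliar with the reported metrics, the sketch below computes NRMSE and PSNR between a reconstruction and a reference, both given as flattened magnitude images. Several normalization conventions exist; this example assumes NRMSE normalized by the L2 norm of the reference and PSNR with the reference maximum as peak value, which may differ from the convention used by the authors. The function names and sample values are hypothetical.

```python
import math


def nrmse(recon, ref):
    """Root of the summed squared error, normalized by the L2 norm of the reference."""
    err = math.sqrt(sum((x - y) ** 2 for x, y in zip(recon, ref)))
    norm = math.sqrt(sum(y ** 2 for y in ref))
    return err / norm


def psnr(recon, ref):
    """Peak Signal-to-Noise Ratio in dB, using max(ref) as the peak (assumed convention)."""
    mse = sum((x - y) ** 2 for x, y in zip(recon, ref)) / len(ref)
    if mse == 0:
        return float("inf")  # identical images
    peak = max(ref)
    return 10.0 * math.log10(peak ** 2 / mse)


# Hypothetical 4-pixel reference and a reconstruction with one erroneous pixel.
reference = [0.0, 0.5, 1.0, 0.5]
reconstruction = [0.1, 0.5, 1.0, 0.5]
error_ratio = nrmse(reconstruction, reference)
quality_db = psnr(reconstruction, reference)
```

Lower NRMSE and higher PSNR both indicate a reconstruction closer to the reference, which is why the abstract reports a decrease for the former and an increase for the latter.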

SAMCP： lightweight fine-tuned SAM method for colon polyp segmentation

Na LIU, Jun FENG, Yiru HUO, Hongyang WANG, Liu YANG

2025, 45(10): 3390-3398. DOI: 10.11772/j.issn.1001-9081.2024101555

Precise segmentation of colon polyps in gastrointestinal endoscopy images holds significant clinical value. However， the traditional segmentation methods often struggle to capture enough fine details and rely heavily on large-scale data， leading to poor performance when addressing complex polyp morphologies. Although Segment Anything Model （SAM） has made notable progress in natural image segmentation， SAM methods cannot achieve ideal results in the polyp segmentation task due to domain differences between natural and medical images. To address this issue， a lightweight fine-tuning method based on SAM architecture was proposed， named Segment Anything Model for Colon Polyps （SAMCP）. In this method， a streamlined adapter module focusing on channel-dimension information was introduced， a simplified joint loss function combining Dice and Intersection over Union （IoU） was used， and parameters of the original image encoder and prompt encoder were frozen during training to enhance polyp segmentation performance with low training cost. Experimental results on three public datasets comparing SAMCP with nine advanced methods demonstrate that SAMCP outperforms other SAM methods. Specifically， SAMCP improves the Dice and IoU values by 56.7% and 84.5%， respectively， on the Kvasir-SEG dataset， by 46.0% and 86.0%， respectively， on the CVC-ClinicDB dataset， and by 95.3% and 122.2%， respectively， on the CVC-ColonDB dataset， surpassing the current best performance of SAM-based methods. With the introduction of point-based prompts， SAMCP also outperforms other SAM-based methods even with a single click. The above results validate that SAMCP performs well in handling complex shapes and local details， providing physicians with more precise segmentation guidance.
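A joint loss built from Dice and IoU terms can be sketched as follows, using soft (probability-valued) predictions on a flattened mask. The abstract does not give the weighting or the exact form used in SAMCP; equal weights, the smoothing constant `eps`, and the function name `dice_iou_loss` are assumptions made for this example.

```python
def dice_iou_loss(probs, target, eps=1e-6):
    """Sum of soft Dice loss and soft IoU loss for a flattened mask (equal weights assumed)."""
    inter = sum(p * t for p, t in zip(probs, target))
    total = sum(probs) + sum(target)
    dice = (2.0 * inter + eps) / (total + eps)
    iou = (inter + eps) / (total - inter + eps)
    return (1.0 - dice) + (1.0 - iou)


# Hypothetical 4-pixel examples: a perfect prediction and an over-segmented one.
perfect = dice_iou_loss([1.0, 0.0, 1.0, 0.0], [1.0, 0.0, 1.0, 0.0])
oversegmented = dice_iou_loss([1.0, 1.0, 0.0, 0.0], [1.0, 0.0, 0.0, 0.0])
```

Both terms are ratios of overlap to region size, so the loss stays informative when the polyp occupies only a small part of the image, unlike a plain per-pixel loss dominated by background.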

Multi-view difficult airway recognition based on discriminant region guidance

Songlin WU, Guangchao ZHANG, Yuan YAO, Bo PENG

2025, 45(10): 3399-3406. DOI: 10.11772/j.issn.1001-9081.2024101404

Difficult Airway （DA） is a critical preoperative risk factor in clinical surgery， and its accurate recognition faces numerous challenges， such as small dataset size， severe class imbalance， and insufficient single-view recognition capability. Aiming at these issues， a multi-view DA recognition model， DRG-MV-Net （Discriminative Region Guided Multi-View Net）， was proposed. In the first stage of the model， the Discriminative Region Guidance Module （DRGM） was employed to detect and emphasize key discriminative regions in facial views automatically using Class Activation Mapping （CAM）， thereby generating two types of data augmented images with specific features. In the second stage of the model， features of each view were extracted using ResNet-18 backbone network integrating Dilated-Convolution Block Attention Module （D-CBAM）， and multi-view feature integration was performed via the Multi-View Cross Fusion Module （MCFM）. Besides， Focal Loss and layered hybrid sampling were combined to mitigate the class imbalance phenomenon. Evaluated results on the constructed clinical dataset demonstrate that the proposed model achieves a G-Mean of 77.22%， an F1-Score of 43.88%， a Matthews Correlation Coefficient （MCC） of 38.73%， and an Area Under the receiver operating Characteristic curve （AUC） of 0.740 7. Compared with the recent DA recognition model MCE-Net （Multi-view Contrastive representation prior and Ensemble classification Network）， the proposed model has the G-Mean， F1-Score， and MCC improved by 2.41， 2.34， and 3.41 percentage points， respectively； compared with the baseline model ResNet-18， the proposed model has these metrics improved by 4.85， 6.85， and 8.25 percentage points， respectively， verifying the effectiveness of the proposed model in DA recognition on small， imbalanced datasets and providing new insights and methods for solving complex DA recognition.
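G-Mean and MCC are used above because plain accuracy is misleading on imbalanced data. The sketch below computes both from a binary confusion matrix with their standard definitions; the function names and the example counts are hypothetical and are not taken from the paper's dataset.

```python
import math


def g_mean(tp, fn, fp, tn):
    """Geometric mean of sensitivity (recall on positives) and specificity."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return math.sqrt(sensitivity * specificity)


def mcc(tp, fn, fp, tn):
    """Matthews Correlation Coefficient; returns 0.0 when a marginal count is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical imbalanced test set: 10 difficult-airway cases among 100 patients.
counts = dict(tp=8, fn=2, fp=10, tn=80)
balanced_score = g_mean(**counts)
correlation = mcc(**counts)
accuracy = (counts["tp"] + counts["tn"]) / 100
```

In this toy case accuracy is 0.88 even though more than half of the positive predictions are false alarms, while MCC is noticeably lower because it accounts for all four cells of the confusion matrix.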
