Table of Contents

    10 January 2026, Volume 46 Issue 1
    Artificial intelligence
    Subgraph-aware contrastive learning with data augmentation
    Wen LI, Kairong LI, Kai YANG
    2026, 46(1):  1-9.  DOI: 10.11772/j.issn.1001-9081.2025010110

    Graph Neural Network (GNN) is an effective graph representation learning method for processing graph-structured data. However, the performance of GNN in practical applications is limited by the problem of missing information. On the one hand, the graph structure is usually sparse, making it difficult for the model to learn node features adequately. On the other hand, model training is constrained because supervised learning relies on sparse label data, making it difficult to obtain robust node representations. To address these problems, a Subgraph-aware Contrastive Learning with Data Augmentation (SCLDA) model was proposed. Firstly, relationship scores among nodes were obtained by applying link prediction to the original graph, and the edges with the highest scores were added to the original graph to generate an enhanced graph. Secondly, local subgraphs of the original and enhanced graphs were sampled around the target nodes respectively, and the target nodes of the subgraphs were input into a shared GNN encoder to generate subgraph-level target node embeddings. Finally, the mutual information between similar instances was maximized through contrastive learning of the target nodes across the two subgraph views. Experimental results of node classification on six public datasets, Cora, Citeseer, Pubmed, Cora_ML, DBLP, and Photo, show that the SCLDA model improves the accuracy over the traditional Graph Convolutional Network (GCN) model by about 4.4%, 6.3%, 4.5%, 7.0%, 13.2% and 9.3%, respectively.
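
    As a concrete illustration of the augmentation step above, the following minimal sketch (assuming a precomputed link-prediction score matrix; the names adj, scores, and k_new_edges are illustrative, not from the paper) adds the highest-scoring missing edges to form the enhanced graph:

        import numpy as np

        def augment_graph(adj: np.ndarray, scores: np.ndarray, k_new_edges: int) -> np.ndarray:
            """Add the k highest-scoring missing edges to an undirected adjacency matrix (k >= 1)."""
            n = adj.shape[0]
            cand = np.triu(np.ones((n, n), dtype=bool), k=1) & (adj == 0)   # only non-existent edges
            flat = np.where(cand.ravel(), scores.astype(float).ravel(), -np.inf)
            aug = adj.copy()
            for idx in np.argsort(flat)[-k_new_edges:]:                     # indices of the top-k candidates
                i, j = divmod(int(idx), n)
                aug[i, j] = aug[j, i] = 1                                    # keep the graph undirected
            return aug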

    Personalized federated learning method based on model pre-assignment and self-distillation
    Kejia ZHANG, Zhijun FANG, Nanrun ZHOU, Zhicai SHI
    2026, 46(1):  10-20.  DOI: 10.11772/j.issn.1001-9081.2025010115

    Federated Learning (FL) is a distributed machine learning method that utilizes distributed data for model training while ensuring data privacy. However, it performs poorly in scenarios with highly heterogeneous data distributions. Personalized Federated Learning (PFL) addresses this challenge by providing personalized models for each client. However, previous PFL algorithms primarily focus on optimizing client local models while ignoring optimization of the server global model, so that server computational resources are not fully utilized. To overcome these limitations, FedPASD, a PFL method based on model pre-assignment and self-distillation, was proposed. FedPASD operates on both the server side and the client side. On the server side, client models for the next round were pre-assigned in a targeted manner, which not only enhanced model personalization performance but also utilized server computational resources effectively. On the client side, models were trained hierarchically and fine-tuned using self-distillation to better adapt to local data distribution characteristics. Experimental results on three datasets, CIFAR-10, Fashion-MNIST, and CIFAR-100, comparing FedPASD with classic baselines such as FedCP (Federated Conditional Policy), FedPAC (Personalization with feature Alignment and classifier Collaboration), and FedALA (Federated learning with Adaptive Local Aggregation) demonstrate that FedPASD achieves higher test accuracy than the baseline algorithms under various data heterogeneity settings. Specifically, FedPASD achieves a test accuracy improvement of 29.05 to 29.22 percentage points over traditional FL algorithms and outperforms the PFL algorithms by 1.11 to 20.99 percentage points on the CIFAR-100 dataset; on the CIFAR-10 dataset, FedPASD achieves a maximum accuracy of 88.54%.
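
    The client-side self-distillation step can be pictured with the short sketch below (a plausible loss form under the usual distillation recipe; the weighting alpha, temperature T, and the use of the pre-assigned model as the teacher are assumptions, not the paper's exact formulation):

        import torch
        import torch.nn.functional as F

        def self_distill_loss(student_logits, teacher_logits, labels, alpha=0.5, T=2.0):
            """Cross-entropy on local labels plus a KL term that distills the frozen
            pre-assigned model's predictions into the fine-tuned local model."""
            ce = F.cross_entropy(student_logits, labels)
            kl = F.kl_div(F.log_softmax(student_logits / T, dim=1),
                          F.softmax(teacher_logits.detach() / T, dim=1),
                          reduction="batchmean") * T * T
            return (1 - alpha) * ce + alpha * kl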

    Data augmentation scheme based on conditional generative adversarial network in federated learning
    Yinlong JIAN, Xuebin CHEN, Zhongrui JING, Qi ZHONG, Zhenbo ZHANG
    2026, 46(1):  21-32.  DOI: 10.11772/j.issn.1001-9081.2024121817

    To address the challenges of slow convergence and low model accuracy in non-independent and identically distributed (Non-IID) scenarios, this paper proposed a Data Augmentation scheme based on conditional Generative Adversarial Network in Federated learning (FDA-GAN). First, a conditional generator for class selection was designed, adding an independent network module to each class and using the label as conditional information to more accurately extract specific features for each class. Second, a client selection strategy covering all classes was proposed. Based on the comprehensive reward of the clients, a client set containing as many classes as possible was selected for training, ensuring that the Generative Adversarial Network (GAN) could learn the complete class distribution. Finally, generated samples were used to augment the local datasets of the clients, optimizing the feature composition of the local data and reducing bias between clients. Experimental results show that under DIRichlet distributed (DIR) data partitioning, compared to CAP-GAN (Collaborated gAme Parallel learning based on GAN), FDA-GAN improves the MNIST Score (MNIST inception Score) and Mode Score by 2.67 and 1.08 respectively, and reduces the FID (Fréchet Inception Distance) and MMD (Maximum Mean Discrepancy) scores by 55.12 and 2.56 respectively; in different Non-IID scenarios, the FedAvg (Federated Averaging) and FedProx (Federated Proximal) algorithms, when combined with FDA-GAN, converge within 50 communication rounds, with accuracy improvements of at least 30.36 percentage points. This demonstrates that FDA-GAN can improve the quality and diversity of generated samples, and when combined with baseline algorithms, it can significantly improve the accuracy and convergence speed of the federated model.
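
    The class-covering client selection can be sketched as a simple greedy rule (illustrative only; client_classes maps each client to the set of classes it holds, and client_reward stands in for the paper's comprehensive reward):

        def select_clients(client_classes: dict, client_reward: dict, budget: int) -> list:
            """Greedily pick up to `budget` clients, preferring those that add unseen
            classes and, as a tie-breaker, those with a higher comprehensive reward."""
            chosen, covered = [], set()
            remaining = set(client_classes)
            while remaining and len(chosen) < budget:
                best = max(remaining,
                           key=lambda c: (len(client_classes[c] - covered), client_reward[c]))
                chosen.append(best)
                covered |= client_classes[best]
                remaining.remove(best)
            return chosen

    This greedy coverage rule is the standard maximum-coverage heuristic; it guarantees that each selected client contributes as many new classes as possible given the clients already chosen.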

    Federated split learning optimization method under edge heterogeneity
    Hao YU, Jing FAN, Yihang SUN, Yadong JIN, Enkang XI, Hua DONG
    2026, 46(1):  33-42.  DOI: 10.11772/j.issn.1001-9081.2024121840

    Federated Learning (FL), as a privacy-preserving distributed learning framework, solves the data silo problem effectively. However, the heterogeneous characteristics of Internet of Things (IoT) terminal devices in real-world scenarios, particularly device performance variations and Non-Independent and Identically Distributed (Non-IID) data properties, degrade model performance and convergence speed. To address these challenges, a federated split learning optimization method under edge heterogeneity named FedCRS (Federated Cluster-based Round Splitting) was proposed. Firstly, an adaptive clustering strategy based on device performance was developed to cluster the clients dynamically and assign customized sub-models to the clusters to balance computational loads, thereby mitigating the straggler effect caused by device heterogeneity. Then, a ring-topology cyclic model transfer mechanism was created, in which local model fusion of intra-cluster clients was implemented to alleviate client drift caused by data heterogeneity while significantly enhancing global model robustness and generalization capability. Experimental results on three label-heterogeneous datasets (FMNIST, CIFAR-10, and CIFAR-100) demonstrate that, compared with five baseline methods (FedAvg (Federated Averaging), FedProx, MOON (MOdel-cONtrastive learning), SplitFed, and SplitMix (Split Mixing)), FedCRS achieves the best accuracy on all the datasets, with accuracy improvements of at least 8.7, 11.1, and 2.1 percentage points, respectively; on the FMNIST and CIFAR-10 datasets, FedCRS accelerates convergence by 78.1% and 13.2%, respectively. It can be seen that FedCRS is effective in optimizing model accuracy and convergence speed under edge-heterogeneous environments, indicating good prospects for practical application.

    Knowledge tracking model based on concept association memory network with graph attention
    Fan HE, Li LI, Zhongxu YUAN, Xiu YANG, Dongxuan HAN
    2026, 46(1):  43-51.  DOI: 10.11772/j.issn.1001-9081.2025010065

    Tracking students' historical interactions to predict their future performance is a critical research focus in the field of Knowledge Tracing (KT). Recent KT methods aim to explore students' learning patterns and evolving knowledge states to provide personalized learning guidance, but ignore the richness of the exercises themselves. Additionally, with the emergence of new disciplines and interdisciplinary fields, Graph Neural Network (GNN)-based KT methods face challenges such as broadening the scope of concept associations and modeling students' learning behaviors. To address these challenges, a novel KT model was proposed, termed the Knowledge Tracking model based on concept association Memory network with Graph Attention (GAMKT). GAMKT is capable of modeling students' exercise interaction sequences, tracking their knowledge states, and capturing global features of related concepts from the exercise-concept graph. Moreover, a forgetting gate and higher-order information extraction were incorporated into the model to realistically simulate students' exercise-solving processes. Experimental results on the Junyi, ASSIST09, and Static2011 datasets demonstrate that, compared with seven baseline models including Graph-based Knowledge Tracing (GKT), GAMKT achieves average improvements of approximately 2.1% in Area Under the Curve (AUC) and 2.4% in Accuracy, indicating that GAMKT outperforms the baseline methods on datasets with well-structured knowledge.

    Bimodal fusion method for constructing spatio-temporal knowledge graph in smart home space
    Fei WANG, Ye TAO, Jiawang LIU, Wei LI, Xiugong QIN, Ning ZHANG
    2026, 46(1):  52-59.  DOI: 10.11772/j.issn.1001-9081.2025010114

    The development of smart home field relies on the construction of a rich spatial-temporal knowledge graph to support the design and execution of downstream tasks. However, constructing a spatio-temporal knowledge graph of smart home space faces challenges such as diverse data sources, low data quality, and limited scale. Therefore, a bimodal knowledge extraction framework integrating relative location information of description documents and user behavior logs was proposed to mine multi-modal information in device description documents and user behavior logs fully, so as to achieve efficient and accurate knowledge extraction and graph construction. The framework was composed of two parts: firstly, a method based on Relative Position Layout Matching (RPLM) was proposed to utilize the relative position characteristics of device description documents to match image with text in device description documents correctly. At the same time, the ontology model of description documents was designed and integrated with the Large Language Model (LLM) to extract structured information and construct the knowledge graph of description documents. Secondly, the Functional Correlation Analysis (FCA) algorithm and the Device Usage Behavior Processing (DUBP) algorithm were designed to extract function associated device information from user behavior logs and construct the spatio-temporal knowledge graph of home space. Finally, LayoutLMv3, ERNIE-Layout, GeoLayoutLM, and other models were selected as benchmark models, and the verification was carried out on a self-built Chinese Manual Document Layout Analysis (CMDLA) dataset, a synthesized user behavior log dataset, and three public document analysis datasets. The results show that the proposed framework outperforms the baseline methods in terms of accuracy and efficiency of knowledge extraction on the family domain dataset, achieving an accuracy of 96.39%, which is 0.97 percentage points higher than that of the suboptimal GeoLayoutLM method. It demonstrates significant advantages in heterogeneous data fusion and spatiotemporal modeling tasks.

    Zero-shot re-ranking method by large language model with hierarchical filtering and label semantic extension
    Xinran XIE, Zhe CUI, Rui CHEN, Tailai PENG, Dekun LIN
    2026, 46(1):  60-68.  DOI: 10.11772/j.issn.1001-9081.2025010082

    To address the challenges of insufficient label semantic understanding, vague relationship modeling, and high computational costs of Large Language Models (LLMs) in zero-shot re-ranking tasks, a hierarchical filtering and label semantic extension method named HFLS (Hierarchical Filtering and Label Semantics) was proposed. In this method, by constructing a multi-level label semantic extension path, a progressive prompting strategy of "keyword matching → semantic association → domain knowledge integration" was designed to guide LLMs in deep relational reasoning. At the same time, a hierarchical filtering mechanism was introduced to reduce computational complexity while retaining high-potential candidate documents. Experimental results indicate that on seven benchmark datasets such as TREC-DL2019, HFLS achieves average gains of 21.92%, 13.43% and 8.59% in NDCG (Normalized Discounted Cumulative Gain)@10 compared to the Pointwise methods Pointwise.qg, Pointwise.yes_no, and Pointwise.3Label, respectively. In terms of reasoning efficiency, HFLS reduces the processing latency per query by 91.06%, 68.87% and 33.54% compared to Listwise, Pairwise, and Setwise methods, respectively.
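
    A minimal sketch of the hierarchical filtering idea is given below (the callables lexical_score and llm_score are placeholders for the cheap first-stage scorer and the LLM prompting stage; the cut-off keep is illustrative):

        def hierarchical_rerank(query, docs, lexical_score, llm_score, keep=20):
            """Stage 1: rank all candidates with a cheap score and keep the top `keep`.
            Stage 2: call the expensive LLM scorer only on the surviving candidates."""
            shortlist = sorted(docs, key=lambda d: lexical_score(query, d), reverse=True)[:keep]
            return sorted(shortlist, key=lambda d: llm_score(query, d), reverse=True)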

    Multi-feature fusion speech emotion recognition method based on SAA-CNN-BiLSTM network
    Zhihui ZAN, Yajing WANG, Ke LI, Zhixiang YANG, Guangyu YANG
    2026, 46(1):  69-76.  DOI: 10.11772/j.issn.1001-9081.2025010042

    Aiming at the problems that single speech emotion features represent speech information incompletely and that models make low use of speech features, a multi-feature fusion speech emotion recognition method based on the SAA-CNN-BiLSTM network was proposed. In this method, the data were augmented by introducing noise, volume, and audio-rate perturbations, enabling the model to learn diverse data features, and multiple features such as fundamental frequency, time-domain, and frequency-domain features were integrated to represent emotional information comprehensively from different perspectives. Besides, on the basis of the Bidirectional Long Short-Term Memory (BiLSTM) network, a Convolutional Neural Network (CNN) was introduced to capture the spatial correlation of the input data and extract more representative features. At the same time, a Simplified Additive Attention (SAA) mechanism was constructed to simplify the explicit query keys and query vectors, so that the calculation of attention weights did not depend on specific query information. Features of different dimensions were correlated and influenced each other through the attention weights, so that information between features was interacted and fused, thus improving the effective utilization of features. Experimental results show that this method achieves weighted precision of 87.02%, 82.59%, and 73.13% on the EMO-DB, CASIA, and SAVEE datasets, respectively. Compared with baseline methods such as Incremental Convolution (IncConv), Novel Heterogeneous Parallel Convolution BiLSTM (NHPC-BiLSTM), and Dynamic Convolutional Recurrent Neural Network (DCRNN), the improvements are 0.52-9.80, 2.92-23.09, and 3.13-16.63 percentage points, respectively.
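
    One plausible reading of the query-free attention described above is sketched below (PyTorch; the projection sizes and the pooling at the end are assumptions rather than the paper's exact SAA definition):

        import torch
        import torch.nn as nn

        class SimplifiedAdditiveAttention(nn.Module):
            """Query-free additive attention: each time step's weight is computed
            from the feature itself, so no explicit query/key vectors are needed."""
            def __init__(self, dim):
                super().__init__()
                self.proj = nn.Linear(dim, dim)
                self.score = nn.Linear(dim, 1, bias=False)

            def forward(self, x):                                   # x: (batch, time, dim)
                w = self.score(torch.tanh(self.proj(x)))            # (batch, time, 1)
                w = torch.softmax(w, dim=1)
                return (w * x).sum(dim=1)                           # weighted pooling -> (batch, dim)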

    Data science and technology
    On-chain data query optimization based on hybrid index
    Ruiyang ZHANG, Mingjie ZHAO, Bing GUO, Pinghong JIANG
    2026, 46(1):  77-84.  DOI: 10.11772/j.issn.1001-9081.2025010067

    Aiming at the problems of low query efficiency and limited query types in on-chain data querying of blockchain systems, an inter-block index model was proposed. Firstly, for discrete attributes within blocks, an Inverted Bloom Filters (IBFS) index was introduced; when querying data with IBFS, target blocks can be located in O(1) time complexity without traversing all blocks. Secondly, for continuous attributes, a clustering algorithm was employed to calculate fine-grained data distribution intervals within a block, and a Dual-Layer Clustering Chain (DLCC) index was constructed by combining these intervals with the maximum/minimum values of the data in the block, thereby enabling more non-target blocks to be filtered out during queries. Finally, based on the proposed index model, various query algorithms were designed and implemented. Experimental results show that compared with the tree Bloom filter index, the IBFS index reduces the storage space by 51.0% and shortens the time to locate target blocks by 75.9%; compared with the start-end interval index, the DLCC index reduces the number of located blocks during range queries by 55.5%.
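
    The point-query path can be pictured with the toy structure below, which keeps one bitmap of candidate blocks per hashed attribute value so that a lookup is O(1) in the number of blocks (an illustrative stand-in for the paper's IBFS layout, not its actual implementation):

        import hashlib

        class InvertedBlockIndex:
            """Toy inverted index: for each discrete attribute value, keep a bitmap
            (one bit per block) of the blocks that may contain it, so a point query
            locates candidate blocks without scanning the chain."""
            def __init__(self):
                self.bitmaps = {}                                   # hashed value -> int bitmap

            @staticmethod
            def _h(value) -> str:
                return hashlib.sha256(str(value).encode()).hexdigest()

            def insert(self, value, block_no: int):
                key = self._h(value)
                self.bitmaps[key] = self.bitmaps.get(key, 0) | (1 << block_no)

            def query(self, value):
                bits = self.bitmaps.get(self._h(value), 0)
                return [i for i in range(bits.bit_length()) if bits >> i & 1]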

    Deep evolutionary topic clustering model
    Ziyang CHENG, Ruizhang HUANG, Jingjing XUE
    2026, 46(1):  85-94.  DOI: 10.11772/j.issn.1001-9081.2025010126

    To address the topic ambiguity and inaccurate alignment problems of existing deep document clustering methods when processing dynamic textual data whose topics vary over time, a Deep Evolutionary Topic Clustering Model (DETCM) was proposed. In DETCM, information about time-varying topics was captured from dynamic text, and historical topic information was integrated with the document features of the current time slice, thereby discovering event topic trajectories and generating dynamic document topic representations. Specifically, to alleviate the ambiguity of time-varying topics, a topic fusion learning module based on a hybrid encoder was designed, in which topic information from preceding time slices was utilized to enhance topic discrimination and feature extraction for the current time slice. Furthermore, a topic inheritance module across time slices was designed to achieve topic matching and alignment across different time slices, so that topic information from historical slices was effectively transferred and incorporated into the cluster assignment process of the current time slice. Experimental results on the real-world arXiv evolving textual document dataset demonstrate that, compared with the Deep Evolutionary Document Clustering model with Instance-level Mutual Attention Enhancement (DEDC-IMAE), DETCM achieves an average improvement of 3.08% (-0.37% to 5.43%) in Normalized Mutual Information (NMI) across all time slices, verifying the superior capability of DETCM in tracking topic evolution under dynamic scenarios, capturing temporal variation features of topics more accurately, and delivering better clustering performance.

    Recommendation method integrating user behaviors and improved long-tail algorithm
    Yancui SHI, Haozhe QIN
    2026, 46(1):  95-103.  DOI: 10.11772/j.issn.1001-9081.2024121727

    To address the problem that the long-tail effect fails to fully consider users' personalized behaviors when dividing popular items and long-tail items, a recommendation method integrating user behaviors and an improved long-tail algorithm was proposed. Firstly, Bidirectional Encoder Representations from Transformers (BERT) was utilized to encode item attribute information, and items were clustered according to the encoding results. At the same time, personalized popular items and long-tail items were re-divided for each user according to the user's interaction records with different clusters, thereby integrating personalized user behaviors into the division of popular items. Secondly, the user's popularity sensitivity was evaluated on the basis of interaction records, thereby fully considering the extent to which popularity factors influence the user. Finally, a novel negative sampling method was proposed, in which different negative sampling strategies were adopted for users with varying popularity sensitivities, and user preference clustering was integrated to select higher-quality negative samples. Experimental results on three public real-world datasets demonstrate that, compared to the traditional 80-20 division method, the proposed personalized division method improves recall, Hit Rate (HR), and Normalized Discounted Cumulative Gain (NDCG). In the resampling experiment, the average NDCG@20 for the original, popular, and long-tail data across the three datasets is increased by 0.45, 1.03, and 2.33 percentage points, respectively. Compared with the optimal baseline model NNS (Noise-free Negative Sampling), the proposed negative sampling method improves metrics such as HR and NDCG, achieving gains of 2.72, 1.37, and 5.93 percentage points in average NDCG@20 on the original, popular, and long-tail data, respectively, which validates its effectiveness.
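
    The per-user division of popular versus long-tail items can be sketched as follows (item_cluster maps an item to its BERT-based cluster; the head_ratio cut-off is illustrative and replaces the global 80-20 rule):

        from collections import Counter

        def personalized_split(user_history, item_cluster, head_ratio=0.2):
            """Divide one user's items into 'popular' and 'long-tail' groups based on
            how often the user interacted with each item's cluster."""
            cluster_hits = Counter(item_cluster[i] for i in user_history)
            ranked = [c for c, _ in cluster_hits.most_common()]
            head = set(ranked[:max(1, int(len(ranked) * head_ratio))])
            popular = [i for i in user_history if item_cluster[i] in head]
            long_tail = [i for i in user_history if item_cluster[i] not in head]
            return popular, long_tail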

    Session-based recommendation model based on time-aware and space-enhanced dual channel graph neural network
    Xingyao YANG, Zheng QI, Jiong YU, Zulian ZHANG, Shuai MA, Hongtao SHEN
    2026, 46(1):  104-112.  DOI: 10.11772/j.issn.1001-9081.2025010097

    To address the problem that session-based recommendation models ignore temporal information and spatial relationships among items, and thus cannot accurately capture complex transition patterns among items, a session-based recommendation model based on a time-aware and space-enhanced dual-channel Graph Neural Network (GNN) was proposed. Firstly, in the temporal channel, adaptive temporal weights were used to process the items, thereby constructing a time-aware session graph, and the users' interest-shifting patterns were captured through a time-aware GNN. Secondly, in the spatial channel, spatial relationships among items were embedded into a Graph ATtention network (GAT), so as to aggregate information from the perspective of the spatial graph structure. Finally, a contrastive learning strategy was introduced to enhance recommendation performance. Comparative experiments were conducted on three publicly available datasets, Diginetica, Tmall, and Nowplaying, against baseline models including Atten-Mixer (multi-level Attention Mixture network) and GCE-GNN (Global Context Enhanced GNN). The results show that the proposed model achieves superior precision (P) and Mean Reciprocal Rank (MRR): compared to the suboptimal results, it improves P@10 by 2.09%, 24.97%, and 10.45%, and MRR@10 by 2.52%, 11.60%, and 4.43%, respectively.

    Time series prediction model based on statistical distribution sensing and frequency domain dual-channel fusion
    Junheng WU, Xiaodong WANG, Qixue HE
    2026, 46(1):  113-123.  DOI: 10.11772/j.issn.1001-9081.2024121750

    Aiming at the prediction difficulties caused by periodic complexity and high-frequency noise in time series data, a time series prediction model based on statistical distribution sensing and frequency-domain dual-channel fusion was proposed to mitigate data drift, suppress noise interference, and improve prediction accuracy. Firstly, the original time series data was divided into overlapping window slices, the statistical distribution of the data in each slice was calculated and normalized, and a MultiLayer Perceptron (MLP) was used to predict the statistical distribution of future data. Then, adaptive time-frequency transformation was applied to the normalized series, and the correlation features within the frequency domain and between channels were strengthened through a channel-independent encoder and a channel-interactive learner, so as to obtain a multi-scale frequency-domain representation. Finally, a linear prediction layer was used to complete the inverse transformation from the frequency domain to the time domain. In the output stage, the model used the predicted statistical distribution of future data to perform inverse normalization, thereby generating the final prediction results. Comparative experimental results against the current mainstream time series prediction model PatchTST (Patch Time Series Transformer) show that on the Exchange, ETTm2, and Solar datasets, the proposed model reduces the Mean Square Error (MSE) by an average of 5.3% and the Mean Absolute Error (MAE) by an average of 4.0%, demonstrating good noise suppression capability and prediction performance. Ablation experimental results show that the statistical distribution sensing, adaptive frequency-domain processing, and dual-channel fusion modules all contribute significantly to improving prediction accuracy.
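
    The statistics-aware normalization at the two ends of the pipeline can be sketched as below (a minimal sketch; the MLP that predicts the future (mean, std) pair is omitted and assumed to be supplied by the caller):

        import numpy as np

        def normalize_window(x):
            """Normalize one slice by its own statistics; return the stats for later use."""
            mu, sigma = x.mean(), x.std() + 1e-8
            return (x - mu) / sigma, (mu, sigma)

        def denormalize(pred_norm, predicted_stats):
            """Map a normalized forecast back to the original scale using the
            statistics predicted for the future window."""
            mu_hat, sigma_hat = predicted_stats
            return pred_norm * sigma_hat + mu_hat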

    Cyber security
    Clean-label multi-backdoor attack method based on feature regulation and color separation
    Yingchun TANG, Rong HUANG, Shubo ZHOU, Xueqin JIANG
    2026, 46(1):  124-134.  DOI: 10.11772/j.issn.1001-9081.2024121776

    To address the lack of stealth and flexibility in traditional backdoor attacks, a clean-label multi-backdoor attack method based on feature regulation and color separation was proposed, in which a poisoned network was trained to embed triggers within an information hiding framework. Firstly, image edges were used as triggers, a feature regulation strategy was designed, and adversarial perturbation and a surrogate model were combined to assist the training of the poisoned network and enhance the significance of trigger features. Secondly, a color separation strategy was proposed to color the triggers, giving them distinguishable RGB colors, and one-hot target confidences corresponding to the colors were set to guide training, thereby ensuring the distinguishability of trigger features. To verify the effectiveness of the proposed method, experiments were conducted on 3 datasets (CIFAR-10, ImageNet-10 and GTSRB) and 5 models. The results show that in the single-backdoor scenario, the proposed method achieves an Attack Success Rate (ASR) over 98% on all three datasets, outperforming the second-best method by 7.94, 1.70, and 8.61 percentage points, respectively; in the multi-backdoor scenario, the proposed method achieves an ASR over 90% on the ImageNet-10 dataset, outperforming the second-best method by an average of 36.63 percentage points. The results of ablation experiments verify the rationality of the feature regulation and color separation strategies as well as the contributions of the adversarial perturbation and the surrogate model. The results of the multi-backdoor experiment demonstrate the flexibility of the proposed attack method.

    AI-Agent based method for hidden RESTful API discovery and vulnerability detection
    Yi LIN, Bing XIA, Yong WANG, Shunda MENG, Juchong LIU, Shuqin ZHANG
    2026, 46(1):  135-143.  DOI: 10.11772/j.issn.1001-9081.2025070909

    The popularity of RESTful APIs in modern Web services has gradually made API security a critical concern. Mainstream tools for API discovery and vulnerability detection are of limited effectiveness in discovering hidden or undocumented APIs because they rely on API documents or public paths for scanning, and they have high false positive rates in complex or dynamic API environments. To address these challenges, A2A (Agent to API vulnerability detection), an agent system for hidden API discovery and vulnerability detection, was proposed, in which agents communicate seamlessly via the Model Context Protocol (MCP) to realize full-process automation from hidden API discovery to vulnerability detection. In A2A, adaptive enumeration and HTTP response analysis were employed to discover potential hidden API endpoints automatically, and a service-specific API fingerprint library was combined to confirm the discovered hidden APIs. For API vulnerability detection, Large Language Model (LLM) and Retrieval-Augmented Generation (RAG) techniques were integrated into A2A, and high-quality test cases were generated automatically through a feedback-based iterative optimization mechanism to verify whether a vulnerability exists. Experimental evaluation results indicate that A2A achieves an average API discovery rate of 91.9% with a false discovery rate of 7.8%, and successfully discovers multiple hidden API vulnerabilities previously undetected by NAUTILUS and RESTler.

    Deep compressive sensing network for IoT images and its chaotic encryption protection method
    Yingjie MA, Jingying QIN, Geng ZHAO, Jing XIAO
    2026, 46(1):  144-151.  DOI: 10.11772/j.issn.1001-9081.2025020144

    Concerning the problem that the transmission and storage of massive redundant image data in the Internet of Things (IoT) lead to high resource consumption and potential privacy leakage, a Deep Compressive Sensing (DCS) network for IoT images and its chaotic encryption protection method were proposed. Firstly, an improved DCS network was proposed to achieve high-quality image compression and reconstruction. In this network, the residual blocks of the conventional deep reconstruction network were modified with a channel attention mechanism, and a parallel fusion design integrating multi-scale branches and a fusion module was adopted, thereby improving the reconstruction performance of the traditional deep reconstruction network based on stacked residual convolutional layers. Secondly, a multi-cavity chaotic system was proposed to realize spherical cavity expansion in the XY plane or the Z direction, either individually or in combination, through spherical coordinate transformation and two sets of parity-controlled step functions. This system has chaotic properties and randomness, making it suitable for image encryption. Finally, based on the proposed DCS network and the multi-cavity chaotic system, encryption and decryption schemes for DCS-measured images were designed using chaotic index scrambling and diffusion, and a detailed security analysis was conducted, thereby guaranteeing the security of image transmission. Experimental results show that compared with the classical DCS method CSNet+, the proposed network achieves an average increase of 0.606 dB (0.25-1.42 dB) in Peak Signal-to-Noise Ratio (PSNR) and an average increase of 1.11 percentage points (0.69-2.17 percentage points) in Structural Similarity Index Measure (SSIM).

    Image encryption algorithm based on cascaded chaotic system and filter diffusion
    Mengmeng LI, Jiaxin HUANG, Jiawen LI, Shanshan LI
    2026, 46(1):  152-160.  DOI: 10.11772/j.issn.1001-9081.2025010043

    In response to the small parameter ranges of existing chaotic systems and the poor diffusion effect of existing encryption algorithms, a new cascaded chaotic system and a filter diffusion model were designed, and a color image encryption algorithm without limits on image size was proposed and implemented. Firstly, a chaotic system named 2D-SIHC (Two-Dimensional Sine-Iterative-Henon Chaotic system) was proposed with Henon mapping, Sine mapping, and Iterative mapping as seed mappings, and a linear function was incorporated to extend the parameter range. Of the two-dimensional sequence generated by this system, one dimension was employed to disturb pixel positions, and the other dimension was applied to update filter templates and perform diffusion operations. Secondly, to avoid key reuse that would reduce the algorithm's security, the SHA-512 algorithm was combined with the plaintext image to generate the key through the calculation of flag bits and weights. Thirdly, to enhance the algorithm's diffusion effect, a two-dimensional filter diffusion model was designed. Unlike traditional filter diffusion, which alters image pixel values by traversing a fixed template, the new diffusion model modifies image pixel values dynamically by introducing chaotic sequences to update the filter template values continuously. Finally, encryption was realized. Experimental results show that, taking the Airplane image as an example, the proposed algorithm achieves a Number of Pixel Change Rate (NPCR) of 99.605 9% and a Unified Average Change Intensity (UACI) of 33.397 1%, which are very close to the ideal values. In addition, the algorithm can resist noise interference of 0.2 intensity and a cropping attack with 50% of the image missing, and it has high encryption efficiency.

    Watermarking method for diffusion model output
    Yuan JIA, Deyu YUAN, Yuquan PAN, Anran WANG
    2026, 46(1):  161-168.  DOI: 10.11772/j.issn.1001-9081.2025010006

    To address the issues of image authenticity verification in deepfake detection and model copyright protection, a high-quality and highly robust watermarking method for diffusion model outputs, DeWM (Decoder-driven WaterMarking for diffusion model), was proposed. Firstly, a decoder-driven watermark embedding network was proposed to realize direct sharing of encoder and decoder features, so as to produce watermarks with high robustness and invisibility. Then, a fine-tuning strategy was designed to fine-tune the pre-trained diffusion model's decoder and embed a specific watermark into all generated images, thereby achieving simple and effective watermark embedding without changing the model architecture or the diffusion process. Experimental results show that compared with the Stable Signature method on the MS-COCO dataset, when the watermark bit-length is increased to 64 bits, the proposed method improves the Peak Signal-to-Noise Ratio (PSNR) and Structure SIMilarity (SSIM) of the generated watermarked images by 14.87% and 9.41%, respectively. Moreover, the average bit accuracy of watermark extraction under cropping, brightness adjustment, and image reconstruction is enhanced by more than 3%, demonstrating significantly improved robustness.

    Semantic privacy protection mechanism of vehicle trajectory based on improved generative adversarial network
    Na FAN, Chuang LUO, Zehui ZHANG, Mengyao ZHANG, Ding MU
    2026, 46(1):  169-180.  DOI: 10.11772/j.issn.1001-9081.2024121843

    Aiming at the problem of ensuring the effectiveness and mining analysis value of trajectory semantic data while realizing personalized privacy protection of vehicle trajectory data, a vehicle trajectory semantic protection mechanism based on improved Generative Adversarial Network (GAN) was proposed. In this mechanism: firstly, a position sensitivity grading and semantic annotation method based on Hidden Markov Model (HMM) was designed to extract the effective stop points from vehicle trajectories, and then the stop points were divided into different sensitive levels and annotated semantically. Secondly, Long Short-Term Memory (LSTM) network was introduced into the improved GAN to construct the semantic trajectory model based on the dynamic GAN, and the GAN model was used for training to generate high-quality synthetic trajectories. Finally, for the stop points in synthetic trajectories that required further privacy protection, a differential privacy personalized protection algorithm combining the position sensitivity levels was proposed, which assigned privacy budgets to the stop points according to their sensitivity level and correlation between the stop points, and noise was injected by combining with the Laplace mechanism to achieve the privacy protection, so as to maximize the usability of the trajectory data after protection. Experimental results show that compared to the LSTM-TrajGAN model, the proposed mechanism reduces the Mutual Information (MI) value by 27.58% and improves the semantic trajectory similarity by 24.4%. It can be seen that the proposed mechanism protects user privacy effectively while ensuring the usability of semantic trajectory data.
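
    The final protection step can be illustrated with the standard Laplace mechanism, with the per-point budget split by sensitivity level (the inverse-level allocation rule and the 2D coordinate form are assumptions for illustration, not the paper's exact scheme):

        import numpy as np

        def protect_stop_points(points, levels, total_eps=1.0, sensitivity=1.0, seed=0):
            """Split the total privacy budget across stop points in inverse proportion to
            their sensitivity level (higher level -> smaller epsilon -> more noise), then
            perturb each coordinate with Laplace noise of scale sensitivity/epsilon."""
            rng = np.random.default_rng(seed)
            weights = 1.0 / np.asarray(levels, dtype=float)         # levels start at 1
            eps = total_eps * weights / weights.sum()                # per-point budgets sum to total_eps
            noisy = []
            for (x, y), e in zip(points, eps):
                scale = sensitivity / e
                noisy.append((x + rng.laplace(0, scale), y + rng.laplace(0, scale)))
            return noisy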

    Advanced computing
    Hybrid particle swarm optimization for solving vehicle routing problems with time windows
    Luhui ZHOU, Xuezhi YUE
    2026, 46(1):  181-187.  DOI: 10.11772/j.issn.1001-9081.2025010113

    To efficiently solve Vehicle Routing Problems with Time Windows (VRPTW), a Hybrid Particle Swarm Optimization (HPSO) algorithm was proposed. This algorithm replaced the traditional particle update method with Partially Matched Crossover (PMX), enhanced diversity by combining worst-neighbor particle selection with a roulette wheel selection mechanism, and balanced global exploration and local exploitation through a dynamic weight adjustment strategy. A Variable Neighborhood Search (VNS) integrating 2-opt inversion, sequential insertion, and swap operations was designed to optimize solution quality, and a greedy algorithm was used to quickly generate high-quality initial solutions. Experimental results on the Solomon standard test set show that the HPSO algorithm keeps the gap to the best-known solutions within 1% for 69% of the test problems in the datasets with 25 and 50 customers, and obtains solutions close to the optimum for the C-class test problems with 100 customers, demonstrating its effectiveness and competitiveness in solving complex VRPTW. On the datasets with 100 customers, compared with the Neighborhood Comprehensive Learning Particle Swarm Optimization (N-CLPSO) algorithm, the HPSO algorithm reduces the standard deviation by at least 2.4% on the RC102 test problem, and improves the convergence speed by an average of 41% (59% and 23%) on the C101 and R101 test problems. Through the collaborative optimization of multiple strategies, the HPSO algorithm can significantly improve the solution accuracy, convergence efficiency, and robustness for complex VRPTW.
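
    The 2-opt inversion used inside the VNS can be sketched as follows (cost is a caller-supplied route evaluation, for example travel distance plus time-window penalties; depot handling is omitted):

        def two_opt_inversion(route, i, j):
            """Reverse the segment route[i:j+1]; the basic 2-opt move on a customer sequence."""
            return route[:i] + route[i:j + 1][::-1] + route[j + 1:]

        def best_two_opt(route, cost):
            """Try all 2-opt inversions and keep the best-improving neighbor, if any."""
            best, best_cost = route, cost(route)
            for i in range(len(route) - 1):
                for j in range(i + 1, len(route)):
                    cand = two_opt_inversion(route, i, j)
                    c = cost(cand)
                    if c < best_cost:
                        best, best_cost = cand, c
            return best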

    Differential evolution algorithm integrating mutation strategy and adjacency information
    Min RAN, Dazhi PAN
    2026, 46(1):  188-197.  DOI: 10.11772/j.issn.1001-9081.2025010068

    Aiming at the multi-objective Vehicle Routing Problem (VRP) with time windows, a Differential Evolution algorithm integrating Mutation Strategy and Adjacency Information (DE-MSAI) was proposed. Firstly, four mutation operators were designed using an elite sampling strategy to increase the search breadth of the algorithm. Secondly, a customer adjacency information matrix was used to guide the neighborhood search of individuals, thereby improving local optimization efficiency. Finally, a simulated annealing criterion was adopted to accept inferior solutions with a certain probability, and if the number of iterations during which the Pareto non-dominated solution set remained unimproved exceeded a threshold, an elite fragment protection strategy was activated to perturb a randomly selected solution from the non-dominated solution set, thereby maintaining population diversity. Simulation results on Solomon standard library instances show that the proposed algorithm keeps the solving error within 0.07% of the Hybrid Crow Search Algorithm (HCSA), and outperforms the K-means clustering and Improved Large Neighborhood Search Algorithm (K-means-ILNSA) in most cases, achieving an average reduction of 4.51% in the route deviation metric and verifying the effectiveness of the algorithm.

    Network and communications
    Low-latency data transmission routing method for LEO satellites considering laser antenna alignment overhead
    Zifen HE, Deze ZENG, Lifeng TIAN, Yuepeng LI, Jiayu ZHANG
    2026, 46(1):  198-206.  DOI: 10.11772/j.issn.1001-9081.2025020140

    In recent years, the growing number of Low Earth Orbit (LEO) satellites and their enhanced capabilities have enabled LEO constellations to undertake a wider range of in-orbit missions, resulting in exponential growth in satellite data transmission volumes. Emerging Laser Communication (LC) technology, with its higher bandwidth advantage, significantly improves the efficiency of Inter-Satellite Link (ISL) data transmission. However, LC-based inter-satellite communication first requires alignment between the transmitting and receiving antennas, a process that incurs considerable time overhead. Furthermore, the limited number of laser transceivers onboard satellites increases communication latency, while unreasonable laser routing and link planning strategies can further exacerbate delays. To address these challenges, this study investigated the satellite routing communication link planning problem considering laser antenna alignment overhead, and formulates it as a Mixed Integer Linear Programming (MILP) model to minimize total communication time. As the problem was proven to be NP-hard, a low-overhead approximation algorithm named Relay Satellite Laser Routing (RSLR) was proposed. The performance of RSLR was compared with two baseline algorithms: Average Latency-based Persistent Routing (ALPR) and Minimum Hop Earliest Arrival (MHEA). Experimental results show that the RSLR algorithm reduces communication latency by 10.3% and 12.5%, respectively, compared to ALPR and MHEA, confirming its effectiveness in lowering delays in satellite data transmission.

    Multimedia computing and computer simulation
    Multi-target 3D visual grounding method based on monocular images
    Shuwen HUANG, Keyu GUO, Xiangyu SONG, Feng HAN, Shijie SUN, Huansheng SONG
    2026, 46(1):  207-215.  DOI: 10.11772/j.issn.1001-9081.2025010074

    In view of the problems that the existing 3D visual grounding methods rely on expensive sensor equipment, have high system costs, and exhibit poor accuracy and robustness in complex multi-target grounding scenarios, a multi-target 3D visual grounding method based on monocular images was proposed. In this method, natural language descriptions were combined to achieve the recognition of multiple 3D targets in a single RGB image. To this end, a multi-target visual grounding dataset, Mmo3DRefer, was constructed, and a cross-modal matching network, TextVizNet, was designed. In TextVizNet, 3D bounding boxes for targets were generated by a pre-trained monocular detector, and visual and linguistic information was integrated deeply via an information fusion module and an information alignment module, thereby realizing text-guided multi-target 3D detection. Experimental results of comparing with 5 existing advanced methods including CORE-3DVG (Contextual Objects and RElations for 3D Visual Grounding), 3DVG-Transformer, and Multi3DRefer (Multiple 3D object Referencing dataset and task) show that TextVizNet improves the F1-score, precision, and recall by 8.92%, 8.39%, and 9.57%, respectively, on the Mmo3DRefer dataset compared with the second-best method Multi3DRefer, improving the precision of text-based multi-target grounding in complex scenarios significantly, and providing effective support for practical applications such as autonomous driving and intelligent robotics.

    3D face generation method based on latent feature enhancement for disentanglement
    Jinyu LIANG, Hongjuan GAO, Xiaofei DU
    2026, 46(1):  216-223.  DOI: 10.11772/j.issn.1001-9081.2025010051

    Aiming at the problems of insufficient interpretability of latent features, limited disentanglement capability, and poor identity consistency in existing 3D face generation methods, a 3D face generation method based on Latent Feature Enhancement for Disentanglement (LFED) was proposed. Firstly, a hierarchical clustering technique was used to construct a vector discretization module, encouraging the latent features to absorb prior knowledge and improving disentanglement performance. Secondly, a positional attention module was designed to integrate the location information of the latent features selectively through an element-wise summation operation, so as to ensure the identity consistency of the generated faces. Finally, combining prior knowledge and position information, a maximum normalization technique was used to enhance the interpretability of the latent features in the face generation process. Experimental results demonstrate that the proposed method achieves an accuracy of 95.67% on the latent feature disentanglement metric Variability Predictability (VP). Compared with Swap Disentangled Variational Auto-Encoder (SD-VAE), Local Eigenprojection Disentangled Variational Auto-Encoder (LED-VAE), and Spherical Harmonic Local Eigenprojection Disentangled Variational Auto-Encoder (SHLED-VAE), the improvements are 14.96, 14.33, and 12.46 percentage points, respectively. It can be seen that the proposed method enhances disentanglement performance while maintaining good representation and reconstruction capabilities.

    Underwater image enhancement algorithm based on multi-scale perception and multi-dimensional space fusion
    Wei GUO, Manting WANG, Haicheng QU
    2026, 46(1):  224-232.  DOI: 10.11772/j.issn.1001-9081.2025010139

    To address problems caused by deep-sea imaging, such as color distortion, low contrast, and blurred structures in underwater images, an underwater image enhancement algorithm based on multi-scale perception and multi-dimensional space fusion was proposed. By combining spatial, channel, and three-dimensional features, image information was transmitted in parallel by the algorithm to a multi-dimensional feature extraction network and an encoder. Firstly, a multiscale feature refinement module was introduced into the feature extraction network to further process the extracted feature information, allowing the network to learn information at different scales more accurately. Secondly, a multidimensional color enhancement module was incorporated into the encoder to enhance image details and colors. Finally, an adaptive enhancement network was designed to further process the feature information and fuse multi-level features, then the decoder was used to generate the final enhanced image. Experimental results on public datasets demonstrate the outstanding performance of the proposed algorithm. Specifically, it achieves a Peak Signal-to-Noise Ratio (PSNR) of up to 24.865 1 dB and a Structural Similarity (SSIM) of 0.895 4, representing improvements of 1.580 6 dB and 0.039 8 over Hybrid Fusion Method (HFM), respectively, and it has the Underwater Color Image Quality Evaluation (UCIQE) and Underwater Image Quality Measure (UIQM) up to 0.593 1 and 3.102 8, respectively, surpassing HFM by 0.038 4 and 0.151 4, respectively. It can be seen that the proposed algorithm improves underwater visual quality effectively.

    Agent prototype distillation algorithm for few-shot object detection
    Binhong XIE, Rui WANG, Rui ZHANG, Yingjun ZHANG
    2026, 46(1):  233-241.  DOI: 10.11772/j.issn.1001-9081.2025020142

    The existing Few-Shot Object Detection (FSOD) algorithms are constrained by insufficient class-level prototype generation accuracy and loss of detail information, which limit the feature representation capability in target regions. To address this issue, an Agent Prototype Aggregation (APA) based FSOD algorithm named APA-FSOD was proposed. In the algorithm, the support features were distilled into detail-rich prototypes through agent attention, and the prototype vectors were assigned to the query feature map accurately based on their correlations, thereby enhancing the feature representation capability in target instance regions significantly. Additionally, a Wavelet Convolution Enhancement Module (WCEM) and an Adaptive Multi-Relation Fusion (AMRF) module were designed to optimize global feature extraction and advanced feature fusion of the algorithm, respectively. Experimental results demonstrate that on three novel class splits of the PASCAL VOC dataset, the nAP50 of APA-FSOD is improved by 0.5 to 1.1 percentage points compared to that of the baseline method VFA (Variational Feature Aggregation); under the 30-shot setting of the MS COCO dataset, nAP of APA-FSOD is increased by 1.0 percentage point compared to that of the meta-learning method SMPCCNet (Support-query Mutual Promotion and Classification Correction Network). It can be seen that the proposed algorithm achieves significant accuracy improvement in FSOD.

    Domain-adaptive nighttime object detection method with photometric alignment
    Yu SANG, Tong GONG, Chen ZHAO, Bowen YU, Siman LI
    2026, 46(1):  242-251.  DOI: 10.11772/j.issn.1001-9081.2025010058

    Nighttime object detection faces more challenges than daytime object detection due to low-light conditions and the scarcity of high-quality labeled data, which hinder feature extraction and degrade detection accuracy. Therefore, a domain-adaptive object detection method for nighttime images was proposed. Firstly, a nighttime domain-adaptive photometric alignment module was designed to convert a labeled daytime source-domain image into a corresponding nighttime target-domain image, thereby bridging the gap between the source and target domains through photometric alignment and alleviating the difficulty of obtaining accurate nighttime object labels under low-light conditions. Secondly, a hybrid CNN-Transformer model was used as the detector, in which CSwin Transformer served as the backbone network to extract multi-level image features that were then fed into a feature pyramid network, thus enhancing the model's capability for multi-scale object detection. Finally, the Outlook attention module was introduced to address the loss of image details caused by insufficient lighting, thereby enhancing the model's robustness under varying lighting conditions, shadows, and other complex environments. Experimental results demonstrate that the proposed method achieves a mean Average Precision (mAP)@0.5 of 50.0% on the public BDD100K dataset, an improvement of 4.2 percentage points over the 2PCNet (two-Phase Consistency Network) method, and an mAP@0.5 of 45.4% on the public SODA10M dataset, an improvement of 0.9 percentage points over the SFA (Sequence Feature Alignment) method.

    Dual-coding space-frequency mixing method for infrared small target detection
    Xiaoyong BIAN, Peiyang YUAN, Qiren HU
    2026, 46(1):  252-259.  DOI: 10.11772/j.issn.1001-9081.2025010078

    InfraRed Small Target Detection (IRSTD) aims to accurately find targets in infrared images with low Signal-to-Clutter Ratio (SCR) and has been widely used in many fields. However, due to weak target features and severe background interference, existing methods struggle to extract structural information of the target effectively, which leads to issues such as incomplete target segmentation and low detection accuracy; moreover, these models usually have a large number of parameters. To solve these problems, a dual-coding space-frequency mixing IRSTD method was proposed. Firstly, with U-Net3+ as the basic framework, a dual-coding structure combining a Multi-Shape Context Aware (MSCA) module and a Frequency-Domain Interactive Attention (FDIA) module was proposed to extract space-frequency mixed features in the coding stage. Secondly, in the decoding stage, a Cross-Layer Feature Guide (CLFG) module was designed to fuse multi-scale feature maps. Finally, the proposed method was experimentally verified on the NUAA-SIRST and IRSTD-1k datasets. The results show that the proposed method has 0.86×10^6 parameters and achieves Intersection over Union (IoU) of 78.11% and 69.08% on the two datasets, respectively. Compared with the Attention Multiscale feature Fusion U-Net (AMFUNet) method, the number of parameters is reduced by 1.31×10^6, and the IoU is increased by 2.25 and 1.23 percentage points, respectively. It can be seen that the proposed method achieves high detection performance while retaining fewer parameters.

    Global feature pose estimation method based on keypoint distance
    Yi XIONG, Caiqi WANG, Ling MEI, Shiqian WU
    2026, 46(1):  260-269.  DOI: 10.11772/j.issn.1001-9081.2025010071

    To address the low accuracy of pose estimation caused by numerous similar features and non-corresponding points in point clouds, a global feature pose estimation method based on keypoint distances was proposed. In this method, global features were constructed from the distances between keypoints, thereby avoiding the influence of similar local features on pose estimation accuracy. Meanwhile, to improve the matching speed of global features, a feature matching strategy based on a distance comparison table was proposed, in which similarity measurement was carried out on global feature votes through the comparison table, thereby avoiding interference from non-corresponding points and improving the efficiency of finding correspondences from global features. Finally, these correspondences were processed by Graph-based Reliability for Outlier Removal (GROR) to eliminate outliers and obtain the transformation pose. Experimental results on four public datasets show that compared with Fast Point Feature Histogram (FPFH), Signature of Histograms of Orientations (SHOT), and Binarized Signature of Histograms of Orientations (BSHOT), the proposed method increases the area under the precision-recall curve of feature matching by 116%, 169%, and 137% on average, respectively. Moreover, compared with the original GROR, the proposed method reduces the rotation error and translation error by 47.38% and 52.43%, respectively.
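
    A simplified version of the keypoint-distance global feature and its matching is sketched below (sorted pairwise distances as the descriptor and a plain nearest-neighbour comparison stand in for the paper's comparison-table voting; both clouds are assumed to have at least k+1 keypoints):

        import numpy as np

        def distance_descriptors(keypoints: np.ndarray) -> np.ndarray:
            """Describe each keypoint by the sorted distances to all other keypoints,
            a simple global feature that is invariant to rigid transforms."""
            d = np.linalg.norm(keypoints[:, None, :] - keypoints[None, :, :], axis=-1)
            return np.sort(d, axis=1)[:, 1:]                         # drop the zero self-distance

        def match(desc_src, desc_dst, k=16):
            """Compare only the k smallest distances of each descriptor so that point sets
            of different sizes stay comparable; nearest neighbour in L1 distance."""
            a, b = desc_src[:, :k], desc_dst[:, :k]
            pairs = []
            for i, f in enumerate(a):
                j = int(np.argmin(np.abs(b - f).sum(axis=1)))
                pairs.append((i, j))
            return pairs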

    Trident generative adversarial network for low-dose CT image denoising
    Lifang WANG, Wenjing REN, Xiaodong GUO, Rongguo ZHANG, Lihua HU
    2026, 46(1):  270-279.  DOI: 10.11772/j.issn.1001-9081.2024121765

    In recent years, significant progress has been made in applying Generative Adversarial Networks (GANs) to Low-Dose Computed Tomography (LDCT) image denoising. However, existing methods face challenges such as insufficient modeling capability for complex noise distributions and limited ability to preserve structural details. Therefore, a multi-path GAN for LDCT image denoising, named Trident GAN, was proposed. Firstly, a feature-guided generator, Trident Uformer, was designed, in which a Feature Polymerization Attention (FPA) module was added at the bottleneck layer of the U-Net structure, thereby solving the problem of low spatial resolution in the U-shaped structure. Secondly, a multi-path feature extraction submodule, Trident Block, was designed; in its three blocks, a Local Detail Enhancement Block (LDEB) was introduced to extract detailed features, a Lightweight Channel Attention Block (LCAB) was incorporated to enhance channel features, and a Spatial Interaction Attention Block (SIAB) was utilized to capture important spatial features, respectively. Within the SIAB, a multi-level interactive attention function and an evaluation mechanism were employed to design a Spatial Context Attention Mechanism (SCAM), which addressed the limitations of single-attention mechanisms. Finally, a Multi-Feature Fusion (MFF) module was designed to aggregate the features at the end of the three blocks, modeling both local detail information and global semantic information and solving the problem of discontinuous details across different levels. Furthermore, a Multi-Scale Pyramid Discriminator (MSPD) was used to check the quality of the generated results at different scales and guide the generation of globally consistent images. Experimental results show that Trident GAN achieves average Peak Signal-to-Noise Ratio (PSNR)/Structural Similarity Index Measure (SSIM) of 31.519 3 dB/0.883 0 and 33.633 1 dB/0.947 8 on the Mayo and Piglet datasets, respectively. Compared with High-Frequency Sensitive GAN (HFSGAN), this method reduces the number of parameters by 75.58% and the test time by 36.36%. It can be seen that, compared with existing methods such as HFSGAN, Trident GAN improves image quality with less computational load.

    Multi-scale and spatial frequency feature-based image segmentation network for pheochromocytoma
    Chaoyun MAI, Hongyi ZHANG, Chuanbo QIN, Junying ZENG, Dong WANG
    2026, 46(1):  280-288.  DOI: 10.11772/j.issn.1001-9081.2024121868

    To address the issue of insufficient learning of features between different organs in the segmentation of pheochromocytoma images from partially annotated abdominal datasets, which makes it difficult to distinguish tumor and surrounding organ boundaries accurately, a Multi-scale and spatial Frequency feature-based image segmentation Network for pheochromocytoma (MF-Net) was proposed. Firstly, a Multi-scale Spatial Frequency Channel Attention module (MSFCA) was constructed, which enhanced the capture of inter-organ texture and boundary features by weighting and fusing the frequency domain information of the image and the multi-scale feature maps from adjacent encoders, thereby highlighting the feature representation of tumor regions. Then, an Upsampling Multi-Scale Feature Fusion module (UMFF) was introduced, which combined upsampled feature maps of different scales to enhance the model's ability to recognize objects of varying sizes in the image. Finally, an Adaptive Objective loss function (AOb) was utilized, which calculated the loss for the annotated abdominal organ labels and adjusted the loss weights according to the annotated organ categories, thereby optimizing the learning process of the segmentation network. Experimental results show that on the abdominal organ and pheochromocytoma datasets, MF-Net improves the segmentation accuracy by 3.33 and 3.18 percentage points, respectively, compared to the separately trained nnU-Net (no new U-Net), with Dice similarity coefficient (Dice) and Normalized Surface Dice (NSD) reaching 89.07% and 92.85%, respectively. On the external datasets, MF-Net achieves Dice and NSD of 84.66% and 90.55%, respectively. In addition, visualization results indicate that MF-Net handles complex backgrounds and blurred boundaries in pheochromocytoma images better, providing technical support for the accurate diagnosis and treatment of pheochromocytoma.
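    One plausible reading of the adaptive objective idea above can be sketched as a Dice-style loss evaluated only on the organ classes that are actually annotated in a partially labelled sample, with optional per-class weights. The function name, tensor layout, and weighting rule below are assumptions, not the paper's exact AOb formulation.

        import torch

        def partial_dice_loss(logits, target_onehot, annotated, class_weights=None, eps=1e-6):
            """logits: (B, C, H, W); target_onehot: (B, C, H, W) in {0, 1};
            annotated: (B, C) mask of which classes carry labels in each sample."""
            probs = torch.softmax(logits, dim=1)
            inter = (probs * target_onehot).sum(dim=(2, 3))
            union = probs.sum(dim=(2, 3)) + target_onehot.sum(dim=(2, 3))
            dice = (2 * inter + eps) / (union + eps)            # per-sample, per-class Dice
            loss = (1 - dice) * annotated                       # ignore unannotated organs
            if class_weights is not None:                       # per-class weight adjustment
                loss = loss * class_weights.view(1, -1)
            return loss.sum() / annotated.sum().clamp(min=1)

        logits = torch.randn(2, 5, 64, 64)
        target = torch.zeros(2, 5, 64, 64); target[:, 0] = 1    # toy case: background only
        annotated = torch.tensor([[1, 1, 0, 0, 1], [1, 0, 1, 1, 0]], dtype=torch.float)
        print(partial_dice_loss(logits, target, annotated))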

    Frontier and comprehensive applications
    Lightweight motor imagery electroencephalogram decoding neural network with multi-domain feature fusion
    Ning CAO, Xin WEN, Yanrong HAO, Rui CAO
    2026, 46(1):  289-296.  DOI: 10.11772/j.issn.1001-9081.2025010019

    To address the issues of slow decoding speed in large-scale networks and insufficient utilization of feature information in decoding Motor Imagery ElectroEncephaloGram (MI-EEG) signals, a lightweight MI-EEG decoding neural network with multi-domain feature fusion was proposed. In the proposed network, lightweight modules were introduced to extract multi-domain features, including SincNet for frequency-domain feature extraction and Temporal Convolutional Network (TCN) for time-domain feature extraction. Additionally, after extracting time-frequency domain features, Squeeze-and-Excitation (SE) attention was incorporated to calibrate feature maps adaptively, thereby emphasizing important features and suppressing redundant information. Then, separable convolution was employed to fuse time-frequency features effectively, thereby addressing the limitation of single-domain feature information. Finally, a joint loss function combining cross-entropy and center loss was adopted to constrain network training, thereby optimizing both intra-class and inter-class classification performance. Experimental results show that on the Motor Imagery (MI) public datasets BCI 2a, SMR-BCI, and OpenBMI, the proposed network has parameter counts of 6 870, 5 690, and 6 870, average accuracies of 74.78%, 71.93%, and 65.40%, and average Kappa values of 0.70, 0.66, and 0.59, respectively. Compared to Deep Convolutional Network (DeepConvNet), lightweight EEG convolutional neural Network (EEGNet), and Temporal Convolutional Network-based EEG Recognition (EEG-TCNet), the proposed network achieves average accuracy improvements of 11.06, 8.85, and 6.36 percentage points on the BCI 2a dataset, 10.53, 4.17, and 3.57 percentage points on the SMR-BCI dataset, and 5.09, 4.99, and 2.33 percentage points on the OpenBMI dataset, respectively. The results demonstrate that the proposed network ensures a lightweight design while maintaining robust decoding performance.
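    The joint objective described above can be sketched in generic PyTorch as cross-entropy plus a center loss that pulls embeddings toward learnable class centers; the weighting coefficient and center parameterization below are assumptions rather than the paper's settings.

        import torch
        import torch.nn as nn

        class CenterLoss(nn.Module):
            def __init__(self, num_classes: int, feat_dim: int):
                super().__init__()
                self.centers = nn.Parameter(torch.randn(num_classes, feat_dim))

            def forward(self, features, labels):
                # Mean squared distance between each embedding and its class center.
                return ((features - self.centers[labels]) ** 2).sum(dim=1).mean()

        num_classes, feat_dim, lam = 4, 32, 0.01
        ce, center = nn.CrossEntropyLoss(), CenterLoss(num_classes, feat_dim)
        features = torch.randn(16, feat_dim, requires_grad=True)   # network embeddings
        logits = torch.randn(16, num_classes, requires_grad=True)  # classifier outputs
        labels = torch.randint(0, num_classes, (16,))
        loss = ce(logits, labels) + lam * center(features, labels) # joint loss
        loss.backward()
        print(float(loss))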

    Transformer and gated recurrent unit-based de novo sequencing algorithm for phosphopeptides
    Lijin YAO, Di ZHANG, Piyu ZHOU, Zhijian QU, Haipeng WANG
    2026, 46(1):  297-304.  DOI: 10.11772/j.issn.1001-9081.2025010060

    Peptide sequencing using tandem mass spectrometry for proteolytically digested peptides (referred to as peptide identification) is a foundational technology in proteomics research. Current de novo peptide sequencing algorithms face challenges in accurately identifying phosphopeptides, which are of significant biological importance. The primary reasons are the complex fragmentation patterns induced by phosphorylation, the frequent occurrence of neutral loss peaks, and the low abundance of phosphopeptide mass spectra in conventional mass spectrometry data. To address these issues, a Transformer and Gated Recurrent Unit (GRU)-based de novo sequencing algorithm for phosphopeptides, namely TGNovo, was proposed. In TGNovo, a spectrum graph was introduced to model the mass differences between peaks explicitly, guiding the Transformer encoder to capture spectral features. The Transformer module and the GRU module model the associations between spectral and amino acid sequence features and the dependencies among spectral peaks and amino acids, respectively, and work in concert to reconstruct the peptide sequence. Compared to the fully Transformer-based de novo sequencing algorithm Casanovo, TGNovo makes full use of prior spectral information through the spectrum graph and the GRU module, enhancing the model's ability to model the spectrum graph. In evaluations on phosphopeptide data across species, TGNovo outperforms Casanovo with average improvements of 16.5 percentage points in peptide-level recall and 37.1 percentage points in amino acid-level recall. Additionally, experimental results on an immune peptide dataset show that the high-confidence antigenic peptides identified by TGNovo cover 86% of the database search results.
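    A toy PyTorch sketch of the encoder/decoder split described above is given below: a Transformer encoder over spectrum-peak embeddings and a GRU decoder over amino-acid tokens. The spectrum-graph guidance is omitted, and all dimensions, class names, and the conditioning scheme are assumptions, not TGNovo itself.

        import torch
        import torch.nn as nn

        class PeakEncoderAADecoder(nn.Module):
            def __init__(self, d_model: int = 64, vocab: int = 25):
                super().__init__()
                self.peak_proj = nn.Linear(2, d_model)            # (m/z, intensity) -> d_model
                layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
                self.encoder = nn.TransformerEncoder(layer, num_layers=2)
                self.embed = nn.Embedding(vocab, d_model)
                self.gru = nn.GRU(d_model, d_model, batch_first=True)
                self.out = nn.Linear(d_model, vocab)

            def forward(self, peaks, prev_tokens):
                mem = self.encoder(self.peak_proj(peaks))         # (B, P, d) spectral features
                # Condition the GRU on a pooled spectrum summary as its initial state.
                h0 = mem.mean(dim=1, keepdim=True).transpose(0, 1).contiguous()
                dec, _ = self.gru(self.embed(prev_tokens), h0)    # (B, L, d)
                return self.out(dec)                              # (B, L, vocab)

        model = PeakEncoderAADecoder()
        peaks = torch.rand(2, 100, 2)          # 100 peaks: (m/z, intensity), normalised
        prev = torch.randint(0, 25, (2, 12))   # shifted amino-acid tokens
        print(model(peaks, prev).shape)        # torch.Size([2, 12, 25])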

    Method for life prediction of parallel branching engine based on multi-modal fusion features
    Yanan LI, Mengyang GUO, Guojun DENG, Yunfeng CHEN, Jianji REN, Yongliang YUAN
    2026, 46(1):  305-313.  DOI: 10.11772/j.issn.1001-9081.2025010070

    Aiming at the multi-modal nature of engine operation data and the difficulty of achieving effective engine life prediction, a parallel branching engine life prediction method was proposed on the basis of multi-modal features that integrate the potential relationships between images and engine operation time-series data. Firstly, a sliding window was used to segment the engine operation data to construct sequence samples, and Gramian Angular Field (GAF) was used to convert the constructed sequence samples into images. Then, the sequence samples and the images were processed by a Bi-directional Long Short-Term Memory (BiLSTM) network and a Convolutional Neural Network (CNN), respectively, to obtain potential relationship features between sensors, such as trends and cycles. Finally, a Cross-Attention Mechanism (CAM) was introduced to fuse the two modal features and realize life prediction of the engine. Experimental results on the public C-MAPSS dataset show that the R-squared (R²) of the proposed method is higher than 0.99 and the Root Mean Square Error (RMSE) is less than 1. It can be seen that the method can improve computational efficiency while ensuring prediction accuracy.
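    For reference, the GAF conversion step can be sketched in NumPy as follows: a one-dimensional sensor window is rescaled to [-1, 1], mapped to angles, and turned into a two-dimensional image via the cosine-sum Gramian. The window length is an assumption, and the summation form of the GAF is used here.

        import numpy as np

        def gramian_angular_field(window: np.ndarray) -> np.ndarray:
            x = np.asarray(window, dtype=float)
            # Rescale to [-1, 1] so that arccos is defined.
            x = 2 * (x - x.min()) / (x.max() - x.min() + 1e-12) - 1
            phi = np.arccos(np.clip(x, -1, 1))
            # Summation GAF: G[i, j] = cos(phi_i + phi_j).
            return np.cos(phi[:, None] + phi[None, :])

        window = np.sin(np.linspace(0, 4 * np.pi, 30))   # one sliding-window sample
        image = gramian_angular_field(window)
        print(image.shape)                               # (30, 30), fed to the CNN branch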

    Cable temperature prediction model based on multi-scale patch and convolution interaction
    Tingting WANG, Tingshun LI, Wen TAN, Bo LYU, Yixuan CHEN
    2026, 46(1):  314-321.  DOI: 10.11772/j.issn.1001-9081.2025010122

    Prolonged overheating of high-voltage cables may lead to insulation thermal breakdown, consequently affecting the stability of the power grid. However, the existing research primarily focuses on traditional prediction models, and ignores the complexity and dynamic characteristics of temperature data. To address this limitation, a cable temperature prediction model based on Multi-Scale Patch and Convolution Interaction (MSP-CI) was proposed. Firstly, the input dimension was reduced using a channel resampling method, and a multi-scale patch branch structure was constructed, so as to decouple the complex time series. Then, macroscopic information from coarse-grained patches and microscopic information from fine-grained patches were extracted, respectively, through the combination of sequence decomposition and convolution interaction strategies. Finally, an attention fusion module was constructed to balance the weights of macroscopic and microscopic information dynamically and obtain the final prediction results. Experimental results on real high-voltage cable temperature datasets demonstrate that compared to the baseline models such as TimeMixer, PatchTST (Patch Time Series Transformer), and MSGNet (Multi-Scale inter-series Graph Network), MSP-CI achieves a reduction of 7.02% to 34.87% in Mean Squared Error (MSE), and a reduction of 5.15% to 32.04% in Mean Absolute Error (MAE). It can be seen that MSP-CI enhances cable temperature prediction accuracy effectively, providing a reliable basis for power dispatching operations.
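    Two of the ingredients named above, multi-scale patching and series decomposition, can be sketched in PyTorch as splitting a series into coarse- and fine-grained patches and separating a moving-average trend from its residual. The patch sizes, kernel length, and function names below are assumptions rather than MSP-CI's configuration.

        import torch
        import torch.nn.functional as F

        def make_patches(series: torch.Tensor, patch_len: int, stride: int) -> torch.Tensor:
            """series: (B, L, C) -> patches: (B, num_patches, patch_len, C)."""
            return series.unfold(1, patch_len, stride).permute(0, 1, 3, 2)

        def decompose(series: torch.Tensor, kernel: int = 25):
            """Moving-average trend and the residual (seasonal) component."""
            pad = kernel // 2
            padded = F.pad(series.transpose(1, 2), (pad, pad), mode="replicate")
            trend = F.avg_pool1d(padded, kernel, stride=1).transpose(1, 2)
            return trend, series - trend

        x = torch.randn(8, 96, 4)                           # 8 samples, 96 steps, 4 channels
        coarse = make_patches(x, patch_len=24, stride=24)   # macroscopic view
        fine = make_patches(x, patch_len=8, stride=4)       # microscopic view
        trend, seasonal = decompose(x)
        print(coarse.shape, fine.shape, trend.shape)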

    Feature extraction method of flight pitch operation based on quick access recorder data
    Xiuyan ZHANG, Wentao LIU, Xin WANG
    2026, 46(1):  322-330.  DOI: 10.11772/j.issn.1001-9081.2025010120

    The low efficiency of Quick Access Recorder (QAR) data analysis highlights the importance of feature extraction from QAR data. In response to the insufficient focus on temporal trend features in QAR data feature extraction, a Piecewise Cubic Hermite Interpolating Polynomial (PCHIP) module and an ordering relation algorithm (G1) weight assignment module were integrated into an interpolation-weighting module, the PCHIP — ordering relation algorithm (PG) module, which was then combined with a Convolutional AutoEncoder (CAE) to construct the PG-CAE model, and a flight pitch operation feature extraction method based on QAR data was proposed to support analyses such as flight-level anomaly detection. Firstly, the PCHIP module was used to standardize the data length. Secondly, the G1 weight assignment module was used to determine weights according to the causal temporal correlation between flight operations and flight attitudes, thereby quantifying the temporal importance of flight pitch operation data. Thirdly, the CAE module was employed to extract features from the weighted data. Finally, the model was validated on pitch operation data from 406 flight segments of a certain airline's A319 aircraft. Experimental results indicate that, with the PCHIP and G1 modules introduced, the PG-CAE model outperforms the CAE model significantly, where the reconstruction error is employed to measure how well individual data samples are reproduced relative to the original data and thus to judge model acceptability, and the standard deviation is used to assess the model's capability to extract overall trend features from the dataset. Ultimately, the CAE5 model with 5 convolutional-pooling layers is identified as the optimal model structure, demonstrating a reconstruction error of 0.032 84 and standard deviations of (0.162 1, 0.280 5). Furthermore, combined with the K-means algorithm, a comparison of the point clustering effect after PG-CAE feature extraction with the curve clustering effect without feature extraction further demonstrates that the PG-CAE model can compress temporal trend data from line-cluster form into point clusters of two-dimensional features, serving research such as flight-level anomaly detection based on QAR data.
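    The length-standardization step can be sketched with SciPy's PCHIP interpolator as below: variable-length pitch-operation sequences are resampled to a common length and then weighted elementwise along time. The weights in this sketch are placeholders; the paper derives them with the G1 ordering-relation method.

        import numpy as np
        from scipy.interpolate import PchipInterpolator

        def pchip_resample(sequence: np.ndarray, target_len: int) -> np.ndarray:
            """Resample a 1-D flight-parameter series to target_len points with
            shape-preserving piecewise cubic Hermite interpolation."""
            t = np.linspace(0.0, 1.0, num=len(sequence))
            t_new = np.linspace(0.0, 1.0, num=target_len)
            return PchipInterpolator(t, sequence)(t_new)

        segments = [np.cumsum(np.random.randn(n)) for n in (180, 240, 205)]  # uneven lengths
        resampled = np.stack([pchip_resample(s, 200) for s in segments])     # (3, 200)
        weights = np.linspace(0.5, 1.5, 200)      # placeholder temporal weights (not G1)
        weighted = resampled * weights            # input handed to the CAE module
        print(weighted.shape)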

    Point cloud data augmentation method based on scattering and absorption effects of coal dust on LiDAR electromagnetic waves
    Shiwei LI, Yufeng ZHOU, Pengfei SUN, Weisong LIU, Zhuxuan MENG, Haojie LIAN
    2026, 46(1):  331-340.  DOI: 10.11772/j.issn.1001-9081.2025010085

    Current 3D object detection models are predominantly based on data-driven deep learning techniques, so dataset quality plays a pivotal role in model performance. Aiming at the scarcity of coal dust environment data and the time-consuming and labor-intensive construction of real-world datasets, a point cloud data augmentation method was proposed on the basis of the scattering and absorption effects of coal dust on Light Detection and Ranging (LiDAR) electromagnetic waves. Firstly, by considering the optical characteristics of coal dust particles, a propagation simulation model of LiDAR electromagnetic waves was built to characterize LiDAR signal attenuation and scattering in coal dust environments. Secondly, based on real point cloud data acquired under clear weather conditions, 3D coordinates and reflection intensities were adjusted via the simulation model, so as to generate simulated point cloud data that conforms to the perception characteristics of coal dust environments. Finally, five mainstream 3D detection models (PV-RCNN++, PV-RCNN, PointRCNN, PointPillars, Voxel-RCNN_Car) were trained and tested on the augmented dataset. The results demonstrate that the proposed method improves the detection precision of all five models in coal dust environments. For the most complex model, PV-RCNN, the improvements are 1.88, 1.74, and 0.84 percentage points for the car, pedestrian, and cyclist categories, respectively. It can be seen that, compared with models trained under clear weather conditions, training object detection models with the augmented point cloud data improves detection precision in coal dust environments significantly, enabling more reliable perception in complex open-pit mine environments and providing data support for the stable operation of autonomous mine carts.
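    A rough NumPy sketch of dust-style augmentation of a clear-weather scan is given below: reflection intensity is attenuated with a Beer-Lambert style factor exp(-2*alpha*r), and returns whose attenuated intensity drops below a detection threshold are removed. The extinction coefficient, threshold, and noise term are assumptions, not the paper's simulation model.

        import numpy as np

        def simulate_coal_dust(points: np.ndarray, alpha: float = 0.02,
                               threshold: float = 0.05, seed: int = 0) -> np.ndarray:
            """points: (N, 4) array of x, y, z, intensity with intensity in [0, 1]."""
            rng = np.random.default_rng(seed)
            xyz, intensity = points[:, :3], points[:, 3]
            r = np.linalg.norm(xyz, axis=1)                    # range to the sensor
            attenuated = intensity * np.exp(-2.0 * alpha * r)  # two-way attenuation
            # Keep a point if its attenuated return is still detectable; a small
            # random term softens the cutoff.
            keep = attenuated + 0.01 * rng.standard_normal(len(r)) > threshold
            out = points[keep].copy()
            out[:, 3] = attenuated[keep]
            return out

        clear = np.random.rand(1000, 4) * np.array([60, 60, 5, 1])   # toy clear-weather scan
        dusty = simulate_coal_dust(clear)
        print(clear.shape, dusty.shape)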


Honorary Editor-in-Chief: ZHANG Jingzhong
Editor-in-Chief: XU Zongben
Associate Editor: SHEN Hengtao XIA Zhaohui
Domestic Post Distribution Code: 62-110
Foreign Distribution Code: M4616
Address:
No. 9, 4th Section of South Renmin Road, Chengdu 610041, China
Tel: 028-85224283-803
  028-85222239-803
Website: www.joca.cn
E-mail: bjb@joca.cn