
Table of Contents

    10 March 2026, Volume 46 Issue 3
    Artificial intelligence
    Review of large language model methods for knowledge graph completion
    Haoyang ZHANG, Liping ZHANG, Sheng YAN, Na LI, Xuefei ZHANG
    2026, 46(3):  683-695.  DOI: 10.11772/j.issn.1001-9081.2025030294

    Knowledge Graphs (KGs) extract and structurally represent prior knowledge from massive data, and play a key role in the construction and application of intelligent systems. Knowledge Graph Completion (KGC) aims to predict missing triples in KGs to improve their completeness and usability, and usually consists of an encoding stage and a prediction stage. However, traditional KGC methods have difficulty using additional information and semantic information effectively during encoding, suffer from incomplete knowledge coverage and the closed-world assumption during prediction, and, because they encode first and predict afterwards, are limited by the form of the embedded representations and by computing efficiency. Large Language Models (LLMs) can address these problems with their rich knowledge and strong understanding abilities. Therefore, LLM methods for KGC were reviewed. Firstly, the basic concepts and research status of KGs and LLMs were outlined, and the KGC process was explained. Secondly, the existing LLM-based KGC methods were summarized and organized from three aspects: using the LLM as an encoder, using the LLM as a generator, and prompt-based guidance. Finally, the performance of the models on different datasets was summarized, and the problems and challenges faced by LLM-based KGC research were discussed.

    Pre-answering and retrieval filtering: dual-stage optimization method for RAG-based question-answering systems
    Yiming HUANG, Xihua ZOU, Guo DENG, Di ZHENG
    2026, 46(3):  696-707.  DOI: 10.11772/j.issn.1001-9081.2025030288

    Existing Retrieval-Augmented Generation (RAG) question-answering systems in domain-specific applications face challenges such as a single retrieval path, insufficient coverage of users’ implicit intents, and low-quality retrieved segments, which result in inaccurate and incomplete answers. Therefore, a dual-stage optimization method, Pre-Answering and Retrieval Filtering (PARF), was proposed. Firstly, by integrating domain knowledge graphs and prompt engineering techniques, Large Language Models (LLMs) were guided to generate preliminary answers, thereby constructing a multi-directional retrieval path of “original query → preliminary answer → relevant segments” to expand the semantic space of the original query. Secondly, the retrieved segments were scored and filtered by relevance using a BERT (Bidirectional Encoder Representations from Transformers) model, thereby enabling collaborative optimization between the retrieval and generation stages and improving the density of effective information. Experimental results show that, compared with the RAG question-answering system built with the baseline method DPR-LLM (Dense Passage Retrieval with LLM), the system built with PARF improves the consistency metrics F1 and ROUGE-L (Recall-Oriented Understudy for Gisting Evaluation-L) by 19.8 and 41.5 percentage points, respectively, on a rail transportation question-answering dataset, and by 16.1 and 17.6 percentage points, respectively, on a medical question-answering dataset; the correct rate, used as the effectiveness metric, increases by 10.2 and 8.8 percentage points.

    Multi-Agent collaborative knowledge reasoning framework
    Rilong WANG, Zhenping LI, Xiaosong LI, Qiang GAO, Ya HE, Yong ZHONG, Yingxiao ZHAO
    2026, 46(3):  708-714.  DOI: 10.11772/j.issn.1001-9081.2025030349

    The core mission of intelligence analysis is to extract event associations and perform causal inference from massive data. However, existing Large Language Model (LLM)-based methods are constrained by the context window and computational complexity when processing long texts, which makes it difficult for them to capture causal associations between events effectively and leads to a significant decline in inference capability, particularly for language models with limited parameter sizes. To address this issue, a Multi-Agent collaborative Knowledge Reasoning (MAKR) framework based on a dual tower structure was proposed. In the framework, the complex associations among entities were modeled explicitly through incremental construction of an entity relationship graph, thus helping LLMs achieve more accurate causal inference. The dual tower structure was designed so that the graph model and the language model process graph structure information and textual information, respectively. The model's ability to comprehend complex logical relationships within long texts was enhanced through a fusion mechanism. Additionally, a prediction task for node relations in the graph was added to the language prediction task, thereby further improving semantic alignment. Experimental results demonstrate that, under limited computational resources, the MAKR framework outperforms existing methods such as HetGNN (Heterogeneous Graph Neural Network) and HiGPT in both event prediction and causal inference on the security event analysis task of the GDELT dataset and the sanction association analysis task of the OpenSanctions dataset. These results validate the practical value of the framework in industrial scenarios with constrained computational resources.

    Dynamic clause selection method based on AHP_TOPSIS
    Yuanyuan LIU, Shuwei CHEN, Depei SONG, Yuanjun YANG
    2026, 46(3):  715-722.  DOI: 10.11772/j.issn.1001-9081.2025030382

    Clause selection is the core part of an Automated Theorem Prover (ATP), and the ability and efficiency of an ATP can be improved by optimizing the clause selection method. At present, although the traditional one-by-one filtering method based on attribute priority can realize clause selection, it has difficulty evaluating clauses comprehensively and lacks flexibility. Therefore, a dynamic clause selection method based on AHP_TOPSIS was proposed. In the method, the weight of each clause attribute was calculated by the Analytic Hierarchy Process (AHP), and then the clauses were evaluated and ranked by combining these weights with the Technique for Order Preference by Similarity to an Ideal Solution (TOPSIS), thereby providing a basis for clause selection. In the AHP, considering the dynamic change of clause attributes, stage awareness and smooth transition were introduced so that the judgment matrix could be adjusted dynamically according to the derivation process, thus extending the AHP to a dynamic AHP. The corresponding algorithm was implemented and applied to the first-order logic theorem prover CSE (Contradiction Separation Extension) to form a new prover, CSE_AT. This prover was tested on the first-order logic problems in the TPTP (Thousands of Problems for Theorem Provers) problem library from 2021 to 2024. Experimental results show that CSE_AT proves 22 more theorems than CSE, and that most of the theorems proved by CSE_AT have Rating values in the range of [0.6,0.9]. It can be seen that the AHP_TOPSIS-based dynamic clause selection method can optimize the deductive paths and thereby improve the proof ability of the prover.
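
    The ranking step described in this abstract can be illustrated with a minimal TOPSIS sketch. It is not the CSE_AT implementation: the clause attributes (symbol count, age, literal count), their treatment as cost-type attributes, and the weight vector standing in for AHP output are all assumptions made for illustration.

```python
import math

def topsis_rank(matrix, weights, benefit):
    """Return a TOPSIS closeness score per row (alternative); higher is better."""
    n_attrs = len(weights)
    # Vector-normalize each column, then apply the attribute weights.
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) or 1.0
             for j in range(n_attrs)]
    weighted = [[weights[j] * row[j] / norms[j] for j in range(n_attrs)]
                for row in matrix]
    # Ideal / anti-ideal points depend on whether an attribute is benefit or cost.
    cols = list(zip(*weighted))
    ideal = [max(c) if benefit[j] else min(c) for j, c in enumerate(cols)]
    anti = [min(c) if benefit[j] else max(c) for j, c in enumerate(cols)]
    scores = []
    for row in weighted:
        d_pos = math.dist(row, ideal)  # distance to the ideal solution
        d_neg = math.dist(row, anti)   # distance to the anti-ideal solution
        total = d_pos + d_neg
        scores.append(d_neg / total if total else 0.0)
    return scores

# Hypothetical clause attributes: (symbol count, clause age, literal count).
clauses = [[5, 2, 3], [12, 1, 6], [7, 9, 2]]
weights = [0.5, 0.2, 0.3]  # stand-in for weights derived by AHP
scores = topsis_rank(clauses, weights, benefit=[False, False, False])
best = max(range(len(clauses)), key=scores.__getitem__)
```

    A dynamic variant in the spirit of the abstract would recompute `weights` as the derivation progresses; how the judgment matrix is adjusted is not specified here, so the sketch keeps the weights fixed.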

    MG-SQL: SQL generation framework with enhanced schema linking and multi-generator collaboration
    Dingjia WU, Zhe CUI
    2026, 46(3):  723-731.  DOI: 10.11772/j.issn.1001-9081.2025040454

    To address the limitations of Large Language Models (LLMs) in generating Structured Query Language (SQL) in complex multi-table database scenarios, a Text-to-SQL framework based on multi-generator collaboration, MG-SQL (Multi-Generator SQL), was proposed. Firstly, to mitigate noise interference caused by irrelevant schema information, an optimization method that enhances the schema linking process was proposed, which generates initial SQLs and combines them with semantic similarity-based retrieval. Secondly, to improve the quality and diversity of candidate SQLs, a multi-strategy collaborative generation framework was developed on the basis of the refined schema: 1) the experience generator was used to retrieve dynamic examples; 2) the chain-of-thought generator was used to strengthen logical reasoning; 3) the query plan generator was used to simulate database execution flows; and 4) the progressive generator was used to perform iterative optimization. Thirdly, the optimal SQL was selected through a voting mechanism. Finally, a reflective learning mechanism was proposed, in which the generated results and the reference SQL were compared to form reflective samples, so as to construct a domain-specific knowledge base dynamically for continuous learning. Results on the BIRD benchmark demonstrate that, with the lightweight GPT-4o-mini model, the schema linking of the proposed framework achieves a 98.89% Strict Recall Rate (SRR) while filtering out 44.91% of irrelevant columns; the SQL generated by the proposed framework achieves a 69.69% EXecution accuracy (EX) and a 79.59% Valid Efficiency Score (VES), outperforming mainstream GPT-4o-based approaches, which validates the effectiveness of the proposed framework in complex scenarios.
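
    The abstract does not say how votes are cast, so the following sketch assumes one common choice: candidates that produce the same execution result vote for each other, and the largest group wins. The function name `vote_sql`, the table, and the candidate queries are hypothetical.

```python
import sqlite3
from collections import Counter

def vote_sql(conn, candidates):
    """Pick the candidate whose execution result is shared by the most candidates."""
    outcomes = []
    for sql in candidates:
        try:
            rows = conn.execute(sql).fetchall()
        except sqlite3.Error:
            continue  # candidates that fail to execute cast no vote
        # Sort rows so that result order does not affect the comparison.
        outcomes.append((sql, tuple(sorted(rows, key=repr))))
    if not outcomes:
        return None
    winning_result, _ = Counter(result for _, result in outcomes).most_common(1)[0]
    # Return the first candidate (in input order) that produced the winning result.
    return next(sql for sql, result in outcomes if result == winning_result)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, v INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?)", [(1, 10), (2, 20), (3, 30)])
candidates = [
    "SELECT id FROM t WHERE v > 15",
    "SELECT id FROM t WHERE v >= 20",
    "SELECT id FROM t WHERE v > 25",
    "SELECT idd FROM t",  # invalid column, ignored by the vote
]
chosen = vote_sql(conn, candidates)
```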

    Few-shot relation extraction model with graph-based multi-view contrastive learning
    Yuhang XIAO, Guanfeng LI, Yuyin CHEN, Jing QIN
    2026, 46(3):  732-740.  DOI: 10.11772/j.issn.1001-9081.2025030371

    Few-Shot Relation Extraction (FSRE) aims to recognize the semantic relationships between entities in text from a limited number of labelled examples. To overcome problems such as deficient feature alignment and poor task adaptation caused by the single-view contrastive learning and static graph structures adopted in existing methods, an FSRE model fusing multi-view contrastive learning with a dynamic graph generation mechanism, namely SAGM (Synergistic Anchored Graph-based Model), was proposed. In the pre-training stage of the model, sentence-anchored and label-anchored strategies were introduced through multi-view contrastive learning, so as to enhance the alignment between instance features and relation label features. In the task generation stage, a graph generation module was used to construct a task-specific graph structure, and a multi-head attention guidance layer was applied to adjust feature importance dynamically, thereby improving the model’s adaptability in few-shot and cross-domain tasks. Experimental results on the FewRel 1.0, FewRel 2.0, and NYT-25 datasets show that the proposed model achieves high accuracy in multiple N-way K-shot settings, demonstrating strong generalization ability and task adaptability, and confirming its effectiveness in few-shot and cross-domain scenarios.

    Remote sensing image captioning model combining dense multi-scale feature fusion and feature knowledge-enhanced Transformer
    Hanqing LIU, Guoming SANG, Yijia ZHANG
    2026, 46(3):  741-749.  DOI: 10.11772/j.issn.1001-9081.2025040414

    To address the challenges of insufficient multi-scale feature utilization, low inter-region detail correlation in texture-repetitive areas, and difficulty in collaborative modeling of multi-target features in remote sensing image captioning tasks, a remote sensing image captioning model combining dense multi-scale feature fusion and a feature knowledge-enhanced Transformer, named DMFKF-T (Dense Multi-scale Feature and Knowledge Fusion Transformer), was proposed. A Dense Multi-scale Feature Fusion Module (DMFFM) was designed to aggregate feature maps of different scales dynamically through cross-layer skip connections, thereby capturing global scene features and local detail information simultaneously. During the decoding stage, a Semantic Fusion Amplifier (SFA) module was introduced to enhance the model's ability to capture long-range dependencies and comprehend contextual information, and a frequency-enhanced channel attention mechanism based on the Discrete Cosine Transform (DCT) was incorporated to analyze the correlation of frequency-domain features, thereby strengthening the modeling capability for complex spatial topologies and nonlinear relationships. On the Remote Sensing Image Captioning Dataset (RSICD), compared with the SD-RSIC (Summarization-driven Deep Remote Sensing Image Captioning) model, DMFKF-T improves the BLEU-4 (BiLingual Evaluation Understudy with 4-grams) and CIDEr (Consensus-based Image Description Evaluation) metrics by 8.6% and 14.4%, respectively. It can be seen that DMFKF-T can generate semantically rich descriptions for remote sensing images accurately.

    Chinese character image retrieval algorithm in ancient books based on NetVLAD feature encoding
    Huihui CHEN, Hongtao SUN, Boliang GUAN, Zhongqing HENG
    2026, 46(3):  750-757.  DOI: 10.11772/j.issn.1001-9081.2025030320

    Character retrieval is part of the ongoing digitization of ancient books. Ancient Chinese books often exhibit inconsistent printing glyphs and a wide variety of font types, and using visual features for Chinese character retrieval is an effective solution. Therefore, a Chinese Character Feature Extraction and Encoding Network (CFEENet) was proposed. Firstly, a Convolutional Neural Network (CNN) was used to extract the visual features of Chinese character images in ancient books. Secondly, a trainable generalized vector aggregation layer, namely NetVLAD, was employed to aggregate and encode the visual features. Finally, cosine similarity was used to measure the similarity between codes to realize Chinese character retrieval in ancient books. In addition, the CFEENet encodings were visualized using t-distributed Stochastic Neighbor Embedding (t-SNE) after dimension reduction, and it was found that the clusters formed by the CFEENet encodings had high density, small overlap between clusters, and high encoding resolution. CFEENet was tested on multiple ancient book datasets. Experimental results show that CFEENet outperforms comparison methods such as Ancient Chinese Character Image Network (ACCINet) in terms of mean Average Precision (mAP) and F1 score in most scenarios, while achieving a good balance between retrieval quality and efficiency, verifying the applicability and effectiveness of CFEENet in retrieving Chinese characters in ancient books.
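
    The final retrieval step, ranking gallery images by the cosine similarity of their codes to a query code, can be sketched as follows. The three-dimensional codes and image names are made-up stand-ins for real NetVLAD encodings, which would be much longer vectors produced by the network.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(query_code, gallery, top_k=2):
    """Return the names of the top_k gallery codes most similar to the query."""
    ranked = sorted(gallery.items(),
                    key=lambda item: cosine(query_code, item[1]),
                    reverse=True)
    return [name for name, _ in ranked[:top_k]]

# Hypothetical codes; cosine similarity ignores vector length, only direction matters.
gallery = {
    "img_a": [1.0, 0.0, 0.0],
    "img_b": [0.9, 0.1, 0.0],
    "img_c": [0.0, 1.0, 0.0],
}
hits = retrieve([2.0, 0.0, 0.0], gallery, top_k=2)
```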

    Local and long-range temporal complementary modeling for video action recognition
    Zuxi ZHANG, Zhancheng ZHANG, Fuyuan HU
    2026, 46(3):  758-766.  DOI: 10.11772/j.issn.1001-9081.2025040509

    Due to the diversity and complexity of spatio-temporal features in videos, as well as the wide variability of actions across different speeds and scales, existing action recognition methods commonly suffer from insufficient capture of local motion details and inadequate mining of long-range temporal dependencies. Therefore, a video action recognition network based on complementary modeling of local and long-range temporal information was proposed. The network is composed of the Two-level Fusion Motion Excitation (TFME) and the Temporal Aggregation Channel Excitation (TACE) modules. In the TFME module, the first-order and second-order differences between adjacent feature maps were computed and fused, and the fused weights were used to excite channels of the original feature maps, so as to enhance the fine-grained extraction capability of multi-level motion features, thereby modeling local temporal information. In the TACE module, a hierarchical residual pyramid structure was constructed using a channel grouping strategy, which expanded the temporal receptive field and enhanced the learning ability of multi-scale features. Meanwhile, a Temporal Channel Attention (TCA) mechanism was designed to adjust the aggregated feature maps dynamically and optimize the weight allocation among temporal channels, thereby modeling long-range temporal information. Finally, the above complementary modules were integrated and embedded into a 2D residual network to realize end-to-end action recognition. Experimental results on the Something-Something V1 and V2 validation sets show that, using only RGB frames with a random 8-frame sampling strategy, the proposed network achieves Top-1 accuracies of 50.6% and 61.9%, respectively; with a 16-frame sampling strategy, the accuracies are 54.1% and 65.6%, respectively. It can be seen that the proposed network models both local motion details and long-range temporal dependencies efficiently, offering a new approach for action recognition tasks in complex temporal scenarios.

    Multi-scale spatio-temporal decoupling for contrastive learning of skeleton action recognition
    Xiaoxia LIU, Liqun KUANG, Song WANG, Shichao JIAO, Huiyan HAN, Fengguang XIONG
    2026, 46(3):  767-774.  DOI: 10.11772/j.issn.1001-9081.2025030310

    Aiming at the problems of dynamic action modeling and multi-scale temporal fusion in skeleton action recognition, an efficient Multi-scale Spatio-Temporal Decoupled Contrastive Learning Framework (MSTDCLF) was proposed. Firstly, a Multi-scale Spatio-Temporal Feature enhancement module (MSTF) was designed to combine depth separable convolution and dilated convolution, so as to model short-term motion features and long-term behavior patterns simultaneously. Secondly, the semantic response between joints and feature channels was further strengthened by embedding the channel-spatial joint attention mechanism. Thirdly, a residual network with attention mechanism was used to solve the gradient decay problem of deep network structure. Finally, a Bidirectional Gated Spatio-temporal Context Modeling (BGSCM) was proposed, and a spatio-temporal enhancement branch was constructed on the basis of Bidirectional Long Short-Term Memory (BiLSTM) network, and the decoupled features were transmitted in joint topology and temporal axis through the gating mechanism, thereby suppressing noise interference and establishing complete action evolution dependency. Experimental results show that MSTDCLF has the accuracies of 87.5% (Cross-Subject (CS)) and 93.0% (Cross-View (CV)) on the NTU RGB+D 60 dataset, and the accuracies of 79.3% (CS) and 80.6% (crosS-Setup (SS)) on the NTU RGB+D 120 dataset, all of which are better than those of the suboptimal method SCD-Net (Spatiotemporal Clues Disentanglement Network). Ablation experiments verify the effectiveness of the multi-scale design and bidirectional gating mechanism, indicating that MSTDCLF can achieve efficient behavior representation in skeleton behavior recognition and improve recognition accuracy effectively.

    Data science and technology
    Rare sequential pattern mining method with adaptive gap under one-off condition
    Hao LI, Lei WANG, Le SUN, Youxi WU
    2026, 46(3):  775-780.  DOI: 10.11772/j.issn.1001-9081.2025030305

    Rare sequential pattern mining aims to discover infrequent but important patterns in sequence databases. However, current sequential pattern mining methods mostly determine only whether a pattern occurs in a sequence, ignoring how many times the pattern repeats in the sequence, that is, the user’s level of interest, which biases the mining results. To tackle this issue, a rare sequential pattern mining method with adaptive gap under the one-off condition was proposed, namely ORP (One-off Rare sequential Pattern mining). In the method, the number of repetitions of a pattern in a sequence was calculated under the one-off condition, and the sequence features were reflected using adaptive gaps. To avoid the inefficient sequential traversal of the original database that traditional algorithms require during support calculation, an inverted index structure was established, which stores each transaction together with the positions at which it occurs in the original database, thereby eliminating redundant traversal of the database and improving the efficiency of support calculation. In addition, a pattern connection strategy was used to generate candidate patterns, and a pruning strategy was proposed to further reduce the number of candidate patterns, thereby improving the mining speed. Ablation experimental results on five real datasets show that the running time of the proposed method is significantly shorter, thus verifying the superiority of the proposed method.
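
    The inverted index mentioned in this abstract can be pictured with the following minimal sketch. It assumes that each sequence element plays the role of a transaction and that positions are recorded as (sequence id, offset) pairs; the actual layout used by ORP is not given here, and the function names are hypothetical.

```python
from collections import defaultdict

def build_inverted_index(sequences):
    """Map each element to the (sequence id, position) pairs where it occurs."""
    index = defaultdict(list)
    for seq_id, sequence in enumerate(sequences):
        for pos, element in enumerate(sequence):
            index[element].append((seq_id, pos))
    return index

def sequences_containing(index, element):
    """Ids of sequences containing the element, read from the index alone."""
    return sorted({seq_id for seq_id, _ in index.get(element, [])})

# Toy sequence database; the original data is scanned exactly once to build the index.
database = ["abcab", "bca", "ccd"]
index = build_inverted_index(database)
positions_of_a = index["a"]
```

    Once the index exists, occurrence lookups for a candidate pattern's elements read the stored positions directly instead of rescanning `database`.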

    Conflict-aware k-nearest neighbor query over moving objects in road networks
    Rui ZHANG, Ziqiang YU, Mingjin TAO, Yujie BAI
    2026, 46(3):  781-789.  DOI: 10.11772/j.issn.1001-9081.2025040442

    The k-nearest neighbor (kNN) query over moving objects is one of the important research topics in Location-Based Service (LBS). Unlike kNN queries with a single query point, when a large number of query points issue query requests simultaneously, the same moving object may appear in the query results of multiple query points. This situation is called a query result conflict, and the multiple kNN queries involved in such conflicts are called conflict-aware kNN queries. In practical applications, a moving object involved in a query result conflict cannot be assigned to multiple queries simultaneously. Instead, each query must retrieve k moving objects that differ from the results of the other queries. Therefore, a globally optimized algorithm for conflict-aware kNN queries in road networks, named RBCS-KNN (Road-Based Conflict Sensitive K Nearest Neighbor query algorithm), was proposed. Firstly, a two-layer index structure was built on the subgraphs obtained by partitioning the road network. Secondly, the candidate conflicting query points were screened rapidly using subgraph expansion and pruning strategies. Thirdly, the kNN results of the candidate query points were computed and expanded to a sufficient number of candidate objects, while all conflicting query points were grouped dynamically. Finally, an optimal assignment was determined via an improved assignment strategy that minimizes the sum of the distances from every query point to its k moving objects. Experimental results on multiple real-world datasets show that RBCS-KNN reduces the total query distance by 10% compared with the GLAD (Grid based LAbelling with scheDuling) algorithm, demonstrating its correctness and good performance.

    Multi-scale based multivariate time series anomaly detection model
    Chunyong YIN, Bufan ZHANG
    2026, 46(3):  790-797.  DOI: 10.11772/j.issn.1001-9081.2025030302

    Multivariate time series data often exhibit multi-scale characteristics and complex interdependencies, which poses challenges for anomaly detection. To address this issue, a multi-scale based multivariate time series anomaly detection model was developed, namely M3AD (Multi-scale Mamba Multi-layer perceptron Anomaly Detection). Firstly, a multi-scale feature extraction method was employed, in which time series were segmented at different temporal scales to capture short- and long-term patterns. Secondly, a Multi-Layer Perceptron (MLP) and convolutional layers were used for feature learning to extract local and high-level feature representations. Thirdly, Mamba, a selective State Space Model (SSM), was introduced to capture key information in long sequences through its efficient processing capability. Finally, sensitive anomaly detection was achieved through a loss function based on KL (Kullback-Leibler) divergence and anomaly score calculation. To validate the model’s effectiveness, M3AD was compared with seven models, such as Anomaly Transformer and MEMTO, on four public datasets. Experimental results show that M3AD outperforms the other methods in key metrics such as precision, recall, and F1 score, verifying the effectiveness and superiority of M3AD in multivariate time series anomaly detection tasks.
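
    For readers unfamiliar with the quantity behind the loss, the following sketch computes the KL divergence between two discrete distributions. How M3AD builds those distributions and turns the divergence into an anomaly score is not described in the abstract, so the example only shows the divergence itself; the `eps` smoothing term is an assumption added to avoid division by zero.

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions given as equal-length lists."""
    return sum(pi * math.log((pi + eps) / (qi + eps))
               for pi, qi in zip(p, q) if pi > 0)

reference = [0.5, 0.5]
same = kl_divergence(reference, reference)       # identical distributions
shifted = kl_divergence(reference, [0.9, 0.1])   # mismatched distributions
```

    The divergence is zero for identical distributions and grows as they diverge, which is what makes it usable as a discrepancy signal.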

    Cyber security
    Review of threats faced by federated learning in privacy and security field
    Enkang XI, Jing FAN, Yadong JIN, Hua DONG, Hao YU, Yihang SUN
    2026, 46(3):  798-808.  DOI: 10.11772/j.issn.1001-9081.2025030357

    Federated learning, as a new type of distributed machine learning, has the potential to address data silos and privacy protection issues, but it faces potential privacy and security threats. Therefore, the cutting-edge research achievements of federated learning in privacy and security field were reviewed systematically, the basic concepts and workflow of federated learning were elaborated in detail, and the privacy and security issues in federated learning were classified based on the current cutting-edge research achievements. Firstly, the privacy threats in federated learning were analyzed, and the corresponding privacy protection methods were summarized. Secondly, the security threats in federated learning were summed up, and the corresponding defense methods against security attacks were introduced. Finally, the challenges in federated learning that need to be addressed in the future were discussed, focusing on the applications of Large Language Models (LLMs) such as ChatGPT and DeepSeek in federated learning, and the computational efficiency and privacy leakage challenges brought by LLMs were further explored.

    Panorama and future of location privacy protection in internet of vehicles
    Lili HE, Xinru GUAN, Lei ZHANG, Sheng JIANG, Chengjie JIANG
    2026, 46(3):  809-820.  DOI: 10.11772/j.issn.1001-9081.2025030352

    With the development of wireless communication technology and high-precision mobile positioning technology, Internet of Vehicles (IoV) has become deeply embedded in everyday life. While IoV brings convenience to people, it also brings privacy risks. Typically, in IoV, vehicle driving information interacts with information of other vehicles and infrastructure in real time. During the interaction process, privacy issues such as the leakage of sensitive information may occur. Firstly, the location privacy architecture and privacy risks of IoV were introduced. Secondly, the dynamic noise allocation mechanism, multi-dimensional differential privacy trajectory protection and data perturbation technology in differential privacy were presented. Thirdly, the spatial generalization based on anonymization and the K-anonymity, as well as the asymmetric encryption, symmetric encryption, and homomorphic encryption of encryption mechanism were introduced. Finally, the advantages, disadvantages, limitations and other aspects of differential privacy, anonymity, and encryption mechanisms were analyzed and evaluated.

    Adversarial purification method based on directly guided diffusion model
    Yan HU, Peng LI, Shuyan CHENG
    2026, 46(3):  821-829.  DOI: 10.11772/j.issn.1001-9081.2025030384

    Deep Neural Networks (DNNs) are susceptible to adversarial perturbations, so attackers may deceive DNNs by adding imperceptible adversarial perturbations to an image. Diffusion-based adversarial purification methods use diffusion models to generate clean samples to defend against such attacks, but the diffusion models themselves are also susceptible to adversarial perturbations. Therefore, an adversarial purification method named StraightDiffusion was proposed, in which the diffusion process of the diffusion model was guided directly by adversarial samples. Firstly, the key problems and limitations of existing methods that use diffusion models for adversarial purification were discussed. Secondly, a new sampling method was proposed, in which a two-stage guidance approach consisting of head guidance and tail guidance was used in the denoising process, meaning that guidance was applied only in the early and late stages of the denoising process and not in the other stages. Experimental results on the CIFAR-10 and ImageNet datasets using three classifiers (WideResNet-70-16, WideResNet-28-10, and ResNet-50) show that StraightDiffusion outperforms baseline methods in defense performance. Compared with methods such as diffusion models for adversarial purification (DiffPure) and Guided Diffusion Model for Purification (GDMP), StraightDiffusion achieves the best standard and robust accuracies on both the CIFAR-10 and ImageNet datasets. These results verify that the proposed method can improve purification performance, thereby enhancing the robust accuracy of classification models against adversarial samples and achieving effective defense under multiple attack scenarios.

    Terminal data privacy-preserving scheme based on hierarchical federated learning
    Huan PING, Zhanguo XIA, Sicheng LIU, Qihan LIU, Chunlei LI
    2026, 46(3):  830-838.  DOI: 10.11772/j.issn.1001-9081.2025040437

    Federated learning, as a distributed machine learning framework for data privacy protection, faces several challenges such as that the traditional cloud-edge-device structure cannot meet the increasingly strict data security regulations and the exponential growth of data. At the same time, the lightweight trend of edge devices leads to limited computing capabilities. To address these issues, an Optional Hierarchical Federated Learning (OHFL) scheme facing multi-level supervision and heterogeneous resource environments was proposed. Firstly, a tree-structured multi-layer architecture was adopted to configure the number of edge layers and the number of servers per layer flexibly, so as to adapt to supervision levels and resource distribution in different scenarios. Secondly, a differential privacy mechanism was introduced on the basis of ensuring communication efficiency, homomorphic encryption was applied to certain neural network layers, and the encryption tasks were offloaded to the nearest edge servers to distribute the computational burden. Experimental results show that the OHFL scheme achieves a classification accuracy of 97.82% on the MNIST Independent Identically Distributed (IID) dataset, with the additional encryption mechanism using Convolutional Neural Network (CNN) and ResNet18 incurring only a time overhead of 6.24 s and 6.80 s per round, verifying that the OHFL scheme improves the system efficiency of computation and communication significantly. It can be seen that the proposed scheme improves the security and adaptability of hierarchical federated learning architecture theoretically, and provides a feasible solution for meeting multi-level supervision and efficient secure computation in practical applications, especially suitable for scenarios such as Internet of Things (IoT) and healthcare with data sensitivity and resource heterogeneity.

    Difference-based shared key extraction scheme via multi-level quantization
    Qiong LI, Chunyi CHEN, Zhenzhong ZHANG, Bo YU, Haiyang YU, Xiaojuan HU
    2026, 46(3):  839-846.  DOI: 10.11772/j.issn.1001-9081.2025030297

    Legitimate communicating parties can exploit the randomness of the wireless channel state to extract shared key sequences that are information-theoretically secure. To improve the efficiency of wireless channel key extraction, a difference-based shared key extraction scheme via multi-level quantization was proposed. In the scheme, random modulation was employed to sample the wireless channel at high frequency, and two quantization algorithms integrated with random sampling difference, Adaptive Symbol Quantization (ASQ) and Balanced Multi-bit Modified Quantization (BMMQ), were introduced to process the first-order differential sequence and obtain the original key sequence. On this basis, an information negotiation algorithm was applied to correct inconsistent bits in the original key, the signal was reconstructed from the original key and the first-order differential sequence, and the reconstructed signal was requantized, ultimately achieving key synchronization between the legitimate communicating parties. Experimental results demonstrate that random sampling difference reduces the correlation coefficient between adjacent sample points to below e⁻¹, thereby effectively decreasing statistical dependence in the key sequence; at a Signal-to-Noise Ratio (SNR) of 25 dB, the ASQ algorithm reduces the Key Disagreement Rate (KDR) to 3.8×10?? while maintaining an Original Key Extraction Rate (OKER) of 0.86; under lossless quantization conditions, the BMMQ algorithm reduces the KDR to 7×10⁻³. The final shared key sequences pass the NIST (National Institute of Standards and Technology) randomness test, validating the security and effectiveness of the keys.
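
    To make the pipeline of differencing, multi-level quantization, and KDR measurement concrete, here is a rough sketch. It is neither ASQ nor BMMQ: the equiprobable quantile thresholds, the Gray coding, the Gaussian channel model, and all function names are assumptions chosen only to show the general idea.

```python
import random


def first_order_difference(samples):
    """Differences between consecutive channel samples."""
    return [b - a for a, b in zip(samples, samples[1:])]


def quantize_multilevel(values, bits_per_sample=2):
    """Map each value to a Gray-coded level using empirical quantile thresholds."""
    levels = 2 ** bits_per_sample
    ordered = sorted(values)
    thresholds = [ordered[len(ordered) * k // levels] for k in range(1, levels)]
    key = []
    for v in values:
        level = sum(v >= t for t in thresholds)  # index of the quantization bin
        gray = level ^ (level >> 1)              # adjacent bins differ by one bit
        key.extend((gray >> i) & 1 for i in reversed(range(bits_per_sample)))
    return key


def key_disagreement_rate(key_a, key_b):
    """Fraction of bit positions where the two keys differ."""
    if len(key_a) != len(key_b):
        raise ValueError("keys must have equal length")
    return sum(x != y for x, y in zip(key_a, key_b)) / len(key_a)


def simulate(n=2000, noise_std=0.05, seed=0):
    """Two parties observe the same channel with independent measurement noise."""
    rng = random.Random(seed)
    channel = [rng.gauss(0.0, 1.0) for _ in range(n)]
    alice = [h + rng.gauss(0.0, noise_std) for h in channel]
    bob = [h + rng.gauss(0.0, noise_std) for h in channel]
    key_a = quantize_multilevel(first_order_difference(alice))
    key_b = quantize_multilevel(first_order_difference(bob))
    return key_a, key_b, key_disagreement_rate(key_a, key_b)


key_a, key_b, kdr = simulate()
```

    Gray coding is used so that a sample falling just across a threshold on one side costs a single disagreeing bit; the remaining mismatches are what an information negotiation step would have to correct.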

    Cross-domain network defense strategy conflict detection method based on multi-domain policy graph
    Xinlu LIU, Dexian CHANG, Jingkun ZHANG, Dawei ZHANG
    2026, 46(3):  847-856.  DOI: 10.11772/j.issn.1001-9081.2025091182

    To address insufficient inter-domain collaboration, heterogeneous resource identifiers, and low detection efficiency in conflict detection for cross-domain defense strategies in programmable networks, a cross-domain defense strategy conflict detection method based on a multi-domain policy graph was proposed. Firstly, an intent-driven defense strategy model was constructed on the basis of the general-purpose JSON format, and defense intentions were precisely associated with defense strategies through semantic label injection, overcoming the closed nature of single-domain strategy models. Secondly, the Layered Hash Mapping (LHM) algorithm was used to generate Global Resource Identifiers (GRIs), resolving resource identifier conflicts across multi-controller domains. Finally, a Multi-Domain Joint Policy Graph (MD-JPG) was constructed to integrate the topological, action, and resource dependencies among cross-domain strategies, and a Cross-domain Conflict Detection algorithm based on graph traversal in four Dimensions (CDC-4D) was designed to identify action conflicts, rule coverage conflicts, resource competition conflicts, and strategy type conflicts accurately. Experimental results show that, in multi-controller network defense scenarios, the proposed method performs well in strategy conflict detection latency, memory usage, and detection F1-score.

    Network intrusion detection based on hybrid sequence model and federated class balance algorithm
    Kaiguang MA, Xuebin CHEN, Yinlong JIAN, Liu WANG, Yuan GAO
    2026, 46(3):  857-866.  DOI: 10.11772/j.issn.1001-9081.2025030296

    Network Intrusion Detection Systems (NIDSs) play a crucial role in network security defense. Traditional rule-based matching methods struggle to detect unknown attacks effectively, and although deep learning improves detection performance, it is constrained by the data privacy issues of centralized training. Federated learning enables collaborative learning while preserving data privacy through local training and parameter sharing, offering a feasible solution for network intrusion detection. However, federated learning still faces challenges in NIDS applications, including the temporal dependency of network traffic and imbalanced data distribution, which limit the federated model’s ability to detect minority-class attacks. Therefore, a hybrid framework that integrates a parallel bidirectional Recurrent Neural Network (RNN) architecture with the Federated Class Balancing (FedCB) algorithm was proposed to enhance temporal modeling capability and optimize the federated aggregation strategy. Experimental results on the NSL-KDD dataset demonstrate that the proposed algorithm achieves superior performance in a five-class classification task. Compared with CNN-FL and FL-SEResNet (Federation Learning Squeeze-and-Excitation network ResNet), two intrusion detection models that combine federated learning with convolutional neural networks, the proposed algorithm improves accuracy by 3.30 and 1.48 percentage points, respectively, highlighting its effectiveness in federated learning-based intrusion detection.

    Advanced computing
    Low-latency scheduling for data-dependent cryptographic services
    Mingxu GONG, Wei ZHANG, Wendi FENG, Huaping MU
    2026, 46(3):  867-876.  DOI: 10.11772/j.issn.1001-9081.2025040391

    In cryptographic clouds, the diversity of cryptographic service types and the complex interdependencies between cryptographic tasks lead to frequent communication between heterogeneous cryptographic engines, which can cause large communication latencies. To address this, a mathematical model of optimal low-latency task mapping for data-dependent cryptographic services in cryptographic clouds was constructed. The objective of the model is to minimize the maximum service completion time, and the problem was proved to be Non-deterministic Polynomial-time hard (NP-hard). An efficient heuristic scheduling algorithm was therefore designed. Firstly, a task-length threshold mechanism was established on the basis of historical scheduling data analysis to optimize the initial task allocation. Secondly, a critical task identification method was used to locate potential latency bottleneck tasks, and the scheduling order of critical tasks was adjusted dynamically to reduce their impact on the overall completion time. Finally, a task transfer-execution time balancing strategy was adopted to further optimize the distribution of tasks among heterogeneous engines and reduce the overall task latency. Experimental results show that on a small-scale dataset, the average gap between the cryptographic service completion time of the proposed algorithm and the optimal solution is only 8.67%, with a scheduling speed improvement of 8.62 times. On a large-scale dataset, the cryptographic service completion time is reduced by 84.67% and 82.15% compared to the random and long-task-first methods, respectively.
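
    For orientation, the sketch below shows a simple greedy list scheduler in the spirit of the long-task-first baseline mentioned above, extended with data dependencies and a transfer delay between engines. It is not the proposed heuristic: the fixed `transfer` cost, the per-engine `speeds`, and the tie-breaking rule are all assumptions.

```python
def schedule(tasks, deps, speeds, transfer=1.0):
    """Greedy makespan heuristic for dependent tasks on heterogeneous engines.

    tasks:    {task name: task length}
    deps:     {task name: [predecessor names]}
    speeds:   processing speed of each engine
    transfer: assumed fixed delay when a predecessor ran on another engine
    Returns ({task: (engine index, finish time)}, makespan).
    """
    engine_free = [0.0] * len(speeds)
    placed = {}
    remaining = dict(tasks)
    while remaining:
        ready = [t for t in remaining
                 if all(p in placed for p in deps.get(t, []))]
        if not ready:
            raise ValueError("cyclic dependency")
        # Long-task-first among ready tasks; the name breaks ties deterministically.
        task = max(ready, key=lambda t: (remaining[t], t))
        best = None
        for engine, speed in enumerate(speeds):
            # Inputs arrive later if the predecessor ran on a different engine.
            data_ready = max(
                (placed[p][1] + (transfer if placed[p][0] != engine else 0.0)
                 for p in deps.get(task, [])),
                default=0.0,
            )
            finish = max(engine_free[engine], data_ready) + remaining[task] / speed
            if best is None or finish < best[1]:
                best = (engine, finish)
        placed[task] = best
        engine_free[best[0]] = best[1]
        del remaining[task]
    makespan = max((finish for _, finish in placed.values()), default=0.0)
    return placed, makespan


placed, makespan = schedule(
    tasks={"a": 4, "b": 2, "c": 2},
    deps={"c": ["a"]},
    speeds=[1.0, 1.0],
)
```

    In this toy instance task `c` stays on the engine that ran `a` because moving it would add the transfer delay, which is the kind of transfer-versus-execution trade-off the abstract describes balancing.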

    Multi-objective discrete hiking optimization algorithm for emergency medical supply scheduling considering psychological cost under trauma
    Yong LIU, Siwen HUANG, Liang MA, Jiawei WU
    2026, 46(3):  877-886.  DOI: 10.11772/j.issn.1001-9081.2025040418

    To address the emergency medical supply scheduling problem in public health emergencies, the psychological cost under trauma for disaster victims, which measures the difference in psychological stress that victims experience when material delivery is delayed, was introduced as an optimization objective alongside transportation time and the number of vehicles, and a multi-objective emergency medical supply scheduling model that minimizes all three was proposed. Given the Non-deterministic Polynomial-time hard (NP-hard) nature of the model, a Multi-objective Discrete Hiking Optimization Algorithm (MDHOA) was designed. The emergency medical supply scheduling solution was encoded as a delimiter-free integer sequence, which was then decoded using the Split segmentation method. An improved nearest neighbor heuristic was employed to optimize the initial solution, and a hiking population-driven multi-objective optimization mechanism was introduced to enhance the search capability. Experimental results on the Solomon benchmark set show that the proposed algorithm generally outperforms comparison algorithms such as Non-dominated Sorting Genetic Algorithm Ⅱ (NSGA-Ⅱ), Improved NSGA-Ⅱ (INSGA-Ⅱ), and Improved Multi-Objective Honey Badger Algorithm (IMOHBA) on three metrics: HyperVolume (HV), Overall Nondominated Vector Generation (ONVG), and Inverted Generational Distance (IGD), demonstrating superior solution coverage and stability. In a real-world case study of Haidian District, Beijing, the proposed model exhibits strong adaptability and practical feasibility. Sensitivity analysis indicates that the psychological cost coefficient for disaster victims and the vehicle capacity have significant impacts on the scheduling strategy.

    Network and communications
    Federated learning with two-pass communication compression for privacy-sensitive IoT data
    Lei WANG, Wenxuan ZHOU, Ninghui JIA, Zhihao QU
    2026, 46(3):  887-898.  DOI: 10.11772/j.issn.1001-9081.2025040436

    To address the significant communication overhead and the risk of privacy leakage through gradient inversion that federated learning faces in Internet of Things (IoT) scenarios, a two-pass communication compression framework named QPR (Quantization and Pull Reduction) was proposed. Firstly, training nodes compressed their local gradients through gradient quantization before uploading them to the server, thereby reducing gradient transmission overhead. Secondly, a probability threshold-based delayed model download mechanism (lazy pulling) was introduced to reduce model synchronization frequency: training nodes synchronized the global model only in selected iterations and reused their local historical models in the others. Finally, rigorous theoretical analysis confirmed that QPR achieves the same asymptotic order as Federated Averaging (FedAvg), the standard federated learning algorithm without communication compression, and possesses the linear speedup property as the number of nodes increases, thereby ensuring system scalability. Experimental results demonstrate that QPR significantly improves communication efficiency on multiple benchmark datasets and machine learning models. Taking the ResNet18 training task on the CIFAR-10 dataset as an example, QPR achieves a communication speedup ratio of up to 8.27 compared to uncompressed FedAvg, without any loss in model accuracy.
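
    The two compression passes can be sketched on a toy problem as follows. This is an illustration under stated assumptions rather than the QPR algorithm itself: the unbiased stochastic quantizer, the fixed pull probability `pull_prob`, the quadratic per-node loss, and all names are hypothetical.

```python
import random


def stochastic_quantize(grad, levels=4, rng=random):
    """Unbiased stochastic rounding of each entry onto a grid of `levels` steps."""
    scale = max(abs(g) for g in grad)
    if scale == 0:
        return list(grad)
    out = []
    for g in grad:
        x = abs(g) / scale * levels  # position on the grid, in [0, levels]
        low = int(x)
        # Round up with probability equal to the fractional part.
        q = low + (1 if rng.random() < x - low else 0)
        out.append((1 if g >= 0 else -1) * q * scale / levels)
    return out


def train(targets, rounds=300, lr=0.1, pull_prob=0.5, seed=0):
    """Each node i minimizes 0.5 * ||w - targets[i]||^2 on its local data."""
    rng = random.Random(seed)
    dim = len(targets[0])
    global_model = [0.0] * dim
    local_models = [list(global_model) for _ in targets]
    pulls = 0
    for _ in range(rounds):
        uploads = []
        for i, target in enumerate(targets):
            # Lazy pulling: download the global model only with probability
            # pull_prob; otherwise reuse the stale local copy.
            if rng.random() < pull_prob:
                local_models[i] = list(global_model)
                pulls += 1
            grad = [w - t for w, t in zip(local_models[i], target)]
            uploads.append(stochastic_quantize(grad, rng=rng))  # compressed upload
        avg = [sum(col) / len(uploads) for col in zip(*uploads)]
        global_model = [w - lr * g for w, g in zip(global_model, avg)]
    return global_model, pulls


model, pulls = train([[1.0, 2.0], [3.0, 0.0], [2.0, 4.0]])
```

    The returned `pulls` counter gives a direct view of how many downloads were skipped relative to `rounds * len(targets)`, which is the saving lazy pulling targets on the download side.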

    Radiation performance analysis of localizer based on backward ray-tracing method
    Huan LIN, Yuanpeng KANG, Fei LIANG, Zhengbo YANG, Rui SHI, Xiaorong JING
    2026, 46(3):  899-906.  DOI: 10.11772/j.issn.1001-9081.2025030380

    The Instrument Landing System (ILS) is a critical navigation aid for flight safety, and its signal quality directly affects the accuracy and safety of aircraft landing. However, the increasingly complex electromagnetic environment around airports produces more multipath effects, which significantly degrade the reliability and precision of ILS signals. Therefore, taking the LOCalizer (LOC) as the research object, a signal propagation path analysis method based on backward ray tracing was proposed. In the method, an electromagnetic propagation model of the airport environment was established and combined with ray-tracing and path validity determination rules to investigate systematically the propagation characteristics of ILS signals in multipath environments. In particular, the influence of reflection and diffraction caused by obstacles on the Difference in Depth of Modulation (DDM) of the signal received by the airborne LOC receiver was analyzed. Simulation results demonstrate that when obstacles are near the runway centerline, the DDM fluctuates significantly, with the maximum jitter reaching approximately 283.4% of the value specified by the International Civil Aviation Organization (ICAO). After the obstacle positions are adjusted appropriately, occlusion between obstacles and the increased distance between the obstacles and the runway centerline reduce multipath interference. As a result, the fluctuation of the DDM decreases significantly and meets the limit prescribed by ICAO, with the maximum jitter reaching approximately 99.0% of the specified value. These results verify that the method can effectively assess the impact of multipath propagation on ILS performance in complex airport environments.

    Multimedia computing and computer simulation
    Fast neural implicit surface reconstruction based on dynamic occupancy grid
    Suling XUE, Jinxu HE, Wenjie WEI, Yingying HE, Lu LOU
    2026, 46(3):  907-914.  DOI: 10.11772/j.issn.1001-9081.2025030333

    To address the imbalance between efficiency and accuracy in Neural implicit Surface (NeuS) reconstruction caused by dense sampling and volumetric rendering, a fast and efficient reconstruction method was proposed. Firstly, dynamic occupancy grids were combined with density thresholds to select effective sampling points, thereby reducing redundant computation and memory usage. Then, multi-resolution hash encoding was integrated with a Truncated Signed Distance Field (TSDF) to improve gradient stability and noise resistance through a truncated distance constraint and hash feature interpolation. Finally, a photometric consistency constraint based on Normalized Cross-Correlation (NCC) was introduced to enforce cross-view geometric consistency, thereby improving the quality of surface reconstruction. Experimental results show that on the synthetic dataset NeRF-Synthetic, the proposed method reduces the training time by 98.8% compared to Neural Radiance Field (NeRF). Although its training time is longer than that of Instant-NGP (Instant Neural Graphics Primitives) accelerated by CUDA (Compute Unified Device Architecture), it achieves significantly better surface reconstruction quality. On the real dataset DTU, the method achieves a surface reconstruction Chamfer Distance (CD) and a new-view synthesis Peak Signal-to-Noise Ratio (PSNR) significantly better than most existing methods, with CD reduced by 1.21 mm and PSNR higher by 2.93 dB compared to Instant-NGP. Although its metrics are slightly worse than those of Neuralangelo (CD higher by 0.02 mm and PSNR lower by 2.05 dB), its training time is 97.5% shorter than that of Neuralangelo. The method therefore improves training efficiency while preserving reconstruction accuracy, and is suitable for efficient 3D reconstruction in scenes with complex structures.
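
    As a reminder of what the NCC term measures, the sketch below computes the standard normalized cross-correlation between two flattened image patches. How the paper samples patches across views and weights the resulting loss is not stated in the abstract; the `photometric_loss` form of one minus NCC is an assumption.

```python
import math


def ncc(patch_a, patch_b):
    """Normalized cross-correlation of two equally sized, flattened patches."""
    n = len(patch_a)
    mean_a = sum(patch_a) / n
    mean_b = sum(patch_b) / n
    num = sum((a - mean_a) * (b - mean_b) for a, b in zip(patch_a, patch_b))
    den = math.sqrt(
        sum((a - mean_a) ** 2 for a in patch_a)
        * sum((b - mean_b) ** 2 for b in patch_b)
    )
    # Textureless (constant) patches carry no signal; return 0 by convention.
    return num / den if den > 0 else 0.0


def photometric_loss(patch_a, patch_b):
    """Assumed loss form: 0 for perfectly correlated patches, 2 for inverted ones."""
    return 1.0 - ncc(patch_a, patch_b)


same_up_to_gain = ncc([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
inverted = ncc([1.0, 2.0, 3.0], [3.0, 2.0, 1.0])
```

    Because NCC subtracts the mean and divides by the spread of each patch, it is unaffected by brightness offset and gain differences between views, which is why it is a common choice for cross-view consistency.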

    Human-centric detail-enhanced virtual try-on method
    Peirong SHAO, Suzhen LIN, Yanbo WANG
    2026, 46(3):  915-923.  DOI: 10.11772/j.issn.1001-9081.2025040475

    Current virtual try-on methods do not adequately preserve the local details of target garments, and when a diffusion model is used for generation, the Variational AutoEncoder (VAE) maps the input data to a low-dimensional space and thereby loses high-frequency detail in the model’s hands and face. To address these problems, a human-centric detail-enhanced virtual try-on method was proposed. Firstly, the clothing-agnostic human body map, human pose map, and target garment were input into a Geometric Matching Module (GMM) to generate a coarsely warped garment. Secondly, a Garment Wrap Refinement (GWR) module was constructed to enhance the detailed features of the coarsely warped garment. Thirdly, the warped garment map, clothing-agnostic human body map, and human pose map were concatenated and fed into a UNet together with textual features, and the textual and image features were fused to generate a clear image progressively through denoising. Fourthly, a Mask Feature Connection (MFC) module with coordinate attention was constructed to localize the model’s position more accurately and preserve high-frequency detail in the hands and face, thereby ensuring human-centric results. Finally, the outputs of the MFC module and the UNet were fused and decoded to obtain the final try-on result. Experimental results demonstrate that, compared to the LADI-VTON (LAtent DIffusion-Virtual Try-ON) method on the Dress Code dataset, the proposed method achieves a 1.41% improvement in the Structural Similarity Index Measure (SSIM) and reductions of 7.32%, 31.03%, and 64.56% in Learned Perceptual Image Patch Similarity (LPIPS), FID (Fréchet Inception Distance), and KID (Kernel Inception Distance), respectively, verifying its superior performance in virtual try-on.

    Elastic medical image registration model with high-frequency preservation based on spectrum decomposition
    Yongwei JIANG, Xiaoqing CHEN, Linjie FU
    2026, 46(3):  924-932.  DOI: 10.11772/j.issn.1001-9081.2025030322

    Elastic registration is a key task in medical image processing, and its performance directly affects the accuracy of subsequent tasks such as segmentation, classification, and prediction. However, because neural networks are insensitive to high-frequency components, existing methods have difficulty capturing high-frequency information in images, which degrades the fitting accuracy of the registration field. To address this issue, a high-frequency-preserving medical image registration model based on frequency spectrum decomposition, DFRes (Decomposition in Frequency domain model for Registration), was proposed. In the model, a frequency decomposition strategy was introduced, and a dual-branch structure was adopted to process the high- and low-frequency information of the original image separately. In addition, an Invertible Neural Network (INN) structure that preserves high-frequency information and a bridge-style feature fusion module that fuses high- and low-frequency information were designed, and an alternating spatial-frequency information extraction module was used to further enhance the model’s ability to extract and fuse frequency- and spatial-domain information. Comparisons with existing advanced models on the IXI, OSSAI, and Huaxi rectal cancer datasets show that DFRes achieves significant improvements on multiple metrics. On the IXI dataset, compared to the TransMorph model, DFRes increases the Dice Similarity Coefficient (DSC) by 2.5 percentage points, reduces the Average Surface Distance (ASD) by 0.012, and increases the Structural SIMilarity (SSIM) by 1.6 percentage points. Ablation experiments further verify the effectiveness of the module design.

    Transformer image dehazing based on component collaborative optimization pruning
    Jixin GUO, Ting ZHANG
    2026, 46(3):  933-939.  DOI: 10.11772/j.issn.1001-9081.2025040395

    Transformer-based image dehazing algorithms achieve good dehazing results, but they suffer from large numbers of network parameters and low dehazing speed. To prune the redundant parts of the dehazing network in a targeted way and shorten dehazing time without affecting dehazing quality, a Transformer image dehazing method based on component collaborative optimization pruning, CCOP-IDT (Component Collaborative Optimization Pruning Image Dehazing Transformer), was proposed. Firstly, a 5-level Transformer was used to construct a pre-trained model of the dehazing network. Then, network pruning was modeled as an optimization problem, Fisher information was used to evaluate the importance of weight parameters, and the Hessian matrix was used to measure the joint influence of pruning components on the network output, so as to establish a collaborative optimization method for multiple pruning components. Finally, an evolutionary algorithm was employed to solve for the optimal pruning rate sequence and obtain the optimal sub-network of the pre-trained model. Experimental results show that after pruning, the number of network parameters is 0.476×10⁶, a reduction of 28.8% compared with that before pruning, and the dehazing time is shortened by 25.0%. On the synthetic hazy dataset RESIDE-6K, the proposed method reaches a Peak Signal-to-Noise Ratio (PSNR) of 29.60 dB and a Structural SIMilarity (SSIM) of 0.9687, which are only 1.63% and 0.46% lower than those before pruning, respectively. Both quantitative and qualitative evaluations show that the proposed method substantially reduces model parameters and speeds up image dehazing while largely preserving the quantitative metrics and perceived visual quality.
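
    The Fisher-information importance score can be illustrated with the common empirical diagonal approximation, in which the score of each parameter is the mean of its squared per-sample gradients. This sketch covers only that scoring step plus a fixed-rate mask; the Hessian-based joint influence and the evolutionary search over pruning rates are not modeled, and the function names are hypothetical.

```python
def empirical_fisher(per_sample_grads):
    """Diagonal empirical Fisher: mean squared gradient of each parameter."""
    n = len(per_sample_grads)
    dim = len(per_sample_grads[0])
    return [sum(g[j] ** 2 for g in per_sample_grads) / n for j in range(dim)]


def prune_mask(scores, rate):
    """Keep-mask that drops the `rate` fraction of parameters with the lowest scores."""
    k = int(len(scores) * rate)
    order = sorted(range(len(scores)), key=lambda j: scores[j])
    pruned = set(order[:k])
    return [j not in pruned for j in range(len(scores))]


# Hypothetical gradients of the loss for two samples and four parameters.
grads = [[1.0, 0.1, 2.0, 0.0],
         [1.0, -0.1, -2.0, 0.0]]
scores = empirical_fisher(grads)
mask = prune_mask(scores, rate=0.5)
```

    Squaring before averaging matters here: the third parameter has gradients that cancel on average but still receives the highest score, so it is kept.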

    Municipal solid waste incineration state recognition method based on multilayer preprocessing
    Jian ZHANG, Jianbo YU, Jian TANG
    2026, 46(3):  940-949.  DOI: 10.11772/j.issn.1001-9081.2025030368

    Flame images from domestic Municipal Solid Waste Incineration (MSWI) processes suffer from strong contamination, high noise levels, overexposure, and other problems, which make traditional target recognition methods difficult to apply. Therefore, an MSWI image classification framework, SAswin with Multilayer Preprocessing Network (SAswin-MPNet), was proposed. Firstly, a Hybrid Attention Super-Resolution Transformer (HASRT) module was designed to perform super-resolution reconstruction of the images. Secondly, a Practical Exposure Correction (PEC) module was introduced to correct the exposure of the high-resolution MSWI images, thereby obtaining multilayer preprocessed data. Additionally, a validation algorithm was designed to compare the preprocessed images with the originals, and the images that met a validation threshold replaced the originals, yielding a multilayer preprocessed dataset. Finally, an SAswin classification network was constructed to recognize incineration states. Experiments on actual operational data from an MSWI power plant show that SAswin-MPNet achieves the highest incineration state recognition accuracy and F1-score in comparison with ResNet-34, ResNet-50, ConvNeXt, ViT (Vision Transformer), Swin-T (Swin-Tiny), and EVA-02 (Enhanced Visual Assistant-02).

    Real-time vehicle detection algorithm based on YOLOv10
    Yinshan YU, Xu TANG, Mingjian DING, Wenkai HUANG, Jiawen BI, Guochen TAN
    2026, 46(3):  950-958.  DOI: 10.11772/j.issn.1001-9081.2025040487

    With the advancement of autonomous driving technology, real-time vehicle detection has become crucial for ensuring system safety and reliability. Therefore, a lightweight detection model based on YOLOv10, named YOLOv10-LITE, was designed for real-time detection tasks in resource-constrained environments; it introduces four structural improvement modules to reduce model complexity and inference latency while maintaining detection accuracy. Specifically, the Dynamic Upsampling (DySample) module was applied to enhance the resolution of feature maps while reducing computational cost; the Fast Multi-Scale Network (FastMSNet) module was used to improve multi-scale feature extraction and enhance detection performance for objects of different sizes; the Spatial Pyramid Pooling-Local Selective Kernel Attention (SPPF_LSKA) module was introduced to capture long-range dependencies effectively by combining local feature selection and global contextual modeling; and the Adaptive Granular Fine-grained Channel Attention (AGFCA) module was incorporated to improve the perception of critical information through collaboration between spatial and channel attention. Experimental results on the KITTI dataset show that YOLOv10-LITE achieves a mean Average Precision (mAP) of 77.1%, which is 2.4% higher than that of YOLOv10, with the parameter count reduced by 8.7 percentage points. These results verify the proposed model’s practicality in autonomous driving scenarios with both computational constraints and real-time demands.

    Quantitation and grading method for ceramic tile chromatic aberration based on improved fractal encoding network
    Songsen YU, Huang HE, Guopeng XUE, Hengtuo CUI
    2026, 46(3):  959-968.  DOI: 10.11772/j.issn.1001-9081.2025030361

    To address the result instability caused by subjective visual estimation dependence in traditional ceramic tile color difference detection methods, a method integrating texture and color features was proposed for quantitation and grading of chromatic aberration in ceramic tiles. A large-scale dataset named TILE-TCR (TILE Texture and Color Recognition), containing both texture and color labels, was constructed to enhance the model’s ability to represent texture and color features. At the same time, a color difference grading dataset named TILE-CAG (TILE Chromatic Aberration Grade) was established to support the color difference classification task. Based on these datasets, the network structure of Fractal Encoding Network (FENet) was improved by introducing Spatial Pyramid Pooling (SPP) and SE (Squeeze-and-Excitation) modules, thereby extracting multi-task features and focusing on critical regions. Then, a clustering algorithm was employed to determine the thresholds for color difference grading adaptively, thereby implementing objective quantification of color difference grading. Experimental results show that the proposed improved method achieves an accuracy of 92.82% in the ceramic tile texture classification task, representing a 1.84 percentage point improvement compared to the FENet; in the color difference grading task, the accuracy, precision, recall and F1 score of the proposed method exceed 90%. Furthermore, a simulated production line was built for industrial deployment and real-time performance test of the model. On commonly used ceramic tiles, the average detection time of the proposed method is under 3 seconds, meeting the real-time requirements for online inspection of industrial conveyor belts.
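
    The abstract does not say which clustering algorithm sets the grading thresholds. Purely as an illustration, the sketch below assumes one-dimensional k-means over measured color difference values and places each threshold midway between adjacent cluster centers; the data values and function names are hypothetical.

```python
def kmeans_1d(values, k, iters=100):
    """Plain 1-D k-means with deterministic quantile initialization."""
    ordered = sorted(values)
    centers = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for v in ordered:
            nearest = min(range(k), key=lambda i: abs(v - centers[i]))
            groups[nearest].append(v)
        # Keep the old center if a cluster ends up empty.
        updated = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
        if updated == centers:
            break
        centers = updated
    return sorted(centers)


def grade_thresholds(values, k):
    """Thresholds halfway between adjacent cluster centers."""
    centers = kmeans_1d(values, k)
    return [(a + b) / 2 for a, b in zip(centers, centers[1:])]


def grade(value, thresholds):
    """Grade 0 is the smallest color difference; higher grades are worse."""
    return sum(value > t for t in thresholds)


# Hypothetical color difference measurements forming three visible groups.
measurements = [0.1, 0.2, 0.3, 1.0, 1.1, 1.2, 3.0, 3.1, 3.2]
thresholds = grade_thresholds(measurements, k=3)
```

    Deriving the thresholds from the data in this way means they shift with the production batch instead of relying on a fixed, manually chosen cutoff.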

    Lightweight method for transmission line defect detection
    Ping HUANG, Qing LI, Haifeng QIU, Chengsi WANG, Anzi HUANG, Long FAN
    2026, 46(3):  969-979.  DOI: 10.11772/j.issn.1001-9081.2025030340

    As the core transmission and distribution carrier of the power system, the operating condition of high-voltage transmission lines directly impacts the safety of the power grid. To address the problems of low efficiency and high missed rate in traditional manual inspection, a lightweight method for transmission line defect detection based on a two-stage multi-modal attention mechanism and dynamic feature decoupling was proposed. In the first stage, accurate localization of key components was achieved on the basis of an improved lightweight detection network, Light-YOLO. In the second stage, a dual-branch contrastive learning-based defect detection network, Dual-DifferNet, was built to achieve precise classification and identification of defects. In the design of Light-YOLO, a hybrid structure of hierarchical Separable Vision Transformer (SepViT) and deep Deformable Convolutional Network (DCN) was introduced, and by stacking local perception convolutional layers and global attention Transformer blocks alternately, the model’s modeling capability of long-range dependencies was enhanced while reducing computational cost, thereby improving the detection accuracy of small targets such as insulators and conductor splices effectively. For the defect classification task, in Dual-DifferNet, a dual-branch structure was adopted to embed a Spatial-Channel Dual Attention (SCDA) module in each branch, and the dual-modal feature interaction was promoted using a cross attention mechanism, thereby improving the robustness and generalization capability of defect identification. Experimental results show that the proposed method achieves a mean Average Precision (mAP@50) of 96.9%, which is 16.1 percentage points higher than that of the baseline model YOLOv8, with the floating-point operations reduced by 56.73%, fully verifying the method’s high detection accuracy, excellent computational efficiency, and deployment potential.

    Frontier and comprehensive applications
    Review of deep learning applications in severe convective weather prediction
    Min CHEN, Xiaolin QIN, Shaohan LI, Hao YANG, Taohong LI
    2026, 46(3):  980-992.  DOI: 10.11772/j.issn.1001-9081.2025020184

    Groundbreaking advances in deep learning have established a new paradigm for interdisciplinary “AI + meteorology” research, and severe convective weather prediction has emerged as a cutting-edge research focus because of its complex dynamical characteristics and significant socioeconomic impacts. Therefore, the theoretical advances and methodological innovations of Deep Neural Networks (DNNs) in severe convective weather prediction were reviewed systematically, and their specific applications were explored in depth. Firstly, based on the spatio-temporal sequence prediction paradigm, the mechanisms by which Recurrent Neural Networks (RNNs) and non-RNN models extract high-frequency features in meteorological sequence modeling were analyzed. Secondly, from a generative modeling perspective, the advantages of Generative Adversarial Networks (GANs) and diffusion models in probabilistic prediction of extreme weather events were demonstrated. Thirdly, the theoretical breakthroughs of large meteorological models, which realize multimodal data fusion and cross-scale feature learning through pre-training and fine-tuning, were presented, along with the mechanisms by which they improve generalization in global numerical weather prediction. Fourthly, with regard to model evaluation systems, the limitations of traditional statistical metrics in extreme weather prediction were analyzed, and pathways for constructing novel evaluation frameworks, such as physical consistency constraints, were discussed. Finally, the key scientific challenges at present and future research directions were summarized, aiming to provide theoretical support and methodological references for constructing the next-generation intelligent system for severe convective weather prediction.

    Intelligent undergraduate teaching evaluation system based on large language models
    Bin SHEN, Xiaoning CHEN, Hua CHENG, Yiquan FANG, Huifeng WANG
    2026, 46(3):  993-1003.  DOI: 10.11772/j.issn.1001-9081.2025030334

    Undergraduate teaching audit and evaluation is a critical means of quality assurance in higher education, and whether it is implemented scientifically and rationally directly affects the level of talent cultivation in universities. However, traditional manual review is inefficient and subjective when faced with massive heterogeneous data, making it difficult to meet the demands for accuracy and standardization in undergraduate teaching evaluation. Therefore, an intelligent undergraduate teaching evaluation system based on Large Language Models (LLMs) and a multi-agent architecture, named SmartEval, was proposed. In the system, input contents were parsed by a semantic understanding module, tasks were decomposed and scheduled by a planner, and a Retrieval-Augmented Generation (RAG) module was integrated with three types of agents (question-answering, summarization, and diagnostics) to realize end-to-end automation of the entire process of “data collection-metric analysis-decision support”. Experimental results based on the “1+3+3” series reports on the 2023 undergraduate teaching evaluation of selected universities demonstrate that SmartEval significantly outperforms existing mainstream LLMs, such as GLM-4 and Qwen2.5, in question-answering accuracy, ROUGE-L (Recall-Oriented Understudy for Gisting Evaluation L) score for summarization, and F1-value for diagnostics. Furthermore, consistency tests with expert groups validate the reliability of SmartEval results.

    UAV swarm formation recognition algorithm based on multi-scale complex networks
    Tingquan DENG, Yuling LI, Yonghang REN, Tian XIA, Kunfu WANG, Shengchun WANG
    2026, 46(3):  1004-1010.  DOI: 10.11772/j.issn.1001-9081.2025030362

    When dealing with incoming Unmanned Aerial Vehicles (UAVs), it is crucial to recognize the formation of enemy UAVs quickly and accurately in order to analyze and judge the enemy's combat intentions and formulate effective countermeasures. Therefore, a UAV swarm formation recognition algorithm based on multi-scale complex networks was proposed. Firstly, an adaptive threshold method was established to construct multi-scale complex networks from the UAV swarm formation, and a combination of eigenvalues of the adjacency matrices of these complex networks was selected to form a shape signature. Then, the Hellinger distance was introduced to measure the difference between the shape signature of the formation to be recognized and that of each standard formation, thereby obtaining the recognition result. Simulation results show that, compared with the algorithm that obtains multi-scale complex networks with hard thresholds, the proposed algorithm has better adaptability and robustness, achieves a higher recognition rate even when the target information is heavily corrupted, and has fewer parameters and lower time complexity.
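
    The matching step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how eigenvalue signatures are normalized, so the sketch assumes the signature entries are non-negative and rescales them to sum to 1 before applying the standard Hellinger distance. The function names, the template dictionary, and the formation labels are all hypothetical.

```python
import math


def normalize(values):
    """Rescale non-negative values so they sum to 1 (an assumed normalization)."""
    total = sum(values)
    if total <= 0 or any(v < 0 for v in values):
        raise ValueError("signature entries must be non-negative with a positive sum")
    return [v / total for v in values]


def hellinger(p, q):
    """Standard Hellinger distance between two discrete distributions."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    squared = sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))
    return math.sqrt(squared / 2.0)


def recognize(signature, templates):
    """Return the label of the template signature closest to `signature`."""
    p = normalize(signature)
    return min(templates, key=lambda name: hellinger(p, normalize(templates[name])))


# Hypothetical signatures; real ones would come from adjacency-matrix eigenvalues.
templates = {"line": [5.0, 1.0, 1.0], "wedge": [4.0, 3.0, 1.2]}
print(recognize([4.0, 3.0, 1.0], templates))
```

    The distance is bounded between 0 (identical distributions) and 1 (disjoint support), which makes scores comparable across templates of the same length.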

    Flight operation performance evaluation method based on gradient boosting regression tree
    Lin WEI, Haimin LI, Yalan YE, Yufei XING, Peng CHEN
    2026, 46(3):  1011-1022.  DOI: 10.11772/j.issn.1001-9081.2025081052

    To address the problems that traditional flight operation performance evaluation methods are highly subjective, analyze parameters one-sidedly, and cannot quantify performance comprehensively and objectively, a Multi-dimensional Feature Analysis based Gradient Boosting Regression Tree (MFA-GBRT) method was proposed. In the method, time-domain and trend features of Quick Access Recorder (QAR) data were extracted, and an improved Gradient Boosting Regression Tree (GBRT) was combined with a threshold-based cumulative importance filtering mechanism to construct an evaluation index system covering “attitude control-power management-environmental response” and a performance level evaluation model. Experimental results on simulator and flight base data show that the average evaluation accuracy of the proposed method reaches 87.83%, which is 10.78%, 6.06% and 3.55% higher than those of the existing methods Long Short-Term Memory-Deep Neural Network (LSTM-DNN), curve similarity method and wavelet analysis, respectively. Cross-scenario validation shows that the method achieves adaptability above 95% (rated as high adaptability) in three different flight scenarios. It can be seen that the method realizes full-process objective quantitative evaluation of the flight process and provides a scientific scheme with engineering practicability for flight operation performance evaluation.
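
    One common reading of "threshold cumulative importance filtering" is sketched below: features are ranked by importance and kept until their cumulative share reaches a threshold. This is an assumption about the mechanism, not the authors' procedure. The function name, the feature names, the importance scores, and the 0.9 threshold are all hypothetical; in practice the scores would come from a fitted GBRT model.

```python
def select_by_cumulative_importance(importances, threshold=0.9):
    """Keep the top-ranked features whose cumulative importance share reaches `threshold`.

    `importances` maps feature name -> non-negative importance score.
    """
    total = sum(importances.values())
    if total <= 0:
        raise ValueError("importance scores must have a positive sum")
    # Rank features from most to least important.
    ranked = sorted(importances.items(), key=lambda item: item[1], reverse=True)
    selected, cumulative = [], 0.0
    for name, score in ranked:
        selected.append(name)
        cumulative += score / total
        if cumulative >= threshold:
            break
    return selected


# Hypothetical importance scores for four QAR-derived features.
scores = {"pitch_trend": 0.5, "roll_std": 0.3, "thrust_mean": 0.15, "wind_peak": 0.05}
print(select_by_cumulative_importance(scores, threshold=0.9))
```

    Filtering on cumulative share rather than a fixed feature count lets the number of retained indicators adapt to how concentrated the importance scores are.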

Honorary Editor-in-Chief: ZHANG Jingzhong
Editor-in-Chief: XU Zongben
Associate Editors: SHEN Hengtao, XIA Zhaohui
Domestic Post Distribution Code: 62-110
Foreign Distribution Code: M4616
Address:
No. 9, 4th Section of South Renmin Road, Chengdu 610041, China
Tel: 028-85224283-803
  028-85222239-803
Website: www.joca.cn
E-mail: bjb@joca.cn