
Table of Contents

    10 April 2026, Volume 46 Issue 4
    Artificial intelligence
    Hybrid optimization framework for improving Kolmogorov-Arnold network in federated learning
    Zhi JIANG, Xuebin CHEN, Changyin LUO, Ziye ZHEN
    2026, 46(4):  1023-1033.  DOI: 10.11772/j.issn.1001-9081.2025050536

    To address issues in federated learning such as data heterogeneity, the tendency of gradients to converge to local optima, and high computational and communication overhead, a hybrid training framework of “key edge screening-early-stopping genetic evolution-local fine-tuning”, called KB-GA-KAN, was developed for the Kolmogorov-Arnold Network (KAN). First, key edges on each client were selected dynamically according to kernel function amplitude and activation sensitivity, and only the kernel coefficients of these edges were evolved genetically, enabling a global search for good initial solutions. Then, an early-stopping criterion was introduced, and collaborative optimization was achieved by combining the evolution with local Stochastic Gradient Descent (SGD). Experimental results on five Non-Independent and Identically Distributed (Non-IID) datasets demonstrate that compared to the KAN framework with pure gradient training, KB-GA-KAN raises test accuracy by an average of 1.34% and lowers the number of convergence rounds by 42%, while improving robustness in heterogeneous scenarios at a slight additional computational cost. Visual results of the kernel functions further confirm that KB-GA-KAN enhances model interpretability. It can be seen that KB-GA-KAN offers a new route to balancing accuracy, convergence speed, and computational cost when training KAN efficiently with SGD under privacy-restricted conditions.
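    The "screen key edges, evolve only their coefficients, stop early" loop can be sketched in plain Python as follows. This is a toy illustration against a stand-in fitness function; the amplitude threshold, mutation scale, and patience value are all assumptions, not KB-GA-KAN's actual design:

```python
import random

def evolve_key_coeffs(coeffs, key_idx, fitness, pop=20, gens=50, patience=5, seed=0):
    """Toy genetic search over only the 'key' coefficient positions,
    with an early-stopping criterion before handing off to SGD."""
    rng = random.Random(seed)

    def mutate(vec):
        child = list(vec)
        i = rng.choice(key_idx)              # evolve key edges only
        child[i] += rng.gauss(0.0, 0.3)
        return child

    population = [mutate(coeffs) for _ in range(pop)]
    best, best_fit, stall = list(coeffs), fitness(coeffs), 0
    for _ in range(gens):
        population.sort(key=fitness)         # minimize fitness
        if fitness(population[0]) < best_fit - 1e-6:
            best, best_fit, stall = list(population[0]), fitness(population[0]), 0
        else:
            stall += 1
            if stall >= patience:            # early stopping criterion
                break
        parents = population[: pop // 2]
        population = parents + [mutate(rng.choice(parents)) for _ in parents]
    return best

# screen 'key edges' by coefficient amplitude, then evolve only those
coeffs = [2.0, -0.01, -1.5, 0.02]
key_idx = [i for i, c in enumerate(coeffs) if abs(c) > 0.1]   # amplitude screening
best = evolve_key_coeffs(coeffs, key_idx, lambda v: sum(x * x for x in v))
```

    Non-key coefficients are never touched by the search, which is what keeps the extra computational cost small.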

    Task-free online sparse continual learning method for high-speed data streams
    Yuchen HAN, Fenglei XU, Fan LYU, Rui YAO, Fuyuan HU
    2026, 46(4):  1034-1041.  DOI: 10.11772/j.issn.1001-9081.2025040452

    Task-free online continual learning is a task-agnostic and autonomous machine learning approach, in which the model is updated dynamically through continuous adaptation to new data and suppression of forgetting. Existing online continual learning methods typically prioritize model accuracy at the expense of computational efficiency, so that in high-speed data stream scenarios the lag in training speed makes it difficult for the model to respond promptly to changes in the data stream. To address this challenge, an efficiency-performance co-optimized online sparse continual learning framework was proposed to overcome the limitations of conventional approaches by constructing a bidirectional sparse adaptive regulation mechanism. Firstly, a dynamic sparse topology optimization framework for parameter importance measurement was designed, achieving unstructured parameter pruning by incorporating parameter sensitivity analysis. Secondly, a memory-efficiency dual-objective optimization model was established, in which the computational budget allocation was adjusted dynamically based on online class distribution estimation, so as to realize optimal computational resource configuration. Finally, a gradient decoupling optimization strategy was developed, employing gradient masking to enable bidirectional optimization of both old and new knowledge, thereby accelerating model updates while preserving the integrity of the knowledge topology. Benchmark test results show that the proposed framework has significant advantages. Compared to the baseline ER (Experience Replay) with a memory buffer of 100, on the CIFAR-10 dataset the framework achieves average improvements of 4.86% in Average Online Accuracy (AOA) and 6.25% in Test Accuracy (TA); on the CIFAR-100 dataset, it obtains enhancements of 13.77% in AOA and 3.08% in TA; on the Mini-ImageNet dataset, it shows performance gains of 17.83% and 25.00% in AOA and TA, respectively. Visualization analysis shows that the proposed framework successfully captures underlying concept drift patterns in data streams while maintaining real-time response capability. It can be seen that the proposed framework breaks through the traditional trade-off between computational efficiency and model performance, and establishes a new paradigm for online continual learning systems in open environments.
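    Unstructured pruning by parameter sensitivity can be sketched as follows. The first-order score |w·g| and the fixed sparsity ratio are generic assumptions for illustration, not the framework's exact importance measure:

```python
def sensitivity_prune(weights, grads, sparsity):
    # Score each weight by a first-order sensitivity proxy |w * g| and
    # zero out the lowest-scoring fraction (unstructured pruning mask).
    scores = [abs(w * g) for w, g in zip(weights, grads)]
    k = int(len(weights) * sparsity)                  # number of weights to drop
    cutoff = sorted(scores)[k] if k < len(scores) else float("inf")
    mask = [1 if s >= cutoff else 0 for s in scores]
    return [w * m for w, m in zip(weights, mask)], mask

weights = [0.5, -0.2, 1.0, 0.05]
grads = [0.1, 0.4, 0.02, 2.0]
pruned, mask = sensitivity_prune(weights, grads, sparsity=0.5)
```

    Note how the large weight 1.0 is pruned anyway: its gradient is tiny, so its sensitivity score is low, which is the point of combining magnitude with gradient information.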

    Hypergraph learning method via block diagonal representation
    Shengwei ZHANG, Hao WANG, Taisong JIN
    2026, 46(4):  1042-1049.  DOI: 10.11772/j.issn.1001-9081.2025050540

    As a mathematical tool for naturally representing high-order relationships among multiple data objects, hypergraphs exhibit significant advantages over traditional graph-based machine learning methods. The premise of the hypergraph-based learning paradigm lies in constructing, via hypergraph learning methods, a hypergraph that can reflect these high-order relationships. However, the lack of robustness of the existing hypergraph learning methods in dealing with noise and data corruption limits their effectiveness in real applications. To address this issue, a hypergraph learning method via block diagonal representation was proposed. In this method, an objective function for data reconstruction with a block diagonal constraint was optimized, and the obtained reconstruction coefficients were used to generate hyperedges and set the hyperedge weights. Experimental results on two image datasets with added noise demonstrate that compared with the CR-HG (CorrentRopy-induced low-rank HyperGraph) method, the proposed method achieves improvements of 2.6 and 1.0 percentage points in Normalized Mutual Information (NMI) on the Coil20 image set with 40% Gaussian noise ratio and 30% salt-and-pepper noise density, respectively. Additionally, on the USPS image set with 40% Gaussian noise ratio and 30% salt-and-pepper noise density, the proposed method increases the classification ACCuracy (ACC) by 2.1 and 1.1 percentage points, respectively. It can be seen that the learning performance of the proposed method is superior to that of the existing mainstream hypergraph learning methods.
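    The final step — generating hyperedges and weights from reconstruction coefficients — can be sketched as follows, starting from an already-computed coefficient matrix. The thresholding rule and weight formula are assumptions for illustration; the paper's optimization of the block-diagonal objective itself is omitted:

```python
def hyperedges_from_coeffs(C, tau=0.1):
    # Build one hyperedge per sample: the sample itself plus every sample
    # whose reconstruction coefficient exceeds tau; the hyperedge weight
    # is the total absolute coefficient mass of the other members.
    edges = []
    for i, row in enumerate(C):
        members = [i] + [j for j, c in enumerate(row) if j != i and abs(c) > tau]
        weight = sum(abs(row[j]) for j in members if j != i)
        edges.append((sorted(members), round(weight, 6)))
    return edges

# block-diagonal-looking coefficients: two clusters {0,1} and {2,3}
C = [[0.0, 0.9, 0.02, 0.0],
     [0.8, 0.0, 0.0, 0.03],
     [0.0, 0.05, 0.0, 0.7],
     [0.01, 0.0, 0.6, 0.0]]
edges = hyperedges_from_coeffs(C)
```

    With a block-diagonal coefficient matrix, each hyperedge stays inside one cluster, which is why the block-diagonal constraint improves robustness to cross-cluster noise.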

    Key phrase extraction model based on multi-perspective information enhancement and hierarchical weighting
    Jie HU, Pengcheng LI, Jun SUN, Jiaao ZHANG
    2026, 46(4):  1050-1057.  DOI: 10.11772/j.issn.1001-9081.2025040517

    The existing unsupervised key phrase extraction models have insufficient capability to capture complex contexts and multi-level semantic information, and thereby fail to acquire multi-dimensional information. Therefore, a key phrase extraction model based on multi-perspective information enhancement and hierarchical weighting was proposed. Firstly, the BERT (Bidirectional Encoder Representations from Transformers) pre-trained model was employed to encode the text and candidate phrases, thereby obtaining their embedding representations. The text embeddings were then optimized through weighted average pooling, and the global similarity between them and the candidate phrases was calculated to achieve global information enhancement, thereby improving the understanding of semantic associations. Secondly, a graph structure-based boundary-aware local centrality calculation method was introduced to improve the ability to capture local information. Finally, multiple factors were integrated for weight calculation to evaluate the importance of candidate phrases from various dimensions. Experiments were conducted on six public datasets including Inspec, SemEval-2017, and SemEval-2010. The results show that compared to the baseline model PromptRank, the proposed model improves the F1@5 score by 0.87 to 2.68 percentage points, the F1@10 score by 1.11 to 2.24 percentage points, and the F1@15 score by 0.54 to 2.25 percentage points. It can be seen that the overall performance of the proposed model is enhanced effectively.
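    The hierarchical weighting idea — combining a global document-similarity score with a local centrality score — can be sketched as below. The two-factor combination and the mixing weight alpha are simplifying assumptions; the real model fuses more factors and uses BERT embeddings rather than toy vectors:

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def rank_candidates(doc_vec, cand_vecs, centrality, alpha=0.7):
    # Score = alpha * global similarity to the document embedding
    #       + (1 - alpha) * local (graph) centrality of the phrase.
    scores = {p: alpha * cosine(doc_vec, v) + (1 - alpha) * centrality[p]
              for p, v in cand_vecs.items()}
    return sorted(scores, key=scores.get, reverse=True)

# phrase "a" matches the document globally; "b" is only locally central
ranked = rank_candidates([1.0, 0.0],
                         {"a": [1.0, 0.1], "b": [0.0, 1.0]},
                         {"a": 0.2, "b": 0.9})
```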

    Multimodal recommendation method based on semantic fusion and contrast enhancement
    Haihua ZHAO, Yijun HU, Rui TANG, Xian MO
    2026, 46(4):  1058-1068.  DOI: 10.11772/j.issn.1001-9081.2025050528

    Multimodal recommendation aims to enhance user and item feature representations by integrating multimodal information, so as to improve recommendation performance. However, the existing methods still face challenges including insufficient cross-modal semantic information fusion, redundant multimodal features, and noise interference. To address these issues, a multimodal Recommendation method based on Semantic Fusion and Contrast Enhancement (SFCERec) was proposed. Firstly, a cross-modal semantic consistency enhancement framework was designed, in which a global correlation graph was constructed through a multimodal semantic feature filtering mechanism, so as to aggregate common multimodal features dynamically while suppressing noise propagation. Concurrently, a multi-granularity attribute disentanglement module was introduced to separate modal features into coarse-grained common features and user behavior-driven fine-grained features, so as to mitigate feature redundancy. Secondly, a multi-level contrastive learning paradigm was proposed to jointly optimize four tasks: cross-modal consistency alignment, user behavior similarity modeling, item semantic relevance constraint, and explicit-latent feature mutual information maximization, thereby enhancing representation discriminability through contrastive learning. Finally, a graph perturbation enhancement strategy was incorporated, employing noise injection and dual contrastive regularization to improve model robustness against sparse data and noise interference. Experimental results on the Amazon-Baby, Amazon-Sports, and Amazon-Clothing datasets demonstrate that the proposed method outperforms all baseline models in both Recall@20 and NDCG@20 metrics, particularly in sparse scenarios. Ablation studies further validate the effectiveness of the proposed method.
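    The contrastive-alignment objective underlying such multi-level tasks is typically an InfoNCE-style loss. A minimal single-pair formulation is sketched below; this is the generic loss, not SFCERec's exact multi-task objective, and the temperature value is an assumption:

```python
import math

def info_nce(anchor, positive, negatives, temp=0.2):
    # -log softmax of the positive pair among positive + negatives,
    # with cosine similarity as the score and a temperature scale.
    def cos(u, v):
        dot = sum(a * b for a, b in zip(u, v))
        nu = math.sqrt(sum(a * a for a in u))
        nv = math.sqrt(sum(b * b for b in v))
        return dot / (nu * nv)

    logits = [cos(anchor, positive) / temp] + [cos(anchor, n) / temp for n in negatives]
    m = max(logits)                                   # numerically stable log-sum-exp
    lse = m + math.log(sum(math.exp(l - m) for l in logits))
    return lse - logits[0]

anchor = [1.0, 0.0]
loss_aligned = info_nce(anchor, [1.0, 0.0], [[0.0, 1.0]])
loss_misaligned = info_nce(anchor, [0.0, 1.0], [[1.0, 0.0]])
```

    Minimizing this loss pulls aligned cross-modal pairs together and pushes negatives apart, which is what makes the representations more discriminable.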

    Multimodal fact verification with cross-modal semantic association
    Huanxian LIU, Hongtao WANG, Xian’ao WANG, Hongmei WANG, Weifeng XU
    2026, 46(4):  1069-1076.  DOI: 10.11772/j.issn.1001-9081.2025050526

    To address the semantic differences among multimodal evidences, and between claims and evidences, during feature fusion, a Cross-Modal Semantic Association (CMSA)-based Multimodal Fact Verification (MFV) method was proposed to realize cross-level semantic alignment and adaptive feature interaction, thereby eliminating semantic gaps across multi-source information and enhancing classification performance on complex claim verification. During evidence retrieval, relevant textual evidence was retrieved from the claim text, and semantically related image evidence was further filtered using the textual evidence, so as to ensure high cross-modal relevance. During claim verification, semantic alignment between text and multimodal evidences was achieved using the CLIP (Contrastive Language-Image Pretraining) model, and a Linked Claim and Evidence Attention (LCEA) module was designed to reinforce semantic associations among the claim text, textual evidence, and image evidence. Experimental results show that compared to the MOCHEG model, CMSA improves the F1 score by at least 7.27% and 6.65% on the public dataset and the self-constructed CEAD (Cross-modal Evidence Augmented Dataset), respectively, demonstrating its effectiveness in MFV tasks.

    Multimodal event extraction based on text-image dual-channel feature gated fusion mechanism
    Delong WANG, Haoyi WANG, Qingchuan ZHANG, Zexi SONG
    2026, 46(4):  1077-1085.  DOI: 10.11772/j.issn.1001-9081.2025050563

    To improve the alignment accuracy and fusion efficiency between different modal features in multimodal event extraction and to enhance the model's understanding of the semantic relationship between images and texts, a multimodal event extraction model based on a dual-channel “text-image” feature gated fusion mechanism, named MEE-DF (Multimodal Event Extraction based on Dual-channel Fusion), was proposed. Firstly, a channel for generating text descriptions from images was added, so that event arguments implicit in the images were mined and the information representation for event extraction was enriched. Secondly, a Locality Constrained Cross Attention (LCCA) mechanism was built, in which geometric alignment graphs were generated to embed image information and highly discriminative image features were extracted. Thirdly, an adversarial gating mechanism based on interactive attention maps was constructed to achieve fine-grained alignment of text entities and image objects. Finally, a dual-channel feature fusion strategy was used to filter important patch features, remove redundant information, and improve feature integration efficiency. Experimental results on the MEED and M2E2 public datasets show that MEE-DF achieves F1 values of 90.9% and 88.8% on the event type detection task, respectively, and F1 values of 73.3% and 68.1% on the Event Argument Extraction (EAE) task, respectively, outperforming the existing event extraction models. Ablation experiments further demonstrate that each module of the proposed model contributes significantly to the improvement of event extraction performance.
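    The core of a gated text-image fusion can be sketched element-wise as below. Scalar per-dimension weights stand in for the learned weight matrices of a real model, and the function name is hypothetical; MEE-DF's adversarial gating on attention maps is more elaborate:

```python
import math

def gated_fuse(text_feat, img_feat, w_t, w_i, b):
    # Element-wise sigmoid gate: g decides how much of the text feature
    # to keep at each dimension; (1 - g) admits the image feature.
    fused = []
    for t, v, wt, wi, bb in zip(text_feat, img_feat, w_t, w_i, b):
        g = 1.0 / (1.0 + math.exp(-(wt * t + wi * v + bb)))
        fused.append(g * t + (1.0 - g) * v)
    return fused

# a gate strongly favoring text in dim 0 and image in dim 1
fused = gated_fuse([1.0, 1.0], [0.0, 0.0],
                   w_t=[10.0, -10.0], w_i=[0.0, 0.0], b=[0.0, 0.0])
```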

    Airborne product metrological traceability knowledge graph construction method based on large language models
    Kaizhou SHI, Xuan HE, Guoyi HOU, Gen LI, Shuanggao LI, Xiang HUANG
    2026, 46(4):  1086-1095.  DOI: 10.11772/j.issn.1001-9081.2025040455

    Airborne products, with their diverse range and extensive industrial chain, involve a complex testing system that requires comprehensive metrological verification work. However, airborne product data resources exist primarily in unstructured, fragmented, and multimodal forms, making it difficult to conduct overall analysis of the various testing elements or to trace the standardization of testing and product quality under a unified framework, thereby posing challenges to metrological work. To address this issue, the construction of a knowledge graph for Metrological Traceability of Airborne Products (MT-AP) was explored by combining generative Large Language Models (LLMs). Firstly, the resource types and metrological traceability links were sorted out, and a Knowledge Graph (KG) ontological model was constructed. Secondly, LLM-based work modules were designed and integrated into workflow chains. Finally, a method for constructing the MT-AP knowledge graph based on the workflow chains and prompt templates was proposed. Experiments were conducted using airborne product instance data and the workflow chains. Experimental results show that with the proposed method, the knowledge comprehension and naming capability scores generally above 0.91, the text segmentation and knowledge decoupling capability generally above 0.83, and the complex parameter extraction and structuring capability generally above 0.85. It can be seen that the proposed method exhibits satisfactory performance in the key tasks of MT-AP knowledge graph construction, providing technical support for the metrology engineering of airborne products.

    Complex query-based question-answering model integrating bidirectional sequence embeddings
    Hao LIANG, Shaojie QIAO
    2026, 46(4):  1096-1103.  DOI: 10.11772/j.issn.1001-9081.2025040497

    Traditional Knowledge Graph (KG) embedding methods mainly focus on link prediction for simple triples, and their “head entity-relation-tail entity” modeling paradigm has significant limitations in handling conjunctive queries containing multiple unknown variables. To address this issue, a complex query-based question-answering model integrating Bidirectional Sequence Embedding (BSE) was proposed. Firstly, a query encoder was constructed on the basis of a bidirectional Transformer architecture to convert the query structure into a serialized representation. Secondly, positional encoding was utilized to preserve graph structure information. Thirdly, the deep semantic associations among all elements in the query graph were modeled dynamically through an Additive Attention Mechanism (AAM). Finally, global information interaction across nodes was realized, and the shortcomings of traditional methods in modeling long-distance dependencies were addressed effectively. Experiments were conducted on different benchmark datasets to verify the performance advantages of the BSE model. The experimental results show that on the WN18RR-PATHS dataset, the BSE model achieves a 53.01% improvement in the Mean Reciprocal Rank (MRR) metric compared with GQE-DistMult-MP; on the EDUKG dataset, the BSE model outperforms GQE-Bilinear with a 6.09% increase in the Area Under the Curve (AUC) metric. In summary, the proposed model can be applied to query-based question-answering in different fields, and has high scalability and application value.
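    The additive attention step can be illustrated with scalar features; the standard formulation scores each key as v·tanh(Wq·q + Wk·k), normalizes with softmax, and takes a weighted sum of values. Scalars in place of vectors and matrices are a simplification, not the BSE model's parameterization:

```python
import math

def additive_attention(query, keys, values, w_q=1.0, w_k=1.0, v_a=1.0):
    # Additive (Bahdanau-style) scoring: e_i = v_a * tanh(w_q*q + w_k*k_i),
    # softmax over the scores, then a weighted sum of the values.
    energies = [v_a * math.tanh(w_q * query + w_k * k) for k in keys]
    m = max(energies)                        # stable softmax
    exps = [math.exp(e - m) for e in energies]
    z = sum(exps)
    weights = [e / z for e in exps]
    context = sum(w * v for w, v in zip(weights, values))
    return context, weights

context, weights = additive_attention(0.0, keys=[2.0, -2.0], values=[1.0, 0.0])
```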

    Data science and technology
    Obfuscation-based protection method for scenario data in autonomous driving simulation testing
    Haiyang PENG, Tianyang LIU, Weixing JI, Fawang LIU
    2026, 46(4):  1104-1114.  DOI: 10.11772/j.issn.1001-9081.2025050548

    Simulation testing is a critical technology for verifying the safety and reliability of autonomous driving systems. To address the data leakage caused by plaintext sharing and use of scenario data during this process, an obfuscation protection method for simulation testing scenario data was proposed, along with a corresponding three-tier obfuscation strategy. The method covers a series of obfuscation techniques, including data re-encoding, name replacement, sequence scrambling, label reconstruction, trigger condition obfuscation, and event obfuscation, and divides them into three obfuscation levels according to obfuscation intensity, thereby enhancing scenario data security significantly without influencing simulation testing results. Experimental results demonstrate that the simulation results for obfuscated scenario data are consistent with those for the original data, with the method's error within a reasonable range. As the obfuscation level increases, the degree of data protection also improves progressively. The first- and second-level obfuscation methods have no significant impact on simulation efficiency, whereas the third-level method introduces a slight, reasonable delay in simulation execution time. Overall, the three-level obfuscation method maintains reasonable simulation performance while preventing data leakage effectively, providing a practical solution for the protection of autonomous driving simulation testing scenario data.
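    The name-replacement technique can be sketched as below: human-readable actor names are replaced by salted-hash aliases while simulation-relevant values are left untouched, and a private mapping allows de-obfuscation. The scenario schema and alias format are hypothetical; the paper's strategy additionally re-encodes data, scrambles sequences, and obfuscates triggers and events:

```python
import hashlib

def obfuscate_names(scenario, salt="demo"):
    # Replace each actor name with a deterministic salted-hash alias and
    # keep a private alias -> name mapping for later de-obfuscation.
    mapping = {}
    out = dict(scenario)
    out["actors"] = []
    for name in scenario["actors"]:
        alias = "actor_" + hashlib.sha256((salt + name).encode()).hexdigest()[:8]
        mapping[alias] = name
        out["actors"].append(alias)
    return out, mapping

scenario = {"actors": ["ego_vehicle", "pedestrian_01"], "speed_kmh": 40}
obf, mapping = obfuscate_names(scenario)
```

    Because only identifying names change and numeric parameters are preserved, the simulation result stays consistent with the original scenario, matching the paper's consistency requirement.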

    Tree-LSTM based recommendation model for form widget
    Junhui LUO, Junbo ZHANG, Zheyi PAN
    2026, 46(4):  1115-1123.  DOI: 10.11772/j.issn.1001-9081.2025040408

    Associating form widgets with an urban knowledge system, so that form-filling data are transformed automatically into a standardized urban knowledge graph, presents a novel solution to the challenge of large-scale data element production. However, the form system integrates numerous widgets named using domain-specific terms such as entities, relationships, and their attributes, making it difficult for users to find the desired widget quickly, thereby giving rise to the task of form widget recommendation. To tackle three key challenges, namely significant variability in application scenarios, complex contextual dependencies in configuration, and sparse form configuration data, a Tree-structured Long Short-Term Memory (Tree-LSTM) network based Recommendation model for Form Widgets (TRFW) was proposed. Firstly, a scenario multi-classification model was trained on rich and general form configuration data to learn the dependencies between form texts and scenario features. Then, a Tree-LSTM network structure was employed to extract structural and thematic features from form contexts, and user configuration intent features were constructed on this basis. Concurrently, an AutoEncoder (AE)-based widget naming encoder was trained to establish loosely-coupled connections between semantically similar widget names, thereby enhancing the model's ability to recommend semantically relevant widgets while maintaining robustness for newly added widgets. Experimental results on a publicly sourced dataset demonstrate that the proposed model outperforms all baseline models, achieving a HitRatio@1 of 84.62% and a HitRatio@10 of 97.31%, with a minimum improvement of 87.0% in HitRatio@1 over unsupervised methods, and outperforming graph convolution-based recommendation methods by at least 32.4% and 12.4% in HitRatio@1 and HitRatio@10, respectively. It can be seen that the proposed model can effectively recommend widget lists that align with user intent, thereby significantly reducing user interaction costs during widget selection.
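    A Child-Sum Tree-LSTM cell — the structure that lets the model aggregate an arbitrary number of child nodes, such as nested form sections — can be sketched in scalar form. Tying one weight across all gates is a deliberate simplification of the real parameterized cell:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def child_sum_lstm(x, children, W=0.5, U=0.5, b=0.0):
    # Scalar Child-Sum Tree-LSTM cell: `children` holds (h, c) pairs from
    # sub-nodes; gates see the *sum* of child hidden states, while each
    # child gets its own forget gate on its memory cell.
    h_sum = sum(h for h, _ in children)
    i = sigmoid(W * x + U * h_sum + b)          # input gate
    o = sigmoid(W * x + U * h_sum + b)          # output gate
    u = math.tanh(W * x + U * h_sum + b)        # candidate update
    c = i * u
    for h_k, c_k in children:
        f_k = sigmoid(W * x + U * h_k + b)      # per-child forget gate
        c += f_k * c_k
    h = o * math.tanh(c)
    return h, c

# a two-leaf tree: features flow bottom-up into the root state
leaf1 = child_sum_lstm(1.0, [])
leaf2 = child_sum_lstm(-0.5, [])
root = child_sum_lstm(0.3, [leaf1, leaf2])
```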

    Time series representation method based on spectral sensing and hierarchical convolution
    Jing ZHANG, Songhua LIU, Yuanqian ZHU
    2026, 46(4):  1124-1130.  DOI: 10.11772/j.issn.1001-9081.2025040515

    Time series data are widely used in fields such as power load forecasting and meteorological analysis, and extracting high-quality representations of time series is crucial for downstream prediction tasks. However, the performance of the existing methods is limited by high-frequency noise interference, difficulty in modeling long-term dependencies, and the scarcity of labels. Therefore, a time series representation method based on Spectral Filtering and Hierarchical Dilation (SFHD) was proposed. Firstly, a Spectral Filtering Block (SFB) was designed to extract multi-scale features through global and local filters, with an adaptive spectral filtering mechanism applied in the frequency domain to weaken the influence of high-frequency noise. Then, a Hierarchical Dilation Block (HDB) was constructed, using exponentially dilated convolutions to enlarge the receptive field progressively, thereby enhancing the ability to capture long-term dependencies. Finally, a change-aware self-supervised pretraining strategy was proposed to force the model to understand the underlying structure of the series by masking highly dynamic data blocks, thereby alleviating the insufficiency of labeled data. Experimental results on seven public datasets with different prediction lengths demonstrate that, compared with the suboptimal model iTransformer (inverted Transformer), SFHD decreases the average Mean Square Error (MSE) by 9.47% and the average Mean Absolute Error (MAE) by 5.36%. It can be seen that SFHD provides stronger representation capabilities and leads to improved performance on downstream time series prediction tasks.
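    The principle behind frequency-domain noise suppression can be demonstrated with a hard low-pass filter over a naive DFT. SFB's actual filters are learned and adaptive rather than a fixed cut-off, so treat this purely as an illustration of "transform, attenuate high frequencies, transform back":

```python
import cmath

def lowpass_dft(signal, keep):
    # DFT the series, zero all frequency bins farther than `keep` from DC
    # (accounting for spectrum symmetry), then inverse-DFT back.
    n = len(signal)
    spec = [sum(signal[t] * cmath.exp(-2j * cmath.pi * f * t / n) for t in range(n))
            for f in range(n)]
    for f in range(n):
        dist = min(f, n - f)                 # symmetric frequency index
        if dist > keep:
            spec[f] = 0
    return [sum(spec[f] * cmath.exp(2j * cmath.pi * f * t / n) for f in range(n)).real / n
            for t in range(n)]

# an alternating high-frequency component riding on a constant level 1.0
smoothed = lowpass_dft([1.0 + (-1) ** t for t in range(8)], keep=1)
```

    The alternating component lives entirely in the highest frequency bin, so filtering recovers the flat underlying level exactly.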

    Time series anomaly detection method based on high-order feature aggregation
    Yifan SUO, Songhua LIU, Qiuzhi HAO
    2026, 46(4):  1131-1138.  DOI: 10.11772/j.issn.1001-9081.2025040448

    In anomaly detection tasks for multivariate time series, the correlations between variables are complex and difficult for traditional anomaly detection methods to learn clearly. In addition, most models only consider the correlations between variables while learning temporal dependencies insufficiently. Therefore, a time series anomaly detection method based on High-order Feature Aggregation (HFA) was proposed. Firstly, a variable relationship graph was constructed through graph structure learning. Secondly, the traditional Graph ATtention network (GAT) was enhanced by taking full account of higher-order neighbor node correlations, thereby modeling complex inter-variable relationships more accurately. Finally, temporal dependencies of the series were captured fully through the integration of one-dimensional convolutions with a self-attention mechanism. Experimental results on four public datasets demonstrate that compared with the suboptimal baseline model Anomaly Transformer, HFA increases the F1 score by 1.34% on average; compared with the current mainstream baseline method TopoGDN (Topology Graph Deviation Network), HFA increases the F1 score by 9.05% on average. The results of ablation experiments further verify the effectiveness of each module in the model.
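    What "higher-order neighbors" adds over a standard GAT can be shown structurally: a first-order attention layer only sees direct neighbors, while k-hop reachability exposes indirectly correlated variables. The sketch below computes that reachability set; HFA's attention weighting over these neighbors is omitted:

```python
def high_order_neighbors(adj, order):
    # Reachability within `order` hops of a binary adjacency matrix:
    # the neighbor set an enhanced (higher-order) GAT would also attend to.
    n = len(adj)
    reach = [row[:] for row in adj]
    power = [row[:] for row in adj]
    for _ in range(order - 1):
        power = [[1 if any(power[i][k] and adj[k][j] for k in range(n)) else 0
                  for j in range(n)] for i in range(n)]
        reach = [[1 if reach[i][j] or power[i][j] else 0 for j in range(n)]
                 for i in range(n)]
    return reach

# path graph 0-1-2-3: variable 0 only "sees" variable 2 at order 2
adj = [[0, 1, 0, 0],
       [1, 0, 1, 0],
       [0, 1, 0, 1],
       [0, 0, 1, 0]]
```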

    Cyber security
    Review of DDoS attack defense technology
    Zhige HE, Chang LIU, Junrui WU, Haoran LUO, Shuisong HU, Wenyong WANG
    2026, 46(4):  1139-1157.  DOI: 10.11772/j.issn.1001-9081.2025040402

    Distributed Denial of Service (DDoS) attacks, as a highly destructive type of cyber attack, have become one of the most severe threats and challenges in the field of cybersecurity in recent years due to their low attack costs, high attack efficiency, and strong concealment. DDoS attacks employ a distributed control approach to mix malicious traffic with legitimate network requests, making it difficult for traditional security defense mechanisms such as Intrusion Detection Systems (IDSs) and firewalls to identify and mitigate such attacks effectively. Consequently, the efficient detection of and effective defense against DDoS attacks have become research hotspots and difficulties in the field of cybersecurity. Based on a systematic survey of the existing research on DDoS attacks, the following work was performed. Firstly, the classification methods of DDoS attacks were sorted out, and DDoS attacks were summarized from multiple perspectives, so as to provide a deeper understanding of DDoS attack mechanisms. Secondly, the current development of DDoS attacks was analyzed, with particular focus on the development trends in attack intensity, attack methods, and attack distribution, thereby providing support for the research on more efficient DDoS defense technologies. Thirdly, an in-depth analysis and evaluation of the status of DDoS attack defense technologies was conducted from both academic and industrial perspectives: in the academic aspect, the focus was on DDoS detection and defense methods based on programmable switches and machine learning; in the industrial aspect, the defense architectures adopted by different participants in DDoS defense were compared and analyzed, and the technical characteristics, application scenarios, and existing challenges of these architectures were summarized. Finally, based on a comprehensive analysis of the current DDoS attack situation, the future development directions, opportunities, and challenges of DDoS defense technology were discussed, providing new ideas and directions for researchers in the field of cybersecurity and promoting further innovation and development of DDoS defense technology.

    Vulnerability classification framework for video surveillance network security based on large language models
    Xiaoyu WANG, Xin LI, Di XUE, Zhangtao JIANG, Wei WANG, Yanjun XIAO
    2026, 46(4):  1158-1170.  DOI: 10.11772/j.issn.1001-9081.2025040474

    Security vulnerabilities in video surveillance networks endanger public safety and even national security. Facing the continuous evolution of security threats, incremental learning methods are needed urgently. However, the existing methods suffer from classification inaccuracies in incremental learning due to three major challenges: insufficient few-shot learning performance, classification bias caused by semantic ambiguity, and limited capability to expand new categories dynamically. Therefore, an Incremental Vulnerability Classification Framework based on Large Language Models (IVCF-LLM) was proposed. In this framework, data stratification and a dynamic threshold mechanism were employed to ensure balanced distribution of training data. In the top-level classification stage, firstly, GPT-4o was used for deep analysis to extract vulnerability trigger words from few samples, thereby generating high-quality classification prompt templates, termed “skills”; then, the keyword extraction mechanism was optimized to identify vulnerability causes and attack methods precisely, thereby matching the optimal skill to guide GPT-3.5 Turbo toward accurate classification; finally, knowledge distillation was introduced to achieve seamless fusion of old and new skills, thereby realizing Class-Incremental Learning (CIL). In the sub-layer classification stage, a Common Weakness Enumeration (CWE) knowledge graph was constructed, and static knowledge injection and dynamic relationship retrieval strategies were combined, so as to achieve fine-grained and precise classification. Experimental results demonstrate that on the self-built dataset, IVCF-LLM achieves an accuracy of 75.0% and a Matthews Correlation Coefficient (MCC) of 65.7%, outperforming models such as Text-to-Weakness mapping (Text2Weak), Semantic Common weakness enumeration Predictor (SCP), and prompt-based classification; on the general network security dataset, the accuracy of IVCF-LLM is higher than that of the SCP model by 15.9 percentage points, validating the proposed framework's effectiveness and cross-scenario stability.

    DCIdentity: on-demand disclosure blockchain digital identity authentication mechanism
    Shiyu WANG, Linpeng JIA, Jian JIN, Zhongcheng LI, Jihua ZHOU, Yi SUN
    2026, 46(4):  1171-1181.  DOI: 10.11772/j.issn.1001-9081.2025040462

    To solve the problems in the existing Decentralized IDentity (DID) authentication schemes of strong coupling between users and clients, and of privacy vulnerability due to the plaintext storage of Verifiable Credentials (VCs) in off-chain clients, an on-demand disclosure blockchain digital identity authentication mechanism named DCIdentity was proposed. Firstly, based on the World Wide Web Consortium Decentralized IDentifier (W3C DID) specification, users' VCs were encrypted and stored on the blockchain, which reduced users' dependency on clients and realized loose coupling between the authentication process and the clients. Secondly, a hierarchical encryption mechanism for VCs was designed to support on-demand disclosure of user information, which enhanced efficiency in multi-party authentication and reduced the associated overhead. Experimental results show that compared with the off-chain storage scheme, the proposed mechanism effectively reduces the degree of coupling between the clients and the user authentication process, and achieves on-demand disclosure of user identity information; compared with the Ciphertext-Policy Attribute-Based Encryption (CP-ABE) scheme, the proposed mechanism decreases the encryption processing delay and the on-chain storage overhead by 91.5% and 84.1%, respectively. It can be seen that the proposed mechanism provides an efficient solution for unified identity authentication in multi-domain, multi-application scenarios, improves authentication efficiency significantly while ensuring the privacy of user information, and can strongly support the practical deployment of DID in real-world scenarios.
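    The on-demand disclosure idea — a verifier checks only the fields a holder chooses to reveal against on-chain data — can be illustrated with per-field salted-hash commitments. This toy commitment scheme is a stand-in for DCIdentity's hierarchical encryption, and every name here is hypothetical:

```python
import hashlib

def commit_vc(fields, salt):
    # Commit each VC field with a salted hash; the commitment map is what
    # would be stored on-chain, revealing nothing about field values.
    return {k: hashlib.sha256((salt + k + str(v)).encode()).hexdigest()
            for k, v in fields.items()}

def verify_disclosure(disclosed, commitments, salt):
    # A verifier checks only the disclosed subset against the commitments.
    return all(
        hashlib.sha256((salt + k + str(v)).encode()).hexdigest() == commitments.get(k)
        for k, v in disclosed.items())

vc = {"name": "Alice", "age": 30, "license": "A123"}
onchain = commit_vc(vc, salt="s1")
```

    The holder can thus prove a single attribute (here, age) without revealing the rest of the credential.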

    Covert communication model assisted by smart contracts
    Wei SHE, Kong CHENG, Shuhui ZHANG, Jiawei MA, Chenhong QI, Guangjun ZAI
    2026, 46(4):  1182-1190.  DOI: 10.11772/j.issn.1001-9081.2025040492

    In view of the problems of the current trace-free blockchain covert communication technology, such as the single dimension of carrier embedding and the risk of polluting the network ecology, a trace-free covert communication model based on smart contracts was proposed. The model consists of three cooperating algorithms and exploits the characteristics of smart contract opcode sequences in depth. Firstly, a preprocessing algorithm was used by both communicating parties to reconstruct the contract opcode characteristics and construct a triplet feature sequence. Secondly, the binary-coded original message was concealed in the reconstructed sequence through associated parameter-assisted mapping, generating a binary random key for transmission; combined with a custom transmission protocol, the risk of key leakage during transmission was reduced effectively. Finally, the key was fed into a reverse parsing algorithm by the receiver to restore the information accurately. Experimental results show that the proposed model achieves an average embedding capacity of 5 042 bits per contract and an embedding efficiency of 6 MB/s, and the randomness of the generated key and its independence from the original message are verified by the National Institute of Standards and Technology (NIST) test suite and a mutual information test.

    Certificateless linkable ring signature scheme based on elliptic curves
    Qinkun JIANG, Xianghua MIAO, Bingyu GUO, Xinglei RUAN
    2026, 46(4):  1191-1198.  DOI: 10.11772/j.issn.1001-9081.2025040464

    Linkable ring signatures not only inherit the advantages of ring signatures but also embed a link tag in the signature, enabling verifiers to determine whether two signatures were generated by the same signer, which addresses the problems of signature abuse and repeated signing effectively. The CertificateLess Public Key Cryptography (CL-PKC) system solves the problems of key escrow and certificate management, thereby enhancing security. Combining the advantages of the two, a CertificateLess Linkable Ring Signature scheme (CL-LRS) based on elliptic curves was proposed; its system model and security model were constructed, and the scheme was proven secure against Type Ⅰ/Ⅱ adversary attacks under the random oracle model while satisfying anonymity, unforgeability, linkability, and non-slanderability. To verify the practical performance of the proposed scheme, its signing overhead and verification overhead were compared with those of several existing elliptic-curve-based ring signature schemes by measuring the time of each cryptographic operation used in the scheme. Experimental results indicate that without relying on bilinear pairing operations, the proposed scheme can still resist attacks from malicious Key Generation Centers (KGCs) and prevent potential signature abuse. Meanwhile, the total signing time of the proposed scheme is 80.1% lower than that of the compared elliptic-curve-based linkable ring signature scheme, making the scheme suitable for resource-constrained scenarios.
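
    As a concrete illustration of the link tag idea described in this abstract, the sketch below shows the commonly used construction tag = H(ring)^sk in a toy multiplicative group: two signatures by the same signer over the same ring yield the same tag, while the tag alone does not identify the signer. This is an illustrative simplification, not the paper's elliptic-curve CL-LRS scheme, and it is not cryptographically secure; the group, keys, and ring are assumptions for demonstration only.

```python
import hashlib

# Toy illustration of the linkability tag in linkable ring signatures.
# Assumption: multiplicative group mod a large prime as a stand-in for the
# paper's elliptic-curve group; NOT secure, for demonstration only.
P = 2**127 - 1  # a Mersenne prime; the toy group is Z_P^*

def hash_to_group(data: bytes) -> int:
    """Deterministically map a ring description to a group element."""
    h = int.from_bytes(hashlib.sha256(data).digest(), "big")
    return 2 + h % (P - 3)

def link_tag(secret_key: int, ring) -> int:
    """Tag = H(ring)^sk: same signer and same ring give the same tag,
    but the tag alone does not reveal which ring member signed."""
    base = hash_to_group(repr(sorted(ring)).encode())
    return pow(base, secret_key, P)

ring = [101, 202, 303]            # hypothetical public keys forming the ring
sk_alice, sk_bob = 7654321, 1234567
t1 = link_tag(sk_alice, ring)     # Alice signs once ...
t2 = link_tag(sk_alice, ring)     # ... and again with the same ring
t3 = link_tag(sk_bob, ring)       # Bob signs with the same ring
assert t1 == t2 and t1 != t3      # linkable iff produced by the same signer
```

    The equality of tags is exactly what lets a verifier detect repeated signing without learning which ring member produced the signatures.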

    Advanced computing
    Subregion-based many-objective evolutionary algorithm
    Jinglei GUO, Shiyuan LIU, Shouyong JIANG
    2026, 46(4):  1199-1210.  DOI: 10.11772/j.issn.1001-9081.2025050534

    To address the challenge of balancing convergence and diversity in Multi-Objective Evolutionary Algorithms (MOEAs) when solving Many-objective Optimization Problems (MaOPs), a SubRegion-based Many-Objective Evolutionary Algorithm (SR-MaOEA) was proposed. In this algorithm, the distribution structure of the objective space was constructed through a subregion partitioning strategy, subregion density was quantified using Shifted Density Estimation (SDE), and a hierarchical subregion dominance ranking strategy was designed to enhance the selection pressure on individuals. Furthermore, a weighted selection mechanism that fuses convergence and diversity adaptively was proposed, in which the potential value of each subregion was evaluated by dynamically calculating the difference between the weighted sums of adjacent generations, and the individuals in subregions with higher potential values were retained preferentially to update the population. Experimental results on the MAF benchmark suite against multiple mainstream many-objective evolutionary algorithms show that on most test problems, SR-MaOEA outperforms the comparison algorithms in terms of both the Inverted Generational Distance (IGD+) and HyperVolume (HV) metrics, demonstrating the effectiveness and robustness of the algorithm in high-dimensional objective spaces.

    Observer-based leader-following consensus of heterogeneous descriptor multi-agent system with disturbance
    Shuai SU, Chenglin LIU
    2026, 46(4):  1211-1217.  DOI: 10.11772/j.issn.1001-9081.2025040441

    Aiming at the leader-following consensus problem of heterogeneous descriptor multi-agent systems under unknown nonlinear exogenous disturbances, a distributed control protocol was proposed. Firstly, a disturbance observer was designed to estimate the nonlinear exogenous disturbance, and the convergence condition of the observation error was derived. Secondly, a distributed control protocol was designed on the basis of the state observer to achieve leader-following consensus with the leader, and its sufficiency and rationality were proved using graph theory, matrix theory, and descriptor system theory. Finally, the design and analysis of the control protocol were extended to the leader-following bipartite consensus problem of heterogeneous descriptor multi-agent systems with both competition and cooperation relationships. Simulation results demonstrate that the proposed control protocol achieves the control objective effectively, and comparison with systems without the disturbance observer further verifies the necessity of the observer design. These results show that the proposed protocol extends the applicability of consensus theory in complex environments.

    Design of LDLT matrix decomposition FPGA accelerator based on mixed precision strategy
    Chaoyun MAI, Xiaopeng KE, Dongzhou ZHONG, Xiaochun HONG, Panrong CHEN, Zhiyuan SU
    2026, 46(4):  1218-1226.  DOI: 10.11772/j.issn.1001-9081.2025050535

    In view of the problems of high resource consumption and the difficulty of balancing computational accuracy and efficiency when implementing symmetric positive definite matrix decomposition algorithms on Field Programmable Gate Arrays (FPGAs), an LDLT decomposition acceleration structure based on a mixed precision strategy was proposed. In this structure, half-precision numbers were used at the storage level to reduce resource consumption, and single-precision numbers were used at the computational level to ensure computational accuracy and numerical stability. In addition, a parallel pipeline structure with multiple Processing Elements (PEs) was constructed, and a dual arbitration mechanism was introduced to optimize data scheduling and memory access. The acceleration structure was deployed on the xczu4ev-sfvc784 FPGA platform, and experiments were conducted on symmetric positive definite matrices of order 4 to 256 under three parallel configurations: 4PE, 8PE, and 16PE. The results show that the relative errors of the decomposition results of the proposed structure are all within 10⁻³. Compared with the comparison methods, this structure reduces the occupied LUT resources by more than 40% and the occupied DSP resources by 70%. It can be seen that this structure maintains computational accuracy while achieving low hardware overhead and improved throughput, demonstrating excellent scalability and engineering adaptability.
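
    The mixed precision split described above (half precision for storage, single precision for arithmetic) can be sketched in software as follows; this NumPy version is only a functional stand-in for the paper's FPGA pipeline, and the matrix size, casting points, and tolerance are illustrative assumptions.

```python
import numpy as np

# Sketch of the storage/compute precision split: operands live in float16
# (storage level) and are promoted to float32 for arithmetic (compute level).
def ldlt_mixed(A16: np.ndarray):
    """LDL^T factorization of a symmetric positive definite matrix.
    A16 is stored in float16; all arithmetic runs in float32."""
    n = A16.shape[0]
    A = A16.astype(np.float32)           # promote for computation
    L = np.eye(n, dtype=np.float32)
    d = np.zeros(n, dtype=np.float32)
    for j in range(n):
        d[j] = A[j, j] - np.dot(L[j, :j] ** 2, d[:j])
        for i in range(j + 1, n):
            L[i, j] = (A[i, j] - np.dot(L[i, :j] * L[j, :j], d[:j])) / d[j]
    # results go back to half precision for storage, mirroring the design
    return L.astype(np.float16), d.astype(np.float16)

rng = np.random.default_rng(0)
M = rng.standard_normal((6, 6)).astype(np.float32)
A = (M @ M.T + 6 * np.eye(6)).astype(np.float16)   # SPD test matrix
L, d = ldlt_mixed(A)
# reconstruct A = L * diag(d) * L^T and check the residual
R = (L.astype(np.float32) * d.astype(np.float32)) @ L.astype(np.float32).T
assert np.max(np.abs(R - A.astype(np.float32))) < 0.5
```

    The in-place LDL^T avoids the square roots of a Cholesky factorization, which is part of what makes it attractive for fixed hardware pipelines.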

    Computer software technology
    Survey of automated code edit suggestion
    Haoxuan CHEN, Peichang YE, Lei LIU, Chengming LIU, Wenhua HU
    2026, 46(4):  1227-1237.  DOI: 10.11772/j.issn.1001-9081.2025040486

    Code editing, as a core activity in software development, is crucial for the continuous optimization of software systems. With the discovery of regularities in code editing behaviors, Automated Code Edit Suggestion (ACES) techniques have emerged as a key direction for enhancing editing efficiency and reducing human errors. However, the existing research suffers from scattered findings and lacks systematic integration and a unified framework. Therefore, a systematic review of ACES techniques was conducted, covering relevant research published from 2004 to 2025 comprehensively. Firstly, the publication trends in this field were sorted out, and the technical evolution of suggestion models along with the development of related auxiliary technologies was summarized from three dimensions: traditional intelligent methods, deep learning models, and large language models. On this basis, suggestion tasks were classified into four types: context information-based, task description and instruction-based, historical edit-based, and input-output example-based suggestion tasks, and the technical approaches and research findings of each task type were elaborated in detail. Secondly, through analysis of ACES evaluation systems, the programming languages, editing granularity, and scale distribution of the existing evaluation datasets, as well as evaluation metrics such as text similarity and functional correctness of the code, were introduced systematically. Finally, an in-depth analysis of the current research status was conducted, a series of prominent challenges was identified, and potential future opportunities were outlined, providing theoretical references and directional guidance for the further development of this field.

    Multimedia computing and computer simulation
    Survey on BEV 3D object detection algorithm system
    Yang GUO, Hailiang WANG, Xu GAO, Haitao WANG, Yibo WANG
    2026, 46(4):  1238-1252.  DOI: 10.11772/j.issn.1001-9081.2025040419

    Visual perception, as one of the core technologies of environmental understanding, provides accurate environmental information for intelligent mobile systems (such as autonomous driving) and is an important prerequisite for safe decision-making. 3D object detection based on Bird’s Eye View (BEV) has become the mainstream paradigm in environmental perception because of its efficiency and accuracy. To further promote research on BEV-based 3D object detection algorithms, the following work was carried out. Firstly, BEV 3D object detection algorithms were classified systematically; according to the modality of the input data, they were divided into three categories: pure camera algorithms, pure LiDAR algorithms, and camera-LiDAR fusion algorithms. Secondly, the role of pre-training algorithms in improving detection performance was explored. Thirdly, the advantages and disadvantages of algorithms fusing temporal features in dynamic scenarios and the performance of algorithms fusing height features in complex environments were analyzed. Fourthly, the breakthrough progress made by large-model-assisted BEV object detection in detection accuracy and scene understanding was sorted out. Finally, the core conclusions on BEV 3D object detection algorithms were summarized, and future research directions were outlined, providing new ideas for research in this field.

    RGB-D dual-stream mirror network for camouflaged object detection
    Peng CHEN, Xu LI, Xiaosheng YU
    2026, 46(4):  1253-1263.  DOI: 10.11772/j.issn.1001-9081.2025040488

    Camouflaged objects have high visual similarity to the surrounding background in texture, color, and other attributes, and RGB-based representations are particularly vulnerable to such interference. As a result, it is difficult to localize objects accurately, often leading to incomplete segmentation structures or even missing objects, thereby degrading detection performance. To address this issue, an RGB-D Dual-stream Mirror Network (RDMNet) for Camouflaged Object Detection (COD) was proposed. Firstly, a hybrid backbone composed of TransNeXt and Vision Mamba was adopted to reduce model parameters, and a Multi-modal Feature Fusion (MFF) module was designed to enhance depth features by fusing RGB and depth information. Secondly, a Depth Positioning Module (DPM) and a Positioning-Guided feature integrity Aggregation (PGA) module were designed: the former was used to generate complete contour localization features, while the latter was employed to locate camouflaged objects rapidly and predict complete features efficiently. After cross-refinement fusion of the two, the global structure of camouflaged objects was focused on, and the segmentation features as well as the contour localization features were refined continuously. Finally, a Convolutional gated Channel Attention (CCA) module was designed to extract structural details from low-level features. Experimental results on COD and RGB-D Salient Object Detection (SOD) datasets confirm the superiority of RDMNet over 15 representative methods; on the CAMO, COD10K, and NC4K datasets, compared to MVGNet (Multi-View Guided Network), RDMNet achieves average improvements of 2.0% in Structure-measure (S-measure), 1.5% in mean Enhanced alignment measure (E-measure), and 3.2% in weighted F-measure, along with a 17.2% reduction in mean absolute error. These results demonstrate the effectiveness of RDMNet in enhancing both segmentation completeness and accuracy in COD.

    Progressive dual-stage modality interaction for single-domain generalized object detection
    Yongbing ZHANG, Lirong YAN, Xiaofen TANG
    2026, 46(4):  1264-1274.  DOI: 10.11772/j.issn.1001-9081.2025050543

    The existing vision-language-based single-domain generalization models rely on fixed unidirectional text guidance for local visual alignment, which limits their ability to model local-global context. To address this problem, a Progressive Dual-stage Modality Interaction (PDMI) framework was proposed. In PDMI, global domain-invariant features were extracted hierarchically within modalities, and the complementary semantic information between the visual and textual modalities was fully exploited, thereby capturing fine-grained semantic knowledge. Firstly, fixed domain-agnostic prompts and learnable Adaptive Domain Prompts (ADP) were integrated to guide the semantic awareness of samples toward specific domains. Meanwhile, based on the ResNet-101 visual backbone, a Multi-level Intra-Modality Interaction (MIMI) module was designed, in which Intra-Modality Mamba Interactions (IMMI) were performed on source domain images under the guidance of adaptive visual prompts to extract global domain-invariant features, thereby improving the distribution of visual representations. Then, a Cross-Modality Bidirectional Interaction and Fusion (CMBIF) mechanism was adopted to extract and align fine-grained cross-modality features, realizing fine-grained interactions between modalities through bidirectional guidance of visual or textual prompts. Finally, a Cross-Modality Adaptive Fusion (CMAF) module was employed to search for the optimal combination of inter-modal information automatically, thereby reducing redundant features in the interactions between modalities. Experiments were conducted on three challenging domain shift datasets: Diverse Weather, Virtual-to-Reality, and UAV-OD. The results show that PDMI achieves higher mean Precision on the Target domain (mPT) than the C-Gap, SRCD (Semantic Reasoning with Compound Domains), and FDD (Frequency Domain Disentanglement) methods, by an average of 2.0, 4.0, and 4.2 percentage points, respectively. These results show that PDMI can extract global-local domain-invariant features effectively and enhance generalization to unseen target domains significantly, which is essential for scenarios with significant distribution shifts between the source and target domains as well as limited target domain data.

    Object detection algorithm with few-shot learning based on YOLO-World
    Shuai HE, Chunhua DENG
    2026, 46(4):  1275-1282.  DOI: 10.11772/j.issn.1001-9081.2025050589

    Object detection has been widely applied in the field of computer vision. However, most existing methods rely heavily on large-scale labeled data, which makes it difficult to handle the limited samples of new categories in real-world conditions. Although current Open-Vocabulary object Detection (OVD) methods have a certain cross-category generalization ability, issues such as coarse semantic matching and inadequate spatial localization accuracy commonly occur when facing new categories with similar structures. To overcome these issues, an object detection algorithm with few-shot learning based on YOLO-World was proposed. Firstly, a Category-aware Convolution Kernel Construction Module (CCKCM) was proposed to fuse textual semantic embeddings with visual features, thereby enhancing semantic perception of new categories under the few-shot setting. Secondly, an efficient object matching and localization mechanism was introduced by combining sliding convolution with spatial geometric constraints, realizing fast matching and accurate localization of target regions while maintaining low computational complexity. Finally, an image dataset for Few-Shot Object Detection (FSOD) tasks was built, covering multiple classic scenes and object categories. Experimental results show that on the PASCAL VOC 2007+2012 dataset, the 10-shot average precision for novel classes of the proposed algorithm reaches 73.4%, which is 1.4 percentage points higher than that of FM-FSOD. These results show that the proposed algorithm provides a feasible technical path for the rapid recognition of new-category objects in real-world scenarios.

    CDC-DETR: multi-scale real-time human-vehicle detection method for complex traffic scenarios
    Xinyi YAN, Linglong ZHU, Yonghong ZHANG
    2026, 46(4):  1283-1291.  DOI: 10.11772/j.issn.1001-9081.2025040472

    The complexity and variability of traffic scenarios challenge existing human-vehicle detection algorithms: when dealing with occlusion, illumination changes, and multi-scale targets, existing algorithms tend to suffer from insufficient accuracy and low computational efficiency. To solve these problems, an improved detection model, CDC-DETR (CPPA-DWRC-CGNET-DETR), was developed based on the RT-DETR (Real-Time DEtection TRansformer) architecture. Firstly, a Context Pre-activation Pooling Attention (CPPA) module was designed to enhance long-range dependencies and optimize feature extraction. Secondly, a Dilation-Wise Residual Connection (DWRC) module was introduced to improve multi-scale feature representation. Thirdly, a lightweight Context Guided Block (CG Block) was proposed to fuse local, surrounding, and global information and reduce computational cost. Finally, these modules were integrated to construct a high-accuracy and efficient real-time human-vehicle detection model suitable for complex traffic scenarios. Experimental results on the BDD100K dataset show that compared to RT-DETR, at an Intersection over Union (IoU) threshold of 0.5, CDC-DETR improves the mean Average Precision (mAP) by 6.12%, increases the recall by 4.35%, and decreases the number of floating-point operations by 11.23%, enhancing computational efficiency significantly and providing an effective solution for deployment on edge devices.

    Face forgery detection method based on tri-branch feature extraction
    Shengwei XU, Jianbo WANG, Jijie HAN, Yijie BAI
    2026, 46(4):  1292-1299.  DOI: 10.11772/j.issn.1001-9081.2025040461

    To address the problems of insufficient feature representation, poor robustness, and weak cross-domain generalization in handling diverse forgery types and low-quality images, a face forgery detection method based on tri-branch feature extraction, Tri-BranchNet (Tri-Branch feature extraction Network), was proposed to achieve the complementarity and integration of multiple types of features and enhance forgery trace representation and detection performance. The architecture was designed as follows: 1) global semantic representations were captured by using Vision Transformer (ViT); 2) local texture feature modeling was improved by introducing an Invertible Neural Network (INN); 3) an edge feature extraction branch was designed to address the inadequate extraction of features from boundary forgery regions in traditional models. Experimental results on multiple public datasets show that the proposed method achieves 98.75% accuracy on the FaceForensics++ (C23) dataset, outperforming F3-Net (Frequency in Face Forgery Network) and CORE (COnsistent REpresentation learning) by 1.26% and 1.17%, respectively. In cross-compression and cross-dataset tests, the proposed method achieves Area Under Curve (AUC) scores of 85.26% and 81.09% on C40 and Celeb-DF, respectively, demonstrating strong robustness and generalization. These results show that the proposed tri-branch fusion mechanism enhances detection accuracy in complex forgery environments significantly and provides a novel idea for multi-dimensional feature modeling of forged images.

    Unsupervised face attribute editing method based on dynamic convolutional autoencoder
    Xuan CUI, Bo LIU
    2026, 46(4):  1300-1308.  DOI: 10.11772/j.issn.1001-9081.2025040398

    Unsupervised face attribute editing methods based on the latent space of Generative Adversarial Networks (GANs) offer the advantages of high efficiency and requiring only unlabeled data, but they still face challenges in decoupling and controllability: modifying a specific face attribute may alter other attributes inadvertently, thereby degrading editing quality, and precise control over the degree of attribute modification remains difficult. To address these issues, a dynamic convolutional Autoencoder-based Unsupervised Face Attribute Editing (AUFAE) method was proposed to achieve precise face attribute editing by learning effective semantic vectors in the latent space. Specifically, a Dynamic Convolutional AutoEncoder Network (DCAE-Net) was designed as the backbone, where Dynamic Convolution (DyConv) was utilized by the encoder to extract local latent-space features adaptively, thereby learning semantic vectors with local characteristics. A Channel Attention (CA) mechanism was incorporated into the decoder to establish nonlinear dependencies between channels, allowing the model to focus autonomously on feature channels relevant to different semantics and enhancing the independence of semantic vector learning. To improve the decoupling and controllability of semantic vectors, an attribute boundary vector-based loss function was introduced to train DCAE-Net. Additionally, a soft orthogonality loss was applied to ensure the mutual independence of semantic vectors, further boosting decoupling performance. Experimental results show that on three pre-trained GAN generation models, compared with three mainstream face attribute editing methods, AUFAE decreases the Fréchet Inception Distance (FID) by 37.43%-50.21% and the Learned Perceptual Image Patch Similarity (LPIPS) by 23.61%-42.85%, and increases the Structural Similarity Index Measure (SSIM) by 7.04%-13.42%. Visually, AUFAE does not exhibit attribute coupling during the face attribute editing process. These results show that AUFAE alleviates attribute coupling in face editing effectively and achieves more accurate face attribute editing.
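
    A soft orthogonality loss of the kind mentioned in this abstract is commonly implemented as a penalty on the off-diagonal entries of the Gram matrix of the semantic vectors; the sketch below assumes that standard formulation (the paper's exact weighting and implementation are not shown here).

```python
import numpy as np

# Hedged sketch of a soft orthogonality penalty on learned semantic vectors:
# it drives pairwise inner products toward zero without forcing exact
# orthogonality. NumPy stand-in; loss weighting is an assumption.
def soft_orthogonality_loss(V: np.ndarray) -> float:
    """V: (k, d) matrix whose rows are k learned semantic direction vectors.
    Loss = squared Frobenius norm of the off-diagonal part of V V^T, so only
    cross-correlations between different semantic vectors are penalized."""
    G = V @ V.T                          # (k, k) Gram matrix
    off = G - np.diag(np.diag(G))        # zero the diagonal (vector norms)
    return float(np.sum(off ** 2))

# Nearly orthogonal vectors incur no penalty ...
V_ortho = np.eye(3)
# ... while correlated ("coupled") vectors are penalized.
V_coupled = np.array([[1.0, 0.0, 0.0],
                      [0.9, 0.1, 0.0],
                      [0.0, 0.0, 1.0]])
assert soft_orthogonality_loss(V_ortho) == 0.0
assert soft_orthogonality_loss(V_coupled) > 1.0
```

    Minimizing this term alongside the main editing loss pushes the semantic vectors apart, which is what reduces the unintended attribute changes the abstract describes.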

    Frontier and comprehensive applications
    Review of evolution and changes in crowd evacuation calculation methods
    Xiayu WU, Hong ZHANG
    2026, 46(4):  1309-1322.  DOI: 10.11772/j.issn.1001-9081.2025040491

    Crowd evacuation modeling is evolving from classic physical simulation to data- and behavior-driven intelligent systems, leading to two parallel trends: the continuous improvement of classic models and the emergence of new intelligent technologies. However, the existing research lacks an analytical framework that can integrate these two trends systematically and reveal their inherent evolutionary logic, making it difficult for researchers to evaluate and select existing methods appropriately. To address this challenge, a three-stage progressive analytical framework was proposed. In the first stage, the evolution of models toward higher realism, driven by multiple factors including the physical environment, emergencies, and subjective psychology, was sorted out. In the second stage, the classic physical models, represented by Social Force Models (SFMs) and Cellular Automata (CA), and their performance improvements under the integration of multiple factors were evaluated systematically. In the third stage, cutting-edge intelligent methods were focused on, and the applications of Artificial Intelligence (AI) technology in the evacuation field were deconstructed and categorized functionally according to the closed-loop logic of "situation awareness-behavior prediction-decision optimization". This framework clearly reveals the core technological contradiction in this field, namely the trade-off between efficiency and realism, and clarifies the fundamental driving force behind the continuous evolution of next-generation intelligent evacuation models. The proposed analytical framework provides a structured, high-level logical view for understanding the complex technology landscape of crowd evacuation modeling, and offers a systematic reference benchmark for researchers when evaluating, selecting, and innovating evacuation models.
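
    For reference, the Social Force Model evaluated in the second stage updates each pedestrian with a driving force toward a goal plus repulsive forces from other pedestrians. The minimal sketch below assumes the classic Helbing-Molnár form with illustrative (uncalibrated) parameters and omits wall forces.

```python
import numpy as np

# Minimal Social Force Model (SFM) sketch: driving force toward a goal plus
# exponential inter-pedestrian repulsion. Parameters v0, tau, A, B are
# illustrative, not calibrated values from any study.
def sfm_step(pos, vel, goal, dt=0.1, v0=1.3, tau=0.5, A=2.0, B=0.3):
    """One explicit Euler step for N pedestrians.
    pos, vel: (N, 2) arrays; goal: (N, 2) target points."""
    n = len(pos)
    desired_dir = goal - pos
    desired_dir = desired_dir / np.linalg.norm(desired_dir, axis=1, keepdims=True)
    # driving force: relax toward the desired velocity v0 * e_i within time tau
    force = (v0 * desired_dir - vel) / tau
    # pairwise repulsion A * exp(-d / B) along the separation direction
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            diff = pos[i] - pos[j]
            d = np.linalg.norm(diff)
            force[i] += A * np.exp(-d / B) * diff / d
    vel = vel + dt * force
    pos = pos + dt * vel
    return pos, vel

pos = np.array([[0.0, 0.0], [0.2, 0.0]])      # two pedestrians
vel = np.zeros((2, 2))
goal = np.array([[5.0, 0.0], [5.0, 0.0]])     # shared exit point
for _ in range(50):
    pos, vel = sfm_step(pos, vel, goal)
assert pos[0, 0] > 1.0                        # both have moved toward the exit
```

    This per-agent force balance is exactly the "efficiency versus realism" trade-off the survey highlights: each added behavioral factor enters as another force term, at the cost of more computation per step.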

    Urban traffic flow prediction based on dual-layer multi-scale dynamic graph convolutional network model
    Wenhao LI, Yinzhang GUO
    2026, 46(4):  1323-1333.  DOI: 10.11772/j.issn.1001-9081.2025040522

    To address the limitations of existing traffic flow prediction models in effectively utilizing the fused information of region-level and node-level features, as well as their lack of dynamic representation of spatio-temporal features, a Double-layer Multi-scale Dynamic Graph Convolutional Network (DM-DGCN) was proposed for urban traffic prediction. Firstly, a double-layer network architecture was adopted to fuse the spatio-temporal features of regions and nodes, so as to handle node and region traffic flow data simultaneously. Secondly, in the spatial dimension, a Spatial-dynamic Graph Convolutional Network (S-GCN) module was constructed to capture dynamic spatial correlations. Thirdly, in the temporal dimension, a Multi-Scale Temporal Convolutional Network (MSTCN) module was designed to capture potential temporal dependencies under different semantic environments; at the same time, the idea of spatial relationship learning was introduced into the temporal domain by designing a Temporal-dynamic Graph Convolutional Network (T-GCN) module to construct a dynamic time-varying temporal relationship matrix. Finally, a dynamic fusion module based on the attention mechanism was designed to integrate region-level and node-level features, and the final prediction results were generated through the fusion layer. Experimental results on the Jinan and Xi’an traffic datasets show that for the 60-minute prediction task, the DM-DGCN model reduces the Mean Absolute Error (MAE) and Root Mean Square Error (RMSE) by 5.79% and 5.56%, respectively, compared to the Spatio-Temporal Pivotal Graph Neural Network (STPGNN) model on the Jinan dataset, and by 3.73% and 4.19%, respectively, compared to the Hierarchical Spatio-Temporal Graph Ordinary Differential Equation network (HSTGODE) model on the Xi’an dataset. These results verify that the DM-DGCN model outperforms the existing baseline models, captures dynamic multi-scale spatio-temporal dependencies in traffic data effectively, and predicts future traffic flow accurately.

    Power image retrieval method based on improved Swin Transformer
    Xiang BAI, Juchuan LI, Huimin WANG, Chao JING, Jian NIU, Xingzhong ZHANG, Yongqiang CHENG
    2026, 46(4):  1334-1343.  DOI: 10.11772/j.issn.1001-9081.2025040416

    The existing image retrieval methods struggle to distinguish and extract the similar structural information and texture details of power equipment effectively, resulting in low retrieval accuracy and efficiency. To solve these problems, a Power Image Retrieval method based on an improved Swin Transformer (PIR-iSwinT) was proposed. Firstly, a Multi-Feature Structure Cross-Enhancement module (MFSCE) was introduced to enhance the model's perception of equipment structural and edge features by applying a cross-attention mechanism to the gradient magnitude map. Secondly, an Adaptive Inter-class Difference Center Loss module (AIDCL) was designed to strengthen the model's ability to distinguish between similar and dissimilar samples. Finally, a Hierarchical Clustering Retrieval module (HCR) was constructed to optimize the sample matching strategy during retrieval and reduce computational complexity, thereby further enhancing retrieval accuracy and efficiency. Experimental results on the self-built power scenario dataset and the NUS-WIDE dataset show that PIR-iSwinT achieves mean Average Precision (mAP) of 96.76% and 92.68%, respectively, at a 32-bit hash code length, outperforming HRMPA (Hash image Retrieval based on Mixed attention and Polarization Asymmetric loss) by 2.35% and 0.56%, respectively. It can be seen that PIR-iSwinT extracts and distinguishes the detailed structural features of power equipment effectively, enhances retrieval efficiency, and demonstrates good generalization capability.

    Drug-target interaction prediction based on structure-network collaborative features and grid-attention enhanced Kolmogorov-Arnold network
    Xumeng DOU, Bin XIE, Zhaohui ZHANG, Zhengang ZHAO, Hanyu DUAN, Aolei GUO
    2026, 46(4):  1344-1353.  DOI: 10.11772/j.issn.1001-9081.2025040505

    Drug-Target Interaction (DTI) prediction is a key task in drug discovery and repurposing; the challenge lies in integrating multi-source heterogeneous features to characterize the complex relationships between drugs and targets comprehensively. To address the shortcomings of traditional methods, which rely on a single data source and model complex nonlinear relationships poorly, a DTI prediction method based on Structure-Network collaborative features and a grid-attention enhanced Kolmogorov-Arnold Network (KAN), named SNKDTI, was proposed. Firstly, a feature extraction strategy based on structure-network collaboration was designed: for drug representation, molecular fingerprints were fused with graph embedding methods to quantify chemical structures; for target representation, traditional physicochemical encoding was combined with pre-trained models to extract sequence features. Meanwhile, heterogeneous networks such as drug-disease associations and protein-protein interactions were introduced, network topological features were extracted using the Random Walk with Restart (RWR) algorithm, and the features were compressed by a Denoising AutoEncoder (DAE), so as to integrate the structural and network information of drugs and targets. Secondly, a Heterogeneous Biological Information Network (HBIN) was constructed, and feature propagation over it was carried out by a Graph Convolutional Network (GCN). Additionally, a Grid-Attention enhanced KAN (GA-KAN) was proposed, which introduced multiple learnable B-spline basis function grids and attention mechanisms to combine multiple nonlinear mapping modules adaptively, thereby enhancing the model's expressive power and input adaptability. Finally, a Gradient Boosting Decision Tree (GBDT) was used to build an end-to-end prediction framework. Experimental results comparing the proposed method with benchmark methods on public datasets show that SNKDTI achieves improvements of 0.81%, 1.36%, and 3.29% in Area Under the receiver operating characteristic Curve (AUC), Area Under the Precision-Recall curve (AUPR), and F1-score, respectively, over the best-performing benchmark methods. These results prove that SNKDTI enhances accuracy, robustness, and generalization ability significantly, providing an efficient tool for screening new drug targets.
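    The RWR step named in the abstract follows the standard iteration p_{t+1} = (1-r) W p_t + r p_0, where W is the column-normalized adjacency matrix and r is the restart probability. The sketch below is the generic algorithm, not the paper's implementation; the restart value and the assumption that every node has at least one edge are illustrative.

```python
import numpy as np

def rwr(adj, seed, restart=0.5, tol=1e-8, max_iter=1000):
    """Random Walk with Restart: p_{t+1} = (1-r) * W @ p_t + r * p0.
    adj: symmetric adjacency matrix with no isolated nodes (assumed);
    seed: index of the node the walk restarts to."""
    W = adj / adj.sum(axis=0, keepdims=True)   # column-normalize
    p0 = np.zeros(adj.shape[0])
    p0[seed] = 1.0
    p = p0.copy()
    for _ in range(max_iter):
        p_next = (1 - restart) * (W @ p) + restart * p0
        if np.abs(p_next - p).sum() < tol:     # L1 convergence check
            return p_next
        p = p_next
    return p

# Toy example: triangle graph, walk restarting at node 0
adj = np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]], dtype=float)
p = rwr(adj, seed=0)
```

    The converged vector p is a proximity profile of every node relative to the seed; stacking such profiles for all drug and target nodes yields the network topological features that the DAE then compresses.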

    Skin cancer classification integrating improved ResNet50 with ensemble classifier
    Chuandong QIN, Zhiqiang SUO
    2026, 46(4):  1354-1362.  DOI: 10.11772/j.issn.1001-9081.2025040513

    As a malignant tumor with a continuously rising incidence rate worldwide, skin cancer requires early and precise diagnosis to reduce mortality. To address the problems that model performance falls short of clinical requirements and that diagnostic accuracy is low for minority categories of skin cancer, a model integrating an improved ResNet50 with an ensemble classifier was proposed. Firstly, hair-induced noise was eliminated through grayscale black-hat threshold processing and the Telea inpainting algorithm, and the Synthetic Minority Over-sampling TEchnique (SMOTE) was used to balance the class distribution. Secondly, deep-level features were extracted by the ResNet50 model, with a soft attention module combining spatial and channel attention mechanisms introduced to focus on skin lesion regions. Finally, an ensemble classifier integrating random forest, eXtreme Gradient Boosting (XGBoost), Light Gradient Boosting Machine (LightGBM), K-Nearest Neighbor (K-NN), and Support Vector Machine (SVM) via soft voting was employed, and the proposed model was applied to the early diagnosis of skin cancers. The results of three separate experiments on the HAM10000, ISIC2019, and ISIC2020 datasets indicate that the proposed model improves the accuracy to (98.33±0.03)%, (96.15±0.06)%, and (99.19±0.02)%, respectively. Compared with current mainstream networks, the proposed model exhibits superior feature extraction and classification capabilities and helps improve the effect of early diagnosis.
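    The soft-voting step named in the abstract can be sketched generically: each base classifier outputs a class-probability matrix, the matrices are averaged (optionally weighted), and the class with the highest mean probability wins. The function below is a minimal illustration of that rule, not the paper's five-classifier pipeline; the weights parameter is an assumption.

```python
import numpy as np

def soft_vote(prob_list, weights=None):
    """Soft voting: weighted average of each classifier's class-probability
    matrix (rows = samples, cols = classes), then argmax over classes."""
    probs = np.asarray(prob_list)            # shape (n_clf, n_samples, n_classes)
    if weights is None:
        weights = np.ones(probs.shape[0])
    w = np.asarray(weights, dtype=float)
    avg = np.tensordot(w / w.sum(), probs, axes=1)   # weighted mean over classifiers
    return avg.argmax(axis=1)

# Toy example: two classifiers, two samples, two classes
clf1 = [[0.9, 0.1], [0.4, 0.6]]
clf2 = [[0.6, 0.4], [0.2, 0.8]]
preds = soft_vote([clf1, clf2])
```

    Averaging probabilities rather than hard labels lets a confident classifier outvote several uncertain ones, which is why soft voting often helps on minority classes.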

2026 Vol.46 No.4

Honorary Editor-in-Chief: ZHANG Jingzhong
Editor-in-Chief: XU Zongben
Associate Editor: SHEN Hengtao XIA Zhaohui
Domestic Post Distribution Code: 62-110
Foreign Distribution Code: M4616
Address:
No. 9, 4th Section of South Renmin Road, Chengdu 610041, China
Tel: 028-85224283-803
  028-85222239-803
Website: www.joca.cn
E-mail: bjb@joca.cn