Loading...

Table of Content

    10 February 2026, Volume 46 Issue 2
    Artificial intelligence
    Review of multi-modal knowledge graph completion methods
    Xue WANG, Liping ZHANG, Sheng YAN, Na LI, Xuefei ZHANG
    2026, 46(2):  341-353.  DOI: 10.11772/j.issn.1001-9081.2025030273
    Asbtract ( )   HTML ( )   PDF (1039KB) ( )  
    Figures and Tables | References | Related Articles | Metrics

    Traditional Knowledge Graph (KG) provides a unified and machine-interpretable representation of information on the web, but its limitations in handling multimodal applications are increasingly recognized. To address these limitations, Multi-Modal Knowledge Graph (MMKG) was proposed as an effective solution. However, the integration of multi-modal data into KG often leads to problems such as inadequate modality fusion and reasoning difficulties, which constrain the application and development of MMKG. Therefore, Multi-Modal Knowledge Graph Completion (MMKGC) techniques were introduced to integrate cross-modal information fully in the construction phase and to predict missing links after construction, thereby solving issues in modality fusion and reasoning. Subsequently, an overview of MMKGC methods were presented. Firstly, the basic concepts, widely used benchmark datasets and evaluation metrics of MMKGC were elaborated in detail. Secondly, the existing methods were classified into fusion tasks during the MMKG construction phase and reasoning tasks after construction. The former focused on key techniques such as entity alignment and entity linking, while the latter encompassed three techniques: relation inference, missing information completion, and multi-modal expansion. Thirdly, various MMKGC methods in each category were introduced thoroughly and their characteristics were analyzed. Finally, the problems and challenges faced by MMKGC methods were examined, and a summary of the above was provided.

    Entity discovery method for non-intelligent sensors by integrating knowledge graph and large models
    Jindong HE, Yuxuan JI, Tianci CHEN, Hengming XU, Ji GENG, Mingsheng CAO, Yuanning LIANG
    2026, 46(2):  354-360.  DOI: 10.11772/j.issn.1001-9081.2025020238
    Asbtract ( )   HTML ( )   PDF (1077KB) ( )  
    Figures and Tables | References | Related Articles | Metrics

    In the field of industrial Internet of Things (IoT), device entity discovery constitutes a critical component of device management. Compared to intelligent sensors, the discovery process of non-intelligent sensors is particularly complex due to their lack of inherent communication protocols, making efficient and accurate recognition of non-intelligent devices a technical challenge urgent to be solved. Therefore, a knowledge graph and Large Language Model (LLM)-based efficient recognition method for non-intelligent sensors was proposed. Firstly, a three-layer knowledge graph was constructed by extracting attribute values from non-intelligent sensors. Secondly, the feature vectors of sensors were extracted from the knowledge graph. Finally, the feature vector information was fed into LLM for fine-tuning, and the optimal fine-tuning parameters for the model were obtained through optimization via a series of experiments. Experimental results demonstrate that the proposed method achieves a recognition accuracy of 96.2% on the public IoT sensor dataset SensorData, enhancing recognition efficiency significantly.

    Rumor detection method based on cross-modal attention mechanism and contrastive learning
    Hu LUO, Mingshu ZHANG
    2026, 46(2):  361-367.  DOI: 10.11772/j.issn.1001-9081.2025030266
    Asbtract ( )   HTML ( )   PDF (900KB) ( )  
    Figures and Tables | References | Related Articles | Metrics

    Social media multi-modal rumor detection faces challenges such as weak cross-modal feature correlation and insufficient intrinsic representation of data. Therefore, a rumor detection method based on cross-modal attention mechanism and contrastive learning was proposed. In the method, fine-grained features of text and vision were extracted by a multi-modal feature module, cross-modal co-attention mechanism and discriminative learning were utilized to enhance inter-modal correlation, complex semantic contexts were captured by using multi-head self-attention, and a contrastive learning module was introduced innovatively to achieve feature optimization under machine supervision. Experimental results on the public Twitter-16 and Weibo datasets show that the accuracy of the proposed method is improved by 5.47 and 4.44 percentage points, respectively, compared with that of the existing optimal model MMFN (Multi-Modal Fusion Network), verifying the key roles of fine-grained feature mining and cross-modal similarity modeling in detection performance. It can be seen that analyzing multi-modal content differences deeply and strengthening cross-modal association mechanism can improve the recognition accuracy of social media rumors effectively.

    SetaCRS: Conversational recommender system with structure-enhanced hierarchical task-oriented prompting strategy
    Haoqian JIANG, Dong ZHANG, Guanyu LI, Heng CHEN
    2026, 46(2):  368-377.  DOI: 10.11772/j.issn.1001-9081.2025020256
    Asbtract ( )   HTML ( )   PDF (2341KB) ( )  
    Figures and Tables | References | Related Articles | Metrics

    In recent years, many studies on conversational recommender systems use pre-trained language models as unified frameworks, aiming to address the issue of inadequate coordination among modules in traditional multi-module architecture. However, these methods cannot fully exploit the coordination between tasks and fail to capture potential structured information in the input effectively. These problems weaken the performance of conversational recommender systems in real-world application scenarios significantly. Therefore, a Conversational Recommender System with Structure-enhanced hierarchical task-oriented prompting strategy named SetaCRS was proposed. In SetaCRS, a heterogeneous graph attention neural network was used to model the sequence co-occurrence information in historical conversations between the user and the system. In addition, hierarchical global task descriptions and specific sub-task descriptions were constructed to help the model capture and utilize the relationship between the current sub-task and the overall task sequence. Experimental results on two public datasets, DuRecDial and TG-ReDial, show that compared with UniMIND (Unified MultI-goal conversational recommeNDer system), SetaCRS achieves improvements of 8.53% and 1.55% in semantic F1, respectively, and improvements of 3.02% and 9.54% in Mean Reciprocal Rank (MRR)@10, respectively. It can be seen that the task dependencies and conversational structured information captured by SetaCRS improve recommendation accuracy and response quality effectively.

    Chinese automated essay scoring based on joint learning of multi-scale features using graph neural network
    Hongjian WEN, Ruijiao HU, Baowen WU, Jiaxing SUN, Huan LI, Qing ZHANG, Jie LIU
    2026, 46(2):  378-385.  DOI: 10.11772/j.issn.1001-9081.2025020228
    Asbtract ( )   HTML ( )   PDF (1616KB) ( )  
    Figures and Tables | References | Related Articles | Metrics

    The existing Automated Essay Scoring (AES) methods based on Pre-trained Language Model (PLM) tend to use global semantic features extracted from PLM directly to represent essay quality, while neglecting the associations between essay quality and more fine-grained features. In order to solve the problem, focused on Chinese AES research, the quality of essays was analyzed and evaluated from various textual perspectives, and a Chinese AES method was proposed by learning multi-scale essay features jointly using Graph Neural Network (GNN). Firstly, discourse features at both the sentence level and paragraph level were extracted by utilizing GNN. Then, joint feature learning was performed on these discourse features and the global semantic features of essays, so as to achieve more accurate scoring of essays. Finally, a Chinese AES dataset was constructed to provide a data foundation for Chinese AES research. Experimental results on the constructed dataset show that the proposed method has an improvement of 1.1 percentage points in average Quadratic Weighted Kappa (QWK) coefficient across six essay topics compared to R2-BERT(Bidirectional Encoder Representations from Transformers model with Regression and Ranking), validating the effectiveness of joint multi-scale feature learning in AES tasks. Meanwhile, ablation experimental results further demonstrate the contribution of essay features at different scales to scoring effect. To prove the superiority of small models in specific task scenarios, a comparison was conducted with the currently popular large language models, GPT-3.5-turbo and DeepSeek-V3. The results show that BERT (Bidirectional Encoder Representations from Transformers) model using the proposed method has the average QWK across six essay topics 65.8 and 45.3 percentage points higher than GPT-3.5-turbo and DeepSeek-V3, respectively, validating the observation that Large Language Models (LLMs) underperform in domain-specific discourse-level essay scoring tasks due to the lack of large-scale supervised fine-tuning data.

    Benchmark dataset for retrieval-augmented generation on long documents
    Yixin LIU, Xianggen LIU, Wen LIU, Hongbo DENG, Ziye ZHANG, Hua MU
    2026, 46(2):  386-394.  DOI: 10.11772/j.issn.1001-9081.2025030275
    Asbtract ( )   HTML ( )   PDF (1432KB) ( )  
    Figures and Tables | References | Related Articles | Metrics

    With the development of Pretrained Language Model (PLM), Retrieval-Augmented Generation (RAG) is widely concerned as an emerging task. A comprehensive and objective evaluation of RAG is considered essential to reveal the limitations of the existing methods and to indicate future research directions. However, a lack of systematic evaluation benchmarks for RAG is observed, especially in the context of long documents. To address this issue, an automatic question-answering construction strategy based on focused fragments was proposed, aiming to build large-scale QA datasets efficiently and accurately. Based on this strategy, the first bilingual RAG evaluation benchmark dataset for long documents, named LoRAG, was constructed, covering English-Chinese bilingual documents from multiple domains such as law, finance, and literature, with an average document length of 57 000 tokens in English and 76 000 tokens in Chinese. Systematic experiments on the two key stages of RAG — retrieval and generation were conducted using the LoRAG dataset. In the retrieval stage, multiple mainstream embedding models, including text-embedding-ada-002, the bge-large series, bge-m3, and Multilingual-E5-large-instruct, were evaluated, and the reranking model bge-reranker-v2-m3 was introduced for performance optimization and comparison. In the generation stage, representative Large Language Models (LLM), including Vicuna-13B, ChatGLM2-6B, Llama2-7B, and Claude2, were tested comprehensively. Experimental results show that the constructed dataset LoRAG reveals the positioning challenges faced by current embedding methods in long-document retrieval, as well as the limitations of LLM in balancing relevance and conciseness during the generation process, providing clear research directions for future method improvements.

    Neural network architecture search algorithm guided by hybrid heuristic information
    Qianlong XIONG, Jin QIN
    2026, 46(2):  395-405.  DOI: 10.11772/j.issn.1001-9081.2025030300
    Asbtract ( )   HTML ( )   PDF (1352KB) ( )  
    Figures and Tables | References | Related Articles | Metrics

    Aiming at Neural network Architecture Search (NAS) tasks, an NAS guided by Hybrid Heuristic Information (GHHI-NAS) algorithm was proposed. Firstly, a heuristic information construction module integrating prior knowledge and local search feedback was designed to generate multi-dimensional dynamic heuristic indicators, and a hybrid update strategy was combined to guide architecture search, thereby solving the problems of global exploration deficiency and local optimal traps caused by unidirectional updates in traditional NAS effectively. Secondly, a Covariance Matrix Adaptation Evolution Strategy (CMA-ES) update framework was adopted, enhanced by a hybrid fitness evaluation function, so that the algorithm was able to escape from small-model traps during early stage. Finally, a fitness sharing strategy was implemented to realize smooth evaluation of noise and improve population diversity. Additionally, a Monte Carlo swap sampling method with penalty mechanism was introduced to further reduce performance degradation caused by sampling. Experimental results demonstrate that GHHI-NAS has the validation accuracies of 97.55% and 83.44% on the CIFAR-10 and CIFAR-100 datasets, respectively, along with a test error rate of 24.7% on the ImageNet dataset and outstanding performance on the NAS-Bench-201 dataset, which is similar to or slightly surpasses those of Evolutionary NAS (ENAS) algorithm while requiring only 0.12 GPU-Days for search time, achieving low search costs and high test performance.

    Double decision mechanism-based deep symbolic regression algorithm
    Zeyi GUO, Fenglian LI, Lichun XU
    2026, 46(2):  406-415.  DOI: 10.11772/j.issn.1001-9081.2025020174
    Asbtract ( )   HTML ( )   PDF (816KB) ( )  
    Figures and Tables | References | Related Articles | Metrics

    Concerning the problem that the Deep Symbolic Regression (DSR) algorithm, which generates expression trees through Recurrent Neural Network (RNN) automatically, cannot ensure both accuracy and structural simplicity simultaneously, a Double decision mechanism-based DSR (DDSR) algorithm was proposed. Firstly, a dual scoring mechanism was employed to evaluate the accuracy and simplicity of the expression trees comprehensively on the basis of initial RNN decision. Then, reinforcement learning was used to train the expression trees, and Risk Proximal Policy Optimization (RPPO) algorithm was utilized to perform reward feedback, so as to update model parameters of the next batch. Experimental results on public datasets show that compared with DSR algorithm, DDSR algorithm achieves a maximum improvement of 0.396 and a minimum improvement of 0.001 in the coefficient related to fitness, with an average gain of 0.116. The above proves the effectiveness of DDSR algorithm.

    Multistage coupled decision-making framework for researcher redeployment after discipline revocation
    Fei GAO, Dong CHEN, Dixing BIAN, Wenqiang FAN, Qidong LIU, Pei LYU, Chaoyang ZHANG, Mingliang XU
    2026, 46(2):  416-426.  DOI: 10.11772/j.issn.1001-9081.2025030271
    Asbtract ( )   HTML ( )   PDF (1931KB) ( )  
    Figures and Tables | References | Related Articles | Metrics

    The existing researcher redeployment after discipline revocation relies on manual decision-making, which makes it difficult to coordinate discipline associations effectively. In this context, Large Language Model (LLM) with strong knowledge analysis capabilities provides new ideas for discipline optimization based on researcher redeployment after discipline revocation. However, on university research data represented by scientific research information, they face challenges such as difficulty in understanding professional terms and obvious long-tail distribution. Therefore, a multistage coupled decision-making framework for the redeployment of researchers after discipline revocation, namely MCRF (Multistage Coupled Redeployment Framework), was proposed. MCRF was composed of four stages: recall, semantic enhancement, pairing, and reordering, and was able to decompose difficult problems into multiple relatively simple sub-problems effectively. Firstly, a discipline research word cloud association dataset was constructed to alleviate the problem of general models’ difficulty in understanding specialized academic terms. Secondly, an association recall algorithm was designed to recall Top-K related disciplines of scientific research information quickly, thereby reducing the overall decision-making time overhead. Finally, an implicit optimization module was introduced to generate diverse representations of scientific research information, thereby ensuring that tail discipline research information was able to be fully associated with researchers’ research directions, and accurate semantic matching was achieved through a fine-grained scientific research project ordering model. Experimental results show that on multiple datasets, the recall of the proposed framework reaches 92% in the recall stage, and the accuracy of the proposed framework is 96% in the reordering stage, verifying the effectiveness of MCRF in the task of discipline structure optimization effectively.

    MATCH: multimodal stock prediction framework integrating time-frequency features and hybrid text
    Hanyue WEI, Chenjuan GUO, Jieyuan MEI, Jindong TIAN, Peng CHEN, Ronghui XU, Bin YANG
    2026, 46(2):  427-436.  DOI: 10.11772/j.issn.1001-9081.2025080955
    Asbtract ( )   HTML ( )   PDF (2000KB) ( )  
    Figures and Tables | References | Related Articles | Metrics

    The existing stock prediction models are mainly based on unimodal, and ignore inter-industry linkage effects and information heterogeneity. Although some studies have introduced textual modalities, they still struggle with challenges such as time lag effects and multi-granularity caused by modality inconsistency. Therefore, MATCH(Multimodal stock prediction frAmework inTegrating time-frequenCy features and Hybrid text), a multimodal fusion framework for stock prediction that integrates heterogeneous information across modalities effectively was proposed. Specifically, a Mixture of Experts (MoE) pretraining strategy was designed to build industry-specific pretrained models of representations, so as to select matched expert networks dynamically and incorporate industry features information. At the same time, a frequency-domain decomposition and hierarchical fusion mechanism was designed to jointly model temporal patterns at multiple frequencies, and a dual-stream pretraining architecture was used to obtain representations of high-frequency future fluctuations and low-frequency future trends, which were interacted with text information across multiple time scales cross-modally, thereby capturing market dynamics more precisely and enabling effective interaction between time-series and textual data in multi-granular scenarios. Experimental results on two real-world stock datasets, S&P 500 and CMIN-US for comparing MATCH and mainstream methods such as ESTIMATE (Efficient STock Integration with teMporal generative filters and wavelet hypergraph ATtEntions) and PatchTST demonstrate that, on S&P 500 dataset, MATCH has the Sharpe Ratio (SR) improved by 50.5% over the sub-optimal baseline model Adv-ALSTM, while on the more challenging CMIN-US dataset, MATCH achieves a 2.35% SR improvement, with other metrics reaching the best. It can be seen that MATCH provides a novel and efficient solution for multimodal data fusion in finance.

    Vehicle insurance fraud detection method based on improved graph attention network
    Jinjiao LIN, Canshun ZHANG, Shuya CHEN, Tianxin WANG, Jian LIAN, Yonghui XU
    2026, 46(2):  437-444.  DOI: 10.11772/j.issn.1001-9081.2025020151
    Asbtract ( )   HTML ( )   PDF (973KB) ( )  
    Figures and Tables | References | Related Articles | Metrics

    Aiming at the problem that the existing vehicle insurance fraud detection methods ignore complex correlation in the data, a vehicle insurance fraud detection method based on improved graph attention network was proposed. This method enhances the ability to capture complex correlation in the data through collaborative design of dynamic attention mechanism and serialized global modeling. Firstly, each case of vehicle insurance fraud was abstracted as a node of graph structure. Secondly, the similarity between multiple attributes such as time, age, and amount of the nodes was calculated by K-Nearest Neighbor (KNN) algorithm, so as to construct the complex correlation between the cases. Thirdly, the graph data of the cases was input into GATv2(dynamic Graph ATtention network), and local features of the adjacent nodes were aggregated by allocating node weights dynamically, thereby obtaining new feature representation of each case node. Fourthly, Transformer was introduced to serialize the graph structure output of GATv2. Finally the fusion module was used to perform nonlinear integration expression on the final features, so as to obtain the classification results of the case nodes. Experimental results show that compared with the baseline methods, the proposed method has the accuracy on the two datasets improved by at least 1.11 and 1.34 percentage points, respectively, and the False Positive Rate (FPR) of as low as 0.035% on the insurance company dataset, which provides a new technical solution for improving the accuracy and efficiency of vehicle insurance fraud detection.

    Cyber security
    Detection and defense mechanism for poisoning attacks to federated learning
    Qi ZHONG, Shufen ZHANG, Zhenbo ZHANG, Yinlong JIAN, Zhongrui JING
    2026, 46(2):  445-457.  DOI: 10.11772/j.issn.1001-9081.2025020146
    Asbtract ( )   HTML ( )   PDF (2659KB) ( )  
    Figures and Tables | References | Related Articles | Metrics

    In order to solve the problem that malicious clients in federated learning destroy the reliability of the global model by uploading malicious updates, a poisoning attack detection and defense algorithm for federated learning, FedDyna, was proposed. Firstly, an abnormal client detection scheme was designed to use the historical standard deviation of cosine similarity and Euclidean distance to detect abnormal updates preliminarily, and a multi-view model evaluation mechanism was combined to further detect suspicious clients. Then, an adaptive adjustment strategy was proposed to reduce the participation weights of abnormal clients gradually according to the weight adjustment factor until the malicious updates were removed from the model training process. The defense performance of FedDyna in different attack scenarios was evaluated on the EMNIST and CIFAR-10 datasets, and the algorithm was compared with the existing advanced defense algorithms. Experimental results show that, under the condition of a fixed attack frequency, a comparison of the effectiveness between the FedDyna algorithm and the Scope algorithm is conducted: When faced with three attack types, namely Projected Gradient Descent (PGD), Model Replacement (MR), and PGD+MR, FedDyna achieves the best results, reducing the Attack Success Rate (ASR) by 1.07 and 0.53, 1.49 and 1.45, 10.55 and 1.25 percentage points respectively; Under the EMNIST dataset subjected to Cosine Constraint Attack (CCA), although FedDyna experiences a slight decrease in ASR, it still achieves the second-best results. Additionally, when evaluated against comparison algorithms in different attacker pools, FedDyna’s ASR performs optimally under most conditions and ranks second-best under the remaining conditions. Notably, in scenarios with varying attack intensities, FedDyna achieves an impressive average global Model Accuracy (MA) of up to 98.5%. It can be seen that FedDyna has shown significant robustness against poisoning attacks in different attack scenarios and can detect and eliminate poisoning models effectively.

    Neighborhood-enhanced unsupervised graph anomaly detection
    Limei DONG, Yanzi LI, Jiayin LI, Li XU
    2026, 46(2):  458-466.  DOI: 10.11772/j.issn.1001-9081.2025020216
    Asbtract ( )   HTML ( )   PDF (1196KB) ( )  
    Figures and Tables | References | Related Articles | Metrics

    To address the issue of unreliable neighborhood information due to the presence of anomalies, an efficient unsupervised graph anomaly detection method was proposed. In this method, a neighborhood enhancement strategy was utilized to construct multi-type neighborhood sets for central nodes, thereby capturing high-quality node representations and obtaining accurate self-neighborhood similarity. Firstly, an information extraction module based on dynamic neighborhood enhancement was optimized to select the optimal neighborhood strategy adaptively, thereby overcoming the feature homogeneity limitation of traditional fixed neighborhood selection methods during information extraction. Secondly, in order to reduce the interference of redundant information of the nodes during node feature fusion, an anonymous message passing scheme was proposed. In this scheme, features of the nodes were isolated, and neighborhood information was focused on solely, thereby enhancing the quality of message aggregation. Finally, an adaptive weighted anomaly scoring module was designed, using the distance between nodes as an evaluation scale to obtain node anomaly scores, thereby refining the anomaly detection results. Experimental results on five datasets demonstrate that the proposed method outperforms mainstream method CoLA (Anomaly detection on attributed networks via Contrastive self-supervised Learning) in detecting anomalies in complex graph structure, achieving an at least 8.0% AUPRC (Area Under the Precision-Recall Curve) improvement in identifying anomalous samples.

    GAB3D-SEVSN: enhanced video steganography model via invertible neural network
    Qianhui XU, Ke NIU, Shunzhe ZHU, Lin SHI, Jun LI
    2026, 46(2):  467-474.  DOI: 10.11772/j.issn.1001-9081.2025050577
    Asbtract ( )   HTML ( )   PDF (2093KB) ( )  
    Figures and Tables | References | Related Articles | Metrics

    To address the issues of insufficient long-range motion modeling and over-parameterization caused by channel redundancy in video steganography tasks under small-sample conditions, an enhanced video steganographic network — GAB3D-SEVSN was proposed by integrating 3D Global Attention Block (GAB-3D) and Squeeze-and-Excitation (SE) channel attention. In the model, through the optimized GAB-3D module, key motion trajectories in the 3D spatio-temporal domain were focused on adaptively, thereby enhancing the capability of modeling long-range dependencies. Meanwhile, by embedding the SE module into the reversible architecture, the channel-level adaptive calibration was achieved, which suppressed redundant parameters and alleviated over-parameterization effectively. Experimental results on the UCF101 dataset (13K video samples) demonstrate that, compared to the LF-VSN baseline model, the proposed model achieves improvements of 0.5 dB in Peak Signal-to-Noise Ratio (PSNR) and 2.06% in Structural SIMilarity (SSIM). Ablation experimental results verify the effectiveness and synergistic effect of various modules. Test results on high-dynamic scene subsets and videos with different attributes show that the model outperforms baseline models in PSNR and SSIM significantly, demonstrating excellent robustness and generalization ability.

    Image steganography method based on conditional generative adversarial networks and hybrid attention mechanism
    Ming LI, Mengqi WANG, Aili ZHANG, Hua REN, Yuqiang DOU
    2026, 46(2):  475-484.  DOI: 10.11772/j.issn.1001-9081.2025020204
    Asbtract ( )   HTML ( )   PDF (1262KB) ( )  
    Figures and Tables | References | Related Articles | Metrics

    Current deep image steganography methods based on image-in-image concealment face challenges in practical applications of privacy protection and secure communication due to insufficient security of stego images and distortion in recovered secret images. To address these issues, a Conditional Generative Adversarial Network and Convolutional Block Attention Module-based image-in-image steganography method (CBAM-CGAN) was proposed. Firstly, a hybrid attention module was introduced into the generator network to enable the generator’s comprehensive learning of image features from both channel and spatial dimensions, thereby enhancing the visual quality of stego images. Secondly, residual connections were employed to reduce feature loss of secret images during network learning, and through adversarial training between the extractor and the discriminator, noise-free extraction of secret images was achieved. Finally, adversarial training between the generator and the steganalyzer was implemented to improve the stego image security. Experimental results on public datasets including COCO demonstrate that compared with steganography method StegGAN, the proposed steganography method achieves the Peak Signal-to-Noise Ratio (PSNR) improvements of 4.37 dB and 4.71 dB for stego and decrypted images, respectively, along with Structure Similarity Index Measure (SSIM) enhancements of 9.16% and 6.46%, respectively. For security, the proposed method has the detection Accuracy (Acc) against steganalyzer Ye-Net decreased by 9.35 percentage points with the False Negative Rate (FNR) increased by 12.01 percentage points. It can be seen that the proposed method ensures stego image security while achieving high-quality secret image recovery.

    Advanced computing
    Two-stage infill sampling-based expensive multi-objective evolutionary algorithm
    Chunyu ZHANG, Jianchang LIU, Yuanchao LIU, Wei ZHANG
    2026, 46(2):  485-496.  DOI: 10.11772/j.issn.1001-9081.2025020215
    Asbtract ( )   HTML ( )   PDF (1770KB) ( )  
    Figures and Tables | References | Related Articles | Metrics

    For Expensive Multi-objective Optimization Problem (EMOP), although numerous related algorithms have been proposed, most existing algorithms have not achieved satisfactory results. The primary reason is that the infill sampling criteria in these algorithms fail to balance the convergence, diversity and uncertainty of selected individuals. Therefore, a Two-stage Infill Sampling-based Expensive Multi-Objective Evolutionary Algorithm (TISEMOEA) was proposed. In the first stage, a convergence-based infill sampling criterion was proposed, so as to select individuals with both good convergence and diversity, and then balance convergence and diversity. In the second stage, a diversity-based infill sampling criterion was proposed, so as to select individuals with great uncertainty without damaging convergence, and then improve the accuracy of the model and the diversity of the population. Furthermore, an adaptive diversity enhancement strategy was proposed to adjust the frequency of selecting individuals using the diversity-based infill sampling criterion, thereby enhancing population diversity and balancing exploration and exploitation capabilities of the algorithm. TISEMOEA was compared with five state-of-the-art algorithms, MOEA/D-EGO (MOEA/D with the Gaussian process model), HeE-MOEA (Heterogeneous Ensemble-based infill criterion for MOEA), TISS-EMOA (Two-stage Infill Sampling-based Semi-supervised EMOA), PCSAEA (Pairwise Comparison based Surrogate-Assisted Evolutionary Algorithm), and SFA/DE (Evolutionary multiobjective optimization assisted by scalarization function approximation for high-dimensional expensive problems), on the DTLZ and WFG test sets with 28 and 27 test problems, and the Inverted Generational Distance (IGD) metric was analyzed. The results show that TISEMOEA achieves the best results in 19 and 16 test problems, respectively.

    Virus transmission analysis and visualization based on agent-based modeling
    Haiyue CHEN, Xiuwen YI, Shan YAN, Shijiao LI, Tianrui LI, Yu ZHENG
    2026, 46(2):  497-504.  DOI: 10.11772/j.issn.1001-9081.2025020213
    Asbtract ( )   HTML ( )   PDF (1816KB) ( )  
    Figures and Tables | References | Related Articles | Metrics

    In recent years, respiratory infectious diseases break out in crowded places frequently. How to restore transmission path of the virus between people has become a key issue in epidemic prevention and control. However, the existing methods have problems such as the difficulty in describing the crowd movement trajectory in crowded places and the difficulty in simulating spread of the virus caused by contact between people. To this end, a virus transmission’s visual analysis system of Agent-Based Modeling (ABM) was proposed to simulate dynamically and visualize the virus transmission process as well as show individual contact relationships and transmission paths. Firstly, a multi-head rapidly-exploring random tree algorithm was combined with movement statistical data to generate crowd movement trajectories conforming to actual laws. Secondly, by refining the contact rules of people and the virus transmission mechanism, a transmission model based on ABM was constructed to simulate the impact of crowd interaction on virus transmission. Case analysis shows that the virus transmission’s visual analysis system restores the contact relationship between people and the process of virus transmission between people, the transmission chain simulated by the system is highly similar to the actual chain, the number of simulated cases is basically consistent with the actual number of cases, and the multi-head rapidly-exploring random tree algorithm can generate highly realistic trajectories efficiently. The case verifies the decision support value of the system in the optimization of epidemic prevention and control strategies.

    Network and communications
    Performance evaluation method for deterministic networks based on complex-enhanced attention graph neural network
    Junrui WU, Jiangchuan YANG, Haisheng YU, Sai ZOU, Wenyong WANG
    2026, 46(2):  505-517.  DOI: 10.11772/j.issn.1001-9081.2025020200
    Asbtract ( )   HTML ( )   PDF (3596KB) ( )  
    Figures and Tables | References | Related Articles | Metrics

    Deterministic network is a key focus and hotspot in the current development of the Internet. How to evaluate the performance of non-deterministic traffic in deterministic network systems remains a challenge both in theory and engineering. Although traditional methods, such as modeling network traffic arrival distribution using queuing theory, employing network calculus analysis to establish upper and lower bounds of network behavior, and applying machine learning and deep learning algorithms to predict network performance trends from large-scale historical data statistically, can address this issue partially, they still suffer from poor accuracy and performance facing complex deterministic network systems. Therefore, a Complex-Enhanced Attention Graph Neural Network (CEA-GNN)-based method was proposed for deterministic network performance evaluation. In the method, the gating property of deterministic network systems was utilized fully, and an attention mechanism was employed to deliver critical information to the Graph Neural Network (GNN), thereby improving the evaluation accuracy. At the same time, the relevant information was extracted from both the graph spatial domain and the complex frequency domain to update the GNN, thereby enhancing performance of the evaluation model. Experimental results from experiments conducted on the National Science Foundation Network (NSFNet) topology indicate that compared to the RouteNet-Fermi evaluation method, the proposed method reduces the Mean Absolute Error (MAE) of non-deterministic traffic latency prediction in deterministic networks by 87.4%, decreases the MAE of packet loss rate prediction by 12.7%, and reduces the average processing time per flow by 64.4%.

    Computer software technology
    Software vulnerability detection method based on edge weight
    Qiao YU, Zirui HUANG, Shengyi CHENG, Yi ZHU, Shutao ZHANG
    2026, 46(2):  518-527.  DOI: 10.11772/j.issn.1001-9081.2025020217
    Asbtract ( )   HTML ( )   PDF (1690KB) ( )  
    Figures and Tables | References | Related Articles | Metrics

    With the widespread application of software across various domains, software vulnerabilities have shown a continuous upward trend, so that deep learning-based methods for vulnerability detection have gained wide application. However, the existing graph representation learning methods often neglect the influence of edges in the graph on vulnerability detection, and have the representation of edge weights too coarse. To address this issue, a software vulnerability detection method based on edge weight — EWVD (Edge Weight for Vulnerability Detection) was proposed. Firstly, comments, custom variable names, and function names in the source code were cleaned and represented abstractly. Secondly, Sent2Vec was selected to perform embedding representation after comparative analysis. Thirdly, edge weights were calculated comprehensively using three metrics: connection structure, the importance of neighboring nodes, and Jaccard similarity, so as to identify the information transmission capability between nodes. Finally, by leveraging edge weights, perception capability of the model was enhanced for potential relationships between vulnerable statements, thereby determining the importance of edges in the graph. Compared with the best-performing baseline method VulCNN among seven vulnerability detection baseline methods, EWVD achieves an increase of 1.06 percentage points in Accuracy and a decrease of 1.11 percentage points in False Positive Rate (FPR). It can be seen that EWVD refines the representation of edge weights and improves the overall performance of vulnerability detection.

    Multimedia computing and computer simulation
    Panoramic video super-resolution network combining spherical alignment and adaptive geometric correction
    Xiaolei CHEN, Zhiwei ZHENG, Xue HUANG, Zhenbin QU
    2026, 46(2):  528-535.  DOI: 10.11772/j.issn.1001-9081.2025030311
    Asbtract ( )   HTML ( )   PDF (1058KB) ( )  
    Figures and Tables | References | Related Articles | Metrics

    Traditional Video Super-Resolution (VSR) methods are ineffective in solving geometric distortion problems caused by equirectangular projection when processing panoramic videos, and have deficiencies in inter-frame alignment and feature fusion, which results in poor reconstruction quality. To further improve the super-resolution reconstruction quality of panoramic videos, a panoramic video super-resolution network combining spherical alignment and adaptive geometric correction, named 360GeoVSR, was proposed. In the network, accurate alignment and efficient fusion of inter-frame features were achieved through a Spherical Alignment Module (SAM) and a Geometric Fusion Block (GFB). In SAM, spatial transformation and deformable convolution were combined to address global and local geometric distortions. In GFB, feature alignment was corrected dynamically using an embedded Adaptive Geometric Correction (AGC) submodule, and multi-frame information was fused to capture complex inter-frame relationships. The results of subjective and objective comparison experiments on the extended ODV360Extended panoramic video dataset show that 360GeoVSR outperforms five representative super-resolution methods, including BasicVSR++ and VRT (Video Restoration Transformer), in both objective metrics and subjective visual effects, verifying its effectiveness.

    Low-overlap point cloud registration network integrating position encoding and overlap masks
    Xiaowei LA, Lihua HU, Jianhua HU, Xiaoling YAO, Xinbo WANG
    2026, 46(2):  536-545.  DOI: 10.11772/j.issn.1001-9081.2024121782
    Asbtract ( )   HTML ( )   PDF (1184KB) ( )  
    Figures and Tables | References | Related Articles | Metrics

    For the issues of low registration accuracy and high mismatch rate in low-overlap point cloud registration due to insufficient descriptive information of keypoint features and minimal overlapping regions, this paper proposed a low-overlap point cloud registration network that integrates position encoding and overlap masks was proposed to reduce mismatch rate and improve registration accuracy. First, a PointNet-based point-wise feature encoder was employed to extract keypoints, which were then enhanced by fusing their feature information, coordinate data, and position encoding to generate more discriminative keypoint descriptors. Second, the fused features were processed through self-attention and cross-attention modules to strengthen the descriptive power of point cloud features and enhance contextual interaction, thereby addressing the problem of insufficient keypoint descriptive information. Third, an overlap mask module was introduced after the attention modules to filter out keypoints from non-overlapping regions through learned masks, further reducing mismatch rate. Finally, optimal matching was achieved using the Sinkhorn algorithm, followed by refinement with the Iterative Closest Point (ICP) algorithm to enhance registration accuracy. Experimental results on the CODD and KITTI datasets, compared with various existing low-overlap point cloud registration methods, demonstrate that the network with ICP refinement performs superiorly. Specifically, on the CODD dataset, it reduces the Relative Translation Error (RTE) and Relative Rotation Error (RRE) by 53.29% and 42.72%, respectively, compared to the state-of-the-art method CoFiI2P (Coarse-to-Fine correspondences for Image-to-Point cloud registration), while improving the Registration Recall (RR) by 0.2 percentage points. The results indicate that the proposed network effectively extracts descriptive information from keypoint features and significantly improves point cloud registration accuracy in low-overlap scenarios.

    Low-light image enhancement network guided by reflection prior map
    Rifeng ZHANG, Guangming LI, Yurong OUYANG
    2026, 46(2):  546-554.  DOI: 10.11772/j.issn.1001-9081.2025020222
    Asbtract ( )   HTML ( )   PDF (11604KB) ( )  
    Figures and Tables | References | Related Articles | Metrics

    In recent years, the low-light image enhancement methods based on deep learning are inspired by Retinex theory. First, the illumination map is estimated to adjust the brightness, and then the reflectance is restored to achieve low-light enhancement. Therefore, by analyzing the similarity between the low-light scene reflection map and reference reflection map, a low-light image enhancement Network guided by Reflection Prior map (RP-Net) was proposed. Firstly, the similar reflection map was generated by decomposing in Lab color space, and an Reflection Prior feature Adaptive Extractor (RPAE) was designed to re-encode and filter the guiding features from the similar reflection map in the backbone network at different scales. Then, the guiding information was injected into the backbone network through the designed Reflection Prior feature-Guided attention Block (RPGB). In addition, aiming at the limitations of traditional pixel-by-pixel L1 loss, a harmonic loss function of frequency domain was designed from the perspective of frequency domain analysis, so as to optimize the enhancement effect from the global spectral distribution. Experimental results on LOLv1, LOLv2 and LSRW datasets show that the proposed method is superior to the existing mainstream methods in Structural Similarity (SSIM), and has the Peak Signal-to-Noise Ratio (PSNR) 1.29 dB and 2.08 dB higher than that of Retinexformer and SAFNet (Spatial And Frequency Network), on the LOLv2-syn and LSRW datasets respectively, and performs well in balancing color fidelity and enhancement effect.

    Action quality assessment model based on trajectory-guided perceptual learning with X3D
    Sizhong ZHANG, Jianyang LIU, Linfeng LI
    2026, 46(2):  555-563.  DOI: 10.11772/j.issn.1001-9081.2025020158
    Asbtract ( )   HTML ( )   PDF (2842KB) ( )  
    Figures and Tables | References | Related Articles | Metrics

    Action Quality Assessment (AQA) has attracted many researchers as a challenging visual task. Current research methods mainly focus on improving the feature extraction capability of backbone networks, ignoring the impact of motion trajectories. However, the consistency of the movements is also an important factor for evaluating execution of the movements in the real world. Firstly, in order to realize the interactive learning between different information, an AQA model with trajectory-guided perceptual learning was proposed by introducing trajectory information, which utilized trajectory descriptors to guide the model to learn information of the consistency of movements perceptually. Secondly, in order to solve the lack of trajectory labels in the current datasets, an unsupervised optical flow trajectory extraction method based on Farneback optical flow method was designed to obtain movement trajectory information, and the acquired optical flow trajectory features were used as cue words to guide the model to learn the video features perceptually. Finally, learnable spline curves of KAN (Kolmogorov-Arnold Network) were used to fit the data distribution of the mixed features, so as to establish a more accurate mapping relationship. The proposed model was evaluated experimentally on the MTL-AQA, AQA-7, FineDiving, and JIGSAWS datasets using Spearman rank Correlation (Sp.Corr) as the evaluation metric. The results show that the proposed model has the Sp.Corr of 0.910 1, 0.912 0, 0.882 0, and 0.990 0, respectively, which is 0.4%, 12.6%, 6.2%, and 57.1% higher than that of USDL (Uncertainty-aware Score Distribution Learning) model, respectively.

    Small object detection method based on improved DETR algorithm
    Jun WU, Chuan ZHAO
    2026, 46(2):  564-571.  DOI: 10.11772/j.issn.1001-9081.2025030277
    Asbtract ( )   HTML ( )   PDF (3580KB) ( )  
    Figures and Tables | References | Related Articles | Metrics

    To address the problem of low accuracy of DETR (DEtection TRansformer) in small object detection, a small object detection method based on improved DETR algorithm was proposed. Firstly, an improved MetaFormer combined with a multi-scale attention mechanism was adopted as the backbone network, aiming to solve the problems of weak extraction ability, low efficiency, and detail loss in small object feature extraction of backbone network ResNet-50, thereby enhancing the model’s representation capability for small objects. Secondly, a deformable attention decoder was introduced to solve the problems of slow convergence and limited feature space resolution in the Transformer attention module when processing image feature maps, so that the model was able to focus on key sampling regions around reference points, thereby accelerating the model convergence and improving detection accuracy for small objects. Finally, the Wise-IoU (WIoU) v3 loss function was incorporated for inability of the GIoU (Generalized Intersection over Union) loss function in evaluating prediction box quality, so that differentiated gradient gains were assigned to prediction boxes of varying qualities, thereby guiding the model to converge towards higher accuracy. Experimental results on the COCO2017 object detection dataset show that compared with DETR, the proposed method improves the average precision for small objects by 7.6 percentage points and the overall average precision by 4.7 percentage points, demonstrating superior detection precision of the proposed method.

    Hierarchical cross-modal fusion method for 3D object detection based on Mamba model
    Mingguang LI, Chongben TAO
    2026, 46(2):  572-579.  DOI: 10.11772/j.issn.1001-9081.2025030281
    Asbtract ( )   HTML ( )   PDF (3178KB) ( )  
    Figures and Tables | References | Related Articles | Metrics

    To address the issue that the existing cross-modal fusion methods based on Bird’s-Eye View (BEV) neglect effective preservation of local BEV feature information in the initial fusion stage, leading to insufficient shallow cross-modal interactions, so as to constrain subsequent deep fusion effect and reduce accuracy in 3D object detection, a hierarchical cross-modal fusion method for 3D object detection based on Mamba model was proposed. In the method, the state space modeling mechanism of Mamba was integrated with the hierarchical fusion mechanism deeply, so that cross-modal features were mapped into a hidden state space to facilitate interactions, thereby enriching local information, reducing discrepancies among modal features, and enhancing the consistency of fused feature representations. In the shallow fusion stage, a feature channel exchange mechanism was designed to exchange feature channels from different sensor modalities, thereby improving the preservation ability of shallow local details, and the Visual State Space (VSS) block of Mamba model was improved to strengthen interactions among shallow features. In the deep fusion stage, an attention mechanism and a gating mechanism were introduced to construct hidden feature transformation, so as to identify and fuse complementary long-range dependency features among modalities. Finally, a channel adaptive module was employed to calculate channel attention on normalized original features, and intra-modal channel relationships were learned adaptively to enhance single-modal BEV feature representations, thus compensating for Mamba model’s limitation in modeling inter-channel relationships. Experimental results show that the proposed method achieves superior detection performance compared to methods such as TransFusion and multi-modal fusion method combining local-global modeling LoGoNet (Local-to-Global Network) on the nuScenes and Waymo datasets. On the nuScenes test set, the proposed method has a mean Average Precision (mAP) of 72.4% and a nuScenes Detection Score (NDS) of 73.9%, surpassing the baseline method BEVFusion_mit by 2.2 and 1.0 percentage points, respectively.

    Multi-dimensional frequency domain feature fusion for human-object interaction detection
    Yuebo FAN, Mingxuan CHEN, Xian TANG, Yongbin GAO, Wenchao LI
    2026, 46(2):  580-586.  DOI: 10.11772/j.issn.1001-9081.2025020241
    Asbtract ( )   HTML ( )   PDF (1356KB) ( )  
    Figures and Tables | References | Related Articles | Metrics

    The task of Human-Object Interaction (HOI) detection aims to identify all interactions between humans and objects in an image. Most existing research employs an encoder-decoder framework for end-to-end training, which relies on Absolute Positional Encoding (APE) heavily and has limited performance in complex multi-object interaction scenarios. To address the limitations of capturing relative spatial relationships between humans and objects due to reliance on APE, as well as the insufficient integration of local and global information in complex multi-object interaction scenarios, an HOI detection model was proposed by combining cross-dimensional interaction feature extraction with frequency domain feature fusion. Firstly, the conventional Transformer encoder was improved by introducing a Relative Position Encoding (RPE), and through the fusion of RPE and APE, the model was enabled to capture the spatial relationships between humans and objects. Then, a new feature extraction module was introduced to enhance image information integration by capturing interaction features across channel, spatial, and feature dimensions, while Discrete Cosine Transform (DCT) was applied to extract frequency domain features to capture richer local and global information. Finally, the Wise-IoU loss function was adopted to improve detection accuracy and class discriminative capability, thereby allowing the model to process targets of various categories more flexibly. Experiments were conducted on two public datasets, HICO-DET and V-COCO. The results show that the proposed model achieves an improvement of 0.95 percentage points in mean Average Precision (mAP) on all categories of the HICO-DET dataset and 0.9 percentage points in AP on scenario 1 of the V-COCO dataset, compared to the GEN-VLKT (Guided Embedding Network Visual-Linguistic Knowledge Transfer) model.

    Graph spatiotemporal learning model-based method for detecting dynamic changes of leg length discrepancy
    Tinghu WEI, Haoyan LIU, Jianning WU
    2026, 46(2):  587-595.  DOI: 10.11772/j.issn.1001-9081.2025020225
    Asbtract ( )   PDF (1252KB) ( )  
    References | Related Articles | Metrics

    To address the issue that the existing methods for detecting dynamic changes in Leg Length Discrepancy (LLD) under non-clinical environments fail to exploit the spatiotemporal features of LLD gait fully, a novel method based on a graph spatiotemporal learning model was proposed to detect LLD dynamic changes. In the method, the detection of LLD dynamic changes was considered as a gait graph pattern classification problem, and a spatiotemporal gait graph classification model integrating spatial graph convolution and Long Short-Term Memory (LSTM) network was constructed, so that the superior spatiotemporal graph topology learning ability was used to explore significantly different features hidden in LLD gait patterns, which improved generalization ability of the gait classification model effectively and detect LLD dynamic changes accurately. Skeletal data simulating different degrees of LLD were collected from 26 healthy subjects using Azure Kinect to build LLD gait patterns, which were used to evaluate the effectiveness of the proposed method. Experimental results show that the constructed model can explore significantly different features hidden in LLD gait graph patterns effectively with a classification accuracy of 99.62%, and its classification generalization performance is improved by 5.83 percentage points compared to that of the model combining Convolutional Neural Network (CNN) with LSTM, providing a novel technical solution for detecting LLD dynamic changes in outdoor environments accurately.

    Real-time face blurring method based on head skeleton point detection
    Ping HUANG, Qing LI, Haifeng QIU, Chengsi WANG, Anzi HUANG, Xiang ZHANG
    2026, 46(2):  596-603.  DOI: 10.11772/j.issn.1001-9081.2025020246
    Asbtract ( )   HTML ( )   PDF (1330KB) ( )  
    Figures and Tables | References | Related Articles | Metrics

    In power monitoring scenarios, real-time monitoring and analysis of personnel behavior are crucial for ensuring the safe and stable operation of power systems. However, directly exposing facial information in monitoring videos without proper processing poses serious privacy risks. Traditional face detection-based blurring methods face challenges such as insufficient robustness and high computational costs in complex power environments, making them hard to meet both accuracy and real-time requirements. To address these issues, a real-time face blurring method based on head skeleton point detection was proposed. Firstly, a lightweight head skeleton point detection framework based on a hierarchical processing strategy was designed to locate personnel regions in compressed videos rapidly and stitch the cropped areas at original resolution to batch-detect head skeleton points of all the people, thus improving detection efficiency and accuracy. Secondly, an adaptive inter-frame optimization strategy was introduced, to use frame differencing to detect changes in the number of personnel quickly and adjust detection frequency dynamically by incorporating a tracking mechanism for personnel detection boxes, thereby reducing redundant computational overhead effectively. Finally, a prototype system for real-time face blurring was constructed on edge nodes, and its performance was validated through experiments. Experimental results indicate that taking the KAPAO-S model as an example, the proposed method improves the face blurring accuracy in monitoring videos by 3.6 percentage points and reduces the processing time per frame by 2.5 ms approximately compared to the original model, thereby ensuring accuracy and real-time performance at the same time.

    Vehicle trajectory anomaly detection based on multi-level spatio-temporal interaction dependency
    Feng HAN, Yongfeng BU, Haoxiang LIANG, Shuwen HUANG, Zhaoyang ZHANG, Shijie SUN
    2026, 46(2):  604-612.  DOI: 10.11772/j.issn.1001-9081.2025020234
    Asbtract ( )   HTML ( )   PDF (1904KB) ( )  
    Figures and Tables | References | Related Articles | Metrics

    To address the complexity and dynamic nature of vehicle trajectory anomaly detection in intelligent transportation systems, a novel method named DSTGRU (Dynamic Spatio-Temporal Gated Recurrent Unit) was proposed on the basis of Multi-level Spatio-Temporal Interaction dependency Dynamic Graph (MSTIDG). In DSTGRU, by constructing dynamic graphs for short-term and long-term spatio-temporal interaction dependencies, the complex interactions between vehicles were captured effectively. In this process, the Multi-level Spatio-temporal interaction feature Fusion Bidirectional Gate Recurrent Unit (MSF-BiGRU) module was introduced to fuse multi-level spatio-temporal features, so as to integrate spatio-temporal information at different scales, thereby alleviating conflicts in shared information extraction and enhancing the model’s robustness, which improved the ability to identify anomalous trajectories. Experimental results demonstrate that DSTGRU outperforms the existing methods significantly on the TrackRisk and HighD datasets, achieving Pre@100 of 0.90 and 0.89, respectively, and AUROC of 0.913 and 0.827, respectively. Compared to existing advanced methods, DiffTAD and ImDiffusion, DSTGRU ranks first in multiple evaluation metrics. Additionally, DSTGRU exhibits strong robustness in complex scenarios, and identifies anomalous behaviors accurately, providing a solution for trajectory anomaly detection in intelligent transportation systems.

    Unsafe driving behavior detection under complex lighting conditions
    Quanjie LIU, Zhaoyi GU, Chunyuan WANG
    2026, 46(2):  613-619.  DOI: 10.11772/j.issn.1001-9081.2025020170
    Asbtract ( )   HTML ( )   PDF (2221KB) ( )  
    Figures and Tables | References | Related Articles | Metrics

    For real-time and effective detection of unsafe driving behaviors such as not wearing seatbelts and using mobile phones under various complex lighting conditions, a deep learning-based unsafe driving behavior detection method under complex lighting conditions was designed. In the method, with YOLOv8n model selected as the foundation, a series of targeted improvements were implemented to enhance the detection performance. Firstly, a P6 scale was added to enable the model to capture the diversity of unsafe driving behaviors under various lighting conditions more comprehensively. Secondly, Spatial Separable Adaptive Convolution (SSAC) module was used to replace the traditional convolution module in the backbone network, thereby achieving lightweight design while improving feature extraction accuracy. Thirdly, Channel Prior Convolutional Attention (CPCA) was introduced to enhance the network’s focus on important features effectively and improve feature expression capability. Finally, the Selective Attention Feature Fusion (SAFF) structure was used to replace the original YOLOv8n neck network, thereby further improving the mode’s comprehensive performance. Experimental results show that compared to the original model, the improved YOLOv8n model increases the overall mean Average Precision (mAP) by 2.17%; under normal lighting conditions, the improvement is 1.76%; in night scenes, the improvement is 1.75%; in backlit environments, the improvement is 2.42%. Meanwhile, the improved YOLOv8n reaches 118 Frames Per Second (FPS) in comparison with other models (such as YOLO11n, RT-DETR(Real-Time DEtection TRansformer)), balancing precision and speed, demonstrating distinct advantages.

    Frontier and comprehensive applications
    Hypergraph-based eaves tile dating method under data imbalance conditions
    Xing QIU, Zuxing XUAN, Kejia HUANG, Wen ZHANG, Xiao ZHUANG
    2026, 46(2):  620-629.  DOI: 10.11772/j.issn.1001-9081.2025010030
    Asbtract ( )   HTML ( )   PDF (2193KB) ( )  
    Figures and Tables | References | Related Articles | Metrics

    To address inefficiency and subjectivity of manual eaves tile dating methods, a Hypergraph-based Eaves Tile Dating method under Data Imbalance Conditions (HETD-DIC) was proposed to provide a more objective assistive tool for archaeological dating. Firstly, a dual-weight computation mechanism was designed, which means that a hyperedge weight computation module was used to aggregate associated node features, so as to generate hyperedge weights, and these hyperedge weights were then used to calculate node weights, which reduced the impact of imbalanced sample distribution. Secondly, a Hyperedge Node Relation Matrix (HNRM) was constructed to establish a feature encoding-decoding channel, so as to enhance the representation capability of nodes. Finally, extensive experiments were conducted to evaluate different models, and UniGIN (Unified Graph Isomorphism Network) was selected as baseline classification model due to its superior performance. Experimental results demonstrate that on the self-built eaves tile dataset, with only 20% of training data used, HETD-DIC achieves improvements of 4.67, 4.55, 4.67, and 5.09 percentage points, respectively, compared to UniGIN in accuracy, weighted precision, weighted recall, and weighted F1-score. It can be seen that HETD-DIC solves data imbalance problem effectively and provides reliable automatic assisted decision basis for archaeological dating.

    Adolescent scoliosis screening method considering class imbalance and background diversity
    Jie CAO, Lingfeng XIE, Bingjin WANG, Changhe ZHANG, Zidong YU, Chao DENG
    2026, 46(2):  630-639.  DOI: 10.11772/j.issn.1001-9081.2025020197
    Asbtract ( )   PDF (1752KB) ( )  
    References | Related Articles | Metrics

    Concerning the problems of the difference in the number of samples between classes and the diversity of image background caused by environmental factors in the screening method of adolescent scoliosis based on back images and deep learning, a scoliosis screening method was designed, including steps such as back image data enhancement, back region extraction, and scoliosis diagnosis. Firstly, an improved diffusion generation model based on double residual U-Net structure and Convolutional Self-Attention Mechanism (CSAM) was proposed to generate high-quality pseudo-samples for the minority class back images, so as to balance the class distribution. Secondly, a back region extraction model with multi-loss constraint balance was designed to identify and extract back region from the back image, so as to eliminate the influence of image background difference on the diagnosis model. Thirdly, based on the selective kernel feature extraction and Spatial Pyramid Pooling (SPP) technologies, a classification model was constructed to realize the early screening and severity diagnosis of scoliosis through the back region. Finally, by integrating the above methods, the computer software and mobile software were developed to facilitate the actual back image acquisition and scoliosis screening business. Experimental results show that on the self-made scoliosis dataset, the proposed method achieves 98.64% and 73.06% accuracy in scoliosis early screening and severity diagnosis tasks, respectively, which is 2.52 and 6.48 percentage points higher than that of ResNet101. It can be seen that the proposed method can complete the diagnosis of scoliosis conveniently and quickly, and has certain application scenarios in large-scale rapid screening of scoliosis.

    Method for retinal vessel segmentation and coronary artery disease prediction using optical coherence tomography angiography
    Kejian CUI, Zhiming WANG, Zhaowen QIU
    2026, 46(2):  640-651.  DOI: 10.11772/j.issn.1001-9081.2025020220
    Asbtract ( )   HTML ( )   PDF (5546KB) ( )  
    Figures and Tables | References | Related Articles | Metrics

    Addressing the problem that the existing retinal vessel segmentation models lose topological information in 3D feature extraction, leading to branch breakage, poor continuity in 2D segmentation results, and missing cross-modal associations in vascular analysis and disease prediction, a collaborative framework, MA_DNet(Multi-scale topology-Aware Disease Network), was proposed. The framework consists of an enhanced segmentation model, MA_Net+, and a disease prediction module. Based on MA_Net(Multi-scale topology-Aware Network), an intermediate feature retraining module was introduced by MA_Net+ to refine vessel structures and reconnect broken branches. Firstly, the GMSF (Gated Multi-Scale Fusion) module was employed to extract multi-scale spatial convolutions and fuse complex branch features, and the ResMamba module was combined to model long-range topological dependencies within vessels, so as to enhance 3D feature representations, thereby suppressing topological breakage in segmentation results effectively. Then, the convolutional layers of 2D module MA_Net+ were used to further optimize the continuity of local vascular structure. Finally, a cascade prediction module was designed, combining morphological parameters with clinical indicators, so as to establish cross-modal associations between image features and Coronary Artery Disease (CAD) risk. Experimental results show that the MA_Net+ framework achieves a Dice score of 93.02% and a Jaccard index of 87.04% on one subset of the OCTA-500 public dataset, with improvements of 0.28 and 0.37 percentage points, respectively, compared to the IPN-V2+(Image Projection Network V2+) model; on another OCTA-500 subset, the MA_Net+ framework achieves the two indicators of 89.76% and 81.52%, respectively, with gains of 0.35 and 0.57 percentage points, respectively; the disease prediction module of the MA_Net+ framework achieves an AUC(Area Under Curve) of 86.23% on a private dataset. It can be seen that MA_DNet framework enhances the continuity of vascular segmentation effectively through 3D topological modeling and multi-scale fusion mechanism; meanwhile, the framework explores cross-modal correlation prediction between retinal images and CAD risks, offering a new solution for non-invasive cardiovascular diagnosis.

    Short-term wind power prediction using hybrid model based on Bayesian optimization and feature fusion
    Jincheng FU, Shiyou YANG
    2026, 46(2):  652-658.  DOI: 10.11772/j.issn.1001-9081.2025030317
    Asbtract ( )   HTML ( )   PDF (2425KB) ( )  
    Figures and Tables | References | Related Articles | Metrics

    To enhance the accuracy of short-term wind power prediction, an xLSTM (extended Long Short-Term Memory)-Transformer model based on Bayesian optimization and feature fusion was proposed. In the proposed model, the temporal processing capability of Long Short-Term Memory (LSTM) network was integrated with the dynamic feature fusion ability of Transformer’s self-attention mechanism. By employing Bayesian optimization method, the hyperparameters of the model were optimized efficiently in a few iterations, thereby reducing the computational resources significantly. The experimental results demonstrate that, on a dataset from a wind farm in Inner Mongolia, when compared with the single LSTM model, Transformer model, Gated Recurrent Unit (GRU) model, and the xLSTM-Transformer model without Bayesian optimization and feature fusion, the proposed model achieves an average increase of 1.2% to 11.3% in the coefficient of determination (R2) compared to the benchmark models when the LookBack are 4 and 8; it also shows an average reduction of 12.8% to 38.4% in the Mean Absolute Error (MAE) and an average reduction of 8.6% to 35.8% in the Root Mean Square Error (RMSE). The results indicate that the proposed model exhibits higher prediction accuracy and stability under conditions of short historical input.

    Ground penetrating radar clutter suppression algorithm for airport runways
    Haifeng LI, Wenqiang LIU, Nansha LI, Zhongcheng GUI
    2026, 46(2):  659-665.  DOI: 10.11772/j.issn.1001-9081.2025020243
    Asbtract ( )   HTML ( )   PDF (968KB) ( )  
    Figures and Tables | References | Related Articles | Metrics

    To address the problem of complex background clutter and interference of strong inter-layer reflection in airport runway Ground Penetrating Radar (GPR) data, an improved U-Net-based deep learning algorithm for clutter suppression was proposed. In the algorithm, a detail enhancement module DE-Conv was introduced at the skip connections of U-Net to improve the network ability to capture details of target signals in multi-scale shallow features, and a feature-pixel dual-level fusion loss function was adopted using clutter-contaminated and clutter-free image pairs to optimize the training process. Specifically, high-dimensional features from both clutter-contaminated and clutter-free data were extracted by a shared-weight encoder, feature-level losses were computed to guide the training of the encoder, and pixel-level losses were computed using the images output by the decoder and the corresponding clutter-free simulated images to optimize the decoder’s performance. Experimental results show that the proposed algorithm achieves a Peak Signal-to-Noise Ratio (PSNR) of 37.114 7 dB and a Structural SIMilarity (SSIM) of 0.999 8 on a synthetic dataset, and an average Signal-to-Clutter Ratio (SCR) of 8.28 dB and Improvement Factor (IF) of 5.90 dB on a real airport runway dataset, outperforming the baseline model by 0.952 8 dB, 0.000 4, 6.58 dB, and 5.32 dB in these four metrics, respectively. Compared with Robust Nonnegative Matrix Factorization (RNMF), Robust Principal Component Analysis (RPCA), and the deep learning-based Clutter-removal Neural Network (CR-Net), the proposed algorithm has superior clutter suppression performance and computational efficiency. At the same time, a lot of ablation experimental results validate the effectiveness of the detail enhancement module and feature-pixel dual-level loss function to clutter removal and target signal recovery.

    Adaptive tracking and gradient circle detection method for load swing angle of bridge crane
    Weixiang HUA, Weimin XU
    2026, 46(2):  666-675.  DOI: 10.11772/j.issn.1001-9081.2025030298
    Asbtract ( )   HTML ( )   PDF (1779KB) ( )  
    Figures and Tables | References | Related Articles | Metrics

    Aiming at the problems of non-uniform motion blur, illumination changes, and background interference in the detection for load swing angle of bridge cranes, a real-time monocular vision-based method for load swing angle detection was proposed. In this method, spherical markers were utilized as feature targets and a collaborative framework integrating Channel and Spatial Reliability Tracker with Adaptive Feature Tracking (CSRT-AFT) and Gradient Hierarchical Adaptive Circle Detection (GHACD) algorithm was constructed. Firstly, robust tracking was achieved by CSRT-AFT through a dynamic trajectory filtering and feature adaptation mechanism; an adaptive multi-modal trajectory filtering was designed, and the filtering strategy was switched intelligently based on curvature change rate and acceleration mutation index, so as to suppress intense motion-induced trajectory jitter; dynamic Oriented FAST and Rotated BRIEF (ORB) feature extraction, weighted K-Dimensional (K-D) feature screening, and Least Median of Squares (LMedS) were combined for feature matching and elastic template updating, thereby enhancing feature matching stability under motion blur and complex illumination. Secondly, for rapid and precise target localization, based on using image preprocessing for robustness enhancement, the GHACD algorithm achieves sub-pixel level fast circle detection through gradient field-guided circle center candidate generation, multi-stage probability sampling, and geometric constraint verification. Finally, a load swing angle measurement model was established on the basis of the bridge crane’s workspace. Experimental results demonstrate that this method can detect the load target’s swing angle stably under various trolley speeds and complex conditions including illumination changes and occlusions, and improve detection accuracy and real-time performance significantly.

    Efficiency optimization in superconducting quantum measurement and control based on variable trigger mechanism
    Zhiqiang FAN, Lixin WANG, Haoran HE, Geyuyan MA, Feng YUE
    2026, 46(2):  676-682.  DOI: 10.11772/j.issn.1001-9081.2025081022
    Asbtract ( )   HTML ( )   PDF (3538KB) ( )  
    Figures and Tables | References | Related Articles | Metrics

    In superconducting Quantum Measurement and Control (QMC), the interval between consecutive measurements (referred to as the measurement gap) is a critical phase in the control sequence. It is typically configured to be several times the qubit energy relaxation time, significantly exceeding the duration of actual signal transmission and acquisition, which results in substantial idle waiting time. To address this issue, a variable trigger-based optimization method was proposed. Firstly, a temporal model of the QMC process was established, and the trigger period was identified as the key factor limiting throughput. And experimental results further verified a “non-sensitive interval” of QMC’s effectiveness to the trigger period. Based on this, a mechanism was designed to adjust the trigger period dynamically according to the actual qubit relaxation time without specialized chip design required by Reset gate or extremely low-latency control equipment needed in Restless measurement. Simulation results show that for typical system with relaxation time values between 100 μs and 500 μs, the proposed method can improve the throughput of QMC by 2-6 times, compared to the conventional method using a fixed trigger period (4 times of relaxation time). When relaxation time is increased to 1 000 μs, the predicted throughput is improved by 10 times approximately with QMC’s effectiveness not damaged. It can be seen that the proposed method provides a software-level solution to enhance the measurement and control efficiency of large-scale superconducting quantum systems without hardware modifications and being easy to integrate and deploy, demonstrating significant practical value.

2026 Vol.46 No.2

Current Issue
Archive
Honorary Editor-in-Chief: ZHANG Jingzhong
Editor-in-Chief: XU Zongben
Associate Editor: SHEN Hengtao XIA Zhaohui
Domestic Post Distribution Code: 62-110
Foreign Distribution Code: M4616
Address:
No. 9, 4th Section of South Renmin Road, Chengdu 610041, China
Tel: 028-85224283-803
  028-85222239-803
Website: www.joca.cn
E-mail: bjb@joca.cn
WeChat
Join CCF