Table of Contents

    10 May 2026, Volume 46 Issue 5
    Artificial intelligence
    ContraStacker: an ensemble approach for extremely imbalanced fraud detection
    Xingcan LI, Lizhong DING, Junyu ZHANG, Chunhui ZHANG
    2026, 46(5):  1363-1369.  DOI: 10.11772/j.issn.1001-9081.2025050692

    Machine learning, relying on data modeling and feature recognition techniques, constructs social risk prediction models, enabling intelligent decision-making in risk prevention and control systems. However, fraud detection tasks are constrained by the severe imbalance between positive and negative samples. In cases of extreme imbalance, even if the model predicts all transactions as normal, the accuracy can still exceed 99%, while the detection rate of fraudulent transactions is close to zero. Moreover, a single model can only capture fraud features with specific dimensions and struggles to comprehensively predict multiple fraud patterns. To address this, a ContraStacker ensemble method was proposed to overcome data imbalance limitations, compensate for the shortcomings of a single model, and accurately identify various fraud patterns to improve fraud detection rate. ContraStacker balanced the data distribution through oversampling, undersampling, and their combined strategies, constructed multiple risk predictors, and integrated contrastive loss functions into the Stacking framework to deeply fuse model predictions and original features, enhancing the model's generalization ability, successfully tackling the challenge of extreme imbalance in fraud detection. Experimental results show that ContraStacker effectively reduces False Positive Rate (FPR) (the proportion of normal transactions predicted as fraudulent ones) while maintaining a low False Negative Rate (FNR) (the proportion of fraudulent transactions predicted as normal ones), demonstrating its potential for application in financial transaction security.
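
    The accuracy paradox described in this abstract can be checked with a small worked example. The sketch below is illustrative only: the 5-in-1000 fraud ratio and the all-"normal" predictor are assumptions chosen for the demonstration, not data or code from the paper.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, FP, TN, FN for binary labels (1 = fraud, 0 = normal)."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 0 and pred == 1:
            fp += 1
        elif truth == 0 and pred == 0:
            tn += 1
        else:  # truth == 1 and pred == 0
            fn += 1
    return tp, fp, tn, fn


def rates(y_true, y_pred):
    """Return (accuracy, FPR, FNR); a rate with an empty denominator is 0.0."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    fpr = fp / (fp + tn) if (fp + tn) else 0.0  # normal predicted as fraud
    fnr = fn / (fn + tp) if (fn + tp) else 0.0  # fraud predicted as normal
    return accuracy, fpr, fnr


# Hypothetical data: 5 fraudulent transactions among 1000.
y_true = [1] * 5 + [0] * 995
y_pred = [0] * 1000  # a degenerate model that labels everything "normal"
accuracy, fpr, fnr = rates(y_true, y_pred)
```

    Here accuracy is 99.5% while every fraudulent transaction is missed (FNR of 1.0), which is why FPR and FNR are reported instead of accuracy.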

    Graph convolutional network enhanced by graph diffusion and dual-view feature learning
    Baoyuan ZHENG, Chaobo HE
    2026, 46(5):  1370-1377.  DOI: 10.11772/j.issn.1001-9081.2025050610

    Graph Convolutional Networks (GCNs) have demonstrated significant potential in graph representation learning. However, existing methods still exhibit limitations in learning global topological relationships and fusing topological structure with attribute features. To address these challenges, a Graph Convolutional Network enhanced by Graph Diffusion and Dual-View feature learning (GCN-GDDV) was proposed. Firstly, a generalized graph diffusion mechanism was introduced to construct diffusion graphs containing global topological structure information. Then these diffusion graphs were combined with attribute-feature-based K-Nearest Neighbor (KNN) graphs to perform dual-view feature learning via GCN, capturing relationship dependencies in the global structure and the semantic similarities of node attributes, respectively. Finally, an attention network was designed to adaptively fuse topological structures and attribute features. Node classification experimental results on three benchmark graph datasets demonstrate that GCN-GDDV outperforms the suboptimal method, achieving average improvements of 1.78%, 1.60%, and 0.30% in accuracy, Macro-F1, and Micro-F1 metrics, respectively.
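
    The two views mentioned in this abstract, a diffusion graph and an attribute-based KNN graph, can be sketched as follows. This is a minimal illustration under stated assumptions: personalized-PageRank coefficients are used as one common instance of generalized graph diffusion, the transition matrix is row-normalized, and cosine similarity defines the neighbors. The function names and parameter values are hypothetical, and the abstract does not confirm that GCN-GDDV makes these choices.

```python
import math


def matmul(a, b):
    """Multiply two dense matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def ppr_diffusion(adj, alpha=0.15, steps=50):
    """Truncated diffusion S = sum_k alpha * (1 - alpha)^k * T^k, T = D^-1 A."""
    n = len(adj)
    t = [[v / sum(row) if sum(row) else 0.0 for v in row] for row in adj]
    power = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # T^0
    s = [[0.0] * n for _ in range(n)]
    for k in range(steps + 1):
        coeff = alpha * (1 - alpha) ** k
        for i in range(n):
            for j in range(n):
                s[i][j] += coeff * power[i][j]
        power = matmul(power, t)
    return s


def knn_graph(features, k=1):
    """Connect each node to its k most cosine-similar nodes (excluding itself)."""
    def cosine(u, v):
        norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(x * x for x in v))
        return sum(x * y for x, y in zip(u, v)) / norm if norm else 0.0

    n = len(features)
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        others = sorted((j for j in range(n) if j != i),
                        key=lambda j: cosine(features[i], features[j]),
                        reverse=True)
        for j in others[:k]:
            adj[i][j] = 1
    return adj


# Toy path graph 0-1-2-3 and toy 2-D node attributes (both hypothetical).
path = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]
diffused = ppr_diffusion(path)
attr_view = knn_graph([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]], k=1)
```

    Nodes 0 and 3 are not adjacent in the path graph, yet the diffused matrix assigns them a positive weight, which is the sense in which a diffusion graph carries global topological information.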

    Graph neural network framework for topology semantic dual-domain collaboration
    Kun FU, Haoyu WEI, Weijing LIU, Xing DANG, Zezheng LIU, Jianwei LI
    2026, 46(5):  1378-1387.  DOI: 10.11772/j.issn.1001-9081.2025050566

    Graph Neural Networks (GNNs) commonly face the Under-Reachability problem (Under-Reach) when processing graph data with sparse and unevenly distributed labeled nodes: distant unlabeled nodes cannot effectively receive supervised signals due to topological constraints, which limits model generalization ability. Although existing methods can partially address this issue, they still have limitations, including over-smoothing, high computational complexity, and noise sensitivity. Therefore, a GNN framework for topology semantic dual-domain collaboration, named TriMix, was proposed to address the above challenges through three key improvements. First, a dynamic mixing ratio mechanism was designed to adaptively adjust the mixing weights between pseudo-labels and ground-truth labels across training epochs, relying on ground-truth labels for stable convergence in the early stage while gradually incorporating high-confidence pseudo-labels in later-stage training for decision boundary expansion. Second, a topology-semantic dual-domain collaborative node-weighted sampling strategy was constructed by integrating node degree, PageRank value, and feature similarity, so as to quantify node importance and optimize information propagation paths, enhancing the reachability of low-centrality nodes. Third, a contrastive learning module was implemented with a triple-level negative sample generation strategy of category-driving, feature-similarity weighting, and pseudo-label guidance, to refine the discriminability between positive and negative samples in the embedding space, thereby enhancing the semantic understanding of unlabeled data. Experimental results on benchmark datasets such as Cora and PubMed showed that TriMix achieved node classification accuracy 2.1% to 4.4% higher than baseline models like Graph Convolutional Network (GCN) and Graph ATtention network (GAT), with improved F1-score and generalization ability. The TriMix framework significantly improves learning efficiency on sparsely labeled graph data through dual-domain collaboration of topological structure and semantic features, providing a new approach to node classification tasks in complex graph structures.
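
    The dynamic mixing ratio idea in this abstract can be sketched as an epoch-dependent weight on the pseudo-label loss. Everything concrete below is an assumption made for illustration: the linear ramp, the warm-up length, the maximum weight, the confidence threshold, and the function names are hypothetical, and the abstract does not give TriMix's actual schedule.

```python
def pseudo_label_weight(epoch, total_epochs, warmup_epochs=30, max_weight=0.5):
    """Weight on the pseudo-label loss: 0 during warm-up, then a linear ramp."""
    if epoch < warmup_epochs:
        return 0.0  # early stage: rely on ground-truth labels only
    ramp_length = max(1, total_epochs - warmup_epochs - 1)
    progress = (epoch - warmup_epochs) / ramp_length
    return max_weight * min(1.0, progress)


def mixed_loss(true_loss, pseudo_losses, confidences, epoch, total_epochs,
               threshold=0.9):
    """Blend the supervised loss with the mean loss of confident pseudo-labels."""
    weight = pseudo_label_weight(epoch, total_epochs)
    kept = [loss for loss, conf in zip(pseudo_losses, confidences)
            if conf >= threshold]  # keep only high-confidence pseudo-labels
    pseudo = sum(kept) / len(kept) if kept else 0.0
    return (1.0 - weight) * true_loss + weight * pseudo


# Hypothetical per-node losses and confidences for two pseudo-labeled nodes.
early = mixed_loss(1.0, [2.0, 4.0], [0.95, 0.5], epoch=0, total_epochs=100)
late = mixed_loss(1.0, [2.0, 4.0], [0.95, 0.5], epoch=99, total_epochs=100)
```

    With these toy values the early-epoch loss equals the supervised loss alone, while the final-epoch loss gives equal weight to the supervised term and the one confident pseudo-label.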

    Multiple active learning method based on concept drift detection
    Xiaobo QI, Jing ZHANG, Ying SHI, Hui QI, Hangyuan DU
    2026, 46(5):  1388-1396.  DOI: 10.11772/j.issn.1001-9081.2025050659

    The real-time, unbounded, and dynamically changing nature of data streams leads to time-varying data distributions, a phenomenon termed concept drift. Traditional methods for detecting and adapting to concept drift typically rely on the assumption of complete label availability. However, the prohibitively high cost of data annotation in real-world scenarios makes fully supervised learning approaches infeasible. Consequently, active learning is commonly utilized for classification tasks with scarce labels. Nevertheless, in streaming environments, factors such as concept drift and single-label strategies often introduce sampling bias into active learning. To address these challenges, a Multiple Active Learning method based on Concept Drift detection (MALCD) was proposed. An online deep neural network model incorporating dynamically weighted skip connections was designed and combined with a weakly supervised drift detection method to detect concept drift. At the same time, multiple sampling strategies were incorporated to apply differentiated processing strategies across different sample regions. By integrating multiple active learning methods with concept drift detection techniques, this method can precisely select data with high uncertainty and categorical diversity while efficiently avoiding redundancy. Experimental results on eight real-world and synthetic datasets demonstrate that MALCD achieved the highest average ranking in cumulative accuracy compared with methods such as the Online Ensemble Adaptive Classification (AC_OE) method and the Weakly Supervised Concept Drift Detection (WSCDD) method. This indicates that MALCD can quickly learn new concept distributions after drift occurs, thereby enhancing the model's overall generalization performance.

    HEFSL: high-efficient federated split learning framework for edge heterogeneity
    Hao YU, Jing FAN, Enkang XI, Yadong JIN, Hua DONG, Yihang SUN
    2026, 46(5):  1397-1407.  DOI: 10.11772/j.issn.1001-9081.2025050601

    Federated Learning (FL) in edge-heterogeneous environments faces challenges such as significant disparities in terminal computing capabilities, inconsistent data distributions, and high communication overhead, which severely constrain its deployment and application in practical intelligent systems. To address these issues, a High-Efficiency Federated Split Learning (HEFSL) framework was proposed by integrating the advantages of FL and Split Learning (SL). Through a triple mechanism of “model partitioning, client selection, and dual-layer aggregation”, HEFSL achieves joint optimization of system and statistical heterogeneity. In the HEFSL framework, an Adaptive Splitting Strategy (ASS) was first introduced to dynamically determine the model partition structure based on the computing capacity of each client, alleviating the straggler effect. Secondly, a Client Diversity-based Heuristic Selection (CDHS) mechanism was designed, using a low-complexity label-entropy-driven strategy to enhance data representativeness. Finally, an Asynchronous Dual-end Aggregation (ADA) scheme was developed to enable layered asynchronous updates between clients and edge servers, breaking the synchronous communication bottleneck and accelerating model convergence. The theoretical section provided a rigorous analysis and proof of the convergence and error bounds of the HEFSL framework. Experimental results on three datasets with label heterogeneity characteristics, FMNIST, CIFAR-10, and CIFAR-100, showed that HEFSL achieved the highest model accuracy, outperforming FedAvg, FedProx (Federated Proximal), MOON (MOdel-cONtrastive learning), Federated Split learning (SplitFed), SplitMix (Split Mixing), and FedCRS (Federated Cluster-based Round Splitting) by at least 4.3, 10.5, and 4.1 percentage points, respectively. Additionally, its convergence speed was improved by at least 78.6%, 89.8%, and 64.5%, respectively. 
    HEFSL exhibits significant advantages in distributed collaborative intelligence for edge-heterogeneous scenarios, providing a practical pathway for efficient and scalable federated learning in resource-constrained environments with strong engineering adaptability and application prospects.
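
    The label-entropy-driven selection mentioned for CDHS can be illustrated with Shannon entropy over each client's label histogram. This is a minimal sketch under assumptions: ranking clients by entropy and taking the top k is a hypothetical simplification, and the function names and client data are invented for the example, so this is not the paper's CDHS procedure.

```python
import math
from collections import Counter


def label_entropy(labels):
    """Shannon entropy (in bits) of a client's label distribution."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def select_clients(client_labels, k):
    """Pick the k clients whose local labels are most diverse."""
    ranked = sorted(client_labels,
                    key=lambda cid: label_entropy(client_labels[cid]),
                    reverse=True)
    return ranked[:k]


# Hypothetical clients with increasingly skewed label distributions.
clients = {
    "a": [0, 0, 0, 0],  # single class, entropy 0 bits
    "b": [0, 1, 2, 3],  # uniform over four classes, entropy 2 bits
    "c": [0, 0, 1, 1],  # two balanced classes, entropy 1 bit
}
chosen = select_clients(clients, k=2)
```

    Computing one histogram per client costs time linear in the number of local samples, which is consistent with the "low-complexity" description in the abstract.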

    Improved DeepLabV3+ method based on adaptive attention and nested receptive field
    Changzheng XING, Xin ZHENG, Di JIA, Junfeng LIANG
    2026, 46(5):  1408-1415.  DOI: 10.11772/j.issn.1001-9081.2025050595

    To address the problems of high complexity and low segmentation accuracy for certain classes in DeepLabV3+ caused by atrous convolutions with different dilation rates, an improved method that integrates Evolutionary Nested Receptive Field (ENRF) module with Adaptive Class-Channel Attention (ACCA) mechanism was proposed. In this method, the original Atrous Spatial Pyramid Pooling (ASPP) module was replaced by ENRF module, and ACCA mechanism was incorporated into the fused features, enabling continuous expansion of receptive field and more fine-grained feature representation, and reducing the number of parameters and computational overhead to enhance the model’s efficiency and lightweightness. Firstly, ACCA mechanism was constructed by combining channel-adaptive and class-adaptive attention mechanisms, which exploited inter-channel and inter-class feature dependencies to strengthen the representation of critical information in feature maps. Secondly, ENRF module was designed by introducing convolution kernels of different sizes and dilation rates, forming a nested evolutionary receptive field structure that gradually enlarged the receptive field to capture multi-scale contextual information and fine-grained boundary details. The improved method was compared with Fully Convolutional Network with 8s skip connections (FCN8s), Pyramid Scene Parsing Network (PSPNet), Unified Perceptual parsing Network (UPerNet), Bilateral Segmentation Network Version 2 (BiSeNet V2), Deep Feature Aggregation Network (DFANet), and the original DeepLabV3+ in terms of FLOPs (FLoating-point OPerations), parameter count, mean Intersection over Union (mIoU), inference speed, and memory usage. Experimental results show that the improved DeepLabV3+ reduces parameters and FLOPs, accelerates inference, and improves segmentation performance.

    Supervised contrastive generative sentiment analysis method with uncertainty-aware unlikelihood learning
    Dirui ZHANG, Jiayu LIN, Zuhong LIANG
    2026, 46(5):  1416-1423.  DOI: 10.11772/j.issn.1001-9081.2025050581

    Existing models still face multiple challenges in the Aspect Sentiment Quad Prediction (ASQP) task. They struggle with implicit sentiment expressions (such as implicit aspects or opinions) that lack explicit lexical cues, which makes it difficult to capture sentiment tendencies accurately. A quad prediction is considered correct only when all of its predicted elements exactly match the correct elements. However, models may generate easily confused synonyms, leading to completely incorrect quad predictions. Moreover, existing models focus on improving the probability of predicting correct words and ignore the suppression of easily confused words. Additionally, the cross-entropy loss used by these models makes them overconfident about incorrect predictions; without uncertainty modeling, they fail to actively suppress high-risk errors. These problems limit the performance of existing models in Aspect-Based Sentiment Analysis (ABSA) tasks. To address these problems, a Supervised Contrastive generative sentiment analysis method with Uncertainty-Aware Unlikelihood Learning (SCUAUL) was proposed. Firstly, supervised contrastive learning was used to shorten the semantic space distance of similar samples (e.g., same sentiment polarity) through contrastive loss, enhancing the model's ability to distinguish key features (e.g., sentiment polarity, implicit aspects) of input data. Secondly, Monte Carlo Dropout (MC Dropout) was used to capture the model's inherent uncertainty and identify easily confused words. Through marginalized unlikelihood learning, the generation probability of easily confused words was dynamically suppressed while maintaining the probability of generating correct words, and a minimum entropy constraint was combined to balance generation diversity and accuracy. Average results of five experiments on the Rest15 and Rest16 datasets showed that, compared with the second-best model AugABSA (data Augmentation by text generation for ABSA) and the classic model PARAPHRASE, SCUAUL improved precision by 0.40, 3.98 and 0.38, 3.83 percentage points, the recall by 0.30, 2.87 and 0.48, 2.88 percentage points, and the F1 score by 0.35, 3.43 and 0.42, 3.37 percentage points, respectively, verifying the effectiveness of SCUAUL in ABSA tasks.

    Word sense disambiguation method of modal verbs based on causal partial order diagram
    Jilin FU, Jianping YU, Tao ZHANG, Weihua XU, Enliang YAN, Liyang WANG
    2026, 46(5):  1424-1432.  DOI: 10.11772/j.issn.1001-9081.2025050624

    The semantic analysis of modal verbs faces many challenges due to their inherent complexity, requiring the extraction of diverse feature sets for Word Sense Disambiguation (WSD), including semantic, syntactic, pragmatic, and genre features. These features vary in the contribution to semantic disambiguation, and some are confusing or redundant. To eliminate the influence of confusing and redundant features on WSD, a word sense disambiguation approach based on Causal Partial Order Diagram (CPOD) was proposed. Learning from the concept of intervention in causal reasoning, eliminating confusing factors through do-calculus, and combining with the idea of constructing an Attribute Partial Order Diagram (APOD), a CPOD was developed as a model for WSD of modal verbs. Results of WSD experiments on 15 English modal verbs showed that the proposed method achieved an average accuracy of 93.42%, eliminating confusing and redundant features. Furthermore, to quantify the specific contribution of each feature to WSD, each feature's contribution was calculated and ranked. It is found that semantic features contribute the most, followed by syntactic features, while genre features contribute relatively less. Specifically, in semantic features, features with low mutual information contribute much more to WSD than those with high mutual information, making them the most influential features.

    Adaptive multi-feature fusion detection method for AI-generated text
    Jiali ZHENG, Gang ZHOU, Jing CHEN, Shunhang LI
    2026, 46(5):  1433-1440.  DOI: 10.11772/j.issn.1001-9081.2025050657

    To address the problems posed by highly realistic AI-generated text, driven by the rapid development of Large Language Models (LLMs), and the performance degradation of traditional detection methods, an adaptive multi-feature fusion detection method for AI-generated text was proposed. Firstly, a language style feature set covering text statistical features, language structural features, and language uncertainty features was constructed to capture differences between real and AI-generated texts; then, deep semantic features of texts were extracted using independent encoding technology. Based on these, a dual-path mapping feature-adaptive fusion strategy was designed: language-style features and deep semantic features were first fused at a primary level, and secondary fusion was then performed using deep learning to enhance the capability of adaptive feature fusion. Experimental results demonstrate that the proposed method achieves detection accuracies of 98.1% on the Chinese SocialAI-Detect dataset and 98.5% on the English TuringBench dataset; compared with the best-performing baseline, J-Guard (Journalism Guided adversarially robust detection of AI-generated news), the improvements are 2.3 and 2.1 percentage points, respectively, verifying the effectiveness of the proposed method.

    Dual-channel feature fusion representation method for short-text clustering based on large language model
    Qianfei WANG, Yang LI, Deyu LI, Suge WANG
    2026, 46(5):  1441-1449.  DOI: 10.11772/j.issn.1001-9081.2025050716

    To address the problems of insufficient global semantic representation and weak local discriminability in current short-text clustering methods, a Dual-Channel Feature Fusion representation method for short-text clustering based on Large Language Model (LLM), named DCFF, was proposed. From a global perspective, a semantic-enhanced pseudo-label contrastive learning module was established, in which the LLM-generated keyword phrases were dynamically weighted and fused with original texts to enrich representations. Furthermore, high-confidence pseudo-labels were produced via self-adaptive optimal transport, while intra-cluster compactness and inter-cluster separation constraints were integrated into end-to-end training to achieve globally consistent embeddings. From a local perspective, a triplet representation optimization module based on entropy and discrepancy was established, which filtered high-informativeness samples via entropy and discrepancy. The embedding model was then fine-tuned with a confidence-weighted loss and a denoising mechanism to generate a vector representation with strong local discrimination. Finally, the global and local representations were fused using a self-attention mechanism for direct application in clustering algorithms. Comparative experimental results on eight public short-text clustering datasets against mainstream baselines showed that DCFF outperformed the baselines in accuracy on all datasets, with the smallest improvement, 3.19 percentage points, on the GoogleNews-T dataset; in Normalized Mutual Information (NMI), DCFF outperformed the baselines on six datasets, with the smallest improvement, 3.46 percentage points, on the SearchSnippets dataset. The experimental results demonstrate that DCFF is well-suited for clustering tasks in various scenarios.

    Continual few-shot event detection model based on hierarchical adaptive fusion mechanism and category boundary distillation
    Jie HU, Tong XU, Yan ZHANG
    2026, 46(5):  1450-1459.  DOI: 10.11772/j.issn.1001-9081.2025050583

    To address the challenges of catastrophic forgetting and limited generalization in Continual Few-shot Event Detection (CFED), a new CFED model based on hierarchical adaptive fusion mechanism and category boundary distillation was proposed. Firstly, feature reconstruction was introduced by combining global average pooling with a learnable mapping to enhance the structural modeling of text representations and optimize feature distribution. Secondly, a hierarchical adaptive fusion mechanism was designed to dynamically integrate shallow, intermediate, and deep features from the pretrained model. Gaussian perturbation was introduced to improve feature robustness, and a self-attention mechanism was employed to achieve adaptive cross-layer feature weighted fusion. Finally, a category-boundary distillation strategy was proposed, which aligned the class distributions of old and new tasks using KL (Kullback-Leibler) divergence and refined the decision boundary features via cosine similarity, effectively mitigating knowledge forgetting. Experimental comparisons with 9 baseline models and the large language model GPT-3.5-Turbo were conducted on the MAVEN and ACE2005 datasets. On MAVEN, the proposed model achieved average F1 value improvements of 2.92 and 1.80 percentage points over the suboptimal model HANet (Hierarchical Augmentation Networks) across 5 subtasks under the 4-way 5-shot and 4-way 10-shot settings, respectively; on ACE2005, it outperformed the suboptimal models HANet and Combined Retrain by 1.83 and 2.00 percentage points across 5 subtasks under the 2-way 5-shot and 2-way 10-shot settings, respectively. Compared to GPT-3.5-Turbo, the proposed model achieved average F1 score improvements of 3.47 and 8.77 percentage points on MAVEN, and 4.47 and 2.39 percentage points on ACE2005 under 2-way 1-shot and 2-way 2-shot settings, respectively. The results demonstrate the superior performance of the proposed model.
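
    The two ingredients named for category-boundary distillation, KL divergence between class distributions and cosine similarity between features, can be sketched as one combined loss. This is an illustration under assumptions: the KL direction (old model as reference), the `beta` weight, the additive combination, and all function names are hypothetical and are not taken from the paper.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps))
               for pi, qi in zip(p, q) if pi > 0)


def cosine_similarity(u, v):
    """Cosine of the angle between two feature vectors."""
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(x * x for x in v))
    return sum(x * y for x, y in zip(u, v)) / norm if norm else 0.0


def distillation_loss(old_logits, new_logits, old_feat, new_feat, beta=1.0):
    """KL term on class distributions plus a (1 - cosine) term on features."""
    kl = kl_divergence(softmax(old_logits), softmax(new_logits))
    return kl + beta * (1.0 - cosine_similarity(old_feat, new_feat))


# Hypothetical logits and features from the old and new models.
same = distillation_loss([2.0, 0.0, 0.0], [2.0, 0.0, 0.0], [1.0, 0.0], [1.0, 0.0])
drifted = distillation_loss([2.0, 0.0, 0.0], [0.0, 2.0, 0.0], [1.0, 0.0], [0.0, 1.0])
```

    The loss is close to zero when the new model reproduces the old model's predictions and features, and grows as either the class distribution or the feature direction drifts.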

    Judicial element extraction method by integrating global and local semantics
    Yuqian HUANG, Hui HUANG, Yongbin QIN, Ruizhang HUANG, Yanping CHEN, Yulin ZHOU, Qian SUN
    2026, 46(5):  1460-1467.  DOI: 10.11772/j.issn.1001-9081.2025050558

    Judicial information extraction aims to identify fine-grained key elements in judicial documents, helping legal professionals efficiently manage large volumes of paperwork. Compared to general domains, elements in judicial documents are typically longer and semantically more dispersed, while fine-grained requirements place particularly strict demands on local detail extraction, requiring the model to handle long-range dependencies and precisely capture fine-grained local semantic information. To address this challenge, a judicial element extraction method integrating global and local semantics was proposed. Firstly, element labels were concatenated with the content of judicial documents, and deep embeddings were generated using the BERT (Bidirectional Encoder Representations from Transformers) model. Secondly, a self-attention mechanism was introduced to enhance the model's comprehension of global context, while an adaptive multi-head attention mechanism dynamically adjusted attention weights to better capture rich, precise semantic features at the local level. Finally, to improve the model's generalization performance in identifying element boundaries, a combined loss function was designed that incorporated binary cross-entropy and KL (Kullback-Leibler) divergence with Gaussian-smoothed boundaries. Experimental results show that compared with sequence labeling methods, span-based extraction methods, and other methods, the proposed method achieves improvements in the F1 score on both the LAIC2023 and CAIL2021 legal element extraction datasets. Specifically, it outperforms the second-best model, DiffusionNER, by 2.88 percentage points on the LAIC2023 dataset, and on the CAIL2021 dataset, it outperforms the second-best Machine Reading Comprehension (MRC) model by 1.01 percentage points.

    Deep learning-based patent value evaluation for power grid enterprises
    Xing SHENG, Sunxian WENG, Kuosong CHEN, Zhongping WANG, Ruifeng REN, Yong LIU
    2026, 46(5):  1468-1474.  DOI: 10.11772/j.issn.1001-9081.2025070850

    Patent value evaluation is a crucial tool for optimizing resource allocation and guiding intellectual property strategy decisions. However, traditional manual evaluation methods are limited by subjective expert experience and low evaluation efficiency, making it difficult for enterprises in the digital economy era to meet the large-scale demand for patent value evaluation. In recent years, machine learning technologies, with their powerful high-dimensional feature extraction capabilities, have provided a feasible technological approach to innovating patent value evaluation paradigms. However, existing studies mainly focus on small-scale models within single technical dimensions and do not fully explore the potential of Large Language Models (LLMs) for quantifying value indicators. Moreover, current methods struggle to handle cases where some patent data indicators are missing. To address the challenges of processing unstructured textual data and incomplete patent value indicators in the patent database of power grid enterprises, a deep learning-based patent value evaluation method for power grid enterprises was proposed. It used Large Language Model (LLM) technology to process unstructured textual information in power grid enterprise patents; adopted a Semi-Supervised Learning (SSL) paradigm to expand the labeled patent database used for training; and employed ensemble learning techniques to train the model on the power grid enterprise patent database and conduct patent value evaluation. Empirical results demonstrate that the proposed method can efficiently evaluate the patent value of power grid enterprises with low evaluation error.

    Data science and technology
    Competitive loss-driven generative imbalanced node classification
    Fengwei CHENG, Bingqi ZHANG, Guohua XU, Wenjian WANG
    2026, 46(5):  1475-1481.  DOI: 10.11772/j.issn.1001-9081.2025050656

    Graph Neural Networks (GNNs) have achieved significant success in node classification tasks, but their performance typically relies on abundant labeled data in majority classes, which may lead to representation bias for nodes belonging to minority classes with scarce labels. Traditional oversampling techniques mitigate class imbalance by replicating minority samples, but they can easily lead to local neighborhood overfitting. Recent approaches have attempted to synthesize new nodes based on minority-class anchors, but they have failed to fully exploit relationships between minority and adjacent classes, resulting in blurred class boundaries in the generated samples. To address the above challenges, a Competitive loss-driven Generative imbalanced node classification algorithm (GraphCG) was proposed. A feature-structure collaborative auxiliary node selection mechanism was designed to precisely identify auxiliary points from neighboring classes that can enhance class boundaries. Furthermore, a competitive boundary-constrained loss function was constructed to enforce the maintenance of geometric boundary separability between generated nodes and majority classes in the embedding space. Experimental results showed that, compared to current state-of-the-art methods, GraphCG achieved significant improvements across multiple class-imbalanced datasets. GraphCG not only enhances data diversity but also improves class separability, preventing minority classes from being overshadowed by majority classes.

    Distributed multi-label feature selection method with feature-label neighborhood collaborative correlation
    Xipei TAO, Hengrong JU, Xiaoxue FAN, Xiaoyang ZOU, Weiping DING
    2026, 46(5):  1482-1489.  DOI: 10.11772/j.issn.1001-9081.2025050567

    Traditional multi-label neighborhood rough sets treat all labels as a whole when calculating feature importance, failing to effectively distinguish the differences in contribution to feature selection among different labels and ignoring the noise interference caused by irrelevant labels. To address these issues, a Distributed Multi-Label feature selection method with Feature-label Neighborhood Collaborative Correlation (DML-FNCC) was proposed. Firstly, bidirectional spectral clustering was utilized to simultaneously mine the internal associations between labels and feature spaces: decision-representative primary label clusters were extracted in the label space to reduce noise interference, while a spectral clustering map based on semantic relevance was constructed in the feature space to achieve modular aggregation of semantically correlated features. Secondly, neighborhood dependency was employed to quantify the association degree between feature clusters and label clusters, selecting the feature subsets most closely related to each label cluster. Finally, a distributed framework was adopted to distribute computational tasks across multiple nodes, further accelerating the model training process. Experimental results on 12 public datasets demonstrate that DML-FNCC outperforms existing multi-label feature selection approaches, such as PMLFS (Partial Multi-Label Feature Selection) and WFDP (Weak-label Fuzzy Discernibility Pairs). It achieves the top ranking in terms of average precision, Hamming loss, one error, ranking loss, and coverage, leading to improved classification performance.

    Construction and recommendation application of expert communities based on user-centric approach
    Shijie YANG, Zhonghui LIU, Fan MIN
    2026, 46(5):  1490-1498.  DOI: 10.11772/j.issn.1001-9081.2025050639

    To address the issues of redundant recommendations and high time complexity in community-based recommendation methods under network formal contexts, a user-centric Expert Community Construction Algorithm (ECCA) and an Expert Community-Based Recommendation Algorithm (ECBRA) were proposed. Firstly, a screening method for expert nodes was determined by defining comprehensive node influence based on both network structure and positive rating; nodes with stronger comprehensive influence were selected as expert nodes. Secondly, the expert community for each user was constructed, consisting of the user node and all its adjacent expert nodes, ensuring that each user had an independent expert community. Finally, recommendations were made based on expert communities. Combined with user preferences, dynamic and static recommendation confidence thresholds were designed, and their sum served as the recommendation confidence threshold. Personalized recommendations for each user were generated by calculating the recommendation confidence. Comparison results with the Group Recommendation Algorithm based on Weak-concept Similarity (GRAWS) showed that ECBRA's total time consumption was only 0.1% of GRAWS's, with no recommendation redundancy. Compared to classic collaborative filtering algorithms, including k-Nearest Neighbor (kNN) and Item-Based Collaborative Filtering (IBCF), as well as Concept Set Based Recommendation (CSBR), Concept Set based-Personalized Recommendation algorithm (CSPR), and the combination of GreConD and kNN (GreConD-kNN) algorithms on 9 real datasets, ECBRA achieved better performance. Specifically, on the Netflix2 dataset, ECBRA increased recall by 63.3% and F1-score by 13.9% compared with CSPR, and increased F1-score by 62.7% compared with the kNN algorithm. Overall, ECBRA offers low redundancy and low time complexity.

    Long time series prediction based on hybrid self-attention and differentiated normalization
    Ruirui SONG, Leichun WANG, Yunping HE, Jinxiang WEI, Xiangfeng LU, Xiaomeng LIU
    2026, 46(5):  1499-1506.  DOI: 10.11772/j.issn.1001-9081.2025050628

    To address error accumulation, modeling difficulty, and low computational efficiency in long time series prediction, a long time series prediction model based on hybrid self-attention and differentiated normalization, namely HSADN (Hybrid Self-Attention and Differentiated Normalization), was proposed. Firstly, the model used a stacked multi-head self-attention mechanism in the encoder to capture long-distance dependencies in time series, thereby reducing computational complexity, and used a multi-layer sparse self-attention mechanism in the decoder to dynamically adjust the generation strategy. Secondly, in the encoder, Batch Channel Normalization (BCN) was used to extract, fuse, and reconstruct the features, while in the decoder, Layer Normalization (LN) was adopted to alleviate gradient vanishing and improve training stability, generating the predicted sequence values. Experimental results show that, compared with the CALF (Cross-modAl Large Language Model Fine-tuning) model, HSADN reduces the Mean Squared Error (MSE) and Mean Absolute Error (MAE) of univariate prediction by 6.2% and 6.9% on ECL-960, respectively, and by 13.1% and 2.9% on ETTh-720, respectively; reduces the MSE and MAE of multivariate prediction by 3.5% and 2.6% on ETTm-672, respectively, and by 1.8% and 0.9% on Weather-720, respectively; and reduces the running time for univariate and multivariate predictions by an average of 4.6% and 28.7%, respectively.

    Time-interdependency-aware dynamic Bayesian network for traffic prediction
    Huijie GUO, Tianfeng DOU, Zhenlin ZHANG, Kaiyuan QI, Dong WU, Zhijian QU, Zhao LI, Chongguang REN
    2026, 46(5):  1507-1517.  DOI: 10.11772/j.issn.1001-9081.2025050570

    Accurate traffic forecasting not only improves the efficiency and safety of the traffic system, but also promotes sustainable social and economic development. Although a large number of studies have been devoted to modeling spatio-temporal correlation, existing methods still have significant limitations: most models tend to collectively predict the traffic flow of all regions in all time periods, ignoring spatio-temporal heterogeneity, especially the impact of the traffic status of the current region on the future traffic status of related regions. To address this problem, a Time-Interdependency-aware Dynamic Bayesian Network for traffic prediction (TIDBN) method was proposed. Using pre-trained modules, TIDBN employed a time-varying dynamic Bayesian network to capture the complex temporal relationships in time-series data arising from simultaneous and lagged effects. To further improve its ability to capture spatio-temporal correlation, a spatio-temporal attention mechanism was introduced for in-depth analysis. Subsequently, a Graph Convolutional Network (GCN) was utilized to model the spatio-temporal topological structure, generating more accurate traffic predictions. Experimental results show that TIDBN performs well on two real traffic prediction tasks, especially for 1-hour prediction. On the PeMS-BAY dataset, the Mean Absolute Error (MAE) of TIDBN is 4% lower than that of the second-best baseline method.

    Cyber security
    VLMDs-Privacy: privacy-enhanced strategy for cooperative decision-making in socially-aware multi-agent systems
    Yunle WANG, Xiang FENG, Huiqun YU
    2026, 46(5):  1518-1525.  DOI: 10.11772/j.issn.1001-9081.2025050654

    Socially-aware Multi-Agent Systems (MASs) rely on leadership structures to enhance collaborative decision-making efficiency, but are vulnerable to privacy leakage risks caused by the analysis of state trajectories in Markov Decision Processes (MDPs), exposing critical nodes to targeted attacks. To address these challenges, a Virtual Leader Minimal Dependency Privacy protection strategy (VLMDs-Privacy) was proposed, which achieved secure and efficient collaborative decision-making in MAS through the following methods: 1) A State Transition Adaptive Differential Privacy mechanism (STADP) was designed to establish a dynamic mapping between state-transition probabilities and privacy budgets, protecting MDP state trajectories from reverse inference attacks; 2) A Virtual Leader Minimal Dependency strategy (VLMDs) was developed to reduce reliance on virtual leaders while achieving globally optimal decision-making, thereby significantly improving resistance to single-point failures; 3) A privacy-efficiency dual-regulation mechanism was constructed to dynamically allocate privacy budgets based on agent behavior credibility, achieving an adaptive trade-off between social awareness and privacy protection. Experimental results showed that under a strong privacy constraint (ε= 0.1), VLMDs-Privacy achieved an average arrival success rate of 94.2% in navigation and dynamic maintenance scenarios, outperforming the conventional leader-based differential privacy scheme VLDPs-Privacy (27.9%) by 66.3 percentage points, with only a 3.3% drop compared to non-private settings. These findings validate the robustness of VLMDs-Privacy in maintaining system collaboration capability and privacy preservation efficiency under strong privacy constraints, providing theoretical and technical support for collaborative decision-making in privacy-sensitive MAS deployed in open environments.

    Markov chain model and profit analysis of two-selfish-miner strategy in Ethereum Classic network
    Junling WANG, Junjun BIAN, Jianian LIU, Zhiqiang XU
    2026, 46(5):  1526-1533.  DOI: 10.11772/j.issn.1001-9081.2025050596

    Selfish mining disrupts the normal consensus process of blockchain networks by concealing mined blocks and delaying their release, resulting in an increased fork rate and reduced system efficiency. To quantitatively analyze the impact of selfish mining on the EThereum Classic (ETC) network in a multi-attacker environment, a Markov chain model with multiple attackers was constructed for the ETC network's unique uncle block and nephew block reward mechanism, and the revenue dynamics of miners under different scenarios were quantitatively analyzed. Experimental results show that when the attacker's computing power reaches 0.3, a coordinated attack by two selfish mining pools increases the stale block rate from 26.35% to 36.21% and reduces the system throughput (Transactions Per Second (TPS)) from 20.15 to 16.44. Compared with the case of a single selfish miner, this attack strategy further increases the attacker's relative revenue while further harming honest miners' revenue, increasing the stale block rate, and decreasing system efficiency. These results reveal the complex revenue mechanism of multi-attacker selfish mining and provide a theoretical and quantitative basis for designing targeted defense strategies.

    Multimedia computing and computer simulation
    VU-RED-F: improved CAD model replacement for U-RED single-view point clouds
    Gengxin FAN, Huiyan HAN, Liqun KUANG, Ziyang JIN, Huafeng ZHAO
    2026, 46(5):  1534-1544.  DOI: 10.11772/j.issn.1001-9081.2025050575

    In robotic environmental perception tasks, single-view point clouds suffer from severe geometric information loss due to sensor viewpoint limitations. Point cloud reconstruction methods based on Computer-Aided Design (CAD) model replacement can avoid the risks of structural instability associated with reconstruction directly from point clouds by retrieving similar models and applying deformation. The Unsupervised 3D shape REtrieval and Deformation (U-RED) algorithm achieves topologically consistent CAD model replacement while maintaining the editability of the reconstruction results. However, when dealing with objects with complex topology, it still faces problems such as insufficient rotation and translation invariance in point cloud representations, difficulty in distinguishing neighboring components due to geometric similarity among homologous components, and parameter update failures caused by scattered attention weights and gradient vanishing or explosion. To address these challenges, a Vector neuron enhanced Unsupervised REtrieval and Deformation algorithm with Feature affine residual (VU-RED-F) was proposed based on U-RED. Firstly, a Vector Neuron Encoder (VNE) was constructed to improve the robustness of the feature extraction module in representing rotation and translation invariance of point clouds. Secondly, learnable affine transformation residuals were introduced to reconstruct the feature mapping process, adaptively adjust the feature distribution, and enhance the network's ability to discriminate local geometric structures between components. Finally, by integrating soft-threshold gating and residual correction, the stability of gradient propagation was enhanced while constraining the sparsity of the attention distribution, thereby boosting network convergence efficiency and reducing loss during retrieval and deformation. Experimental results on the synthetic PartNet and ComplementMe datasets, as well as the real Scan2CAD dataset, show that the VU-RED-F algorithm has the lowest average Chamfer Distance (CD) loss, improving the fidelity of local geometric details in CAD models.
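
For readers unfamiliar with the chamfer distance used as the evaluation loss here, the following is a minimal pure-Python sketch of one common variant (mean squared nearest-neighbor distance, summed over both directions). The abstract does not state which variant or normalization the authors use, so this choice and the toy point sets are assumptions.

```python
from typing import List, Tuple

Point = Tuple[float, float, float]


def _nearest_sq_dist(p: Point, cloud: List[Point]) -> float:
    """Squared Euclidean distance from p to its nearest neighbor in cloud."""
    return min(sum((a - b) ** 2 for a, b in zip(p, q)) for q in cloud)


def chamfer_distance(a: List[Point], b: List[Point]) -> float:
    """Symmetric chamfer distance between two non-empty point sets.
    ASSUMPTION: squared distances, averaged per direction, then summed."""
    a_to_b = sum(_nearest_sq_dist(p, b) for p in a) / len(a)
    b_to_a = sum(_nearest_sq_dist(q, a) for q in b) / len(b)
    return a_to_b + b_to_a


# Hypothetical toy clouds: the second differs in one point.
cloud_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
cloud_b = [(0.0, 0.0, 0.0), (1.0, 0.0, 1.0)]
cd = chamfer_distance(cloud_a, cloud_b)
```

This brute-force version is quadratic in the number of points; practical pipelines typically use a spatial index or a GPU kernel for the nearest-neighbor search.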

    3D part assembly method based on line drawing segmentation
    Huaze ZHU, Weihao WANG, Mingyu YOU, Hongjun ZHOU
    2026, 46(5):  1545-1550.  DOI: 10.11772/j.issn.1001-9081.2025050711

    Three-dimensional (3D) part assembly is an important task in 3D computer vision. It aims to estimate the poses of a set of 3D parts and accurately combine them into a target structure. However, existing methods mainly rely on large-scale data for training and learn from past experience to complete assembly, resulting in weak generalization and poor adaptability to new assembly tasks. To address the problem of insufficient generalization in 3D part assembly, assembly instructions with line drawings were introduced as auxiliary information, with the expectation that robots could establish correspondence between 3D parts and regions in 2D line drawings. Nevertheless, establishing such a correspondence faced many challenges. Firstly, multiple identical 3D parts often existed in the assembly, but their corresponding 2D regions had different shapes and positions, which posed difficulties for neural networks in establishing such 3D-2D correspondence. Secondly, occlusions among parts in the line drawings further complicated the establishment of these correspondences. Therefore, a 3D part assembly method based on line-drawing segmentation was proposed, consisting of two main stages. In the first stage, point cloud information was used to perform part instance segmentation on the line drawings, effectively establishing the 3D-2D correspondence of the parts; in the second stage, a graph convolutional network was used to integrate the image information with the segmentation results for component pose estimation, thereby completing the assembly task. On the PartNet dataset, the proposed method was compared with three baseline methods: single-stage, layer-by-layer assembly, and two-stage approaches, demonstrating that it consistently improves component assembly accuracy and validating its effectiveness.

    Appearance-motion collaborative modeling for video anomaly detection
    Binhong XIE, Erdan ZHU, Rui ZHANG
    2026, 46(5):  1551-1559.  DOI: 10.11772/j.issn.1001-9081.2025050571

    Video anomaly detection currently faces several challenges. First, insufficient integration of appearance and motion information in complex environments results in a lack of semantic associations between the two modalities. Second, excessive reliance on prior information weakens the model's capacity for effective feature representation. Therefore, an Appearance-Motion Collaborative modeling for Video Anomaly Detection (AMC-VAD) method was proposed. It achieved pixel-level appearance-motion feature weight adjustment through a Pixel-level Dynamic Adaptation (PDA) module, used a dual-branch DepthWise Separable Convolution (DWSConv) to extract multi-scale semantic information, and enhanced the semantic relevance of feature fusion through dynamic activation and residual connection. In addition, an Auxiliary Memory Module (AMM) was designed to extract prototype features from a memory pool via a query-driven semantic alignment strategy, and a Dynamic Aggregation Mechanism (DAM) was incorporated to enhance the saliency representation of query features, alleviating the feature weakening caused by prior information coverage. A diversity loss was introduced to reduce redundancy in memory item distribution, thereby enhancing the model's discriminative ability for abnormal patterns. Experimental results showed that the proposed method achieved an Area Under the receiver operating Characteristic curve (AUC) of 98.5% and 88.5% on the UCSD Ped2 and CUHK Avenue datasets, respectively, outperforming AMMC-Net (Appearance-Motion Memory Consistency Network) by 1.9 percentage points on both datasets. These results validate the effectiveness of the proposed method in complex dynamic scenarios.

    Multi-band image captioning method based on scene concept-guided feature fusion
    Wenchao MING, Suzhen LIN, Zanxia JIN
    2026, 46(5):  1560-1567.  DOI: 10.11772/j.issn.1001-9081.2025050631

    When processing multi-band images in complex scenes, existing image captioning models fail to effectively align and fuse features using a simple cross-attention mechanism, as the features in multi-band images have significant spatial differences. Additionally, variations in the imaging principles of multi-band images and the complexity of scenes make it difficult for models to capture key visual semantic information, leading to missing key targets and incomplete generated captions. To address these issues, a multi-band image captioning method based on scene concept-guided feature fusion was proposed. Firstly, the regional features of infrared and visible images were extracted using a pre-trained feature extractor, Faster Region-based Convolutional Neural Network (Faster R-CNN), and a scene concept-guided multi-band Feature Alignment and Fusion Module (FAFM) was constructed. Secondly, to enhance the model's capability in modeling visual semantic information, a Concept-Guided Module (CGM) was designed to retrieve and encode scene concepts for images. Finally, an Adaptive Gating Mechanism (AGM) was built on this foundation. When the decoder generated a word at each time step, the model dynamically adjusted the weights of the fused features and concept features of multi-band images according to the situation, thereby achieving feature fusion. Experimental results on the visible-infrared image captioning datasets show that the proposed method achieves 56.7% and 119.5% in BLEU4 (BiLingual Evaluation Understudy with 4-grams) and CIDEr (Consensus-based Image Description Evaluation) metrics, respectively, which are 1.1 and 2.9 percentage points higher than those of the suboptimal method. The proposed method effectively improves the accuracy of multi-band image captioning.

    Pansharpening based on two-stage collaborative optimization
    Jiaxin DUAN, Jing HU, Wu WEN, Zhenxia YU, Yongjun ZHANG
    2026, 46(5):  1568-1577.  DOI: 10.11772/j.issn.1001-9081.2025050598

    Although deep learning-based pansharpening methods for remote sensing images have made progress, most of them rely on supervised training with downsampled data, making them susceptible to scale bias and unable to maintain stable performance at full resolution. In contrast, unsupervised methods optimize directly on full-resolution images, avoiding issues caused by downsampling, but generally exhibit weak robustness due to the lack of explicit supervisory signals. Therefore, a pansharpening Network based on Two-stage Collaborative Optimization (TCONet) was proposed. In the first stage, supervised training on downsampled data was combined with an Improved Multi-Resolution Analysis (IMRA) method and an attention mechanism to improve spatial details and spectral preservation capability. In the second stage, an Unsupervised Information Compensation Network (UCIN) was constructed to directly optimize on full-resolution images, thereby compensating for information loss caused by scale inconsistency. Experimental results on three satellite datasets, QuickBird (QB), WorldView-2 (WV-2), and WorldView-4 (WV-4), indicate that TCONet outperforms comparative methods in terms of both visual quality and evaluation metrics.

    CSAF-YOLO: improved YOLO11 algorithm for underwater small object detection
    Hongrui ZHANG, Weiming FENG, Luxia YANG, Yongjie MA
    2026, 46(5):  1578-1585.  DOI: 10.11772/j.issn.1001-9081.2025101310

    To address challenges in underwater small object detection, such as light scattering, low contrast, and complex background, an underwater small object detection algorithm named CSAF-YOLO (Cross-Scale Adaptive Fusion YOLO) was proposed based on YOLO11. Firstly, a Multi-Scale Collaborative Fusion (MSCF) module was designed to enhance cross-scale feature synergy and contextual information extraction through spatial fusion and channel interaction mechanisms. Secondly, a Dynamic Kernel Scale Modulation (DKSM) module was constructed to adaptively generate local and global modulation matrices, optimizing convolutional kernels for improved adaptability to complex underwater environments. Thirdly, a Multi-Scale Enhanced detection Head (MSE-Head) was proposed to improve small-object localization accuracy via scale-aware enhancement and dynamic cross-scale feature fusion. Finally, the MPDIoU (Modified Penalized Distance Intersection over Union) loss function was introduced to optimize bounding box regression for underwater small objects through minimum point distance and multi-scale penalty mechanisms. Experimental results on the URPC2020 dataset demonstrate that CSAF-YOLO achieves an mAP50 (mean Average Precision at 50% Intersection over Union (IoU) threshold) of 85.0%, representing an improvement of 1.6 percentage points over YOLO11. The proposed algorithm provides an effective solution for visual tasks in fields such as marine resource exploration and underwater robotic navigation.
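
The bounding-box loss mentioned above can be illustrated with a small sketch. It follows the commonly cited minimum-point-distance formulation (IoU minus the squared distances between the two pairs of matching corners, normalized by the squared image diagonal). Whether CSAF-YOLO uses exactly this form, and how its "multi-scale penalty" enters, is not stated in the abstract, so treat the formula, function names, and box values as assumptions.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), top-left and bottom-right


def iou(a: Box, b: Box) -> float:
    """Plain intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def mpdiou_loss(pred: Box, gt: Box, img_w: float, img_h: float) -> float:
    """ASSUMED form: 1 - (IoU - d1^2/(w^2+h^2) - d2^2/(w^2+h^2)), where d1 and d2
    are the distances between top-left and bottom-right corners, respectively."""
    norm = img_w ** 2 + img_h ** 2
    d1_sq = (pred[0] - gt[0]) ** 2 + (pred[1] - gt[1]) ** 2
    d2_sq = (pred[2] - gt[2]) ** 2 + (pred[3] - gt[3]) ** 2
    return 1.0 - (iou(pred, gt) - d1_sq / norm - d2_sq / norm)


# Hypothetical boxes in a 100 x 100 image.
gt_box = (0.0, 0.0, 10.0, 10.0)
loss_exact = mpdiou_loss(gt_box, gt_box, 100.0, 100.0)
loss_near = mpdiou_loss((1.0, 1.0, 11.0, 11.0), gt_box, 100.0, 100.0)
loss_far = mpdiou_loss((5.0, 5.0, 15.0, 15.0), gt_box, 100.0, 100.0)
```

Unlike plain IoU, the corner-distance terms keep a useful gradient signal even when the predicted and ground-truth boxes do not overlap, which matters for small objects.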

    Lightweight underwater small object detection based on graph Transformer and RT-DETR
    Minqi WU, Yuanhua YANG, Hang LI, Yaqin HU, Zhihao TANG, Teng MEI
    2026, 46(5):  1586-1595.  DOI: 10.11772/j.issn.1001-9081.2025050565

    Existing underwater small object detection methods are primarily based on deep learning algorithms, which struggle to balance lightweight design and detection accuracy, so they are unable to meet the requirements of real-time and resource-constrained platforms. Therefore, Graph-DETR, a lightweight underwater small object detection model based on RT-DETR (Real-Time DEtection TRansformer) and a graph Transformer, was proposed. The model used a lightweight MobileNetV4 backbone improved with the Large Separable Kernel Attention mechanism (LSKAttention) and the Context-Mixing dynamic convolutional block (CM block) to enhance feature extraction efficiency and reduce model complexity. Additionally, a hierarchical Graph Transformer Feature Pyramid Network (GTFPN) was proposed to strengthen multi-scale feature fusion, and the hybrid encoder was optimized via Wavelet Transform Convolution (WTConv), Adaptive downsampling (Adown), and path pruning, thereby expanding the convolutional receptive field of the CNN-based Cross-scale Feature Fusion (CCFF) module with few parameters. Experimental results on the public underwater dataset URPC2020 show that, compared to RT-DETR, Graph-DETR reduces the parameters by 66.9% and the inference latency by 6.8 ms, achieving a mean Average Precision (mAP) of 53.2% and an Average Precision of 86.8% at an IoU threshold of 0.5 (AP@0.5); on URPC2021, it achieves 81.3% recall, 54.1% mAP, and 87.6% AP@0.5 with only 10.5 ms latency, outperforming the existing methods. Graph-DETR performs well in underwater small object detection and is practical for deployment on resource-constrained underwater platforms.

    Bispectrum-based nonlinear feature coupling method for speech enhancement
    Zhengtao YU, Yixue LUAN, Wenjun WANG, Ling DONG, Yan XIANG, Shengxiang GAO
    2026, 46(5):  1596-1603.  DOI: 10.11772/j.issn.1001-9081.2025050674

    To address the issue that current time-frequency domain-based speech enhancement methods commonly model the linear characteristics of signals using second-order spectral statistics after Short-Time Fourier Transform (STFT), while neglecting the potential higher-order nonlinear interaction information in speech, a Bispectrum-based Nonlinear Feature Coupling method for speech enhancement (BNFC) was proposed. An encoder-decoder structure was employed as the overall framework, and a bispectral feature extraction module was introduced after the encoder to capture phase coupling and nonlinear structural information revealed by third-order statistics. By fusing the extracted bispectral features with encoder features through skip connections, deeper amplitude and phase modeling was achieved. Experimental results on the VoiceBank+DEMAND dataset showed that BNFC achieved a Perceptual Evaluation of Speech Quality (PESQ) score of 3.57, representing a 15.53% improvement over the baseline model BREM (Bispectral Refinement Enhancement Module). In addition, Mean Opinion Score of Signal Distortion (CSIG), Background Noise Intrusiveness (CBAK), and Overall Speech Quality (COVL) were improved by 5.51%, 3.08%, and 10.31%, respectively, validating the importance of higher-order nonlinear feature modeling for speech enhancement tasks.
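
The third-order statistic that this abstract refers to can be illustrated with the classical direct bispectrum estimate, B(k1, k2) = mean over frames of X(k1) X(k2) X*(k1 + k2). The sketch below shows only this textbook statistic, not the learned feature extraction module of BNFC; the frame length, the test signal, and the function names are assumptions.

```python
import cmath
import math
from typing import List


def dft(frame: List[float]) -> List[complex]:
    """Naive discrete Fourier transform of one frame (O(n^2), for illustration)."""
    n = len(frame)
    return [
        sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame))
        for k in range(n)
    ]


def bispectrum(frames: List[List[float]], k1: int, k2: int) -> complex:
    """Direct estimate: average of X(k1) * X(k2) * conj(X(k1 + k2)) over frames."""
    total = 0j
    for frame in frames:
        spec = dft(frame)
        total += spec[k1] * spec[k2] * spec[(k1 + k2) % len(frame)].conjugate()
    return total / len(frames)


# Hypothetical frame with phase-coupled components at bins 1, 2 and 3 (= 1 + 2).
N = 16
frame = [
    math.cos(2 * math.pi * 1 * t / N)
    + math.cos(2 * math.pi * 2 * t / N)
    + math.cos(2 * math.pi * 3 * t / N)
    for t in range(N)
]
coupled = bispectrum([frame], 1, 2)    # large: bins 1, 2 and 3 are phase-coupled
uncoupled = bispectrum([frame], 1, 4)  # near zero: no energy at bins 4 and 5
```

A second-order power spectrum would report the same energy at bins 1, 2 and 3 regardless of their phases; the bispectrum is nonzero only when the phases are coupled, which is the nonlinear information the abstract says second-order statistics miss.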

    Frontier and comprehensive applications
    Construction and application of knowledge graph for fault diagnosis of key components of aviation equipment
    Ronghui ZHAO, Chao DENG, Zidong YU
    2026, 46(5):  1604-1613.  DOI: 10.11772/j.issn.1001-9081.2025050586

    Fault data of key components of aviation equipment are highly specialized and low in value density, with scattered domain knowledge and no effective methods for integration and utilization. Driven by the demand for intelligent fault diagnosis, a knowledge graph was introduced to organize the knowledge contained in fault records for sharing and reuse, and the construction and application of the fault knowledge graph were studied. Firstly, based on the analysis of prior fault knowledge and fault records, a hierarchical fault diagnosis knowledge ontology model for key components of aviation equipment was designed, which defined entity types and their relationship constraints, effectively avoiding unclear entity boundaries and laying the foundation for the structured representation of knowledge. Secondly, an improved knowledge extraction method based on set prediction, namely SPN-BiLSTM-CRF, was proposed to efficiently extract knowledge triple sets directly from unstructured Chinese fault records, and a knowledge graph of hydraulic piston pump faults was constructed, taking the aircraft hydraulic piston pump as an example. Finally, combined with the FP-Growth association rule mining algorithm, association rules among fault modes, fault causes, and fault states were extracted from the fault knowledge dataset, and fault diagnosis was realized on this basis. SPN-BiLSTM-CRF can effectively address the knowledge application problem in fault data and provide a knowledge-driven solution for intelligent operation and maintenance of aviation equipment.
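
The association rules mentioned above are scored by support and confidence. The sketch below computes both by brute-force counting; it is not FP-Growth itself (which finds frequent itemsets efficiently with a prefix tree), and the fault records and labels are hypothetical.

```python
from typing import FrozenSet, List


def support(itemset: FrozenSet[str], records: List[FrozenSet[str]]) -> float:
    """Fraction of records that contain every item in itemset."""
    return sum(1 for record in records if itemset <= record) / len(records)


def confidence(antecedent: FrozenSet[str],
               consequent: FrozenSet[str],
               records: List[FrozenSet[str]]) -> float:
    """Confidence of the rule antecedent -> consequent."""
    base = support(antecedent, records)
    return support(antecedent | consequent, records) / base if base else 0.0


# Hypothetical fault records, each a set of (type:value) labels.
records = [
    frozenset({"mode:pressure_drop", "cause:piston_wear"}),
    frozenset({"mode:pressure_drop", "cause:piston_wear"}),
    frozenset({"mode:pressure_drop", "cause:seal_leak"}),
    frozenset({"mode:overheating", "cause:seal_leak"}),
]

mode = frozenset({"mode:pressure_drop"})
cause = frozenset({"cause:piston_wear"})
rule_support = support(mode | cause, records)
rule_confidence = confidence(mode, cause, records)
```

In a diagnosis setting, a rule such as "pressure drop implies piston wear" would be kept only if both values exceed chosen minimum thresholds, which are a tuning choice not given in the abstract.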

    Prediction-evaluation framework for anomaly detection in electric vehicle lithium-ion battery
    Xuechao LIAO, Rui CHEN
    2026, 46(5):  1614-1623.  DOI: 10.11772/j.issn.1001-9081.2025050574

    To address the challenges of high complexity in multi-source heterogeneous time-series data, scarcity of anomaly samples, and strong inter-variable dependencies in lithium-ion battery fault detection for electric vehicles, a prediction-evaluation framework based on Dynamic Transformer Memory autoencoder for Anomaly Detection (DTMAD) was proposed to enhance fault identification accuracy and model generalization capability. Firstly, a joint feature encoder was designed by integrating Dynamical autoencoder for Anomaly Detection (DyAD) with Gated Recurrent Unit (GRU) to perform feature fusion and dimensionality reduction on multi-source time-series data, extracting deep cross-modal representations. Simultaneously, a pre-response encoder based on a self-attention mechanism was constructed to capture long-term dependencies in time-series data, enhancing the efficiency and accuracy of feature extraction. Furthermore, a memory parsing module was introduced, which fused the predicted path with the actual response path via residual contrastive learning, improving the model's capability to detect anomalous patterns. Secondly, based on the distribution characteristics of the reconstruction error, an evaluation model was designed through a collaborative anomaly detection algorithm. Finally, through the comprehensive prediction-evaluation framework, key response patterns were extracted from complex multi-source data, and latent anomalies were identified under unsupervised learning conditions. Experimental results on multi-group and multi-source electric-vehicle lithium-ion battery datasets demonstrate that the proposed framework significantly outperforms baseline methods such as AutoEncoder (AE), Deep Support Vector Data Description (DeepSVDD), and Graph Deviation Network (GDN) in terms of detection accuracy and model robustness. Notably, compared to the DyAD model, the DTMAD model achieves an Area Under the Receiver Operating Characteristic curve (AUROC) of 0.9008, with result variability reduced from 0.029 to 0.026, indicating superior detection stability and generalization capability.

    Defect detection algorithm for train bearing rollers based on FHC-DETR
    Yuanhao HE, Jun ZHAO
    2026, 46(5):  1624-1633.  DOI: 10.11772/j.issn.1001-9081.2025050592

    To address issues such as low precision in small object recognition and large scale variations in the detection of train bearing roller defects, a detection algorithm based on FHC-DETR (Fourier-fused High-low frequency interactive and Context-aware DEtection TRansformer) was proposed. Firstly, aiming at the problems of complex model computation and small object features being disturbed by noise, a frequency-domain feature extraction module C2f-FG (C2f with Fourier-Gated bottleneck) was designed for feature extraction. It synchronously acquired spatial-domain local features and frequency-domain global features via Fourier transform, and their fusion enhanced the accuracy of small-object detection while reducing computational complexity. Secondly, to tackle feature confusion caused by variations in defect scale, a high-low-frequency feature interaction module, HiLo (High-Low frequency), was introduced. The high-frequency branch focused on local defect textures, while the low-frequency branch captured overall semantics via global attention, thereby improving multi-scale adaptability. Subsequently, to resolve the issue of small object feature attenuation in feature fusion, a Context-aware Multi-scale Bidirectional Feature Pyramid Network (CM-BiFPN) was constructed. By dynamically perceiving context and strengthening cross-layer interaction, it reduced feature transmission loss and improved fusion efficiency. Finally, the EMASlideVarifocalLoss adaptive loss function was adopted to dynamically adjust classification thresholds and optimize weights of hard examples, further enhancing localization and category discrimination capabilities. Experimental results show that FHC-DETR achieves a mean Average Precision (mAP) of 91.5%, which is 2.3 percentage points higher than that of the original RT-DETR (Real-Time DETR). Additionally, its parameter count is reduced by 28.1%, its computational load is reduced by 23.0%, and its memory usage is reduced by 23.7%, demonstrating a balance between precision and efficiency and confirming its practicality in industrial scenarios.

    Wavelet-domain sparse Bayesian learning for uncertainty-aware MRI reconstruction
    Kaiyan CUI, Shuna WEI
    2026, 46(5):  1634-1646.  DOI: 10.11772/j.issn.1001-9081.2025101275

    Fully sampled Magnetic Resonance Imaging (MRI) requires a long scan time, which not only limits examination efficiency but also tends to induce motion artifacts due to subject movement. To address the high parameter sensitivity and the inability to quantify result uncertainty in traditional Compressed Sensing MRI (CS-MRI) reconstruction methods, an uncertainty-aware MRI reconstruction method based on wavelet-domain Sparse Bayesian Learning (SBL) was proposed, namely BU-MRI (Bayesian Uncertainty-guided MRI). Firstly, the advantage of the wavelet transform in multi-resolution image representation was leveraged, and a hierarchical Bayesian probability model was constructed by characterizing the sparsity of MRI images in the wavelet domain as a prior. Secondly, a posterior inference strategy combining Gibbs sampling and marginal likelihood maximization was adopted to achieve effective estimation of high-dimensional sparse coefficients and adaptive hyperparameter updating. Finally, based on the updated model parameters, high-quality images were iteratively reconstructed from undersampled k-space data. Furthermore, pixel-wise posterior confidence intervals could be provided, offering a quantitative assessment of uncertainty in the reconstruction results. Experimental results on both simulated and real MRI data demonstrated that BU-MRI outperformed methods such as Zero-Filled Inverse Discrete Fourier Transform (ZF-IDFT) and k-t Robust Principal Component Analysis (k-t RPCA) in terms of Peak Signal-to-Noise Ratio (PSNR) and Structural Similarity Index Measure (SSIM). On real cardiac MRI data and brain MRI data with a sampling rate of 0.5, BU-MRI achieved PSNRs of 44.42 dB and 40.37 dB and SSIMs of 0.9765 and 0.9547, respectively. BU-MRI exhibits excellent performance in structural fidelity, error suppression, and frequency-domain consistency. It shows stable convergence and robustness across various sampling rates and noise levels, providing a reliable reconstruction framework with uncertainty quantification capability for clinical MRI.
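For orientation, the ZF-IDFT baseline and the PSNR metric named in this abstract can be sketched as follows. This is a minimal illustration, not the authors' code: the random image, the random 0.5-rate mask, and the function names `zero_filled_recon` and `psnr` are assumptions made here, and the peak value is taken to be the maximum of the reference image.

```python
import numpy as np

def zero_filled_recon(image, mask):
    """Reconstruct from masked k-space by zero-filling and an inverse FFT."""
    kspace = np.fft.fft2(image)       # fully sampled k-space of the reference
    undersampled = kspace * mask      # unsampled locations are set to zero
    return np.abs(np.fft.ifft2(undersampled))

def psnr(reference, estimate):
    """Peak signal-to-noise ratio in dB, with the reference maximum as peak."""
    mse = np.mean((reference - estimate) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(reference.max() ** 2 / mse)

rng = np.random.default_rng(0)
image = rng.random((64, 64))          # synthetic stand-in for an MRI slice
mask = rng.random((64, 64)) < 0.5     # random mask, sampling rate about 0.5
recon = zero_filled_recon(image, mask)
print(f"ZF-IDFT PSNR: {psnr(image, recon):.2f} dB")
```

A compressed-sensing or Bayesian method such as the one described would replace `zero_filled_recon` with an iterative estimator, while the PSNR comparison against the fully sampled reference stays the same.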

    Chromosome cascaded classification framework integrating image texture enhancement and super-resolution
    Wen PENG, Bokai ZHANG, Jinwei LIN
    2026, 46(5):  1647-1657.  DOI: 10.11772/j.issn.1001-9081.2025050568

    Chromosome karyotype analysis is of great significance in prenatal screening and genetic disease diagnosis. However, existing chromosome classification models are generally limited by insufficient feature extraction capability, high sensitivity to image quality, and inadequate attention to local details, leading to low overall classification accuracy and, in particular, frequent misidentification of short chromosomes. Therefore, a coarse-to-fine chromosome cascaded classification framework integrating image texture enhancement and super-resolution techniques was proposed. Firstly, chromosomes were coarsely classified based on the International System for Human Cytogenetic Nomenclature (ISCN) and divided into long-chromosome and short-chromosome groups to mitigate class imbalance and feature confusion. Secondly, for the long chromosome classification task, a feature enhancement module was added to improve the classification model's ability to perceive details of long chromosomes. Thirdly, considering the characteristics of short chromosome images, a super-resolution technique was introduced to improve image quality and the model's perceptual capability. Experimental results on a private dataset showed that the proposed framework achieved an overall chromosome classification accuracy of 98.91% and an overall chromosome F1-score of 98.77%, with 99.01% for long chromosomes and 98.31% for short chromosomes. By adopting differentiated classification strategies and task-specific models, this cascaded chromosome classification framework significantly enhances both classification accuracy and model robustness.

    MD-FVR: cascaded finger vein recognition network based on multi-domain feature fusion
    Chi ZHANG, Xianjing MENG, Changhao DOU, Qian WANG, Leilei GENG, Xiaoming XI
    2026, 46(5):  1658-1666.  DOI: 10.11772/j.issn.1001-9081.2025050658

    Existing finger vein recognition methods primarily focus on extracting features from the spatial domain, while neglecting the frequency-domain representation of tubular tree-like veins and the embedded multi-scale details. To address this issue, a cascaded Finger Vein Recognition network based on Multi-Domain feature fusion (MD-FVR) was proposed. The method adopted a hierarchical cascaded network architecture, in which each stage consisted of a multi-domain feature fusion module and basic operations, enabling progressive feature extraction and enhancement. The core multi-domain feature fusion module operated as follows. Firstly, structural information was enhanced through Depthwise Separable Wavelet Convolution blocks (DSWCs). Secondly, frequency-domain global features were extracted using Frequency-Domain Spatial and Coupling blocks (FDSCs) to compensate for limitations in spatial-domain representation, and the local frequency-domain details were further refined with an improved Half-Wavelet Spatial Attention block (HWSA). Finally, the enhanced feature representation was obtained by integrating multi-domain features through the Adaptive Feature Fusion module (AFF). Experimental results on the HKPU and SDUMLA-HMT datasets showed that MD-FVR achieved recognition accuracies of 99.68% and 99.53%, with Equal Error Rates (EERs) of 0.35% and 0.41%, respectively. Compared with methods that rely solely on spatial or frequency features, MD-FVR demonstrates significant improvements in both recognition accuracy and robustness.
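The Equal Error Rate reported above is the operating point at which the false acceptance rate equals the false rejection rate. A minimal sketch of how it can be approximated from matching scores is given below; the function name `equal_error_rate`, the toy scores, and the convention that a higher score means a better match are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np

def equal_error_rate(genuine, impostor):
    """Approximate the EER by sweeping a threshold over all observed scores.

    Assumes higher scores mean "more likely the same finger".
    """
    genuine = np.asarray(genuine, dtype=float)
    impostor = np.asarray(impostor, dtype=float)
    thresholds = np.sort(np.concatenate([genuine, impostor]))
    best_gap, eer = np.inf, 1.0
    for t in thresholds:
        far = np.mean(impostor >= t)   # impostor pairs wrongly accepted
        frr = np.mean(genuine < t)     # genuine pairs wrongly rejected
        gap = abs(far - frr)
        if gap < best_gap:             # keep the threshold where FAR and FRR are closest
            best_gap, eer = gap, (far + frr) / 2.0
    return eer

# Toy scores: one genuine pair scores low and one impostor pair scores high.
genuine_scores = [0.9, 0.8, 0.7, 0.4]
impostor_scores = [0.1, 0.2, 0.3, 0.6]
print(equal_error_rate(genuine_scores, impostor_scores))
```

With only a handful of scores the estimate is coarse; on a real benchmark the sweep runs over all genuine and impostor comparisons.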

    SAM Meibomian gland unified dense segmentation method with introduction of automatic prompt encoder
    Ying JING, Ran LI, Zhuo JIANG, Ziyang FU, Jingyi DU, Qi LIU, Jihang LIU
    2026, 46(5):  1667-1676.  DOI: 10.11772/j.issn.1001-9081.2025050613

    The traditional Segment Anything Model (SAM) relies on manual prompts when segmenting Meibomian gland images, making it difficult to handle dense glands, irregular shapes, and blurred boundaries. To address this, an improved model, namely ResSAM, was proposed. ResSAM eliminated the reliance on manual intervention by introducing an automatic prompt encoder. The backbone network was pruned and optimized to further enhance the model's segmentation efficiency. Focal Loss and Smooth IoU Loss were used for training optimization, and the SE (Squeeze-and-Excitation) and cross-attention mechanisms were integrated to reduce the impact of individual differences and blurred boundaries, thereby improving the model's segmentation accuracy. Experimental results on two self-built datasets, Lower Lid and Upper Lid, showed that ResSAM achieved the best performance in terms of the number of parameters and Giga FLoating-point OPerations (GFLOPs); its segmentation results obtained the highest Dice scores (88.69% and 87.75%, respectively) and the highest Intersection-over-Union (IoU) values (79.69% and 78.58%, respectively). The results indicate that ResSAM improves both efficiency and accuracy, supporting early prevention and clinical diagnosis of Meibomian Gland Dysfunction (MGD).
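The Dice and IoU scores quoted above are both overlap measures between a predicted mask and a ground-truth mask. The sketch below shows the standard definitions for binary masks; the function name `dice_and_iou`, the toy masks, and the choice to return 1.0 when both masks are empty are assumptions made here, not details from the paper.

```python
import numpy as np

def dice_and_iou(pred, target):
    """Return (Dice, IoU) for two binary masks of the same shape."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    inter = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    total = pred.sum() + target.sum()
    dice = 2.0 * inter / total if total else 1.0   # empty vs empty counts as a match
    iou = inter / union if union else 1.0
    return float(dice), float(iou)

pred = [[1, 1, 0],
        [0, 1, 0]]
target = [[1, 0, 0],
          [0, 1, 1]]
dice, iou = dice_and_iou(pred, target)
print(f"Dice = {dice:.4f}, IoU = {iou:.4f}")
```

For a single pair of masks the two measures are linked by Dice = 2·IoU / (1 + IoU), so Dice is never lower than IoU; dataset-level averages need not satisfy this identity exactly.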

    Fine-grained Chinese herbal medicine image classification based on feature fusion and channel information compensation
    Xinyao LIU, Jun LIANG, Jiahao LONG, Renliang YAN
    2026, 46(5):  1677-1683.  DOI: 10.11772/j.issn.1001-9081.2025050632

    In the field of fine-grained image classification of Chinese herbal medicine, the lack of a comprehensive and balanced dataset has been a major obstacle. To advance research on fine-grained image recognition of Chinese herbal medicine, a Herb-150 fine-grained Chinese herbal medicine dataset was constructed, with balanced sample distribution and comparable counts per category. To address the tendency of deep neural networks to lose discriminative, detailed features in this task, a fine-grained feature-enhanced CHMRN (Chinese Herbal Medicine Recognition Network) was proposed. By introducing a top-down feature fusion module, it integrated multi-scale semantic information to capture comprehensive contextual features. Additionally, a bottom-up channel feature information compensation module was designed to enhance the expressive power of fine-grained features, ensuring the accurate capture of subtle differences among traditional Chinese medicine categories. Experimental results showed that CHMRN achieved an accuracy of 93.910% on the Herb-150 dataset, outperforming mainstream models such as CMAL-Net (Cross-layer Mutual Attention Learning Network) and validating its effectiveness in fine-grained classification tasks. CHMRN not only improves the accuracy of traditional Chinese medicine identification, but also provides a valuable reference for similar fine-grained image classification applications.

    High-precision recognition method for imperfect grain images based on TransNeXt
    Miaomiao YUAN, Yihong CHU, Guanjun YIN, Chunhua DENG
    2026, 46(5):  1684-1691.  DOI: 10.11772/j.issn.1001-9081.2025050593

    Existing deep learning methods still face the following challenges in high-precision imperfect grain recognition: the key discriminative features of imperfect grains are often distributed across image regions of varying scales and random positions, making it difficult to perceive these regions stably and comprehensively; meanwhile, the fine-grained discriminative features of multiple imperfect grain categories have diverse representations, and a unified modeling path struggles to optimize the recognition performance of all categories simultaneously. To address these issues, a globally guided two-stage local feature learning framework was proposed based on TransNeXt. Deep representations of key discriminative regions were extracted under holistic perception and further refined through fine-grained modeling. Independently optimized network branches were designed for different categories, with all branches sharing the backbone to enable efficient adaptation and lightweight scalability. To support the above methods, an imperfect grain dataset covering multiple grain varieties with standardized category and discriminative region location annotations was constructed. Experimental results show that the proposed method achieves an accuracy of 99.62% on the test set, verifying the framework's effectiveness and scalability in complex fine-grained image recognition tasks.

    Ore image segmentation with linear deformable convolution and dual-domain synergistic dynamic attention
    Jing HU, Shikun CHEN, Fang WANG, Rui ZHANG, Yong WANG
    2026, 46(5):  1692-1702.  DOI: 10.11772/j.issn.1001-9081.2025050645

    To solve the problems of blurred boundaries and insufficient accuracy in ore image segmentation caused by complex texture, irregular shape and uneven illumination, a segmentation network with Linear Deformable Convolution (LDConv) and dual-domain synergistic dynamic attention was proposed, namely LDDA-Net (Linear Deformable Dual-domain Attention Network). LDDA-Net adopted an encoder-decoder architecture. Firstly, in the serial dual-feature encoder, an adaptive sampling point distribution was constructed through LDConv to flexibly fit the irregular shapes of the ore while controlling computational overhead through its linear characteristics. Secondly, a Dynamic Attention Modulation (DAM) module was designed for spatial-domain features, which realized dynamic focusing on and reinforcement of the key information in the feature map and the ore edges through pooling sampling, a learnable attention matrix and a boundary-sensitive weight allocation mechanism. Finally, a new Dynamic Progressive Attention Guided Loss function (DPAG Loss) was proposed, which guided the model to focus on hard-to-segment areas such as fuzzy boundaries and small ore particles during training by dynamically generating attention maps in multiple stages. DPAG Loss and the DAM module together formed a space-loss dual-domain synergy, creating a closed feedback loop between feature perception and learning strategy. Experimental results on the self-built open-pit ore dataset (OpenPitOre dataset) and the public ore dataset (Ore dataset) showed that LDDA-Net achieved an HD95 boundary error of only 16.84 mm, which is 11.37% lower than that of the suboptimal model VM-Unet; it attained a Dice coefficient of 91.54%, and mIoU and PA of 85.13% and 94.10%, respectively, significantly outperforming the comparative segmentation models. LDDA-Net achieves high-precision and refined segmentation in complex scenarios, providing reliable technical support for intelligent detection and fragmentation analysis of ore in open-pit blasting.

2026 Vol.46 No.5

Honorary Editor-in-Chief: ZHANG Jingzhong
Editor-in-Chief: XU Zongben
Associate Editor: SHEN Hengtao XIA Zhaohui
Domestic Post Distribution Code: 62-110
Foreign Distribution Code: M4616
Address:
No. 9, 4th Section of South Renmin Road, Chengdu 610041, China
Tel: 028-85224283-803
  028-85222239-803
Website: www.joca.cn
E-mail: bjb@joca.cn