
Table of Contents

    10 February 2025, Volume 45 Issue 2
    Artificial intelligence
    Headline generation model with position embedding for knowledge reasoning
    Yalun WANG, Yangsen ZHANG, Siwen ZHU
    2025, 45(2):  345-353.  DOI: 10.11772/j.issn.1001-9081.2024030281

    As the smallest semantic unit, the sememe is crucial for the headline generation task. Although the Sememe-Driven Language Model (SDLM) is one of the mainstream models, it has limited encoding capability for long text sequences, does not fully consider positional relationships, and is prone to introducing noisy knowledge that degrades the quality of the generated headlines. To address these problems, a Transformer-based generative headline model, namely Tran-A-SDLM (Transformer Adaption based Sememe-Driven Language Model with positional embedding and knowledge reasoning), was proposed, which fully combines the advantages of adaptive position embedding and a knowledge reasoning mechanism. Firstly, a Transformer model was introduced to enhance the model’s encoding capability for text sequences. Secondly, an adaptive positional embedding mechanism was utilized to enhance the model’s positional awareness, thereby improving the learning of contextual sememe knowledge. In addition, a knowledge reasoning module was introduced to represent the sememe knowledge and guide the model to generate accurate headlines. Finally, to demonstrate the superiority of Tran-A-SDLM, experiments were conducted on the Large scale Chinese Short Text Summarization (LCSTS) dataset. Experimental results show that, compared to RNN-context-SDLM, Tran-A-SDLM improves the ROUGE-1, ROUGE-2 and ROUGE-L scores by 0.2, 0.7 and 0.5 percentage points respectively. Results of the ablation study further validate the effectiveness of the proposed model.
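The abstract does not spell out how the adaptive positional embedding is computed. For reference only, the sketch below shows the standard fixed sinusoidal encoding of the original Transformer, a common starting point for learned or adaptive variants; it is not Tran-A-SDLM’s mechanism, and the function names are illustrative.

```python
from __future__ import annotations

import math


def sinusoidal_position_encoding(seq_len: int, d_model: int) -> list[list[float]]:
    """Standard (non-adaptive) Transformer encoding:
    PE[pos][2i] = sin(pos / 10000^(2i/d_model)), PE[pos][2i+1] = cos(same angle)."""
    if d_model % 2 != 0:
        raise ValueError("d_model must be even")
    table = []
    for pos in range(seq_len):
        row = []
        for i in range(d_model // 2):
            angle = pos / (10000 ** (2 * i / d_model))
            row.extend([math.sin(angle), math.cos(angle)])
        table.append(row)
    return table


def add_positions(token_embeddings: list[list[float]], pe: list[list[float]]) -> list[list[float]]:
    """Inject order information by adding each position vector to its token embedding."""
    return [[t + p for t, p in zip(tok, pe[pos])] for pos, tok in enumerate(token_embeddings)]


pe = sinusoidal_position_encoding(seq_len=4, d_model=6)
embedded = add_positions([[0.5] * 6 for _ in range(4)], pe)
```

Because each sine/cosine pair oscillates at a different frequency, nearby positions receive similar vectors, which lets the model recover relative order.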

    Contrastive knowledge distillation method for object detection
    Sheng YANG, Yan LI
    2025, 45(2):  354-361.  DOI: 10.11772/j.issn.1001-9081.2024020212

    Knowledge distillation is one of the most effective model compression methods in tasks such as image classification, but its application to complex tasks such as object detection is relatively limited. Existing knowledge distillation methods mainly construct information graphs to filter out noise from foreground or background regions during teacher and student feature extraction, and then minimize the mean square error loss between the features. However, the objective functions of these methods are difficult to optimize further, and they only utilize the teacher’s supervision signals, so students receive no targeted information about incorrect knowledge. Based on this, a Contrastive Knowledge Distillation (CKD) method for object detection was proposed, which redesigns the distillation framework and loss function. CKD uses not only the teacher’s supervision signal but also constructed negative samples to provide guidance information for knowledge distillation, allowing students to acquire the teacher’s knowledge while gaining additional knowledge through self-learning. The proposed method was compared with the baseline on the Pascal VOC and COCO2014 datasets using the GFocal (Generalized Focal loss) and YOLOv5 models. Experimental results show that with the GFocal model on Pascal VOC, CKD improves the mean Average Precision (mAP) by 5.6 percentage points and the AP50 (Average Precision@0.50) by 5.6 percentage points; with the YOLOv5 model on COCO2014, CKD improves the mAP by 1.1 percentage points and the AP50 by 1.7 percentage points.
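The abstract does not give CKD’s loss. As an assumption-labelled illustration of how negative samples can add guidance beyond the teacher’s positive signal, here is a generic InfoNCE-style contrastive loss on feature vectors: the student feature is pulled toward the teacher feature for the same region (the positive) and pushed away from constructed negatives. The temperature value and the toy vectors are hypothetical.

```python
from __future__ import annotations

import math


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def info_nce(student, teacher_pos, negatives, temperature=0.1):
    """Generic InfoNCE loss: -log(exp(s_pos/T) / (exp(s_pos/T) + sum_k exp(s_k/T))),
    where s_* are cosine similarities between the student feature and each candidate."""
    logits = [cosine(student, teacher_pos) / temperature]
    logits += [cosine(student, neg) / temperature for neg in negatives]
    peak = max(logits)  # subtract the max before exp() for numerical stability
    log_denominator = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_denominator - logits[0]


# Hypothetical 2-D features: an aligned student vs. one that resembles a negative.
negatives = [[0.0, 1.0], [-1.0, 0.0]]
aligned = info_nce([1.0, 0.0], [1.0, 0.0], negatives)
misaligned = info_nce([0.0, 1.0], [1.0, 0.0], negatives)
```

The loss is small when the student matches the teacher and large when it drifts toward a negative, which is the kind of "what not to learn" signal that plain MSE imitation lacks.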

    End-to-end Vietnamese text normalization method based on editing constraints
    Ming JIANG, Linqin WANG, Hua LAI, Shengxiang GAO
    2025, 45(2):  362-370.  DOI: 10.11772/j.issn.1001-9081.2024020232

    Text normalization is considered an indispensable step in the front-end analysis of Text-To-Speech (TTS), and semantic ambiguity, particularly that of non-standard words such as numbers, dates, and times, is the main challenge faced by text normalization. To address this problem, an editing constraint-based end-to-end text normalization method was proposed. After fully considering the linguistic characteristics of Vietnamese, a specialized labelling method was designed for Vietnamese to enhance the model’s ability to model contextual semantic information. Furthermore, since neural network models easily generate irreparable errors, an editing alignment algorithm was proposed to constrain the scope of non-standard word text effectively, thereby reducing the search space at the decoding end and avoiding prediction errors on non-normalized text caused by the limitations of the model itself. With the FastCorrect model selected as the baseline, various optimization methods were applied to it to obtain new models. Experimental results indicate that, in the Vietnamese experiments with different optimization methods, the proposed model improves precision by 23.71 percentage points over the baseline model using unlabeled data, and by 26.24 percentage points in similar Chinese experiments. Thus the method not only performs well in Vietnamese but also shows significant effects on Chinese open-source data, confirming its applicability beyond Vietnamese. Moreover, the model using the proposed method surpasses six baseline models with a precision of 97.14% and outperforms the Weighted Finite-State Transducer (WFST) two-stage method by 2.29 percentage points in F1-score, verifying the superiority of the proposed method in text normalization tasks.
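The editing alignment algorithm itself is not described in detail here. To illustrate the general idea of confining decoding to edited regions, the sketch below uses Python’s standard `difflib.SequenceMatcher` (a stand-in, not the authors’ algorithm) to align a Vietnamese source sentence with its normalized form and keep only the spans that change; everything else can be copied through unchanged. The example sentence is hypothetical.

```python
from __future__ import annotations

import difflib


def edit_spans(source_tokens: list[str], target_tokens: list[str]) -> list[tuple[list[str], list[str]]]:
    """Return (source_span, target_span) pairs for every non-matching region.
    Tokens outside these regions are identical in source and target."""
    matcher = difflib.SequenceMatcher(a=source_tokens, b=target_tokens, autojunk=False)
    spans = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":  # 'replace', 'delete' or 'insert'
            spans.append((source_tokens[i1:i2], target_tokens[j1:j2]))
    return spans


# "tôi sinh năm 1990" = "I was born in 1990"; the number is spelled out after normalization.
source = "tôi sinh năm 1990".split()
target = "tôi sinh năm một nghìn chín trăm chín mươi".split()
print(edit_spans(source, target))
```

Only the token `1990` needs to be generated; the three surrounding tokens are fixed by the alignment, which is the kind of search-space reduction the abstract describes.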

    Review of radar automatic target recognition based on ensemble learning
    Zirong HONG, Guangqing BAO
    2025, 45(2):  371-382.  DOI: 10.11772/j.issn.1001-9081.2024020179

    Radar Automatic Target Recognition (RATR) has widespread applications in both military and civilian domains. Because ensemble learning improves classification performance and robustness by integrating existing machine learning models, it has been increasingly applied to radar target detection and recognition. By systematically sorting and refining the existing literature, the research progress of ensemble learning in RATR was discussed in detail. Firstly, the concept, framework, and development of ensemble learning were introduced; ensemble learning was compared with traditional machine learning and deep learning methods; and the advantages, limitations, and main research focuses of ensemble learning theory and common ensemble learning methods were summarized. Secondly, the concept of RATR was briefly described. Thirdly, the discussion focused on applications of ensemble learning to different radar image classification features, with a detailed discussion of target detection and recognition methods based on Synthetic Aperture Radar (SAR) and High-Resolution Range Profile (HRRP), and the research progress and application effects of these methods were summarized. Finally, the challenges faced by RATR and ensemble learning were discussed, and future applications of ensemble learning in radar target recognition were anticipated.
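As a minimal illustration of the basic ensemble idea the review starts from (combining the outputs of existing classifiers), here is plain hard majority voting. The classifier outputs and class labels are hypothetical; ensemble methods in general also include weighted voting, bagging, boosting, and stacking.

```python
from __future__ import annotations

from collections import Counter


def majority_vote(predictions_per_model: list[list[str]]) -> list[str]:
    """Hard voting: predictions_per_model[m][i] is model m's label for sample i.
    Ties go to the label seen first (Counter.most_common ordering)."""
    n_samples = len(predictions_per_model[0])
    combined = []
    for i in range(n_samples):
        votes = Counter(model_preds[i] for model_preds in predictions_per_model)
        combined.append(votes.most_common(1)[0][0])
    return combined


# Three hypothetical classifiers labelling three radar targets.
model_a = ["tank", "truck", "tank"]
model_b = ["tank", "tank", "truck"]
model_c = ["truck", "truck", "truck"]
print(majority_vote([model_a, model_b, model_c]))
```

Each individual model makes at least one error relative to the combined vote, which is the intuition behind ensembles being more robust than their members when the members' errors are not strongly correlated.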

    Few-shot image classification method based on contrast learning
    Xuewen YAN, Zhangjin HUANG
    2025, 45(2):  383-391.  DOI: 10.11772/j.issn.1001-9081.2024020253

    Deep learning-based image classification algorithms usually rely on huge amounts of training data. However, it is often difficult to obtain sufficient large-scale, high-quality labeled samples in real scenarios. Aiming at the insufficient generalization ability of classification models in few-shot scenarios, a few-shot image classification method based on contrastive learning was proposed. Firstly, global contrastive learning was added as an auxiliary objective in training to enable the feature extraction network to obtain richer information from instances. Then, the query samples were split into patches that were used to calculate a local contrastive loss, thereby helping the model learn to infer the whole from its local parts. Finally, saliency detection was used to mix the important regions of the query samples and construct complex samples, so as to improve the model’s generalization ability. Experimental results of 5-way 1-shot and 5-way 5-shot image classification tasks on two public datasets, miniImageNet and tieredImageNet, show that compared to the few-shot learning baseline model Meta-Baseline, the proposed method improves the classification accuracy by 5.97 and 4.25 percentage points respectively on miniImageNet, and by 3.86 and 2.84 percentage points respectively on tieredImageNet. Besides, on miniImageNet the classification accuracy of the proposed method is 1.02 and 0.72 percentage points higher respectively than that of the DFR (Disentangled Feature Representation) model. It can be seen that the proposed method effectively improves the accuracy of few-shot image classification and has good generalization ability.
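Meta-Baseline, the baseline named above, classifies a query by cosine similarity to class centroids computed from the support set. A minimal sketch of that N-way K-shot inference step follows, assuming the features have already been produced by a trained extractor (the vectors below are hypothetical); the proposed contrastive losses act on the extractor that would produce these vectors.

```python
from __future__ import annotations

import math


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def centroid(vectors: list[list[float]]) -> list[float]:
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def classify_query(support: dict[str, list[list[float]]], query: list[float]) -> str:
    """support maps each of the N classes to its K support features.
    The query gets the label of the most cosine-similar class centroid."""
    prototypes = {label: centroid(feats) for label, feats in support.items()}
    return max(prototypes, key=lambda label: cosine(query, prototypes[label]))


# Hypothetical 2-way 2-shot episode with 2-D features.
support = {"cat": [[1.0, 0.1], [0.9, 0.0]], "dog": [[0.0, 1.0], [0.2, 0.8]]}
print(classify_query(support, [0.8, 0.2]))
```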

    Graph data augmentation method for few-shot node classification
    Kun FU, Shicong YING, Tingting ZHENG, Jiajie QU, Jingyuan CUI, Jianwei LI
    2025, 45(2):  392-402.  DOI: 10.11772/j.issn.1001-9081.2024030266

    Graph-structured data are widely found in the real world, but they often face a shortage of labeled data in practical applications. Few-Shot Learning (FSL) methods on graph data aim to classify data with only a few labeled samples. Although these methods perform well in Few-Shot Node Classification (FSNC) tasks, the following problems remain: high-quality labeled data are difficult to obtain, generalization ability is insufficient in the parameter initialization process, and the topological structure information in the graph is not fully mined. To address these problems, a Few-Shot Node Classification model based on Graph Data Augmentation (GDA-FSNC) was proposed. GDA-FSNC consists of four modules: a graph data pre-processing module based on structural similarity, a parameter initialization module, a parameter fine-tuning module, and an adaptive pseudo-label generation module. In the graph data pre-processing module, an adjacency matrix enhancement method based on structural similarity was used to obtain more graph structural information. In the parameter initialization module, to enhance the diversity of information during model training, a mutual teaching-based data augmentation method was used to make each model learn different patterns and features from the other models. In the adaptive pseudo-label generation module, appropriate pseudo-label generation techniques were selected automatically according to the characteristics of different datasets, thereby generating high-quality pseudo-label data. Experimental results on seven real datasets show that the proposed model achieves higher classification accuracy than state-of-the-art FSL models such as Meta-GNN, GPN (Graph Prototypical Network), and IA-FSNC (Information Augmentation for Few-Shot Node Classification). For example, compared to the baseline model IA-FSNC, the classification accuracy of the proposed model is improved by at least 0.27 percentage points in the 2-way 1-shot setting on the small dataset and by at least 2.06 percentage points in the 5-way 1-shot setting on the large datasets. It can be seen that GDA-FSNC has better classification performance and generalization ability in few-shot scenarios.
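The abstract does not define its structural-similarity measure. One simple, assumption-labelled way to enrich an adjacency structure is to connect non-adjacent nodes whose neighborhoods overlap strongly, measured here by Jaccard similarity; the threshold and the toy graph are hypothetical.

```python
from __future__ import annotations


def jaccard(a: set[int], b: set[int]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def enhance_adjacency(adj: dict[int, set[int]], threshold: float = 0.5) -> dict[int, set[int]]:
    """adj is an undirected graph as node -> set of neighbors (hypothetical rule).
    Returns a copy with an extra edge between every non-adjacent pair whose
    neighbor sets have Jaccard similarity >= threshold."""
    enhanced = {node: set(nbrs) for node, nbrs in adj.items()}
    nodes = sorted(adj)
    for idx, u in enumerate(nodes):
        for v in nodes[idx + 1:]:
            if v not in adj[u] and jaccard(adj[u], adj[v]) >= threshold:
                enhanced[u].add(v)
                enhanced[v].add(u)
    return enhanced


# Nodes 0 and 1 share both neighbors (2, 3) but are not linked; likewise 2 and 3.
graph = {0: {2, 3}, 1: {2, 3}, 2: {0, 1}, 3: {0, 1}}
print(enhance_adjacency(graph))
```

Similarity is computed on the original graph, so edges added during the pass do not influence later decisions.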

    Dense object counting network with few-shot similarity matching feature enhancement
    Binhong XIE, Wanyin GAO, Wangdong LU, Yingjun ZHANG, Rui ZHANG
    2025, 45(2):  403-410.  DOI: 10.11772/j.issn.1001-9081.2024010070

    To address the challenges of limited training data and diverse categories, a few-shot learning method was introduced. In view of the problems of existing dense object counting methods, such as unclear boundaries and spatial inconsistency of dense objects as well as the weak generalization capability of models, a few-shot Similarity Matching Feature Enhancement dense object counting Network (SMFENet) was proposed. Firstly, image features were extracted through the feature extraction module, and sample features were aligned using ROI Align. Secondly, a Similarity Comparison Feature Enhancement Module (SCFEM) was designed to calculate the similarity between sample features and image features, producing a similarity map. This map was used as weighting coefficients to adaptively enhance the image features with the sample features, so that the final enhanced features focus more on regions similar to the sample features. At the same time, internal feature enhancement, internal scale enhancement and information fusion were employed to address the unclear boundaries and spatial inconsistency of dense objects. Finally, a density map was generated by the density prediction module. Additionally, a content-aware annotation method was used to generate high-quality Ground-Truth density maps to further improve model accuracy. During testing, the network was adjusted with an adaptive loss to generalize to new categories. Experimental results on the FSC-147 and CARPK datasets show that, compared with existing few-shot counting methods, the proposed model reduces the Mean Absolute Error (MAE) to 13.82 and the Root Mean Squared Error (RMSE) to 45.91; compared with class-specific counting methods, it reduces the MAE to 4.16 and the RMSE to 5.91. These results show that SMFENet improves the accuracy and robustness of counting, demonstrating the practical application value of the model.
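A minimal sketch of similarity-weighted enhancement, under stated assumptions: cosine similarity between a pooled exemplar feature (standing in for the ROI-Aligned sample feature) and every spatial feature gives the similarity map, and each location is enhanced by adding the exemplar feature scaled by its non-negative similarity. The exact SCFEM weighting is not given in the abstract, so this rule is hypothetical.

```python
from __future__ import annotations

import math


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def similarity_map(feature_map: list[list[list[float]]], exemplar: list[float]) -> list[list[float]]:
    """feature_map is an H x W grid of C-dim vectors; returns H x W cosine scores."""
    return [[cosine(vec, exemplar) for vec in row] for row in feature_map]


def enhance(feature_map: list[list[list[float]]], exemplar: list[float]) -> list[list[list[float]]]:
    """Hypothetical rule: f' = f + max(sim, 0) * exemplar at each location, so regions
    that resemble the exemplar are emphasized and dissimilar ones are left unchanged."""
    sim = similarity_map(feature_map, exemplar)
    return [
        [[f + max(s, 0.0) * e for f, e in zip(vec, exemplar)] for vec, s in zip(row, sim_row)]
        for row, sim_row in zip(feature_map, sim)
    ]


# 1 x 2 toy feature map: the first location matches the exemplar, the second is orthogonal.
fmap = [[[1.0, 0.0], [0.0, 1.0]]]
print(enhance(fmap, [1.0, 0.0]))
```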

    Tri-modal adapter based on selective state space
    Hongye LIU, Xiai CHEN, Tao ZENG
    2025, 45(2):  411-420.  DOI: 10.11772/j.issn.1001-9081.2024010130

    The pre-training-then-fine-tuning paradigm is widely used in a variety of unimodal and multimodal tasks. However, as model size grows exponentially, fine-tuning all the parameters of a pre-trained model becomes very difficult. To solve this problem, a tri-modal adapter based on selective state space was designed, which freezes the pre-trained model, fine-tunes only a small number of additional parameters, and accomplishes intensive interactions among the three modalities. Specifically, a long-term semantic selection module based on selective state space and a short-term semantic interaction module centered on the visual or audio modality were proposed and inserted in sequence among the encoders to accomplish intensive interactions among the tri-modal information. The long-term semantic selection module aims to suppress redundant information in the three modalities, while the short-term semantic interaction module models short-term interactions among local modal features. Compared to previous methods that require pre-training on large-scale tri-modal datasets, the proposed method is more flexible and can inherit any powerful unimodal or bimodal model. On the Music-AVQA tri-modal evaluation dataset, the proposed method achieves an average accuracy of 80.19%, an improvement of 4.09 percentage points over LAVISH.
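The selective state space modules are not specified in enough detail to reproduce. The sketch below shows only the generic parameter-efficient idea the abstract relies on: a standard residual bottleneck adapter (a swapped-in illustration, not the proposed module) is the only trainable part, while the pre-trained layer it wraps stays frozen. The dimensions are hypothetical.

```python
from __future__ import annotations

import random


class BottleneckAdapter:
    """Residual bottleneck adapter: h + W_up @ ReLU(W_down @ h) (biases omitted).
    Only w_down and w_up would be trained; the wrapped backbone stays frozen."""

    def __init__(self, dim: int, bottleneck: int, seed: int = 0):
        rng = random.Random(seed)
        self.w_down = [[rng.gauss(0.0, 0.02) for _ in range(dim)] for _ in range(bottleneck)]
        # Zero-initialised up-projection: the adapter starts as the identity map,
        # so inserting it leaves the frozen model's outputs unchanged at first.
        self.w_up = [[0.0] * bottleneck for _ in range(dim)]

    def num_trainable(self) -> int:
        return sum(len(row) for row in self.w_down) + sum(len(row) for row in self.w_up)

    def __call__(self, h: list[float]) -> list[float]:
        z = [max(0.0, sum(w * x for w, x in zip(row, h))) for row in self.w_down]
        delta = [sum(w * zi for w, zi in zip(row, z)) for row in self.w_up]
        return [x + d for x, d in zip(h, delta)]


adapter = BottleneckAdapter(dim=768, bottleneck=16)
print(adapter.num_trainable())  # 24576 = 2 * 768 * 16
```

For comparison, a single dense 768 × 768 weight matrix in the frozen backbone already holds 589,824 parameters.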

    Semantic graph enhanced multi-modal recommendation algorithm
    Qijian CAI, Wei TAN
    2025, 45(2):  421-427.  DOI: 10.11772/j.issn.1001-9081.2024010145

    To mine the latent isomorphic semantic relationships within multi-modal information and learn better item representations, a Semantic Graph Enhanced Multi-modal Recommendation (SGEMR) algorithm was proposed. Specifically, auxiliary multi-modal information was utilized to complement historical user-item interactions, thereby capturing user preferences in different modalities. Subsequently, based on metric learning, the scattered sequence of items was reconstructed into a dense item-item semantic graph, and a semantic hierarchical attention mechanism was designed to integrate the multi-modal information of items. At the same time, a graph reconstruction loss function was proposed to retain more semantic relationships in item representations, thereby improving recommendation performance. Experimental results on three real datasets indicate that, compared to the best-performing baseline algorithm FREEDOM (FREEzes the item-item graph and DenOises the user-item interaction graph simultaneously for Multimodal recommendation), the proposed algorithm improves Recall@10 by 6.70%, 11.30%, and 5.09% respectively, and NDCG@10 by 9.09%, 12.73%, and 7.62% respectively. Moreover, the effectiveness of the proposed algorithm is validated through various ablation experiments.
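The abstract builds its item-item semantic graph with metric learning. A common, simpler construction in multi-modal recommendation, shown here as an assumption-labelled stand-in, links each item to its k most cosine-similar items in a modality feature space; the item IDs, features and k are hypothetical.

```python
from __future__ import annotations

import math


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def knn_item_graph(features: dict[str, list[float]], k: int) -> dict[str, list[str]]:
    """Map each item to its k nearest items by cosine similarity (self excluded).
    Ties are broken by item ID so the result is deterministic."""
    graph = {}
    for item, vec in features.items():
        scored = [(cosine(vec, other_vec), other) for other, other_vec in features.items() if other != item]
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        graph[item] = [other for _, other in scored[:k]]
    return graph


# Hypothetical image-embedding features for four items.
features = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0], "d": [0.1, 0.9]}
print(knn_item_graph(features, k=1))
```

Keeping only the top-k neighbors yields a sparse graph whose edges connect semantically similar items even when no user has interacted with both.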

    Data science and technology
    Time series event classification method fused with derived features
    Hanlin ZHANG, Junlu WANG, Baoyan SONG
    2025, 45(2):  428-435.  DOI: 10.11772/j.issn.1001-9081.2024020202

    Time series classification is the foundation of time series analysis. However, the morphological features used by existing time series classification methods cannot effectively serve as the basis for classification, and a single edge weight on the graph cannot accurately characterize the features between channels, resulting in low classification accuracy. Therefore, a Time Series Event Classification method Fused with Derived Features (TSEC-FDF) was proposed. Firstly, after a time series event set was constructed from the time series, mutation graphs, collaborative graphs, and heuristic graphs were constructed for each time series event to reduce noise interference in high-dimensional features. Secondly, the features of the multiple graphs were fused and treated as derived features, and features of time series events at multiple time levels were extracted. Finally, a multi-graph convolutional classification model fusing derived features was proposed, in which time series and graph features were cascaded as the high-dimensional features of time series events. Experimental results on 4 real datasets show that, compared to the TF-C (Time-Frequency Consistency) and Bi-directional Long Short-Term Memory-Hidden Markov Model (BL-HMM) methods, TSEC-FDF improves the accuracy, precision, recall, F1 score, AUROC (Area Under the Receiver Operating Characteristic curve), and AUPRC (Area Under the Precision-Recall Curve) by at least 3.2%, 4.7%, 7.8%, 6.3%, 0.9%, and 2.2% respectively.

    Robust shapelet representation method for time series
    Qianting ZHANG, Liying HU, Lifei CHEN
    2025, 45(2):  436-443.  DOI: 10.11772/j.issn.1001-9081.2024020163

    Time series data are widely used in various fields, so mining and representing their discriminative features is crucial. Owing to the data acquisition environment and equipment, time series data in many application fields are highly noisy, which places high demands on the robustness of data representation methods. Therefore, a Robust Shapelet representation method for Time series (TRS) was proposed, which adopts the Key-Shapelet (KS) feature extraction method to retain interpretability while reducing the influence of noise, and represents time series by a position distance measurement, thereby improving the robustness of the whole method. Experimental results on noise-disturbed time series data show that the features extracted by TRS are significantly better for classification than those of existing methods, and the average accuracy of TRS is 2.1 percentage points higher than that of the deep learning model Adversarial Dynamic Shapelet Network (ADSN), which also extracts features based on shapelets. It can be seen that the feature set extracted by TRS is more representative and robust.
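For readers unfamiliar with shapelets, a minimal sketch of the standard shapelet distance follows: the minimum Euclidean distance between a shapelet and all equal-length subsequences of a series. It also returns the best-matching position, since TRS uses a position-based distance measurement; how TRS actually combines position and distance is not stated in the abstract, and the z-normalization many shapelet methods apply is omitted.

```python
from __future__ import annotations

import math


def shapelet_distance(series: list[float], shapelet: list[float]) -> tuple[float, int]:
    """Minimum Euclidean distance between the shapelet and any subsequence of the same
    length, together with the start index of the best match."""
    m = len(shapelet)
    if m == 0 or m > len(series):
        raise ValueError("shapelet must be non-empty and no longer than the series")
    best, best_pos = math.inf, -1
    for start in range(len(series) - m + 1):
        dist = math.sqrt(sum((series[start + j] - shapelet[j]) ** 2 for j in range(m)))
        if dist < best:
            best, best_pos = dist, start
    return best, best_pos


def shapelet_transform(series: list[float], shapelets: list[list[float]]) -> list[float]:
    """Represent a series by its distance to each shapelet (a fixed-length feature vector)."""
    return [shapelet_distance(series, s)[0] for s in shapelets]


series = [0.0, 0.0, 1.0, 2.0, 1.0, 0.0, 0.0]
print(shapelet_distance(series, [1.0, 2.0, 1.0]))  # exact match at index 2
```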

    Multi-domain spatiotemporal hierarchical graph neural network for air quality prediction
    Handa MA, Yadong WU
    2025, 45(2):  444-452.  DOI: 10.11772/j.issn.1001-9081.2024010064

    In spatiotemporal hybrid models that integrate meteorological, spatial, and temporal information, temporal changes are usually modeled in one-dimensional space. Because one-dimensional sequences are confined to sliding windows and lack flexibility for multi-scale feature extraction, a Multi-domain SpatioTemporal Hierarchical Graph Neural Network (MST-HGNN) model was proposed. Firstly, two levels of hierarchical graphs, a city-wide global-scale graph and a station-level local-scale graph, were constructed for spatial relationship learning. Secondly, the one-dimensional air quality sequences were transformed into a set of two-dimensional tensors based on multiple periods, and multi-scale convolution in two-dimensional space was used to capture frequency domain features through periodic decoupling. At the same time, a Long Short-Term Memory (LSTM) network in one-dimensional space was employed to fit temporal features. Finally, to avoid aggregating redundant information, a gating mechanism fusion module was designed to fuse frequency domain and time domain features. Experimental results on the Urban-Air dataset and the Yangtze River Delta city cluster dataset show that, compared with the Multi-View Multi-Task Spatiotemporal Graph Convolutional Network (M2) model, the proposed model achieves lower Mean Absolute Error (MAE) and Root Mean Squared Error (RMSE) when predicting air quality 1 h, 3 h, 6 h, and 12 h ahead. It can be seen that MST-HGNN can decouple complex temporal patterns in the frequency domain, use frequency domain information to compensate for the limitations of temporal feature modeling, and predict air quality changes more comprehensively by combining time domain information.
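The abstract does not say how the periods are chosen. One common approach (used, for example, by TimesNet) is to take the strongest frequencies of the discrete Fourier spectrum, convert each to a period, and fold the 1-D series into a 2-D grid of shape (cycles × period), so that 2-D convolutions see both within-period and across-period variation. A pure-Python sketch under that assumption, with a hypothetical hourly series:

```python
from __future__ import annotations

import cmath
import math


def dominant_periods(x: list[float], top_k: int = 1) -> list[int]:
    """Top-k periods (len(x) // f) ranked by DFT amplitude, skipping the f = 0 mean term.
    A naive O(n^2) DFT is used for clarity; a real model would use an FFT."""
    n = len(x)
    spectrum = []
    for f in range(1, n // 2 + 1):
        coeff = sum(x[t] * cmath.exp(-2j * math.pi * f * t / n) for t in range(n))
        spectrum.append((abs(coeff), f))
    spectrum.sort(reverse=True)
    return [n // f for _, f in spectrum[:top_k]]


def fold_by_period(x: list[float], period: int) -> list[list[float]]:
    """Reshape into rows of length `period` (one row per cycle), zero-padding the tail."""
    rows = math.ceil(len(x) / period)
    padded = list(x) + [0.0] * (rows * period - len(x))
    return [padded[r * period:(r + 1) * period] for r in range(rows)]


# Four days of a hypothetical hourly pollutant signal with a 24 h cycle.
hourly = [50 + 10 * math.sin(2 * math.pi * t / 24) for t in range(96)]
period = dominant_periods(hourly)[0]
grid = fold_by_period(hourly, period)
print(period, len(grid), len(grid[0]))
```

In the folded grid, each column holds the same hour on different days, so a 2-D kernel can relate neighboring hours and neighboring days at once.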

    Cyber security
    Summary of network intrusion detection systems based on deep learning
    Miaolei DENG, Yupei KAN, Chuanchuan SUN, Haihang XU, Shaojun FAN, Xin ZHOU
    2025, 45(2):  453-466.  DOI: 10.11772/j.issn.1001-9081.2024020229

    Security mechanisms such as Intrusion Detection Systems (IDSs) have been used to protect network infrastructure and communication from network attacks. With the continuous progress of deep learning technology, deep learning-based IDSs have gradually become a research hotspot in network security. Based on extensive literature research, the latest research progress in network intrusion detection using deep learning was introduced in detail. Firstly, several types of IDS were briefly overviewed. Secondly, the datasets and evaluation metrics commonly used in deep learning-based IDSs were introduced. Thirdly, the deep learning models commonly used in network IDSs and their application scenarios were summarized. Finally, the problems faced by current research were discussed, and future development directions were proposed.
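The survey’s exact list of metrics is not reproduced here. As an illustration, the confusion-matrix metrics commonly reported for IDSs can be computed as follows, treating attacks as the positive class; the counts in the example are hypothetical.

```python
from __future__ import annotations


def detection_metrics(tp: int, fp: int, tn: int, fn: int) -> dict[str, float]:
    """Confusion-matrix metrics with 'attack' as the positive class."""
    total = tp + fp + tn + fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0  # also called detection rate
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "false_alarm_rate": fp / (fp + tn) if fp + tn else 0.0,
    }


metrics = detection_metrics(tp=90, fp=10, tn=880, fn=20)
print({name: round(value, 4) for name, value in metrics.items()})
```

With heavily imbalanced traffic, accuracy alone can look high even when many attacks are missed, which is why recall and false alarm rate are usually reported alongside it.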

    Survey on trusted execution environment towards privacy computing
    Han ZHANG, Hang YU, Jiwei ZHOU, Yunkai BAI, Lutan ZHAO
    2025, 45(2):  467-481.  DOI: 10.11772/j.issn.1001-9081.2024020222

    With the popularization of cloud computing and big data, a growing amount of private user data is uploaded to the cloud for computation and processing. However, because such data are stored and managed by untrusted third parties, they face the risk of privacy leakage, which threatens the safety of citizens’ lives and property and even national security. In recent years, several privacy-preserving techniques based on cryptographic algorithms, such as secure multi-party computation, Homomorphic Encryption (HE), and federated learning, have addressed the security issues in the transmission and computation of private data, making private data “usable but invisible”. However, these schemes have not been widely deployed due to their computational and communication complexity. At the same time, much research has been devoted to using the Trusted Execution Environment (TEE) to reduce the computational and communication complexity of privacy-preserving techniques while ensuring their security. TEEs create trusted execution environments with hardware assistance and ensure the confidentiality, integrity, and availability of the private data and code in them. Therefore, this review was conducted from the perspective of research combining privacy computing and TEEs. Firstly, the system architectures and hardware support with which TEEs protect user data privacy were analyzed comprehensively. Then, the advantages and disadvantages of existing TEE architectures were compared. Finally, in combination with the latest developments in industry and academia, the future development trends of the cross-research field of privacy computing and TEEs were discussed.

    Federated learning method based on adaptive differential privacy and client selection optimization
    Chao XU, Shufen ZHANG, Haitian CHEN, Lulu PENG, Shuaihua ZHANG
    2025, 45(2):  482-489.  DOI: 10.11772/j.issn.1001-9081.2024020162

    Applying differential privacy to federated learning has been one of the key techniques for protecting the privacy of training data. Most previous works do not consider the heterogeneity of parameters and clip training parameters uniformly, which leads to uniform noise addition in every round and thus affects model convergence and the quality of the training parameters. To address this issue, an adaptive noise addition scheme based on gradient clipping was proposed. Considering the heterogeneity of gradients, adaptive gradient clipping was performed for different clients in different rounds, allowing the noise magnitude to be adjusted adaptively. At the same time, to further improve model performance, and unlike traditional random client sampling methods, a client sampling method combining roulette-wheel selection and elite preservation was proposed. By combining these two methods, a Client Selection and Adaptive Gradient Clipping Differential Privacy_Federated Learning (CS&AGC DP_FL) method was proposed. Experimental results demonstrate that when the privacy budget is 0.5, compared to the Federated Learning method based on Adaptive Differential Privacy (Adapt DP_FL), the proposed method improves the final model’s classification accuracy by 4.9 percentage points under the same level of privacy constraints. Additionally, in terms of convergence speed, the proposed method requires 4 to 10 fewer rounds to converge than the compared methods.
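The abstract does not give the exact adaptive rule or the selection fitness. Within those gaps, the sketch below shows the two building blocks it names: per-client L2 gradient clipping followed by Gaussian noise whose scale is tied to the clipping bound (so an adaptive bound yields adaptive noise), and client sampling that keeps the top-scoring “elite” clients and fills the remaining slots by roulette-wheel (fitness-proportional) selection. The median-based clipping rule, the scores and all names are hypothetical, and no privacy accounting is included.

```python
from __future__ import annotations

import math
import random


def adaptive_clip_bound(recent_norms: list[float]) -> float:
    """Hypothetical adaptive rule: clip at the median of a client's recent gradient norms."""
    ordered = sorted(recent_norms)
    mid = len(ordered) // 2
    return ordered[mid] if len(ordered) % 2 else (ordered[mid - 1] + ordered[mid]) / 2


def clip_and_noise(grad: list[float], clip: float, noise_multiplier: float, rng: random.Random) -> list[float]:
    """Scale grad to L2 norm <= clip, then add N(0, (noise_multiplier * clip)^2) per coordinate.
    A smaller adaptive bound therefore also means less noise."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, clip / norm) if norm > 0 else 1.0
    sigma = noise_multiplier * clip
    return [g * scale + rng.gauss(0.0, sigma) for g in grad]


def select_clients(scores: dict[str, float], n_select: int, n_elite: int, rng: random.Random) -> list[str]:
    """Elite preservation + roulette wheel over non-negative scores, without replacement."""
    ranked = sorted(scores, key=lambda c: scores[c], reverse=True)
    chosen = ranked[:n_elite]
    pool = {c: scores[c] for c in ranked[n_elite:]}
    while len(chosen) < n_select and pool:
        spin = rng.uniform(0.0, sum(pool.values()))
        cumulative = 0.0
        for client, score in pool.items():
            cumulative += score
            if spin <= cumulative:
                break
        chosen.append(client)  # falls back to the last client if rounding overshoots
        del pool[client]
    return chosen


rng = random.Random(0)
noisy = clip_and_noise([3.0, 4.0], clip=adaptive_clip_bound([0.5, 1.0, 2.0]), noise_multiplier=1.0, rng=rng)
clients = select_clients({"c1": 0.9, "c2": 0.8, "c3": 0.1, "c4": 0.5}, n_select=3, n_elite=1, rng=rng)
```

Elitism guarantees the strongest clients participate every round, while the roulette wheel still gives lower-scoring clients a proportional chance, preserving some diversity.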

    FedAud: adaptive defense mechanism based on historical model updates
    Zhiqiang REN, Xuebin CHEN
    2025, 45(2):  490-496.  DOI: 10.11772/j.issn.1001-9081.2024030300

    Federated Learning (FL) has emerged as a promising method for training machine learning models on decentralized edge devices while protecting data privacy. However, FL systems are susceptible to Byzantine attacks, in which malicious clients compromise the integrity of the global model. Moreover, some existing defense methods incur large computational overheads. To address these problems, an adaptive defense mechanism, namely FedAud, was proposed, which aims to reduce the computational overhead of the server while ensuring the robustness of the FL system against Byzantine attacks. FedAud integrates an anomaly detection module and a reputation mechanism to adjust the defense strategy dynamically based on historical model updates. FedAud was evaluated on the MNIST and CIFAR-10 datasets under various attack scenarios and defense methods. Experimental results demonstrate that FedAud effectively reduces the execution frequency of defense methods, thereby alleviating the computational burden of the server and enhancing FL efficiency, particularly with defense methods that have high computational overheads or long training cycles. Furthermore, FedAud maintains model accuracy and even improves model performance in certain cases, verifying its effectiveness in real FL deployments.

    Privacy-preserving random consensus asset cross-chain scheme
    Baoyin WANG, Hongmei XUE, Qilie LIU, Tao GUO
    2025, 45(2):  497-505.  DOI: 10.11772/j.issn.1001-9081.2024020235

    As the blockchain ecosystem shifts from single-chain collaboration to multi-chain expansion, cross-chain technology has become a key path for driving application innovation and enhancing system capabilities. However, the absence of a unified identity authentication scheme in distributed ledgers with multi-chain architecture may pose potential privacy risks during cross-chain asset transactions. In response, a privacy-preserving random consensus asset cross-chain scheme was proposed. In the proposed scheme, a cross-chain model based on the Random Notary Multiple Signature (RNMS) architecture was introduced, and secure negotiation of shared keys between the two parties of a transaction was ensured through the Elliptic Curve Diffie-Hellman (ECDH) key agreement algorithm. Furthermore, considering the intermediate trust problem, a random notary selection algorithm based on an enhanced Algorand approach was designed. In this algorithm, the roulette-style labeling method for random notary selection was improved into a verifiable pseudo-random method, thereby reducing the risks associated with pseudo-random selection and ensuring the security and decentralization of cross-chain interactions. After that, Byzantine Agreement (BA) was improved to reduce the communication cost of consensus, and algorithmic simulations were performed on a BFT simulation platform. Experimental results demonstrate that, compared to the Algorand algorithm, the proposed scheme achieves higher node privacy verification, improves cross-chain message consensus efficiency by 89%, and reduces node message communication cost by 80% through notary mechanism verification. These results show that the proposed scheme can effectively improve cross-chain security.
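Algorand-style selection relies on a Verifiable Random Function (VRF), which the Python standard library does not provide. As a simplified, assumption-labelled stand-in, the sketch below derives each candidate’s ticket from SHA-256 over a shared round seed and its identifier and picks the lowest tickets, so anyone holding the seed and candidate list can recompute and verify the result. Unlike a VRF, this plain hash lets everyone predict the selection as soon as the seed is known; the seed and node names are hypothetical.

```python
from __future__ import annotations

import hashlib


def selection_ticket(seed: bytes, node_id: str) -> float:
    """Map (seed, node_id) to a pseudo-random number in [0, 1) via SHA-256."""
    digest = hashlib.sha256(seed + node_id.encode("utf-8")).digest()
    # Keep 53 bits so the float is exact and strictly below 1.0.
    return (int.from_bytes(digest[:8], "big") >> 11) / 2**53


def select_notaries(seed: bytes, candidates: list[str], k: int) -> list[str]:
    """Choose the k candidates with the lowest tickets (ties broken by ID)."""
    return sorted(candidates, key=lambda node: (selection_ticket(seed, node), node))[:k]


nodes = [f"node-{i}" for i in range(10)]
notaries = select_notaries(b"round-42", nodes, k=3)
print(notaries)
```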

    Federated learning privacy protection scheme based on local differential privacy for remote sensing data
    Haitian CHEN, Xuebin CHEN, Ruikui MA, Shuaihua ZHANG
    2025, 45(2):  506-517.  DOI: 10.11772/j.issn.1001-9081.2024020249

    Remote sensing data have high spatio-temporal correlation and complex surface features, which makes protecting their privacy challenging. As a distributed learning method designed to protect participants' data privacy, federated learning offers an effective way to address these challenges. However, during the training phase of federated learning models, malicious attackers may infer participants' private information through inversion attacks, leading to the disclosure of sensitive information. To address the privacy leakage problem of remote sensing data in federated learning training, a federated learning privacy protection scheme based on local differential privacy was proposed. Firstly, the model was pre-trained, the importance of each layer was calculated, and the privacy budget was allocated according to layer importance. Then, local differential privacy protection was achieved by clipping the model updates and applying adaptive random perturbation to the clipped values. Finally, model correction was applied to the aggregated perturbed updates to further improve model performance. Theoretical analysis and simulation results show that the proposed scheme not only provides appropriate differential privacy protection for each participant and effectively prevents sensitive information from being inferred through inversion, but also outperforms a segmentation mechanism-based perturbation scheme in accuracy by 3.28 to 3.93 percentage points on three remote sensing datasets. It can be seen that the proposed scheme effectively guarantees model performance while ensuring privacy.
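
    The budget allocation, clipping, and perturbation steps can be sketched as follows under stated assumptions: the budget is split in proportion to layer importance, and the "adaptive random perturbation" is approximated by Laplace noise calibrated to each layer's budget. The function names, clipping bound, and proportional rule are illustrative assumptions, not the authors' exact mechanism.

```python
import random


def allocate_budget(importances, total_epsilon):
    """Split the total privacy budget across layers in proportion to importance (assumed rule)."""
    total = sum(importances)
    return [total_epsilon * imp / total for imp in importances]


def laplace_noise(scale, rng):
    """Laplace(0, scale) sample: difference of two i.i.d. exponentials with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def perturb_update(layers, epsilons, clip, rng):
    """Clip every update value to [-clip, clip], then add Laplace noise per layer.

    A clipped value can change by at most 2 * clip, so noise with scale 2 * clip / eps
    gives each released value eps-LDP for its layer's eps (per coordinate; composition
    across coordinates is not accounted for in this sketch).
    """
    noisy = []
    for values, eps in zip(layers, epsilons):
        scale = 2.0 * clip / eps
        clipped = [max(-clip, min(clip, v)) for v in values]
        noisy.append([v + laplace_noise(scale, rng) for v in clipped])
    return noisy


rng = random.Random(0)
epsilons = allocate_budget([0.5, 0.3, 0.2], total_epsilon=4.0)
update = [[0.8, -2.5], [0.1], [3.0, -0.2, 0.05]]
noisy_update = perturb_update(update, epsilons, clip=1.0, rng=rng)
```

    A more important layer receives a larger share of epsilon and therefore less noise, which is the intuition behind importance-based allocation.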

    Vertical federated learning enterprise emission prediction model with integration of electricity data
    Xinyan WANG, Jiacheng DU, Lihong ZHONG, Wangwang XU, Boyu LIU, Wei SHE
    2025, 45(2):  518-525.  DOI: 10.11772/j.issn.1001-9081.2024020173

    To address the difficulty of monitoring and controlling enterprise emissions, a Vertical Federated Learning Enterprise Emission Prediction (VFL-EEP) model integrating electricity data was proposed, with secure data sharing and privacy protection as prerequisites. Firstly, within the framework of Vertical Federated Learning (VFL), the logistic regression model was enhanced to separate data usage from model training without leaking the electricity and environmental protection monitoring data of enterprises. Then, the logistic regression algorithm was improved by incorporating Paillier encryption to secure the transmission of model parameters, thereby effectively solving the problem of insecure communication among participants in VFL. Finally, experiments on simulated data compared the pollution prediction results of the proposed model with those of a centralized logistic regression model. The results show that the proposed model integrates electricity data while preserving privacy, and improves accuracy, recall, precision, and F1 score by 8.92%, 7.62%, 3.95%, and 11.86%, respectively, effectively balancing privacy protection and model performance.
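
    To illustrate why Paillier encryption suits parameter exchange in VFL, the sketch below implements textbook Paillier with generator g = n + 1 and shows its additive homomorphism: ciphertexts of two parties' gradient terms can be multiplied so that the product decrypts to their sum. The toy primes, the fixed-point scale, and all names are assumptions for illustration; real keys use primes of 1024 bits or more.

```python
import math
import random

# Toy primes (the 10,000th and 100,000th primes) for illustration only.
P_TOY, Q_TOY = 104729, 1299709
SCALE = 10**4  # fixed-point scale for real-valued model parameters (assumed)


def keygen(p=P_TOY, q=Q_TOY):
    """Return (public n, private key) for textbook Paillier with g = n + 1."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)  # L(g^lam mod n^2) = lam mod n when g = n + 1
    return n, (lam, mu, n)


def encrypt(n, m, rng):
    n2 = n * n
    r = rng.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = rng.randrange(1, n)
    # (n + 1)^m mod n^2 equals 1 + m*n, so g^m needs no large exponentiation.
    return (1 + (m % n) * n) * pow(r, n, n2) % n2


def decrypt(priv, c):
    lam, mu, n = priv
    m = (pow(c, lam, n * n) - 1) // n * mu % n
    return m - n if m > n // 2 else m  # map back to a signed integer


def add_encrypted(n, c1, c2):
    """Homomorphic addition: Enc(m1) * Enc(m2) mod n^2 decrypts to m1 + m2."""
    return c1 * c2 % (n * n)


def encode(x):
    return round(x * SCALE)


def decode(m):
    return m / SCALE


n, priv = keygen()
rng = random.Random(42)
# Two parties encrypt their local gradient terms; only the sum is ever decrypted.
c_sum = add_encrypted(n, encrypt(n, encode(0.1234), rng), encrypt(n, encode(-0.5), rng))
print(decode(decrypt(priv, c_sum)))  # -0.3766
```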

    Abnormal attack detection for underwater acoustic communication network
    Dixin WANG, Jiahao WANG, Min LI, Hao CHEN, Guangyao HU, Yu GONG
    2025, 45(2):  526-533.  DOI: 10.11772/j.issn.1001-9081.2024030283

    In recent years, underwater acoustic communication networks have played a crucial role in underwater information transmission. Due to the open nature of underwater communication channels, they are more prone to attacks such as interference, tampering, and eavesdropping, so underwater acoustic communication networks face security challenges different from those of traditional networks. However, traditional anomaly detection methods have low accuracy when applied directly to underwater acoustic networks. Meanwhile, although machine learning-based anomaly detection methods improve accuracy, they face problems such as limited datasets and poor model interpretability. Therefore, a CNN-BiLSTM integrating an attention mechanism was applied to abnormal attack detection in underwater acoustic networks, and the WCBA (underWater CNN-BiLSTM-Attention) model was proposed. In the model, the dimensionality of the dataset was reduced effectively by the IG-PCA (Integrated Gradient-Principal Component Analysis) feature selection algorithm, and abnormal attacks in complex underwater data were identified by fully utilizing the spatio-temporal features of multi-dimensional matrix acoustic network traffic. Experimental results show that the WCBA model achieves higher accuracy and interpretability than other neural network models when the dataset is limited.
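
    The abstract does not describe how IG-PCA combines integrated gradients with PCA. As a hedged sketch of the integrated-gradients half only, the code below attributes the output of an assumed logistic model to each input feature using a midpoint Riemann sum; features could then be ranked by absolute attribution before dimensionality reduction. The model, step count, and names are illustrative assumptions.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def model(x, w, b):
    """Assumed stand-in classifier: F(x) = sigmoid(w . x + b)."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def integrated_gradients(x, baseline, w, b, steps=200):
    """IG_i = (x_i - x'_i) * integral_0^1 dF/dx_i(x' + a (x - x')) da, midpoint Riemann sum."""
    diff = [xi - bi for xi, bi in zip(x, baseline)]
    avg_grad = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps
        point = [bi + alpha * di for bi, di in zip(baseline, diff)]
        s = model(point, w, b)
        for i, wi in enumerate(w):
            avg_grad[i] += s * (1.0 - s) * wi / steps  # dF/dx_i = s (1 - s) w_i
    return [di * gi for di, gi in zip(diff, avg_grad)]


def rank_features(attributions):
    """Feature indices sorted by absolute attribution, most important first."""
    return sorted(range(len(attributions)), key=lambda i: -abs(attributions[i]))


x, baseline = [1.0, -2.0, 0.5], [0.0, 0.0, 0.0]
w, b = [0.8, 0.3, -1.5], 0.1
ig = integrated_gradients(x, baseline, w, b)
print(rank_features(ig))  # [0, 2, 1]
```

    A useful check is the completeness property: the attributions sum to F(x) - F(baseline), up to the Riemann-sum error.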

    Advanced computing
    Collaborative crowdsourcing task allocation method fusing community detection
    Linbo HU, Zhiwei NI, Jiale CHENG, Wentao LIU, Xuhui ZHU
    2025, 45(2):  534-545.  DOI: 10.11772/j.issn.1001-9081.2024030274

    To address the neglect of workers' collaborative relationships in traditional collaborative crowdsourcing task allocation, a collaborative crowdsourcing task allocation method fusing community detection was proposed, taking into account the social and historical cooperative relationships among workers. Firstly, potential social relationships among crowdsourced workers were mined by a community detection algorithm to establish candidate communities. Secondly, after defining factors such as degree of collaboration, interaction cost, and task allocation utility, a collaborative crowdsourcing task allocation model was developed that jointly considers skill coverage, credibility, and budget. Thirdly, strategies including Piece-Wise chaotic mapping, an inverse cumulative distribution function operator based on the Cauchy distribution, an adaptive tangent flight operator, and the sparrow warning mechanism were introduced, yielding TSCSO, an optimized Sand Cat Swarm Optimization (SCSO) algorithm. Finally, the TSCSO algorithm was used to solve the above model. Experimental results on instances synthesized from real datasets of different scales demonstrate that the proposed algorithm achieves a task allocation success rate of at least 90%. Furthermore, TSCSO improves the average task allocation utility by 20.08% to 53.38% compared with other optimized intelligent algorithms, verifying the applicability, stability, and efficacy of the proposed algorithm for collaborative crowdsourcing task allocation.
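
    Of the four TSCSO strategies, the Piece-Wise chaotic mapping is the simplest to show in isolation. The sketch below assumes the piecewise linear chaotic map commonly used for swarm initialization, with control parameter p = 0.4 and seed x0 = 0.7 chosen arbitrarily, and scales each chaotic value into the decision-variable bounds. The other operators and the allocation model are not reproduced.

```python
def piecewise_map(x, p=0.4):
    """One iteration of the piecewise linear chaotic map on [0, 1], with 0 < p < 0.5."""
    if x < p:
        value = x / p
    elif x < 0.5:
        value = (x - p) / (0.5 - p)
    elif x < 1 - p:
        value = (1 - p - x) / (0.5 - p)
    else:
        value = (1 - x) / p
    return min(max(value, 0.0), 1.0)  # guard against floating-point drift outside [0, 1]


def chaotic_population(pop_size, lower, upper, x0=0.7, p=0.4):
    """Initialize a population by iterating the map and scaling each value to [lower_j, upper_j]."""
    x = x0
    population = []
    for _ in range(pop_size):
        individual = []
        for lo, hi in zip(lower, upper):
            x = piecewise_map(x, p)
            individual.append(lo + x * (hi - lo))
        population.append(individual)
    return population


sand_cats = chaotic_population(30, lower=[-5.0] * 4, upper=[5.0] * 4)
```

    Compared with uniform random initialization, a chaotic sequence is deterministic given its seed yet spreads individuals across the search space, which is the usual motivation for this strategy.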

    DeepsORF: coding sORFs prediction method based on graph coding with improved flow attention
    Dongmei XIE, Xinye BIAN, Lianfei YU, Wenbo LIU, Ziling WANG, Zhijian QU, Jiafeng YU
    2025, 45(2):  546-555.  DOI: 10.11772/j.issn.1001-9081.2024020177

    Small Open Reading Frames (sORFs) play a critical role in various biological processes, and accurately distinguishing coding from non-coding sORFs is a significant and challenging task in genomics. Most existing algorithms for predicting coding sORFs rely heavily on manual features derived from prior biological knowledge and lack universality, and the variable lengths of raw sORF sequences prevent them from being fed directly into prediction models. To address these problems, DeepsORF, an end-to-end deep learning framework based on the sORF-Graph graph encoding method, was developed for predicting coding sORFs. Firstly, all sORF sequences were encoded into corresponding graphs through sORF-Graph, and the input sequences were standardized by encoding sequence information into graph element features. Then, a convolutional and residual flow attention mechanism was introduced to capture interactions among distant bases within sORFs, thereby enhancing the representation of sORF features and improving prediction accuracy. Experimental results demonstrate that the DeepsORF framework improves performance on all six independent test sets. Compared with the csORF-finder method, DeepsORF achieves increases of 9.97, 19.49, and 13.07 percentage points in accuracy, Matthews Correlation Coefficient (MCC), and precision, respectively, on the D. melanogaster nonCDS-sORFs test set, validating the effectiveness and generalization ability of DeepsORF in identifying coding and non-coding sORFs.
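
    The abstract does not define sORF-Graph's nodes or edges. The hypothetical k-mer transition graph below only illustrates the general idea that graph encoding turns sequences of any length into fixed-size inputs: nodes are all 4^k possible k-mers, and edge weights count consecutive k-mer transitions. It is not the authors' encoding.

```python
from itertools import product

NUCLEOTIDES = "ACGT"


def kmer_graph(sequence, k=2):
    """Encode a variable-length sequence as a fixed-size weighted adjacency matrix.

    Nodes are all 4**k k-mers in a fixed order; adjacency[u][v] counts how often
    k-mer v directly follows k-mer u. Windows containing non-ACGT symbols are skipped.
    """
    kmers = ["".join(t) for t in product(NUCLEOTIDES, repeat=k)]
    index = {kmer: i for i, kmer in enumerate(kmers)}
    adjacency = [[0] * len(kmers) for _ in kmers]
    seq = sequence.upper()
    windows = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    for a, b in zip(windows, windows[1:]):
        if a in index and b in index:
            adjacency[index[a]][index[b]] += 1
    return kmers, adjacency


kmers, adjacency = kmer_graph("ATGCGA")  # 16 x 16 matrix regardless of sequence length
```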

    Recurrence formula for initial value problems of fractional-order autonomous dynamics system and its application
    Wujiu FU, Lin ZHOU, Jianjie DENG, Yong YOU
    2025, 45(2):  556-562.  DOI: 10.11772/j.issn.1001-9081.2024030289

    When fractional-order differential dynamical systems are computed numerically by discretizing the differential equation directly, long-term memory storage becomes a difficulty. To solve this problem, the differential equation was first integrated once and then discretized. At the same time, a recurrence formula was given and its applicable conditions were discussed. Several common nonlinear problems were computed with this formula, and the results were consistent with those of other numerical methods. Since it has not been settled whether chaotic motion exists in two-dimensional fractional-order continuous dynamical systems, the recurrence formula was used to study the two-dimensional continuous coupled Logistic model. It was found that this system has only the limit cycle generated from the equilibrium point through a Hopf bifurcation, without chaotic motion. Finally, a Lyapunov exponent criterion for the motion of the two-dimensional fractional-order continuous Logistic system was given.
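
    The integrate-then-discretize idea can be illustrated with the standard explicit fractional Euler (product rectangle) scheme, assuming a Caputo derivative with 0 < alpha <= 1. This is a generic textbook recurrence, not necessarily the authors' formula or applicability conditions; the history sum in it is where the long-memory cost arises.

```python
import math


def fractional_euler(f, x0, alpha, h, steps):
    """Explicit product-rectangle scheme for the Caputo problem D^alpha x = f(x), x(0) = x0.

    Integrating once turns the equation into the Volterra form
        x(t) = x0 + 1/Gamma(alpha) * integral_0^t (t - s)^(alpha - 1) f(x(s)) ds,
    and freezing f on each step of length h gives the recurrence
        x_{n+1} = x0 + h^alpha / Gamma(alpha + 1) * sum_{j=0}^{n} b_{n-j} f(x_j),
        b_m = (m + 1)^alpha - m^alpha.
    """
    c = h**alpha / math.gamma(alpha + 1)
    b = [(m + 1) ** alpha - m**alpha for m in range(steps)]  # memory weights
    xs, fs = [x0], []
    for n in range(steps):
        fs.append(f(xs[n]))
        # Every past f(x_j) contributes: this history sum is the long-memory cost.
        xs.append(x0 + c * sum(b[n - j] * fs[j] for j in range(n + 1)))
    return xs


decay = fractional_euler(lambda x: -x, x0=1.0, alpha=0.8, h=0.01, steps=100)
```

    With alpha = 1 every memory weight equals 1 and the recurrence reduces to the ordinary forward Euler method, which is a convenient sanity check.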

    Network and communications
    Intelligent joint power and channel allocation algorithm for Wi-Fi7 multi-link integrated communication and sensing
    Jing WANG, Xuming FANG
    2025, 45(2):  563-570.  DOI: 10.11772/j.issn.1001-9081.2024020191

    To solve the problem of joint power and channel resource allocation for integrated communication and sensing in multi-link transmission of next-generation Wi-Fi7 devices, a multi-link multi-agent reinforcement learning algorithm for Joint Power Control and channel allocation based on QMIX (Q-learning Mixing Network), named JPCQMIX, was proposed on the basis of the distinctive upper and lower Media Access Control (MAC) layer structure of the Multi-Link Device (MLD). In the algorithm, each lower-layer MAC, i.e., each link, was regarded as an agent, and a mixing network was set up in the upper-layer MAC to process the local value functions of all lower-layer MACs, thereby achieving centralized training. After training, each lower-layer MAC entered distributed execution mode and interacted independently with its local environment to make power control and channel allocation decisions. Simulation results show that the proposed algorithm improves communication throughput by 20.51% and 29.10% compared with the Multi-Agent Deep Q Network (MADQN) algorithm and the traditional heuristic Particle Swarm Optimization (PSO) algorithm, respectively. Meanwhile, the proposed algorithm demonstrates better robustness under different sensing accuracy thresholds and different minimum link Signal-to-Interference-plus-Noise Ratios (SINR). It can be seen that JPCQMIX effectively enhances the system's communication throughput while satisfying the sensing accuracy requirement.
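
    The centralized-training part rests on QMIX's monotonic mixing network. The pure-Python sketch below shows the mechanism under simplifying assumptions (single-layer linear hypernetworks with random weights and one ELU hidden layer); in the Wi-Fi7 setting, each agent Q-value would come from one lower-layer MAC and the mixer would sit in the upper-layer MAC. Class and parameter names are illustrative.

```python
import math
import random


def elu(v):
    return v if v > 0 else math.exp(v) - 1.0


def matvec(matrix, vector):
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def random_matrix(rng, rows, cols):
    return [[rng.gauss(0.0, 0.5) for _ in range(cols)] for _ in range(rows)]


class MonotonicMixer:
    """QMIX-style mixer: Q_tot = f(Q_1, ..., Q_n; s) with dQ_tot/dQ_i >= 0 for every agent i.

    Hypernetworks map the global state s to the mixing weights; abs() keeps those weights
    non-negative, so raising any agent's Q-value never lowers Q_tot and each agent's
    greedy local action stays consistent with the greedy joint action.
    """

    def __init__(self, n_agents, state_dim, hidden=8, seed=0):
        rng = random.Random(seed)
        self.n_agents, self.hidden = n_agents, hidden
        self.hyper_w1 = random_matrix(rng, n_agents * hidden, state_dim)
        self.hyper_b1 = random_matrix(rng, hidden, state_dim)
        self.hyper_w2 = random_matrix(rng, hidden, state_dim)
        self.hyper_b2 = random_matrix(rng, 1, state_dim)

    def __call__(self, agent_qs, state):
        w1 = [abs(v) for v in matvec(self.hyper_w1, state)]  # non-negative, n_agents * hidden
        b1 = matvec(self.hyper_b1, state)                     # biases may be negative
        w2 = [abs(v) for v in matvec(self.hyper_w2, state)]   # non-negative, hidden
        b2 = matvec(self.hyper_b2, state)[0]
        layer = [
            elu(sum(w1[i * self.hidden + h] * q for i, q in enumerate(agent_qs)) + b1[h])
            for h in range(self.hidden)
        ]
        return sum(w * z for w, z in zip(w2, layer)) + b2


mixer = MonotonicMixer(n_agents=3, state_dim=4)
q_tot = mixer([1.0, 0.5, -0.3], [0.2, -1.0, 0.5, 0.3])
```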

    Dynamic allocation algorithm for multi-beam subcarriers of low orbit satellites based on deep reinforcement learning
    Huahua WANG, Liang HUANG, Jiajie CHEN, Jiening FANG
    2025, 45(2):  571-577.  DOI: 10.11772/j.issn.1001-9081.2024030306

    In the multi-beam resource allocation problem of Low Earth Orbit (LEO) satellites, factors such as inter-beam interference and noise in actual satellite communication environments are complex and variable, so conventional dynamic subcarrier allocation algorithms cannot adjust their parameters dynamically to adapt to changes in the communication environment. To address this, traditional communication scheduling algorithms were combined with reinforcement learning techniques: with the goal of minimizing the user packet loss rate, users' scheduling was adjusted dynamically and the resources of the entire satellite communication system were allocated dynamically to adapt to environmental changes. The dynamic characteristic model of the LEO satellite was discretized by time slot division, and a Deep Reinforcement Learning (DRL)-based resource allocation strategy was proposed on the basis of modeling the LEO satellite resource allocation scenario. In this strategy, scheduling opportunities for high-latency users were increased by adjusting the satellite scheduling queue, that is, by adjusting the correspondence between the resource blocks in each beam of a single LEO satellite and user eligibility, thereby ensuring a certain level of fairness while reducing the user packet loss rate. Simulation results show that, under the total power constraint, user transmission fairness and system throughput remain stable with the proposed Deep Reinforcement Learning based Resource Allocation algorithm (DRL-RA), and users with large latency obtain more scheduling opportunities because of priority improvement. Compared with the Proportional Fairness (PF) algorithm and the Maximum Carrier/Interference (Max C/I) algorithm, DRL-RA reduces the data packet loss rate by 13.9% and 15.6%, respectively. It can be seen that the proposed algorithm effectively solves the problem of packet loss during data transmission.
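
    DRL-RA itself is not specified in enough detail to reproduce, but the two baselines it is compared against have standard definitions. The sketch below shows them for a single resource block per slot; the exponentially weighted throughput window of 100 slots is an assumption.

```python
def max_ci(rates):
    """Max C/I: serve the user with the best instantaneous channel (highest achievable rate)."""
    return max(range(len(rates)), key=lambda i: rates[i])


def proportional_fair(rates, avg_throughput, eps=1e-9):
    """PF: serve the user maximizing instantaneous rate / recent average throughput."""
    return max(range(len(rates)), key=lambda i: rates[i] / max(avg_throughput[i], eps))


def update_average(avg_throughput, served, rates, window=100.0):
    """Exponentially weighted average throughput with a time constant of `window` slots."""
    beta = 1.0 / window
    return [
        (1 - beta) * avg + beta * (rates[i] if i == served else 0.0)
        for i, avg in enumerate(avg_throughput)
    ]
```

    Max C/I always serves the user with the strongest channel, which maximizes throughput but can starve users with weak channels; PF trades some throughput for fairness by dividing by each user's recent average.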

    Multimedia computing and computer simulation
    Symbolic music generation with pre-training
    Yuchen HONG, Jinlong LI
    2025, 45(2):  578-583.  DOI: 10.11772/j.issn.1001-9081.2024030264

    To address the lack of sufficient paired multi-track music score datasets in music representation learning, a pre-trained music generation model was proposed. Firstly, because multi-track music generation must ensure both continuity within each track and harmony between tracks, a Transformer-based multi-generator model named MMGPNet (Multi-track Music Generation with Pre-training Network) was proposed as the baseline model. Secondly, to exploit abundant single-track instrument datasets, a music pre-training module was designed on top of the generation model. Finally, a reconstruction task was designed for the pre-training process, in which attributes of musical notes were masked and then reconstructed. Experimental results show that the proposed model accelerates training and improves prediction accuracy. Besides, compared with baseline models such as MuseGAN (Multi-track sequential Generative Adversarial Network) and SymphonyNet, the various music evaluation metrics of the generated multi-track sequences are closer to those of real music. A listening test further confirms the validity of the proposed model.
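
    The masked-attribute reconstruction task can be sketched as follows. Representing each note as a dict of attributes, the attribute names, and the 15% masking rate are assumptions; during pre-training the model would be asked to predict the values in `targets` at the masked positions.

```python
import random

MASK = "<mask>"


def mask_note_attributes(notes, mask_prob=0.15, seed=None):
    """Mask individual note attributes for a reconstruction pre-training task.

    Each attribute is independently replaced by MASK with probability `mask_prob`;
    the original values are returned as targets keyed by (note index, attribute name).
    """
    rng = random.Random(seed)
    masked, targets = [], {}
    for i, note in enumerate(notes):
        new_note = {}
        for attr, value in note.items():
            if rng.random() < mask_prob:
                new_note[attr] = MASK
                targets[(i, attr)] = value
            else:
                new_note[attr] = value
        masked.append(new_note)
    return masked, targets


notes = [
    {"pitch": 60, "duration": 4, "velocity": 80},  # hypothetical attribute names
    {"pitch": 64, "duration": 2, "velocity": 90},
]
masked, targets = mask_note_attributes(notes, mask_prob=0.5, seed=3)
```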

    Panoptic scene graph generation method based on relation feature enhancement
    Linhao LI, Yize WANG, Yingshuang LI, Yongfeng DONG, Zhen WANG
    2025, 45(2):  584-593.  DOI: 10.11772/j.issn.1001-9081.2024010139

    Panoptic Scene Graph Generation (PSGG) aims to automatically identify all objects within an image and capture the intricate semantic associations among them. Modeling semantic associations depends on the feature descriptions of target objects and subject-object pairs. However, current methods have several limitations: object features extracted from bounding boxes are ambiguous; the methods focus only on the semantic and spatial position features of individual objects while ignoring the joint semantic features and relative position features of subject-object pairs, which are equally essential for accurate relation prediction; and they do not extract features differently for different types of subject-object pairs (e.g., foreground-foreground, foreground-background, background-background), ignoring their inherent differences. To address these challenges, a PSGG method based on Relation Feature Enhancement (RFE) was proposed. Firstly, pixel-level mask regional features were introduced to enrich the detail of object features, and the joint visual features, joint semantic features, and relative position features of subject-object pairs were integrated effectively. Secondly, the most suitable feature extraction method was selected adaptively according to the type of each subject-object pair. Finally, the enhanced and more accurate relation features were used for relation prediction. Experimental results on the PSG dataset demonstrate that, with VCTree (Visual Contexts Tree), Motifs, IMP (Iterative Message Passing), and GPSNet as baseline methods and ResNet-101 as the backbone network, RFE achieves increases of 4.37, 3.68, 2.08, and 1.80 percentage points, respectively, in R@20 on the challenging SGGen task. These results validate the effectiveness of the proposed method for PSGG.
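
    Two of the ingredients described above, a relative position feature for a subject-object pair and the pair-type label used to choose an extractor, can be sketched as follows. The box-delta parameterization is a common choice assumed here, not necessarily RFE's exact feature, and the function names are hypothetical.

```python
import math


def relative_position(subj, obj):
    """Relative position feature of a subject-object pair from (x1, y1, x2, y2) boxes.

    Centre offsets are normalized by the subject size and size ratios are log-encoded.
    """
    sw, sh = subj[2] - subj[0], subj[3] - subj[1]
    ow, oh = obj[2] - obj[0], obj[3] - obj[1]
    scx, scy = subj[0] + sw / 2, subj[1] + sh / 2
    ocx, ocy = obj[0] + ow / 2, obj[1] + oh / 2
    return [(ocx - scx) / sw, (ocy - scy) / sh, math.log(ow / sw), math.log(oh / sh)]


def pair_type(subj_is_thing, obj_is_thing):
    """Label a pair by whether each side is foreground ("thing") or background ("stuff")."""
    names = {True: "foreground", False: "background"}
    return f"{names[subj_is_thing]}-{names[obj_is_thing]}"


feature = relative_position((0, 0, 10, 10), (10, 0, 30, 10))
kind = pair_type(True, False)  # key for selecting a type-specific extractor
```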

    Multi-focus image fusion network with cascade fusion and enhanced reconstruction
    Benchen YANG, Haoran LI, Haibo JIN
    2025, 45(2):  594-600.  DOI: 10.11772/j.issn.1001-9081.2024030302

    Aiming at the problem of partially focused images caused by improper focusing on near and far fields of view during digital image capture, a multi-focus image fusion Network with Cascade fusion and enhanced reconstruction (CasNet) was proposed. Firstly, a cascade sampling module was constructed to compute and merge the residuals of feature maps sampled at different depths, so as to make efficient use of focused features at different scales. Secondly, a lightweight multi-head self-attention mechanism was improved to perform dimensional residual calculation on feature maps, enhancing image features and giving the feature maps a better distribution across dimensions. Thirdly, stacked convolutional channel attention was used to complete feature reconstruction. Finally, interval convolution was used for up- and down-sampling during the sampling process, so as to retain more of the original image features. Experimental results demonstrate that CasNet achieves better results than popular methods such as SESF-Fuse (Spatially Enhanced Spatial Frequency-based Fusion) and U2Fusion (Unified Unsupervised Fusion network) in metrics such as Average Gradient (AG) and Gray-Level Difference (GLD) on the multi-focus image benchmark test sets Lytro, MFFW, grayscale, and MFI-WHU.
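
    Average Gradient, one of the reported metrics, measures image sharpness: a well-fused multi-focus image keeps the sharp edges of each in-focus source. A minimal sketch of one common definition follows; papers differ slightly in normalization, so values may not be directly comparable with the reported ones.

```python
import math


def average_gradient(image):
    """Average Gradient (AG) of a grayscale image given as a list of rows.

    AG = 1/((M-1)(N-1)) * sum over i < M-1, j < N-1 of sqrt((dx^2 + dy^2) / 2),
    with forward differences dx = I[i][j+1] - I[i][j] and dy = I[i+1][j] - I[i][j].
    """
    rows, cols = len(image), len(image[0])
    total = 0.0
    for i in range(rows - 1):
        for j in range(cols - 1):
            dx = image[i][j + 1] - image[i][j]
            dy = image[i + 1][j] - image[i][j]
            total += math.sqrt((dx * dx + dy * dy) / 2.0)
    return total / ((rows - 1) * (cols - 1))
```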

    Lightweight image super-resolution reconstruction based on asymmetric information distillation network
    Haiteng MENG, Xiaole ZHAO, Tianrui LI
    2025, 45(2):  601-609.  DOI: 10.11772/j.issn.1001-9081.2024030276

    Deep Convolutional Neural Networks (CNNs) have shown impressive performance in image super-resolution reconstruction. However, many current methods have a large number of model parameters, making them unsuitable for devices with limited computational resources. To address this problem, a lightweight Asymmetric Information Distillation Network (AIDN) was proposed. Firstly, effective feature information was extracted from the input original images and edge images. Secondly, an asymmetric information distillation module was designed to learn a non-linear mapping on these features. Thirdly, multiple residual images were reconstructed by an upsampling module and fused into one residual image through an attention mechanism. Finally, the fused residual image was added to the interpolated input image to generate the super-resolution image. Experimental results on the Set14, Urban100, and Manga109 datasets show that the 4× super-resolution Peak Signal-to-Noise Ratio (PSNR) values of AIDN are 0.03 dB, 0.14 dB, and 0.06 dB higher, respectively, than those of the Spatial Adaptive Feature Modulation Network (SAFMN). This demonstrates that AIDN achieves a better balance between model parameters and performance.
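
    The last two steps, attention-based fusion of several residual images and addition to the interpolated input, can be sketched as follows. Here the attention scores are supplied directly rather than predicted, the fusion is a per-pixel softmax, and nearest-neighbour upsampling stands in for the interpolation (bicubic is more typical); all of these are assumptions.

```python
import math


def upsample_nearest(image, scale):
    """Nearest-neighbour upsampling of a 2-D list (stand-in for the interpolation step)."""
    return [[row[j // scale] for j in range(len(row) * scale)]
            for row in image for _ in range(scale)]


def fuse_residuals(residuals, scores):
    """Per-pixel softmax over the candidate residual images, then a weighted sum."""
    h, w = len(residuals[0]), len(residuals[0][0])
    fused = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            logits = [s[i][j] for s in scores]
            peak = max(logits)
            exps = [math.exp(v - peak) for v in logits]  # numerically stable softmax
            z = sum(exps)
            fused[i][j] = sum(e / z * r[i][j] for e, r in zip(exps, residuals))
    return fused


def reconstruct(lr_image, residuals, scores, scale):
    """Super-resolved image = interpolated input + attention-fused residual."""
    base = upsample_nearest(lr_image, scale)
    fused = fuse_residuals(residuals, scores)
    return [[b + r for b, r in zip(brow, rrow)] for brow, rrow in zip(base, fused)]
```

    Predicting only a residual on top of an interpolated image lets a lightweight network spend its capacity on high-frequency detail rather than on reproducing the low-frequency content.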

    Dynamic visual SLAM algorithm incorporating object detection and feature point association
    Shijia WEN, Shijun JING
    2025, 45(2):  610-615.  DOI: 10.11772/j.issn.1001-9081.2024020227

    Aiming at the problem that dynamic objects seriously interfere with the normal operation of Simultaneous Localization And Mapping (SLAM) systems, a dynamic visual SLAM algorithm based on object detection and feature point association was proposed. Firstly, the YOLOv5 (You Only Look Once version 5) object detection network was used to obtain information about potential dynamic objects in the environment, and missed detections were compensated for by simple target tracking. Secondly, because geometric constraint methods that check single feature points are prone to misjudgment, associations between feature points were established according to the positional and optical flow information of the image, and, combined with the epipolar constraint, the dynamics of the resulting relation network were judged. Thirdly, the two methods were combined to eliminate dynamic feature points in the image, and the remaining static feature points were weighted to estimate the camera pose. Finally, a dense point cloud map of the static environment was built. Results of comparison and ablation experiments on the TUM (Technical University of Munich) public dataset demonstrate that, in highly dynamic scenarios, the Root Mean Square Error (RMSE) of the Absolute Trajectory Error (ATE) of the proposed algorithm is reduced by at least 95.22% and 5.61% compared with ORB-SLAM2 and DS-SLAM (Dynamic Semantic SLAM), respectively. It can be seen that the proposed algorithm improves accuracy and robustness while maintaining real-time performance.
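
    The epipolar check applied to an associated group of feature points, rather than to single points, can be sketched as below. The fundamental matrix F is assumed to be known, the grouping is assumed to come from the position and optical-flow association step, and the distance threshold and majority ratio are hypothetical parameters.

```python
import math


def epipolar_distance(F, p1, p2):
    """Distance from p2 (frame 2) to the epipolar line F @ [p1, 1] induced by p1 (frame 1)."""
    x1 = (p1[0], p1[1], 1.0)
    a, b, c = [sum(F[r][k] * x1[k] for k in range(3)) for r in range(3)]
    return abs(a * p2[0] + b * p2[1] + c) / math.hypot(a, b)


def group_is_dynamic(F, matches, threshold=1.0, ratio=0.5):
    """Flag an associated group as dynamic if more than `ratio` of its matches
    violate the epipolar constraint, so one noisy point cannot decide on its own."""
    violations = sum(epipolar_distance(F, p1, p2) > threshold for p1, p2 in matches)
    return violations > ratio * len(matches)


# Pure horizontal camera translation with identity intrinsics: F = [t]_x, t = (1, 0, 0).
F = [[0.0, 0.0, 0.0], [0.0, 0.0, -1.0], [0.0, 1.0, 0.0]]
static = [((10, 5), (12, 5)), ((3, 7), (6, 7)), ((8, 2), (9, 2))]
moving = ((10, 5), (12, 9))
```

    For this F the epipolar lines are horizontal, so static points keep their row while the moving point drifts 4 pixels off its line.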

    Image watermarking method combining attention mechanism and multi-scale feature
    Tianqi ZHANG, Shuang TAN, Xiwen SHEN, Juan TANG
    2025, 45(2):  616-623.  DOI: 10.11772/j.issn.1001-9081.2024030282

    Deep learning-based watermarking methods do not sufficiently highlight the key features of images and do not effectively utilize the output features of intermediate convolution layers. To improve the visual quality of watermarked images and their resistance to noise attacks, an image watermarking method combining an attention mechanism and multi-scale features was proposed. An attention module was designed in the encoder to focus on important image features, thereby reducing the image distortion caused by watermark embedding, and a multi-scale feature extraction module was designed in the decoder to capture image details at different levels. Experimental results on the COCO dataset show that, compared with the deep watermarking model HiDDeN (Hiding Data with Deep Networks), the proposed method increases the Peak Signal-to-Noise Ratio (PSNR) and Structural SIMilarity (SSIM) of the generated watermarked images by 11.63% and 1.29%, respectively, and reduces the average Bit Error Rate (BER) of watermark extraction under dropout, cropout, crop, Gaussian blur, and JPEG compression by 53.85%. In addition, ablation results confirm that adding the attention module and the multi-scale feature extraction module yields better invisibility and robustness.
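
    The abstract does not specify the encoder's attention module. As one common option, a squeeze-and-excitation style channel attention is sketched below with plain Python lists; the weight shapes (C/r x C and C x C/r) and the names are assumptions.

```python
import math


def channel_attention(feature_maps, w1, w2):
    """Squeeze-and-excitation style channel attention on C feature maps (lists of rows).

    1. Squeeze: global average pooling gives one descriptor per channel.
    2. Excite: FC (C -> C/r) + ReLU, then FC (C/r -> C) + sigmoid gives gates in (0, 1).
    3. Rescale: each channel is multiplied by its gate, emphasizing informative channels.
    Returns (rescaled feature maps, gates).
    """
    pooled = [sum(map(sum, fm)) / (len(fm) * len(fm[0])) for fm in feature_maps]
    hidden = [max(0.0, sum(w * p for w, p in zip(row, pooled))) for row in w1]
    gates = [1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden)))) for row in w2]
    rescaled = [[[g * v for v in row] for row in fm] for g, fm in zip(gates, feature_maps)]
    return rescaled, gates
```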

    Low-dose CT denoising model based on dual encoder-decoder generative adversarial network
    Hong SHANGGUAN, Huiying REN, Xiong ZHANG, Xinglong HAN, Zhiguo GUI, Yanling WANG
    2025, 45(2):  624-632.  DOI: 10.11772/j.issn.1001-9081.2024010039

    In recent years, Generative Adversarial Networks (GANs) used for Low-Dose Computed Tomography (LDCT) image denoising have shown significant performance advantages and have become a hot topic in the field. However, the GAN generator's insufficient perception of the noise and artifact distribution in LDCT images limits denoising performance. To address this issue, an LDCT denoising model based on a Dual Encoder-Decoder GAN (DualED-GAN) was proposed. Firstly, one encoder-decoder pair was used to form an artifact pixel-level feature extraction channel for estimating the artifact noise in LDCT images. Secondly, another encoder-decoder pair was used to form an artifact mask information extraction channel for estimating the intensity and location of artifacts. Finally, artifact image quality label maps were used to assist in estimating the artifact mask information, providing supplementary features to the artifact pixel-level feature extraction channel and thereby enhancing the sensitivity of the GAN denoising network to the intensity distribution of artifact noise. Experimental results on the Mayo test set show that, compared with the second-best model DESD-GAN (Dual-Encoder-Single-Decoder based Generative Adversarial Network), the proposed model increases the average Peak Signal-to-Noise Ratio (PSNR) by 0.3387 dB and the average Structural Similarity Index Measure (SSIM) by 0.0028. It can be seen that the proposed model performs better in artifact suppression, structure preservation, and model robustness.
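
    PSNR, the main reported metric, can be computed as below. For CT images the peak value depends on the intensity range or display window used, so `max_value` is an assumption that must match the evaluation protocol.

```python
import math


def psnr(reference, test, max_value=255.0):
    """Peak Signal-to-Noise Ratio in dB between two equally sized images (lists of rows)."""
    diffs = [(r - t) ** 2 for rrow, trow in zip(reference, test) for r, t in zip(rrow, trow)]
    mse = sum(diffs) / len(diffs)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value**2 / mse)
```

    Because PSNR is logarithmic in the MSE, the reported 0.3387 dB gain corresponds to roughly a 7.5% lower MSE at the same peak value, since 10^(-0.03387) ≈ 0.925.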

    Object detection in remote sensing image based on multi-scale feature fusion and weighted boxes fusion
    Zhongwei ZHANG, Jun WANG, Shudong LIU, Zhiheng WANG
    2025, 45(2):  633-639.  DOI: 10.11772/j.issn.1001-9081.2024020252

    Significant differences in object scale and aspect ratio make object detection in remote sensing images difficult. To address this characteristic and improve detection precision, EW-YOLO (Efficient Weighted-YOLO) was proposed by improving the YOLO framework. Firstly, a multi-level feature fusion structure was introduced in the feature fusion part, in which a dual-branch residual module promotes the fusion of features at different scales. Through cascaded feature fusion modules and a cross-layer feature fusion design, the ability to extract objects at different scales was improved, further enhancing detection capability. Secondly, in the prediction part, a weighted detection head was proposed and Weighted Boxes Fusion (WBF) was introduced, improving detection precision for objects with different aspect ratios by weighting each candidate box with its confidence score and generating prediction boxes by fusion. Finally, to address excessively large image sizes, an image resampling technique was proposed, in which images were resampled to appropriate sizes and added to network training, solving the problem of low detection precision for large objects caused by cropping. Experimental results on the DOTA dataset show that the proposed method achieves a mean Average Precision (mAP) of 77.47%, 1.55 percentage points higher than that of the original YOLO framework-based method, and outperforms current mainstream methods. The effectiveness of the proposed method was also verified on the HRSC and UCAS-AOD datasets.
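
    Weighted Boxes Fusion can be sketched for a single model and a single class. Each fused box is the confidence-weighted average of the boxes in its cluster, and the fused confidence is the cluster mean; the IoU threshold of 0.55 and the best-match clustering rule are assumptions, and the paper's weighted detection head is not reproduced.

```python
def iou(a, b):
    """IoU of two (x1, y1, x2, y2) boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def weighted_boxes_fusion(boxes, scores, iou_thr=0.55):
    """Single-model, single-class WBF; returns a list of (fused_box, fused_score)."""
    clusters, fused = [], []  # clusters[k]: list of (box, score); fused[k]: its fused box
    for i in sorted(range(len(boxes)), key=lambda j: -scores[j]):  # highest score first
        box, score = boxes[i], scores[i]
        best_k, best_iou = -1, iou_thr
        for k, fbox in enumerate(fused):
            overlap = iou(fbox, box)
            if overlap > best_iou:
                best_k, best_iou = k, overlap
        if best_k == -1:  # no fused box overlaps enough: start a new cluster
            clusters.append([])
            fused.append(None)
            best_k = len(clusters) - 1
        clusters[best_k].append((box, score))
        total = sum(s for _, s in clusters[best_k])
        # Fused coordinates: confidence-weighted average of the cluster's boxes.
        fused[best_k] = tuple(
            sum(bx[c] * s for bx, s in clusters[best_k]) / total for c in range(4)
        )
    return [(fb, sum(s for _, s in cl) / len(cl)) for fb, cl in zip(fused, clusters)]
```

    Unlike non-maximum suppression, which keeps only the top box, WBF lets every overlapping prediction shift the final coordinates in proportion to its confidence.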

    Fabric defect detection algorithm based on context information and multi-scale feature fusion
    Qiurun HE, Jie HU, Bo PENG, Tianyuan LI
    2025, 45(2):  640-646.  DOI: 10.11772/j.issn.1001-9081.2024010140

    In response to the detection difficulties caused by weak edge features and extreme aspect ratios of fabric defects, a Context information Multi-scale feature Fusion Fabric defect Detection algorithm based on improved YOLOv7 (CMFFD-YOLO) was proposed. Firstly, adaptive anchor boxes for the object sizes were obtained using the k-means clustering algorithm, and transfer learning was applied to initialize the backbone weights. Secondly, the backbone network was redesigned, and a Global Context (GC) module was added to fully utilize local and global context information and enhance feature extraction for small targets. Finally, a Channel spatial attention Asymptotic Feature Pyramid Network (CAFPN) based on multi-scale feature fusion was designed, using progressive fusion to establish tighter connections between semantic information at different levels and to extract more useful information during fusion. Experimental results on the Tianchi and ZJU-Leaper textile fabric defect datasets demonstrate that the proposed algorithm achieves mean Average Precision (mAP) of 64.6% and 61.7%, respectively. Compared with the original YOLOv7, the proposed algorithm improves mAP by 12.5 and 7.8 percentage points, respectively, and reduces the number of model parameters by 5.013×10^6, resulting in faster detection. It can be seen that the proposed algorithm satisfies the accuracy and speed requirements of fabric defect detection in enterprise applications.
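
    The anchor step can be sketched as k-means over ground-truth box sizes with the distance 1 - IoU, as popularized by YOLOv2, so that large and small boxes are clustered by shape overlap rather than by raw pixel distance. The deterministic initialization from area quantiles and the mean update are assumptions; the paper's exact variant is not stated.

```python
def wh_iou(a, b):
    """IoU of two (w, h) boxes aligned at a common corner, as used for anchor clustering."""
    inter = min(a[0], b[0]) * min(a[1], b[1])
    return inter / (a[0] * a[1] + b[0] * b[1] - inter)


def kmeans_anchors(sizes, k, iterations=100):
    """Cluster ground-truth (w, h) sizes into k anchors using distance d = 1 - IoU.

    Centroids start at area quantiles and are updated as cluster means; an empty
    cluster keeps its previous centroid. Returns anchors sorted by area.
    """
    by_area = sorted(sizes, key=lambda s: s[0] * s[1])
    centroids = [by_area[(2 * i + 1) * len(by_area) // (2 * k)] for i in range(k)]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for s in sizes:
            best = max(range(k), key=lambda c: wh_iou(s, centroids[c]))  # min 1 - IoU
            groups[best].append(s)
        new = [
            (sum(w for w, _ in g) / len(g), sum(h for _, h in g) / len(g)) if g else centroids[c]
            for c, g in enumerate(groups)
        ]
        converged = new == centroids
        centroids = new
        if converged:
            break
    return sorted(centroids, key=lambda s: s[0] * s[1])
```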

    Lightweight large-format tile defect detection algorithm based on improved YOLOv8
    Songsen YU, Zhifan LIN, Guopeng XUE, Jianyu XU
    2025, 45(2):  647-654.  DOI: 10.11772/j.issn.1001-9081.2024020198

    Current tile defect detection relies mainly on manual inspection, which is subjective, inefficient, and labor-intensive. To address these problems, an improved lightweight algorithm based on YOLOv8 was proposed for detecting small defects in large-format ceramic tile images. Firstly, the high-resolution large-format images were cropped, and HorBlock was introduced into the backbone network to enhance the model's feature capture capability. Secondly, Large Separable Kernel Attention (LSKA) was incorporated into C2f to improve the model's detection performance, and Shuffle Attention (SA) was introduced to enhance its feature extraction capability. Finally, Omni-Dimensional Dynamic Convolution (ODConv) was introduced to further improve the model's ability to handle small defects. Experimental results on the Alibaba Tianchi tile defect detection dataset show that the improved model has fewer parameters than the original YOLOv8n, while increasing mAP@0.5 by 8.2 percentage points and F1 score by 7 percentage points. It can be seen that the improved model identifies small surface defects of large-format tiles more accurately and significantly improves detection while remaining lightweight.
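
    LSKA's key idea is to decompose a large 2-D depth-wise kernel into cascaded vertical and horizontal 1-D kernels. The sketch below checks that principle on a single channel: a k x 1 pass followed by a 1 x k pass equals one pass with their outer-product kernel. Dilation, which LSKA also uses, is omitted, and the function names are illustrative.

```python
def conv2d(image, kernel):
    """'Valid' 2-D cross-correlation of a 2-D list with a 2-D kernel."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [
        [sum(kernel[u][v] * image[i + u][j + v] for u in range(kh) for v in range(kw))
         for j in range(out_w)]
        for i in range(out_h)
    ]


def separable_conv2d(image, col, row):
    """Apply a k x 1 vertical kernel, then a 1 x k horizontal kernel.

    Equivalent to conv2d with the outer-product kernel col[u] * row[v], but it costs
    2k instead of k * k multiplications per output pixel.
    """
    vertical = conv2d(image, [[c] for c in col])
    return conv2d(vertical, [row])


image = [[(3 * i + 7 * j) % 11 for j in range(6)] for i in range(5)]
col, row = [1, 2, 1], [1, 0, -1]
full_kernel = [[c * r for r in row] for c in col]
```

    The saving grows with kernel size, which is why separable decompositions make large receptive fields affordable in a lightweight detector.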

    Tunnel foreign object detection algorithm based on improved YOLOv8n
    Jiayang GUI, Shunji WANG, Zhengkang ZHOU, Jiashan TANG
    2025, 45(2):  655-661.  DOI: 10.11772/j.issn.1001-9081.2024020225

    To address the high labor costs and low efficiency of manual inspection for tunnel foreign objects, a tunnel foreign object detection algorithm based on improved YOLOv8n was proposed. Firstly, a C2f_CA module incorporating the Coordinate Attention (CA) mechanism was proposed. By embedding positional information into channel attention, the module strengthens the network's focus on the spatial distribution of features in the image, thereby improving its feature extraction capability. Secondly, inspired by the concept of high-resolution networks, a new feature fusion module, HRNet_Fusion (High Resolution Net), was proposed: extracted feature maps of different resolutions are fed into the network as four parallel branches, and multiple up-sampling, down-sampling, and fusion operations are performed to obtain comprehensive and accurate feature information. Together, these modules significantly improve performance in small target detection and feature fusion. Finally, the WIoU (Wise-IoU) loss function was introduced to reduce the harmful gradients produced by low-quality samples, further improving detection accuracy. Experimental results on a tunnel foreign object detection dataset indicate that the improved algorithm achieves a mean Average Precision (mAP@0.5) of 79.9% with a model size of 6.0 MB. Compared to YOLOv8n, the proposed algorithm increases mAP@0.5 by 6 percentage points while reducing the model size by 0.2 MB and the number of parameters by 0.379×10⁶.
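
    Wise-IoU exists in several versions (v1 to v3), and the abstract does not state which one was used. As a sketch only, v1 scales the IoU loss by a distance-based attention term, L = R·(1 − IoU) with R = exp(((x − x_gt)² + (y − y_gt)²) / (W_g² + H_g²)), where (x, y) are box centres and W_g, H_g are the width and height of the smallest enclosing box. The dynamic non-monotonic focusing that down-weights low-quality samples, which matches the motivation given in the abstract, is added in v3 and is not shown here. The function name `wiou_v1` is hypothetical.

```python
import math


def wiou_v1(pred, gt):
    """Wise-IoU v1 loss for two boxes given as (x1, y1, x2, y2)."""
    # plain IoU loss term: 1 - IoU
    iw = max(0.0, min(pred[2], gt[2]) - max(pred[0], gt[0]))
    ih = max(0.0, min(pred[3], gt[3]) - max(pred[1], gt[1]))
    inter = iw * ih
    area_p = (pred[2] - pred[0]) * (pred[3] - pred[1])
    area_g = (gt[2] - gt[0]) * (gt[3] - gt[1])
    l_iou = 1.0 - inter / (area_p + area_g - inter)
    # width and height of the smallest enclosing box; in a training framework
    # this denominator is detached from the gradient graph
    wg = max(pred[2], gt[2]) - min(pred[0], gt[0])
    hg = max(pred[3], gt[3]) - min(pred[1], gt[1])
    # offset between the two box centres
    dx = (pred[0] + pred[2] - gt[0] - gt[2]) / 2
    dy = (pred[1] + pred[3] - gt[1] - gt[3]) / 2
    r_wiou = math.exp((dx * dx + dy * dy) / (wg * wg + hg * hg))
    return r_wiou * l_iou
```

    Because R ≥ 1 grows with centre distance, a prediction whose centre is far from the target receives a larger loss than plain 1 − IoU would give it.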

    VPNet: fatty liver ultrasound image classification method inspired by ventral pathway
    Danni DING, Bo PENG, Xi WU
    2025, 45(2):  662-669.  DOI: 10.11772/j.issn.1001-9081.2024020185

    Given the crucial role of the ventral pathway in visual information processing, an innovative fatty liver classification method inspired by the ventral pathway was developed. By integrating a Convolutional Neural Network (CNN) with a biological visual cognition model, the hierarchical information processing from the primary visual cortex (V1) to the Inferior Temporal Cortex (IT Cortex) was simulated, resulting in a new neural network architecture named VPNet (Ventral Pathway Network). In addition, the non-Classical Receptive Field (nCRF) inhibition mechanism of biological vision, which helps suppress background noise, was simulated to address speckle noise in ultrasound images, thereby enhancing the feature recognition capability of the model. VPNet achieved an accuracy of 88.37% in classifying four degrees of fatty liver on the self-made dataset, and achieved the best performance, 100% accuracy, sensitivity, and specificity, in two-class fatty liver diagnosis on the public dataset. Compared with ResNet101-SVM, the best-performing method in existing research on the public dataset, VPNet improves accuracy by 11.63 and 0.7 percentage points on the self-made and public datasets respectively, demonstrating the effectiveness of the proposed method for diagnosing fatty liver disease.
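
    The abstract does not give VPNet's nCRF formulation. A common simplified model of isotropic surround inhibition subtracts the average response in an annulus around each location (the nCRF) from the response at that location (the classical receptive field) and clips the result at zero, so uniform texture is suppressed while isolated structure survives. The sketch below assumes that model on a plain 2-D list of responses; `surround_inhibition`, `alpha`, `r_in`, and `r_out` are hypothetical names and parameters.

```python
def surround_inhibition(resp, alpha=1.0, r_in=1, r_out=3):
    """Suppress each response by alpha times the mean response in an annulus around it."""
    h, w = len(resp), len(resp[0])
    # annulus offsets: r_in < distance <= r_out, so the centre (CRF) is excluded
    offsets = [(dy, dx)
               for dy in range(-r_out, r_out + 1)
               for dx in range(-r_out, r_out + 1)
               if r_in * r_in < dy * dy + dx * dx <= r_out * r_out]
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # only surround positions that fall inside the image are averaged
            vals = [resp[y + dy][x + dx] for dy, dx in offsets
                    if 0 <= y + dy < h and 0 <= x + dx < w]
            surround = sum(vals) / len(vals) if vals else 0.0
            out[y][x] = max(0.0, resp[y][x] - alpha * surround)
    return out
```

    A uniform field, standing in here for dense speckle-like texture, is driven to zero, while a single strong response on a quiet background passes through unchanged.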

    Frontier and comprehensive applications
    Enterprise ESG indicator prediction model based on richness coordination technology
    Yan LI, Guanhua YE, Yawen LI, Meiyu LIANG
    2025, 45(2):  670-676.  DOI: 10.11772/j.issn.1001-9081.2024030262

    The Environmental, Social, and Governance (ESG) indicator is critical for assessing the sustainability of enterprises. Existing ESG assessment systems suffer from narrow coverage, strong subjectivity, and poor timeliness, so prediction models that can forecast ESG indicators accurately from enterprise data are urgently needed. To address the inconsistent information richness among ESG-related features in enterprise data, an enterprise ESG indicator prediction model based on richness coordination technology, RCT (Richness Coordination Transformer), was proposed. In this model, an auto-encoder in the upstream richness coordination module coordinates features with heterogeneous information richness, thereby improving the ESG indicator prediction performance of the downstream module. Experimental results on real datasets demonstrate that, on various prediction indicators, RCT outperforms multiple models, including Temporal Convolutional Network (TCN), Long Short-Term Memory (LSTM) network, Self-Attention Model (Transformer), eXtreme Gradient Boosting (XGBoost), and Light Gradient Boosting Machine (LightGBM). These results verify the effectiveness and superiority of RCT for ESG indicator prediction.

    Fault diagnosis method for train control on-board interface equipment of CTCS-3 based on temporal knowledge graph completion
    Meng WANG, Daqian ZHANG, Bingyan ZHOU, Qianying MA, Jidong LYU
    2025, 45(2):  677-684.  DOI: 10.11772/j.issn.1001-9081.2024070990

    Chinese Train Control System level 3 (CTCS-3) train control on-board equipment plays a crucial role in ensuring train safety and improving operational efficiency. On-board interface equipment enables interaction between the on-board Automatic Train Protection (ATP) system and the ground equipment, drivers, and train. However, faults in on-board interface equipment account for a relatively high proportion of on-board equipment faults. To identify fault causes and ensure safety, a fault diagnosis method for on-board interface equipment based on temporal knowledge graph completion was proposed. In the method, travel logs and fault statistics were integrated by introducing time series information: fault phenomena were extracted, entities were aligned, and a temporal knowledge graph was constructed. On this basis, a fault diagnosis network based on knowledge graph completion was constructed, integrating Temporal-Translating Embedding (T-TransE) vectorization with a Bidirectional Long Short-Term Memory (Bi-LSTM) network and a Self-Attention (SA) mechanism for temporal feature extraction. Finally, the T-TransE vectorization model was pretrained on recent on-board interface equipment fault data from a railway administration, and the most effective way of introducing temporal information was selected. To validate the superiority of the proposed method and the effectiveness of the data integration method, the diagnostic network without data integration or temporal relationship introduction, as well as other common fault diagnosis networks, were tested on the on-board fault data. Experimental results show that, on the same corpus, the temporal knowledge graph completion-based fault diagnosis model achieves the highest accuracy, 96.69%, among the compared fault diagnosis frameworks.
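
    TransE scores a triple (h, r, t) by the distance ‖h + r − t‖; one simple temporal extension adds a timestamp embedding τ, giving ‖h + r + τ − t‖, and knowledge graph completion then ranks candidate tail entities by that distance. The abstract does not specify how T-TransE encodes time, so the following is only a sketch under this additive assumption, with toy hand-written embeddings and hypothetical entity names (`cause_a`, `cause_b`) standing in for learned vectors.

```python
import math


def ttranse_score(h, r, tau, t):
    """L2 distance ||h + r + tau - t||; lower means the timestamped fact is more plausible."""
    return math.sqrt(sum((hv + rv + tv - tailv) ** 2
                         for hv, rv, tv, tailv in zip(h, r, tau, t)))


def rank_tails(h, r, tau, candidates):
    """Complete the query (h, r, ?, tau) by ranking candidate tail embeddings by score."""
    return sorted(candidates, key=lambda name: ttranse_score(h, r, tau, candidates[name]))


# toy example: a fault phenomenon h, a relation r such as "caused by", and a time embedding tau
h = [1.0, 0.0]
r = [0.0, 1.0]
tau = [0.5, 0.0]
candidates = {"cause_a": [1.5, 1.0], "cause_b": [0.0, 0.0]}
print(rank_tails(h, r, tau, candidates))
```

    In a trained model the embeddings come from minimising a margin loss between observed and corrupted quadruples; the ranking step shown here is what turns completion into a diagnosis, with the top-ranked tail read as the most likely fault cause.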

2025 Vol.45 No.4

Honorary Editor-in-Chief: ZHANG Jingzhong
Editor-in-Chief: XU Zongben
Associate Editor: SHEN Hengtao, XIA Zhaohui
Domestic Post Distribution Code: 62-110
Foreign Distribution Code: M4616
Address: No. 9, 4th Section of South Renmin Road, Chengdu 610041, China
Tel: 028-85224283-803, 028-85222239-803
Website: www.joca.cn
E-mail: bjb@joca.cn