
Table of Contents

    10 July 2025, Volume 45 Issue 7
    The 39th CCF National Conference of Computer Applications (CCF NCCA 2024)
    Review of interpretable deep knowledge tracing methods
    Jinxian SUO, Liping ZHANG, Sheng YAN, Dongqi WANG, Yawen ZHANG
    2025, 45(7):  2043-2055.  DOI: 10.11772/j.issn.1001-9081.2024070970

    Knowledge Tracing (KT) is a cognitive diagnostic method that models a learner's mastery of learned knowledge by analyzing the learner's historical question-answering records, with the ultimate goal of predicting the learner's future answering performance. Knowledge tracing techniques based on deep neural network models have become a hot research topic in the KT field due to their strong feature extraction capabilities and superior prediction performance. However, deep learning-based knowledge tracing models often lack good interpretability. Clear interpretability enables learners and teachers to fully understand the reasoning process and prediction results of knowledge tracing models, which facilitates the formulation of learning plans tailored to the current knowledge state and enhances the trust of learners and teachers in these models. Therefore, interpretable Deep Knowledge Tracing (DKT) methods were reviewed. Firstly, the development of knowledge tracing and the definition and necessity of interpretability were introduced. Secondly, improvement methods proposed to address the lack of interpretability in DKT models were summarized from the perspectives of feature extraction and internal model enhancement. Thirdly, the related publicly available datasets were introduced, the influence of dataset features on interpretability was analyzed, how to evaluate knowledge tracing models from both performance and interpretability perspectives was discussed, and the performance of DKT models on different datasets was sorted out. Finally, possible future research directions to address current issues in DKT models were proposed.
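    For readers unfamiliar with the models this review covers, the sketch below shows the classic DKT recurrent formulation in PyTorch: an LSTM consumes one-hot (skill, correctness) interactions and outputs per-skill correctness probabilities. Dimensions and hyperparameters are illustrative assumptions, not taken from any surveyed paper.

```python
import torch
import torch.nn as nn

class DKT(nn.Module):
    def __init__(self, n_skills: int, hidden: int = 64):
        super().__init__()
        # Input: one-hot over 2 * n_skills (skill id x correct/incorrect).
        self.rnn = nn.LSTM(input_size=2 * n_skills, hidden_size=hidden,
                           batch_first=True)
        # Output: per-skill probability of answering correctly next.
        self.out = nn.Linear(hidden, n_skills)

    def forward(self, x):                  # x: (batch, seq_len, 2*n_skills)
        h, _ = self.rnn(x)
        return torch.sigmoid(self.out(h))  # (batch, seq_len, n_skills)

model = DKT(n_skills=10)
x = torch.zeros(1, 5, 20)    # a 5-step interaction history for one learner
x[0, 0, 3] = 1.0             # step 0: skill 3 answered correctly
pred = model(x)              # predicted mastery per skill after each step
print(pred.shape)            # torch.Size([1, 5, 10])
```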

    Survey of sequential pattern mining
    Zhenlong DAI, Meng HAN, Wenyan YANG, Shineng ZHU, Shurong YANG
    2025, 45(7):  2056-2069.  DOI: 10.11772/j.issn.1001-9081.2024070952

    Sequential Pattern Mining (SPM) aims to discover interesting patterns or rules from databases to support and guide user decision-making. In recent years, research on SPM algorithms has deepened steadily, and with the emergence of large-scale data, many sequential algorithms suited to parallel environments have been proposed. Therefore, a review of the existing serial and parallel sequential pattern mining algorithms was presented. Firstly, the serial SPM algorithms were given a structured classification: the algorithms were categorized by the data structures they adopt, such as tree, list, and link structures; the advantages and disadvantages of the different structures were summarized comprehensively, and the strengths and weaknesses of each algorithm were summed up in detail. Secondly, for parallel SPM algorithms, the existing distributed frameworks were classified for the first time according to the characteristics of their storage structures, the advantages and disadvantages of the different frameworks were analyzed, and the parallel algorithms built on these frameworks were introduced and analyzed. Finally, future research directions were discussed to address the shortcomings of the existing SPM algorithms.

    Review of conflict-based cache side-channel attacks and eviction sets
    Zihao YAO, Ziqiang MA, Yang LI, Lianggen WEI
    2025, 45(7):  2070-2078.  DOI: 10.11772/j.issn.1001-9081.2024070933

    Cache side-channel attacks exploit the shared nature of computer caches and pose serious threats to cryptographic systems across processors and virtual machines. Among them, conflict-based cache side-channel attacks overcome the limitations imposed by privileged instructions: they construct a set of virtual addresses that map to the same cache set as the target address, that is, an eviction set, so as to cause cache conflicts and ultimately obtain the target's sensitive data. Constructing eviction sets has become a key technique in conflict-based cache side-channel attacks and speculative execution attacks. Therefore, a review of research on conflict-based cache side-channel attacks and eviction sets was conducted. Firstly, the fundamental principles of conflict-based cache side-channel attacks were discussed. Subsequently, the core mechanisms and evolution of eviction set construction algorithms were discussed; these algorithms were systematically categorized into conflict-elimination methods and conflict-progressive methods, distinguished by their strategies for candidate address manipulation and eviction set construction. Furthermore, key factors influencing eviction set construction algorithms were summarized. Finally, current challenges and future research directions for conflict-based cache side-channel attacks were discussed.
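    To make the eviction-set idea concrete, the following toy simulation (not an attack on real hardware) sketches a conflict-elimination style reduction: starting from a large candidate pool, groups of addresses are removed as long as the remainder still fills the target's cache set. The cache geometry and set-index function are simplifying assumptions.

```python
import random

N_SETS, WAYS = 64, 4                       # toy cache geometry (assumption)

def cache_set(addr):
    return addr % N_SETS                   # toy set-index function

def evicts(candidates, target):
    """True if the candidates alone can fill the target's cache set."""
    return sum(1 for a in candidates if cache_set(a) == cache_set(target)) >= WAYS

def reduce_eviction_set(candidates, target):
    """Drop groups of candidates while the remainder still evicts the target."""
    while len(candidates) > WAYS:
        n_groups = WAYS + 1
        size = -(-len(candidates) // n_groups)           # ceiling division
        groups = [candidates[i:i + size]
                  for i in range(0, len(candidates), size)]
        for g in groups:
            rest = [a for a in candidates if a not in g]
            if evicts(rest, target):                     # group g was redundant
                candidates = rest
                break
        else:
            break                          # nothing removable: stop reducing
    return candidates

target = 0x1234
pool = [random.randrange(1 << 20) for _ in range(512)]
if evicts(pool, target):
    minimal = reduce_eviction_set(pool, target)
    print(len(minimal), "addresses form the eviction set")
```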

    Survey of DNS tunneling detection technology research
    Zhiqiang ZHENG, Ruiqi WANG, Zijing FAN, Famei HE, Yepeng YAO, Qiuyun WANG, Zhengwei JIANG
    2025, 45(7):  2079-2091.  DOI: 10.11772/j.issn.1001-9081.2024070972

    As the system that translates between domain names and IP addresses, the Domain Name System (DNS) is one of the fundamental protocols of the Internet. Because of its importance, the security policies of facilities such as firewalls and Intrusion Detection Systems (IDSs) allow DNS traffic to pass by default, giving attackers the opportunity to use DNS tunneling for communication. Currently, much malware supports DNS communication or even uses it by default, which brings great challenges to network security tools and security operations centers. However, the existing research mainly focuses on specific detection methods and rarely explores the tunneling tools themselves, even though the majority of researchers rely on these tools to generate samples. Therefore, the research on DNS tunneling detection technology was reviewed. Firstly, the development history, research status, and existing detection schemes of DNS tunneling were elaborated systematically, and the advantages and disadvantages of detection methods from the past 10 years were discussed. Subsequently, six tools commonly used in these detection schemes, such as dnscat2, Iodine, and dns2tcp, were evaluated and tested, and the experimental data was published. Experimental results show that most detection schemes disclose neither their tunneling sample datasets nor the parameters set when using tunneling tools to generate traffic, making these schemes almost impossible to reproduce. Besides, some detection schemes use DNS tunneling tools with distinctive signature characteristics; training model-based detection schemes on samples with such signature features casts doubt on the generalization ability of the models, that is, it is impossible to know whether such a model will perform well in the real world. Finally, future development directions for related work were prospected.
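    Many of the surveyed detection schemes start from simple string features of the query name. The sketch below illustrates two such features, label length and character entropy; the thresholds are illustrative assumptions rather than values from any cited scheme.

```python
import math
from collections import Counter

def shannon_entropy(s: str) -> float:
    counts = Counter(s)
    return -sum(c / len(s) * math.log2(c / len(s)) for c in counts.values())

def looks_like_tunnel(qname: str, max_label_len=40, entropy_thresh=3.8) -> bool:
    """Heuristic: tunnelled labels tend to be long and high-entropy."""
    labels = qname.rstrip(".").split(".")
    sub = max(labels[:-2], key=len, default="")  # longest non-TLD label
    return len(sub) > max_label_len or shannon_entropy(sub or "a") > entropy_thresh

print(looks_like_tunnel("www.example.com"))                      # False
print(looks_like_tunnel("dGhpcyBpcyBleGZpbHRyYXRlZA.evil.com"))  # likely True
```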

    Blockchain sharding mechanism in asynchronous network based on HoneyBadgerBFT and DAG
    Yuxuan CHEN, Haibin ZHENG, Zhenyu GUAN, Boheng SU, Yujue WANG, Zhenwei GUO
    2025, 45(7):  2092-2100.  DOI: 10.11772/j.issn.1001-9081.2024070962

    To address the issues of limited network size, strong dependency on the network environment, high storage cost, and low transaction throughput in the scalability of blockchain systems, a sharding mechanism adapted to asynchronous network environments and supporting parallel transaction processing was proposed. In this mechanism, HoneyBadgerBFT consensus was used to achieve data consistency in asynchronous networks, sharding was employed for linear scalability of the blockchain system, and DAG (Directed Acyclic Graph) technology was adopted to further enhance intra-shard and inter-shard parallel transaction processing. Simulation results demonstrate that the proposed mechanism maintains liveness even in an asynchronous network environment; in a semi-synchronous network environment, it reduces communication overhead by more than 49.9% compared to SharPer using PBFT (Practical Byzantine Fault Tolerance); in a blockchain network of 16 nodes, its TPS (Transactions Per Second) is 16.7% lower than SharPer's, while in a 64-node network its TPS is 6.7% higher, showing that the proposed mechanism achieves higher throughput than SharPer as the network grows; under the condition of 20% cross-shard transactions with the same network environment and hardware resources, when the numbers of shards and nodes double, the transaction throughput of the proposed mechanism grows 30.0% and 10.5% more, respectively, than that of SharPer, demonstrating better scalability.

    Labeling certainty enhancement-oriented positive and unlabeled learning algorithm
    Yulin HE, Peng HE, Zhexue HUANG, Weicheng XIE, Fournier-Viger PHILIPPE
    2025, 45(7):  2101-2112.  DOI: 10.11772/j.issn.1001-9081.2024070953

    Positive and Unlabeled Learning (PUL) is used to train classifiers whose performance is acceptable for practical applications when negative samples are unknown, by utilizing a few known positive samples and many unlabeled samples. The existing PUL algorithms share a common flaw: large uncertainty in labeling the unlabeled samples, which leads to inaccurate classification boundaries and limits the classifier's generalization ability on new data. To solve this issue, a Labeling Certainty Enhancement-oriented PUL (LCE-PUL) algorithm was proposed. Firstly, reliable positive samples were selected on the basis of the similarity between the posterior probability mean on the validation set and the center point of the positive sample set, and the labeling process was refined gradually through iterations, so as to increase the accuracy of preliminary category judgments of unlabeled samples and thereby improve labeling certainty. Secondly, these reliable positive samples were merged with the original positive sample set to form a new positive sample set, which was then removed from the unlabeled sample set. Thirdly, the new unlabeled sample set was traversed, and reliable positive samples were selected again based on the similarity of each sample to its multiple neighboring points, so as to further improve the inference of potential labels, reduce mislabeling, and enhance labeling certainty. Finally, the positive sample set was updated, and the unselected unlabeled samples were treated as negative samples. The feasibility, rationality, and effectiveness of the LCE-PUL algorithm were validated on representative datasets. As the number of iterations increases, the training of the LCE-PUL algorithm converges. When the proportion of positive samples is 40%, 35%, and 30%, the test accuracy of the classifier constructed by the LCE-PUL algorithm is improved by up to 5.8, 8.8, and 7.6 percentage points, respectively, compared with five representative algorithms, including the Biased Support Vector Machine based on a specific cost function (Biased-SVM) algorithm, the Dijkstra-based Label Propagation for PUL (LP-PUL) algorithm, and the PUL by Label Propagation (PU-LP) algorithm. Experimental results show that LCE-PUL is an effective machine learning algorithm for handling PUL problems.
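    The following sketch illustrates the iterative reliable-positive selection described above: unlabeled samples that fall inside an assumed "core" around the positive-set centre are promoted to positives round by round. The quantile-based radius and toy features are assumptions, a simplified stand-in for the paper's posterior-probability criterion.

```python
import numpy as np

def select_reliable_positives(P, U, n_rounds=5, quantile=0.95):
    """P: known positives (n_p, d); U: unlabeled (n_u, d). Returns (P', U')."""
    P, U = P.copy(), U.copy()
    for _ in range(n_rounds):
        centre = P.mean(axis=0)
        dist = np.linalg.norm(U - centre, axis=1)
        # Radius of the positive "core": an assumed quantile of positive dists.
        radius = np.quantile(np.linalg.norm(P - centre, axis=1), quantile)
        mask = dist <= radius              # unlabeled points inside the core
        if not mask.any():
            break
        P = np.vstack([P, U[mask]])        # promote them to positives
        U = U[~mask]
    return P, U                            # remaining U treated as negatives

rng = np.random.default_rng(0)
P = rng.normal(0, 1, (50, 2))
U = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(5, 1, (100, 2))])
P2, U2 = select_reliable_positives(P, U)
print(len(P2), len(U2))
```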

    Label noise adaptive learning algorithm based on meta-learning
    Qiaoling QI, Xiaoxiao WANG, Qianqian ZHANG, Peng WANG, Yongfeng DONG
    2025, 45(7):  2113-2122.  DOI: 10.11772/j.issn.1001-9081.2024070932

    Image classification requires collecting a large number of images for model training and optimization, but the collection process inevitably introduces noisy labels. Robust classification methods have emerged to cope with this challenge, but their hyperparameters need to be adjusted manually, which is costly in human and material resources. Therefore, a Meta Hyperparameter Adjuster (MHA) was proposed, which adopts a two-layer nested-loop optimization method to learn noise-aware hyperparameter combinations adaptively, and on this basis a Meta-FPL (Feature Pseudo-Label adaptive learning algorithm based on Meta learning) algorithm was also proposed. In addition, to reduce the large amount of GPU computing power consumed by backpropagation in the meta-training phase, a Select Activation Metamodel Layer (SAML) strategy was proposed, which restricts the update of some metamodel layers by comparing the average gradient of backpropagation with the meta-gradient in the virtual training phase, thereby improving the training efficiency of the model effectively. Experimental results on four benchmark datasets and one real dataset show that compared with the MLC (Meta Label Correction for noisy label learning), CTRR (ConTrastive RegulaRization), and Feature Pseudo-Label (FPL) algorithms, the Meta-FPL algorithm achieves higher classification accuracy. In addition, after introducing the SAML strategy, the duration of backpropagation in the meta-training phase is reduced by 79.52%. It can be seen that the Meta-FPL algorithm improves classification accuracy effectively with a shorter training time.

    Federated learning algorithm for personalization and fairness
    Hongyang ZHANG, Shufen ZHANG, Zheng GU
    2025, 45(7):  2123-2131.  DOI: 10.11772/j.issn.1001-9081.2024070934

    As a distributed optimization paradigm, Federated Learning (FL) enables a large number of resource-constrained client nodes to train models collaboratively without sharing their data. However, traditional federated learning algorithms, such as FedAvg, often fail to address fairness issues adequately. In practical scenarios, data distributions are typically highly heterogeneous, and conventional aggregation operations may bias the model towards certain clients, resulting in significant disparities in the global model's local performance across clients. To tackle this challenge, a federated learning algorithm for personalization and fairness named FedPF (Federated learning for Personalization and Fairness) was proposed. FedPF aims to reduce inefficient aggregation behaviors in federated learning effectively and to distribute personalized models among clients by exploring the correlations between the global model and the local models, thereby ensuring a balanced performance distribution among clients while maintaining the performance of the global model. FedPF was evaluated and analyzed on the Synthetic, MNIST, and CIFAR10 datasets, and was compared with three federated learning algorithms: FedProx, q-FedAvg, and FedAvg. Experimental results demonstrate that FedPF achieves notable improvements in both effectiveness and fairness.

    Domain adaptive semantic segmentation based on masking enhanced self-training
    Bo FENG, Haizheng YU, Hong BIAN
    2025, 45(7):  2132-2137.  DOI: 10.11772/j.issn.1001-9081.2024070935

    In recent years, semantic segmentation models based on Convolutional Neural Network (CNN) have shown excellent performance in a variety of applications. However, these models usually do not generalize well when applied to new domains, especially from synthetic to real data. Unsupervised Domain Adaptation (UDA) attempts to train on a known domain with labeled data (the source domain) while learning on an unknown domain with unlabeled data (the target domain), in order to improve the generalization of a segmentation model trained on the source domain to the target domain. Existing methods have made great progress by self-training on pseudo-labels of unlabeled target-domain images, and various ways have been proposed to reduce the low-quality pseudo-labels brought about by domain shift, but with mixed results. Aiming at this problem, a masking-enhanced self-training domain adaptation method was proposed, which generates pseudo-labels for masking-enhanced target-domain images and corrects the pseudo-labels generated from the unmasked target images; by minimizing the consistency loss between the pseudo-labels of the masked images and the corrected pseudo-labels of the unmasked images, the model learns more features of the target domain while generating more robust pseudo-labels. Experimental results show that the proposed method achieves good performance on the semantic segmentation benchmarks commonly used for two UDA tasks, GTA5 (Grand Theft Auto V)→Cityscapes and SYNTHIA (SYNTHetic collection of Imagery and Annotations)→Cityscapes. Specifically, compared with the classical DACS (Domain Adaptation via Cross-domain mixed Sampling) method, the proposed method improves the mean Intersection over Union (mIoU) by 1.3 percentage points on the GTA5 dataset and 1.2 percentage points on the SYNTHIA dataset. In addition, ablation results show the effectiveness of the proposed mask enhancement and pseudo-label correction modules. It can be seen that the proposed self-training domain adaptation method learns more target-domain context information and improves the generalization of the segmentation model in the target domain.

    Fine-tuned and filtered oversampling method based on agglomerative hierarchical clustering
    Zheng GU, Xuebin CHEN, Hongyang ZHANG, Yuxin LI
    2025, 45(7):  2138-2144.  DOI: 10.11772/j.issn.1001-9081.2024070919

    A fine-tuned and filtered oversampling method based on Agglomerative Hierarchical Clustering (AHC), applicable to multi-class imbalanced data scenarios, was proposed to address the poor classification performance on imbalanced datasets. Firstly, the AHC algorithm was employed to cluster the majority and minority classes separately, thereby avoiding class overlap effectively while considering inter-class relationships. Secondly, to balance the dataset while preserving the characteristics of the original data, a fine-tuned oversampling algorithm was designed. Thirdly, to improve the classification accuracy of the generated samples, a label tendency evaluation and filtering method based on propensity score matching was introduced. Finally, the proposed method was validated experimentally and compared with three methods: MDO (Mahalanobis Distance-based Over-sampling technique), AND-SMOTE (Automatic Neighborhood size Determination method for Synthetic Minority Over-sampling TEchnique), and K-means SMOTE. Experimental results demonstrate that the proposed method performs excellently on six datasets, such as Abalone, Contraceptive, and Yeast, confirming its effectiveness.
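    A minimal sketch of the cluster-then-interpolate pipeline described above, using scikit-learn's agglomerative clustering; the propensity-score filtering step is omitted for brevity, and the cluster count and interpolation rule are assumptions.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

def oversample_minority(X_min, n_new, n_clusters=3, seed=0):
    """Cluster the minority class with AHC, then synthesize new samples
    by interpolating between members of the same cluster."""
    rng = np.random.default_rng(seed)
    labels = AgglomerativeClustering(n_clusters=n_clusters).fit_predict(X_min)
    out = []
    for _ in range(n_new):
        c = rng.integers(n_clusters)                 # pick a cluster
        members = X_min[labels == c]
        a, b = members[rng.integers(len(members), size=2)]
        out.append(a + rng.random() * (b - a))       # interpolate inside it
    return np.array(out)

X_min = np.random.default_rng(1).normal(size=(30, 4))
print(oversample_minority(X_min, n_new=20).shape)    # (20, 4)
```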

    Deep semi-supervised text clustering with intentional regularization
    Le XU, Ruizhang HUANG, Ruina BAI, Yongbin QIN
    2025, 45(7):  2145-2152.  DOI: 10.11772/j.issn.1001-9081.2024070931

    Aiming at the problem that the existing semi-supervised text clustering methods fail to consider user intent in both representation learning and clustering, a Deep Semi-supervised Text Clustering with Intentional Regularization (IRDSTC) model was proposed. By introducing an intention regularization strategy, an Intention Regularized Representation Learning (IRRL) module and an Intention Regularized Clustering (IRC) module were designed. Firstly, an intent matrix was constructed on the basis of the intent constraint information provided by the user to capture the user's expectations for the relationships between texts. Secondly, the matrix was applied to both the representation learning stage and the clustering stage. In the representation learning stage, the intermediate-layer representation extracted by the deep model was converted into a representation correlation matrix and combined with the intent matrix to construct a regularization term, so that user intent drives the representation learning. In the clustering stage, an allocation consistency matrix was constructed from the cluster assignment probabilities obtained in clustering iterations and combined with the intent matrix to construct a regularization term, so that user intent guides the clustering process. Experimental results show that the IRDSTC model outperforms other clustering methods in clustering ACCuracy (ACC), Normalized Mutual Information (NMI), and Adjusted Rand Index (ARI) on the Reu-10k, BBC, ACM, and Abstract datasets. Specifically, compared with Improved Deep Embedding Clustering (IDEC), the IRDSTC model increases NMI by 28.26%, 32.58%, 27.13%, and 34.94%, respectively, indicating a better clustering effect.
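    The intent-regularization idea can be sketched as a penalty that pulls the representation correlation matrix towards a user-supplied intent matrix. The PyTorch snippet below is a hedged illustration; the loss form and weighting are assumed rather than taken from the paper.

```python
import torch
import torch.nn.functional as F

def intent_regularizer(z, intent):
    """z: (n, d) representations; intent: (n, n) with 1 = same cluster,
    0 = different cluster, as specified by the user's constraints."""
    z = F.normalize(z, dim=1)
    corr = z @ z.t()                  # representation correlation matrix
    return F.mse_loss(corr, intent)   # penalize disagreement with intent

z = torch.randn(8, 16, requires_grad=True)
intent = torch.eye(8)
intent[0, 1] = intent[1, 0] = 1.0     # user says texts 0 and 1 belong together
loss = intent_regularizer(z, intent)
loss.backward()                       # gradients drive z towards the intent
```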

    Semi-EM algorithm for solving Gamma mixture model of multimodal probability distribution
    Jiaqi CHEN, Yulin HE, Yingchao CHENG, Zhexue HUANG
    2025, 45(7):  2153-2161.  DOI: 10.11772/j.issn.1001-9081.2024070942

    The Expectation-Maximization (EM) algorithm plays an important role in parameter estimation for mixture models. However, the existing EM algorithms for solving the parameters of the Gamma Mixture Model (GaMM) have two main limitations: low-quality parameter estimation caused by approximate calculations, and inefficiency due to extensive numerical computation. To address these limitations and fully exploit the multimodal nature of data, a Semi-EM algorithm was proposed to solve GaMM for estimating multimodal probability distributions. Firstly, the spatial distribution characteristics of the data were explored by clustering, so that the GaMM parameters were initialized and a more precise characterization of the data's multimodality was obtained. Secondly, within the framework of the EM algorithm, a customized heuristic strategy was employed to address the difficulty of parameter updates caused by the absence of closed-form update expressions: the shape parameters of GaMM were updated with this strategy towards gradually maximizing the log-likelihood value, while the remaining parameters were updated in closed form. A series of experiments were conducted to validate the feasibility, rationality, and effectiveness of the proposed Semi-EM algorithm. Experimental results demonstrate that the Semi-EM algorithm outperforms the four comparison algorithms in estimating multimodal probability distributions accurately. Specifically, the Semi-EM algorithm achieves lower error metrics and higher log-likelihood values, indicating that it provides more accurate model parameter estimation and thus a more precise representation of the multimodal nature of the data.
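    The Semi-EM structure described above, closed-form M-steps for the weights and scales plus a heuristic uphill step for the shapes, can be sketched as follows. The sign-based step rule, step size, and initialization are assumptions, not the paper's exact strategy.

```python
import numpy as np
from scipy.stats import gamma
from scipy.special import digamma

def semi_em_gamma(x, k=2, iters=200, eps=0.05):
    rng = np.random.default_rng(0)
    w = np.full(k, 1 / k)                  # mixture weights
    a = rng.uniform(1, 3, k)               # shape parameters
    th = np.full(k, x.mean())              # scale parameters
    for _ in range(iters):
        # E-step: responsibilities r[i, j].
        dens = np.stack([w[j] * gamma.pdf(x, a[j], scale=th[j])
                         for j in range(k)], axis=1)
        r = dens / dens.sum(axis=1, keepdims=True)
        n_j = r.sum(axis=0)
        # Closed-form M-step for weights and scales (given current shapes).
        w = n_j / len(x)
        th = (r * x[:, None]).sum(axis=0) / (a * n_j)
        # Heuristic shape update: follow the sign of dQ/da.
        grad = (r * np.log(x)[:, None]).sum(axis=0) / n_j - np.log(th) - digamma(a)
        a = np.maximum(a + eps * np.sign(grad), 1e-2)
    return w, a, th

x = np.concatenate([np.random.default_rng(1).gamma(2.0, 1.0, 500),
                    np.random.default_rng(2).gamma(9.0, 0.5, 500)])
print(semi_em_gamma(x))    # two recovered Gamma components
```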

    Object detection uncertainty measurement scheme based on guide to the expression of uncertainty in measurement
    Peiyu JIANG, Yongguang WANG, Yating REN, Shuochen LI, Huobin TAN
    2025, 45(7):  2162-2168.  DOI: 10.11772/j.issn.1001-9081.2024070941

    Aiming at the problem that current object detection algorithms treat uncertainty modeling as a step for optimizing prediction results while ignoring the value of uncertainty itself, an object detection result evaluation scheme based on the Guide to the Expression of Uncertainty in Measurement (GUM) was proposed. Firstly, the sources of uncertainty in object detection were decomposed into three mutually independent aspects: data, model, and platform; uncertainty influence factors were then extracted from these three aspects to construct a measurement function. Secondly, the Type A and Type B evaluation methods in GUM were used to measure the uncertainty influence components. Finally, uncertainty synthesis rules were applied on the basis of the measurement function, and the standard uncertainty was synthesized from the uncertainty components. Experiments were conducted with an object detection algorithm. The results show that, compared to Peak Signal-to-Noise Ratio (PSNR) and Structural SIMilarity (SSIM), the data uncertainty is 5.30 and 19.08 percentage points better, respectively, at capturing noisy data; the model uncertainty has a tiny influence on the prediction results and can be ignored within the range of 10^-6; and the platform uncertainty can represent, in numerical form, the prediction differences caused by software and hardware platforms.
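    For concreteness, the snippet below works through the two standard GUM tools the scheme builds on: a Type A evaluation over repeated observations, and root-sum-of-squares synthesis of independent components (unit sensitivity coefficients assumed). The numbers are illustrative, not the paper's data.

```python
import numpy as np

def type_a_uncertainty(samples):
    """Standard uncertainty of the mean of n repeated observations."""
    s = np.std(samples, ddof=1)       # experimental standard deviation
    return s / np.sqrt(len(samples))

def combine(*components):
    """Combined standard uncertainty for independent components,
    assuming unit sensitivity coefficients."""
    return np.sqrt(sum(u ** 2 for u in components))

scores = np.array([0.81, 0.79, 0.83, 0.80, 0.82])  # repeated detection scores
u_data = type_a_uncertainty(scores)
u_model, u_platform = 1e-6, 0.004                  # Type B style assumptions
print(combine(u_data, u_model, u_platform))
```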

    Artificial intelligence
    Research review on explainable artificial intelligence in internet of things applications
    Xiaoyang ZHAO, Xinzheng XU, Zhongnian LI
    2025, 45(7):  2169-2179.  DOI: 10.11772/j.issn.1001-9081.2024070927

    In the era of the Internet of Things (IoT), the integration of Artificial Intelligence (AI) and IoT has become a significant trend driving technological advancement and application innovation. With the exponential growth in the number of connected devices, enhancing end-users' trust in intelligent systems has become especially critical. Explainable Artificial Intelligence (XAI) refers to AI systems capable of explaining their decision-making processes and outcomes. The emergence of XAI has propelled the development of AI technology and increased users' trust in AI systems. Therefore, a review of research on XAI in IoT applications was performed. Firstly, the background and significance of IoT and XAI were discussed. Secondly, the definition and key technologies of XAI were presented. Thirdly, recent progress in traditional AI-driven IoT applications as well as XAI-driven IoT applications was introduced. Finally, future development directions of XAI in IoT applications were prospected and the associated challenges were summarized.

    Multi-scale decorrelation graph convolutional network model
    Danyang CHEN, Changlun ZHANG
    2025, 45(7):  2180-2187.  DOI: 10.11772/j.issn.1001-9081.2024070951

    Deep Graph Neural Networks (GNNs) aim to capture both local and global features in complex networks, thereby alleviating the information propagation bottleneck in graph-structured data. However, current deep GNN models often suffer from feature over-correlation. Therefore, a Multi-scale Decorrelation graph convolutional network (Multi-Deprop) model was proposed. The model includes two operations: feature propagation and feature transformation. In the feature propagation operation, multi-scale decorrelation parameters were introduced to maintain strong decorrelation in the lower network layers and weak decorrelation in the higher layers, thereby adapting to the needs of feature processing at different levels. In the feature transformation operation, an orthogonal regularization and an information-maximization loss were introduced: the orthogonal regularization was used to maintain feature independence, and the information maximization was used to maximize the mutual information between the input and the representation, thereby reducing redundancy in the feature information. Finally, the proposed model was compared with four benchmark models on seven node classification datasets. Experimental results show that the Multi-Deprop model achieves better node classification accuracy in most settings with 2 to 32 layers. In particular, on the Cora dataset, the Multi-Deprop model improves the accuracy of the 4- to 32-layer models by 0.80% to 13.28% compared to the benchmark model Deprop, meaning that the performance degradation problem of deep networks is alleviated to a certain degree. In the feature matrix correlation analysis, the feature matrix obtained by the Multi-Deprop model on the Cora dataset has a correlation of 0.40, indicating weak correlation and demonstrating that the Multi-Deprop model alleviates the over-correlation issue significantly. The results of ablation studies and visualization experiments show that the improvements to both operations contribute to the enhancement of model performance. It can be seen that the Multi-Deprop model reduces feature redundancy in deep networks significantly while ensuring high classification accuracy, and has strong generalization ability and practical value.
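    The orthogonal-regularization term used in the feature transformation operation can be sketched as a Frobenius-norm penalty on the deviation of the weight Gram matrix from the identity; the coefficient below is an assumption.

```python
import torch

def orth_reg(weight: torch.Tensor, coef: float = 1e-3) -> torch.Tensor:
    """weight: (d_in, d_out) of a linear transformation. Penalizes
    correlated output dimensions by pushing W^T W towards I."""
    gram = weight.t() @ weight                        # (d_out, d_out)
    eye = torch.eye(gram.size(0), device=weight.device)
    return coef * torch.norm(gram - eye, p="fro") ** 2

W = torch.nn.Linear(64, 32).weight.t()                # (64, 32)
print(orth_reg(W))    # added to the task loss during training
```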

    Multi-scale sparse graph guided vision graph neural networks
    Zimo ZHANG, Xuezhuan ZHAO
    2025, 45(7):  2188-2194.  DOI: 10.11772/j.issn.1001-9081.2024070910

    Recently, the Vision Graph neural network (ViG) has attracted considerable attention from researchers in the field of computer vision, with graph construction being a key modeling step in ViG. The popular K-Nearest Neighbor (KNN) graph construction approach is limited by its fixed scale and quadratic computational complexity, making it difficult to model both local and multi-scale information in an image. To address this issue, a Multi-Scale Sparse Graph (MSSG) construction method was proposed, in which the KNN graph is decomposed into three sparse subgraphs of different scales along the channel dimension, achieving linear computational complexity while modeling both local and multi-scale information effectively. To enhance the model's global modeling capability, a fusion strategy for global and local multi-scale information was proposed. Based on these methods, a vision architecture named MSViG (Multi-Scale Vision Graph neural network) was proposed. The results of image classification experiments on the ImageNet-1K dataset demonstrate that MSViG outperforms the existing ViG; for example, the proposed MSViG-T achieves a Top-1 classification accuracy 2.1 percentage points higher than ViG-T, and it also shows significant performance improvements over ViG in downstream vision tasks such as object detection and instance segmentation.

    Open set recognition method with open generation and feature optimization
    Erkang XIANG, Rong HUANG, Aihua DONG
    2025, 45(7):  2195-2202.  DOI: 10.11772/j.issn.1001-9081.2024060862

    A Deep Neural Network (DNN) may encounter samples that were not present during training and has difficulty rejecting unknown-class samples accurately. Open set recognition aims to classify known-class samples accurately while rejecting unknown-class samples. Prototype learning methods have been applied widely in open set recognition, but they cannot ensure both intra-class compactness and inter-class separability of the sample distribution. Therefore, an open set recognition method with Open Generation and Feature Optimization (OGFO) was proposed. Firstly, the concept of the open point was presented: the inherent features of each class's samples were learned by prototype points through a DNN, the average of the class prototype points was taken as the open point representing the inherent features of unknown classes, and the central region of the feature space around the open point was reserved as the open space for the distribution of unknown-class samples. Then, a Feature Optimization Algorithm based on open points (FOA) was proposed, in which open points were utilized to force a more compact distribution within each class and a more separable distribution among different classes. Finally, an open point-based generation method, OGAN (Open Generative Adversarial Network), was proposed, and the unknown-class samples generated by OGAN were forced by the DNN to distribute in the open space occupied by the open points. Experimental results demonstrate that compared with Adversarial Reciprocal Points Learning for open set recognition (ARPL) on datasets such as MNIST, SVHN, CIFAR10, and TinyImageNet, OGFO achieves significant improvements in the Area Under the Receiver Operating Characteristic curve (AUROC). In particular, on the TinyImageNet dataset, OGFO improves the AUROC by at least 3 percentage points and improves accuracy and the Open Set Classification Rate (OSCR) by at least 6 and 5 percentage points, respectively, compared to ARPL. It can be seen that OGFO addresses the challenge that other methods cannot: balancing intra-class compactness and inter-class separability of the sample distribution simultaneously.

    Gradient-discriminative and feature norm-driven open-world object detection
    Yingjun ZHANG, Weiwei YAN, Binhong XIE, Rui ZHANG, Wangdong LU
    2025, 45(7):  2203-2210.  DOI: 10.11772/j.issn.1001-9081.2024070944

    Open-World Object Detection (OWOD) extends object detection to real, changing environments, requiring models to identify known and unknown objects accurately and learn new knowledge gradually. In response to the low recall for unknown classes and the false identification problem in the existing OWOD methods, a Gradient-Discriminative and Feature Norm-driven OWOD (GDFN-OWOD) network model was proposed. To address the low recall for unknown classes, a Gradient-Discriminative Representation Module (GDRM) was proposed, which uses the gradient differences from backpropagation to distinguish unknown classes from the background accurately, thereby improving the recall for unknown classes. In addition, a Graph Segmentation-based Bounding box Clustering (GSBC) algorithm was introduced to model the determination of object bounding boxes as a graph decomposition problem, reducing redundant bounding boxes and thus the computational complexity of the model. To tackle the false identification of unknown classes, a FeatureNorm-Based Classifier (FN-BC) was employed to select the best-performing convolutional layer for identifying known and unknown classes with higher precision. Experimental results on the M-OWODB dataset show that compared with the best-performing comparison models in tasks T1, T2, and T3, GDFN-OWOD increases the recall for unknown classes by 1.1, 2.1, and 0.9 percentage points, respectively, and reduces the Absolute Open-Set Error (A-OSE) by 35.1%, 28.7%, and 12.2%, respectively. It can be seen that compared with the existing OWOD methods, the proposed method alleviates the problems of low recall for unknown classes and false identification effectively.

    Multi-view knowledge-aware and interactive distillation recommendation algorithm
    Yuelan ZHANG, Jing SU, Hangyu ZHAO, Baili YANG
    2025, 45(7):  2211-2220.  DOI: 10.11772/j.issn.1001-9081.2024070948

    Currently, collaborative filtering-based Graph Neural Network (GNN) recommendation systems face data sparsity and cold-start issues. Many related algorithms introduce external knowledge about items to alleviate these issues, but they ignore the severe imbalance in information utilization caused by directly fusing sparse collaborative signals with redundant supplementary information, as well as the problems of information sharing and propagation among different data. Therefore, a Multi-view Knowledge-aware and interactive Distillation Recommendation algorithm (MKDRec) was proposed. Firstly, to tackle data sparsity, the collaborative view was enhanced through random dropout, and neighborhood contrastive learning was then applied to the node representations in this view. Secondly, to address the knowledge redundancy problem, each relation type of edge in the knowledge view was encoded, and the items' knowledge view was reconstructed on the basis of head and tail entities and their connecting relations, so as to utilize the information fully. Finally, an associated view with remote connections was constructed on the basis of the equivalence relations between items and entities. On these three views, graph node representations were learned by different convolutional aggregation methods to extract multiple types of information for users and items, yielding multiple embedded representations of users and items. Besides, knowledge distillation and fusion of node feature vectors were performed between pairs of views to realize information sharing and propagation. Experimental results on the Book-Crossing, MovieLens-1M, and Last.FM datasets show that compared to the best baseline results, MKDRec improves the AUC (Area Under Curve) by 2.13%, 1.07%, and 3.44%, respectively, and the F1-score by 3.56%, 1.14%, and 4.46%, respectively.

    Zero-shot dialogue state tracking domain transfer model based on semantic prefix-tuning
    Yuyang SUN, Minjie ZHANG, Jie HU
    2025, 45(7):  2221-2228.  DOI: 10.11772/j.issn.1001-9081.2024060865

    Zero-shot Dialogue State Tracking (DST) requires transferring existing models to new domains without labeled data. The existing methods often struggle to capture contextual relationships in dialogue text during domain transfer, leading to insufficient generalization when facing unknown domains. To address this issue, a zero-shot DST domain transfer model based on semantic prefix-tuning was proposed. Firstly, the slot description was utilized to generate an initial prefix, ensuring a close semantic connection between the prefix and the dialogue text. Secondly, the prefix position and domain information were integrated to generate a prefix combining internal knowledge and domain information. Thirdly, the prefix length was adjusted dynamically according to the complexity of the dialogue content, enhancing the model's sensitivity to context. Finally, global prefix insertion was employed to strengthen the model's global memory of the dialogue history. Experimental results show that compared with the Prompter model, the proposed model increases the Joint Goal Accuracy (JGA) by 5.50, 0.90, and 7.50 percentage points in the Restaurant, Taxi, and Train domains of the MultiWOZ2.1 dataset, respectively, and by 0.65, 14.51, and 0.65 percentage points in the Messaging, Payment, and Trains domains of the SGD dataset, respectively. It can be seen that the proposed model effectively improves context understanding and generalization transfer performance for zero-shot DST tasks.

    Nested named entity recognition combined with boundary generation by multi-objective learning
    Zhangjie XU, Yanping CHEN, Ying HU, Ruizhang HUANG, Yongbin QIN
    2025, 45(7):  2229-2236.  DOI: 10.11772/j.issn.1001-9081.2024070980

    Named Entity Recognition (NER) aims to identify predefined entity types from unstructured text. Span-based NER methods recognize entities by enumerating all spans. However, adjacent spans in the text share contextual semantics, which blurs the semantic information at span boundaries and makes it difficult for models to capture dependency information among spans. To address this boundary ambiguity, a multi-objective learning NER model combined with boundary generation was proposed. The model was trained jointly, in a multi-objective learning manner, by combining the NER task with a boundary generation task, where the boundary generation task served as an auxiliary task guiding the network to focus on the boundary information of spans and thus improving NER performance. Tests on the ACE2004, ACE2005, and GENIA datasets show that the proposed model achieves F1 scores of 87.83%, 86.90%, and 81.65%, respectively. Experimental results validate the effectiveness of the model on different datasets and further confirm its superior performance on NER tasks.

    Multimodal sentiment analysis model with cross-modal text information enhancement
    Yihan WANG, Chong LU, Zhongyuan CHEN
    2025, 45(7):  2237-2244.  DOI: 10.11772/j.issn.1001-9081.2024060886

    Multimodal Sentiment Analysis (MSA), which utilizes text, visual, and audio data to analyze speakers' emotions in videos, has garnered widespread attention. However, the contributions of the different modalities to sentiment analysis vary significantly: the information contained in text is generally more intuitive, making it particularly important to seek a strategy for enhancing text in sentiment analysis. To address this issue, a Multimodal Sentiment Analysis Model with Cross-modal Text-information Enhancement (MSAM-CTE) was proposed. Firstly, the BERT (Bidirectional Encoder Representations from Transformers) pre-trained model was employed to extract text features, and a Bi-directional Long Short-Term Memory (Bi-LSTM) network was used to further process the pre-processed audio and video features. Then, a text-based cross-attention mechanism was applied to integrate the text information into the emotion-related nonverbal representations, learning text-oriented pairwise cross-modal mappings to obtain effective unified multimodal representations. Finally, the fused features were used for sentiment analysis. Experimental results show that compared to the best baseline model, the Text Enhanced Transformer Fusion Network (TETFN), the proposed model achieves a 2.6% reduction in Mean Absolute Error (MAE) and a 0.1% increase in Pearson Correlation coefficient (Corr) on the CMU-MOSI (Carnegie Mellon University Multimodal Opinion Sentiment Intensity) dataset; on the CMU-MOSEI (Carnegie Mellon University Multimodal Opinion Sentiment and Emotion Intensity) dataset, the improvements are 3.8% for MAE and 1.7% for Corr, verifying the effectiveness of MSAM-CTE in sentiment analysis.
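    A text-queried cross-attention step of the kind described above can be sketched in a few lines of PyTorch: text features act as queries over the nonverbal stream, so the audio (or visual) features are re-weighted by textual semantics. The single-layer setup and dimensions are illustrative assumptions.

```python
import torch
import torch.nn as nn

d = 128
attn = nn.MultiheadAttention(embed_dim=d, num_heads=4, batch_first=True)

text  = torch.randn(2, 20, d)   # (batch, text_len, d)
audio = torch.randn(2, 50, d)   # (batch, audio_len, d)

# Query = text, Key/Value = audio: output is a text-aligned audio summary.
fused, weights = attn(query=text, key=audio, value=audio)
print(fused.shape)              # torch.Size([2, 20, 128])
```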

    Data science and technology
    Batch process quality prediction model using improved time-domain convolutional network with multi-head self-attention mechanism
    Xiaoqiang ZHAO, Yongyong LIU, Yongyong HUI, Kai LIU
    2025, 45(7):  2245-2252.  DOI: 10.11772/j.issn.1001-9081.2024070945

    To improve the training stability of Temporal Convolutional Networks (TCNs) under varying batch sizes and to address the low prediction accuracy caused by the inability of batch process quality prediction to capture long-term dependencies and global correlations, a quality prediction model combining a Batch Group Normalization (BGN) and Mish activation function-enhanced residual-structure TCN with a Multi-Head Self-Attention mechanism (BMTCN-MHSA) was proposed. Firstly, the three-dimensional data of the batch process was unfolded into a two-dimensional matrix and normalized, and Singular Spectrum Analysis (SSA) decomposition was introduced to reconstruct the data. Secondly, BGN was integrated into the residual part of the temporal convolution to reduce the network's sensitivity to changes in batch size, the Mish activation function was introduced to enhance the model's generalization ability, and the multi-head self-attention mechanism was utilized to associate and weight feature information from different positions in the sequence, thereby further extracting key features and their interdependencies and better capturing the dynamic characteristics of the batch process. Finally, the model was validated on penicillin simulation experiment data. Experimental results show that compared to the TCN model, the BMTCN-MHSA model reduces the Mean Absolute Error (MAE) by 56.86% and the Mean Squared Error (MSE) by 48.80%, and achieves a coefficient of determination (R2) of 99.48%, indicating improved quality prediction accuracy for batch processes.

    Real-time prediction of traffic status based on clustering multivariate time series model
    Shujun GUO, Weijun REN, Qianqian CHEN, Guangfei YOU
    2025, 45(7):  2253-2261.  DOI: 10.11772/j.issn.1001-9081.2024070956

    To address the issues that existing traffic status prediction models cannot handle the fuzziness of highway traffic status effectively and fail to utilize real-time data streams after model training, a real-time traffic status prediction method based on a clustering multivariate time series model was proposed. Firstly, traffic flow parameters were analyzed, a classification model based on an improved Fuzzy C-Means (FCM) algorithm and the entropy weight method was developed to define traffic status and establish classification standards, and a Status Index (SI) indicator was employed to address the fuzziness of classification boundaries. Secondly, a multivariate time series prediction model was constructed on the basis of the classification model; by combining convolutional networks and attention mechanisms, this model captures both long-term and short-term dependencies in time series data effectively. Thirdly, a backpropagation update mechanism was applied for online learning to realize real-time prediction. Finally, the model was tested on the California Traffic Management Center Performance Measurement System (PeMS) dataset: the dataset was divided into training, validation, and test sets in chronological order, and real-time data stream simulations were conducted for online learning and prediction. Experimental results show that with a prediction step of 6, compared to the classic LightTS (Light Sampling-oriented MLP Structures) model, the proposed model reduces the Mean Squared Error (MSE) by 22.81% and the Mean Absolute Error (MAE) by 14.64%. It can be seen that the proposed model distinguishes traffic status levels effectively and achieves real-time traffic status prediction.
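    As a concrete reference for the entropy weight method mentioned above: indicators whose values vary more across samples carry more information and receive larger weights. The toy indicator matrix below (rows as observations, columns as flow/speed/occupancy) is an assumption.

```python
import numpy as np

def entropy_weights(X):
    """X: (n_samples, n_indicators), non-negative after normalization."""
    P = X / X.sum(axis=0, keepdims=True)                # column proportions
    P = np.where(P == 0, 1e-12, P)                      # avoid log(0)
    e = -(P * np.log(P)).sum(axis=0) / np.log(len(X))   # entropy per indicator
    d = 1 - e                                           # degree of divergence
    return d / d.sum()

X = np.array([[420., 88., 0.12],
              [560., 64., 0.31],
              [610., 42., 0.55]])
print(entropy_weights(X))    # indicator weights, summing to 1
```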

    Time series forecasting model based on segmented attention mechanism
    Huibin WANG, Zhan’ao HU, Jie HU, Yuanwei XU, Bo WEN
    2025, 45(7):  2262-2268.  DOI: 10.11772/j.issn.1001-9081.2024070929

    To address the loss of local dependencies in long-term forecasting caused by the increased sampling interval after time series segmentation, a time series forecasting model based on a Segmented Attention Mechanism (SAMformer) was proposed. Firstly, static time covariates were explicitly fused with the original data in proportion to enhance the data's time-domain information representation. Secondly, two continuous linear layers with bias and an activation function were introduced to fine-tune the fused data, improving the model's ability to fit nonlinear data. Thirdly, a dot-product attention mechanism was introduced within each segment of the segmented series to capture local feature dependencies. Finally, an encoder-decoder architecture based on cross-scale dependencies was utilized to predict the time series data. Experiments on five public time series datasets show that compared with other supervised learning-based time series forecasting models, namely Crossformer, Pyraformer, and Informer, SAMformer reduces the Mean Squared Error (MSE) and Mean Absolute Error (MAE) by 2.0%-62.0% and 0.9%-49.8%, respectively. Ablation experiments verify the completeness and effectiveness of the proposed components, further showing that the fusion of time-domain information and the intra-segment attention mechanism help improve time series forecasting accuracy.

    Granular-ball prototypical network for few-shot image classification
    Ruifeng BAI, Guanglei GOU, Lang WEN, Wanyu MIAO
    2025, 45(7):  2269-2277.  DOI: 10.11772/j.issn.1001-9081.2024071008

    To address the issues of sparse training data and the inadequacy of a single distance metric for measuring relationships among samples comprehensively in few-shot learning, a few-shot image classification method based on a Granular-Ball Prototypical Network (GBProtoNet) was proposed. Firstly, the Ball k-means algorithm was applied to the query set, category information was obtained through adaptive iterative updating, and this information was combined with ProtoNet to construct granular-ball prototypes carrying information from both the query set and the support set, thereby mitigating the problem of limited training data. Secondly, after GBProtoNet feature extraction, a feature selection module was designed to extract important information from the samples, and the Ball k-means algorithm was used to obtain the cluster centers of the categories in the query set, which were then weighted and fused with the original prototypes to construct more representative granular-ball prototypes. Thirdly, the Euclidean distance and the cosine distance between the original query samples and the granular-ball prototypes were computed and multiplied to obtain a comprehensive distance, making the distance metric between samples more comprehensive. Finally, the query samples were assigned to categories by the nearest-neighbor principle. Experimental results on 5-way 1-shot and 5-way 5-shot image classification tasks show that the proposed method improves classification accuracy by 6.18% and 3.85% on the MiniImageNet dataset and by 6.89% and 3.57% on the TieredImageNet dataset, compared to the baseline ProtoNet. Additionally, the time cost of the proposed method on 5-shot tasks on the MiniImageNet dataset is 72.6% lower than that of SSL-ProtoNet (Self-Supervised Learning ProtoNet). These results demonstrate that the proposed method enhances few-shot classification accuracy effectively and with high efficiency.
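    The comprehensive distance described above, the product of the Euclidean and cosine distances to each prototype, can be sketched as follows; the prototype values are illustrative only.

```python
import numpy as np

def combined_distance(q, prototypes):
    """Element-wise product of Euclidean and cosine distances from a
    query sample q to each prototype row."""
    diff = prototypes - q
    euc = np.linalg.norm(diff, axis=1)
    cos = 1 - (prototypes @ q) / (
        np.linalg.norm(prototypes, axis=1) * np.linalg.norm(q) + 1e-12)
    return euc * cos

prototypes = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])  # 3 classes
q = np.array([0.9, 0.2])
print(np.argmin(combined_distance(q, prototypes)))           # predicted class
```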

    Cyber security
    Cloud-based conditional broadcast proxy re-encryption scheme
    Binhan LI, Lunzhi DENG, Huan LIU
    2025, 45(7):  2278-2287.  DOI: 10.11772/j.issn.1001-9081.2024070989

    To address the common issue of cloud server authority abuse in Proxy Re-Encryption (PRE) schemes, as well as the limitations of the existing Conditional Proxy Re-Encryption (CPRE) schemes in terms of multiple receivers, security, and computational cost, a Certificate-Based Conditional Broadcast Proxy Re-Encryption (CB-CBPRE) scheme was proposed. In this scheme, an access condition is set by the data owner when generating the convertible ciphertext and the re-encryption key, and a valid re-encrypted ciphertext is generated by the cloud server only when the condition matches, thereby preventing the cloud server from abusing its authority and providing re-encrypted ciphertext to unauthorized users. The security of the proposed scheme was reduced to the Decisional Diffie-Hellman (DDH) problem, and the ciphertext was proven indistinguishable in the Random Oracle Model (ROM). Experimental results show that with 50 receivers, compared with four schemes, namely Identity-Based Broadcast Proxy Re-Encryption (IB-BPRE), Privacy-Preserving Proxy Re-Encryption (PP-PRE), Revocable Identity-Based Broadcast Proxy Re-Encryption (RIB-BPRE), and Multi-Channel Broadcast Proxy Re-Encryption (MC-BPRE), the proposed scheme reduces the computation time by 73%, 83%, 87%, and 92%, respectively, and the communication volume by 66%, 90%, 77%, and 66%, respectively, enhancing encryption efficiency effectively.

    Large-scale IoT binary component identification based on named entity recognition
    Lixiao ZHANG, Yao MA, Yuli YANG, Dan YU, Yongle CHEN
    2025, 45(7):  2288-2295.  DOI: 10.11772/j.issn.1001-9081.2024070918

    Internet of Things (IoT) device manufacturers often reuse large numbers of open-source components compiled from open-source code in firmware development, with each firmware typically comprising hundreds of such components. If these components are not updated promptly, they may carry unpatched vulnerabilities into the firmware, posing significant security risks to IoT devices. Therefore, identifying binary components in IoT firmware is crucial for ensuring the security of IoT devices. To address the difficulty of identifying binary components at scale with the existing methods, a large-scale IoT binary component identification method based on Named Entity Recognition (NER) was proposed. Firstly, the internal binary components were extracted from the firmware through decompression. Then, semantic information about each component was obtained in two ways: extracting readable strings and executing the component. Finally, a RoBERTa-BiLSTM-CRF NER model was utilized to identify component names and version numbers. Experimental results on 6 575 firmware samples released by 12 popular IoT manufacturers demonstrate that the proposed method achieves an F1 score of 87.67% and successfully identifies 163 binary components. It can be seen that this method effectively expands the identification range of binary components in IoT firmware, enhancing firmware security from the perspective of the software supply chain.

    Source code vulnerability detection method based on Transformer-GCN
    Chen LIANG, Yisen WANG, Qiang WEI, Jiang DU
    2025, 45(7):  2296-2303.  DOI: 10.11772/j.issn.1001-9081.2024070998

    The existing deep learning-based methods for source code vulnerability detection often suffer from severe loss of syntactic and semantic information of the target code, and their neural network models allocate weights to the graph nodes (edges) of the target code unreasonably. To address these issues, a source code vulnerability detection method named VulATGCN was proposed on the basis of the Code Property Graph (CPG) and an Adaptive Transformer-Graph Convolutional Network (AT-GCN). In this method, the CPG is used to represent source code, CodeBERT is combined for node vectorization, and graph centrality analysis is employed to extract deep structural features, thereby capturing the code's syntactic and semantic information in a multi-dimensional way. Then, the AT-GCN model was designed by integrating the strengths of the Transformer self-attention mechanism, which excels at capturing long-range dependencies, and the Graph Convolutional Network (GCN), which is proficient at capturing local features, thereby realizing fused learning and precise extraction of features from regions of different importance. Experimental results on the real vulnerability datasets Big-Vul and SARD show that VulATGCN achieves an average F1 score of 82.9%, which is 10.4% to 132.9% higher than deep learning-based vulnerability detection methods such as VulSniper, VulMPFF, and MGVD, with an average improvement of approximately 52.9%.

    Advanced computing
    Dung beetle optimizer algorithm with restricted reverse learning and Cauchy-Gauss variation
    Zhilong YANG, Dexuan ZOU, Can LI, Yingying SHAO, Lejie MA
    2025, 45(7):  2304-2316.  DOI: 10.11772/j.issn.1001-9081.2024060778
    Abstract ( )   HTML ( )   PDF (1670KB) ( )
    Figures and Tables | References | Related Articles | Metrics

    To overcome the shortcomings of slow convergence, low accuracy, and a tendency to fall into local optima in the Dung Beetle Optimizer (DBO) algorithm, a Dung Beetle Optimizer algorithm with restricted reverse learning and Cauchy-Gauss variation (SI-DBO) was proposed. Firstly, Circle mapping was used to initialize the population to make the distribution of the population more uniform and diverse, which improved the convergence speed and optimization accuracy of the algorithm. Secondly, restricted reverse learning was used to update the locations of dung beetles, so as to improve their search ability. Finally, the Cauchy-Gauss variation strategy was used to help the population escape from local optima and find the global optimum. To verify the performance of SI-DBO, simulation experiments were carried out on test functions, the Wilcoxon rank-sum test was performed on the experimental results, and the algorithm was applied to solve the robot gripper problem. Experimental results show that SI-DBO achieves higher optimization accuracy and faster convergence than the Black Widow-Dung Beetle Optimization (BWDBO) algorithm and the Sparrow Search Algorithm (SSA) on the test functions. Meanwhile, SI-DBO performs better than the Particle Swarm Optimization (PSO) algorithm in solving the robot gripper problem, indicating better optimization performance and engineering practicability of SI-DBO.
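
    The two ingredients easiest to show in code are the Circle-map initialization and the Cauchy-Gauss perturbation of the best solution. The NumPy sketch below uses common parameter choices (map constants 0.2 and 0.5, and a mutation schedule shifting from Cauchy to Gaussian noise over iterations), which are assumptions rather than the paper's exact settings.

        import numpy as np

        def circle_init(pop, dim, lb, ub, rng=np.random.default_rng(0)):
            # Circle chaotic map: spreads the initial population more uniformly
            x = np.empty((pop, dim))
            x[0] = rng.random(dim)
            for i in range(1, pop):
                x[i] = np.mod(x[i-1] + 0.2 - (0.5 / (2 * np.pi)) * np.sin(2 * np.pi * x[i-1]), 1.0)
            return lb + x * (ub - lb)

        def cauchy_gauss_mutate(best, t, T, rng=np.random.default_rng(1)):
            # heavy-tailed Cauchy noise early (escape local optima),
            # Gaussian noise late (fine local search)
            lam1, lam2 = 1 - t / T, t / T
            return best * (1 + lam1 * rng.standard_cauchy(best.shape)
                             + lam2 * rng.standard_normal(best.shape))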

    Fast and fully autonomous exploration method for multi-UAV in large-scale complex environments
    Shu LI, Guoqing LIU, Siyuan LI, Yaochang QIN
    2025, 45(7):  2317-2324.  DOI: 10.11772/j.issn.1001-9081.2024060868
    Abstract ( )   HTML ( )   PDF (3758KB) ( )
    Figures and Tables | References | Related Articles | Metrics

    To address the problems of low exploration efficiency and limited information exchange under constrained communication bandwidth in current Multiple Unmanned Aerial Vehicle (Multi-UAV) systems when exploring large-scale complex environments, a fast and fully autonomous exploration method for Multi-UAV in large-scale complex environments was proposed, including a fast hierarchical exploration strategy and a lightweight large-scale environment modeling method. Firstly, closed viewpoints were generated in the front-end trajectory planning part to drive the Unmanned Aerial Vehicles (UAVs) to explore unknown environments. Then, the smooth, continuous, and time-optimal trajectory optimization problem was transformed into a convex optimization problem in the back-end, and this problem was modeled systematically. Meanwhile, in terms of environmental characterization, a random mapping method was used for lightweight mapping and map data interaction. Finally, in simulation, the proposed method was compared with FUEL (Fast Unmanned aerial vehicle ExpLoration), a fast exploration method using incremental boundary information and hierarchical planning; FBE (Frontier-Based Exploration), a rapid exploration method based on frontiers; and NBVP (Next Best View Planner), an exploration method based on the next best viewpoint. The results show that the proposed method improves the exploration time performance by 14.4%, 43.9%, and 47.7%, respectively, and the lightweight mapping method reduces the data size by 28.3% and 22.4% compared to the Bayesian method and the polyhedron method, respectively. It can be seen that the proposed method can perform fast and fully autonomous exploration in large-scale complex environments efficiently.
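
    The abstract does not detail the random mapping method, but one standard way to make map exchange lightweight is a random projection whose matrix every UAV regenerates from a shared seed, so only a short sketch vector is transmitted. The NumPy example below is a sketch under that assumption, not the paper's exact encoding.

        import numpy as np

        def make_projection(n_cells, k, seed=42):
            # all UAVs rebuild the same matrix from the seed; it is never transmitted
            rng = np.random.default_rng(seed)
            return rng.normal(0.0, 1.0 / np.sqrt(k), size=(k, n_cells))

        grid = (np.random.default_rng(7).random(100 * 100) > 0.7)  # toy occupancy grid
        P = make_projection(grid.size, k=500)       # k sets the compression ratio
        sketch = P @ grid.astype(float)             # send 500 floats instead of 10 000 cells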

    Multimedia computing and computer simulation
    Pedestrian detection algorithm based on multi-view information
    Haoyu LIU, Pengwei KONG, Yaoli WANG, Qing CHANG
    2025, 45(7):  2325-2332.  DOI: 10.11772/j.issn.1001-9081.2024070961
    Abstract ( )   HTML ( )   PDF (2996KB) ( )
    Figures and Tables | References | Related Articles | Metrics

    To address the issues of false and missed detections caused by severe object occlusion, and the lack of consideration of relationships among multiple views in the existing multi-view pedestrian detection algorithms, an improved multi-view pedestrian detection algorithm based on the MVDeTr (MultiView Detection with shadow Transformer) algorithm was proposed. Firstly, during the feature extraction phase, a View Enhancement Module (VEM) was designed to enhance important views by focusing on the relationships among different views. Secondly, in the process of introducing multi-view information into a single view, an Efficient Multi-scale Attention (EMA) module was added to establish short-distance and long-distance dependencies, thereby improving the detection performance. Finally, based on the Shadow Transformer module in the original baseline algorithm, a new multi-view information processing module, the Efficient Shadow Transformer (EST), was designed to reduce the use of redundant information in multiple views while maintaining the detection effect. Experimental results show that, compared to the original MVDeTr algorithm on the Wildtrack dataset, the proposed algorithm enhances the main detection metric MODA (Multiple Object Detection Accuracy) by 1.8 percentage points, the detection metric MODP (Multiple Object Detection Precision) by 0.6 percentage points, and Recall by 1.8 percentage points, demonstrating the effectiveness of the proposed algorithm in multi-view pedestrian detection tasks.
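
    The view-enhancement idea can be sketched as a module that scores each camera view from its pooled features and reweights the view stack accordingly; the pooling choice and two-layer scorer below are illustrative assumptions, not the exact VEM design.

        import torch
        import torch.nn as nn

        class ViewWeighting(nn.Module):
            def __init__(self, channels: int):
                super().__init__()
                self.score = nn.Sequential(nn.Linear(channels, channels // 4),
                                           nn.ReLU(),
                                           nn.Linear(channels // 4, 1))

            def forward(self, views):
                # views: (V, C, H, W) feature maps, one per camera view
                pooled = views.mean(dim=(2, 3))                # (V, C) global context
                w = torch.softmax(self.score(pooled), dim=0)   # relative view importance
                return views * w[:, :, None, None]             # enhance important views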

    Helmet wearing detection algorithm for complex scenarios based on cross-layer multi-scale feature fusion
    Liang CHEN, Xuan WANG, Kun LEI
    2025, 45(7):  2333-2341.  DOI: 10.11772/j.issn.1001-9081.2024070999
    Abstract ( )   HTML ( )   PDF (4986KB) ( )
    Figures and Tables | References | Related Articles | Metrics

    To address the missed and false detections of small objects in helmet wearing detection in construction scenarios, caused by crowding, occlusion, and complex backgrounds, a cross-layer multi-scale helmet wearing detection algorithm with a double attention mechanism based on YOLOv8n was proposed. Firstly, a small object detection head was designed to enhance the model’s ability to detect small objects. Secondly, the double attention mechanism was embedded in the feature extraction network to better capture object features in complex scenarios. Thirdly, the feature fusion network was replaced with the cross-layer multi-scale feature fusion structure S-GFPN (Selective layer Generalized Feature Pyramid Network), an improvement on the Re-parameterized Generalized Feature Pyramid Network (RepGFPN), so as to enable multi-scale fusion of the small object feature layer with other layers and establish long-term dependencies, thus reducing background interference. Finally, the MPDIoU (Intersection over Union with Minimum Point Distance) loss function was employed to address the insensitivity of the loss to scale changes. Experimental results on the public dataset GDUT-HWD show that compared to YOLOv8n, the improved model increases mAP@0.5 by 3.4 percentage points, and improves the detection accuracy for blue, yellow, white, and red helmets by 2.0, 1.1, 4.6, and 9.1 percentage points, respectively. The model also outperforms YOLOv8n in five complex scenarios: density, occlusion, small objects, light reflection, and darkness, providing an effective method for helmet wearing detection in real-world construction scenarios.
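
    For reference, the published MPDIoU formulation subtracts the normalized squared distances between the two pairs of box corners from the plain IoU; a small Python version follows (the box layout and image-size normalization are the usual conventions).

        def mpdiou(p, g, img_w, img_h, eps=1e-7):
            # p, g: boxes as (x1, y1, x2, y2)
            ix1, iy1 = max(p[0], g[0]), max(p[1], g[1])
            ix2, iy2 = min(p[2], g[2]), min(p[3], g[3])
            inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
            union = (p[2]-p[0]) * (p[3]-p[1]) + (g[2]-g[0]) * (g[3]-g[1]) - inter
            iou = inter / (union + eps)
            d1 = (p[0]-g[0])**2 + (p[1]-g[1])**2      # top-left corner distance
            d2 = (p[2]-g[2])**2 + (p[3]-g[3])**2      # bottom-right corner distance
            norm = img_w**2 + img_h**2                # normalize by image diagonal
            return iou - d1 / norm - d2 / norm

        loss = 1 - mpdiou((10, 10, 50, 60), (12, 8, 55, 58), img_w=640, img_h=640)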

    Small target detection model for UAV aerial photography based on improved YOLOv8
    Bogan FAN, Shuqing WANG, Kaiyuan CHEN
    2025, 45(7):  2342-2350.  DOI: 10.11772/j.issn.1001-9081.2024070946
    Abstract ( )   HTML ( )   PDF (4318KB) ( )
    Figures and Tables | References | Related Articles | Metrics

    Aiming at the problems of low performance and missed and false detections of small targets from the Unmanned Aerial Vehicle (UAV) perspective, an improved BDS-YOLO (BiFPN-Dual-Small target detection-YOLO) model based on YOLOv8 was proposed. Firstly, the RepViTBlock (Revisiting mobile CNN from ViT perspective Block) and the EMA (Efficient Multi-scale Attention) mechanism were used to construct C2f-RE (C2f-RepViTBlock Efficient multi-scale attention), which improved the deep C2f (faster implementation of CSP bottleneck with 2 Convolutions) modules in the backbone network, thereby enhancing the model’s ability to extract small target features while reducing the number of parameters. Secondly, BiFPN (Bi-directional Feature Pyramid Network) was used to reconstruct the neck network, so that features at different levels were able to be fused with each other. Thirdly, a dual small target detection layer was constructed on the basis of the improved neck network, combining the shallow and shallowest features to improve the model’s detection ability for small targets. Finally, the improved loss function Inner-EIoU (Inner-Efficient-Intersection over Union) was introduced, which used a more reasonable aspect ratio measure and addressed the limitations of IoU (Intersection over Union) itself. Experimental results on the VisDrone2019 dataset show that compared to the original model, the improved model improves the precision, recall, mAP@50, and mAP@50:95 by 8.5, 7.7, 9.2, and 6.3 percentage points, respectively, with only 2.23×10⁶ parameters, a 19.1% reduction in model size. It can be seen that the proposed model improves performance significantly while achieving a degree of lightweighting.
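
    The BiFPN reconstruction mentioned above rests on fast normalized fusion: each input branch gets a learnable non-negative weight. A minimal PyTorch sketch, assuming the inputs are already resized to a common resolution:

        import torch
        import torch.nn as nn

        class WeightedFusion(nn.Module):
            # BiFPN-style fast normalized fusion of n feature maps
            def __init__(self, n_inputs: int, eps: float = 1e-4):
                super().__init__()
                self.w = nn.Parameter(torch.ones(n_inputs))
                self.eps = eps

            def forward(self, feats):
                w = torch.relu(self.w)             # keep branch weights non-negative
                w = w / (w.sum() + self.eps)       # normalize to roughly sum to 1
                return sum(wi * f for wi, f in zip(w, feats))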

    Multi-object tracking algorithm for construction machinery in transmission line scenarios
    Pingping YU, Yuting YAN, Xinliang TANG, He SU, Jianchao WANG
    2025, 45(7):  2351-2360.  DOI: 10.11772/j.issn.1001-9081.2024070985
    Abstract ( )   HTML ( )   PDF (11294KB) ( )
    Figures and Tables | References | Related Articles | Metrics

    In transmission line inspection tasks, utilizing deep learning technology to track the movement of construction machinery effectively is crucial for smart grid construction. To address the significant performance degradation in multi-object tracking caused by occlusion among targets and by false or missed detections, a multi-object tracking algorithm combining an improved YOLOv5s and an optimized ByteTrack was proposed. In the object detection part: firstly, lightweight Ghost convolution and SimAM were used to construct the SGC3 (SimAM and Ghost convolution with C3) module, thereby improving feature utilization and reducing redundant computations. Secondly, in the deeper layers of the backbone network, a convolution-guided triplet attention module, R-Triplet (RFAConv with Triplet attention), was proposed, which used a multi-branch structure to enhance cross-dimensional information interaction and suppress irrelevant background information, so as to improve object association capability. Finally, in the feature fusion stage, a Multi-branch Receptive Block (MRB) was added, which utilized dilated convolution to expand the receptive field and enhance the reuse of multi-scale global feature information of the object. In the object tracking part: based on the ByteTrack algorithm and the motion characteristics of construction machinery, an NSA (Noise Scale Adaptive) Kalman filter algorithm with adaptive noise scale computation was proposed to decrease the influence of low-quality detection boxes on filtering performance. At the same time, the Gaussian Smoothing Interpolation (GSI) algorithm was introduced into the data association process to further optimize multi-object tracking performance. Experimental results indicate that compared to the baseline algorithm YOLOv5s, the proposed CRM-YOLOv5s algorithm achieves a mean Average Precision (mAP) of 97.4%, an improvement of 3.8 percentage points, with the number of parameters and floating-point operations reduced by 0.28×10⁶ and 1.8 GFLOPs, respectively, demonstrating stronger generalization capability in various application scenarios. Additionally, compared to the original YOLOv5s+ByteTrack tracking algorithm, the proposed CRM-YOLOv5s algorithm combined with the improved ByteTrack increases the Multiple Object Tracking Accuracy (MOTA) by 4.5 percentage points, decreases the number of Identity switches (IDs) by 15, and has higher inference speed, demonstrating that the algorithm is suitable for the multi-object tracking task of construction machinery in transmission line scenarios.
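
    The NSA Kalman filter scales the measurement noise by detection confidence, so low-quality boxes perturb the track state less. Below is a NumPy sketch of one measurement update; the linear scaling R_adapt = (1 - conf) * R follows the common NSA formulation, and the matrix names are standard Kalman notation.

        import numpy as np

        def nsa_update(x, P, z, H, R, conf):
            # x, P: state mean/covariance; z: measured box; conf: detector confidence
            R_adapt = (1.0 - conf) * R                  # confident detection -> less noise
            S = H @ P @ H.T + R_adapt                   # innovation covariance
            K = P @ H.T @ np.linalg.inv(S)              # Kalman gain
            x_new = x + K @ (z - H @ x)
            P_new = (np.eye(len(x)) - K @ H) @ P
            return x_new, P_new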

    Robustness optimization method of visual model for intelligent inspection
    Zhenzhou WANG, Fangfang GUO, Jingfang SU, He SU, Jianchao WANG
    2025, 45(7):  2361-2368.  DOI: 10.11772/j.issn.1001-9081.2024070959
    Abstract ( )   HTML ( )   PDF (3821KB) ( )
    Figures and Tables | References | Related Articles | Metrics

    The vision task of intelligent inspection of transmission lines is crucial to the safety and stability of the power system. Although deep learning networks perform well on identically distributed training and test datasets, deviations in data distribution often degrade model performance in real-world applications. To solve this problem, a Training Method based on Contrastive Learning (TMCL) was proposed, aiming to enhance the robustness of the model. Firstly, a benchmark test set, TLD-C (Transmission Line Dataset-Corruption), specially designed for the transmission line scenario, was constructed to evaluate the model’s robustness to image corruption. Secondly, the model’s ability to distinguish features of different categories was improved by constructing positive and negative sample pairs that are sensitive to category features. Thirdly, a joint optimization strategy combining contrastive loss and cross-entropy loss was used to impose additional constraints on the feature extraction process, so as to optimize the representation of the feature vectors. Finally, a Non-local Feature Denoising network (NFD) was introduced to extract features closely related to categories. Experimental results show that compared to the original method, the improved training method improves the average precision by 3.40 percentage points on the Transmission Line Dataset (TLD), and the relative Corruption Precision (rCP) by 4.69 percentage points on the TLD-C dataset.
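
    A PyTorch sketch of one common way to write such a joint objective: a supervised contrastive term over same-category pairs added to the usual cross-entropy. Both the exact contrastive form and the weighting factor lam = 0.5 are assumptions for illustration, not the paper's settings.

        import torch
        import torch.nn.functional as F

        def supcon_loss(z, labels, tau=0.1):
            # z: (N, d) L2-normalized features; labels: (N,) class ids
            n = z.size(0)
            sim = z @ z.t() / tau
            self_mask = torch.eye(n, dtype=torch.bool, device=z.device)
            pos_mask = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~self_mask
            # softmax denominator excludes the anchor itself
            log_prob = sim - torch.logsumexp(
                sim.masked_fill(self_mask, float("-inf")), dim=1, keepdim=True)
            pos_counts = pos_mask.sum(1).clamp(min=1)
            loss = -(log_prob * pos_mask).sum(1) / pos_counts
            return loss[pos_mask.any(1)].mean()     # skip anchors without positives

        def joint_loss(logits, z, labels, lam=0.5):
            # cross-entropy for classification plus a contrastive feature constraint
            return F.cross_entropy(logits, labels) + lam * supcon_loss(z, labels)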

    Dual-branch distribution consistency contrastive learning model for hard negative sample identification in chest X-rays
    Jin XIE, Surong CHU, Yan QIANG, Juanjuan ZHAO, Hua ZHANG, Yong GAO
    2025, 45(7):  2369-2377.  DOI: 10.11772/j.issn.1001-9081.2024070968
    Abstract ( )   HTML ( )   PDF (4052KB) ( )
    Figures and Tables | References | Related Articles | Metrics

    To address the difficulty of Contrastive Learning (CL) methods in distinguishing similar chest X-ray samples and detecting tiny lesions in medical images, a dual-branch distribution consistency contrastive learning model (TCL) was proposed. Firstly, inpainting and outpainting data augmentation strategies were employed to strengthen the model’s focus on lung textures, thereby improving the model’s ability to recognize complex structures. Secondly, a collaborative learning approach was used to further enhance the model’s sensitivity to tiny lesions in the lungs, thereby capturing lesion information from different perspectives. Finally, the heavy-tailed characteristic of the Student-t distribution was utilized to differentiate hard negative samples and to constrain the consistency of distributions among different augmented views and samples, thereby reinforcing the learning of feature relationships between hard negatives and other samples, and reducing the influence of hard negatives on the model. Experimental results on four chest X-ray datasets, including pneumoconiosis, NIH (National Institutes of Health), Chest X-Ray Images (Pneumonia), and COVID-19 (Corona Virus Disease 2019), demonstrate that compared to the MoCo v2 (Momentum Contrast v2) model, the TCL model improves the accuracy by 6.14%, 3.08%, 0.65%, and 4.67%, respectively; in terms of transfer performance on the COVID-19 dataset, the TCL model achieves improvements of 4.10%, 0.61%, and 8.41% at label rates of 5%, 20%, and 50%, respectively. Furthermore, CAM (Class Activation Mapping) visualization verifies that the TCL model focuses on critical pathological regions effectively, confirming the model’s effectiveness.
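
    The Student-t ingredient can be illustrated with the heavy-tailed pairwise kernel familiar from t-SNE: similarities under this kernel fall off slowly, which keeps hard negatives distinguishable, and consistency between two augmented views can then be enforced with a KL term. The sketch below assumes this kernel form and nu = 1; the model's actual distribution-consistency objective may differ.

        import torch

        def t_similarity(z, nu=1.0):
            # heavy-tailed Student-t kernel over pairwise feature distances
            d2 = torch.cdist(z, z).pow(2)
            q = (1 + d2 / nu).pow(-(nu + 1) / 2)
            q.fill_diagonal_(0)
            return q / q.sum(dim=1, keepdim=True)

        def consistency_loss(z1, z2, nu=1.0, eps=1e-8):
            # KL divergence between the similarity distributions of two views
            p, q = t_similarity(z1, nu), t_similarity(z2, nu)
            return (p * ((p + eps) / (q + eps)).log()).sum(1).mean()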

    Neural architecture search for multi-tissue segmentation using convolutional and transformer-based networks in glioma segmentation
    Yongpeng TAO, Shiqi BAI, Zhengwen ZHOU
    2025, 45(7):  2378-2386.  DOI: 10.11772/j.issn.1001-9081.2024070977
    Abstract ( )   HTML ( )   PDF (3485KB) ( )
    Figures and Tables | References | Related Articles | Metrics

    The significant variations in shape and size, blurred boundaries, and complex tissue structures of gliomas in Magnetic Resonance Imaging (MRI) images make brain tumor segmentation challenging. Typically, this task requires researchers with deep professional knowledge to design complex, personalized network models, which is time-consuming and labor-intensive. To simplify the network design process and obtain the optimal network architecture automatically, a Neural Architecture Search for multi-tissue segmentation using Convolutional and Transformer-based Networks in glioma segmentation (NASCT-Net) was proposed to improve segmentation precision during the construction of network architectures for multi-modal MRI brain tumor segmentation. Firstly, Neural Architecture Search (NAS) technology was applied to the construction of the encoder, forming stackable Neural Architecture Search ENcoder CNN (NAS-CNN) encoding modules to optimize the network structure for precise glioma segmentation automatically. Secondly, a Transformer-based feature encoding module was integrated at the lower layers of the encoder to improve the representation of relative positions and global information among tumor components. Finally, a Volume-Weighted Dice Loss function (VWDiceLoss) was constructed to address the imbalance between foreground and background. NASCT-Net was compared with methods such as Swin-Unet on the BraTS2019 brain tumor dataset. Experimental results show that NASCT-Net achieves an average Dice Similarity Coefficient (DSC) improvement of 0.009 and an average Hausdorff Distance (HD) decrease of 1.831 mm, validating the effectiveness of NASCT-Net in enhancing the precision of multi-tissue brain tumor segmentation.
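
    The abstract does not spell out the VWDiceLoss weighting; one plausible reading, sketched below in PyTorch, weights each class's Dice term inversely to its voxel volume so that small foreground structures are not swamped by background. The inverse-volume weighting is an assumption for illustration only.

        import torch

        def volume_weighted_dice_loss(probs, target, eps=1e-6):
            # probs: (B, C, D, H, W) softmax outputs; target: one-hot, same shape
            dims = (0, 2, 3, 4)
            inter = (probs * target).sum(dims)
            denom = probs.sum(dims) + target.sum(dims)
            dice = (2 * inter + eps) / (denom + eps)    # per-class soft Dice
            vol = target.sum(dims)                      # per-class voxel volume
            w = 1.0 / (vol + eps)
            w = w / w.sum()                             # rare classes weigh more
            return 1.0 - (w * dice).sum()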
