
Table of Contents

    10 May 2024, Volume 44 Issue 5
    Special issue on evolutionary computation
    Review of evolutionary multitasking from the perspective of optimization scenarios
    Jiawei ZHAO, Xuefeng CHEN, Liang FENG, Yaqing HOU, Zexuan ZHU, Yew‑Soon Ong
    2024, 44(5):  1325-1337.  DOI: 10.11772/j.issn.1001-9081.2024020208

    Due to the escalating complexity of optimization problems, traditional evolutionary algorithms increasingly struggle with high computational costs and limited adaptability. Evolutionary MultiTasking Optimization (EMTO) algorithms have emerged as a novel solution, leveraging knowledge transfer to tackle multiple optimization problems concurrently, thereby enhancing evolutionary algorithms’ efficiency in complex scenarios. The current progression of evolutionary multitasking optimization research was summarized, and different research perspectives were explored by reviewing existing literature and highlighting the notable absence of optimization scenario analysis. By focusing on the application scenarios of optimization problems, the scenarios suitable for evolutionary multitasking optimization and their fundamental solution strategies were systematically outlined. This study can thus aid researchers in selecting appropriate methods based on specific application needs. Moreover, an in-depth discussion on the current challenges and future directions of EMTO was also presented to provide guidance and insights for advancing research in this field.

    Research review of multitasking optimization algorithms and applications
    Yue WU, Hangqi DING, Hao HE, Shunjie BI, Jun JIANG, Maoguo GONG, Qiguang MIAO, Wenping MA
    2024, 44(5):  1338-1347.  DOI: 10.11772/j.issn.1001-9081.2024020209

    Evolutionary MultiTasking Optimization (EMTO) is one of the new methods in evolutionary computing, which can simultaneously solve multiple related optimization tasks and enhance the optimization of each task through knowledge transfer between tasks. In recent years, more and more research on evolutionary multitasking optimization has been devoted to utilizing its powerful parallel search capability and potential for reducing computational costs to optimize various problems, and EMTO has been used in a variety of real-world scenarios. The research and applications of EMTO were discussed from four aspects: principle, core design, applications, and challenges. Firstly, the general classification of EMTO was introduced from two levels and four aspects, including single-population multitasking, multi-population multitasking, auxiliary task, and multiform task. Next, the core component design of EMTO was introduced, including task construction and knowledge transfer. Finally, its various application scenarios were introduced and a summary and an outlook for future research were provided.

    Two-stage differential grouping method for large-scale overlapping problems
    Maojiang TIAN, Mingke CHEN, Wei DU, Wenli DU
    2024, 44(5):  1348-1354.  DOI: 10.11772/j.issn.1001-9081.2024020255

    Large-scale overlapping problems are prevalent in practical engineering applications, and the optimization challenge is significantly amplified due to the existence of shared variables. Decomposition-based Cooperative Co-evolution (CC) algorithms have demonstrated promising performance in addressing large-scale overlapping problems. However, certain novel CC frameworks designed for overlapping problems rely on grouping methods for the identification of overlapping problem structures, and the current grouping methods for large-scale overlapping problems fail to consider both accuracy and efficiency simultaneously. To address the above problems, a Two-Stage Differential Grouping (TSDG) method for large-scale overlapping problems was proposed, which achieves accurate grouping while significantly reducing computational resource consumption. In the first stage, a grouping method based on the finite difference principle was employed to efficiently identify all subcomponents and shared variables. To enhance both stability and accuracy in grouping, a grouping refinement method was proposed in the second stage to examine the information of the subcomponents and shared variables obtained in the previous stage and correct inaccurate grouping results. Based on the synergy of the two stages, TSDG achieves efficient and accurate decomposition of large-scale overlapping problems. Extensive experimental results demonstrate that TSDG is capable of accurately grouping large-scale overlapping problems while consuming fewer computational resources. In the optimization experiment, TSDG exhibits superior performance compared to state-of-the-art algorithms for large-scale overlapping problems.
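
    To make the first-stage idea concrete, the following minimal Python sketch detects variable interactions with finite differences and greedily merges interacting variables into subcomponents. It illustrates the classic differential-grouping test rather than the authors' TSDG implementation: the test function, step sizes and threshold are assumed values, and a real TSDG-style method would additionally separate shared variables instead of merging every group they touch.

```python
import numpy as np

def interact(f, dim, i, j, base=0.0, shift=5.0, delta=1.0, eps=1e-6):
    """Finite-difference interaction check between variables i and j
    (classic differential-grouping style test; constants are illustrative)."""
    x = np.full(dim, base)
    d1 = f(x + delta * np.eye(dim)[i]) - f(x)       # effect of moving x_i at x_j = base
    x2 = x.copy(); x2[j] = shift
    d2 = f(x2 + delta * np.eye(dim)[i]) - f(x2)     # same move after changing x_j
    return abs(d1 - d2) > eps                       # difference => i and j interact

def group_variables(f, dim):
    """Greedy grouping: interacting variables are merged into one subcomponent.
    (A TSDG-style method would keep shared variables separate; this sketch does not.)"""
    groups = []
    for i in range(dim):
        hits = [g for g in groups if any(interact(f, dim, i, j) for j in g)]
        if not hits:
            groups.append({i})
        else:
            merged = set().union(*hits) | {i}       # merge every group i touches
            groups = [g for g in groups if g not in hits] + [merged]
    return groups

# toy overlapping function: (x0 + x1)^2 + (x1 + x2)^2 + x3^2, x1 is shared
f = lambda x: (x[0] + x[1]) ** 2 + (x[1] + x[2]) ** 2 + x[3] ** 2
print(group_variables(f, 4))   # e.g. [{0, 1, 2}, {3}]
```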

    Multi-timescale cooperative evolutionary algorithm for large-scale crude oil scheduling
    Wanting ZHANG, Wenli DU, Wei DU
    2024, 44(5):  1355-1363.  DOI: 10.11772/j.issn.1001-9081.2024020254

    Aiming to solve the problems of large-scale resources, complex constraints, and difficult cooperation of multi-timescale decision-making in the crude oil scheduling process, a Multi-Timescale Cooperative Evolutionary Algorithm (MTCEA) was proposed. Firstly, a large-scale multi-timescale crude oil scheduling optimization model was established according to the scale structure and actual demand of oil refining enterprises, which consists of a resource-oriented medium- and long-term scheduling model and an operation-oriented short-term scheduling model, and achieves a reasonable allocation of crude oil resources by employing a dynamic grouping strategy of crude oil resources to satisfy the requirements of different scheduling scales, multi-timescale characteristics, and refined production. Secondly, to promote the integration of scheduling decisions at different time scales, an evolutionary algorithm based on multi-timescale cooperation was designed, which constructs subproblems for the continuous decision variables in the scheduling models at different time scales to achieve cooperative optimization between scheduling decisions at different time scales. Finally, MTCEA was verified in three practical industrial cases. Compared with three representative large-scale evolutionary optimization algorithms (i.e., Competitive Swarm Optimizer (CSO), Self-adaptive Differential Evolution with Modified Multi-Trajectory Search (SaDE-MMTS), and Mixture Model-based Evolution Strategy (MMES)) and three high-performance Mixed Integer Non-Linear Programming (MINLP) mathematical solvers (ANTIGONE (Algorithms for coNTinuous/Integer Global Optimization of Nonlinear Equations), SCIP (Solving Constraint Integer Programs), and SHOT (Supporting Hyperplane Optimization Toolkit)), the results show that the metrics of the solution optimality and stability of MTCEA are improved by more than 30% and 25%, respectively. These significant performance improvements demonstrate the practical application value and advantages of MTCEA in large-scale multi-timescale crude oil scheduling decisions.

    GPU-accelerated evolutionary optimization of multi-objective flow shop scheduling problems
    Tao JIANG, Zhenyu LIANG, Ran CHENG, Yaochu JIN
    2024, 44(5):  1364-1371.  DOI: 10.11772/j.issn.1001-9081.2024010028

    In the realms of intelligent manufacturing and environmental sustainability, the significance of multi-objective scheduling in orchestrating a balance among production efficiency, cost management, and environmental conservation is paramount. Contemporary research indicates that CPU-based scheduling solutions are constrained by suboptimal efficiency and responsiveness, particularly when managing tasks of considerable scale. Consequently, the parallel computational prowess of GPUs heralds a novel avenue for the refinement of extensive flow shop scheduling challenges. For the multi-objective No-Wait Flow shop Scheduling Problem (NWFSP), with the concurrent objectives of minimizing both the makespan and the Total Energy Consumption (TEC), a Mixed-Integer Linear Programming (MILP) model was formulated to delineate the problem, and a bespoke GPU-accelerated tensorized evolutionary algorithm named Tensor-GPU-NSGA-Ⅱ was introduced for problem-solving. The ingenuity of Tensor-GPU-NSGA-Ⅱ resides in its tensorized algorithm for the computation of the makespan and TEC within the NWFSP framework, as well as in converting the conventional CPU-based serial individual updating to a GPU-driven parallel population renewal process. Empirical results demonstrate that for a scenario involving 500 jobs and 20 machines, Tensor-GPU-NSGA-Ⅱ achieves a speedup of 9 761.75 in computational efficiency over the traditional NSGA-Ⅱ algorithm. Furthermore, this acceleration advantage grows markedly as the population scale expands.
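
    The tensorization idea, evaluating a whole population of schedules in one batched computation, can be sketched as follows. This is a simplified NumPy illustration using the standard permutation flow shop makespan recurrence rather than the paper's no-wait formulation or its TEC objective; all sizes are illustrative, and on a GPU the same loop structure maps onto CuPy or PyTorch tensors.

```python
import numpy as np

def batch_makespan(perms, proc):
    """Evaluate the makespan of a whole population of job permutations at once.

    perms: (P, n) int array, each row a permutation of the n jobs
    proc : (n, m) processing times of job j on machine k
    Returns a (P,) vector of makespans (standard permutation flow shop
    recurrence; the no-wait variant would add inter-job delay terms).
    """
    P, n = perms.shape
    m = proc.shape[1]
    p = proc[perms]                      # (P, n, m): times in scheduled order
    C = np.zeros((P, n, m))              # completion times
    for j in range(n):                   # loop over sequence positions ...
        for k in range(m):               # ... and machines, but batch over P
            prev_job = C[:, j - 1, k] if j > 0 else 0.0
            prev_mach = C[:, j, k - 1] if k > 0 else 0.0
            C[:, j, k] = np.maximum(prev_job, prev_mach) + p[:, j, k]
    return C[:, -1, -1]

rng = np.random.default_rng(0)
proc = rng.integers(1, 20, size=(20, 5))                   # 20 jobs, 5 machines (toy)
pop = np.stack([rng.permutation(20) for _ in range(128)])  # population of 128 schedules
print(batch_makespan(pop, proc)[:5])
```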

    Probability-driven dynamic multiobjective evolutionary optimization for multi-agent cooperative scheduling
    Xiaofang LIU, Jun ZHANG
    2024, 44(5):  1372-1377.  DOI: 10.11772/j.issn.1001-9081.2023121865

    In multi-agent systems, there are multiple cooperative tasks that change with time and multiple conflicting optimization objective functions, so the dynamic multiobjective multi-agent cooperative scheduling problem becomes one of the critical problems in building a multi-agent system. To solve this problem, a probability-driven dynamic prediction strategy was proposed to utilize the probability distributions in historical environments to predict the ones in new environments, thus generating new solutions and realizing a fast response to environmental changes. In detail, an element-based representation for probability distributions was designed to represent the adaptability of elements in dynamic environments, and the probability distributions were gradually updated towards the real distributions according to the best solutions found by optimization algorithms in each iteration. Taking into account the continuity and relevance of environmental changes, a fusion-based prediction mechanism was built to predict the probability distributions and to provide a priori knowledge of new environments by fusing historical probability distributions when the environment changes. A new heuristic-based sampling mechanism was also proposed by combining probability distributions and heuristic information to generate new solutions for updating out-of-date populations. The proposed probability-driven dynamic prediction strategy can be inserted into any multiobjective evolutionary algorithm, resulting in probability-driven dynamic multiobjective evolutionary algorithms. Experimental results on 10 dynamic multiobjective multi-agent cooperative scheduling problem instances show that the proposed algorithms outperform the competing algorithms in terms of solution optimality and diversity, and the proposed probability-driven dynamic prediction strategy can improve the performance of multiobjective evolutionary algorithms in dynamic environments.
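
    A rough sketch of the probability-model machinery described above is given below: an element-based probability vector is updated towards the best solutions of each iteration, past models are fused into a prior when the environment changes, and new solutions are sampled by mixing the model with heuristic information. The PBIL-style update rule, the fusion weights and the binary encoding are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def update_distribution(prob, best_solutions, lr=0.2):
    """Move the element-selection probabilities toward the empirical
    distribution of the best solutions found in this iteration (PBIL-style)."""
    empirical = best_solutions.mean(axis=0)          # frequency of each element
    return (1 - lr) * prob + lr * empirical

def predict_new_environment(history, weights=None):
    """Fusion-based prediction: blend probability models learned in past
    environments into a prior for the new one (weights are illustrative)."""
    history = np.asarray(history)
    if weights is None:                              # emphasise recent environments
        weights = np.arange(1, len(history) + 1, dtype=float)
    weights = weights / weights.sum()
    return np.tensordot(weights, history, axes=1)

def sample_population(prob, heuristic, pop_size, alpha=0.7, rng=None):
    """Heuristic-based sampling: mix learned probabilities with static
    heuristic preferences before drawing new candidate solutions."""
    rng = rng or np.random.default_rng()
    mixed = alpha * prob + (1 - alpha) * heuristic
    return (rng.random((pop_size, prob.size)) < mixed).astype(int)

# tiny demo: 8 binary decision elements, two past environments
rng = np.random.default_rng(1)
hist = [np.full(8, 0.5), np.array([.9, .8, .7, .2, .1, .5, .6, .4])]
prior = predict_new_environment(hist)
pop = sample_population(prior, heuristic=np.full(8, 0.5), pop_size=10, rng=rng)
print(prior.round(2), pop.shape)
```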

    Novel genetic algorithm for solving chance-constrained multiple-choice Knapsack problems
    Xuanfeng LI, Shengcai LIU, Ke TANG
    2024, 44(5):  1378-1385.  DOI: 10.11772/j.issn.1001-9081.2024010113

    Chance-Constrained Multiple-Choice Knapsack Problem (CCMCKP) is a class of NP-hard combinatorial optimization problems with important practical applications, but there is a lack of research on solution methods for this problem. The first framework for solving CCMCKP was proposed, and two solution methods were established based on this framework, including the dynamic programming-based method RA-DP and the genetic algorithm-based method RA-IGA. RA-DP is an exact method with an optimality guarantee, but it can only solve small-scale problem instances within a time budget of 1 hour. In contrast, RA-IGA is an approximation method with better scalability. Simulation experimental results verify the performance of the proposed methods. On small-scale problem instances, both RA-DP and RA-IGA can find the optimal solutions. On medium- and large-scale problem instances, RA-IGA exhibits significantly higher efficiency than RA-DP, always obtaining feasible solutions quickly within 1 hour. In future research on CCMCKP, RA-DP and RA-IGA can be considered as baseline methods, and the benchmark set considered in this work can also be used as a standard benchmark test set.
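
    For illustration, a toy genetic algorithm for the (deterministic) multiple-choice knapsack problem is sketched below; it shows the encoding behind an RA-IGA-style method, with a hard capacity check standing in for the chance constraint and all operators and parameters chosen arbitrarily.

```python
import random

def ga_mckp(values, weights, capacity, pop_size=60, gens=200, pm=0.1, seed=0):
    """Toy GA for the Multiple-Choice Knapsack Problem: pick exactly one item
    from each class.  values/weights: list of per-class lists.  A deterministic
    weight limit stands in for the chance constraint of CCMCKP."""
    rng = random.Random(seed)
    n_classes = len(values)

    def fitness(sol):
        w = sum(weights[c][i] for c, i in enumerate(sol))
        v = sum(values[c][i] for c, i in enumerate(sol))
        return v if w <= capacity else v - 1e6 * (w - capacity)   # penalise violation

    pop = [[rng.randrange(len(values[c])) for c in range(n_classes)]
           for _ in range(pop_size)]
    for _ in range(gens):
        pop.sort(key=fitness, reverse=True)
        elite = pop[: pop_size // 2]
        children = []
        while len(children) < pop_size - len(elite):
            a, b = rng.sample(elite, 2)
            cut = rng.randrange(1, n_classes)                     # one-point crossover
            child = a[:cut] + b[cut:]
            if rng.random() < pm:                                 # class-wise mutation
                c = rng.randrange(n_classes)
                child[c] = rng.randrange(len(values[c]))
            children.append(child)
        pop = elite + children
    best = max(pop, key=fitness)
    return best, fitness(best)

values = [[4, 7, 9], [3, 5], [6, 8, 2]]
weights = [[2, 5, 8], [1, 4], [3, 6, 1]]
print(ga_mckp(values, weights, capacity=10))
```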

    Hybrid optimizer combining evolutionary computation and gradient descent for constrained multi-objective optimization
    Ye TIAN, Jinjin CHEN, Xingyi ZHANG
    2024, 44(5):  1386-1392.  DOI: 10.11772/j.issn.1001-9081.2023121798

    Constrained Multi-Objective Evolutionary Algorithms (CMOEAs) are metaheuristics tailored for solving constrained multi-objective optimization problems. These algorithms use population-based stochastic search paradigms, striking a balance between objectives and constraints on various optimization problems. However, they do not take advantage of the gradient information of the functions, exhibiting slow convergence speed on complex problems. Nevertheless, the introduction of gradients is not trivial, as the calculation of the gradients of all the objectives and constraints is computationally expensive, and the conflicts between objectives and constraints make it difficult to determine the gradient directions. Therefore, an optimization algorithm combining evolutionary computation and Gradient Descent (GD) was proposed, namely CMOEA with Multiple Stages assisted by Gradients (CMOEA-MSG). It consists of two stages: at the first stage, helper problems were constructed and either the gradients of objectives or the gradients of constraints were calculated, which were used to update solutions and drive the population to quickly converge towards feasible regions; at the second stage, the constraint-first principle was utilized to solve the original problem, so as to ensure the feasibility and diversity of the population. Compared with peer algorithms on the LIR-CMOP, MW and DAS-CMOP test sets, CMOEA-MSG is verified to be more effective for solving constrained multi-objective optimization problems.

    Distributed data-driven evolutionary computation for multi-constrained optimization
    Fengfeng WEI, Weineng CHEN
    2024, 44(5):  1393-1400.  DOI: 10.11772/j.issn.1001-9081.2023121814

    Distributed data acquisition and processing in the ubiquitous computing mode have brought the demand for distributed data-driven optimization. To address challenges such as distributed data acquisition, asynchronous constraint evaluation and incomplete information, a Distributed Data-Driven Evolutionary Algorithm (DDDEA) framework for multi-constrained optimization was constructed. A series of terminal nodes were responsible for data provision and distributed evaluation, while the server nodes were responsible for global evolutionary optimization. Based on this framework, a specific algorithm instance was implemented, in which the terminal nodes utilized their local data to construct a Radial Basis Function (RBF) model to assist the differential evolution of the server node. Experimental comparison with three centralized data-driven evolutionary algorithms for multi-constrained optimization on two standard test suites shows that DDDEA achieves significantly better results in 68.4% of test cases and has a success rate of 1.00 in finding feasible solutions in 84.2% of test cases. Therefore, DDDEA has satisfactory global search and convergence abilities.
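
    The division of labour between a terminal node and the server node can be sketched as follows: the terminal node fits a local Radial Basis Function surrogate on its own data, and the server runs one differential evolution generation in which trial vectors are pre-screened by the surrogate before any real evaluation. SciPy's RBFInterpolator stands in for the surrogate model, and the DE step, screening rule and toy objective are illustrative assumptions.

```python
import numpy as np
from scipy.interpolate import RBFInterpolator

rng = np.random.default_rng(0)
f = lambda X: np.sum((X - 0.3) ** 2, axis=1)           # "expensive" objective (toy)

# --- terminal node: fit a local RBF surrogate on its own data -----------------
X_local = rng.uniform(0, 1, size=(80, 5))
surrogate = RBFInterpolator(X_local, f(X_local))

# --- server node: one DE generation, pre-screened by the surrogate ------------
pop = rng.uniform(0, 1, size=(30, 5))
fit = f(pop)                                            # true evaluations so far
F, CR = 0.5, 0.9
for i in range(len(pop)):
    a, b, c = pop[rng.choice(len(pop), 3, replace=False)]
    mutant = a + F * (b - c)                            # DE/rand/1 mutation
    mask = rng.random(5) < CR
    trial = np.where(mask, mutant, pop[i]).clip(0, 1)
    # the surrogate filters unpromising trials before any real evaluation
    if surrogate(trial[None])[0] < fit[i]:
        y = f(trial[None])[0]
        if y < fit[i]:
            pop[i], fit[i] = trial, y
print(fit.min())
```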

    Short-range UAV air combat maneuver decision-making via finite tolerance pigeon-inspired optimization
    Zhiqiang ZHENG, Haibin DUAN
    2024, 44(5):  1401-1407.  DOI: 10.11772/j.issn.1001-9081.2023121837

    Due to the rapid change of the situation during the confrontation, autonomous maneuver decision-making for short-range Unmanned Aerial Vehicle (UAV) air combat is complex and challenging, and is a key difficulty in air combat. To address this issue, a short-range UAV air combat maneuver decision-making method based on the Finite Tolerance Pigeon-Inspired Optimization (FTPIO) algorithm was proposed. The proposed method consists of two parts: opponent action prediction based on the maneuver library, and optimization of maneuver control and execution time based on the FTPIO algorithm. To improve the global exploration ability of the basic Pigeon-Inspired Optimization (PIO) algorithm, the finite tolerance strategy was introduced: when a pigeon individual failed to find a better solution within several iterations, its attributes would be reset to avoid falling into the local optimal trap. The optimization variables used in the proposed method were the increments of the control variables of the UAV motion model, which broke the limitations of the maneuver library. Simulation and adversarial testing results with the MiniMax method, basic PIO algorithm, and Particle Swarm Optimization (PSO) algorithm show that the proposed maneuver decision-making method can effectively defeat opponents during confrontation and generate more flexible deceptive maneuver behaviors.
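
    The map-and-compass update of the basic PIO algorithm together with a finite tolerance reset can be sketched on a toy continuous problem as follows; the reset rule, the velocity update and all constants are illustrative, and the actual method optimizes increments of UAV control variables rather than a sphere function.

```python
import numpy as np

def ftpio(obj, dim, n=30, iters=200, R=0.2, tolerance=10, seed=0):
    """Pigeon-Inspired Optimization (map-and-compass phase) with a finite
    tolerance rule: a pigeon that fails to improve its personal best for
    `tolerance` iterations is re-initialised (all constants are illustrative)."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(-5, 5, (n, dim))
    V = np.zeros((n, dim))
    fit = obj(X)
    pbest = fit.copy()                              # personal best values
    stall = np.zeros(n, dtype=int)
    gbest_x, gbest_f = X[fit.argmin()].copy(), fit.min()
    for t in range(1, iters + 1):
        V = V * np.exp(-R * t) + rng.random((n, dim)) * (gbest_x - X)
        X = X + V
        fit = obj(X)
        improved = fit < pbest
        pbest = np.where(improved, fit, pbest)
        stall = np.where(improved, 0, stall + 1)
        if fit.min() < gbest_f:                     # update global best
            gbest_x, gbest_f = X[fit.argmin()].copy(), fit.min()
        reset = stall >= tolerance                  # finite tolerance: reset stalled pigeons
        if reset.any():
            X[reset] = rng.uniform(-5, 5, (reset.sum(), dim))
            V[reset] = 0.0
            pbest[reset] = obj(X[reset])
            stall[reset] = 0
    return gbest_x, gbest_f

sphere = lambda X: np.sum(X ** 2, axis=1)
print(ftpio(sphere, dim=4))
```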

    Evolutionary bi-level adaptive local feature selection
    Lin GAO, Yu ZHOU, Tak Wu KWONG
    2024, 44(5):  1408-1414.  DOI: 10.11772/j.issn.1001-9081.2023121829

    Local Feature Selection (LFS) methods partition the sample space into multiple local regions and select the optimal feature subset for each region to reflect local heterogeneous information. However, the existing LFS methods partition local regions around each sample and find the optimal feature subset, resulting in low optimization efficiency and limited applicability. To address this issue, a new evolutionary Bi-level adaptive Local Feature Selection (BiLFS) algorithm was proposed. The LFS problem was formulated as a bi-level optimization problem, with feature subsets and locally optimized regions as the decision variables. At the upper level, Non-dominated Sorting Genetic Algorithm Ⅱ was employed to find the optimal feature subsets for the selected local regions, with region purity and selected feature ratio as the objective functions. At the lower level, based on the upper-level solution, local region clustering analysis was used to obtain center samples within each region, followed by local region fusion to eliminate unnecessary regions and update the population of necessary regions. Experimental results on 11 UCI datasets demonstrate that BiLFS achieves an average classification accuracy of up to 98.48% of that of non-adaptive LFS methods based on evolutionary algorithms, while reducing the average computation time to 9.51% of theirs, significantly improving computational efficiency to the level of linear programming-based LFS methods. Visual analysis of the locally optimized regions selected by the BiLFS algorithm during the iteration process indicates the stability and reliability of selecting necessary local regions.

    2023 CCF Conference on Artificial Intelligence (CCFAI 2023)
    Two-stage search-based constrained evolutionary multitasking optimization algorithm
    Kaiwen ZHAO, Peng WANG, Xiangrong TONG
    2024, 44(5):  1415-1422.  DOI: 10.11772/j.issn.1001-9081.2023050696

    It is crucial in solving Constrained Multi-objective Optimization Problems (CMOPs) to efficiently balance the relationship between diversity, convergence and feasibility. However, the emergence of complex constraints poses a greater challenge in solving CMOPs. Therefore, a Two-stage search-based constrained Evolutionary Multitasking optimization Algorithm (TEMA) was proposed to achieve the balance between diversity, convergence and feasibility through two cooperatively evolving tasks. Firstly, the whole evolutionary process was divided into two stages, an exploration stage and a utilization stage, which were dedicated to enhancing the extensive exploration capability and the efficient search capability of the algorithm in the target space, respectively. Secondly, a dynamic constraint handling strategy was designed to balance the proportion of feasible solutions in the population to enhance the exploration capability of the algorithm in the feasible region. Then, a backward search strategy was proposed to utilize the information contained in the unconstrained Pareto front to guide the algorithm to converge quickly to the constrained Pareto front. Finally, comparative experiments were performed on 23 problems in two benchmark test suites to verify the performance of the proposed algorithm. Experimental results indicate that the proposed algorithm achieves optimal IGD (Inverted Generational Distance) and HV (HyperVolume) values on 14 and 13 test problems, respectively, which reflects its significant advantages.

    Missing value imputation algorithm using dual discriminator based on conditional generative adversarial imputation network
    Jia SU, Hong YU
    2024, 44(5):  1423-1427.  DOI: 10.11772/j.issn.1001-9081.2023050697

    In practical applications, various factors may cause missing data and affect the analysis of subsequent tasks. Therefore, the imputation of missing values in datasets is particularly important. Moreover, the accuracy of data imputation can significantly impact the analysis of subsequent tasks: incorrectly imputed data may introduce more severe bias into the analysis than the missing data themselves. A new missing value imputation algorithm named DDC-GAIN (Dual Discriminator based on Conditional Generative Adversarial Imputation Network) was introduced based on the Conditional Generative Adversarial Imputation Network (C-GAIN) and a dual discriminator, in which the primary discriminator was assisted by the auxiliary discriminator in assessing the validity of predicted values. In other words, the authenticity of the generated sample was judged by global sample information, and the relationships between features were emphasized to estimate predicted values. Experimental results on four datasets show that, compared with five classical imputation algorithms, the DDC-GAIN algorithm achieves the lowest Root Mean Square Error (RMSE) under the same conditions and with large sample size; when the missing rate is 15% on the Default credit card dataset, the RMSE of DDC-GAIN is 28.99% lower than that of the optimal comparison algorithm C-GAIN. This indicates that it is effective to utilize the auxiliary discriminator to support the primary discriminator in learning feature relationships.

    Oversampling algorithm based on synthesizing minority class samples using relationship between features
    Mingzhu LEI, Hao WANG, Rong JIA, Lin BAI, Xiaoying PAN
    2024, 44(5):  1428-1436.  DOI: 10.11772/j.issn.1001-9081.2023050803

    The phenomenon of data imbalance is very common in real life. To improve the overall classification accuracy, classifiers often sacrifice the minority class by misclassifying its samples. However, in real life, the consequences of misclassifying the minority class can be very serious. Considering that traditional resampling algorithms ignore the spatial distribution of the data and the relationships among the sample features of the minority class, a new sampling algorithm SABRF (Sampling Algorithm Based on Relationship between Features) was proposed to generate a new sample set. The key distinguishing features of the imbalanced dataset were preserved through Pareto-based multi-objective feature selection, and the relationships among the key features of minority class samples were captured through an XGBoost (eXtreme Gradient Boosting) regression model. In addition, considering the quality of newly generated samples, a new sample selection strategy was proposed to retain better samples. Experiments were conducted on six publicly available UCI datasets and one real post-orthopedic thrombus dataset. Experimental results show that the proposed algorithm performs well on Area Under receiver operating characteristic Curve (AUC), F1 score (F1_score) and Geometric Mean (G_mean). In addition, when the new samples selected by the multi-index evaluation-based sample selection strategy are used for classification, the classification results on imbalanced data are also the best, which verifies the effectiveness of the sample selection strategy.

    Few-shot object detection via fusing multi-scale and attention mechanism
    Hongtian LI, Xinhao SHI, Weiguo PAN, Cheng XU, Bingxin XU, Jiazheng YUAN
    2024, 44(5):  1437-1444.  DOI: 10.11772/j.issn.1001-9081.2023050699

    The existing two-stage few-shot object detection methods based on fine-tuning are not sensitive to the features of new classes, which causes new classes to be misclassified as highly similar base classes and thus affects the detection performance of the model. To address the above issue, a few-shot object detection algorithm that incorporates multi-scale and attention mechanisms was proposed, namely MA-FSOD (Few-Shot Object Detection via fusing Multi-scale and Attention mechanism). Firstly, grouped convolutions and large convolution kernels were used to extract more class-discriminative features in the backbone network, and a Convolutional Block Attention Module (CBAM) was added to achieve adaptive feature augmentation. Then, a modified pyramid network was used to achieve multi-scale feature fusion, which enables the Region Proposal Network (RPN) to accurately find Regions of Interest (RoI) and provide more abundant high-quality positive samples from multiple scales to the classification head. Finally, the cosine classification head was used for classification in the fine-tuning stage to reduce the intra-class variance. Compared with the Few-Shot object detection via Contrastive proposal Encoding (FSCE) algorithm on the PASCAL-VOC 2007/2012 dataset, the MA-FSOD algorithm improves AP50 for new classes by 5.6 percentage points; and on the more challenging MSCOCO dataset, compared with Meta-Faster-RCNN, the APs corresponding to 10-shot and 30-shot are improved by 0.1 and 1.6 percentage points, respectively. Experimental results show that MA-FSOD can more effectively alleviate the misclassification problem and achieve higher accuracy in few-shot object detection than some mainstream few-shot object detection algorithms.
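
    The cosine classification head used in the fine-tuning stage can be sketched in PyTorch as below: logits are scaled cosine similarities between normalized features and class weight vectors, which is what reduces the intra-class variance. The scaling factor and dimensions are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CosineClassifier(nn.Module):
    """Cosine-similarity classification head: logits are scaled cosine
    similarities between L2-normalised features and class weight vectors,
    which reduces intra-class variance compared with a plain linear head."""
    def __init__(self, feat_dim, num_classes, scale=20.0):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(num_classes, feat_dim))
        self.scale = scale                      # temperature-like scaling factor

    def forward(self, x):
        x = F.normalize(x, dim=-1)              # unit-norm features
        w = F.normalize(self.weight, dim=-1)    # unit-norm class prototypes
        return self.scale * x @ w.t()           # (batch, num_classes) cosine logits

head = CosineClassifier(feat_dim=1024, num_classes=20)
logits = head(torch.randn(8, 1024))
loss = F.cross_entropy(logits, torch.randint(0, 20, (8,)))
print(logits.shape, loss.item())
```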

    Sleep stage classification model by meta transfer learning in few-shot scenarios
    Wangjun SHI, Jing WANG, Xiaojun NING, Youfang LIN
    2024, 44(5):  1445-1451.  DOI: 10.11772/j.issn.1001-9081.2023050747

    Sleep disorders are receiving increasing attention, and the accuracy and generalization of automated sleep stage classification are facing increasing challenges. However, due to the very limited publicly available human sleep data, the sleep stage classification task is actually similar to a few-shot scenario. Moreover, due to widespread individual differences in sleep features, it is difficult for existing machine learning models to guarantee accurate classification of data from new subjects who have not participated in the training. In order to achieve accurate stage classification of new subjects’ sleep data, existing studies usually require additional collection and labeling of large amounts of data from new subjects and personalized fine-tuning of the model. Based on this, a new sleep stage classification model, Meta Transfer Sleep Learner (MTSL), was proposed. Inspired by the Scale & Shift based weight transfer strategy in transfer learning, a new meta transfer learning framework was designed. The training phase included two steps: pre-training and meta transfer training, and many meta-tasks were used for meta transfer training. In the test phase, the model could be easily adapted to the feature distribution of new subjects by fine-tuning with only a few new subjects’ data, which greatly reduced the cost of accurate sleep stage classification for new subjects. Experimental results on two public sleep datasets show that the MTSL model can achieve higher accuracy and F1-score under both single-dataset and cross-dataset conditions. This indicates that MTSL is more suitable for sleep stage classification tasks in few-shot scenarios.

    Weakly supervised video anomaly detection based on triplet-centered guidance
    Zimeng ZHU, Zhixin LI, Zhan HUAN, Ying CHEN, Jiuzhen LIANG
    2024, 44(5):  1452-1457.  DOI: 10.11772/j.issn.1001-9081.2023050748

    In view of the complex diversity and short duration of surveillance video anomalies, a weakly supervised video anomaly detection method was introduced to detect anomalies using only video-level labels, and an anomaly regression network VLARNet based on Variational AutoEncoder (VAE) and Long Short-Term Memory (LSTM) network was proposed as an anomaly detection framework to effectively capture the temporal dependencies in time series data, eliminate redundant information and retain key information in the data. Anomaly detection was treated as a regression problem by VLARNet. To learn detection features, a Triplet-Centered Loss for Anomaly Score Regression (TCLASR) was designed and combined with the Dynamic Multiple Instance Learning loss (DMIL) to further improve the discrimination ability of features. The DMIL widened the inter-class distance between abnormal instances and normal instances, but it also widened the intra-class distance. The TCLASR made the distances between instances of the same class and their center closer and the distances between instances of different classes and the center farther. The proposed VLARNet was comprehensively tested on the ShanghaiTech and CUHK Avenue datasets. Experimental results show that VLARNet can effectively utilize various information in video data, achieving Area Under receiver operating characteristic Curve (AUC) values of 94.64% and 93.00% respectively on the two datasets, which are significantly better than those of the comparison algorithms.

    EraseMTS: iterative active multivariable time series anomaly detection algorithm based on margin anomaly candidate set
    Fan MENG, Qunli YANG, Jing HUO, Xinkuan WANG
    2024, 44(5):  1458-1463.  DOI: 10.11772/j.issn.1001-9081.2023050726

    Unsupervised anomaly detection methods for Multivariable Time Series (MTS) have attracted wide attention due to their low labeling costs. However, traditional unsupervised anomaly detection methods are often based on two assumptions: 1) the Independent and Identical Distribution (IID) assumption, i.e., there is no dependency between the samples and attributes of MTS; 2) the high-purity starting assumption, i.e., a completely normal time series is assumed to be available for training. These assumptions are often difficult to satisfy in practical scenarios. To address this problem, an iterative active MTS anomaly detection algorithm based on a margin anomaly candidate set (called EraseMTS) was proposed. Firstly, a multi-granularity representation learning method was utilized to capture the dependencies within subsequences and between subsequences, and then represent the original MTS. Secondly, a selection strategy was proposed to interact with experts based on the margin anomaly candidate set, which was determined by the subsequence anomaly score and the uncertainty of its anomaly degree. Finally, an iterative subsequence weight update mechanism was designed to integrate the abnormal feedback information into the training process of the unsupervised anomaly detection model, so that the performance of the initially trained model was continuously optimized through iteration. The proposed algorithm was verified in terms of detection performance, scalability, and stability on four datasets from the UCR time series archive and one synthetic dataset. Experimental results show that the proposed algorithm can run effectively and stably.

    The 19th China Conference on Machine Learning (CCML 2023)
    Node classification algorithm fusing 2-connected motif-structure information
    Wenping ZHENG, Huilin GE, Meilin LIU, Gui YANG
    2024, 44(5):  1464-1470.  DOI: 10.11772/j.issn.1001-9081.2023050846

    Node representation learning has been widely applied in machine learning tasks such as node classification, clustering and link prediction, since it can encode graph-structured data into a low-dimensional latent space. In complex networks, nodes interact not only through low-order interactions, but also through higher-order interactions formed by special connection modes. The higher-order interactions of a complex network are also called motifs. A node classification algorithm Fusing 2-connected Motif-structure Information (FMI) was proposed to use the motif information among nodes to obtain node representations for node classification tasks. Firstly, the 2-connected motifs in the network were counted, and a measure of node importance, named motif-ratio, was proposed by using the motif information of each node; a sampling probability was then calculated according to the motif-ratio to carry out neighborhood sampling. A weighted auxiliary graph was constructed to fuse the low-order and high-order relations of network nodes and to perform weighted neighborhood aggregation. Node classification was performed on 5 datasets: Cora, Citeseer, Pubmed, Wiki and DBLP. Compared with 5 classical baseline algorithms, the proposed FMI algorithm shows better performance in terms of Accuracy, F1-score and other indicators.

    Point cloud classification network based on node structure
    Wenshuo GAO, Xiaoyun CHEN
    2024, 44(5):  1471-1478.  DOI: 10.11772/j.issn.1001-9081.2023050802

    The non-structured and non-uniform distribution of point cloud data poses significant challenges for feature representation and classification tasks. To extract the three-dimensional structural features of point cloud objects, existing methods often employ complex local feature extraction structures to construct hierarchical networks, resulting in a complex feature extraction network that mainly focuses on the local structures of the point cloud objects. To better extract features from unevenly distributed point cloud objects, a Node structure Network (NsNet) with density-adaptive weighting of sample point convolution was proposed. The convolutional network adaptively weighted sample points based on Gaussian density to differentiate the density differences among sampling points, thereby better characterizing the overall structure of objects. Additionally, the network structure was simplified by incorporating spherical coordinates to reduce model complexity. Experimental results on three public datasets demonstrate that NsNet based on adaptive density weighting improves the Overall Accuracy (OA) by 9.1 and 1.3 percentage points respectively compared with PointNet++ and PointMLP, and reduces the number of parameters by 4.6×10⁶ compared to PointMLP. NsNet can effectively address the problem of information loss caused by uneven distribution of point clouds, improve the classification accuracy and reduce the model complexity.

    Robust learning method by reweighting examples with negative learning
    Boshi ZOU, Ming YANG, Chenchen ZONG, Mingkun XIE, Shengjun HUANG
    2024, 44(5):  1479-1484.  DOI: 10.11772/j.issn.1001-9081.2023050880

    Noisy label learning methods can effectively use data containing noisy labels to train models and significantly reduce the labeling cost of large-scale datasets. Most existing noisy label learning methods usually assume that the number of samples in each class of the dataset is balanced, but the data in many real-world scenarios tend to have noisy labels while long-tailed distributions are often present in the dataset simultaneously, making it difficult for existing methods to select clean examples from noisy examples in the tail classes according to training loss or confidence. To solve the noisy long-tailed learning problem, a ReWeighting examples with Negative Learning (NLRW) method was proposed, by which examples were reweighted adaptively based on negative learning. Specifically, at each training epoch, the weights of examples were calculated according to the output distributions of the model over head classes and tail classes. The weights of clean examples were close to one while the weights of noisy examples were close to zero. To ensure accurate estimation of the weights, negative learning and cross entropy loss were combined to train the model with a weighted loss function. Experimental results on CIFAR-10 and CIFAR-100 datasets with various imbalance rates and noise rates show that, compared with the optimal baseline model TBSS (Two stage Bi-dimensional Sample Selection) for noisy long-tailed classification, the NLRW method improves the average accuracy by 4.79% and 3.46%, respectively.
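
    A possible shape of such a weighted objective is sketched below: a per-example weight scales the cross entropy on the given (possibly noisy) label, and a negative-learning term penalizes confidence on a label the example is assumed not to have. How the weights and complementary labels are actually chosen in NLRW is not reproduced here; the random complementary labels and externally supplied weights are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def negative_learning_loss(logits, comp_labels):
    """Negative (complementary-label) learning: push down the probability of a
    label the example is known (or assumed) NOT to have."""
    p = F.softmax(logits, dim=1)
    p_comp = p.gather(1, comp_labels.view(-1, 1)).squeeze(1)
    return -torch.log(1.0 - p_comp + 1e-8)

def weighted_robust_loss(logits, noisy_labels, weights, lam=1.0):
    """Sketch of an NLRW-style objective: per-example weights (close to 1 for
    clean-looking examples, close to 0 for noisy ones) scale the cross entropy,
    while a negative-learning term on a random non-given class adds robustness.
    The weighting scheme and complementary-label choice are illustrative."""
    ce = F.cross_entropy(logits, noisy_labels, reduction="none")
    num_classes = logits.size(1)
    shift = torch.randint(1, num_classes, noisy_labels.shape, device=logits.device)
    comp = (noisy_labels + shift) % num_classes          # a class != given label
    nl = negative_learning_loss(logits, comp)
    return (weights * (ce + lam * nl)).mean()

logits = torch.randn(16, 10, requires_grad=True)
labels = torch.randint(0, 10, (16,))
weights = torch.rand(16)                                 # e.g. derived from model outputs
loss = weighted_robust_loss(logits, labels, weights)
loss.backward()
print(loss.item())
```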

    PAGCL: positive augmentation graph contrastive learning recommendation method without negative sampling
    Jiong WANG, Taotao TANG, Caiyan JIA
    2024, 44(5):  1485-1492.  DOI: 10.11772/j.issn.1001-9081.2023050756

    Contrastive Learning (CL) has been widely used for recommendation because of its ability to extract supervised signals contained in the data itself. A recent study shows that the success of CL in recommendation depends on the uniformity of node distribution brought by the contrastive loss, namely the Information Noise Contrastive Estimation (InfoNCE) loss. In addition, another study proves that the Bayesian Personalized Ranking (BPR) loss is beneficial to alignment and uniformity, which contribute to higher recommendation performance. Since the CL loss can bring stronger uniformity than the negative term of BPR, the necessity of the negative term of BPR in the CL framework has been called into question. Therefore, this study experimentally revealed that the negative term of BPR is unnecessary in the CL framework for recommendation. Based on this observation, a joint optimization loss without negative sampling was proposed, which can be applied to classical CL-based methods and achieve the same or higher performance. Besides, unlike studies which focus on improving uniformity, a novel Positive Augmentation Graph Contrastive Learning method (PAGCL) was presented, which uses random positive samples for perturbation at the representation level to further strengthen alignment. Experimental results on several benchmark datasets show that the proposed method is superior to SOTA (State-Of-The-Art) methods like Self-supervised Graph Learning (SGL) and Simple Graph Contrastive Learning (SimGCL) in terms of recall and Normalized Discounted Cumulative Gain (NDCG). The method’s improvement over the base model Light Graph Convolutional Network (LightGCN) can reach up to 17.6% at NDCG@20.

    Task offloading method based on dynamic service cache assistance
    Junna ZHANG, Xinxin WANG, Tianze LI, Xiaoyan ZHAO, Peiyan YUAN
    2024, 44(5):  1493-1500.  DOI: 10.11772/j.issn.1001-9081.2023050831

    Aiming at the problem of user experience quality degradation caused by the lack of comprehensive consideration of the diversity and dynamics of user service requests in the joint optimization of service caching and task offloading, a task offloading method based on dynamic service cache assistance was proposed. Firstly, to address the problem of the large action space of edge servers performing service caching, the actions were redefined and the optimal set of actions was selected to improve the efficiency of algorithm training. Secondly, an improved multi-agent Q-Learning algorithm was designed to learn an optimal service caching policy. Thirdly, the task offloading problem was converted into a convex optimization problem, and the optimal solution was obtained using a convex optimization tool. Finally, the optimal computational resource allocation policy was found using the Lagrangian dual method. To verify the effectiveness of the proposed method, extensive experiments were conducted based on a real dataset. Experimental results show that, compared with the Q-Learning, Double Deep Q Network (D2QN) and Multi-Agent Deep Deterministic Policy Gradient (MADDPG) methods, the response time of the proposed method is reduced by 8.5%, 11.8% and 12.6%, respectively, and the average quality of experience is improved by 1.5%, 2.7% and 4.3%, respectively.

    Collaborative offloading strategy in internet of vehicles based on asynchronous deep reinforcement learning
    Xiaoyan ZHAO, Wei HAN, Junna ZHANG, Peiyan YUAN
    2024, 44(5):  1501-1510.  DOI: 10.11772/j.issn.1001-9081.2023050788

    With the rapid development of Internet of Vehicles (IoV), smart connected vehicles generate a large number of latency-sensitive and computation-intensive tasks, and limited vehicle computing resources and traditional cloud service modes cannot meet the needs of in-vehicle users. Mobile Edge Computing (MEC) provides an effective paradigm for solving task offloading of massive data. However, when considering multi-task and multi-user scenarios, the complexity of task offloading scenarios in IoV is high due to the real-time and dynamic changes in vehicle locations, task types and vehicle density, and the offloading process is prone to problems such as unbalanced edge resource allocation, excessive communication cost overhead and slow algorithm convergence. To solve the above problems, the cooperative task offloading strategy of multiple edge servers in multi-task and multi-user mobile scenarios in IoV was focused on. First, a three-layer heterogeneous network model for multi-edge collaborative processing was proposed, and dynamic collaborative clusters were introduced for the changing environment in IoV to transform the offloading problem into a joint optimization problem of delay and energy consumption. Then, the problem was divided into two subproblems of offloading decision and resource allocation, where the resource allocation problem was further split into resource allocation for edge servers and transmission bandwidth, and the two subproblems were solved based on convex optimization theory. In order to find the optimal offloading decision set, a Multi-edge Collaborative Deep Deterministic Policy Gradient (MC-DDPG) algorithm that can handle continuous problems in collaborative clusters was proposed, based on which an Asynchronous MC-DDPG (AMC-DDPG) algorithm was designed. The training parameters in collaborative clusters were asynchronously uploaded to the cloud for global update, and then the updated results were returned to each collaborative cluster to improve the convergence speed. Simulation results show that the AMC-DDPG algorithm improves the convergence speed by at least 30% over the DDPG algorithm and achieves better results in terms of reward and total cost.

    Driver behavior recognition based on dual-path spatiotemporal network
    Zhiyuan XI, Chao TANG, Anyang TONG, Wenjian WANG
    2024, 44(5):  1511-1519.  DOI: 10.11772/j.issn.1001-9081.2023050800

    Dangerous driving behavior of drivers is one of the main causes of vicious traffic accidents, so identifying driver behavior is of great significance for engineering applications. Currently, mainstream vision-based detection methods study the local spatiotemporal features of driver behavior, and less research has been done on global spatial features and long-term temporal correlation features, so to a certain extent these methods cannot combine scene context information to identify dangerous driving behaviors. To solve the above problems, a driver behavior recognition method based on a dual-path spatiotemporal network was proposed, which integrates the advantages of different spatiotemporal pathways to improve the richness of behavioral features. Firstly, an improved Two-Stream convolutional Network (TSN) was used to learn the spatiotemporal information for characterization while reducing the sparsity of extracted features. Secondly, a Transformer-based serial spatiotemporal network was constructed to supplement the long-term temporal correlation information. Finally, a fusion decision was made using the dual-path spatiotemporal network to enhance the robustness of the model. Experimental results show that the proposed method achieves recognition accuracies of 99.85%, 99.94% and 98.77% on three publicly available datasets: the driver fatigue detection dataset YawDD, the driver distraction detection dataset SF-DDDD (State-Farm Distracted Driver Detection Dataset), and the latest driver behavior recognition dataset SynDD1, respectively; especially on SynDD1, the recognition accuracy is improved by 1.64 percentage points compared to the motion-based recognition network MoviNet-A0. Ablation experimental results confirm that the proposed method has high recognition accuracy of driver behavior.

    PIPNet: lightweight asphalt pavement crack image segmentation network
    Jun FENG, Jiankang BI, Yiru HUO, Jiakuan LI
    2024, 44(5):  1520-1526.  DOI: 10.11772/j.issn.1001-9081.2023050911

    Crack segmentation is an important prerequisite for evaluating the damage degree of pavement diseases. In order to balance the effectiveness and real-time performance of deep neural network segmentation, a lightweight asphalt pavement crack segmentation neural network based on the U-Net encoder-decoder structure was proposed, namely PIPNet (Parallel dilated convolution of Inverted Pyramid Network). The encoding part was an inverted pyramid structure: a multi-branch parallel dilated convolution module with different dilation rates was proposed to extract multi-scale information from the top, middle and bottom features and reduce model complexity, which combined depthwise separable convolutions with ordinary convolutions and gradually reduced the number of parallel convolutions. Drawing on the characteristics of GhostNet, an inverted residual lightweight module was designed, which was embedded with parallel dual pooling attention. Test results on the GAPs384 dataset show that, compared with the ResNet50 encoding method, PIPNet achieves an mIoU (mean Intersection over Union) 1.10 percentage points higher with only about one-sixth of the parameters and MFLOPs (Million FLOating Point operations), and its mIoU is 4.14 and 9.95 percentage points higher than those of the lightweight GhostNet and SegNet, respectively. Experimental results show that PIPNet has high crack segmentation performance while reducing the model complexity, and has good adaptability to the segmentation of different pavement crack images.

    Data science and technology
    Multi-order nearest neighbor graph clustering algorithm by fusing transition probability matrix
    Tongtong XU, Bin XIE, Chunhao ZHANG, Ximei ZHANG
    2024, 44(5):  1527-1538.  DOI: 10.11772/j.issn.1001-9081.2023050727

    Clustering divides a dataset into multiple clusters based on the similarity between samples. Most existing clustering methods face two challenges. On the one hand, when defining the similarity between samples, the spatial distribution structure of the samples is often not considered, making it difficult to construct a stable similarity matrix. On the other hand, the sample graph structure constructed by graph clustering is too complex and has high computational costs. To solve these two problems, a Multi-order Nearest Neighbor Graph Clustering algorithm by fusing transition probability matrix (MNNGC) was proposed. Firstly, the nearest neighbor relationship and spatial distribution structure of samples were comprehensively considered, the similarity defined by shared nearest neighbors was weighted for densification, and the densified affinity matrix between nodes was obtained. Secondly, by utilizing multi-order probability transition between nodes, the correlation degrees of non-adjacent nodes were predicted, and a stable inter-node affinity matrix was obtained by fusing the multi-order transition probability matrices. Then, to further enhance the local structure of the graph, the multi-order nearest neighbor graph of nodes was reconstructed and hierarchically clustered. Finally, the edge node allocation strategy was optimized. Experimental results show that MNNGC achieves the highest Accuracy (Acc) among the compared clustering algorithms on all the synthetic datasets and 8 UCI datasets. The Acc, Adjusted Mutual Information (AMI), Adjusted Rand Index (ARI) and Fowlkes and Mallows Index (FMI) of the MNNGC algorithm are improved by 38.6, 27.2, 45.4 and 35.1 percentage points, respectively, compared with the Local Density Peaks-based Spectral Clustering (LDP-SC) algorithm.
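
    The two matrix-construction steps can be sketched with NumPy as follows: a shared-nearest-neighbour affinity densifies the similarity between neighbouring samples, and powers of the resulting transition matrix are fused with decaying weights so that non-adjacent but reachable nodes also receive an affinity. The neighbourhood size, number of orders and fusion weights are illustrative, not the values used by MNNGC.

```python
import numpy as np

def snn_affinity(X, k=10):
    """Shared-nearest-neighbour affinity: the similarity of two neighbouring
    points is the size of the overlap of their k-nearest-neighbour sets."""
    n = len(X)
    d = np.linalg.norm(X[:, None] - X[None], axis=2)
    knn = np.argsort(d, axis=1)[:, 1:k + 1]                  # drop self
    nbr = np.zeros((n, n), dtype=bool)
    nbr[np.repeat(np.arange(n), k), knn.ravel()] = True
    A = (nbr[:, None] & nbr[None]).sum(axis=2).astype(float) # shared-neighbour counts
    A *= nbr | nbr.T                                         # keep only neighbouring pairs
    return A

def fused_transition_affinity(A, orders=3, decay=0.5):
    """Fuse multi-order transition probability matrices: P, P^2, ..., P^orders
    are blended with geometrically decaying weights so that non-adjacent nodes
    reachable in a few hops still obtain a (smaller) affinity."""
    P = A / np.maximum(A.sum(axis=1, keepdims=True), 1e-12)  # row-stochastic
    fused, Pk, w = np.zeros_like(P), np.eye(len(A)), 1.0
    for _ in range(orders):
        Pk = Pk @ P
        fused += w * Pk
        w *= decay
    return (fused + fused.T) / 2                             # symmetrise

X = np.random.default_rng(0).normal(size=(60, 2))
S = fused_transition_affinity(snn_affinity(X, k=8))
print(S.shape, S.max())
```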

    Cyber security
    CBAM-CGRU-SVM based malware detection method for Android
    Min SUN, Qian CHENG, Xining DING
    2024, 44(5):  1539-1545.  DOI: 10.11772/j.issn.1001-9081.2023050708

    With the increasing variety and quantity of Android malware, it becomes increasingly important to detect malware to protect system security and user privacy. To address the problem of low classification accuracy of traditional malware detection models, a malware detection model for Android named CBAM-CGRU-SVM was proposed based on Convolutional Neural Network (CNN), Gated Recurrent Unit (GRU), and Support Vector Machine (SVM). In this model, more key features of malware were learned by adding a Convolutional Block Attention Module (CBAM) to the convolutional neural network, and GRUs were employed to further extract features. In order to solve the problem of insufficient generalization ability of the model when performing image classification, SVM was used instead of the softmax activation function as the classification function of the model. Experiments were conducted on the Malimg public dataset, in which the malware data was transformed into images as model input. Experimental results show that the classification accuracy of the CBAM-CGRU-SVM model reaches 94.73%, showing that the model can effectively classify malware families.
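
    The CBAM block added to the convolutional network follows the standard channel-plus-spatial attention formulation, which can be sketched in PyTorch as below; channel counts and the reduction ratio are illustrative.

```python
import torch
import torch.nn as nn

class CBAM(nn.Module):
    """Convolutional Block Attention Module: channel attention (avg- and
    max-pooled descriptors through a shared MLP) followed by spatial attention
    (7x7 conv over pooled channel maps).  Standard formulation; sizes illustrative."""
    def __init__(self, channels, reduction=16, kernel_size=7):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, 1, bias=False),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1, bias=False),
        )
        self.spatial = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2, bias=False)

    def forward(self, x):
        # channel attention
        avg = self.mlp(x.mean(dim=(2, 3), keepdim=True))
        mx = self.mlp(x.amax(dim=(2, 3), keepdim=True))
        x = x * torch.sigmoid(avg + mx)
        # spatial attention
        s = torch.cat([x.mean(dim=1, keepdim=True), x.amax(dim=1, keepdim=True)], dim=1)
        return x * torch.sigmoid(self.spatial(s))

block = CBAM(channels=64)
print(block(torch.randn(2, 64, 32, 32)).shape)   # torch.Size([2, 64, 32, 32])
```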

    Location privacy protection algorithm based on trajectory perturbation and road network matching
    Peiqian LIU, Shuilian WANG, Zihao SHEN, Hui WANG
    2024, 44(5):  1546-1554.  DOI: 10.11772/j.issn.1001-9081.2023050680

    Aiming at the problem of low data availability caused by existing perturbation mechanisms that do not consider the semantic relationship of location points, a Trajectory Location Privacy protection Mechanism based on Differential Privacy was proposed, namely DP-TLPM. Firstly, sliding windows were used to extract trajectory dwell points to generate fuzzy regions, and the regions were sampled using the exponential and Laplacian mechanisms. Secondly, a road network matching algorithm was proposed to eliminate possible semantic-free location points among the sampled points, and the trajectory was segmented and iteratively matched by using Error Ellipse Matching (EEM). Finally, a perturbed trajectory was formed based on the matched location points, which was sent to the server by the user. The mechanism was evaluated comprehensively by confusion quality and Root Mean Square Error (RMSE). Compared with the GeoInd algorithm, the data quality loss of DP-TLPM is reduced by 24% and the confusion quality of the trajectories is improved by 52%, verifying the effectiveness of DP-TLPM in terms of both privacy protection strength and data quality.
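
    The Laplacian sampling step over a fuzzy region can be illustrated with the planar Laplace mechanism commonly used for geo-indistinguishability, sketched below; the privacy budget, the metre-scale toy coordinates and the absence of region constraints are illustrative simplifications rather than the exact scheme described above.

```python
import numpy as np
from scipy.special import lambertw

def planar_laplace(loc, epsilon, rng=None):
    """Perturb a 2-D location with the planar Laplace mechanism used for
    geo-indistinguishability (illustrative of the Laplacian sampling step only).
    epsilon is the privacy budget per unit distance."""
    rng = rng or np.random.default_rng()
    theta = rng.uniform(0, 2 * np.pi)                          # random direction
    p = rng.uniform(0, 1)
    r = -(lambertw((p - 1) / np.e, k=-1).real + 1) / epsilon   # radial distance
    return np.asarray(loc) + r * np.array([np.cos(theta), np.sin(theta)])

rng = np.random.default_rng(0)
true_point = (500.0, 300.0)                                    # toy metre-scale coordinates
noisy = [planar_laplace(true_point, epsilon=0.01, rng=rng) for _ in range(3)]
print(np.round(noisy, 2))
```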

    Network and communications
    User cluster partitioning method based on weighted fuzzy clustering in ground-air collaboration scenarios
    Tianyu HUANG, Yuanxing LI, Hao CHEN, Zijia GUO, Mingjun WEI
    2024, 44(5):  1555-1561.  DOI: 10.11772/j.issn.1001-9081.2023050643

    To address the user cluster partitioning issue in the deployment strategy of Unmanned Aerial Vehicle (UAV) base stations for auxiliary communication in emergency scenarios, a feature-weighted fuzzy clustering algorithm, named Improved FCM, was proposed by considering both the performance of UAV base stations and user experience. Firstly, to tackle the problem of high computational complexity and convergence difficulty in the partitioning process of user clusters under random distribution conditions, a feature-weighted node data projection algorithm based on distance weighting was introduced according to the performance constraints of signal coverage range and maximum number of served users for each UAV base station. Secondly, to address the effectiveness of user partitioning when the same user falls within the effective ranges of multiple clusters, as well as the maximization of UAV base station resource utilization, a value-weighted algorithm based on user location and UAV base station load balancing was proposed. Experimental results demonstrate that the proposed methods meet the service performance constraints of UAV base stations. Additionally, the deployment scheme based on the proposed methods effectively improves the average load rate and coverage ratio of the system, reaching 0.774 and 0.0263 respectively, which are higher than those of GFA (Geometric Fractal Analysis), Sp-C (Spectral Clustering), etc.
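
    One iteration loop of a feature-weighted fuzzy C-means update is sketched below to illustrate how per-feature weights enter the membership computation; the fixed weight vector, the fuzzifier and the toy data are illustrative assumptions rather than the Improved FCM algorithm itself.

```python
import numpy as np

def weighted_fcm(X, c, feat_w, m=2.0, iters=100, seed=0):
    """Fuzzy C-means with feature-weighted distances: feat_w scales each
    feature's contribution, so e.g. user position can count more than other
    attributes when forming UAV user clusters (weights are illustrative)."""
    rng = np.random.default_rng(seed)
    U = rng.dirichlet(np.ones(c), size=len(X))              # fuzzy memberships (n, c)
    for _ in range(iters):
        Um = U ** m
        centers = (Um.T @ X) / Um.sum(axis=0)[:, None]      # weighted cluster centres
        d2 = np.einsum('ncf,f->nc',
                       (X[:, None, :] - centers[None]) ** 2, feat_w) + 1e-12
        U = 1.0 / (d2 ** (1 / (m - 1)))
        U /= U.sum(axis=1, keepdims=True)                   # renormalise memberships
    return U, centers

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (50, 3)), rng.normal(5, 1, (50, 3))])
U, centers = weighted_fcm(X, c=2, feat_w=np.array([1.0, 1.0, 0.2]))
print(centers.round(2), U.argmax(axis=1)[:5])
```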

    DDPG-based resource allocation in D2D communication-empowered cellular network
    Rui TANG, Chuanlin PANG, Ruizhi ZHANG, Chuan LIU, Shibo YUE
    2024, 44(5):  1562-1569.  DOI: 10.11772/j.issn.1001-9081.2023050612

    To deal with the co-channel interference in Device-to-Device (D2D) communication-empowered cellular networks, the sum rate of D2D links was maximized through joint channel allocation and power control while satisfying the power constraints and the Quality-of-Service (QoS) requirements of cellular links. In order to efficiently solve the mixed-integer non-convex programming problem corresponding to the above resource allocation, the original problem was transformed into a Markov decision process, and a Deep Deterministic Policy Gradient (DDPG) algorithm-based mechanism was proposed. Through offline training, the mapping relationship from the channel state information to the optimal resource allocation policy was built up directly, without solving any optimization problems, so it can be deployed in an online fashion. Simulation results show that compared with the exhaustive search-based mechanism, the proposed mechanism reduces the computation time by 4 orders of magnitude (99.51%) at the cost of only 9.726% performance loss.

    Multimedia computing and computer simulation
    3D shape reconstruction with spatial correlation based on spatio-temporal attention
    Yanxin GE, Tao YAN, Jiangfeng ZHANG, Xiaoying GUO, Bin CHEN
    2024, 44(5):  1570-1578.  DOI: 10.11772/j.issn.1001-9081.2023050651
    Abstract ( )   HTML ( )   PDF (2607KB) ( )  
    Figures and Tables | References | Related Articles | Metrics

    Shape recovery from focus realizes 3D shape reconstruction by modeling the potential relationship between scene depth and defocus blur. However, existing 3D shape reconstruction networks cannot effectively utilize the sequential correlation of image sequences for representation learning. Therefore, a deep network framework based on the spatial correlation features of multi-depth image sequences, namely the 3D Spatial Correlation Horizon Analysis Model (3D SCHAM), was proposed for 3D shape reconstruction. With this framework, not only can the edge features from the focused region to the defocused region be captured accurately within a single image frame, but the spatial dependence features between different image frames can also be exploited effectively. Firstly, a temporally continuous model for 3D shape reconstruction was constructed by building a network with composite expansion of depth, width and receptive field to determine single-point depth results. Secondly, an attention module based on spatial correlation was introduced to fully learn the "adjacent" and "distant" spatial dependence relationships between frames. In addition, an inverted residual bottleneck was used for resampling to maintain semantic richness across scales. Experimental results on the DDFF 12-Scene real-scene dataset show that, compared with the DfFintheWild model, the accuracy of the 3D SCHAM model at the three thresholds 1.25, 1.25² and 1.25³ is improved by 15.34%, 3.62% and 0.86% respectively, verifying the robustness of 3D SCHAM in real scenes.
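    The inter-frame attention idea can be illustrated with a few lines of code (a single-head sketch under simplified assumptions, not the actual 3D SCHAM module): each frame of the focal stack attends to every other frame, so both "adjacent" and "distant" dependencies contribute to its representation.

```python
import torch
import torch.nn.functional as F

def frame_attention(frame_feats):
    """Self-attention across focal-stack frames.

    Simplified single-head sketch, not the actual 3D SCHAM attention module.
    frame_feats: (num_frames, dim) pooled feature vector per frame.
    """
    q = k = v = frame_feats
    attn = F.softmax(q @ k.t() / (q.size(-1) ** 0.5), dim=-1)   # (F, F) inter-frame weights
    return attn @ v
```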

    Image super-resolution reconstruction based on residual attention network with receptive field expansion
    Lin GUO, Kunhu LIU, Chenyang MA, Youxue LAI, Yingfen XU
    2024, 44(5):  1579-1587.  DOI: 10.11772/j.issn.1001-9081.2023050689
    Abstract ( )   HTML ( )   PDF (3874KB) ( )  
    Figures and Tables | References | Related Articles | Metrics

    To solve the problems of insufficient utilization of residual features and loss of details in existing residual networks, a deep neural network model for Single Image Super-Resolution (SISR) reconstruction was proposed, which combines a two-layer residual aggregation structure and a dual-attention mechanism with receptive field expansion. In this model, a two-layer nested network structure for residual aggregation was constructed through skip connections to hierarchically aggregate and fuse the residual information extracted by each layer of the network, thereby reducing the loss of residual information containing image details. Meanwhile, a multi-scale receptive field expansion module was designed to capture a larger range of context-dependent information at different scales for the effective extraction of deep residual features, and a spatial-channel dual attention mechanism was introduced to enhance the discriminative learning ability of the residual network, thus improving the quality of reconstructed images. Quantitative and qualitative assessments were performed on the benchmark datasets Set5, Set14, B100 and Urban100 for comparison with mainstream methods. The objective evaluation results indicate that the proposed method outperforms the comparison methods on all four datasets; compared with the classical SRCNN (Super-Resolution using Convolutional Neural Network) model and the second-best performing comparison model ISRN (Iterative Super-Resolution Network), the proposed model improves the average Peak Signal-to-Noise Ratio (PSNR) by 1.91, 1.71 and 1.61 dB and by 0.06, 0.04 and 0.04 dB respectively at magnification factors of 2, 3 and 4. Visual results show that the proposed model reconstructs clearer image details and textures.
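    The spatial-channel dual attention can be pictured with a CBAM-style sketch (an assumption for illustration; the paper's module may differ in detail): channels are reweighted first, then a spatial map highlights informative positions.

```python
import torch
import torch.nn as nn

class DualAttention(nn.Module):
    """Channel attention followed by spatial attention over a feature map (B, C, H, W).

    CBAM-style sketch; the paper's dual attention may differ in detail.
    """
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.channel_gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1), nn.ReLU(),
            nn.Conv2d(channels // reduction, channels, 1), nn.Sigmoid(),
        )
        self.spatial_gate = nn.Sequential(
            nn.Conv2d(2, 1, kernel_size=7, padding=3), nn.Sigmoid(),
        )

    def forward(self, x):
        x = x * self.channel_gate(x)                    # reweight channels
        avg = x.mean(dim=1, keepdim=True)               # (B, 1, H, W)
        mx, _ = x.max(dim=1, keepdim=True)              # (B, 1, H, W)
        return x * self.spatial_gate(torch.cat([avg, mx], dim=1))
```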

    Image super-resolution network based on global dependency Transformer
    Zihan LIU, Dengwen ZHOU, Yukai LIU
    2024, 44(5):  1588-1596.  DOI: 10.11772/j.issn.1001-9081.2023050636
    Abstract ( )   HTML ( )   PDF (2858KB) ( )  
    Figures and Tables | References | Related Articles | Metrics

    At present, image super-resolution networks based on deep learning are mainly implemented with convolutions. Compared with the traditional Convolutional Neural Network (CNN), the main advantage of Transformer in the image super-resolution task is its long-distance dependency modeling ability. However, most Transformer-based image super-resolution models cannot establish global dependencies with small parameter counts and few network layers, which limits model performance. In order to establish global dependencies in the super-resolution network, an image Super-Resolution network based on Global Dependency Transformer (GDTSR) was proposed. Its main component is the Residual Square Axial Window Block (RSAWB); in the Transformer residual layer, axial windows and self-attention were used to make each pixel globally dependent on the entire feature map. In addition, the super-resolution image reconstruction modules of most current image super-resolution models are composed of convolutions; in order to dynamically integrate the extracted feature information, Transformer and convolution were combined to jointly reconstruct super-resolution images. Experimental results show that the Peak Signal-to-Noise Ratio (PSNR) and Structural Similarity Index (SSIM) of GDTSR on five standard test sets, including Set5, Set14, B100, Urban100 and Manga109, are optimal at all three scale factors (×2, ×3 and ×4), and the performance improvement is especially obvious on the large-scale datasets Urban100 and Manga109.
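    The axial idea behind RSAWB can be sketched as two attention passes, one along rows and one along columns, so that every pixel gains a path to the whole feature map (a simplified single-head illustration without windowing, not the exact block):

```python
import torch
import torch.nn.functional as F

def axial_attention(x):
    """Row-wise attention followed by column-wise attention on x of shape (B, H, W, C).

    Simplified sketch (single head, no windowing), not the RSAWB implementation.
    """
    def attend(seq):                                   # seq: (..., L, C)
        scale = seq.size(-1) ** 0.5
        attn = F.softmax(seq @ seq.transpose(-2, -1) / scale, dim=-1)
        return attn @ seq

    x = attend(x)                                      # attend over W within each row
    x = attend(x.transpose(1, 2)).transpose(1, 2)      # attend over H within each column
    return x
```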

    Self-supervised image registration algorithm based on multi-feature fusion
    Guijin HAN, Xinyuan ZHANG, Wentao ZHANG, Ya HUANG
    2024, 44(5):  1597-1604.  DOI: 10.11772/j.issn.1001-9081.2023050692
    Abstract ( )   HTML ( )   PDF (2617KB) ( )  
    Figures and Tables | References | Related Articles | Metrics

    To ensure that the extracted features contain rich information, current deep learning-based image registration algorithms usually employ deep convolutional neural networks, which have high computational complexity and low discrimination of similar feature points. To address these issues, a Self-supervised Image Registration Algorithm based on Multi-Feature Fusion (SIRA-MFF) was proposed. First, shallow convolutional neural networks were used to extract image features and reduce the computational complexity, and the problem of limited feature information in shallow networks was remedied by adding feature point direction descriptors to the feature extraction layer. Second, an embedding and interaction layer was added after the feature extraction layer to enlarge the receptive field of feature points, by which the local and global information of feature points was fused to improve the discrimination of similar feature points. Finally, the feature matching layer was optimized to obtain the best matching scheme, and a cross-entropy based loss function was designed for model training. SIRA-MFF achieved Average Matching Accuracy (AMA) of 95.18% and 93.26% on two test sets generated from the ILSVRC2012 dataset, better than the comparison algorithms. On the IMC-PT-SparseGM-50 test set, SIRA-MFF achieved an AMA of 89.69%, also better than the comparison algorithms, and reduced the processing time of a single image by 49.45% compared with the ResMtch algorithm. Experimental results show that SIRA-MFF has higher accuracy and stronger robustness.
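    As a concrete picture of what the matching layer has to produce, the sketch below performs mutual nearest-neighbour matching between two descriptor sets (an illustrative stand-in, not the optimized matching scheme of SIRA-MFF):

```python
import numpy as np

def mutual_nn_matches(desc_a, desc_b):
    """Match two sets of L2-normalised descriptors and keep only mutual agreements.

    Illustrative stand-in for SIRA-MFF's matching layer.
    desc_a: (Na, d), desc_b: (Nb, d). Returns an (M, 2) array of index pairs.
    """
    sim = desc_a @ desc_b.T                         # cosine similarity matrix
    nn_ab = sim.argmax(axis=1)                      # best match in B for each point in A
    nn_ba = sim.argmax(axis=0)                      # best match in A for each point in B
    keep = nn_ba[nn_ab] == np.arange(len(desc_a))   # mutual nearest neighbours only
    return np.stack([np.arange(len(desc_a))[keep], nn_ab[keep]], axis=1)
```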

    Real-time object detection algorithm for complex construction environments
    Xiaogang SONG, Dongdong ZHANG, Pengfei ZHANG, Li LIANG, Xinhong HEI
    2024, 44(5):  1605-1612.  DOI: 10.11772/j.issn.1001-9081.2023050687
    Abstract ( )   HTML ( )   PDF (3015KB) ( )  
    Figures and Tables | References | Related Articles | Metrics

    A real-time object detection algorithm named YOLO-C was proposed for complex construction environments, addressing problems that commonly exist in such environments: cluttered backgrounds, occluded objects, large variation of object scales, imbalance between positive and negative samples, and the insufficient real-time performance of existing detection algorithms. The extracted low-level features were fused with the high-level features to enhance the global perception capability of the network, and a small object detection layer was designed to improve the detection accuracy for objects of different scales. A Channel-Spatial Attention (CSA) module was designed to enhance object features and suppress background features. In the loss function, VariFocal Loss was used to calculate the classification loss to address the imbalance between positive and negative samples. GhostConv was used as the basic convolutional block to construct the GCSP (Ghost Cross Stage Partial) structure, reducing the number of parameters and the amount of computation. For complex construction environments, a concrete construction site object detection dataset was constructed, and comparison experiments of various algorithms were conducted on it. Experimental results demonstrate that YOLO-C achieves higher detection accuracy with fewer parameters, making it more suitable for object detection tasks in complex construction environments.
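    The parameter saving from GhostConv comes from generating half of the output channels with a cheap depthwise convolution; a minimal sketch of such a block is shown below (hyper-parameters are illustrative and not taken from YOLO-C):

```python
import torch
import torch.nn as nn

class GhostConv(nn.Module):
    """Half of the output channels come from a normal convolution, the other half are
    'ghosted' from them by a cheap depthwise convolution.

    Sketch; YOLO-C's exact hyper-parameters may differ.
    """
    def __init__(self, c_in, c_out, k=1, s=1):
        super().__init__()
        c_half = c_out // 2
        self.primary = nn.Sequential(
            nn.Conv2d(c_in, c_half, k, s, k // 2, bias=False),
            nn.BatchNorm2d(c_half), nn.SiLU(),
        )
        self.cheap = nn.Sequential(
            nn.Conv2d(c_half, c_half, 5, 1, 2, groups=c_half, bias=False),
            nn.BatchNorm2d(c_half), nn.SiLU(),
        )

    def forward(self, x):
        y = self.primary(x)
        return torch.cat([y, self.cheap(y)], dim=1)
```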

    Embedded road crack detection algorithm based on improved YOLOv8
    Huantong GENG, Zhenyu LIU, Jun JIANG, Zichen FAN, Jiaxing LI
    2024, 44(5):  1613-1618.  DOI: 10.11772/j.issn.1001-9081.2023050635
    Abstract ( )   HTML ( )   PDF (2002KB) ( )  
    Figures and Tables | References | Related Articles | Metrics

    Deploying the YOLOv8L model on edge devices for road crack detection can achieve high accuracy, but it is difficult to guarantee real-time detection. To solve this problem, an object detection algorithm based on an improved YOLOv8 model that can be deployed on the edge computing device Jetson AGX Xavier was proposed. First, the Faster Block structure was designed using partial convolution to replace the Bottleneck structure in the YOLOv8 C2f module, and the improved C2f module was denoted as C2f-Faster. Second, an SE (Squeeze-and-Excitation) channel attention layer was connected after each C2f-Faster module in the YOLOv8 backbone network to further improve the detection accuracy. Experimental results on the open-source road damage dataset RDD20 (Road Damage Detection 20) show that the average F1 score of the proposed method is 0.573, the detection speed is 47 Frames Per Second (FPS), and the model size is 55.5 MB. Compared with the SOTA (State-Of-The-Art) model of GRDDC2020 (Global Road Damage Detection Challenge 2020), the F1 score is increased by 0.8 percentage points, the FPS is increased by 291.7%, and the model size is reduced by 41.8%, which realizes real-time and accurate detection of road cracks on edge devices.
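    The partial convolution at the heart of the Faster Block convolves only a fraction of the channels and passes the rest through untouched, which is where the speed-up comes from; a minimal sketch (channel ratio and kernel size are assumptions, not the paper's settings) follows:

```python
import torch
import torch.nn as nn

class PartialConv(nn.Module):
    """Convolve the first `ratio` share of the channels; leave the remaining channels as-is.

    Sketch of the idea behind the Faster Block; settings are assumed, not the paper's.
    """
    def __init__(self, channels, ratio=0.25, k=3):
        super().__init__()
        self.c_conv = max(1, int(channels * ratio))
        self.conv = nn.Conv2d(self.c_conv, self.c_conv, k, 1, k // 2, bias=False)

    def forward(self, x):
        x1, x2 = torch.split(x, [self.c_conv, x.size(1) - self.c_conv], dim=1)
        return torch.cat([self.conv(x1), x2], dim=1)
```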

    YOLOv5 multi-attribute classification based on separable label collaborative learning
    Xin LI, Qiao MENG, Junyi HUANGFU, Lingchen MENG
    2024, 44(5):  1619-1628.  DOI: 10.11772/j.issn.1001-9081.2023050675
    Abstract ( )   HTML ( )   PDF (4949KB) ( )  
    Figures and Tables | References | Related Articles | Metrics

    A Multi-YOLOv5 method for vehicle multi-attribute classification was proposed based on YOLOv5 to address the insufficient ability of convolutional networks to extract fine-grained image features and their inability to recognize dependencies between multiple attributes in image classification tasks. A collaborative working mechanism of Multi-head Non-Maximum Suppression (Multi-NMS) and a separable label loss (Separate-Loss) function was designed to complete the multi-attribute classification task for vehicles. Additionally, the YOLOv5 detection model was reconstructed by using the Convolutional Block Attention Module (CBAM), Shuffle Attention (SA), and CoordConv methods to enhance the ability to extract multi-attribute features, strengthen the correlation between different attributes, and improve the network’s perception of positional information, thereby improving the accuracy of the model in multi-attribute classification of objects. Finally, training and testing were conducted on datasets such as VeRi. Experimental results demonstrate that Multi-YOLOv5 achieves superior recognition results in multi-attribute classification of objects compared with network architectures including GoogLeNet, Residual Network (ResNet), EfficientNet, and Vision Transformer (ViT). The mean Average Precision (mAP) of Multi-YOLOv5 reaches 87.37% on the VeRi dataset, an improvement of 4.47 percentage points over the best-performing method mentioned above. Moreover, Multi-YOLOv5 exhibits better robustness than the original YOLOv5 model, thus providing reliable data for traffic object perception in dense environments.
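    One plausible reading of the separable label loss is a weighted sum of per-attribute cross-entropy terms, one classification head per vehicle attribute; the sketch below reflects only that reading, not the paper's exact formulation:

```python
import torch
import torch.nn.functional as F

def separate_loss(attr_logits, attr_targets, weights=None):
    """Sum of per-attribute cross-entropy losses.

    One plausible reading of Separate-Loss, not the exact formulation from the paper.
    attr_logits: list of (batch, num_classes_i) tensors, one per attribute (type, colour, ...).
    attr_targets: list of (batch,) integer label tensors in the same order.
    """
    weights = weights or [1.0] * len(attr_logits)
    return sum(w * F.cross_entropy(logits, target)
               for w, logits, target in zip(weights, attr_logits, attr_targets))
```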

    Meta-learning adaption for few-shot text-to-speech
    Zhihao WU, Ziqiu CHI, Ting XIAO, Zhe WANG
    2024, 44(5):  1629-1635.  DOI: 10.11772/j.issn.1001-9081.2023050640
    Abstract ( )   HTML ( )   PDF (1457KB) ( )  
    Figures and Tables | References | Related Articles | Metrics

    Few-shot Text-To-Speech (TTS) aims to synthesize speech that closely resembles the original speaker using only a small amount of training data. However, this approach faces challenges in quickly adapting to new speakers and in improving the similarity between the generated speech and the target speaker while ensuring high speech quality. Existing models often overlook the changes in model features during different adaptation stages, leading to slow improvement of speech similarity. To address these issues, a meta-learning-guided model for adapting to new speakers was proposed. The model was guided by a meta-feature module during the adaptation process, ensuring the improvement of speech similarity while maintaining the quality of the generated speech when adapting to new speakers. Furthermore, the adaptation stages were differentiated through a step encoder, thereby increasing the speed of adaptation to new speakers. The proposed method was evaluated on the Libri-TTS and VCTK datasets using subjective and objective evaluation metrics. Experimental results show that the Dynamic Time Warping-Mel Cepstral Distortion (DTW-MCD) values of the proposed model are 7.450 2 and 6.524 3 on the two datasets respectively. It surpasses other meta-learning methods in terms of synthesized speech similarity and enables faster adaptation to new speakers.
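    A step encoder can be as simple as a sinusoidal embedding of the current adaptation step, letting the network condition on how far fine-tuning to the new speaker has progressed; the sketch below is an assumed realisation for illustration, not the paper's design:

```python
import math
import torch

def step_encoding(step, dim=128):
    """Sinusoidal encoding of an integer adaptation step, returned as a (dim,) tensor.

    Assumed realisation of a step encoder for illustration only.
    """
    half = dim // 2
    freqs = torch.exp(-math.log(10000.0) * torch.arange(half, dtype=torch.float32) / half)
    angles = step * freqs
    return torch.cat([torch.sin(angles), torch.cos(angles)])
```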

    Classroom speech emotion recognition method based on multi-scale temporal-aware network
    Juxiang ZHOU, Jinsheng LIU, Jianhou GAN, Di WU, Zijie LI
    2024, 44(5):  1636-1643.  DOI: 10.11772/j.issn.1001-9081.2023050663
    Abstract ( )   HTML ( )   PDF (4548KB) ( )  
    Figures and Tables | References | Related Articles | Metrics

    Speech emotion recognition has been widely used in multi-scenario intelligent systems in recent years, and it also makes it possible to intelligently analyze teaching behaviors in smart classroom environments. Classroom speech emotion recognition technology can be used to automatically recognize the emotional states of teachers and students during classroom teaching, helping teachers understand their own teaching styles and grasp students’ classroom learning status in a timely manner, thereby achieving the purpose of precise teaching. For the classroom speech emotion recognition task, firstly, classroom teaching videos were collected from primary and secondary schools, and the audio was extracted, manually segmented and annotated to construct a primary and secondary school teaching speech emotion corpus containing six emotion categories. Secondly, based on the Temporal Convolutional Network (TCN) and a cross-gated mechanism, dual temporal convolution channels were designed to extract multi-scale cross-fused features. Finally, a dynamic weight fusion strategy was adopted to adjust the contributions of features at different scales, reduce the interference of unimportant features on the recognition results, and further enhance the representation and learning ability of the model. Experimental results show that the proposed method is superior to advanced models such as TIM-Net (Temporal-aware bI-direction Multi-scale Network), GM-TCNet (Gated Multi-scale Temporal Convolutional Network), and CTL-MTNet (CapsNet and Transfer Learning-based Mixed Task Net) on multiple public datasets, and its UAR (Unweighted Average Recall) and WAR (Weighted Average Recall) reach 90.58% and 90.45% respectively on the real classroom speech emotion recognition task.
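    The dynamic weight fusion strategy can be pictured as learning softmax-normalised weights over the temporal scales (a sketch under that assumption; the paper's gating may be more elaborate):

```python
import torch
import torch.nn as nn

class DynamicFusion(nn.Module):
    """Fuse per-scale features with learned, softmax-normalised weights.

    Sketch; the paper's dynamic weight fusion may be more elaborate.
    """
    def __init__(self, num_scales):
        super().__init__()
        self.logits = nn.Parameter(torch.zeros(num_scales))

    def forward(self, scale_feats):                    # list of (batch, dim) tensors, one per scale
        w = torch.softmax(self.logits, dim=0)
        return sum(wi * f for wi, f in zip(w, scale_feats))
```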

    Survey of visual object tracking methods based on Transformer
    Ziwen SUN, Lizhi QIAN, Chuandong YANG, Yibo GAO, Qingyang LU, Guanglin YUAN
    2024, 44(5):  1644-1654.  DOI: 10.11772/j.issn.1001-9081.2023060796
    Abstract ( )   HTML ( )   PDF (1615KB) ( )  
    Figures and Tables | References | Related Articles | Metrics

    Visual object tracking is one of the important tasks in computer vision. To achieve high-performance object tracking, a large number of object tracking methods have been proposed in recent years. Among them, Transformer-based object tracking methods have become a hot topic in the field of visual object tracking due to their ability to perform global modeling and capture contextual information. Firstly, existing Transformer-based visual object tracking methods were classified according to their network structures, the underlying principles and key techniques for model improvement were reviewed, and the advantages and disadvantages of different network structures were summarized. Then, the experimental results of Transformer-based visual object tracking methods on public datasets were compared to analyze the impact of network structure on performance; among them, MixViT-L (ConvMAE) achieved tracking success rates of 73.3% and 86.1% on LaSOT and TrackingNet respectively, showing that object tracking methods based on a pure Transformer two-stage architecture have better performance and broader development prospects. Finally, the limitations of these methods, such as complex network structures, large numbers of parameters, high training requirements, and difficulty of deployment on edge devices, were summarized, and future research directions were outlined: by combining model compression, self-supervised learning, and Transformer interpretability analysis, more feasible solutions for Transformer-based visual object tracking could be developed.
