Journal of Computer Applications

Survey on hypergraph application methods: issues, advances, and challenges

Li ZENG, Jingru YANG, Gang HUANG, Xiang JING, Chaoran LUO

2024, 44(11): 3315-3326. DOI: 10.11772/j.issn.1001-9081.2023111629

A hypergraph is a generalization of a graph that has significant advantages over ordinary graphs in representing the higher-order features of complex relationships. As a relatively new data structure, the hypergraph is playing an increasingly crucial role in various application fields. By appropriately using hypergraph models and algorithms, specific real-world problems can be modeled and solved with higher efficiency and quality. Existing surveys of hypergraphs mainly focus on the theory and techniques of hypergraphs themselves and lack a summary of modeling and solving methods in specific scenarios. To this end, after some fundamental concepts of hypergraphs were summarized and introduced, the application methods, techniques, common issues, and solutions of hypergraphs in various application scenarios were analyzed; by summarizing the existing work, some problems and obstacles that still exist in applying hypergraphs to real-world problems were elaborated. Finally, future research directions of hypergraph applications were discussed.

Graph classification method based on graph pooling contrast learning

Nengbing HU, Biao CAI, Xu LI, Danhua CAO

2024, 44(11): 3327-3334. DOI: 10.11772/j.issn.1001-9081.2023101526

In graph classification tasks, the graph embeddings obtained by existing node-dropping graph pooling algorithms do not effectively utilize the implicit information in the dropped nodes or the node information between graphs. Meanwhile, traditional methods do not learn the graph embedding separately, which limits their performance in graph classification tasks. To address these shortcomings, a Graph classification method based on graph Pooling Contrast Learning (GPCL) was proposed to effectively utilize the dropped node information. Firstly, the graph attention mechanism was utilized to learn an attention score for each node, the nodes were sorted by attention score, and the nodes with lower scores were dropped. Then, the nodes retained in the graph were treated as positive samples, the dropped nodes from other graphs were treated as negative samples, the embedding representation of the graph was taken as the target node, and the pairwise similarity scores were calculated for contrast learning. Experimental results demonstrate that on the D&D (Dobson PD-Doig AJ), MUTAG, PROTEINS, and IMDB-B datasets, GPCL improves the graph classification accuracy by 5.79, 15.54, 5.42, and 1.75 percentage points respectively compared with the method using the attention mechanism with hierarchical pooling alone. This verifies that GPCL effectively enhances the utilization of inter-graph information and performs well in graph classification tasks.
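
The node-dropping step in this abstract (score each node, sort, drop the lowest) can be sketched in a few lines. This is only an illustration under stated assumptions, not the GPCL implementation: the function name `drop_low_score_nodes`, the parameter `num_keep`, and the hard-coded scores are hypothetical, whereas in the paper the scores are learned by a graph attention mechanism.

```python
def drop_low_score_nodes(scores, num_keep):
    """Split node indices into (retained, dropped) by attention score.

    scores: one attention score per node (assumed to be precomputed).
    num_keep: hypothetical number of highest-scoring nodes to retain.
    """
    # Sort node indices from highest to lowest score.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    retained = sorted(order[:num_keep])
    dropped = sorted(order[num_keep:])
    return retained, dropped


# Toy scores for a five-node graph (made-up values).
retained, dropped = drop_low_score_nodes([0.9, 0.1, 0.4, 0.7, 0.2], num_keep=3)
```

In a contrastive setup such as the one described, the `retained` indices would supply positive samples and the `dropped` indices of other graphs would supply negative samples.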

Graph convolution network-based masked data augmentation

Xinrong HU, Jingxue CHEN, Zijian HUANG, Bangchao WANG, Xun YAO, Junping LIU, Qiang ZHU, Jie YANG

2024, 44(11): 3335-3344. DOI: 10.11772/j.issn.1001-9081.2023111645

Concerning the problems of inaccurate raw data, low sample quality, and poor model generalization ability in the field of Multiple-Choice Question Answering (MCQA), a mask data augmentation model based on Graph Convolutional Network (GCN) was proposed, namely GMDA (Graph convolution network-based MASK Data Augmentation). Firstly, with GCN as the basic framework, the words in the articles were abstracted as graph nodes and connected through Question-candidate Answer (QA) pair nodes to the related article nodes. Secondly, the similarity between nodes was calculated, and the masking technique was applied to mask nodes in the graph to generate augmented samples. Thirdly, the augmented samples were subjected to feature expansion by using GCN to enhance the information representation capability of the model. Finally, a scorer was introduced to score the original and augmented samples, and the curriculum learning strategy was combined to improve the accuracy of answer prediction. Comprehensive evaluation experiments show that, compared with the best baseline model EAM on the RACE-M and RACE-H datasets, the proposed GMDA model improves the accuracy by an average of 0.8 and 0.4 percentage points respectively, and compared with the best baseline model STM (SelfTraining Method) on the DREAM dataset, the GMDA model improves the average accuracy by 1.4 percentage points. Comparative experiments further confirm the effectiveness of the GMDA model in MCQA tasks, which can help further research and application of data augmentation techniques in this field.

Personalized federated learning based on similarity clustering and regularization

Jie WU, Xuezhong QIAN, Wei SONG

2024, 44(11): 3345-3353. DOI: 10.11772/j.issn.1001-9081.2023111693

In Federated Learning (FL) application scenarios, data heterogeneity and the need to provide personalized models for different task requirements are often encountered. However, some existing Personalized Federated Learning (PFL) algorithms face a trade-off between personalization and global generalization, and most of them use the weighted aggregation based on the amount of client data from the traditional FL method, which leads to poor model performance for clients with significantly different data distributions and a lack of personalized aggregation strategies. In response to these problems, a new PFL algorithm based on similarity clustering and regularization, namely pFedSCR, was proposed. In the client local update phase, personalized models and local models were trained, and L2 norm regularization was introduced into the cross-entropy loss function of the personalized models to dynamically adjust the degree of reference to the global model, thereby achieving personalization on the basis of learning global knowledge; in the server aggregation phase, an aggregation weight matrix was constructed based on similarity clustering of the client model updates, and the aggregation weights were dynamically adjusted to aggregate personalized models for different clients, so that the parameter aggregation strategy was personalized while the problem of data heterogeneity was addressed. Experimental results under multiple Non-Independent and Identically Distributed (Non-IID) data scenarios simulated through Dirichlet distribution on three datasets, CIFAR-10, MNIST and Fashion-MNIST, show that compared with FL algorithms including the classic algorithm FedProx and the latest personalized algorithm FedPCL (Federated Prototype-wise Contrastive Learning), the pFedSCR algorithm has higher precision and communication efficiency in various scenarios, and reaches an accuracy of up to 99.03%.
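
The idea of adding an L2 term to the cross-entropy loss so that a personalized model stays close to the global model can be illustrated as follows. The abstract does not give the exact form of the regularizer, so the proximal form `0.5 * lam * ||w_personal - w_global||^2` used here (familiar from FedProx-style objectives), the coefficient `lam`, and all function names are assumptions made for illustration.

```python
import math


def cross_entropy(probs, label):
    """Cross-entropy of one sample given predicted class probabilities."""
    return -math.log(probs[label])


def personalized_loss(probs, label, w_personal, w_global, lam):
    """Cross-entropy plus an assumed L2 penalty toward the global model.

    lam is a hypothetical coefficient: a larger value pulls the
    personalized weights more strongly toward the global weights.
    """
    squared_distance = sum((p - g) ** 2 for p, g in zip(w_personal, w_global))
    return cross_entropy(probs, label) + 0.5 * lam * squared_distance


# Toy values: two classes, two weights.
loss = personalized_loss([0.5, 0.5], 0, [1.0, 2.0], [0.0, 0.0], lam=0.2)
```

With `lam = 0` the personalized model ignores the global model entirely; increasing `lam` trades personalization for global knowledge, which is the adjustment the abstract describes.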

Transfer kernel learning method based on spatial features for motor imagery EEG

Siqi YANG, Tianjian LUO, Xuanhui YAN, Guangju YANG

2024, 44(11): 3354-3363. DOI: 10.11772/j.issn.1001-9081.2023111593

Motor Imagery ElectroEncephaloGram (MI-EEG) signals have gained widespread attention in the construction of non-invasive Brain Computer Interfaces (BCIs) for clinical assisted rehabilitation. Because the distributions of MI-EEG signal samples differ across subjects, cross-subject MI-EEG signal feature learning has become a research focus. However, the existing related methods suffer from weak domain-invariant feature expression capabilities and high time complexity, and cannot be directly applied to online BCIs. To address this issue, an efficient cross-subject MI-EEG signal classification algorithm, Transfer Kernel Riemannian Tangent Space (TKRTS), was proposed. Firstly, the MI-EEG signal covariance matrices were projected into the Riemannian space, and the covariance matrices of different subjects were aligned in the Riemannian space while Riemannian Tangent Space (RTS) features were extracted. Subsequently, a domain-invariant kernel matrix was learnt on the tangent space feature set, thereby achieving a complete representation of cross-subject MI-EEG signal features. This matrix was then used to train a Kernel Support Vector Machine (KSVM) for classification. To validate the feasibility and effectiveness of the TKRTS method, multi-source domain to single-target domain and single-source domain to single-target domain experiments were conducted on three public datasets, and the average classification accuracy is increased by 0.81 and 0.13 percentage points respectively. Experimental results demonstrate that compared with state-of-the-art methods, the TKRTS method improves the average classification accuracy while maintaining similar time complexity. Furthermore, ablation experimental results confirm the completeness and parameter insensitivity of the TKRTS method in cross-subject feature expression, making this method suitable for constructing online BCIs.

Meta label correction method based on shallow network predictions

Yuxin HUANG, Yiwang HUANG, Hui HUANG

2024, 44(11): 3364-3370. DOI: 10.11772/j.issn.1001-9081.2023111616

Aiming at the overfitting problem caused by the memorization behavior of Deep Neural Networks (DNNs) on image data with noisy labels, a meta label correction method based on predictions from shallow neural networks was proposed. In this method, a weakly supervised training approach was adopted: a label reweighting network was set up to reweight noisy data, meta learning was employed to help the model learn dynamically from noisy data, and the prediction outputs of both the deep and shallow networks were used as pseudo labels to train the model. At the same time, the knowledge distillation algorithm was applied to allow the deep network to guide the training of the shallow networks. In this way, the memorization behavior of the model was alleviated effectively and the robustness of the model was enhanced. Experiments conducted on the CIFAR10/100 and Clothing1M datasets demonstrate the superiority of the proposed method over the Meta Label Correction (MLC) method. In particular, on the CIFAR10 dataset with symmetric noise ratios of 60% and 80%, the accuracy improvements are 3.49 and 1.56 percentage points respectively. Furthermore, in ablation experiments on the CIFAR100 dataset with an asymmetric noise ratio of 40%, the proposed method achieves an accuracy improvement of up to 5.32 percentage points over models trained without predicted labels, confirming the feasibility and effectiveness of the proposed method.

Fusing entity semantic and structural information for knowledge graph reasoning

Linqin WANG, Te ZHANG, Zhihong XU, Yongfeng DONG, Guowei YANG

2024, 44(11): 3371-3378. DOI: 10.11772/j.issn.1001-9081.2023111677

Currently, Graph ATtention network (GAT) introduces an attention mechanism to assign different weights to entities in the neighbourhood of the target entity and perform information aggregation, which makes it pay more attention to the local neighbourhood of the entity and ignore the topology between entities and relations in the graph structure. Moreover, the output embedding vectors are simply concatenated or averaged after the multi-head attention, so the attention heads remain independent and the important semantic information of different attention heads is not captured. Aiming at the problem that GAT does not fully mine entity structural information and semantic information when it is applied to the knowledge graph reasoning task, a Fusing Entity Semantic and Structural Information for knowledge graph reasoning (FESSI) model was proposed. Firstly, TransE was used to represent entities and relationships as embedding vectors in the same space. Secondly, an interactive attention mechanism was proposed to reintegrate the multi-head attention in GAT into multiple hybrid attentions, which enhanced the interaction between the attention heads to extract richer semantic information of the target entity. At the same time, the structural information of the entity was extracted by utilizing the Relational Graph Convolutional Network (R-GCN), and the output feature vectors of GAT and R-GCN were learned through weight matrices. Finally, ConvKB was used as a decoder for scoring. Experimental results on the knowledge graph datasets Kinship, NELL-995 and FB15K-237 show that the FESSI model outperforms most comparison models, with Mean Reciprocal Rank (MRR) values on the three datasets of 0.964, 0.565 and 0.562, respectively.
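
Two generic ingredients named in this abstract, the TransE translation assumption (head + relation ≈ tail) and the MRR metric, can be made concrete with a short sketch. It is not part of the FESSI model itself: the function names and toy vectors are hypothetical, and the L2 norm is chosen here as one common option for the TransE distance.

```python
import math


def transe_distance(head, relation, tail):
    """L2 distance ||h + r - t||; a smaller value means a more plausible triple."""
    return math.sqrt(sum((h + r - t) ** 2 for h, r, t in zip(head, relation, tail)))


def mean_reciprocal_rank(ranks):
    """ranks: 1-based rank of the correct entity for each test triple."""
    return sum(1.0 / rank for rank in ranks) / len(ranks)


# Toy embeddings in which head + relation equals tail exactly.
distance = transe_distance([1.0, 0.0], [0.5, 1.0], [1.5, 1.0])
# Three test triples whose correct entities were ranked 1st, 2nd, and 4th.
mrr = mean_reciprocal_rank([1, 2, 4])
```

An MRR close to 1 means the correct entity is almost always ranked first, which puts the reported values of 0.964, 0.565 and 0.562 in context.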

Document-level relationship extraction based on evidence enhancement and multi-feature fusion

Xinyue YAN, Shuqun YANG, Yongbin GAO

2024, 44(11): 3379-3385. DOI: 10.11772/j.issn.1001-9081.2023101516

Document-level Relationship Extraction (DocRE) aims at identifying all the relationships that exist between entity pairs in a document. Aiming at the problems of ineffective use of evidence sentences and document information, as well as multiple mentions of entities, a multi-feature fusion DocRE model named EMF (Evidence Multi-feature Fusion) was constructed based on evidence-enhanced contextual features. Firstly, entity types were added before and after entities, and relationship text features were associated with entity mentions to obtain relationship-specific entity features. Secondly, fragment representations were obtained through different convolutional kernels, and multi-granularity fragment-level features perceived by entity pairs were obtained through the attention mechanism. Meanwhile, contextual features highly correlated with the entity pairs were enhanced by using evidence distribution. Finally, the above features were fused for relationship classification, and during inference, the obtained evidence was composed into a pseudo-document and input into the classifier together with the original document for relationship classification. Experimental results on DocRED (Document-level Relation Extraction Dataset) show that when BERT_base is used as the PLM encoder, compared with the state-of-the-art model EIDER (EvIDence-Enhanced DocRE), the EMF model improves Ign F1 and F1 by 0.42 and 0.41 percentage points respectively, with F1 reaching 62.89%. It can be seen that the EMF model pays more attention to the parts related to entities and relationships, improves the extraction accuracy, and has good interpretability.

Optimization of edge connection rules for supply chain network based on improved expectation maximization algorithm

Zhongyu WANG, Xiaodong QIAN

2024, 44(11): 3386-3395. DOI: 10.11772/j.issn.1001-9081.2023111596

Aiming at the problem that the random connection of enterprises may reduce network stability and operational efficiency in the evolution stage of a supply chain network, an improved connection algorithm for supply chain networks based on the Expectation Maximization (EM) algorithm was proposed. Firstly, the number of edges of network nodes was added to the algorithm as a new parameter to determine more accurately the number of edges possessed by new nodes in the supply chain network. Secondly, with the number of edges determined, residual edge connection rules were introduced to enhance the selectivity and differentiation of nodes. Finally, on the premise of ensuring the smooth operation of the enterprise nodes newly connected to the network, the influence of different initial edge numbers on the network evolution was studied. Simulation results show that compared with the EM algorithm, the proposed improved algorithm needs only 80 iterations to obtain stable results, and the number of connected edges is stable at around 4 within a network scale of 1 000 nodes, which matches the evolution process of actual supply chain networks. It can be seen that the proposed algorithm fits actual supply chain networks clearly better than the original EM algorithm.

Non-overlapping community detection with imbalanced community sizes

Shiliang LIU, Yi WANG, Yinglong MA

2024, 44(11): 3396-3402. DOI: 10.11772/j.issn.1001-9081.2023101536

Community detection helps to comprehend the complex structure of social networks, but most existing community detection methods do not consider the imbalanced sizes of the communities to be detected, so the discovered community structures are relatively homogeneous and of low accuracy. Therefore, a non-overlapping community detection method based on Local Expansion of Initial Community Structure (LEICS) was proposed. LEICS was divided into three stages: in the first stage, the initial community structures with different scales were detected by utilizing the hierarchical structure information and local structure information of the network; in the second stage, the initial community was expanded by calculating the connection intensity between a node and the nodes in the community as well as the modularity contribution of the node, and the Label Propagation Algorithm (LPA) was then used to deal with the remaining nodes; in the third stage, for unstable communities with size smaller than the average community size, the nodes were redistributed to further optimize the results of community detection. Experimental results on twelve datasets of real-world networks and Lancichinetti-Fortunato-Radicchi (LFR) simulated networks show that compared with the suboptimal Local Balanced Label Diffusion (LBLD) algorithm, LEICS improves the Normalized Mutual Information (NMI) by at least 5 percentage points on the Polbooks and YouTube networks, and the accuracy and robustness of LEICS on both small-size and large-size networks are validated, showing that LEICS can adapt to the imbalance of community sizes.

Distributed k-nearest neighbor query algorithm for moving objects in dynamic road network

Guoxiang CHEN, Ziqiang YU, Haoyu ZHAO

2024, 44(11): 3403-3410. DOI: 10.11772/j.issn.1001-9081.2023101503

The k-Nearest Neighbor (kNN) query in dynamic road networks is an important problem in many Location-Based Services (LBS). To address this problem, a distributed moving object kNN query algorithm named DkNN (Distributed k-Nearest Neighbor) was proposed for dynamic road networks. Firstly, the entire road network was divided into multiple subgraphs deployed on different nodes in the cluster. Then, the accurate kNN results were obtained by searching the subgraphs involved in the query range in parallel. Finally, the searching process of the query was optimized, and a query range pruning strategy as well as a query termination strategy was introduced. A full comparison and validation against three baseline algorithms was performed on four road network datasets. Experimental results show that the DkNN algorithm reduces the query time by 56.8% and the road network update time by 3 orders of magnitude compared with the TEN*-Index (Tree dEcomposition based kNN* Index) algorithm. The DkNN algorithm can quickly respond to kNN query requests in dynamic road networks and has a low update cost when dealing with road network updates.
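
For readers unfamiliar with kNN queries over road networks, the following single-machine sketch shows the underlying search: expand outward from the query vertex in order of network distance and stop as soon as k objects have been found. It does not reproduce DkNN's graph partitioning, parallel subgraph search, or pruning strategies; the graph layout, the assumption that objects sit exactly on vertices, and all names are hypothetical.

```python
import heapq


def knn_on_road_network(graph, objects_at, source, k):
    """Return up to k (object, distance) pairs nearest to source.

    graph: {vertex: [(neighbor, edge_weight), ...]} with non-negative weights.
    objects_at: {vertex: [object ids located at that vertex]}.
    """
    best = {source: 0.0}
    heap = [(0.0, source)]
    settled = set()
    result = []
    # Dijkstra settles vertices in non-decreasing distance order, so the
    # first k objects encountered are the k nearest; the search stops early.
    while heap and len(result) < k:
        dist, vertex = heapq.heappop(heap)
        if vertex in settled:
            continue
        settled.add(vertex)
        for obj in objects_at.get(vertex, []):
            result.append((obj, dist))
            if len(result) == k:
                break
        for neighbor, weight in graph.get(vertex, []):
            candidate = dist + weight
            if candidate < best.get(neighbor, float("inf")):
                best[neighbor] = candidate
                heapq.heappush(heap, (candidate, neighbor))
    return result


# Toy undirected road network with made-up weights and object positions.
roads = {
    "A": [("B", 2), ("C", 5)],
    "B": [("A", 2), ("C", 1)],
    "C": [("A", 5), ("B", 1), ("D", 2)],
    "D": [("C", 2)],
}
taxis = {"B": ["taxi1"], "C": ["taxi3"], "D": ["taxi2"]}
nearest = knn_on_road_network(roads, taxis, "A", k=2)
```

Here vertex D is never settled because two objects are already found at distance 3, which is a simple form of the early termination that the abstract refers to.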

Personalized multi-layer interest extraction click-through rate prediction model

Liqing QIU, Xiaopan SU

2024, 44(11): 3411-3418. DOI: 10.11772/j.issn.1001-9081.2023111681

Currently, the most common way to predict Click-Through Rate (CTR) is to extract interest features through feature interaction techniques, but most of these methods ignore the inherent relationships between users and items, and fail to fully explore the potential interests of users implied among items. To address this problem, a Personalized Multi-layer Interest extraction Click-through rate prediction model (PMIC) was proposed, aiming to mine, from different perspectives, the multi-layer interests that users show at the same time. Firstly, the recall matching method was employed to learn and model the relationships between users and items in both the item learning module and the user learning module, thereby capturing the diverse interests of users. Secondly, the multi-head self-attention mechanism was utilized within the item learning module to concurrently extract multiple potential interests. Finally, a corresponding inner product method was adopted to further refine and enhance the feature representation between users and items. Experimental results on multiple public datasets demonstrate that PMIC improves the Area Under the receiver operating characteristic Curve (AUC) by at least 2.3%.

Multivariate time series anomaly detection based on multi-domain feature extraction

Pei ZHAO, Yan QIAO, Rongyao HU, Xinyu YUAN, Minyue LI, Benchu ZHANG

2024, 44(11): 3419-3426. DOI: 10.11772/j.issn.1001-9081.2023111636

Due to the high dimensionality and the complex variable distribution of Multivariate Time Series (MTS) data, the existing anomaly detection models generally suffer from high error rates and training difficulties when dealing with MTS datasets. Moreover, most models only consider the spatial-temporal features of time series samples, which is not sufficient to learn the features of time series. To solve the above problems, a multivariate Time Series anomaly detection model based on Multi-domain Feature Extraction (MFE-TS) was proposed. Firstly, starting from the original data domain, the Long Short-Term Memory (LSTM) network and the Convolutional Neural Network (CNN) were used to extract the temporal correlation and spatial correlation features of the MTS respectively. Secondly, Fourier transform was used to convert the original time series into the frequency domain, and Transformer was used to learn the amplitude and phase features of the data in the frequency domain. Multi-domain feature learning was able to model time series features more comprehensively, thereby improving the anomaly detection performance of the model on MTS. In addition, a masking strategy was introduced to further enhance the feature learning ability of the model and give the model a certain degree of noise resistance. Experimental results show that MFE-TS has superior performance on multiple real MTS datasets, while still maintaining good detection accuracy on datasets with noise.
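
The amplitude and phase features mentioned in this abstract are the magnitude and angle of the complex Fourier coefficients of a series. The sketch below computes them with a plain discrete Fourier transform from the standard library; it illustrates only the feature definition, not the MFE-TS pipeline, and the function name and the toy cosine signal are assumptions. A real implementation would normally use an FFT routine rather than this quadratic-time loop.

```python
import cmath
import math


def dft_amplitude_phase(series):
    """Return (amplitudes, phases) of the discrete Fourier transform of series."""
    n = len(series)
    amplitudes, phases = [], []
    for k in range(n):
        # k-th Fourier coefficient: sum over t of x[t] * exp(-2*pi*i*k*t/n).
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(series)
        )
        amplitudes.append(abs(coeff))
        phases.append(cmath.phase(coeff))
    return amplitudes, phases


# One full cosine period sampled at 8 points: the energy sits in bins 1 and 7.
signal = [math.cos(2 * math.pi * t / 8) for t in range(8)]
amplitudes, phases = dft_amplitude_phase(signal)
```

For this toy signal the amplitude at bins 1 and 7 is about 4 (half the series length) and close to zero elsewhere, which is the kind of compact periodic signature a frequency-domain model can exploit.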

Time series prediction algorithm based on multi-scale gated dilated convolutional network

Yu ZENG, Yang ZHANG, Shang ZENG, Maoli FU, Qixue HE, Linlong ZENG

2024, 44(11): 3427-3434. DOI: 10.11772/j.issn.1001-9081.2023111583

To address the challenges in time series prediction tasks, such as high-dimensional features, large-scale data, and the demand for high prediction accuracy, a multi-scale trend-period decomposition model based on a multi-head gated dilated convolutional network was proposed. A multi-scale decomposition approach was employed to decompose the original covariate sequence and the prediction variable sequence into their respective periodic terms and trend terms, thereby enabling independent prediction. For the periodic terms, the multi-head gated dilated convolutional network encoder was introduced to extract the respective periodic information; in the decoder stage, channel information interaction and fusion were performed through a cross-attention mechanism, and after the periodic information of the prediction variables was sampled and aligned, the periodic prediction was performed through time attention and channel fusion information. The trend terms were predicted by using an autoregressive approach. Finally, the prediction sequence was obtained by combining the trend prediction results with the periodic prediction results. Compared with multiple mainstream benchmark models such as Long Short-Term Memory (LSTM) and Informer on five datasets including ETTm1 and ETTh1, the proposed model reduces the Mean Squared Error (MSE) by 19.2% to 52.8% on average and the Mean Absolute Error (MAE) by 12.1% to 33.8% on average. Ablation experiments confirm that the proposed multi-scale decomposition module, multi-head gated dilated convolution, and time attention module can enhance the accuracy of time series prediction.

Long-term prediction model of time series based on multi-scale feature fusion

Wenbo LIU, Lianfei YU, Dongmei XIE, Chuang CAI, Zhijian QU, Chongguang REN

2024, 44(11): 3435-3441. DOI: 10.11772/j.issn.1001-9081.2023111705

Long-term time series prediction has a wide range of application requirements in many fields. However, the non-stationarity shown in the long-term prediction process of time series is a key factor affecting the prediction accuracy. To improve the long-term prediction accuracy of time series and the universality of the prediction model, a Multi-Scale Decomposition Fusion Attention Network (MSDFAN) was constructed. The model uses time series decomposition to extract the seasonal components and trend components of the input data, builds different prediction models for different data components, and is able to model and predict non-stationary time components with multi-scale stability characteristics. Experimental results show that compared with FEDformer, the Mean Squared Error (MSE) and Mean Absolute Error (MAE) of MSDFAN on five benchmark datasets are reduced by 12.95% and 8.49% on average, respectively. MSDFAN achieves better prediction accuracy on multivariate time series.
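
Splitting a series into a trend component and a seasonal component is often done with a moving average: the smoothed series is taken as the trend and the residual as the seasonal part. The sketch below shows that common scheme only as an illustration; the abstract does not state which decomposition MSDFAN uses, so the moving-average choice, the edge padding, the odd `window` requirement, and the function name are all assumptions.

```python
def decompose(series, window=3):
    """Split series into (trend, seasonal) with a centered moving average.

    window: hypothetical smoothing width; assumed to be odd so that the
    average is centered on each point.
    """
    half = window // 2
    # Repeat the edge values so that the trend has the same length as the input.
    padded = [series[0]] * half + list(series) + [series[-1]] * half
    trend = [sum(padded[i:i + window]) / window for i in range(len(series))]
    seasonal = [x - t for x, t in zip(series, trend)]
    return trend, seasonal


values = [1, 2, 3, 4, 5]
trend, seasonal = decompose(values, window=3)
```

By construction `trend[i] + seasonal[i]` recovers the original value, so separate predictors can be applied to each component and their outputs summed.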

Multivariate long-term series forecasting model based on decomposition and frequency domain feature extraction

Yiyang FAN, Yang ZHANG, Shang ZENG, Yu ZENG, Maoli FU

2024, 44(11): 3442-3448. DOI: 10.11772/j.issn.1001-9081.2023111684

In response to the problems that the existing Transformer-based Multivariate Long-Term Series Forecasting (MLTSF) models mainly extract features from the time domain, and that it is difficult to find reliable dependencies directly from the dispersed time points of long-term series, a new decomposition and frequency domain feature extraction model was proposed. Firstly, a periodic term-trend term decomposition method based on the frequency domain was proposed, which reduced the time complexity of the decomposition process. Then, on the basis of extracting trend features with the periodic term-trend term decomposition, a Transformer network performing frequency domain feature extraction based on Gabor transform was utilized to capture periodic dependencies, which enhanced the stability and robustness of forecasting. Experimental results on five benchmark datasets show that compared with the current state-of-the-art methods, the proposed model reduces the Mean Squared Error (MSE) in MLTSF by an average of 7.6% with a maximum reduction of 18.9%, which demonstrates that the proposed model improves forecasting accuracy effectively.

Disease sample classification algorithm by Bayesian network with gene association analysis

Zhijie LI, Xuhong LIAO, Yuanxiang LI, Qinglan LI

2024, 44(11): 3449-3458. DOI: 10.11772/j.issn.1001-9081.2024030398

As a specific type of biological big data, gene expression data consist of ordinary real values, but their similarity is based not on Euclidean distance but on whether the gene expression values rise and fall together. Current gene Bayesian networks use gene expression level values as node random variables and do not reflect this kind of subspace pattern similarity. Therefore, a Bayesian network disease Classification algorithm based on Gene Association analysis (BCGA) was proposed to learn Bayesian networks from labeled disease sample-gene expression data and predict the classification of new disease samples. Firstly, disease samples were discretized and filtered to select genes, and the dimensionally reduced gene expression values were sorted and replaced with gene column subscripts. Secondly, the subscript sequence of gene columns was decomposed into a set of atomic sequences of length 2, in which each frequent atomic sequence corresponded to the association of a pair of genes. Finally, causal relationships were measured through gene association entropy for Bayesian network structure learning. In addition, the parameter learning of BCGA was simplified, as the conditional probability distribution of a gene node was able to be obtained by counting the occurrence frequency of the atomic sequences of the gene and its parent node genes. Experimental results on multiple tumor and non-tumor gene expression datasets show that BCGA significantly improves disease classification accuracy and effectively reduces analysis time compared with the existing similar algorithms. In addition, BCGA uses gene association entropy instead of conditional independence, and gene atomic sequences instead of gene expression values, which can fit gene expression data better.
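
The first two steps of this abstract (replace sorted expression values with gene column subscripts, then split the subscript sequence into length-2 atomic sequences and count the frequent ones) can be sketched as below. The abstract does not say whether atomic sequences are adjacent pairs or all order-preserving pairs; this sketch assumes the latter. The function names, the `min_support` threshold, and the toy matrix are hypothetical, and the discretization, gene filtering, and entropy steps are omitted.

```python
from collections import Counter
from itertools import combinations


def subscript_sequence(expression_row):
    """Gene column indices ordered by ascending expression value."""
    return sorted(range(len(expression_row)), key=lambda g: expression_row[g])


def atomic_sequences(sequence):
    """All order-preserving pairs (a, b) with a placed before b (assumed definition)."""
    return list(combinations(sequence, 2))


def frequent_atomic_sequences(samples, min_support):
    """Count atomic sequences over all samples and keep the frequent ones."""
    counts = Counter()
    for row in samples:
        counts.update(atomic_sequences(subscript_sequence(row)))
    return {pair: count for pair, count in counts.items() if count >= min_support}


# Toy expression matrix: 3 samples (rows) by 3 genes (columns), made-up values.
samples = [
    [0.2, 1.5, 0.9],
    [0.1, 2.0, 1.1],
    [1.0, 0.3, 0.7],
]
frequent = frequent_atomic_sequences(samples, min_support=2)
```

A frequent pair such as `(0, 2)` says that gene 0 is expressed below gene 2 in most samples, an ordering relation that does not depend on the raw expression magnitudes.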

Overview of backdoor attacks and defense in federated learning

Xuebin CHEN, Changsheng QU

2024, 44(11): 3459-3469. DOI: 10.11772/j.issn.1001-9081.2023111653

Federated Learning (FL) is a distributed machine learning approach that allows different participants to train a machine learning model collaboratively using their respective local datasets, addressing issues such as data islands and user privacy protection. However, due to its inherently distributed nature, FL is more susceptible to backdoor attacks, which poses greater challenges to its practical applications. Therefore, a deep understanding of backdoor attacks and defense methods in the FL environment is crucial for the advancement of this field. Firstly, the definition, process, and classification of federated learning, as well as the definition of backdoor attacks, were introduced. Then, both backdoor attacks and defense schemes in the FL environment were described and analyzed in detail. Moreover, comparisons of backdoor attacks and defense methods were conducted. Finally, the future development of backdoor attacks and defense methods in the FL environment was discussed.

Survey of software security testing techniques in DevSecOps

Yixi LIU, Jun HE, Bo WU, Bingtong LIU, Ziyu LI

2024, 44(11): 3470-3478. DOI: 10.11772/j.issn.1001-9081.2023101531

Software security testing technology has become an essential method for software developers to improve software performance and resist network attacks in the Internet age. DevSecOps （Development， Security and Operations）， as a new generation software development pattern which integrates Security and Operations into Development and maintenance， can identify the possible threats to the software， effectively evaluate the security of software， and keep software security risks under control. Therefore， starting from the process of DevOps （Development and Operations）， the software security testing techniques involved in the various stages of DevOps were sorted out， including source code audit， fuzzing， vulnerability scanning， penetration testing， and security crowdsourced testing techniques. By collecting and analyzing the relevant technical literature of the last three years in well-known index databases， such as SCI， EI， SCOPUS， CNKI， CSCD and WanFang， the research status of the above techniques was summarized and recommendations for the use of relevant testing tools were given. At the same time， in view of the advantages and disadvantages of each supporting technique， the future development directions of the software development mode DevSecOps were prospected.

Model integrity verification framework of deep neural network based on fragile fingerprint

Xiang LIN, Biao JIN, Weijing YOU, Zhiqiang YAO, Jinbo XIONG

2024, 44(11): 3479-3486. DOI: 10.11772/j.issn.1001-9081.2023101518

Pre-trained models are susceptible to attacks by external adversaries， such as model fine-tuning and pruning， which destroy their integrity. To address this issue， a fragile fingerprint framework FFWAS （Fragile Fingerprint With Adversarial Samples） for black-box models was proposed. Firstly， a model replication framework without prior knowledge was introduced， and an independent model copy for each user was generated by FFWAS. Then， a black-box approach was employed to place a fragile fingerprint trigger set at the model's boundary. If the model was modified and the boundaries were changed， the trigger set would be misclassified. Finally， the integrity of the model was verified by users with the help of the fragile fingerprint trigger set on the model replicas， and if the recognition rate of the trigger set fell below the predefined threshold， it indicated that the model integrity had been compromised. The effectiveness and fragility of FFWAS were analyzed through experiments on two public datasets， MNIST and CIFAR-10. Experimental results demonstrate that under both model fine-tuning and pruning attacks， the fingerprint recognition rates of FFWAS decrease significantly compared to those on the complete model and fall below the predefined thresholds. Compared to Deep Neural Network Authentication framework （DeepAuth） based on model uniqueness and fragile signatures， FFWAS exhibits approximately 22% and 16% improvements in the similarity between the trigger set and the original samples on the two datasets， indicating better stealthiness of FFWAS.
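
The verification step described above (query the model on the trigger set and compare the recognition rate with a threshold) can be sketched as follows. This is an illustrative toy, not the FFWAS implementation: the one-dimensional threshold "model", the trigger samples, and the names `recognition_rate` and `is_intact` are all assumptions made for the example.

```python
def recognition_rate(predict, trigger_set):
    """Fraction of trigger samples the model still labels as recorded."""
    hits = sum(1 for sample, label in trigger_set if predict(sample) == label)
    return hits / len(trigger_set)


def is_intact(predict, trigger_set, threshold):
    """Black-box check: only predictions are queried, never the weights."""
    return recognition_rate(predict, trigger_set) >= threshold


# Toy classifier with a decision boundary at 0.5 (hypothetical).
def original(x):
    return int(x > 0.5)


# The same classifier after a small modification shifts the boundary.
def modified(x):
    return int(x > 0.515)


# Trigger samples placed close to the original boundary.
trigger_set = [(0.51, 1), (0.49, 0), (0.52, 1), (0.48, 0)]

print(recognition_rate(original, trigger_set))          # 1.0
print(recognition_rate(modified, trigger_set))          # 0.75
print(is_intact(modified, trigger_set, threshold=0.9))  # False
```

Because the trigger samples sit next to the boundary, even a small shift flips some of their labels, which is what makes the fingerprint fragile.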

Malicious traffic detection model based on semi-supervised federated learning

Shuaihua ZHANG, Shufen ZHANG, Mingchuan ZHOU, Chao XU, Xuebin CHEN

2024, 44(11): 3487-3494. DOI: 10.11772/j.issn.1001-9081.2023101500

Malicious traffic detection is one of the key technologies to deal with network security challenges. Aiming at the problems of insufficient local labeled data and degradation of co-trained model performance due to non-Independent and Identical Distribution （non-IID） when using federated learning for malicious traffic detection， a semi-supervised federated learning-based malicious traffic detection model was constructed. The proposed model was trained effectively by information extracted from unlabeled data with the help of semi-supervised learning techniques of pseudo-labeling and consistency regularization terms. At the same time， a nonlinear function was designed to dynamically adjust the weights of the client's local supervised and unsupervised losses during aggregation to make full use of unlabeled data and improve accuracy of the model. To reduce the impact of non-IID problems on performance of the global model， a federated aggregation algorithm FedLD （Federated-Loss-Data） was proposed， which adaptively adjusted the weights of different client models in the global model aggregation process through a weight calculation method that combined training loss and data volume. Experimental results show that on NSL-KDD dataset， the proposed model can achieve higher detection accuracy when labeled data is limited. Compared with the baseline model FedSem （Federated Semi-supervised）， the proposed model has the detection accuracy increased by 4.11 percentage points， and the recall in Normal， Denial-of-Service （DoS）， Probe and other categories also increased by 1.65 to 7.66 percentage points， verifying that the proposed model is more suitable for applications in the field of malicious traffic detection.

SM9-based attribute-based searchable encryption scheme with cryptographic reverse firewall

Gaimei GAO, Mingbo DUAN, Yaling XUN, Chunxia LIU, Weichao DANG

2024, 44(11): 3495-3502. DOI: 10.11772/j.issn.1001-9081.2023111678

In response to the facts that most Attribute-Based Searchable Encryption （ABSE） schemes are not designed on the basis of national cryptographic algorithms and are unable to resist internal Algorithm Substitution Attack （ASA）， an SM9-based Attribute-Based Searchable Encryption scheme with Cryptographic Reverse Firewall （SM9ABSE-CRF） was proposed. This scheme extends the SM9 algorithm to the ABSE field， realizes fine-grained data access control， and introduces Cryptographic Reverse Firewall （CRF） technology to effectively resist ASA. SM9ABSE-CRF was analyzed under the Decisional Bilinear Diffie-Hellman （DBDH） assumption， and the deployment of CRF was formally proved to maintain functionality， preserve security， and resist exfiltration. Theoretical analysis and simulation results show that compared to cABKS-CRF （consistent Attribute-Based Keyword Search system with CRF）， an ABSE scheme providing CRF， SM9ABSE-CRF has higher security and demonstrates notable performance advantages during index and trapdoor generation phases.

Review of phase transition in satisfiability problems

Qingyuan PENG, Xiaofeng WANG, Junxia WANG, Yingying HUA, Ao TANG, Fei HE

2024, 44(11): 3503-3512. DOI: 10.11772/j.issn.1001-9081.2023111559

Constraint Satisfaction Problem （CSP） is a combinatorial optimization problem in the field of theoretical computer science. As a special case of CSP， the SATisfiability （SAT） problem is a hot issue in fields such as theoretical computer science， mathematical logic and artificial intelligence. Phase transition is a phenomenon in satisfiability problems. The study of the phase transition phenomenon and phase transition mechanism of satisfiability problems has important guiding significance for deeply understanding why satisfiability problems are hard to solve and their general mathematical phenomena， as well as for designing more efficient algorithms to solve them. Therefore， according to some important research results of scholars at home and abroad in recent years on the phase transition phenomenon in satisfiability problems， firstly， the related knowledge of phase transition in satisfiability problems as well as the probability analysis methods and instance generation models of satisfiability problems were introduced. Then， the phase transition point solution methods and phase transition thresholds of unsatisfiable phase transition and satisfiable phase transition in satisfiability problems were summarized and analyzed. Finally， the research trends of phase transition in satisfiability problems were prospected.
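
As a concrete picture of an instance generation model, the sketch below builds uniform random k-SAT formulas, the model in which the satisfiability threshold for k = 3 is widely reported at a clause-to-variable ratio of roughly 4.27. It is a minimal illustration with hypothetical function names, and the brute-force check is only practical for a handful of variables.

```python
import random
from itertools import product


def random_ksat(n_vars, n_clauses, k=3, seed=0):
    """Uniform random k-SAT: each clause picks k distinct variables and
    negates each one with probability 1/2. Literals are signed integers."""
    rng = random.Random(seed)
    formula = []
    for _ in range(n_clauses):
        variables = rng.sample(range(1, n_vars + 1), k)
        formula.append(tuple(v if rng.random() < 0.5 else -v for v in variables))
    return formula


def is_satisfiable(formula, n_vars):
    """Exhaustive search over all 2**n_vars assignments (small n only)."""
    for bits in product([False, True], repeat=n_vars):
        if all(any(bits[abs(lit) - 1] == (lit > 0) for lit in clause)
               for clause in formula):
            return True
    return False


def satisfiable_fraction(n_vars, ratio, trials=20):
    """Share of random 3-SAT instances at a given clause/variable ratio
    that are satisfiable; sweeping the ratio exposes the transition."""
    n_clauses = round(ratio * n_vars)
    hits = sum(is_satisfiable(random_ksat(n_vars, n_clauses, seed=s), n_vars)
               for s in range(trials))
    return hits / trials
```

Calling `satisfiable_fraction(10, ratio)` for ratios from about 2 to 7 should show the fraction falling from near 1 toward 0, with the drop becoming sharper as the number of variables grows.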

Two echelon location-routing optimization considering facility sizing decision

Qin LENG, Zhengyuan MAO

2024, 44(11): 3513-3520. DOI: 10.11772/j.issn.1001-9081.2023101515

A two Echelon Location-Routing Problem （2E-LRP） solving model considering facility sizing decision was proposed to address the issues of unreasonable infrastructure layout and space utilization in the existing e-commerce industry. Firstly， differential facility sizing constraints were introduced into the traditional 2E-LRP， different combinations of facility sizes were designed by identifying customer base， the total cost composition was adjusted by using changes in size， and a 2E-LRP model considering facility size change with the minimum operating cost as goal was established. Secondly， a two-stage hybrid iterated local search heuristic algorithm was proposed for solving the model. Finally， the performance of the proposed model and optimization algorithm were analyzed and verified with examples in different datasets such as Prodhon. Experimental results show that the proposed model is universal for regional differences and different data sizes， and there is an inverse relationship between the total cost and the change range of facility size. Compared with the optimal costs of algorithms such as Lagrangean Relaxation Granular Tabu Search （LRGTS）， the average value of the optimal cost of the proposed algorithm on all instances is reduced by 6.67%， which can effectively save the operating cost.

Dynamic partition algorithm for diagonal sparse matrix vector multiplication based on GPU

Jinxing TU, Zhixiong LI, Jianqiang HUANG

2024, 44(11): 3521-3529. DOI: 10.11772/j.issn.1001-9081.2023101524

Implementing diagonal Sparse Matrix Vector multiplication （SpMV） on Graphics Processing Unit （GPU） can make full use of the parallel computing capabilities of GPU and accelerate matrix vector multiplication. However， related mainstream algorithms have problems such as a large amount of zero-element filling data and low computational efficiency. In response to the above problems， a diagonal SpMV algorithm DIA-Dynamic （Diagonal-Dynamic） was proposed. Firstly， a new dynamic partition strategy was designed to divide the matrix into blocks according to different characteristics， which greatly reduced the zero-element filling while ensuring high computational efficiency of GPU， thereby removing redundant calculations. Then， a diagonal sparse matrix storage format BDIA （Block DIAgonal） was proposed to store block data， and the data layout was adjusted to improve memory access performance on GPU. Finally， conditional branch optimization was performed at the low level of GPU to reduce branch judgments， and dynamic shared memory was used to solve the problem of irregular access to vectors. Compared with the state-of-the-art Tile SpMV algorithm， DIA-Dynamic has an average acceleration ratio of 1.88； compared with the cutting-edge BRCSD （Diagonal Compressed Storage based on Row-Blocks）-Ⅱ algorithm， DIA-Dynamic has the average zero-element filling reduced by 43%， and the average acceleration ratio reaches 1.70. Experimental results show that DIA-Dynamic can effectively improve the computational efficiency of diagonal SpMV on GPU， shorten the computing time， and improve the program performance.
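
To make the zero-element filling problem concrete, the sketch below implements SpMV over the classic DIA storage format in plain Python. It is a CPU illustration of the baseline format only, not the BDIA format or the DIA-Dynamic partition strategy, whose details the abstract does not give; the function names are hypothetical.

```python
def to_dia(matrix):
    """Convert a dense square matrix to DIA form (offsets, data), where
    data[d][i] stores matrix[i][i + offsets[d]] and out-of-range slots
    are padded with 0.0."""
    n = len(matrix)
    offsets = sorted({j - i for i in range(n) for j in range(n)
                      if matrix[i][j] != 0})
    data = [[matrix[i][i + k] if 0 <= i + k < n else 0.0 for i in range(n)]
            for k in offsets]
    return offsets, data


def dia_spmv(offsets, data, x):
    """Compute y = A @ x by walking each stored diagonal."""
    n = len(x)
    y = [0.0] * n
    for k, diag in zip(offsets, data):
        # Only rows whose column index i + k lies inside the matrix.
        for i in range(max(0, -k), min(n, n - k)):
            y[i] += diag[i] * x[i + k]
    return y


A = [
    [1, 0, 0, 0],
    [0, 2, 0, 5],
    [0, 0, 3, 0],
    [7, 0, 0, 4],
]
offsets, data = to_dia(A)
stored = len(offsets) * len(A)
nonzeros = sum(1 for row in A for v in row if v != 0)

print(offsets)                               # [-3, 0, 2]
print(dia_spmv(offsets, data, [1, 1, 1, 1]))  # [1.0, 7.0, 3.0, 11.0]
print(stored - nonzeros)                     # 6 filled zeros for 6 nonzeros
```

Two of the three stored diagonals carry a single nonzero each, so half of the stored entries are filling; partitioning the matrix into blocks so that sparse diagonals are stored only where they are populated is the kind of saving the abstract attributes to dynamic partitioning.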

Efficient adaptive robustness optimization algorithm for complex networks

Jie HUANG, Ruizi WU, Junli LI

2024, 44(11): 3530-3539. DOI: 10.11772/j.issn.1001-9081.2023111659

Enhancing the robustness of complex networks is crucial for the networks to resist external attacks and cascading failures. Existing evolutionary algorithms have limitations in solving network structure optimization problems， especially in terms of convergence and optimization speed. In response to this challenge， a new adaptive complex network robustness optimization algorithm named SU-ANet （SUrrogate-assisted and Adaptive Network optimization algorithm） was proposed. To reduce the huge time overhead of robustness computation， a robustness predictor based on attention mechanism was constructed in SU-ANet as an offline surrogate model to replace the frequent robustness computation in local search operator. In the evolutionary process， the global and local information was considered comprehensively to avoid falling into local optimum and broaden the search space simultaneously. By designing crossover operators， each individual exchanges edges with the global optimum candidate solution and a randomly selected individual to balance the convergence and diversity of the algorithm. Additionally， a parameter self-adaptive mechanism was applied to adjust the operator execution probabilities automatically， thereby alleviating the uncertainty of the algorithm brought by the parameter design. Experimental results on both synthetic networks and real-world networks demonstrate that SU-ANet has better search capabilities and higher evolutionary efficiency.

Multi-parameter channel transmission performance evaluation method with improved TCP/IP frame structure

Fengtao HE, Binghui WANG, Bin ZHANG, Yi YANG, Yibo FENG

2024, 44(11): 3540-3547. DOI: 10.11772/j.issn.1001-9081.2023111638

At present， the phenomenon of network densification accelerates the degradation of channel transmission performance， and the widely used evaluation methods face significant challenges in evaluating channel transmission performance due to their limited consideration of parameters and constrained applicability. In response to the difficulties in evaluating channel transmission performance， a method for evaluating multi-parameter channel transmission performance through an improved Transmission Control Protocol/Internet Protocol （TCP/IP） frame structure was proposed. Firstly， the standardized test data was generated， including pseudo-random codes， basic curve data， and custom curve data， so as to ensure that the test data followed a uniform standard. Secondly， an improved TCP/IP frame structure was employed to package test data information， including total frame quantity and frame sequences， into the TCP/IP frames. In this way， the sending， receiving and parsing of test data were realized， and the statistics on basic channel transmission variables were completed， such as the number of frames by type， the number of frames by length， the total number of frames， and the volume of effective data. Finally， the received data were analyzed to obtain two types of high-level channel transmission information， namely frame error rate and bit error rate， completing the overall evaluation of the channel transmission performance. The designed method employed six parameters to evaluate channel quality， with the evaluation precision of the method reaching 0.01% and maintaining a minimum error margin of 0.01%. It is compatible with all communication channels using TCP/IP. Experimental results demonstrate that the proposed channel transmission performance evaluation method can perform the statistics and analysis of the six types of channel communication information and evaluate the channel transmission performance accurately.
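
The two high-level quantities named above, frame error rate and bit error rate, can be computed from paired sent and received payloads as in the sketch below. It is a simplified illustration that assumes no frames are lost or reordered and that each received frame has the same length as the one sent; the function names are hypothetical and the improved frame structure itself is not modelled.

```python
def count_bit_errors(sent: bytes, received: bytes) -> int:
    """Number of differing bits between two equal-length payloads."""
    return sum(bin(a ^ b).count("1") for a, b in zip(sent, received))


def error_rates(sent_frames, received_frames):
    """Return (frame_error_rate, bit_error_rate) over paired frames."""
    total_bits = 0
    wrong_bits = 0
    wrong_frames = 0
    for sent, received in zip(sent_frames, received_frames):
        errors = count_bit_errors(sent, received)
        total_bits += 8 * len(sent)
        wrong_bits += errors
        wrong_frames += 1 if errors else 0
    return wrong_frames / len(sent_frames), wrong_bits / total_bits


sent = [b"\x00\x00", b"\xff\xff", b"\x0f\x0f", b"\xaa\x55"]
received = [b"\x00\x00", b"\xff\xfe", b"\x0f\x0f", b"\xaa\x55"]

fer, ber = error_rates(sent, received)
print(fer)  # 0.25      (1 corrupted frame out of 4)
print(ber)  # 0.015625  (1 flipped bit out of 64)
```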

Physical system simulation based on deep representation learning for 3D geometric features

Fu LIN, Jiasheng SHI, Ze GAO, Zunkang CHU, Qiongmin MA, Haiyan YU, Weixiong RAO

2024, 44(11): 3548-3555. DOI: 10.11772/j.issn.1001-9081.2023101505

To address the limitations of the existing deep learning methods in handling scenarios where both geometric boundaries and initial conditions vary in physical simulation problems， a technical approach was proposed to decouple the representation of geometric boundary constraints from the physical system simulation， and a two-step technical route of geometric feature representation learning and physical system simulation was designed. After constructing an independent geometric feature extraction module which was unaffected by external physical conditions， the extracted geometric features were fused with physical features， and finally a neural network-based physical system simulation model was designed. In stress field prediction experiments， the proposed method achieves a prediction time of only 2.63 ms， which is much lower than 0.6 s of Finite Element Method （FEM）， and has a Mean Absolute Error （MAE） only 0.389 times of that of MeshNet. Experimental results demonstrate that the proposed method maintains high simulation accuracy while effectively adapting to different geometric boundaries and initial conditions.

Multi-view stereo method based on quadtree prior assistance

Lihua HU, Xiaoping LI, Jianhua HU, Sulan ZHANG

2024, 44(11): 3556-3564. DOI: 10.11772/j.issn.1001-9081.2023111661

PatchMatch-based Multi-View Stereo （MVS） method can estimate the depth of a scene based on multiple input images and is currently applied in large-scale 3D scene reconstruction. However， the existing methods have lower accuracy and completeness in depth estimation in low-texture regions due to unstable feature matching and unreliable reliance on photometric consistency alone. To address the above problems， an MVS method based on quadtree prior assistance was proposed. Firstly， the image pixel values were used to obtain local textures. Secondly， a coarse depth map was obtained by Adaptive Checkerboard sampling and Multi-Hypothesis joint view selection （ACMH）， which combined the structural information in the low-texture region to generate a priori plane hypothesis by using quadtree segmentation. Thirdly， by integrating the above information， a new multi-view matching cost function was designed to guide the low-texture regions for obtaining the best depth assumption， thereby improving the accuracy of stereo matching. Finally， comparison experiments were conducted with many existing traditional MVS methods on ETH3D， Tanks and Temples， and Chinese Academy of Sciences' ancient architecture datasets. The results demonstrate that the proposed method performs better， especially in ETH3D test dataset with error threshold of 2 cm， its F1 score and completeness are improved by 1.29 and 2.38 percentage points， respectively， compared with the current state-of-the-art multi-scale geometric consistency guided and planar prior assisted multi-view stereo method （ACMMP）.

Automatic thresholding method guided by maximizing four-directional weighted Shannon entropy

Yaobin ZOU, Bin ZHANG

2024, 44(11): 3565-3573. DOI: 10.11772/j.issn.1001-9081.2023111639

The grayscale histogram of a grayscale image may have non-modal， unimodal， bimodal， or multi-modal morphological characteristics. However， most traditional entropy thresholding methods are only suitable for processing the grayscale images with unimodal or bimodal morphological characteristics. To improve the segmentation accuracy and adaptability of entropy thresholding methods， an automatic thresholding method guided by maximizing four-directional weighted Shannon entropy was proposed， namely FWSE （Four-directional Weighted Shannon Entropy）. Firstly， a series of Multi-scale Product Transformation （MPT） images were obtained by performing MPTs with the directional Prewitt convolution kernels in four directions. Secondly， the optimal MPT image in each direction was computed automatically based on the cubic spline interpolation function and the curvature maximization criterion. Thirdly， the pixels on each optimal MPT image were resampled by using inner and outer contour images to reconstruct the grayscale histogram， and the corresponding Shannon entropy was calculated based on the above. Finally， the optimal segmentation threshold was selected based on the criterion of maximizing weighted Shannon entropy in four directions. FWSE method was compared with three recent thresholding methods and two recent non-thresholding methods on 4 synthetic images and 100 real-world images. Experimental results show that： on the synthetic images， the average Matthews Correlation Coefficient （MCC） of the FWSE method reaches 0.999； on the real-world images， the average MCCs of the FWSE method and the other five segmentation methods are 0.974， 0.927， 0.668， 0.595， 0.550， and 0.525 respectively. It can be seen that the FWSE method has higher segmentation accuracy and more flexible segmentation adaptability.
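
For readers unfamiliar with the evaluation metric, the Matthews Correlation Coefficient of a binary segmentation can be computed from the confusion counts as sketched below. The flattened toy masks and the convention of returning 0.0 when the denominator vanishes are assumptions of this example, not details taken from the paper.

```python
from math import sqrt


def matthews_corrcoef(predicted, truth):
    """MCC for two equal-length binary masks (1 = foreground)."""
    tp = sum(1 for p, t in zip(predicted, truth) if p == 1 and t == 1)
    tn = sum(1 for p, t in zip(predicted, truth) if p == 0 and t == 0)
    fp = sum(1 for p, t in zip(predicted, truth) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(predicted, truth) if p == 0 and t == 1)
    denominator = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0  # assumed convention for degenerate masks
    return (tp * tn - fp * fn) / denominator


predicted = [1, 1, 0, 0, 1, 0, 0, 0]
truth = [1, 1, 0, 0, 0, 0, 0, 1]

# TP = 2, TN = 4, FP = 1, FN = 1, so MCC = (8 - 1) / 15.
print(matthews_corrcoef(predicted, truth))
```

MCC ranges from -1 to 1 and stays informative when foreground and background pixel counts are very unbalanced, which is common in thresholding benchmarks.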

High-fidelity image editing based on fine-tuning of diffusion model

Yusheng LIU, Xuezhong XIAO

2024, 44(11): 3574-3580. DOI: 10.11772/j.issn.1001-9081.2023111570

To address issues such as single-task support， user-unfriendliness， and low fidelity in current mainstream image editing methods， a diffusion model-based method for high-fidelity image editing was proposed. In the method， with the mainstream stable diffusion model as the backbone network， initially， the model was fine-tuned using Low Rank Adaptation （LoRA） method， so that the model could better reconstruct the original images. Subsequently， the fine-tuned model was employed to infer images with simple prompts through a designed framework， ultimately generating edited images. Furthermore， a dual-layer U-Net structure was proposed as an extension of the aforementioned method for specific image editing tasks and video synthesis. Comparative experiments with leading methods Imagic， DiffEdit， and InstructPix2Pix on TEdBench dataset demonstrate that the proposed method can perform various editing tasks on images， including non-rigid editing， with strong editability， and it also has a 30.38% decrease in Learned Perceptual Image Patch Similarity （LPIPS） index compared to Imagic， indicating that the proposed method has a higher fidelity.
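
The LoRA fine-tuning step mentioned above rests on a simple identity: the frozen weight W is adapted as W + (alpha / r) * B * A, where B and A are the only trained matrices and r is their shared rank. The sketch below shows that update on plain nested lists; the tiny shapes and numeric values are illustrative assumptions, and nothing here reflects the paper's actual configuration.

```python
def matmul(a, b):
    """Plain nested-list matrix product."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def lora_weight(w, b, a, alpha):
    """Effective weight W + (alpha / r) * B @ A, with r the rank of the update."""
    r = len(a)
    delta = matmul(b, a)
    scale = alpha / r
    return [[w[i][j] + scale * delta[i][j] for j in range(len(w[0]))]
            for i in range(len(w))]


# Frozen 2x3 weight and a rank-1 update (hypothetical values).
W = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0]]
B = [[1.0], [2.0]]        # shape d_out x r
A = [[0.5, 0.0, -0.5]]    # shape r x d_in

print(lora_weight(W, B, A, alpha=2.0))
# [[2.0, 0.0, -1.0], [2.0, 1.0, -2.0]]

# With B initialised to zeros the adapted layer starts identical to W.
B_zero = [[0.0], [0.0]]
print(lora_weight(W, B_zero, A, alpha=2.0) == W)  # True
```

Only B and A are trained, r * (d_in + d_out) values instead of d_in * d_out, which is why the approach suits fine-tuning a large backbone to reconstruct a single image.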

Remote sensing image segmentation network based on fuzzy multiscale features

Ziyi LI, Tingting QU, Qianpeng CHONG, Jindong XU

2024, 44(11): 3581-3586. DOI: 10.11772/j.issn.1001-9081.2023101540

Affected by imaging distance， illumination， surface features， environment and other factors， objects of the same category in remote sensing images may differ considerably， while objects of different categories may show similar visual features， which leads to uncertainty in segmentation， namely intra-class heterogeneity and inter-class ambiguity. To solve these problems， a Fuzzy Multiscale Convolutional Neural Network （FMCNet） was proposed for remote sensing image segmentation. By extracting receptive fields of different scales， sizes and aspect ratios， the detailed information in remote sensing objects was fully represented， and fuzzy logic was used to effectively express the relationship between pixels and their adjacent pixels， thus overcoming the uncertainty problem in remote sensing image segmentation. Experimental results show that the Overall Accuracy （OA） of FMCNet on ISPRS Vaihingen and Potsdam datasets is 85.3% and 86.3% respectively， outperforming the existing state-of-the-art semantic segmentation methods.

Pixel-level unsupervised industrial anomaly detection based on multi-scale memory bank

Yongjiang LIU, Bin CHEN

2024, 44(11): 3587-3594. DOI: 10.11772/j.issn.1001-9081.2023111690

Unsupervised anomaly detection methods based on feature embedding often use patch-level features to localize anomalies. Patch-level features are competitive in image-level anomaly detection tasks， but suffer from insufficient accuracy in pixel-level localization. To address this issue， MemAD， a pixel-level anomaly detection method composed of a multi-scale memory bank and a segmentation network， was proposed. Firstly， a pre-trained feature extraction network was used to extract features from normal samples in the training set， thereby constructing a normal sample feature memory bank at three scales. Then， during the training of the segmentation network， difference features between simulated pseudo-anomaly sample features and the nearest normal sample features in the memory bank were calculated， thereby further guiding the segmentation network to learn how to locate anomalous pixels. Experimental results show that MemAD achieves image-level and pixel-level AUC （Area Under the Receiver Operating Characteristic curve） of 0.980 and 0.974 respectively on MVTec AD （MVTec Anomaly Detection） dataset， outperforming most existing methods and confirming the accuracy of the proposed method in pixel-level anomaly localization.

Small target detection method in UAV images based on fusion of dilated convolution and Transformer

Lin WANG, Jingliang LIU, Wuwei WANG

2024, 44(11): 3595-3602. DOI: 10.11772/j.issn.1001-9081.2023111575

A multi-scale dilated convolution based Unmanned Aerial Vehicle （UAV） image target detection algorithm Swin-Det was proposed to address the issues of complex target scenes， diverse scales of targets， dense small targets and severe occlusion of targets in UAV aerial images. Firstly， Swin Transformer was used as the backbone feature extraction network， and a Spatial Information Blending Module （SIBM） was introduced into the backbone network to solve the problem of fuzziness in target information due to occlusion between objects. Secondly， a Fusion of Dilation Feature Pyramid Network （FDFPN） was proposed to fuse feature information through multi-branch dilated convolution， thereby effectively improving the receptive field of the network and the reuse of feature information， so that the model was able to learn detailed features of different dimensions. Finally， the issues of mismatches in the prediction area and sample imbalance were addressed by using linear interpolation method and multi-task loss function， thereby improving the detection precision of the model. Experimental results on VisDrone dataset show that the Swin-Det algorithm reaches a mean Average Precision （mAP） of 27.2%， which is 4.1 percentage points higher than that of the original Swin Transformer， and converges faster under the same training batch. It can be seen that the Swin-Det algorithm can achieve high-precision detection of UAV image targets in complex scenes.
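
The reason multi-branch dilated convolution enlarges the receptive field can be seen in one dimension: a kernel of size k with dilation d spans (k - 1) * d + 1 input positions while keeping only k weights. The sketch below is a minimal 1D illustration with hypothetical function names; FDFPN itself operates on 2D feature maps and its branch settings are not given in the abstract.

```python
def effective_span(kernel_size, dilation):
    """Input positions covered by one dilated kernel application."""
    return (kernel_size - 1) * dilation + 1


def dilated_conv1d(signal, kernel, dilation):
    """Valid (unpadded) 1D dilated convolution in cross-correlation form."""
    span = effective_span(len(kernel), dilation)
    return [sum(kernel[t] * signal[i + t * dilation] for t in range(len(kernel)))
            for i in range(len(signal) - span + 1)]


signal = [1, 2, 3, 4, 5, 6, 7]
kernel = [1, 1, 1]

# Three branches sharing one kernel size but using different dilations.
for d in (1, 2, 3):
    print(d, effective_span(3, d), dilated_conv1d(signal, kernel, d))
# 1 3 [6, 9, 12, 15, 18]
# 2 5 [9, 12, 15]
# 3 7 [12]
```

Each branch has the same three weights, yet the branch with dilation 3 summarises all seven inputs at once, so fusing the branches mixes fine and coarse context at no extra parameter cost.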

Small object detection algorithm from drone perspective based on improved YOLOv8n

Tao LIU, Shihong JU, Yimeng GAO

2024, 44(11): 3603-3609. DOI: 10.11772/j.issn.1001-9081.2023111644

In view of the low accuracy of object detection algorithms in small object detection from drone perspective， a new small object detection algorithm named SFM-YOLOv8 was proposed by improving the backbone network and attention mechanism of YOLOv8. Firstly， the SPace-to-Depth Convolution （SPDConv） suitable for low-resolution images and small object detection was integrated into the backbone network to retain discriminative feature information and improve the perception ability to small objects. Secondly， a multi-branch attention named MCA （Multiple Coordinate Attention） was introduced to enhance the spatial and channel information on the feature layer. Then， a convolution FE-C2f fusing FasterNet and Efficient Multi-scale Attention （EMA） was constructed to reduce the computational cost and make the model lightweight. Besides， a Minimum Point Distance based Intersection over Union （MPDIoU） loss function was introduced to improve the accuracy of the algorithm. Finally， a small object detection layer was added to the network structure of YOLOv8n to retain more location information and detailed features of small objects. Experimental results show that compared with YOLOv8n， SFM-YOLOv8 achieves a 4.37 percentage point increase in mAP₅₀ （mean Average Precision） with a 5.98% reduction in parameters on VisDrone-DET2019 dataset. Compared to the related mainstream models， SFM-YOLOv8 achieves higher accuracy and meets real-time detection requirements.
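
The MPDIoU loss mentioned above is commonly stated as 1 - MPDIoU, where MPDIoU subtracts from the IoU the squared distances between the two boxes' top-left corners and bottom-right corners, each normalised by the squared image diagonal w² + h². The sketch below follows that commonly cited form and should be checked against the original MPDIoU paper before reuse; the function name and box layout `(x1, y1, x2, y2)` are assumptions of this example.

```python
def mpdiou_loss(pred, gt, img_w, img_h):
    """1 - MPDIoU for two axis-aligned boxes given as (x1, y1, x2, y2).
    Both boxes are assumed to have positive area."""
    # Intersection rectangle (zero area if the boxes do not overlap).
    inter_w = max(0.0, min(pred[2], gt[2]) - max(pred[0], gt[0]))
    inter_h = max(0.0, min(pred[3], gt[3]) - max(pred[1], gt[1]))
    inter = inter_w * inter_h
    area_pred = (pred[2] - pred[0]) * (pred[3] - pred[1])
    area_gt = (gt[2] - gt[0]) * (gt[3] - gt[1])
    iou = inter / (area_pred + area_gt - inter)

    # Squared distances between matching corners.
    d1 = (pred[0] - gt[0]) ** 2 + (pred[1] - gt[1]) ** 2  # top-left
    d2 = (pred[2] - gt[2]) ** 2 + (pred[3] - gt[3]) ** 2  # bottom-right
    norm = img_w ** 2 + img_h ** 2

    return 1.0 - (iou - d1 / norm - d2 / norm)


# IoU = 1/7 and each corner distance squared is 2, with w^2 + h^2 = 25.
loss = mpdiou_loss((0, 0, 2, 2), (1, 1, 3, 3), img_w=4, img_h=3)
print(loss)  # about 1.017 (= 178/175)
print(mpdiou_loss((1, 1, 3, 3), (1, 1, 3, 3), img_w=4, img_h=3))  # 0.0
```

Unlike plain IoU, the corner-distance terms still vary when the boxes do not overlap at all, which gives small, easily missed objects a usable regression signal.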

Underwater target detection algorithm based on improved YOLOv8

Dahai LI, Bingtao LI, Zhendong WANG

2024, 44(11): 3610-3616. DOI: 10.11772/j.issn.1001-9081.2023111550

Due to the unique characteristics of underwater creatures， underwater images usually contain many small targets that are hard to detect and often overlap with each other. In addition， light absorption and scattering in underwater environment can cause color offset and blur in underwater images. To overcome these challenges， an underwater target detection algorithm， namely WCA-YOLOv8， was proposed. Firstly， the Feature Fusion Module （FFM） was designed to improve the focus on spatial dimension in order to improve the recognition ability for targets with color offset and blur. Secondly， the FReLU Coordinate Attention （FCA） module was added to enhance the feature extraction ability for overlapped and occluded underwater targets. Thirdly， Complete Intersection over Union （CIoU） loss function was replaced by Wise-IoU version 3 （WIoU v3） loss function to strengthen the detection performance for small size targets. Finally， the Downsampling Enhancement Module （DEM） was designed to preserve context information during feature extraction more completely. Experimental results show that WCA-YOLOv8 achieves 75.8% and 88.6% mean Average Precision （mAP_0.5） and 60 frame/s and 57 frame/s detection speeds on RUOD and URPC datasets， respectively. Compared with other state-of-the-art underwater target detection algorithms， WCA-YOLOv8 can achieve higher detection accuracy with faster detection speed.

Polyp segmentation algorithm based on context-aware network

Cong GU, Qiqiang DUAN, Siyu REN

2024, 44(11): 3617-3622. DOI: 10.11772/j.issn.1001-9081.2023111650

Deep learning-based methods for polyp image segmentation face the following problems： images captured by different medical devices differ in feature distribution， resulting in domain bias between different polyp segmentation datasets； most existing models focus on processing features of a single scale， which limits their ability to capture polyps of different scales； the visual features and color differences between a polyp and the surrounding tissue are usually small， making it difficult for the model to accurately distinguish the polyp from the background. To solve these problems， a Context-Aware Network （CANet） with Pyramid Vision Transformer （PVT） as the main part was proposed， which mainly contains the following modules： 1） Domain Adaptive Denoising Module （DADM）， which applies channel attention and spatial attention to the low-level feature maps to solve the problem of domain bias and noise between images of different domains； 2） Scale Recalibration Module （SRM）， which processes multi-scale features extracted by the encoder to solve the problem of the obvious changes in the size and shape of polyps； 3） Iterative Semantic Embedding Module （ISEM）， which reduces background interference， improves perception of the target boundary， and enhances the accuracy of polyp segmentation. Experimental results on five publicly available colon polyp datasets show that CANet achieves better results than current widely used colon polyp segmentation methods， with mDice of 92.6% and 94.0% on Kvasir-SEG and CVC-ClinicDB datasets， respectively.
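
The mDice figure reported above is the Dice coefficient averaged over test images. A minimal sketch on flattened binary masks follows; treating two empty masks as a perfect match is an assumed convention, and the toy masks are hypothetical.

```python
def dice(predicted, truth):
    """Dice coefficient 2|P ∩ T| / (|P| + |T|) for binary masks."""
    overlap = sum(1 for p, t in zip(predicted, truth) if p == 1 and t == 1)
    total = sum(predicted) + sum(truth)
    if total == 0:
        return 1.0  # assumed convention: both masks empty counts as a match
    return 2 * overlap / total


def mean_dice(pairs):
    """Average Dice over (predicted, truth) mask pairs, one pair per image."""
    return sum(dice(p, t) for p, t in pairs) / len(pairs)


pairs = [
    ([1, 1, 0, 0], [1, 0, 0, 0]),  # Dice = 2 * 1 / (2 + 1) = 2/3
    ([0, 1, 1, 0], [0, 1, 1, 0]),  # Dice = 1
]
print(mean_dice(pairs))  # about 0.833 (= 5/6)
```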

Distributed UAV cluster pursuit decision-making based on trajectory prediction and MADDPG

Yu WANG, Zhihui GUAN, Yuanpeng LI

2024, 44(11): 3623-3628. DOI: 10.11772/j.issn.1001-9081.2023101538

A Trajectory Prediction based Distributed Multi-Agent Deep Deterministic Policy Gradient （TP-DMADDPG） algorithm was proposed to address the insufficient flexibility and poor generalization ability of Unmanned Aerial Vehicle （UAV） cluster pursuit decision-making algorithms in complex mission environments. Firstly， to enhance the realism of the pursuit mission， an intelligent escape strategy was designed for the target. Secondly， considering that target information may be missing because of communication interruption or other reasons， a Long Short-Term Memory （LSTM） network was used to predict the target position in real time， and the state space of the decision-making model was constructed on the basis of the predicted information. Finally， TP-DMADDPG was designed based on the distributed framework and the Multi-Agent Deep Deterministic Policy Gradient （MADDPG） algorithm， which enhanced the flexibility and generalization ability of pursuit decision-making in complex air combat. Simulation results show that compared with Deep Deterministic Policy Gradient （DDPG）， Twin Delayed Deep Deterministic policy gradient （TD3） and MADDPG algorithms， the TP-DMADDPG algorithm increases the success rate of collaborative decision-making by more than 15 percentage points， and can solve the problem of pursuing an intelligently escaping target with incomplete information.

Irregular object grasping by soft robotic arm based on clipped proximal policy optimization algorithm

Jiachen YU, Ye YANG

2024, 44(11): 3629-3638. DOI: 10.11772/j.issn.1001-9081.2023111712

To address the poor stability and low learning efficiency of traditional Deep Reinforcement Learning （DRL） algorithms in complex scenes， especially in irregular object grasping with soft robotic arms， a soft robotic arm control strategy based on the Clipped Proximal Policy Optimization （CPPO） algorithm was proposed. By introducing a clipping function， the performance of the Proximal Policy Optimization （PPO） algorithm was optimized， which improved the stability and learning efficiency in high-dimensional state space. Firstly， the state space and action space of the soft robotic arm were defined， and a soft robotic arm model imitating octopus tentacles was designed. Secondly， the MATLAB toolbox SoRoSim （Soft Robot Simulation） was used for modeling， and an environmental reward function combining continuous and sparse terms was defined. Finally， a simulation platform based on MATLAB was constructed， the irregular object images were preprocessed through Python scripts and filters， and a Redis cache was used to transmit the processed contour data to the simulation platform efficiently. Results of comparison experiments with Trust Region Policy Optimization （TRPO） and Soft Actor-Critic （SAC） algorithms show that the CPPO algorithm achieves a success rate of 86.3% in the task of grasping irregular objects with the soft robotic arm， which is 3.6 percentage points higher than that of the TRPO algorithm. This indicates that the CPPO algorithm can be applied to the control of soft robotic arms and can provide an important reference for their application to complex grasping tasks in unstructured environments.
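
As background on the clipping idea， the following is a minimal sketch of the standard clipped surrogate objective from the widely used PPO-Clip formulation. It is not the authors' CPPO implementation， whose exact clipping function is not given in the abstract； the function name `clipped_surrogate`， the log-probability inputs， and the default `clip_eps=0.2` are assumptions for illustration.

```python
import math


def clipped_surrogate(new_logp, old_logp, advantages, clip_eps=0.2):
    """Mean PPO-Clip surrogate objective over a batch (to be maximized)."""
    total = 0.0
    for nl, ol, adv in zip(new_logp, old_logp, advantages):
        # Probability ratio between the new and the old policy for this action.
        ratio = math.exp(nl - ol)
        # Restrict the ratio to [1 - clip_eps, 1 + clip_eps].
        clipped = max(1 - clip_eps, min(1 + clip_eps, ratio))
        # Take the pessimistic (smaller) of the unclipped and clipped terms.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)
```

With a positive advantage， a ratio of 1.5 contributes only 1.2 times the advantage， so a single update cannot be rewarded for moving the policy far from the one that collected the data. This bounded update is the usual explanation for the stability benefit of clipping.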

Lightweight fall detection algorithm framework based on RPEpose and XJ-GCN

Ruiyan LIANG, Hui YANG

2024, 44(11): 3639-3646. DOI: 10.11772/j.issn.1001-9081.2023101379

The traditional joint keypoint detection model based on Vision Transformer （ViT） usually adopts 2D sine position embedding， which is prone to losing key two-dimensional shape information in the image， leading to a decrease in accuracy. For behavior classification models， the traditional Spatio-Temporal Graph Convolutional Network （ST-GCN） lacks correlation modeling between joints that are not physically connected under the uni-labeling partitioning strategy. To address the above problems， a lightweight real-time fall detection algorithm framework was designed to detect fall behavior quickly and accurately. The framework contains a joint keypoint detection model RPEpose （Relative Position Encoding pose estimation） and a behavior classification model XJ-GCN （Cross-Joint attention Graph Convolutional Network）. On the one hand， a type of relative position encoding was adopted by the RPEpose model to overcome the position insensitivity defect of the original position encoding and improve the performance of the ViT architecture in joint keypoint detection. On the other hand， an X-Joint （Cross-Joint） attention mechanism was proposed： after reconstructing the partitioning strategy into the XJL （X-Joint Labeling） partitioning strategy， the dependencies between all joint connections were modelled to capture the potential correlation between joint connections， achieving excellent classification performance with few parameters. Experimental results indicate that， on the COCO 2017 validation set， the RPEpose model only requires 8.2 GFLOPs （Giga FLOating-point OPerations） of computational overhead while achieving a testing Average Precision （AP） of 74.3% for images with a resolution of 256×192； on the NTU RGB+D dataset， the Top-1 accuracy using Cross Subject （X-Sub） as the partitioning standard is 89.6%， and the proposed framework RPEpose+XJ-GCN has a prediction accuracy of 87.2% at a processing speed of 30 frame/s， verifying its real-time performance and accuracy.
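
The limitation attributed to uni-labeling partitioning can be illustrated with a small sketch of a single spatial aggregation step over a skeleton graph. It assumes the usual symmetric normalization of the adjacency matrix with self-loops and omits learnable weights， temporal convolution， and the proposed X-Joint attention； the names `normalized_adjacency` and `aggregate` and the four-joint chain in the usage note are hypothetical.

```python
import math


def normalized_adjacency(num_joints, bones):
    """Symmetrically normalized adjacency with self-loops for a skeleton graph."""
    a = [[1.0 if i == j else 0.0 for j in range(num_joints)]
         for i in range(num_joints)]
    for i, j in bones:
        a[i][j] = 1.0
        a[j][i] = 1.0
    # Node degrees (never zero because of the self-loops).
    deg = [sum(row) for row in a]
    return [[a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(num_joints)]
            for i in range(num_joints)]


def aggregate(adj, features):
    """One aggregation step: each joint sums the features of its neighbors."""
    n = len(adj)
    dim = len(features[0])
    return [[sum(adj[i][k] * features[k][d] for k in range(n)) for d in range(dim)]
            for i in range(n)]
```

On a chain of joints 0-1-2-3， a feature placed on joint 0 reaches only joint 1 after one step； joints 2 and 3 receive nothing because they are not physically connected to joint 0. Under the stated assumptions， this is the kind of missing long-range dependency that an attention mechanism over all joints is meant to supply.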
