Most Read articles

    Review of multi-modal medical image segmentation based on deep learning
    Meng DOU, Zhebin CHEN, Xin WANG, Jitao ZHOU, Yu YAO
    Journal of Computer Applications    2023, 43 (11): 3385-3395.   DOI: 10.11772/j.issn.1001-9081.2022101636
    Abstract (1831) | HTML (90) | PDF 3904KB (1909)

    Multi-modal medical images can provide clinicians with rich information about target areas (such as tumors, organs, or tissues). However, effective fusion and segmentation of multi-modal images is still a challenging problem due to the independence and complementarity of the modalities. Traditional image fusion methods have difficulty addressing this problem, which has led to widespread research on deep learning-based multi-modal medical image segmentation algorithms. The multi-modal medical image segmentation task based on deep learning was reviewed in terms of principles, techniques, problems, and prospects. Firstly, the general theory of deep learning and multi-modal medical image segmentation was introduced, including the basic principles and development of deep learning and the Convolutional Neural Network (CNN), as well as the importance of the multi-modal medical image segmentation task. Secondly, the key concepts of multi-modal medical image segmentation were described, including data dimension, preprocessing, data augmentation, loss functions, and post-processing. Thirdly, multi-modal segmentation networks based on different fusion strategies were summarized and analyzed. Finally, several common problems in medical image segmentation were discussed, and a summary with prospects for future research was given.

    Embedded road crack detection algorithm based on improved YOLOv8
    Huantong GENG, Zhenyu LIU, Jun JIANG, Zichen FAN, Jiaxing LI
    Journal of Computer Applications    2024, 44 (5): 1613-1618.   DOI: 10.11772/j.issn.1001-9081.2023050635
    Abstract (1641) | HTML (48) | PDF 2002KB (1640)

    Deploying the YOLOv8L model on edge devices for road crack detection can achieve high accuracy, but real-time detection is difficult to guarantee. To solve this problem, a target detection algorithm based on an improved YOLOv8 model that can be deployed on the edge computing device Jetson AGX Xavier was proposed. Firstly, the Faster Block structure was designed using partial convolution to replace the Bottleneck structure in the YOLOv8 C2f module, and the improved C2f module was denoted C2f-Faster. Secondly, an SE (Squeeze-and-Excitation) channel attention layer was connected after each C2f-Faster module in the YOLOv8 backbone network to further improve detection accuracy. Experimental results on the open source road damage dataset RDD20 (Road Damage Detection 20) show that the average F1 score of the proposed method is 0.573, the detection speed is 47 Frames Per Second (FPS), and the model size is 55.5 MB. Compared with the SOTA (State-Of-The-Art) model of GRDDC2020 (Global Road Damage Detection Challenge 2020), the F1 score is increased by 0.8 percentage points, the FPS is increased by 291.7%, and the model size is reduced by 41.8%, realizing real-time and accurate detection of road cracks on edge devices.
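    The SE channel attention layer mentioned above can be sketched in a few lines. The following NumPy illustration shows the generic squeeze-and-excitation idea (global pooling, a two-layer bottleneck, sigmoid gating); the layer sizes and reduction ratio are illustrative, not those of the paper's C2f-Faster modules:

```python
import numpy as np

def se_attention(x, w1, w2):
    """Squeeze-and-Excitation channel attention on a (C, H, W) feature map.

    Squeeze: global average pool per channel; Excitation: two small
    dense layers (ReLU then sigmoid) produce per-channel weights in (0, 1)
    that rescale the input channels.
    """
    z = x.mean(axis=(1, 2))                  # squeeze: (C,)
    h = np.maximum(0.0, w1 @ z)              # reduction layer + ReLU
    s = 1.0 / (1.0 + np.exp(-(w2 @ h)))      # expansion layer + sigmoid
    return x * s[:, None, None]              # rescale each channel

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 4, 4))               # 8 channels, 4x4 spatial map
w1 = rng.normal(size=(2, 8))                 # reduction ratio 4: 8 -> 2
w2 = rng.normal(size=(8, 2))
y = se_attention(x, w1, w2)
```

    Because the gate values lie strictly in (0, 1), the output never exceeds the input channel-wise, which is what lets the layer suppress uninformative channels.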

    Multimodal knowledge graph representation learning: a review
    Chunlei WANG, Xiao WANG, Kai LIU
    Journal of Computer Applications    2024, 44 (1): 1-15.   DOI: 10.11772/j.issn.1001-9081.2023050583
    Abstract (1185) | HTML (113) | PDF 3449KB (1473)

    A comprehensive comparison of traditional knowledge graph representation learning models, including their advantages, disadvantages, and applicable tasks, shows that the traditional single-modal knowledge graph cannot represent knowledge well. Therefore, how to use multimodal data such as text, image, video, and audio for knowledge graph representation learning has become an important research direction. At the same time, the commonly used multimodal knowledge graph datasets were analyzed in detail to provide data support for relevant researchers. On this basis, knowledge graph representation learning models fusing text, image, video, and audio modalities were further discussed, and the various models were summarized and compared. Finally, the effect of multimodal knowledge graph representation on enhancing classical applications in practice, including knowledge graph completion, question answering systems, multimodal generation, and recommendation systems, was summarized, and future research directions were discussed.

    Technology application prospects and risk challenges of large language models
    Yuemei XU, Ling HU, Jiayi ZHAO, Wanze DU, Wenqing WANG
    Journal of Computer Applications    2024, 44 (6): 1655-1662.   DOI: 10.11772/j.issn.1001-9081.2023060885
    Abstract (1170) | HTML (85) | PDF 1142KB (1054)

    In view of the rapid development of Large Language Model (LLM) technology, a comprehensive analysis was conducted on its technical application prospects and risk challenges, which has great reference value for the development and governance of Artificial General Intelligence (AGI). Firstly, with representative language models such as Multi-BERT (Multilingual Bidirectional Encoder Representations from Transformers), GPT (Generative Pre-trained Transformer) and ChatGPT (Chat Generative Pre-trained Transformer) as examples, the development process, key technologies, and evaluation systems of LLMs were reviewed. Then, a detailed analysis of the technical limitations and security risks of LLMs was conducted. Finally, suggestions were put forward for technical improvement and policy follow-up of LLMs. The analysis indicates that current LLMs are still at a developing stage: they produce non-truthful and biased output, lack real-time autonomous learning ability, require huge computing power, rely highly on data quality and quantity, and tend towards a monotonous language style. They carry security risks related to data privacy, information security, ethics, and other aspects. Their future development can continue to improve technically, from "large-scale" to "lightweight", from "single-modal" to "multi-modal", and from "general-purpose" to "vertical"; for real-time follow-up in policy, their applications and development should be regulated by targeted regulatory measures.

    Review of YOLO algorithm and its applications to object detection in autonomous driving scenes
    Yaping DENG, Yingjiang LI
    Journal of Computer Applications    2024, 44 (6): 1949-1958.   DOI: 10.11772/j.issn.1001-9081.2023060889
    Abstract (886) | HTML (30) | PDF 1175KB (831)

    Object detection in autonomous driving scenes is one of the important research directions in computer vision. Research in this area focuses on ensuring that autonomous vehicles detect objects accurately and in real time. Recently, rapid development in deep learning technology has been witnessed, and its wide application in the field of autonomous driving has prompted substantial progress in this field. The research status of object detection by YOLO (You Only Look Once) algorithms in the field of autonomous driving was analyzed from the following four aspects. Firstly, the ideas and improvement methods of the single-stage YOLO series of detection algorithms were summarized, and the advantages and disadvantages of the YOLO series were analyzed. Secondly, YOLO algorithm-based object detection applications in autonomous driving scenes were introduced, and the research status and applications for the detection and recognition of traffic vehicles, pedestrians, and traffic signals were expounded and summarized respectively. Additionally, the commonly used evaluation indicators in object detection, as well as object detection datasets and autonomous driving scene datasets, were summarized. Lastly, the problems and future development directions of object detection were discussed.

    Dynamic multi-domain adversarial learning method for cross-subject motor imagery EEG signals
    Xuan CAO, Tianjian LUO
    Journal of Computer Applications    2024, 44 (2): 645-653.   DOI: 10.11772/j.issn.1001-9081.2023030286
    Abstract (798) | HTML (13) | PDF 3364KB (228)

    Decoding motor imagery EEG (ElectroEncephaloGraphy) signals is one of the crucial techniques for building a Brain Computer Interface (BCI) system. Due to the high cost of EEG signal acquisition, large inter-subject discrepancy, and the characteristics of strong time variability and low signal-to-noise ratio, constructing cross-subject pattern recognition methods becomes the key problem in such studies. To solve this problem, a cross-subject dynamic multi-domain adversarial learning method was proposed. Firstly, the covariance matrix alignment method was used to align the given EEG samples. Then, a global discriminator was used to adapt the marginal distributions of different domains, and multiple class-wise local discriminators were used to adapt the conditional distribution of each class. The self-adaptive adversarial factor for the multi-domain discriminators was learned automatically during training iterations. Based on this dynamic multi-domain adversarial learning strategy, the Dynamic Multi-Domain Adversarial Network (DMDAN) model could learn deep features that generalize across subject domains. Experimental results on the public BCI Competition IV 2A and 2B datasets show that the DMDAN model improves the ability to learn domain-invariant features, achieving 1.80 and 2.52 percentage points higher average classification accuracy on dataset 2A and dataset 2B respectively, compared with the existing adversarial learning method Deep Representation Domain Adaptation (DRDA). It can be seen that the DMDAN model improves the decoding performance of cross-subject motor imagery EEG signals and generalizes across different datasets.
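    The covariance matrix alignment step can be illustrated with Euclidean alignment, a common choice for cross-subject EEG preprocessing (whether the paper uses exactly this variant is an assumption). Each trial is whitened by the inverse square root of the subject's mean spatial covariance, so after alignment the average covariance becomes the identity for every subject:

```python
import numpy as np

def euclidean_align(trials):
    """Align EEG trials so the mean spatial covariance becomes identity.

    trials: array (n_trials, n_channels, n_samples). Returns aligned
    trials X_i' = R^{-1/2} X_i, where R is the mean trial covariance.
    """
    covs = np.array([t @ t.T / t.shape[1] for t in trials])
    R = covs.mean(axis=0)
    w, V = np.linalg.eigh(R)                       # R is symmetric PSD
    R_inv_sqrt = V @ np.diag(1.0 / np.sqrt(w)) @ V.T
    return np.array([R_inv_sqrt @ t for t in trials])

rng = np.random.default_rng(1)
X = rng.normal(size=(20, 4, 100))                  # 20 trials, 4 channels
Xa = euclidean_align(X)
# After alignment the mean covariance is (numerically) the identity.
mean_cov = np.mean([t @ t.T / t.shape[1] for t in Xa], axis=0)
```

    Aligning every subject to the same reference covariance is what makes their feature distributions comparable before the adversarial domain adaptation stage.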

    Gradient descent with momentum algorithm based on differential privacy in convolutional neural network
    Yu ZHANG, Ying CAI, Jianyang CUI, Meng ZHANG, Yanfang FAN
    Journal of Computer Applications    2023, 43 (12): 3647-3653.   DOI: 10.11772/j.issn.1001-9081.2022121881
    Abstract (722) | HTML (128) | PDF 1985KB (707)

    To address the privacy leakage problem caused by the model parameters memorizing some features of the data during the training of Convolutional Neural Network (CNN) models, a Gradient Descent with Momentum algorithm based on Differential Privacy in CNN (DPGDM) was proposed. Firstly, Gaussian noise satisfying differential privacy was added to the gradient in the backpropagation process of model optimization, and the noised gradient value was used in the model parameter update, so as to achieve differential privacy protection for the overall model. Secondly, to reduce the impact of the introduced differential privacy noise on the convergence speed of the model, a learning rate decay strategy was designed and the gradient descent with momentum algorithm was improved accordingly. Finally, to reduce the influence of noise on model accuracy, the noise scale was adjusted dynamically during model optimization, thereby changing the amount of noise that needs to be added to the gradient in each iteration. Experimental results show that compared with the DP-SGD (Differentially Private Stochastic Gradient Descent) algorithm, the proposed algorithm improves model accuracy by about 5 and 4 percentage points at privacy budgets of 0.3 and 0.5 respectively, proving that the proposed algorithm improves model usability while providing privacy protection for the model.
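    The core update described above (clip per-example gradients, add Gaussian noise, then apply momentum) can be sketched as follows. This is a generic DP-SGD-with-momentum step in NumPy, not the authors' exact DPGDM algorithm; the clipping bound, noise multiplier, and learning rate are illustrative:

```python
import numpy as np

def dp_momentum_step(params, grads, velocity, rng,
                     lr=0.1, beta=0.9, clip=1.0, sigma=1.0):
    """One gradient-descent-with-momentum step with DP-style noise.

    Per-example gradients are clipped to L2 norm `clip`, averaged,
    Gaussian noise of scale sigma*clip is added, and then a standard
    momentum update is applied to the parameters.
    """
    clipped = []
    for g in grads:                       # grads: list of per-example grads
        norm = np.linalg.norm(g)
        clipped.append(g * min(1.0, clip / max(norm, 1e-12)))
    g_bar = np.mean(clipped, axis=0)
    noisy = g_bar + rng.normal(0.0, sigma * clip / len(grads),
                               size=g_bar.shape)
    velocity = beta * velocity - lr * noisy
    return params + velocity, velocity

rng = np.random.default_rng(2)
params = np.zeros(3)
velocity = np.zeros(3)
grads = [rng.normal(size=3) for _ in range(32)]
params, velocity = dp_momentum_step(params, grads, velocity, rng)
```

    The paper's dynamic noise-scale adjustment would correspond to shrinking `sigma` over iterations; the learning rate decay strategy would shrink `lr`.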

    Poisoning attack detection scheme based on generative adversarial network for federated learning
    Qian CHEN, Zheng CHAI, Zilong WANG, Jiawei CHEN
    Journal of Computer Applications    2023, 43 (12): 3790-3798.   DOI: 10.11772/j.issn.1001-9081.2022121831
    Abstract (705) | HTML (42) | PDF 2367KB (515)

    Federated Learning (FL) emerges as a novel privacy-preserving Machine Learning (ML) paradigm. However, the distributed training structure of FL is more vulnerable to poisoning attacks, in which adversaries contaminate the global model by uploading poisoned models, resulting in convergence deceleration and prediction accuracy degradation of the global model. To solve this problem, a poisoning attack detection scheme based on Generative Adversarial Network (GAN) was proposed. Firstly, the benign local models were fed into the GAN to produce testing samples. Then, the testing samples were used to detect the local models uploaded by the clients. Finally, the poisoned models were eliminated according to the testing metrics. Meanwhile, two testing metrics named F1 score loss and accuracy loss were defined to detect poisoned models, extending the detection scope from a single type of poisoning attack to all types of poisoning attacks. Besides, a threshold determination method was designed to deal with misjudgment, ensuring robustness against misjudgment. Experimental results on the MNIST and Fashion-MNIST datasets show that the proposed scheme can generate high-quality testing samples, and then detect and eliminate poisoned models. Compared with the global models trained with a detection scheme that directly gathers test data from clients and a scheme that generates test data but uses only test accuracy as the metric, the global model trained with the proposed scheme achieves a significant accuracy improvement of 2.7 to 12.2 percentage points.
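    The thresholded accuracy-loss / F1-loss test can be sketched as a simple filter over the uploaded models' scores on the generated testing samples. The threshold values and the use of the best upload as the baseline are illustrative assumptions, not the paper's exact threshold determination method:

```python
def detect_poisoned(metrics, acc_threshold=0.1, f1_threshold=0.1):
    """Flag uploaded local models whose accuracy or F1 on the testing
    samples falls too far below the best-performing upload.

    metrics: dict model_id -> (accuracy, f1) measured on testing samples.
    A model is flagged when either its accuracy loss or its F1 loss
    relative to the baseline exceeds the corresponding threshold.
    """
    base_acc = max(a for a, _ in metrics.values())
    base_f1 = max(f for _, f in metrics.values())
    flagged = set()
    for mid, (acc, f1) in metrics.items():
        if base_acc - acc > acc_threshold or base_f1 - f1 > f1_threshold:
            flagged.add(mid)
    return flagged

# Hypothetical uploads: c3's scores collapse, as a poisoned model's would.
uploads = {"c1": (0.92, 0.91), "c2": (0.90, 0.89), "c3": (0.55, 0.40)}
bad = detect_poisoned(uploads)
```

    Using two metrics rather than accuracy alone is what lets the scheme catch targeted attacks that barely move overall accuracy but distort per-class behavior (visible in F1).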

    UAV cluster cooperative combat decision-making method based on deep reinforcement learning
    Lin ZHAO, Ke LYU, Jing GUO, Chen HONG, Xiancai XIANG, Jian XUE, Yong WANG
    Journal of Computer Applications    2023, 43 (11): 3641-3646.   DOI: 10.11772/j.issn.1001-9081.2022101511
    Abstract (698) | HTML (22) | PDF 2944KB (537)

    When an Unmanned Aerial Vehicle (UAV) cluster attacks ground targets, it is divided into two formations: a strike UAV cluster that attacks the targets and an auxiliary UAV cluster that pins down the enemy. When the auxiliary UAVs choose between the action strategies of aggressive attack and saving strength, the mission scenario is similar to a public goods game, in which the benefits to a cooperator are less than those to a betrayer. Based on this, a decision method for cooperative combat of UAV clusters based on deep reinforcement learning was proposed. First, by building a public goods game based UAV cluster combat model, the conflict between individual and group interests in the cooperation of intelligent UAV clusters was simulated. Then, the Multi-Agent Deep Deterministic Policy Gradient (MADDPG) algorithm was used to solve for the most reasonable combat decision of the auxiliary UAV cluster to achieve cluster victory at minimum loss cost. Training and experiments were performed with different numbers of UAVs. The results show that compared with the training effects of two algorithms, IDQN (Independent Deep Q-Network) and ID3QN (Imitative Dueling Double Deep Q-Network), the proposed algorithm has the best convergence, its winning rate reaches 100% with four auxiliary UAVs, and it also significantly outperforms the comparison algorithms with other numbers of UAVs.
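    The public goods game structure the abstract invokes can be made concrete with the standard payoff rule: contributions are pooled, multiplied by an enhancement factor, and shared equally, so a free-rider always earns more than a cooperator. The factor `r = 1.6` here is an illustrative value, not a parameter from the paper:

```python
def public_goods_payoffs(contributions, r=1.6):
    """Payoffs in a public goods game.

    Each player's contribution goes into a common pool, the pool is
    multiplied by r (1 < r < n) and shared equally; a player's payoff is
    their share minus their own contribution, so contributing is costly
    even though everyone benefits from the pool.
    """
    n = len(contributions)
    pool = r * sum(contributions)
    share = pool / n
    return [share - c for c in contributions]

# Two cooperators (contribute 1) and two free-riders (contribute 0).
pay = public_goods_payoffs([1, 1, 0, 0])
```

    This cooperator/betrayer payoff gap is exactly the individual-versus-group conflict that the MADDPG agents must learn to resolve.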

    Feature selection method for graph neural network based on network architecture design
    Dapeng XU, Xinmin HOU
    Journal of Computer Applications    2024, 44 (3): 663-670.   DOI: 10.11772/j.issn.1001-9081.2023030353
    Abstract (687) | HTML (120) | PDF 1001KB (848)

    In recent years, researchers have proposed many improved model architecture designs for Graph Neural Network (GNN), driving performance improvements in various prediction tasks. However, most GNN variants start from the assumption that node features are equally important, which is not the case. To solve this problem, a feature selection method was proposed to improve existing models by selecting important feature subsets for the dataset. The proposed method consists of two components: a feature selection layer and a separate label-feature mapping. A softmax normalizer and a feature "soft selector" were used for feature selection in the feature selection layer, and the model structure was designed under the idea of separate label-feature mapping to select the corresponding subsets of related features for different labels; the union of these related feature subsets was then taken to obtain the important feature subset of the final dataset. The Graph ATtention network (GAT) and GATv2 models were selected as benchmark models, and the algorithm was applied to them to obtain new models. Experimental results show that when the proposed models perform node classification tasks on six datasets, their accuracies are improved by 0.83% to 8.79% compared with the baseline models. The new models also select the corresponding important feature subsets for the six datasets, in which the number of features accounts for 3.94% to 12.86% of the total number of features in their respective datasets. After using the important feature subset as the new input of the benchmark model, more than 95% of the accuracy obtained using all features is still achieved; that is, the scale of the model is reduced while the accuracy is maintained. It can be seen that the proposed algorithm can improve the accuracy of node classification and effectively select the corresponding important feature subset for the dataset.
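    The softmax-normalized "soft selector" idea can be sketched as follows: a learnable score per feature is normalized into a weight vector that rescales the node-feature columns, and large weights mark features worth keeping. This is a minimal NumPy illustration of the selection-layer concept, not the paper's trained layer:

```python
import numpy as np

def soft_select(X, scores, temperature=1.0):
    """Feature 'soft selection' over node features.

    X: (n_nodes, n_features) feature matrix; scores: one learnable score
    per feature. Softmax turns the scores into weights summing to 1,
    which rescale each feature column; thresholding the weights would
    yield a hard feature subset.
    """
    w = np.exp(scores / temperature)
    w = w / w.sum()                       # softmax over features
    return X * w[None, :], w

rng = np.random.default_rng(3)
X = rng.normal(size=(5, 4))               # 5 nodes, 4 features
scores = np.array([2.0, 0.1, -1.0, 0.1])  # feature 0 scored most useful
Xs, w = soft_select(X, scores)
```

    In the paper's setting, one such selector is learned per label and the union of the high-weight features across labels forms the final important feature subset.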

    Lightweight image super-resolution reconstruction network based on Transformer-CNN
    Hao CHEN, Zhenping XIA, Cheng CHENG, Xing LIN-LI, Bowen ZHANG
    Journal of Computer Applications    2024, 44 (1): 292-299.   DOI: 10.11772/j.issn.1001-9081.2023010048
    Abstract (597) | HTML (24) | PDF 1855KB (362)

    Aiming at the high computational complexity and large memory consumption of existing super-resolution reconstruction networks, a lightweight image super-resolution reconstruction network based on Transformer-CNN was proposed, making super-resolution reconstruction more suitable for deployment on embedded terminals such as mobile platforms. Firstly, a hybrid block based on Transformer-CNN was proposed, which enhanced the ability of the network to capture local-global depth features. Then, a modified inverted residual block, with special attention to the characteristics of the high-frequency region, was designed, so that improved feature extraction ability and reduced inference time were realized. Finally, after exploring the best options for the activation function, the GELU (Gaussian Error Linear Unit) activation function was adopted to further improve the network performance. Experimental results show that the proposed network achieves a good balance between image super-resolution performance and network complexity, reaching an inference speed of 91 frames/s on the benchmark dataset Urban100 with a scale factor of 4, which is 11 times faster than the high-performing SwinIR (Image Restoration using Swin transformer) network, indicating that the proposed network can efficiently reconstruct the textures and details of an image while significantly reducing inference time.

    Survey of incomplete multi-view clustering
    Yao DONG, Yixue FU, Yongfeng DONG, Jin SHI, Chen CHEN
    Journal of Computer Applications    2024, 44 (6): 1673-1682.   DOI: 10.11772/j.issn.1001-9081.2023060813
    Abstract (597) | HTML (17) | PDF 2050KB (414)

    Multi-view clustering has recently been a hot topic in graph data mining. However, due to the limitations of data collection technology or human factors, multi-view data often suffers from missing views or samples. Reducing the impact of incomplete views on clustering performance is a major challenge currently faced by multi-view clustering. In order to better understand the development of Incomplete Multi-view Clustering (IMC) in recent years, a comprehensive review is of great theoretical significance and practical value. Firstly, the missing types of incomplete multi-view data were summarized and analyzed. Secondly, four types of IMC methods, based on Multiple Kernel Learning (MKL), Matrix Factorization (MF) learning, deep learning, and graph learning, were compared, and the technical characteristics and differences among them were analyzed. Thirdly, from the perspectives of dataset type, number of views and categories, and application field, twenty-two public incomplete multi-view datasets were summarized. Then, the evaluation metrics were outlined, and the performance of existing incomplete multi-view clustering methods on homogeneous and heterogeneous datasets was evaluated. Finally, the existing problems, future research directions, and current application fields of incomplete multi-view clustering were discussed.

    Differential privacy clustering algorithm in horizontal federated learning
    Xueran XU, Geng YANG, Yuxian HUANG
    Journal of Computer Applications    2024, 44 (1): 217-222.   DOI: 10.11772/j.issn.1001-9081.2023010019
    Abstract (531) | HTML (15) | PDF 1418KB (281)

    Clustering analysis can uncover hidden interconnections between data and segment the data according to multiple indicators, which facilitates personalized and refined operations. However, data fragmentation and isolation caused by data islands seriously affect the effectiveness of cluster analysis applications. To solve the data island problem while protecting data privacy, an Equivalent Local differential privacy Federated K-means (ELFedKmeans) algorithm was proposed. A grid-based initial cluster center selection method and a privacy budget allocation scheme were designed for the horizontal federated learning model. To generate the same random noise at lower communication cost, all organizations jointly negotiated random seeds, protecting local data privacy. The ELFedKmeans algorithm was proven to satisfy differential privacy protection through theoretical analysis, and it was compared with the Local Differential Privacy distributed K-means (LDPKmeans) algorithm and the Hybrid Privacy K-means (HPKmeans) algorithm on different datasets. Experimental results show that, as the privacy budget increases, all three algorithms gradually increase in F-measure and decrease in SSE (Sum of Squares due to Error). As a whole, the F-measure values of the ELFedKmeans algorithm were 1.7945% to 57.0663% and 21.2452% to 132.0488% higher than those of the LDPKmeans and HPKmeans algorithms respectively; the Log(SSE) values of the ELFedKmeans algorithm were 1.2042% to 12.8946% and 5.6175% to 27.5752% less than those of the LDPKmeans and HPKmeans algorithms respectively. With the same privacy budget, the ELFedKmeans algorithm outperforms the comparison algorithms in terms of clustering quality and utility metric.
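    One round of a differentially private federated k-means can be sketched as each organization releasing noise-perturbed per-cluster sums and counts instead of raw points. This is a generic DP k-means local update (Laplace noise, budget split evenly between sums and counts), not the exact ELFedKmeans protocol with negotiated seeds:

```python
import numpy as np

def noisy_local_update(points, centers, epsilon, sensitivity, rng):
    """One organization's DP contribution to a federated k-means round.

    Assign local points to the nearest centers, then release per-cluster
    coordinate sums and counts, each perturbed with Laplace noise scaled
    by sensitivity / (epsilon / 2). The server aggregates these across
    organizations and recomputes centers as sum / count.
    """
    k, d = centers.shape
    dists = ((points[:, None, :] - centers[None]) ** 2).sum(axis=-1)
    labels = np.argmin(dists, axis=1)
    sums = np.zeros((k, d))
    counts = np.zeros(k)
    for p, l in zip(points, labels):
        sums[l] += p
        counts[l] += 1
    scale = sensitivity / (epsilon / 2)      # half the budget each
    sums += rng.laplace(0.0, scale, size=sums.shape)
    counts += rng.laplace(0.0, scale, size=counts.shape)
    return sums, counts

rng = np.random.default_rng(4)
pts = rng.uniform(0, 1, size=(100, 2))       # one organization's data
centers = np.array([[0.25, 0.25], [0.75, 0.75]])
s, c = noisy_local_update(pts, centers, epsilon=1.0, sensitivity=1.0, rng=rng)
```

    A larger `epsilon` (privacy budget) shrinks the noise scale, which matches the reported trend of F-measure rising and SSE falling as the budget grows.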

    Survey of visual object tracking methods based on Transformer
    Ziwen SUN, Lizhi QIAN, Chuandong YANG, Yibo GAO, Qingyang LU, Guanglin YUAN
    Journal of Computer Applications    2024, 44 (5): 1644-1654.   DOI: 10.11772/j.issn.1001-9081.2023060796
    Abstract (517) | HTML (20) | PDF 1615KB (506)

    Visual object tracking is one of the important tasks in computer vision. To achieve high-performance object tracking, a large number of object tracking methods have been proposed in recent years. Among them, Transformer-based object tracking methods have become a hot topic in the field due to their ability to perform global modeling and capture contextual information. Firstly, existing Transformer-based visual object tracking methods were classified according to their network structures, an overview of the underlying principles and key techniques for model improvement was given, and the advantages and disadvantages of different network structures were summarized. Then, the experimental results of Transformer-based visual object tracking methods on public datasets were compared to analyze the impact of network structure on performance; among them, MixViT-L (ConvMAE) achieved tracking success rates of 73.3% and 86.1% on LaSOT and TrackingNet respectively, showing that object tracking methods based on a pure Transformer two-stage architecture have better performance and broader development prospects. Finally, the limitations of these methods, such as complex network structure, large number of parameters, high training requirements, and difficulty of deployment on edge devices, were summarized, and future research focuses were outlined: by combining model compression, self-supervised learning, and Transformer interpretability analysis, more feasible solutions for Transformer-based visual object tracking can be developed.

    Zero-shot relation extraction model via multi-template fusion in Prompt
    Liang XU, Chun ZHANG, Ning ZHANG, Xuetao TIAN
    Journal of Computer Applications    2023, 43 (12): 3668-3675.   DOI: 10.11772/j.issn.1001-9081.2022121869
    Abstract (477) | HTML (39) | PDF 1768KB (286)

    The Prompt paradigm is widely applied to zero-shot Natural Language Processing (NLP) tasks. However, the existing zero-shot Relation Extraction (RE) models based on the Prompt paradigm suffer from the difficulty of constructing answer space mappings and dependence on manual template selection, which leads to suboptimal performance. To address these issues, a zero-shot RE model via multi-template fusion in Prompt was proposed. Firstly, the zero-shot RE task was defined as a Masked Language Model (MLM) task, in which the construction of an answer space mapping was abandoned; instead, the words output by the template were compared with the relation description text in the word embedding space to determine the relation class. Then, the part of speech of the relation description text was introduced as a feature, and the weight between this feature and each template was learned. Finally, this weight was utilized to fuse the results output by multiple templates, thereby reducing the performance loss caused by manual selection of Prompt templates. Experimental results on FewRel (Few-shot Relation extraction dataset) and TACRED (Text Analysis Conference Relation Extraction Dataset) show that the proposed model significantly outperforms the current state-of-the-art model, RelationPrompt, in terms of F1 score under different data resource settings, with increases of 1.48 to 19.84 percentage points and 15.27 to 15.75 percentage points respectively. These results demonstrate the effectiveness of the proposed model for zero-shot RE tasks.
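    The answer-space-free matching step described above (compare the template's [MASK]-position output with each relation description in embedding space) can be sketched with cosine similarity. The three-dimensional vectors here are toy stand-ins for real word embeddings:

```python
import numpy as np

def predict_relation(mask_vec, relation_desc_vecs):
    """Zero-shot relation choice without an answer space mapping.

    mask_vec: embedding of the word the MLM produced at the [MASK]
    position; relation_desc_vecs: one embedding per relation description.
    The relation whose description is closest in cosine similarity wins.
    """
    def cos(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    sims = [cos(mask_vec, v) for v in relation_desc_vecs]
    return int(np.argmax(sims)), sims

mask_vec = np.array([1.0, 0.0, 1.0])
rels = [np.array([1.0, 0.1, 0.9]),     # description close to the output
        np.array([-1.0, 0.5, 0.0])]    # description far from the output
pred, sims = predict_relation(mask_vec, rels)
```

    Multi-template fusion would run this matching once per template and combine the similarity scores with the learned per-template weights.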

    Multi-robot task allocation algorithm combining genetic algorithm and rolling scheduling
    Fuqin DENG, Huanzhao HUANG, Chaoen TAN, Lanhui FU, Jianmin ZHANG, Tinlun LAM
    Journal of Computer Applications    2023, 43 (12): 3833-3839.   DOI: 10.11772/j.issn.1001-9081.2022121916
    Abstract (455) | HTML (12) | PDF 2617KB (261)

    The purpose of research on Multi-Robot Task Allocation (MRTA) is to improve the task completion efficiency of robots in smart factories. Aiming at the deficiency of the existing algorithms in dealing with large-scale multi-constrained MRTA, an MRTA Algorithm Combining Genetic Algorithm and Rolling Scheduling (ACGARS) was proposed. Firstly, the coding method based on Directed Acyclic Graph (DAG) was adopted in genetic algorithm to efficiently deal with the priority constraints among tasks. Then, the prior knowledge was added to the initial population of genetic algorithm to improve the search efficiency of the algorithm. Finally, a rolling scheduling strategy based on task groups was designed to reduce the scale of the problem to be solved, thereby solving large-scale problems efficiently. Experimental results on large-scale problem instances show that compared with the schemes generated by Constructive Heuristic Algorithm (CHA), MinInterfere Algorithm (MIA), and Genetic Algorithm with Penalty Strategy (GAPS), the scheme generated by the proposed algorithm has the average order completion time shortened by 30.02%, 16.86% and 75.65% respectively when the number of task groups is 20, which verifies that the proposed algorithm can effectively shorten the average waiting time of orders and improve the efficiency of multi-robot task allocation.
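    The DAG-based chromosome encoding can be illustrated with a standard priority-decoding scheme: the genetic algorithm evolves one priority value per task, and decoding schedules, among the tasks whose predecessors are all finished, the one with the highest priority. This is a common way to keep every chromosome feasible under precedence constraints; it is a sketch of the idea, not the exact ACGARS operator set:

```python
def decode_chromosome(priorities, dag):
    """Decode a GA chromosome into a precedence-feasible task order.

    priorities: one priority value per task (the chromosome).
    dag: dict task -> list of successor tasks (precedence constraints).
    Repeatedly pick the ready task (all predecessors done) with the
    highest priority, so every decoded order respects the DAG.
    """
    n = len(priorities)
    indeg = [0] * n
    for u, vs in dag.items():
        for v in vs:
            indeg[v] += 1
    done, order = set(), []
    while len(order) < n:
        ready = [t for t in range(n) if t not in done and indeg[t] == 0]
        t = max(ready, key=lambda i: priorities[i])
        order.append(t)
        done.add(t)
        for v in dag.get(t, []):
            indeg[v] -= 1
    return order

# Tasks 0 and 1 must precede task 2; task 2 must precede task 3.
dag = {0: [2], 1: [2], 2: [3]}
order = decode_chromosome([0.2, 0.9, 0.5, 0.8], dag)
```

    Because crossover and mutation act only on the priority vector, offspring decode to feasible schedules automatically, which is what makes this encoding efficient for priority-constrained MRTA.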

    Review of mean field theory for deep neural network
    Mengmei YAN, Dongping YANG
    Journal of Computer Applications    2024, 44 (2): 331-343.   DOI: 10.11772/j.issn.1001-9081.2023020166
    Abstract (451) | HTML (56) | PDF 1848KB (367)

    Mean Field Theory (MFT) provides profound insights into the operation mechanism of Deep Neural Network (DNN), and can theoretically guide the engineering design of deep learning. In recent years, more and more researchers have devoted themselves to the theoretical study of DNN; in particular, a series of works based on mean field theory have attracted much attention. To this end, a review of research related to mean field theory for deep neural networks was presented, introducing the latest theoretical findings in three basic aspects: initialization, training process, and generalization performance of deep neural networks. Specifically, the concepts, properties, and applications of the edge of chaos and dynamical isometry for initialization were introduced, the training properties of overparameterized networks and their equivalent networks were analyzed, and the generalization performance of various network architectures was theoretically analyzed, reflecting that mean field theory is a very important basic theoretical approach to understanding the mechanisms of deep neural networks. Finally, the main challenges and future research directions were summarized for the investigation of mean field theory in the initialization, training, and generalization phases of DNN.

    Current research status and challenges of blockchain in supply chain applications
    Lina GE, Jingya XU, Zhe WANG, Guifen ZHANG, Liang YAN, Zheng HU
    Journal of Computer Applications    2023, 43 (11): 3315-3326.   DOI: 10.11772/j.issn.1001-9081.2022111758
    Abstract (448) | PDF 2371KB (561)

    The supply chain faces many challenges in its development, including how to ensure the authenticity and reliability of information and the security of the traceability system in product traceability, the security of products in logistics, and trust management in the financing of small and medium-sized enterprises. With the characteristics of decentralization, immutability, and traceability, blockchain provides efficient solutions for supply chain management, but there are technical challenges in actual implementation. To study the applications of blockchain technology in the supply chain, some typical applications were discussed and analyzed. Firstly, the concept of the supply chain and its current challenges were briefly introduced. Secondly, the problems faced by blockchain in the three supply chain fields of information flow, logistics flow, and capital flow were described, and a comparative analysis of related solutions was given. Finally, the technical challenges faced by blockchain in practical supply chain applications were summarized, and future applications were prospected.

    Short-term power load forecasting by graph convolutional network combining LSTM and self-attention mechanism
    Hanxiao SHI, Leichun WANG
    Journal of Computer Applications    2024, 44 (1): 311-317.   DOI: 10.11772/j.issn.1001-9081.2023010078
    Abstract views: 440 | HTML views: 14 | PDF (2173KB) downloads: 330

    Aiming at the problems of existing power load forecasting models, such as heavy modeling workload, insufficient spatiotemporal joint representation, and low forecasting accuracy, a Short-Term power Load Forecasting model based on Graph Convolutional Network (GCN) combining Long Short-Term Memory (LSTM) network and Self-attention mechanism (GCNLS-STLF) was proposed. Firstly, the original multi-dimensional time series data was transformed into a power load graph containing the correlations between series by using LSTM and the self-attention mechanism. Then, features were extracted from the power load graph by GCN, LSTM and Graph Fourier Transform (GFT). Finally, a fully connected layer was used to reconstruct features, and residual connections were used to make multiple forecasts of the power load, enhancing the expression ability of the original power load data. Short-term power load forecasting experiments on real historical power load data of power stations in Morocco and Panama showed that, compared with Support Vector Machine (SVM), LSTM, the mixed model CNN-LSTM and CNN-LSTM based on attention (CNN-LSTM-attention), the Mean Absolute Percentage Error (MAPE) of GCNLS-STLF was reduced by 1.94, 0.90, 0.49 and 0.37 percentage points respectively on the entire Morocco power load test set; the MAPE of GCNLS-STLF on the Panama power load test dataset decreased by 1.39, 0.94, 0.38 and 0.29 percentage points respectively in March and by 1.40, 0.99, 0.35 and 0.28 percentage points respectively in June. These results show that GCNLS-STLF can effectively extract key features of the power load and achieve satisfactory forecasting effects.
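
    The comparisons above are in terms of MAPE. As a reference, a minimal sketch of that metric (the load values below are made up, not the paper's data):

```python
import numpy as np

def mape(y_true, y_pred):
    """Mean Absolute Percentage Error, in percent."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return 100.0 * np.mean(np.abs((y_true - y_pred) / y_true))

# toy load values (MW), not the paper's data
actual = [100.0, 120.0, 80.0]
forecast = [98.0, 123.0, 82.0]
print(round(mape(actual, forecast), 2))  # → 2.33
```

    A reduction of, say, 0.37 percentage points means the forecast's average relative error drops by that amount of the true load.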

    New dish recognition network based on lightweight YOLOv5
    Chenghanyu ZHANG, Yuzhe LIN, Chengke TAN, Junfan WANG, Yeting GU, Zhekang DONG, Mingyu GAO
    Journal of Computer Applications    2024, 44 (2): 638-644.   DOI: 10.11772/j.issn.1001-9081.2023030271
    Abstract views: 438 | HTML views: 22 | PDF (2914KB) downloads: 421

    In order to better meet the accuracy and timeliness requirements of Chinese food dish recognition, a new type of dish recognition network was designed. The original YOLOv5 model was pruned by combining the Supermask method with structured channel pruning, and finally lightweighted by Int8 quantization. This ensured that the proposed model could balance accuracy and speed in dish recognition, achieving a good trade-off while improving model portability. Experimental results show that the proposed model achieves a mean Average Precision (mAP) of 99.00% and an average recognition speed of 59.54 ms/frame at an Intersection over Union (IoU) threshold of 0.5, which is 20 ms/frame faster than the original YOLOv5 model at the same level of accuracy. In addition, the new dish recognition network was ported to a Renesas RZ/G2L board via Qt. On this basis, an intelligent service system was constructed to realize the whole process of ordering, order generation, and automatic meal distribution, providing a theoretical and practical foundation for the future construction and application of truly intelligent restaurant service systems.
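
    The final Int8 quantization step can be sketched as a symmetric affine mapping from float weights to 8-bit integers (a generic illustration; the paper's exact scheme is not specified):

```python
import numpy as np

def quantize_int8(x):
    """Symmetric per-tensor int8 quantization: x ≈ scale * q."""
    scale = float(np.max(np.abs(x))) / 127.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

w = np.array([0.5, -1.27, 0.0, 1.0], dtype=np.float32)
q, s = quantize_int8(w)
w_hat = dequantize(q, s)
print(bool(np.max(np.abs(w - w_hat)) <= s / 2))  # error within half a step
```

    The trade-off is a bounded rounding error (at most half a quantization step) in exchange for a 4x smaller weight footprint and faster integer arithmetic on embedded boards.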

    Deep spectral clustering algorithm with L1 regularization
    Wenbo LI, Bo LIU, Lingling TAO, Fen LUO, Hang ZHANG
    Journal of Computer Applications    2023, 43 (12): 3662-3667.   DOI: 10.11772/j.issn.1001-9081.2022121822
    Abstract views: 433 | HTML views: 44 | PDF (1465KB) downloads: 374

    Aiming at the problems that deep spectral clustering models perform poorly in training stability and generalization capability, a Deep Spectral Clustering algorithm with L1 Regularization (DSCLR) was proposed. Firstly, L1 regularization was introduced into the objective function of deep spectral clustering to sparsify the eigenvectors of the Laplacian matrix generated by the deep neural network model, which enhanced the generalization capability of the model. Secondly, the network structure of the deep-neural-network-based spectral clustering algorithm was improved by using the Parametric Rectified Linear Unit (PReLU) activation function to solve the problems of training instability and underfitting. Experimental results on the MNIST dataset show that the proposed algorithm improves Clustering Accuracy (CA), Normalized Mutual Information (NMI), and Adjusted Rand Index (ARI) by 11.85, 7.75, and 17.19 percentage points respectively compared to the deep spectral clustering algorithm. Furthermore, the proposed algorithm also significantly improves these three metrics compared to algorithms such as Deep Embedded Clustering (DEC) and Deep Spectral Clustering using Dual Autoencoder Network (DSCDAN).
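
    The effect of an L1 term is to drive small entries of the spectral embedding to exactly zero. A common way to realize this is the soft-thresholding (proximal) operator, shown here as a generic sketch rather than the paper's exact update:

```python
import numpy as np

def soft_threshold(v, lam):
    """Proximal operator of the L1 penalty lam*||v||_1:
    shrinks every entry toward zero and zeroes out small ones."""
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)

y = np.array([0.8, -0.05, 0.3, -0.6, 0.02])
sparse = soft_threshold(y, 0.1)
print(int((sparse == 0).sum()))  # the two small entries become exactly zero
```

    Sparse eigenvector entries mean each sample is described by fewer active embedding dimensions, which is the source of the claimed generalization gain.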

    Cloth-changing person re-identification model based on semantic-guided self-attention network
    Jianhua ZHONG, Chuangyi QIU, Jianshu CHAO, Ruicheng MING, Jianfeng ZHONG
    Journal of Computer Applications    2023, 43 (12): 3719-3726.   DOI: 10.11772/j.issn.1001-9081.2022121875
    Abstract views: 428 | HTML views: 19 | PDF (2046KB) downloads: 252

    Focused on the difficulty of extracting effective information in the cloth-changing person Re-identification (ReID) task, a cloth-changing person re-identification model based on a semantic-guided self-attention network was proposed. Firstly, semantic information was used to segment the original image into a cloth-free image, and both images were input into a two-branch multi-head self-attention network to extract cloth-independent features and complete person features respectively. Then, a Global Feature Reconstruction (GFR) module was designed to reconstruct the two global features, in which the clothing region contained more robust head features, making the saliency information in the global features more prominent; and a Local Feature Reorganization and Reconstruction (LFRR) module was proposed to extract head and shoe features from the original image and the cloth-free image, emphasizing the detailed information of the head and shoes and reducing the interference caused by changing shoes. Finally, in addition to the identity loss and triplet loss commonly used in person re-identification, a Feature Pull Loss (FPL) was proposed to reduce the distances among local and global features, and between complete image features and cloth-free image features. On the PRCC (Person ReID under moderate Clothing Change) and VC-Clothes (Virtually Changing-Clothes) datasets, the mean Average Precision (mAP) of the proposed model improved by 4.6 and 0.9 percentage points respectively compared to the Clothing-based Adversarial Loss (CAL) model. On the Celeb-reID (Celebrities re-IDentification) and Celeb-reID-light (a light version of Celebrities re-IDentification) datasets, the mAP of the proposed model improved by 0.2 and 5.0 percentage points respectively compared with the Joint Loss Capsule Network (JLCN) model. The experimental results show that the proposed method has clear advantages in highlighting effective information in cloth-changing scenarios.

    Overview of research and application of knowledge graph in equipment fault diagnosis
    Jie WU, Ansi ZHANG, Maodong WU, Yizong ZHANG, Congbao WANG
    Journal of Computer Applications    2024, 44 (9): 2651-2659.   DOI: 10.11772/j.issn.1001-9081.2023091280
    Abstract views: 426 | HTML views: 43 | PDF (2858KB) downloads: 346

    Useful knowledge can be extracted from equipment fault diagnosis data to construct a knowledge graph, which can effectively manage complex equipment fault diagnosis information in the form of triples (entity, relationship, entity) and enable rapid diagnosis of equipment faults. Firstly, the related concepts of knowledge graphs for equipment fault diagnosis were introduced, and the framework of a knowledge graph for the equipment fault diagnosis domain was analyzed. Secondly, the domestic and international research status of several key technologies, such as knowledge extraction, knowledge fusion and knowledge reasoning for equipment fault diagnosis knowledge graphs, was summarized. Finally, the applications of knowledge graphs in equipment fault diagnosis were summarized, some shortcomings and challenges in constructing knowledge graphs in this field were pointed out, and new ideas were provided for future work in equipment fault diagnosis.
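
    The triple form (entity, relationship, entity) mentioned above can be illustrated with a minimal in-memory store (all fault names below are hypothetical, not from the surveyed systems):

```python
# minimal in-memory triple store (all fault names are hypothetical)
triples = [
    ("bearing", "has_fault", "inner_race_wear"),
    ("inner_race_wear", "causes", "vibration_spike"),
    ("vibration_spike", "detected_by", "accelerometer"),
]

def query(store, head=None, rel=None, tail=None):
    """Return triples matching the pattern; None acts as a wildcard."""
    return [(h, r, t) for (h, r, t) in store
            if (head is None or h == head)
            and (rel is None or r == rel)
            and (tail is None or t == tail)]

print(query(triples, head="bearing"))  # → [('bearing', 'has_fault', 'inner_race_wear')]
```

    Real systems store such triples in a graph database and run multi-hop reasoning over them, but the pattern-matching idea is the same.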

    EEG classification based on channel selection and multi-dimensional feature fusion
    Shuying YANG, Haiming GUO, Xin LI
    Journal of Computer Applications    2023, 43 (11): 3418-3427.   DOI: 10.11772/j.issn.1001-9081.2022101590
    Abstract views: 408 | HTML views: 18 | PDF (3363KB) downloads: 247

    To solve the problems of mutual interference among multi-channel ElectroEncephaloGraphy (EEG) signals, varying classification results caused by individual differences, and the low recognition rate of single-domain features, a method of channel selection and feature fusion was proposed. Firstly, the acquired EEG was preprocessed, and the important channels were selected by using Gradient Boosting Decision Tree (GBDT). Secondly, a Generalized Predictive Control (GPC) model was used to construct prediction signals of the important channels and distinguish the subtle differences among multi-dimensional correlated signals, and the SE-TCNTA (Squeeze and Excitation block-Temporal Convolutional Network-Temporal Attention) model was used to extract temporal features between different frames. Thirdly, the Pearson correlation coefficient was used to calculate the relationships between channels, the frequency-domain features of the EEG and the control values of the prediction signals were extracted as inputs, the spatial graph structure was established, and a Graph Convolutional Network (GCN) was used to extract frequency-domain and spatial-domain features. Finally, the above two kinds of features were input to a fully connected layer for feature fusion to realize EEG classification. Experimental results on the public dataset BCICIV_2a show that, in the case of channel selection, compared with the EEG-inception model for ERP detection and the DSCNN (Shallow Double-branch Convolutional Neural Network) model that also uses double-branch feature extraction, the proposed method increases classification accuracy by 1.47% and 1.69% respectively, and the Kappa value by 1.25% and 2.53% respectively. The proposed method can improve EEG classification accuracy and reduce the influence of redundant data on feature extraction, so it is more suitable for Brain-Computer Interface (BCI) systems.
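
    The spatial graph construction step, building channel adjacency from Pearson correlations, can be sketched as follows (synthetic EEG and an assumed pruning threshold, not the paper's settings):

```python
import numpy as np

# synthetic EEG: 4 channels x 256 samples (stand-in for preprocessed data)
rng = np.random.default_rng(0)
eeg = rng.standard_normal((4, 256))

# pairwise Pearson correlation as a weighted adjacency matrix
adj = np.abs(np.corrcoef(eeg))
np.fill_diagonal(adj, 0.0)   # no self-loops
adj[adj < 0.1] = 0.0         # prune weak links (threshold is an assumption)
print(adj.shape)
```

    The resulting symmetric matrix is what a GCN layer would use to propagate features between correlated channels.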

    Time series classification method based on multi-scale cross-attention fusion in time-frequency domain
    Mei WANG, Xuesong SU, Jia LIU, Ruonan YIN, Shan HUANG
    Journal of Computer Applications    2024, 44 (6): 1842-1847.   DOI: 10.11772/j.issn.1001-9081.2023060731
    Abstract views: 401 | HTML views: 9 | PDF (2511KB) downloads: 282

    To address the low classification accuracy caused by insufficient interaction of the potential information between time series subsequences, a time series classification method based on multi-scale cross-attention fusion in the time-frequency domain, called TFFormer (Time-Frequency Transformer), was proposed. First, the time-domain sequence and the frequency spectrum of the original time series were each divided into subsequences of the same length, and the point-value coupling problem was solved by adding positional embeddings after linear projection. Then, an Improved Multi-Head self-Attention (IMHA) mechanism made the model focus on the more important time series features, alleviating the long-term dependency problem. Finally, a multi-scale Cross-Modality Attention (CMA) module was proposed to enhance the interaction between the time domain and the frequency domain, so that the model could further mine the frequency information of the time series. The experimental results show that, compared with the Fully Convolutional Network (FCN), the classification accuracy of the proposed method on the Trace, StarLightCurves and UWaveGestureLibraryAll datasets increased by 0.3, 0.9 and 1.4 percentage points respectively. This proves that enhancing the information interaction between the time domain and the frequency domain of a time series improves both model convergence speed and classification accuracy.
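
    The first step, splitting a series into equal-length subsequences, projecting them linearly and adding positional embeddings, can be sketched as follows (dimensions and random projections are illustrative stand-ins for learned parameters):

```python
import numpy as np

def embed_series(x, patch_len, d_model, rng):
    """Split a 1-D series into equal-length subsequences, project each
    linearly, and add a positional embedding (random stand-ins here for
    what would be learned parameters)."""
    n = len(x) // patch_len
    patches = x[: n * patch_len].reshape(n, patch_len)
    W = rng.standard_normal((patch_len, d_model)) / np.sqrt(patch_len)
    pos = 0.02 * rng.standard_normal((n, d_model))
    return patches @ W + pos

rng = np.random.default_rng(1)
tokens = embed_series(rng.standard_normal(128), patch_len=16, d_model=32, rng=rng)
print(tokens.shape)  # → (8, 32)
```

    The same tokenization is applied to the frequency spectrum, giving the two token streams that the cross-modality attention later fuses.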

    Chinese word segmentation method in electric power domain based on improved BERT
    Fei XIA, Shuaiqi CHEN, Min HUA, Bihong JIANG
    Journal of Computer Applications    2023, 43 (12): 3711-3718.   DOI: 10.11772/j.issn.1001-9081.2022121897
    Abstract views: 395 | HTML views: 17 | PDF (1953KB) downloads: 225

    To solve the problem of poor performance in segmenting the large number of proprietary words in Chinese text in the electric power domain, an improved Chinese Word Segmentation (CWS) method for the electric power domain based on improved BERT (Bidirectional Encoder Representations from Transformers) was proposed. Firstly, two lexicons were built covering general words and domain words respectively, and a dual-lexicon matching and integration mechanism was designed to integrate the word features directly into the BERT model, enabling more effective utilization of external knowledge by the model. Then, the DEEPNORM method was introduced to improve the model's ability to extract features, and the optimal depth of the model was determined by the Bayesian Information Criterion (BIC), which made the BERT model stable up to 40 layers. Finally, the classical self-attention layer in the BERT model was replaced by the ProbSparse self-attention layer, and the best value of the sampling factor was determined by the Particle Swarm Optimization (PSO) algorithm to reduce model complexity while ensuring model performance. Word segmentation tests were carried out on a hand-labeled patent text dataset in the electric power domain. Experimental results show that the proposed method achieves an F1 score of 92.87%, which is 14.70, 9.89 and 3.60 percentage points higher than those of the compared methods, namely the Hidden Markov Model (HMM), the multi-standard word segmentation model METASEG (pre-training model with META learning for Chinese word SEGmentation) and the Lexicon Enhanced BERT (LEBERT) model, verifying that the proposed method effectively improves the quality of Chinese word segmentation in the electric power domain.

    Book spine segmentation algorithm based on improved DeepLabv3+ network
    Xiaofei JI, Kexin ZHANG, Lirong TANG
    Journal of Computer Applications    2023, 43 (12): 3927-3932.   DOI: 10.11772/j.issn.1001-9081.2022121887
    Abstract views: 394 | HTML views: 11 | PDF (2364KB) downloads: 206

    The location of books is one of the critical technologies for the intelligent development of libraries, and accurate book spine segmentation has become a major challenge in achieving this goal. To this end, a book spine segmentation algorithm based on an improved DeepLabv3+ network was proposed, aiming to solve the difficulties in book spine segmentation caused by dense arrangement, skewed angles of books, and extremely similar book spine textures. Firstly, to extract denser pyramid features of book images, the Atrous Spatial Pyramid Pooling (ASPP) in the original DeepLabv3+ network was replaced by the multi-dilation-rate, multi-scale DenseASPP (Dense Atrous Spatial Pyramid Pooling) module. Secondly, to solve the insensitivity of the original DeepLabv3+ network to the segmentation boundaries of objects with large aspect ratios, a Strip Pooling (SP) module was added to the branch of the DenseASPP module to enhance the strip features of book spines. Finally, based on the Multi-Head Self-Attention (MHSA) mechanism in ViT (Vision Transformer), a self-attention mechanism based on global information enhancement was proposed to improve the network's ability to capture long-distance features. The proposed algorithm was tested and compared on an open-source database. Experimental results show that, compared with the original DeepLabv3+ segmentation algorithm, the proposed algorithm improves the Mean Intersection over Union (MIoU) by 1.8 percentage points on the nearly vertical book spine database and by 4.1 percentage points on the skewed book spine database, where its MIoU reaches 93.3%. This confirms that the proposed algorithm achieves accurate segmentation of book spines with skewed angles, dense arrangement, and large aspect ratios.
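
    Strip pooling, used here to capture the long, narrow shape of book spines, averages a feature map along each row and each column instead of over square windows. A minimal sketch:

```python
import numpy as np

def strip_pool(fmap):
    """Average over each row and each column of an H x W feature map and
    broadcast the two strips back, highlighting long, narrow structures."""
    h_strip = fmap.mean(axis=1, keepdims=True)  # H x 1
    w_strip = fmap.mean(axis=0, keepdims=True)  # 1 x W
    return h_strip + w_strip                    # broadcast to H x W

f = np.arange(12, dtype=float).reshape(3, 4)
print(strip_pool(f).shape)  # → (3, 4)
```

    In the actual SP module the pooled strips pass through convolutions before fusion; the sketch keeps only the row/column pooling idea.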

    Heterogeneous hypernetwork representation learning method with hyperedge constraint
    Keke WANG, Yu ZHU, Xiaoying WANG, Jianqiang HUANG, Tengfei CAO
    Journal of Computer Applications    2023, 43 (12): 3654-3661.   DOI: 10.11772/j.issn.1001-9081.2022121908
    Abstract views: 394 | HTML views: 34 | PDF (2264KB) downloads: 221

    Compared with ordinary networks, hypernetworks have complex tuple relationships, namely hyperedges. However, most existing network representation learning methods cannot capture these tuple relationships. To solve this problem, a Heterogeneous hypernetwork Representation learning method with Hyperedge Constraint (HRHC) was proposed. Firstly, a method combining clique extension and star extension was introduced to transform the heterogeneous hypernetwork into a heterogeneous network. Then, a meta-path walk method aware of the semantic relevance among nodes was introduced to capture the semantic relationships among heterogeneous nodes. Finally, the tuple relationships among nodes were captured by means of the hyperedge constraint to obtain high-quality node representation vectors. Experimental results on three real-world datasets show that, for the link prediction task, the proposed method obtains good results on the drug, GPS and MovieLens datasets. For the hypernetwork reconstruction task, when the hyperedge reconstruction ratio is more than 0.6, the ACCuracy (ACC) of the proposed method is better than that of the suboptimal method Hyper2vec (biased 2nd order random walks in Hyper-networks), and the average ACC of the proposed method exceeds that of the suboptimal method HRHC based on incidence graph (HRHC-incidence graph) by 15.6 percentage points on the GPS dataset.
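
    Clique extension and star extension, the two hypergraph-to-graph transformations combined in the first step, can be sketched as follows (toy hyperedges with hypothetical node names):

```python
def clique_expand(hyperedges):
    """Clique expansion: connect every pair of nodes within each hyperedge."""
    edges = set()
    for he in hyperedges:
        members = sorted(he)
        for i in range(len(members)):
            for j in range(i + 1, len(members)):
                edges.add((members[i], members[j]))
    return edges

def star_expand(hyperedges):
    """Star expansion: add one auxiliary node per hyperedge, linked to its members."""
    edges = set()
    for k, he in enumerate(hyperedges):
        aux = f"e{k}"
        for v in he:
            edges.add((aux, v))
    return edges

H = [{"drug1", "gene1", "disease1"}, {"drug1", "gene2"}]
print(len(clique_expand(H)), len(star_expand(H)))  # → 4 5
```

    Clique expansion preserves pairwise co-occurrence but loses the tuple boundary, while star expansion keeps each hyperedge as an explicit node; combining them, as HRHC does, retains both views.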

    Review of evolutionary multitasking from the perspective of optimization scenarios
    Jiawei ZHAO, Xuefeng CHEN, Liang FENG, Yaqing HOU, Zexuan ZHU, Yew‑Soon Ong
    Journal of Computer Applications    2024, 44 (5): 1325-1337.   DOI: 10.11772/j.issn.1001-9081.2024020208
    Abstract views: 393 | HTML views: 66 | PDF (1383KB) downloads: 551

    Due to the escalating complexity of optimization problems, traditional evolutionary algorithms increasingly struggle with high computational costs and limited adaptability. Evolutionary MultiTasking Optimization (EMTO) algorithms have emerged as a novel solution, leveraging knowledge transfer to tackle multiple optimization problems concurrently, thereby enhancing the efficiency of evolutionary algorithms in complex scenarios. The current progress of evolutionary multitasking optimization research was summarized, different research perspectives were explored by reviewing the existing literature, and the notable absence of optimization scenario analysis was highlighted. By focusing on the application scenarios of optimization problems, the scenarios suitable for evolutionary multitasking optimization and their fundamental solution strategies were systematically outlined, which can help researchers select appropriate methods for specific application needs. Moreover, an in-depth discussion of the current challenges and future directions of EMTO was presented to provide guidance and insights for advancing research in this field.

    Hyperparameter optimization for neural network based on improved real coding genetic algorithm
    Wei SHE, Yang LI, Lihong ZHONG, Defeng KONG, Zhao TIAN
    Journal of Computer Applications    2024, 44 (3): 671-676.   DOI: 10.11772/j.issn.1001-9081.2023040441
    Abstract views: 392 | HTML views: 51 | PDF (1532KB) downloads: 509

    To address the problems of poor effects, easily falling into suboptimal solutions, and inefficiency in neural network hyperparameter optimization, an Improved Real Coding Genetic Algorithm (IRCGA) based hyperparameter optimization algorithm for neural networks was proposed, named IRCGA-DNN (IRCGA for Deep Neural Network). Firstly, a real-coded form was used to represent the values of hyperparameters, making the hyperparameter search space more flexible. Then, a hierarchical proportional selection operator was introduced to enhance the diversity of the solution set. Finally, improved single-point crossover and mutation operators were designed to explore the hyperparameter space more thoroughly and improve the efficiency and quality of the optimization. Two simulation datasets were used to evaluate the performance of IRCGA-DNN in damage effectiveness prediction and convergence efficiency. The experimental results on the two datasets indicate that, compared to GA-DNN (Genetic Algorithm for Deep Neural Network), the proposed algorithm reduces the convergence iterations by 8.7% and 13.6% respectively, with little difference in Mean Square Error (MSE); compared to IGA-DNN (Improved Genetic Algorithm for Deep Neural Network), IRCGA-DNN reduces the convergence iterations by 22.2% and 13.6% respectively. The proposed algorithm is better in both convergence speed and prediction performance, and is suitable for hyperparameter optimization of neural networks.
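
    The real-coded representation with single-point crossover and mutation can be sketched as follows (the hyperparameter bounds and rates are illustrative assumptions, not the paper's settings, and the improved operators are simplified to their classical forms):

```python
import random

def single_point_crossover(p1, p2, rng):
    """Single-point crossover on real-coded chromosomes."""
    cut = rng.randrange(1, len(p1))
    return p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]

def mutate(chrom, bounds, rate, rng):
    """Uniform mutation: resample a gene within its bounds with probability rate."""
    return [rng.uniform(lo, hi) if rng.random() < rate else g
            for g, (lo, hi) in zip(chrom, bounds)]

rng = random.Random(42)
# illustrative hyperparameter bounds: learning rate, hidden units, dropout
bounds = [(1e-4, 1e-1), (16, 256), (0.0, 0.9)]
a, b = [1e-3, 64, 0.5], [1e-2, 128, 0.2]
c1, c2 = single_point_crossover(a, b, rng)
child = mutate(c1, bounds, rate=0.1, rng=rng)
print(len(child))  # → 3
```

    Real coding keeps each gene in its natural range, so no binary encode/decode step is needed and mutation can move smoothly within the bounds.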

    Logo detection algorithm based on improved YOLOv5
    Yeheng LI, Guangsheng LUO, Qianmin SU
    Journal of Computer Applications    2024, 44 (8): 2580-2587.   DOI: 10.11772/j.issn.1001-9081.2023081113
    Abstract views: 385 | HTML views: 3 | PDF (4682KB) downloads: 211

    To address the challenges posed by complex backgrounds and the varying sizes of logo images, an improved detection algorithm based on YOLOv5 was proposed. Firstly, the Convolutional Block Attention Module (CBAM) was combined to compress the image along both the channel and spatial dimensions to extract critical information and significant regions within the image. Subsequently, Switchable Atrous Convolution (SAC) was employed to allow the network to adaptively adjust the receptive field size in feature maps at different scales, improving the detection of objects across multiple scales. Finally, the Normalized Wasserstein Distance (NWD) was embedded into the loss function: the bounding boxes were modeled as 2D Gaussian distributions and the similarity between the corresponding Gaussians was calculated to better measure the similarity between objects, thereby enhancing the detection performance for small objects and improving model robustness and stability. Compared to the original YOLOv5 algorithm, on the small dataset FlickrLogos-32, the improved algorithm achieved a mean Average Precision (mAP@0.5) of 90.6%, an increase of 1 percentage point; on the large dataset QMULOpenLogo, it achieved an mAP@0.5 of 62.7%, an increase of 2.3 percentage points; and on three types of logos in LogoDet3K, it increased the mAP@0.5 by 1.2, 1.4, and 1.4 percentage points respectively. Experimental results demonstrate that the improved algorithm has better small object detection ability for logo images.
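
    The NWD term models each box as a 2-D Gaussian and maps the Wasserstein distance between the Gaussians through an exponential. A sketch of the commonly used closed form (the constant C is dataset-dependent and assumed here):

```python
import numpy as np

def nwd(box1, box2, C=12.8):
    """Normalized Wasserstein Distance between (cx, cy, w, h) boxes, each
    treated as a 2-D Gaussian; C is a dataset-dependent scale (assumed)."""
    cx1, cy1, w1, h1 = box1
    cx2, cy2, w2, h2 = box2
    dist_sq = ((cx1 - cx2) ** 2 + (cy1 - cy2) ** 2
               + ((w1 - w2) / 2) ** 2 + ((h1 - h2) / 2) ** 2)
    return np.exp(-np.sqrt(dist_sq) / C)

print(round(nwd((10, 10, 4, 4), (10, 10, 4, 4)), 3))  # identical boxes → 1.0
```

    Unlike IoU, this similarity stays smooth and nonzero even when tiny boxes do not overlap, which is why it helps small-object detection.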

    Incentive mechanism for federated learning based on generative adversarial network
    Sunjie YU, Hui ZENG, Shiyu XIONG, Hongzhou SHI
    Journal of Computer Applications    2024, 44 (2): 344-352.   DOI: 10.11772/j.issn.1001-9081.2023020244
    Abstract views: 380 | HTML views: 27 | PDF (2639KB) downloads: 273

    Focusing on the current lack of a fair and reasonable incentive mechanism for federated learning, and the difficulty of measuring the contributions of participant nodes with different data volumes, data qualities and data distributions, a new incentive mechanism for federated learning based on Generative Adversarial Network (GAN) was proposed. Firstly, a GAN with Trained model (GANT) was proposed to achieve high-precision sample generation. Then, the contribution evaluation algorithm of the incentive mechanism was implemented based on GANT. The algorithm filtered samples and generated data labels through the joint model, and introduced the local data labels of the participant nodes to balance the impact of non-independent identically distributed data labels on contribution evaluation. Finally, a two-stage Stackelberg game was used to realize the federated learning incentive process. The security analysis shows that the proposed incentive mechanism ensures data security and system stability in the process of federated learning. The experimental results show that the proposed incentive mechanism is correct, and the contribution evaluation algorithm performs well under different data volumes, data qualities and data distributions.

    Path planning algorithm of manipulator based on path imitation and SAC reinforcement learning
    Ziyang SONG, Junhuai LI, Huaijun WANG, Xin SU, Lei YU
    Journal of Computer Applications    2024, 44 (2): 439-444.   DOI: 10.11772/j.issn.1001-9081.2023020132
    Abstract views: 376 | HTML views: 17 | PDF (2673KB) downloads: 342

    In the training process of manipulator path planning algorithms, the huge action and state spaces lead to sparse rewards and low training efficiency, and evaluating the value of states and actions becomes challenging given their immense number. To address these problems, a manipulator path planning algorithm based on SAC (Soft Actor-Critic) reinforcement learning was proposed. Learning efficiency was improved by incorporating a demonstrated path into the reward function, so that the manipulator imitated the demonstrated path during reinforcement learning, and the SAC algorithm made the training of the path planning algorithm faster and more stable. The proposed algorithm and the Deep Deterministic Policy Gradient (DDPG) algorithm were each used to plan 10 paths; the average distances between the planned paths and the reference paths were 0.8 cm and 1.9 cm respectively. The experimental results show that the path imitation mechanism improves training efficiency, and that the proposed algorithm explores the environment better and produces more reasonable paths than the DDPG algorithm.
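
    One simple way to fold a demonstrated path into the reward, in the spirit of the path imitation mechanism described above, is to penalize the distance to the nearest demonstrated waypoint (the paper's exact shaping term is not specified; this is an assumed form with toy values):

```python
import numpy as np

def imitation_reward(position, demo_path, w=1.0):
    """Reward shaping sketch: penalize the distance from the current
    end-effector position to the nearest demonstrated waypoint."""
    d = np.min(np.linalg.norm(demo_path - position, axis=1))
    return -w * d

# demonstrated path: three waypoints along the x axis (toy values)
demo = np.array([[0.0, 0.0, 0.0], [0.1, 0.0, 0.0], [0.2, 0.0, 0.0]])
print(imitation_reward(np.array([0.1, 0.05, 0.0]), demo))
```

    Added to the task reward, such a term gives dense feedback everywhere in the workspace instead of only at the goal, which is what combats reward sparsity.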

    Deep shadow defense scheme of federated learning based on generative adversarial network
    Hui ZHOU, Yuling CHEN, Xuewei WANG, Yangwen ZHANG, Jianjiang HE
    Journal of Computer Applications    2024, 44 (1): 223-232.   DOI: 10.11772/j.issn.1001-9081.2023010088
    Abstract views: 371 | HTML views: 7 | PDF (4561KB) downloads: 173

    Federated Learning (FL) allows users to share and interact with multiple parties without directly uploading their original data, effectively reducing the risk of privacy leaks. However, existing research suggests that an adversary can still reconstruct raw data from shared gradient information. To further protect the privacy of federated learning, a deep shadow defense scheme for federated learning based on Generative Adversarial Network (GAN) was proposed. The distribution features of the original real data were learned by GAN to generate replaceable shadow data. Then, the original model trained on real data was replaced by a shadow model trained on shadow data, which was not directly accessible to the adversary. Finally, the real gradient was replaced by the shadow gradient generated by the shadow data in the shadow model, which was also not accessible to the adversary. Experiments were conducted on the CIFAR10 and CIFAR100 datasets to compare the proposed scheme with five defense schemes: adding noise, gradient clipping, gradient compression, representation perturbation, and local regularization and sparsification. On the CIFAR10 dataset, the Mean Square Error (MSE) and the Feature Mean Square Error (FMSE) of the proposed scheme were 1.18 to 5.34 times and 4.46 to 1.03×10⁷ times those of the compared schemes, and its Peak Signal-to-Noise Ratio (PSNR) was 49.9% to 90.8% of theirs. On the CIFAR100 dataset, the MSE and the FMSE of the proposed scheme were 1.04 to 1.06 times and 5.93 to 4.24×10³ times those of the compared schemes, and its PSNR was 96.0% to 97.6% of theirs. As a deep shadow defense method, the proposed scheme takes into account the actual attack capability of the adversary and the problems in shadow model training, and designs threat models and shadow model generation algorithms accordingly. It performs better than the compared schemes in both theoretical analysis and experimental results, and can effectively reduce the risk of federated learning privacy leaks while ensuring accuracy.
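
    PSNR, one of the three metrics reported above, is derived directly from MSE; a lower attacker-side PSNR means a worse reconstruction of the raw data and hence a stronger defense. A minimal sketch:

```python
import numpy as np

def psnr(img1, img2, max_val=1.0):
    """Peak Signal-to-Noise Ratio in dB, computed from the MSE."""
    mse = np.mean((img1 - img2) ** 2)
    return 10.0 * np.log10(max_val ** 2 / mse)

a = np.zeros((8, 8))
b = np.full((8, 8), 0.1)     # uniform reconstruction error of 0.1
print(round(psnr(a, b), 1))  # → 20.0
```

    This is why the defense reports higher MSE/FMSE and lower PSNR than the baselines: all three move in the direction of less faithful reconstructions.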

    Correlation filtering based target tracking with nonlinear temporal consistency
    Wentao JIANG, Wanxuan LI, Shengchong ZHANG
    Journal of Computer Applications    2024, 44 (8): 2558-2570.   DOI: 10.11772/j.issn.1001-9081.2023081121
    Abstract views: 370 | HTML views: 0 | PDF (7942KB) downloads: 69

    Concerning the problem that existing target tracking algorithms mainly use the linear constraint mechanism of LADCF (Learning Adaptive Discriminative Correlation Filters), which easily causes model drift, a correlation filtering based target tracking algorithm with nonlinear temporal consistency was proposed. First, a nonlinear temporal consistency term was proposed based on Stevens' Law, which aligns closely with the characteristics of human visual perception. This term allowed the model to track the target smoothly, ensuring tracking continuity and preventing model drift. Next, the Alternating Direction Method of Multipliers (ADMM) was employed to compute the optimal function value, ensuring real-time performance. Lastly, Stevens' Law was used for nonlinear filter updating, enabling the update factor to strengthen or suppress the filter update according to target changes, thereby adapting to those changes and preventing filter degradation. Comparison experiments with mainstream correlation filtering and deep learning algorithms were performed on four standard datasets. Compared with the baseline algorithm LADCF, the tracking precision and success rate of the proposed algorithm improved by 2.4 and 3.8 percentage points on the OTB100 dataset, and by 1.5 and 2.5 percentage points on the UAV123 dataset. The experimental results show that the proposed algorithm effectively avoids tracking model drift, reduces the likelihood of filter degradation, achieves higher tracking precision and success rate, and is more robust in complicated situations such as occlusion and illumination changes.
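
    Stevens' Law states that perceived magnitude grows as a power of stimulus intensity, psi = k * phi^a; with an exponent below 1 it compresses large changes relative to small ones, which is the property exploited above for smooth tracking and damped filter updates (the values of k and a below are assumed for illustration):

```python
def stevens(phi, k=1.0, a=0.5):
    """Stevens' power law: perceived magnitude psi = k * phi ** a.
    With a < 1, large stimulus changes are compressed (k, a assumed)."""
    return k * phi ** a

# relative response: small changes pass almost unchanged, large ones are damped
print(round(stevens(0.04) / 0.04, 6), round(stevens(4.0) / 4.0, 6))  # → 5.0 0.5
```

    Applied to the update factor, a small appearance change updates the filter strongly while an abrupt one (e.g. occlusion) is suppressed, which is how the nonlinear update resists degradation.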

    Semantically enhanced sentiment classification model based on multi-level attention
    Jianle CAO, Nana LI
    Journal of Computer Applications    2023, 43 (12): 3703-3710.   DOI: 10.11772/j.issn.1001-9081.2022121894

    The existing text sentiment classification methods face serious challenges due to the complex semantics of natural language, the multiple sentiment polarities of words, and the long-term dependency of text. To solve these problems, a semantically enhanced sentiment classification model based on multi-level attention was proposed. Firstly, the contextualized dynamic word embedding technology was used to mine the multiple semantic information of words, and the context semantics was modeled. Secondly, the long-term dependency within the text was captured by the multi-layer parallel multi-head self-attention in the internal attention layer to obtain comprehensive text feature information. Thirdly, in the external attention layer, the summary information in the review metadata was integrated into the review features through a multi-level attention mechanism to enhance the sentiment information and semantic expression ability of the review features. Finally, the global average pooling layer and Softmax function were used to realize sentiment classification. Experimental results on four Amazon review datasets show that, compared with the best-performing TE-GRU (Transformer Encoder with Gated Recurrent Unit) in the baseline models, the proposed model improves the sentiment classification accuracy on App, Kindle, Electronic and CD datasets by at least 0.36, 0.34, 0.58 and 0.66 percentage points, which verifies that the proposed model can further improve the sentiment classification performance.
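    The two-level attention pipeline above (internal self-attention over review tokens, then external attention that injects the metadata summary, then global average pooling and Softmax) can be sketched in miniature. This is a simplified single-head, numpy-only illustration; the shapes, the additive fusion, and the function names are assumptions, and the real model uses multi-layer parallel multi-head attention over contextualized embeddings.

    ```python
    import numpy as np

    def softmax(x, axis=-1):
        e = np.exp(x - x.max(axis=axis, keepdims=True))
        return e / e.sum(axis=axis, keepdims=True)

    def attention(q, k, v):
        # scaled dot-product attention
        scores = q @ k.T / np.sqrt(q.shape[-1])
        return softmax(scores) @ v

    def classify(review, summary, w_out):
        # internal attention layer: self-attention over review token features
        h = attention(review, review, review)
        # external attention layer: review features attend to summary features,
        # folding the metadata's sentiment cues into the review representation
        h = h + attention(h, summary, summary)
        # global average pooling over tokens, then a Softmax classifier
        return softmax(h.mean(axis=0) @ w_out)
    ```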

    Video dynamic scene graph generation model based on multi-scale spatial-temporal Transformer
    Jia WANG-ZHU, Zhou YU, Jun YU, Jianping FAN
    Journal of Computer Applications    2024, 44 (1): 47-57.   DOI: 10.11772/j.issn.1001-9081.2023060861

    To address the challenge of dynamic changes in object relationships over time in videos, a video dynamic scene graph generation model based on multi-scale spatial-temporal Transformer was proposed. The multi-scale modeling idea was introduced into the classic Transformer architecture to precisely model dynamic fine-grained semantics in videos. First, in the spatial dimension, the attention was given to both the global spatial correlations of objects, similar to traditional models, and the local spatial correlations among objects’ relative positions, which facilitated a better understanding of interactive dynamics between people and objects, leading to more accurate semantic analysis results. Then, in the temporal dimension, not only the traditional short-term temporal correlations of objects in videos were modeled, but also the long-term temporal correlations of the same object pairs throughout the entire videos were emphasized. Comprehensive modeling of long-term relationships between objects assisted in generating more accurate and coherent scene graphs, mitigating issues arising from occlusions, overlaps, etc. during scene graph generation. Finally, through the collaborative efforts of the spatial encoder and temporal encoder, dynamic fine-grained semantics in videos were captured more accurately by the model, avoiding limitations inherent in traditional single-scale approaches. The experimental results show that, compared to the baseline model STTran, the proposed model achieves an increase of 5.0 percentage points, 2.8 percentage points, and 2.9 percentage points in terms of Recall@10 for the tasks of predicate classification, scene graph classification, and scene graph detection, respectively, on the Action Genome benchmark dataset. This demonstrates that the multi-scale modeling concept can enhance precision and effectively boost performance in dynamic video scene graph generation tasks.
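    The spatial side of the multi-scale idea — attending globally over all objects while also attending locally over nearby ones, then fusing the two scales — can be sketched as below. This is an illustrative reduction under stated assumptions: locality is approximated by index distance rather than the model's actual relative spatial positions, a single head is used, and all names are invented for the sketch.

    ```python
    import numpy as np

    def softmax(x, axis=-1):
        e = np.exp(x - x.max(axis=axis, keepdims=True))
        return e / e.sum(axis=axis, keepdims=True)

    def masked_attention(x, mask):
        # self-attention over object features x (n, d), restricted by a mask
        scores = x @ x.T / np.sqrt(x.shape[-1])
        scores = np.where(mask, scores, -1e9)  # block masked-out pairs
        return softmax(scores) @ x

    def local_window_mask(n, window):
        # local spatial scale: each object attends only to nearby objects
        idx = np.arange(n)
        return np.abs(idx[:, None] - idx[None, :]) <= window

    def multi_scale_spatial(x, window=1):
        n = x.shape[0]
        global_feat = masked_attention(x, np.ones((n, n), dtype=bool))
        local_feat = masked_attention(x, local_window_mask(n, window))
        return global_feat + local_feat  # fuse the two spatial scales
    ```

    The same global-plus-local pattern applies on the temporal axis, where the "window" spans frames instead of object positions.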

    Survey on hypergraph application methods: research problem, progress, and challenges
    Journal of Computer Applications    DOI: 10.11772/j.issn.1001-9081.2023111629
    Online available: 29 January 2024

    Semantic segmentation method for remote sensing images based on multi-scale feature fusion
    Ning WU, Yangyang LUO, Huajie XU
    Journal of Computer Applications    2024, 44 (3): 737-744.   DOI: 10.11772/j.issn.1001-9081.2023040439

    To improve the accuracy of semantic segmentation for remote sensing images and address the loss of small-sized target information during feature extraction by Deep Convolutional Neural Network (DCNN), a semantic segmentation method based on multi-scale feature fusion named FuseSwin was proposed. Firstly, an Attention Enhancement Module (AEM) was introduced in the Swin Transformer to highlight the target area and suppress background noise. Secondly, the Feature Pyramid Network (FPN) was used to fuse the detailed information and high-level semantic information of the multi-scale features to complement the features of the target. Finally, the Atrous Spatial Pyramid Pooling (ASPP) module was used to capture the contextual information of the target from the fused feature map and further improve the segmentation accuracy of the model. Experimental results demonstrate that the proposed method outperforms current mainstream segmentation methods. The mean Pixel Accuracy (mPA) and mean Intersection over Union (mIoU) of the proposed method on the Potsdam remote sensing dataset are 2.34 and 3.23 percentage points higher than those of the DeepLabV3 method, and 1.28 and 1.75 percentage points higher than those of the SegFormer method. Additionally, the proposed method was applied to identify and segment oyster rafts in high-resolution remote sensing images of the Maowei Sea in Qinzhou, Guangxi, and achieved Pixel Accuracy (PA) and Intersection over Union (IoU) of 96.21% and 91.70%, respectively.
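    The FPN step above — projecting each scale to a common channel width with 1x1 lateral convolutions and merging top-down with upsampling — can be sketched as follows. This is a generic FPN illustration, not FuseSwin's implementation; the nearest-neighbour upsampling, the einsum-based 1x1 convolution, and all names are assumptions of the sketch.

    ```python
    import numpy as np

    def conv1x1(w, x):
        # 1x1 "lateral" convolution: project (C_in, H, W) to (C_out, H, W)
        return np.einsum('oc,chw->ohw', w, x)

    def upsample2x(x):
        # nearest-neighbour upsampling of a (C, H, W) feature map
        return x.repeat(2, axis=1).repeat(2, axis=2)

    def fpn_fuse(features, laterals):
        # features: multi-scale maps ordered fine -> coarse; each level is
        # projected to a common width, then merged top-down: coarse semantic
        # maps are upsampled and added into the finer, more detailed maps
        top = conv1x1(laterals[-1], features[-1])
        fused = [top]
        for feat, w in zip(features[-2::-1], laterals[-2::-1]):
            top = conv1x1(w, feat) + upsample2x(top)
            fused.append(top)
        return fused[::-1]  # back to fine -> coarse order
    ```

    The finest fused map thus carries both the detail needed for small targets and the coarse semantics, which is what the subsequent ASPP module consumes.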

    Scene graph-aware cross-modal image captioning model
    Zhiping ZHU, Yan YANG, Jie WANG
    Journal of Computer Applications    2024, 44 (1): 58-64.   DOI: 10.11772/j.issn.1001-9081.2022071109

    Aiming at the forgetting and underutilization of the text information of images in image captioning methods, a Scene Graph-aware Cross-modal Network (SGC-Net) was proposed. Firstly, the scene graph was utilized as the image’s visual features, and the Graph Convolutional Network (GCN) was utilized for feature fusion, so that the visual and textual features were in the same feature space. Then, the text sequence generated by the model was stored, and the corresponding position information was added as the textual features of the image, so as to solve the problem of text feature loss caused by the single-layer Long Short-Term Memory (LSTM) network. Finally, to address the over-dependence on image information and the underuse of text information, the self-attention mechanism was utilized to extract significant image information and text information and fuse them. Experimental results on Flickr30K and MS-COCO (MicroSoft Common Objects in COntext) datasets demonstrate that SGC-Net outperforms Sub-GC on the indicators BLEU1 (BiLingual Evaluation Understudy with 1-gram), BLEU4 (BiLingual Evaluation Understudy with 4-grams), METEOR (Metric for Evaluation of Translation with Explicit ORdering), ROUGE (Recall-Oriented Understudy for Gisting Evaluation) and SPICE (Semantic Propositional Image Caption Evaluation), with improvements of 1.1, 0.9, 0.3, 0.7, 0.4 and 0.3, 0.1, 0.3, 0.5, 0.6, respectively. It can be seen that the method used by SGC-Net can effectively improve the model’s image captioning performance and the fluency of the generated descriptions.
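    The GCN fusion step above — propagating scene-graph node features along graph edges so they share a feature space with the text — can be sketched as one standard graph-convolution layer. This is a generic GCN layer with symmetric normalization, not SGC-Net's exact layer; the names and the ReLU choice are assumptions.

    ```python
    import numpy as np

    def gcn_layer(adj, x, w):
        # One graph convolution over scene-graph node features:
        # add self-loops, normalize the adjacency symmetrically,
        # then apply a linear projection followed by ReLU.
        a_hat = adj + np.eye(adj.shape[0])
        d_inv_sqrt = np.diag(1.0 / np.sqrt(a_hat.sum(axis=1)))
        return np.maximum(0.0, d_inv_sqrt @ a_hat @ d_inv_sqrt @ x @ w)
    ```

    Each node (object or relation in the scene graph) thus aggregates its neighbours' features before projection, which is what lets visual nodes absorb relational context.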

Honorary Editor-in-Chief: ZHANG Jingzhong
Editor-in-Chief: XU Zongben
Associate Editor: SHEN Hengtao XIA Zhaohui
Domestic Post Distribution Code: 62-110
Foreign Distribution Code: M4616
Address:
No. 9, 4th Section of South Renmin Road, Chengdu 610041, China
Tel: 028-85224283-803
  028-85222239-803
Website: www.joca.cn
E-mail: bjb@joca.cn