Most Read articles

    Federated learning survey: concepts, technologies, applications and challenges
    Tiankai LIANG, Bi ZENG, Guang CHEN
    Journal of Computer Applications    2022, 42 (12): 3651-3662.   DOI: 10.11772/j.issn.1001-9081.2021101821
    Abstract: 1309 | HTML: 34 | PDF (2464KB): 788

    Against the background of growing emphasis on data rights confirmation and privacy protection, federated learning, as a new machine learning paradigm, can solve the problems of data islands and privacy protection without exposing the data of the participants. Since modeling methods based on federated learning have become mainstream and achieved good results, it is important to summarize and analyze the concepts, technologies, applications and challenges of federated learning. Firstly, the development of machine learning and the inevitability of the emergence of federated learning were elaborated, and the definition and classification of federated learning were given. Secondly, the three federated learning methods currently recognized by the industry, namely horizontal federated learning, vertical federated learning and federated transfer learning, were introduced and analyzed. Thirdly, concerning the privacy protection issue of federated learning, the existing common privacy protection technologies were summarized. In addition, recent mainstream open-source frameworks were introduced and compared, and the application scenarios of federated learning were given. Finally, the challenges and future research directions of federated learning were discussed.
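    The horizontal federated learning setting summarized above can be illustrated with a minimal FedAvg-style sketch; this is not from the surveyed paper, and the toy one-parameter least-squares model, learning rate and data are illustrative assumptions. Each client trains on its own data, and only model parameters (never raw data) reach the server, which averages them weighted by local dataset size.

    ```python
    # Minimal sketch of horizontal federated averaging (FedAvg-style):
    # each client trains on local data; only parameters leave the device.

    def local_update(weight, data, lr=0.1):
        """One gradient-descent step on a 1-D least-squares model w*x ~ y."""
        grad = sum(2 * (weight * x - y) * x for x, y in data) / len(data)
        return weight - lr * grad

    def federated_average(client_weights, client_sizes):
        """Server aggregates client models, weighted by local dataset size."""
        total = sum(client_sizes)
        return sum(w * n for w, n in zip(client_weights, client_sizes)) / total

    # Two clients hold disjoint local datasets (the "data island" setting),
    # all consistent with the ground-truth relation y = 2x.
    clients = [[(1.0, 2.0), (2.0, 4.0)], [(3.0, 6.0)]]
    global_w = 0.0
    for _ in range(50):  # communication rounds
        updates = [local_update(global_w, d) for d in clients]
        global_w = federated_average(updates, [len(d) for d in clients])
    ```

    After a few dozen rounds the shared weight converges to the value a centralized fit would find, without either client revealing its data.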

    Table and Figures | Reference | Related Articles | Metrics
    Stock market volatility prediction method based on graph neural network with multi-attention mechanism
    Xiaohan LI, Jun WANG, Huading JIA, Liu XIAO
    Journal of Computer Applications    2022, 42 (7): 2265-2273.   DOI: 10.11772/j.issn.1001-9081.2021081487
    Abstract: 720 | HTML: 17 | PDF (2246KB): 236

    The stock market is an essential element of the financial market, so the study of its volatility plays a significant role in controlling financial risks and improving returns on investment, and has attracted wide attention from both academia and industry. However, stock market prices are driven by many factors, and it is challenging to mine and fuse the multi-source, heterogeneous data of the stock market efficiently. To fully capture the influence of different information sources and their interactions on price changes in the stock market, a graph neural network based on a multi-attention mechanism was proposed to predict stock market volatility. First, the relationship dimension was introduced to construct heterogeneous subgraphs from stock market transaction data and news text, and a multi-attention mechanism was adopted to fuse the graph data. Then, a graph neural network with Gated Recurrent Unit (GRU) was applied to perform graph classification, and on this basis the volatility of three important indexes was predicted: the Shanghai Composite Index, the Shanghai and Shenzhen 300 (CSI 300) Index, and the Shenzhen Component Index. Experimental results show that, in terms of heterogeneous information characteristics, news information has a lagged influence on stock volatility compared with transaction data; in terms of heterogeneous information fusion, compared with algorithms such as Support Vector Machine (SVM), Random Forest (RF) and Multiple Kernel k-Means (MKKM) clustering, the proposed method improves the prediction accuracy by 17.88, 30.00 and 38.00 percentage points respectively. In addition, a quantitative investment simulation was performed according to the model's trading strategy.
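    The attention-based fusion of heterogeneous sources described above can be sketched as follows; the 3-dimensional feature vectors and the query are hypothetical placeholders, not the paper's learned representations. Each source (e.g. transaction features, news features) is scored against a query, and the softmax-weighted sum gives the fused representation.

    ```python
    import math

    def softmax(scores):
        """Numerically stable softmax over a list of scores."""
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        return [e / total for e in exps]

    def attention_fuse(query, sources):
        """Fuse heterogeneous feature vectors by attention: score each source
        against a query vector, then take the softmax-weighted sum."""
        scores = [sum(q * s for q, s in zip(query, src)) for src in sources]
        weights = softmax(scores)
        dim = len(sources[0])
        fused = [sum(w * src[i] for w, src in zip(weights, sources))
                 for i in range(dim)]
        return fused, weights

    # Hypothetical 3-d features for transaction data and news text of one stock.
    transaction_feat = [0.9, 0.1, 0.3]
    news_feat = [0.2, 0.8, 0.5]
    fused, weights = attention_fuse(query=[1.0, 0.0, 0.0],
                                    sources=[transaction_feat, news_feat])
    ```

    With this query, the transaction source scores higher and therefore receives the larger attention weight in the fused vector.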

    Survey of multimodal pre-training models
    Huiru WANG, Xiuhong LI, Zhe LI, Chunming MA, Zeyu REN, Dan YANG
    Journal of Computer Applications    2023, 43 (4): 991-1004.   DOI: 10.11772/j.issn.1001-9081.2022020296
    Abstract: 709 | HTML: 86 | PDF (5539KB): 559 | PDF(mobile) (3280KB): 55

    By using complex pre-training objectives and a large number of model parameters, a Pre-Training Model (PTM) can effectively obtain rich knowledge from unlabeled data. However, the development of multimodal PTMs is still in its infancy. According to the difference between modalities, most current multimodal PTMs can be divided into image-text PTMs and video-text PTMs; according to the data fusion method, they can be divided into single-stream models and two-stream models. Firstly, the common pre-training tasks and the downstream tasks used in validation experiments were summarized. Secondly, the common models in the area of multimodal pre-training were reviewed, and the downstream tasks, performance and experimental data of each model were listed in tables for comparison. Thirdly, the application scenarios of the M6 (Multi-Modality to Multi-Modality Multitask Mega-transformer), Cross-modal Prompt Tuning (CPT), VideoBERT (Video Bidirectional Encoder Representations from Transformers) and AliceMind (Alibaba's collection of encoder-decoders from Mind) models in specific downstream tasks were introduced. Finally, the challenges and future research directions of multimodal PTMs were summed up.

    Time series classification by LSTM based on multi-scale convolution and attention mechanism
    Yinglü XUAN, Yuan WAN, Jiahui CHEN
    Journal of Computer Applications    2022, 42 (8): 2343-2352.   DOI: 10.11772/j.issn.1001-9081.2021061062
    Abstract: 692 | HTML: 47 | PDF (711KB): 322

    The multi-scale features of a time series contain abundant category information, and these features have different importance for classification. However, existing univariate time series classification models conventionally extract series features with convolutions of a fixed kernel size, and therefore cannot effectively acquire and focus on important multi-scale features. To solve this problem, a Multi-scale Convolution and Attention mechanism (MCA) based Long Short-Term Memory (LSTM) model (MCA-LSTM) was proposed, which is able to concentrate on and fuse important multi-scale features to achieve more accurate classification. In this structure, by using LSTM, the transmission of series information was controlled through memory cells and a gate mechanism, and the correlation information of the time series was fully extracted; by using the Multi-scale Convolution Module (MCM), the multi-scale features of the series were extracted through Convolutional Neural Networks (CNNs) with different kernel sizes; by using the Attention Module (AM), the channel information was fused to obtain the importance of features and assign attention weights, which enabled the network to focus on important time series features. Experimental results on 65 univariate time series datasets from the UCR archive show that, compared with the state-of-the-art time series classification methods Unsupervised Scalable Representation Learning-FordA (USRL-FordA), Unsupervised Scalable Representation Learning-Combined (1-Nearest Neighbor) (USRL-Combined (1-NN)), Omni-Scale Convolutional Neural Network (OS-CNN), InceptionTime and Robust Temporal Feature Network for time series classification (RTFN), MCA-LSTM has the Mean Error (ME) reduced by 7.48, 9.92, 2.43, 2.09 and 0.82 percentage points respectively, and achieves the best Arithmetic Mean Rank (AMR) and Geometric Mean Rank (GMR) of 2.14 and 3.23 respectively. These results fully demonstrate the effectiveness of MCA-LSTM for univariate time series classification.
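    A minimal illustration of the multi-scale convolution idea behind MCM follows; fixed averaging kernels stand in for the learned CNN filters, global max-pooling stands in for the later feature summarization, and the series values are hypothetical.

    ```python
    def conv1d(series, kernel):
        """Valid 1-D convolution (cross-correlation) of a series with a kernel."""
        k = len(kernel)
        return [sum(series[i + j] * kernel[j] for j in range(k))
                for i in range(len(series) - k + 1)]

    def multi_scale_features(series, kernel_sizes=(2, 3, 5)):
        """Extract features at several temporal scales with averaging kernels
        of different widths, then summarize each scale by its maximum
        (global max-pooling), mimicking a multi-scale convolution module."""
        feats = []
        for k in kernel_sizes:
            kernel = [1.0 / k] * k  # averaging kernel of width k
            feats.append(max(conv1d(series, kernel)))
        return feats

    # Hypothetical univariate series; each kernel size sees a different scale.
    series = [0.0, 1.0, 0.0, 3.0, 2.0, 1.0, 0.0, 0.0]
    feats = multi_scale_features(series)
    ```

    Narrow kernels respond to sharp local peaks while wide kernels respond to sustained trends, which is why combining several kernel sizes captures category information a single fixed size would miss.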

    Transformer based U-shaped medical image segmentation network: a survey
    Liyao FU, Mengxiao YIN, Feng YANG
    Journal of Computer Applications    2023, 43 (5): 1584-1595.   DOI: 10.11772/j.issn.1001-9081.2022040530
    Abstract: 676 | HTML: 7 | PDF (1887KB): 444

    U-shaped Network (U-Net), based on Fully Convolutional Network (FCN), is widely used as the backbone of medical image segmentation models, but Convolutional Neural Network (CNN) is not good at capturing long-range dependencies, which limits further performance improvement of segmentation models. To solve this problem, researchers have applied Transformer to medical image segmentation models to make up for the deficiency of CNN, and U-shaped segmentation networks combining Transformer have become a hot research topic. After a detailed introduction of U-Net and Transformer, the related medical image segmentation models were categorized by the position of the Transformer module: only in the encoder or decoder, in both the encoder and decoder, in the skip connections, and others. The basic contents, design concepts and possible improvements of these models were discussed, and the advantages and disadvantages of placing Transformer at different positions were analyzed. The analysis shows that the decisive factor for the position of Transformer is the characteristics of the target segmentation task, and that segmentation models combining Transformer with U-Net can make better use of the advantages of both CNN and Transformer to improve segmentation performance, showing great development prospects and research value.

    Research progress of blockchain‑based federated learning
    Rui SUN, Chao LI, Wei WANG, Endong TONG, Jian WANG, Jiqiang LIU
    Journal of Computer Applications    2022, 42 (11): 3413-3420.   DOI: 10.11772/j.issn.1001-9081.2021111934
    Abstract: 623 | HTML: 28 | PDF (1086KB): 407

    Federated Learning (FL) is a novel privacy-preserving learning paradigm that keeps users' data local. With the progress of research on FL, its shortcomings, such as the single point of failure and lack of credibility, are gradually gaining attention. In recent years, the blockchain technology that originated from Bitcoin has developed rapidly; it pioneers the construction of decentralized trust and provides a new possibility for the development of FL. Existing research works on blockchain-based FL were reviewed, and the frameworks for blockchain-based FL were compared and analyzed. Then, the key issues of FL solved by the combination of blockchain and FL were discussed. Finally, the application prospects of blockchain-based FL in various fields, such as Internet of Things (IoT), Industrial Internet of Things (IIoT), Internet of Vehicles (IoV) and medical services, were presented.

    Survey of single target tracking algorithms based on Siamese network
    Mengting WANG, Wenzhong YANG, Yongzhi WU
    Journal of Computer Applications    2023, 43 (3): 661-673.   DOI: 10.11772/j.issn.1001-9081.2022010150
    Abstract: 563 | HTML: 101 | PDF (2647KB): 486

    Single object tracking is an important research direction in the field of computer vision, with a wide range of applications in video surveillance, autonomous driving and other fields. Although many surveys of single object tracking algorithms exist, most of them focus on correlation filter based or deep learning based methods. In recent years, Siamese network-based tracking algorithms have received extensive attention from researchers for their balance between accuracy and speed, but there are relatively few surveys of this type of algorithm, and they lack systematic analysis at the architectural level. In order to understand the single object tracking algorithms based on Siamese network in depth, a large number of related literatures were organized and analyzed. Firstly, the structures and applications of the Siamese network were expounded, and each tracking algorithm was introduced according to the composition of the Siamese tracking algorithm architectures. Then, the commonly used datasets and evaluation metrics in the field of single object tracking were listed, the overall and per-attribute performance of 25 mainstream tracking algorithms was compared and analyzed on the OTB 2015 (Object Tracking Benchmark) dataset, and the performance and reasoning speed of 23 Siamese network-based tracking algorithms on the LaSOT (Large-scale Single Object Tracking) and GOT-10K (Generic Object Tracking) test sets were listed. Finally, the research on Siamese network-based tracking algorithms was summarized, and possible future research directions of this type of algorithm were discussed.

    Survey on imbalanced multi‑class classification algorithms
    Mengmeng LI, Yi LIU, Gengsong LI, Qibin ZHENG, Wei QIN, Xiaoguang REN
    Journal of Computer Applications    2022, 42 (11): 3307-3321.   DOI: 10.11772/j.issn.1001-9081.2021122060
    Abstract: 562 | HTML: 65 | PDF (1861KB): 393

    Imbalanced data classification is an important research topic in machine learning, but most existing imbalanced data classification algorithms focus on binary classification, and there are relatively few studies on imbalanced multi-class classification. However, datasets in practical applications usually have multiple classes and imbalanced data distributions, and the diversity of classes further increases the difficulty of imbalanced data classification, so the multi-class classification problem has become a research topic to be solved urgently. The imbalanced multi-class classification algorithms proposed in recent years were reviewed. According to whether a decomposition strategy was adopted, imbalanced multi-class classification algorithms were divided into decomposition methods and ad-hoc methods. Furthermore, according to the adopted decomposition strategy, the decomposition methods were divided into two frameworks: One Vs. One (OVO) and One Vs. All (OVA). According to the technologies used, the ad-hoc methods were divided into data-level methods, algorithm-level methods, cost-sensitive methods, ensemble methods and deep network-based methods. The advantages and disadvantages of these methods and their representative algorithms were systematically described, the evaluation indicators of imbalanced multi-class classification methods were summarized, the performance of the representative methods was analyzed in depth through experiments, and future development directions of imbalanced multi-class classification were discussed.
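    The OVO decomposition framework mentioned above can be sketched with a deliberately tiny binary learner (nearest class mean on 1-D data) and majority voting over all class pairs; the dataset and learner are hypothetical, chosen only to show the decomposition, not any surveyed algorithm.

    ```python
    from itertools import combinations

    def pair_classifier(data, a, b):
        """Toy binary learner for classes a vs b: nearest class mean on 1-D data."""
        mean_a = sum(x for x, y in data if y == a) / sum(1 for _, y in data if y == a)
        mean_b = sum(x for x, y in data if y == b) / sum(1 for _, y in data if y == b)
        return lambda x: a if abs(x - mean_a) <= abs(x - mean_b) else b

    def ovo_predict(data, classes, x):
        """One Vs. One: train one binary classifier per class pair, then
        predict by majority vote over all pairwise decisions."""
        votes = {c: 0 for c in classes}
        for a, b in combinations(classes, 2):
            votes[pair_classifier(data, a, b)(x)] += 1
        return max(votes, key=votes.get)

    # Imbalanced 3-class toy dataset on the real line (class "a" dominates).
    data = [(0.1, "a"), (0.2, "a"), (0.3, "a"), (0.4, "a"),
            (1.0, "b"), (1.1, "b"), (2.0, "c")]
    pred = ovo_predict(data, ["a", "b", "c"], 1.05)
    ```

    Each pairwise problem sees only two classes, so a minority class is never swamped by the union of all other classes, which is one reason OVO is attractive in the imbalanced setting.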

    Network representation learning model based on node attribute bipartite graph
    Le ZHOU, Tingting DAI, Chun LI, Jun XIE, Boce CHU, Feng LI, Junyi ZHANG, Qiao LIU
    Journal of Computer Applications    2022, 42 (8): 2311-2318.   DOI: 10.11772/j.issn.1001-9081.2021060972
    Abstract: 527 | HTML: 133 | PDF (843KB): 394

    Reasoning and computing on graph-structured data is an important task, and its main challenge is how to represent graph-structured knowledge so that machines can easily understand and use it. After comparing existing representation learning models, it is found that models based on random walks are likely to ignore the special effect of attributes on the association between nodes. Therefore, a hybrid random walk method based on node adjacency and attribute association was proposed. Firstly, the attribute weights were calculated from the distribution of common attributes among adjacent nodes, and the sampling probability from a node to each of its attributes was obtained. Then, network information was extracted from adjacent nodes and from non-adjacent nodes with common attributes respectively. Finally, a network representation learning model based on a node attribute bipartite graph was constructed, and node vector representations were learned from the above sampling sequences. Experimental results on the Flickr, BlogCatalog and Cora public datasets show that the average Micro-F1 accuracy of node classification with the node vector representations obtained by the proposed model is 89.38%, which is 2.02 percentage points higher than that of GraphRNA (Graph Recurrent Networks with Attributed random walk) and 21.12 percentage points higher than that of the classical method DeepWalk. At the same time, by comparing different random walk methods, it is found that increasing the sampling probabilities of attributes that promote node association can improve the information contained in the sampling sequences.
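    A toy sketch of the node-to-attribute sampling probabilities described above follows. This is a simplified count-based weighting with add-one smoothing over the node's own attributes, not the paper's exact formula, and the attributed graph is hypothetical: attributes shared with more neighbors receive higher sampling probability.

    ```python
    def attribute_weights(node, adj, attrs):
        """Weight each attribute of `node` by how often it appears among the
        node's neighbors (plus one for the node itself, as smoothing), and
        normalize to get node-to-attribute sampling probabilities."""
        counts = {a: 1 for a in attrs[node]}  # each own attribute counts once
        for nb in adj[node]:
            for a in attrs[nb]:
                if a in counts:
                    counts[a] += 1
        total = sum(counts.values())
        return {a: c / total for a, c in counts.items()}

    # Hypothetical attributed graph: v1 and v2 are adjacent; v3 is not adjacent
    # to v1 but shares attribute "x", so an attribute hop could still reach it.
    adj = {"v1": ["v2"], "v2": ["v1"], "v3": []}
    attrs = {"v1": {"x", "y"}, "v2": {"x"}, "v3": {"x"}}
    probs = attribute_weights("v1", adj, attrs)
    ```

    In a hybrid walk, a step from v1 would either follow an edge to v2 or hop through a sampled attribute node (here "x" with the larger probability) to a non-adjacent node such as v3.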

    Parameter calculation algorithm of structural graph clustering driven by instance clusters
    Chuanyu ZONG, Chao XIAN, Xiufeng XIA
    Journal of Computer Applications    2023, 43 (2): 398-406.   DOI: 10.11772/j.issn.1001-9081.2022010082
    Abstract: 526 | HTML: 4 | PDF (2584KB): 37

    The clustering results of the pSCAN (pruned Structural Clustering Algorithm for Network) algorithm are determined by two parameters: the density constraint parameter and the similarity threshold parameter. If the clustering results obtained with the parameters provided by the user do not satisfy the user's requirements, the user can instead express those requirements through instance clusters. To compute clustering parameters from instance clusters, an instance cluster-driven structural graph clustering parameter calculation algorithm, PART, and its improved version, ImPART, were proposed. Firstly, the influences of the two clustering parameters on the clustering results were analyzed, and the correlation subgraph of the instance cluster was extracted. Secondly, the feasible interval of the density constraint parameter was obtained by analyzing the correlation subgraph, and the nodes of the instance cluster were divided into core nodes and non-core nodes according to the current density constraint parameter and the structural similarity between nodes. Finally, according to this node division, the optimal similarity threshold parameter corresponding to the current density constraint parameter was calculated, and the obtained parameters were verified and optimized on the correlation subgraph until clustering parameters satisfying the requirements of the instance cluster were obtained. Experimental results on real datasets show that the proposed algorithm returns a set of effective parameters for user instance clusters, and that the improved algorithm ImPART is more than 20% faster than the basic algorithm PART while quickly and effectively returning optimal clustering parameters that satisfy the requirements of the instance clusters.
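    The structural similarity used for the node division above is, in SCAN-family algorithms, the overlap of two nodes' closed neighborhoods normalized by the geometric mean of their sizes; a minimal sketch on a hypothetical toy graph:

    ```python
    import math

    def structural_similarity(adj, u, v):
        """SCAN-style structural similarity: |N[u] & N[v]| / sqrt(|N[u]|*|N[v]|),
        where N[x] is the closed neighborhood of x (x plus its neighbors)."""
        nu = adj[u] | {u}
        nv = adj[v] | {v}
        return len(nu & nv) / math.sqrt(len(nu) * len(nv))

    # Hypothetical undirected graph: a-b-c form a triangle, d hangs off c.
    adj = {
        "a": {"b", "c"},
        "b": {"a", "c"},
        "c": {"a", "b", "d"},
        "d": {"c"},
    }
    sim_ab = structural_similarity(adj, "a", "b")  # identical neighborhoods
    sim_cd = structural_similarity(adj, "c", "d")  # weaker attachment
    ```

    A node is a core node when it has at least the density-constraint number of neighbors whose similarity reaches the threshold, which is why the feasible threshold can be read off from similarities inside the instance cluster.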

    Survey of event extraction
    Chunming MA, Xiuhong LI, Zhe LI, Huiru WANG, Dan YANG
    Journal of Computer Applications    2022, 42 (10): 2975-2989.   DOI: 10.11772/j.issn.1001-9081.2021081542
    Abstract: 516 | HTML: 88 | PDF (3054KB): 325

    Event extraction is the task of extracting the events that a user is interested in from unstructured information and presenting them to the user in a structured way. Event extraction is widely applied in information collection, information retrieval, document synthesis and question answering. From an overall perspective, event extraction algorithms can be divided into four categories: pattern matching algorithms, trigger-lexicon-based methods, ontology-based algorithms, and cutting-edge joint model methods. In the research process, different evaluation methods and datasets can be used according to specific needs, and different event representation methods are also related to event extraction research. Distinguished by task type, meta-event extraction and subject event extraction are the two basic tasks of event extraction. Meta-event extraction has three kinds of methods, based on pattern matching, machine learning and neural networks respectively, while subject event extraction has two approaches, based on the event framework and based on ontology respectively. Event extraction research has achieved excellent results in single languages such as Chinese and English, but cross-language event extraction still faces many problems. Finally, the related works of event extraction were summarized and future research directions were discussed in order to provide guidelines for subsequent research.

    Review on privacy-preserving technologies in federated learning
    Teng WANG, Zheng HUO, Yaxin HUANG, Yilin FAN
    Journal of Computer Applications    2023, 43 (2): 437-449.   DOI: 10.11772/j.issn.1001-9081.2021122072
    Abstract: 498 | HTML: 57 | PDF (2014KB): 328

    In recent years, federated learning has become a new way to solve the problems of data islands and privacy leakage in machine learning. A federated learning architecture does not require multiple parties to share their data resources: participants only train local models on local data and periodically upload the parameters to a server that updates the global model, so a machine learning model can be built over large-scale global data. Federated learning architectures are privacy-preserving by design and offer a new scheme for large-scale machine learning in the future. However, the parameter interaction mode of this architecture may still lead to the disclosure of private data. At present, strengthening the privacy-preserving mechanisms of federated learning architectures has become a new research hotspot. Starting from the privacy disclosure problem in federated learning, the attack models and sensitive information disclosure paths in federated learning were discussed, and several types of privacy-preserving techniques in federated learning were highlighted and reviewed, such as privacy-preserving technology based on differential privacy, privacy-preserving technology based on homomorphic encryption, and privacy-preserving technology based on Secure Multiparty Computation (SMC). Finally, the key issues of privacy protection in federated learning were discussed, and future research directions were prospected.
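    As one concrete example of the differential-privacy family mentioned above, a client's update can be clipped and perturbed before it is uploaded. This is a generic Gaussian-mechanism-style sketch; the clipping bound and noise scale are arbitrary illustrative values, not a calibrated mechanism from the surveyed papers.

    ```python
    import math
    import random

    def dp_sanitize(grad, clip_norm=1.0, noise_std=0.5, rng=None):
        """Clip a gradient's L2 norm to `clip_norm`, then add Gaussian noise,
        so the uploaded parameters leak less about any single training record."""
        rng = rng or random.Random(0)
        norm = math.sqrt(sum(g * g for g in grad))
        scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
        clipped = [g * scale for g in grad]
        return [g + rng.gauss(0.0, noise_std) for g in clipped]

    # A raw client gradient of norm 5 is clipped to norm 1, then perturbed.
    noisy = dp_sanitize([3.0, 4.0])
    ```

    The clipping step bounds any single record's influence on the update; the noise then masks what remains, trading model accuracy for a quantifiable privacy guarantee.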

    Temporal convolutional knowledge tracing model with attention mechanism
    Xiaomeng SHAO, Meng ZHANG
    Journal of Computer Applications    2023, 43 (2): 343-348.   DOI: 10.11772/j.issn.1001-9081.2022010024
    Abstract: 491 | HTML: 28 | PDF (2110KB): 249

    To address the problems of insufficient interpretability and long-sequence dependency in deep knowledge tracing models based on Recurrent Neural Network (RNN), a model named Temporal Convolutional Knowledge Tracing with Attention mechanism (ATCKT) was proposed. Firstly, embedded representations of students' historical interactions were learned in the training process. Then, an exercise-based attention mechanism was used to learn a specific weight matrix to identify and strengthen the influences of students' historical interactions on the knowledge state at each moment. Finally, the student knowledge states were extracted by a Temporal Convolutional Network (TCN), in which dilated convolution and deep neural networks were used to expand the scope of sequence learning and alleviate the long-sequence dependency problem. Experimental results show that, compared with four models such as Deep Knowledge Tracing (DKT) and Convolutional Knowledge Tracing (CKT) on four datasets (ASSISTments2009, ASSISTments2015, Statics2011 and Synthetic-5), the ATCKT model has significantly improved Area Under the Curve (AUC) and Accuracy (ACC), especially on the ASSISTments2015 dataset, with increases of 6.83 to 20.14 percentage points and 7.52 to 11.22 percentage points respectively; at the same time, the training time of the proposed model is 26% less than that of the DKT model. In summary, this model can accurately capture student knowledge states and efficiently predict students' future performance.
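    The dilated convolution that lets a TCN see far back in a sequence can be sketched as follows. The kernel and series values are hypothetical; the receptive-field formula 1 + (k - 1) * sum(dilations) is the standard one for a stack of dilated layers with kernel size k.

    ```python
    def dilated_causal_conv(series, kernel, dilation):
        """Causal 1-D convolution with dilation: the output at time t depends
        only on inputs at t, t-d, t-2d, ... (missing past values treated as 0)."""
        k = len(kernel)
        out = []
        for t in range(len(series)):
            s = 0.0
            for j in range(k):
                idx = t - j * dilation
                if idx >= 0:
                    s += kernel[j] * series[idx]
            out.append(s)
        return out

    def receptive_field(kernel_size, dilations):
        """Receptive field of stacked dilated convolutions:
        1 + (k - 1) * sum(dilations)."""
        return 1 + (kernel_size - 1) * sum(dilations)

    # An impulse at t=0 reappears at t=2: the layer looks 2 steps back.
    series = [1.0, 0.0, 0.0, 0.0, 0.0]
    out = dilated_causal_conv(series, kernel=[1.0, 1.0], dilation=2)

    # Doubling dilations (1, 2, 4, 8) make the receptive field grow
    # exponentially with depth, which is what alleviates long-sequence
    # dependency compared with an RNN processing one step at a time.
    rf = receptive_field(kernel_size=2, dilations=[1, 2, 4, 8])
    ```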

    Review of interactive machine translation
    Xingbin LIAO, Xiaolin QIN, Siqi ZHANG, Yangge QIAN
    Journal of Computer Applications    2023, 43 (2): 329-334.   DOI: 10.11772/j.issn.1001-9081.2021122067
    Abstract: 480 | HTML: 71 | PDF (1870KB): 374

    With the development and maturity of deep learning, the quality of neural machine translation has improved, yet it is still not perfect and requires human post-editing to achieve acceptable results. Interactive Machine Translation (IMT) is an alternative to this serial pipeline: a human interacts during the translation process, verifying the candidate translations produced by the translation system and, if necessary, providing new input; the system then generates new candidate translations based on the current feedback, and this process repeats until a satisfactory output is produced. Firstly, the basic concept of IMT and its current research progress were introduced. Then, common methods and state-of-the-art works were classified, and the background and innovation of each work were briefly described. Finally, the development trends and research difficulties of IMT were discussed.

    Chinese event detection based on data augmentation and weakly supervised adversarial training
    Ping LUO, Ling DING, Xue YANG, Yang XIANG
    Journal of Computer Applications    2022, 42 (10): 2990-2995.   DOI: 10.11772/j.issn.1001-9081.2021081521
    Abstract: 466 | HTML: 43 | PDF (720KB): 229

    Existing event detection models rely heavily on human-annotated data, and supervised deep learning models for event detection often suffer from over-fitting when only limited labeled data is available; methods that replace time-consuming human annotation with auto-labeled data typically rely on sophisticated pre-defined rules. To address these issues, a BERT (Bidirectional Encoder Representations from Transformers) based Mix-text ADversarial training (BMAD) method for Chinese event detection was proposed. In the proposed method, a weakly supervised learning scenario was set up on the basis of data augmentation and adversarial learning, and a span extraction model was used to solve the event detection task. Firstly, to relieve the problem of insufficient data, various data augmentation methods such as back-translation and Mix-Text were applied to augment the data and create the weakly supervised learning scenario for event detection. Then, an adversarial training mechanism was applied to learn with noise and improve the robustness of the whole model. Several experiments were conducted on the commonly used real-world dataset ACE (Automatic Content Extraction) 2005. The results show that, compared with algorithms such as Nugget Proposal Network (NPN), Trigger-aware Lattice Neural Network (TLNN) and Hybrid-Character-Based Neural Network (HCBNN), the proposed method improves the F1 score by at least 0.84 percentage points.

    Federated learning algorithm for communication cost optimization
    Sai ZHENG, Tianrui LI, Wei HUANG
    Journal of Computer Applications    2023, 43 (1): 1-7.   DOI: 10.11772/j.issn.1001-9081.2021122054
    Abstract: 465 | HTML: 29 | PDF (934KB): 299

    Federated Learning (FL) is a machine learning setting that can protect data privacy; however, high communication costs and client heterogeneity hinder its large-scale implementation. To solve these two problems, a federated learning algorithm for communication cost optimization was proposed. First, the generative models from the clients were received by the server and used to generate simulated data. Then, the simulated data were used by the server to train the global model, which was sent to the clients, and the final models were obtained by the clients through fine-tuning the global model. The proposed algorithm needs only one round of communication between the clients and the server, and the fine-tuning of the client models solves the problem of client heterogeneity. Experiments were carried out on the MNIST and CIFAR-10 datasets with 20 clients. The results show that, while maintaining accuracy, the proposed algorithm reduces the amount of communicated data to 1/10 of that of the Federated Averaging (FedAvg) algorithm on MNIST and to 1/100 on CIFAR-10.
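    The one-round protocol described above can be sketched end-to-end with a deliberately tiny "generative model" (just a mean and a fixed spread per client, a stand-in for a real trained generator); all data, the fine-tuning step size, and the function names are hypothetical.

    ```python
    import random

    def client_generator(local_data):
        """Each client uploads a tiny generative model of its data: here just
        the sample mean and a fixed spread (stand-in for a real generator)."""
        mean = sum(local_data) / len(local_data)
        return mean, 0.1

    def server_train(generators, n_samples=1000, seed=0):
        """Server draws simulated data from all client generators and fits the
        global model (here simply the mean of the simulated data)."""
        rng = random.Random(seed)
        sim = [rng.gauss(m, s) for m, s in generators for _ in range(n_samples)]
        return sum(sim) / len(sim)

    def client_finetune(global_model, local_data, step=0.5):
        """Clients adapt the received global model to their own local data,
        which handles client heterogeneity after the single upload round."""
        local_mean = sum(local_data) / len(local_data)
        return global_model + step * (local_mean - global_model)

    clients = [[1.0, 1.2, 0.8], [3.0, 2.8, 3.2]]         # heterogeneous clients
    generators = [client_generator(d) for d in clients]   # the only upload
    global_model = server_train(generators)
    final_models = [client_finetune(global_model, d) for d in clients]
    ```

    Only the compact generators cross the network once, instead of model parameters every round, which is the source of the communication saving.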

    Lightweight object detection algorithm based on improved YOLOv4
    Zhifeng ZHONG, Yifan XIA, Dongping ZHOU, Yangtian YAN
    Journal of Computer Applications    2022, 42 (7): 2201-2209.   DOI: 10.11772/j.issn.1001-9081.2021050734
    Abstract: 443 | HTML: 12 | PDF (5719KB): 327

    The YOLOv4 (You Only Look Once version 4) object detection network has a complex structure, many parameters, high configuration requirements for training, and a low Frames Per Second (FPS) rate in real-time detection. To solve these problems, a lightweight object detection algorithm based on YOLOv4, named ML-YOLO (MobileNetv3Lite-YOLO), was proposed. Firstly, MobileNetv3 was used to replace the backbone feature extraction network of YOLOv4, which greatly reduced the number of backbone parameters through the depthwise separable convolutions in MobileNetv3. Then, a simplified weighted Bi-directional Feature Pyramid Network (Bi-FPN) structure was used to replace the feature fusion network of YOLOv4, so that the object detection accuracy was improved by the attention mechanism in Bi-FPN. Finally, the final prediction boxes were generated through the YOLOv4 decoding algorithm to realize object detection. Experimental results on the VOC (Visual Object Classes) 2007 dataset show that the mean Average Precision (mAP) of the ML-YOLO algorithm reaches 80.22%, which is 3.42 percentage points lower than that of the YOLOv4 algorithm and 2.82 percentage points higher than that of the YOLOv5m algorithm; at the same time, the model size of the ML-YOLO algorithm is only 44.75 MB, which is 199.54 MB smaller than that of the YOLOv4 algorithm and only 2.85 MB larger than that of the YOLOv5m algorithm. These results show that the proposed ML-YOLO model greatly reduces model size compared with the YOLOv4 model while maintaining high detection accuracy, and can meet the lightweight and accuracy requirements of object detection on mobile and embedded devices.
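    The parameter saving from depthwise separable convolution, the mechanism behind the backbone reduction above, can be checked directly. The layer sizes below are hypothetical (not taken from ML-YOLO); for a 3x3 convolution from 128 to 256 channels, the standard form needs roughly 8.7 times as many parameters.

    ```python
    def standard_conv_params(c_in, c_out, k):
        """Parameters of a standard k x k convolution (bias terms omitted)."""
        return c_in * c_out * k * k

    def depthwise_separable_params(c_in, c_out, k):
        """Depthwise separable convolution = depthwise k x k (one filter per
        input channel) + pointwise 1 x 1 projection to c_out channels."""
        return c_in * k * k + c_in * c_out

    c_in, c_out, k = 128, 256, 3
    standard = standard_conv_params(c_in, c_out, k)         # 294912
    separable = depthwise_separable_params(c_in, c_out, k)  # 33920
    reduction = standard / separable
    ```

    The ratio approaches c_out for large kernels and 1/(1/c_out + 1/k^2) in general, which is why MobileNet-style backbones shrink so dramatically with only a modest accuracy cost.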

    Time series prediction model based on multimodal information fusion
    Minghui WU, Guangjie ZHANG, Canghong JIN
    Journal of Computer Applications    2022, 42 (8): 2326-2332.   DOI: 10.11772/j.issn.1001-9081.2021061053
    Abstract423)   HTML42)    PDF (658KB)(250)       Save

    To address the problems that traditional single-factor methods cannot make full use of the information related to a time series and suffer from poor prediction accuracy and reliability, a time series prediction model based on multimodal information fusion, namely Skip-Fusion, was proposed to fuse the textual and numerical data in multimodal data. Firstly, different types of text data were encoded with a pre-trained Bidirectional Encoder Representations from Transformers (BERT) model and one-hot encoding. Then, a single vector representation fusing the multiple text features was obtained by using a pre-trained model with a global attention mechanism. After that, this vector representation was aligned with the numerical data in temporal order. Finally, the fusion of textual and numerical features was realized through a Temporal Convolutional Network (TCN) model, and the shallow and deep features of the multimodal data were fused again through skip connections. Experiments were carried out on a stock price series dataset: the Skip-Fusion model obtains results of 0.492 and 0.930 on Root Mean Square Error (RMSE) and daily Return (R) respectively, outperforming existing single-modal and multimodal fusion models. Experimental results also show that the Skip-Fusion model achieves a goodness of fit of 0.955 on R-squared, indicating that it can effectively perform multimodal information fusion with high prediction accuracy and reliability.
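    As a minimal illustration of the time-order alignment step, the sketch below (hypothetical timestamps and feature values, not the Skip-Fusion code) pairs each numerical observation with the text vector of the same day, zero-padding days that have no text:

```python
# Hypothetical text feature vectors keyed by date, and a price series.
text_feats = {"2021-01-04": [0.2, 0.7], "2021-01-06": [0.9, 0.1]}
prices = [("2021-01-04", 10.2), ("2021-01-05", 10.4), ("2021-01-06", 10.1)]

def align(text_feats, prices, dim=2):
    """Pair each numeric observation with that day's text vector,
    padding with zeros on days that have no text."""
    zero = [0.0] * dim
    return [(t, text_feats.get(t, zero) + [p]) for t, p in prices]

fused = align(text_feats, prices)
print(fused[1])  # ('2021-01-05', [0.0, 0.0, 10.4])
```

    The fused sequence can then be fed to a temporal model frame by frame.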

    Table and Figures | Reference | Related Articles | Metrics
    Review of fine-grained image categorization
    Zhijun SHEN, Lina MU, Jing GAO, Yuanhang SHI, Zhiqiang LIU
    Journal of Computer Applications    2023, 43 (1): 51-60.   DOI: 10.11772/j.issn.1001-9081.2021122090
    Abstract414)   HTML11)    PDF (2455KB)(149)       Save

    Fine-grained images have large intra-class variance and small inter-class variance, which makes Fine-Grained Image Categorization (FGIC) much more difficult than traditional image classification tasks. The application scenarios, task difficulties, algorithm development history and common datasets of FGIC were described, and an overview of related algorithms was presented. Classification methods based on local detection usually adopt operations of connection, summation and pooling; their model training is complex and they have many limitations in practical applications. Classification methods based on linear features simulate the two neural pathways of human vision for recognition and localization respectively, with relatively better classification performance. Classification methods based on attention mechanisms simulate the way humans observe external things: scanning the panorama first, then locking onto the key region to form the attention focus, which further improves classification performance. Finally, in view of the shortcomings of current research, future research directions of FGIC were proposed.

    Table and Figures | Reference | Related Articles | Metrics
    Lightweight human pose estimation based on attention mechanism
    Kun LI, Qing HOU
    Journal of Computer Applications    2022, 42 (8): 2407-2414.   DOI: 10.11772/j.issn.1001-9081.2021061103
    Abstract402)   HTML50)    PDF (876KB)(255)       Save

    To solve the problems of the large number of parameters and high computational complexity of high-resolution human pose estimation networks, a lightweight Sandglass Coordinate Attention Network (SCANet) based on High-Resolution Network (HRNet) was proposed for human pose estimation. Firstly, the Sandglass module and the Coordinate Attention (CoordAttention) module were introduced; then, two lightweight modules, the Sandglass Coordinate Attention bottleneck (SCAneck) module and the Sandglass Coordinate Attention basicblock (SCAblock) module, were built on this basis to capture long-range dependencies and precise positional information along the spatial directions of the feature map while reducing the number of model parameters and the computational complexity. Experimental results show that with the same image resolution and environmental configuration, the SCANet model reduces the number of parameters by 52.6% and the computational complexity by 60.6% compared with the HRNet model on the Common Objects in COntext (COCO) validation set; on the Max Planck Institute for Informatics (MPII) validation set, the number of parameters and the computational complexity of the SCANet model are reduced by 52.6% and 61.1% respectively compared with those of the HRNet model. Compared with common human pose estimation networks such as the Stacked Hourglass Network (Hourglass), the Cascaded Pyramid Network (CPN) and SimpleBaseline, the SCANet model can still achieve high-precision prediction of human body keypoints with fewer parameters and lower computational complexity.
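    Unlike plain global pooling, coordinate attention pools along each spatial direction separately so positional information survives. A toy Python sketch of that first step for one channel (illustrative, not the SCANet implementation):

```python
def coord_pool(feature_map):
    """Coordinate attention, first step: average-pool an H x W channel
    along each spatial direction, keeping row-wise and column-wise
    position information instead of a single global scalar."""
    h = len(feature_map)
    w = len(feature_map[0])
    pool_h = [sum(row) / w for row in feature_map]           # one value per row
    pool_w = [sum(feature_map[i][j] for i in range(h)) / h   # one value per column
              for j in range(w)]
    return pool_h, pool_w

ph, pw = coord_pool([[1.0, 3.0],
                     [5.0, 7.0]])
print(ph, pw)  # [2.0, 6.0] [3.0, 5.0]
```

    The two pooled vectors are then transformed and used to reweight the feature map in the full module.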

    Table and Figures | Reference | Related Articles | Metrics
    Improved practical Byzantine fault tolerance consensus algorithm based on Raft algorithm
    Jindong WANG, Qiang LI
    Journal of Computer Applications    2023, 43 (1): 122-129.   DOI: 10.11772/j.issn.1001-9081.2021111996
    Abstract397)   HTML14)    PDF (2615KB)(155)       Save

    Since the Practical Byzantine Fault Tolerance (PBFT) consensus algorithm applied to consortium blockchains suffers from insufficient scalability and high communication overhead, an improved practical Byzantine fault tolerance consensus algorithm based on the Raft algorithm, named K-RPBFT (K-medoids Raft based Practical Byzantine Fault Tolerance), was proposed. Firstly, the blockchain was sharded based on the K-medoids clustering algorithm: all nodes were divided into multiple clusters, each cluster constituting a single shard, so that global consensus was improved to hierarchical multi-center consensus. Secondly, consensus between the central nodes of the shards was performed with the PBFT algorithm, while an improved Raft algorithm based on supervision nodes was used for intra-shard consensus. The supervision mechanism in each shard gave the Raft algorithm a certain degree of Byzantine fault tolerance and improved the security of the algorithm. Experimental analysis shows that compared with the PBFT algorithm, the K-RPBFT algorithm greatly reduces communication overhead and consensus latency, and improves consensus efficiency and throughput while retaining Byzantine fault tolerance; it also has good scalability and dynamism, enabling consortium blockchains to be applied in a wider range of fields.
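    The sharding step can be illustrated with a minimal medoid-assignment sketch in Python; the one-dimensional node features and distance function below are toy assumptions, not the K-RPBFT implementation:

```python
def assign_shards(nodes, medoids, dist):
    """Group nodes around the given medoids (one shard per medoid)."""
    shards = {m: [] for m in medoids}
    for n in nodes:
        closest = min(medoids, key=lambda m: dist(n, m))
        shards[closest].append(n)
    return shards

# Toy 1-D "node feature" (e.g. measured latency) purely for illustration.
nodes = [1, 2, 3, 10, 11, 12]
shards = assign_shards(nodes, medoids=[2, 11], dist=lambda a, b: abs(a - b))
print(shards)  # {2: [1, 2, 3], 11: [10, 11, 12]}
```

    Each resulting group runs intra-shard consensus internally, and only the medoids participate in the global PBFT round, which is what cuts the message complexity.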

    Table and Figures | Reference | Related Articles | Metrics
    Survey of label noise learning algorithms based on deep learning
    Boyi FU, Yuncong PENG, Xin LAN, Xiaolin QIN
    Journal of Computer Applications    2023, 43 (3): 674-684.   DOI: 10.11772/j.issn.1001-9081.2022020198
    Abstract390)   HTML39)    PDF (2083KB)(274)    PDF(mobile) (733KB)(19)    Save

    In the field of deep learning, a large number of correctly labeled samples is essential for model training. However, in practical applications, labeling data is costly, and the quality of labels is affected by subjective factors and by the tools and techniques of manual annotation, which inevitably introduces label noise. Therefore, the training data available in practice is subject to a certain amount of label noise, and how to train models effectively on such data has become a research hotspot. Focusing on label noise learning algorithms based on deep learning, firstly, the sources, classification and impact of label noise were elaborated; secondly, four classes of label noise learning strategies, based on data, loss function, model and training method, were analyzed according to the different elements of machine learning; then, a basic framework for label noise learning in various application scenarios was provided; finally, some optimization ideas were given, and the challenges and future development directions of label noise learning algorithms were discussed.

    Table and Figures | Reference | Related Articles | Metrics
    Unsupervised time series anomaly detection model based on re-encoding
    Chunyong YIN, Liwen ZHOU
    Journal of Computer Applications    2023, 43 (3): 804-811.   DOI: 10.11772/j.issn.1001-9081.2022010006
    Abstract383)   HTML19)    PDF (1769KB)(173)       Save

    To deal with the low anomaly detection accuracy caused by data imbalance and the highly complex temporal correlation of time series, a re-encoding based unsupervised time series anomaly detection model built on Generative Adversarial Network (GAN), named RTGAN (Re-encoding Time series based on GAN), was proposed. Firstly, multiple generators with cycle consistency were used to ensure the diversity of generated samples and thereby learn different anomaly patterns. Secondly, a stacked Long Short-Term Memory-dropout Recurrent Neural Network (LSTM-dropout RNN) was used to capture temporal correlation. Thirdly, the differences between generated and real samples were compared in the latent space through improved re-encoding; as re-encoding errors, these differences served as part of the anomaly score to improve detection accuracy. Finally, the new anomaly score was used to detect anomalies on univariate and multivariate time series datasets. The proposed model was compared with seven baseline anomaly detection models on univariate and multivariate time series. Experimental results show that the proposed model obtains the highest average F1-score (0.815) on all datasets, and its overall performance is 36.29% and 8.52% higher than those of the original AutoEncoder (AE) model Dense-AE (Dense-AutoEncoder) and the latest benchmark model USAD (UnSupervised Anomaly Detection on multivariate time series), respectively. The robustness of the model was tested under different Signal-to-Noise Ratios (SNR). The results show that the proposed model consistently outperforms LSTM-VAE (Variational Autoencoder based on LSTM), USAD and OmniAnomaly; in particular, at 30% SNR, the F1-score of RTGAN is 13.53% and 10.97% higher than those of USAD and OmniAnomaly, respectively. It can be seen that RTGAN effectively improves the accuracy and robustness of anomaly detection.
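    The idea of folding the re-encoding error into the anomaly score can be sketched as a weighted sum; the weight alpha and the squared-error form below are illustrative assumptions, not RTGAN's exact score:

```python
def anomaly_score(x, x_hat, z, z_hat, alpha=0.5):
    """Weighted sum of reconstruction error (data space) and
    re-encoding error (latent space)."""
    rec = sum((a - b) ** 2 for a, b in zip(x, x_hat))
    re_enc = sum((a - b) ** 2 for a, b in zip(z, z_hat))
    return alpha * rec + (1 - alpha) * re_enc

score = anomaly_score([1.0, 2.0], [1.0, 2.5], [0.1], [0.3], alpha=0.5)
print(score)  # approximately 0.145
```

    A window whose score exceeds a chosen threshold is flagged as anomalous.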

    Table and Figures | Reference | Related Articles | Metrics
    Adversarial example generation method based on image flipping transform
    Bo YANG, Hengwei ZHANG, Zheming LI, Kaiyong XU
    Journal of Computer Applications    2022, 42 (8): 2319-2325.   DOI: 10.11772/j.issn.1001-9081.2021060993
    Abstract376)   HTML47)    PDF (1609KB)(217)       Save

    Deep neural networks are vulnerable to adversarial example attacks: by adding human-imperceptible perturbations to original images, adversarial examples cause deep neural networks to misclassify, which poses a security threat. Therefore, before the deployment of deep neural networks, adversarial attack is an important method to evaluate the robustness of models. However, under the black-box setting, the attack success rates of adversarial examples need to be improved; that is, the transferability of adversarial examples needs to be increased. To address this issue, an adversarial example generation method based on image flipping transform, namely FT-MI-FGSM (Flipping Transformation Momentum Iterative Fast Gradient Sign Method), was proposed. Firstly, from the perspective of data augmentation, in each iteration of the adversarial example generation process, the original input image was flipped randomly. Then, the gradient of the transformed image was calculated. Finally, the adversarial examples were generated based on this gradient, so as to alleviate overfitting during adversarial example generation and improve the transferability of adversarial examples. In addition, the method of attacking ensemble models was used to further enhance transferability. Extensive experiments on the ImageNet dataset demonstrate the effectiveness of the proposed algorithm: compared with I-FGSM (Iterative Fast Gradient Sign Method) and MI-FGSM (Momentum I-FGSM), the average black-box attack success rate of FT-MI-FGSM against adversarially trained networks is improved by 26.0 and 8.4 percentage points respectively under the attacking-ensemble-models setting.
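    A minimal sketch of one such iteration follows, with a toy 2x2 "image" and a hypothetical gradient function standing in for the real network gradient (no momentum term, so this is closer to the plain flipped-FGSM step than to the full FT-MI-FGSM update):

```python
import random

def flip(image):
    """Horizontal flip: reverse each row."""
    return [row[::-1] for row in image]

def sign(v):
    return (v > 0) - (v < 0)

def fgsm_step(image, grad_fn, eps=0.1, p_flip=0.5):
    """One iteration: maybe flip the input, take the gradient of the
    transformed image, and ascend along its sign (toy 2x2 'image')."""
    x = flip(image) if random.random() < p_flip else image
    g = grad_fn(x)  # same shape as the image
    return [[px + eps * sign(gv) for px, gv in zip(r, gr)]
            for r, gr in zip(image, g)]

random.seed(0)
img = [[0.2, 0.4], [0.6, 0.8]]
adv = fgsm_step(img, grad_fn=lambda x: [[1.0, -1.0], [0.0, 2.0]])
print(adv)  # each pixel moved by +/-0.1, or unchanged where the gradient is 0
```

    Averaging gradients over many random transforms is what reduces overfitting to the source model.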

    Table and Figures | Reference | Related Articles | Metrics
    Real‑time detection method of traffic information based on lightweight YOLOv4
    Keyou GUO, Xue LI, Min YANG
    Journal of Computer Applications    2023, 43 (1): 74-80.   DOI: 10.11772/j.issn.1001-9081.2021101849
    Abstract373)   HTML12)    PDF (3019KB)(244)       Save

    Aiming at the problem of vehicle object detection in daily road scenes, a real-time detection method of traffic information based on a lightweight YOLOv4 (You Only Look Once version 4) was proposed. Firstly, a multi-scene, multi-period vehicle object dataset was constructed and preprocessed with the K-means++ algorithm. Secondly, a lightweight YOLOv4 detection model was proposed, in which the backbone network was replaced by MobileNet-v3 to reduce the number of model parameters, and depthwise separable convolution was introduced to replace the standard convolution of the original network. Finally, combined with label smoothing and the cosine annealing algorithm, the activation function Leaky Rectified Linear Unit (LeakyReLU) was used to replace the original activation function in the shallow layers of MobileNet-v3 in order to improve the convergence of the model. Experimental results show that the lightweight YOLOv4 has a weight file of 56.4 MB, a detection rate of 85.6 FPS (Frames Per Second) and a detection precision of 93.35%, verifying that the proposed method can serve as a reference for real-time traffic information detection and its applications in real road scenes.
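    Label smoothing, one of the training tricks mentioned above, replaces a hard one-hot target with a softened one so the model is less over-confident. A minimal sketch (the eps value is illustrative):

```python
def smooth_labels(one_hot, eps=0.1):
    """Label smoothing: y' = (1 - eps) * y + eps / num_classes."""
    n = len(one_hot)
    return [(1 - eps) * y + eps / n for y in one_hot]

print(smooth_labels([0, 1, 0, 0]))  # approximately [0.025, 0.925, 0.025, 0.025]
```

    The smoothed target still sums to 1 and is used in place of the one-hot vector in the cross-entropy loss.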

    Table and Figures | Reference | Related Articles | Metrics
    Semi-supervised representation learning method combining graph auto-encoder and clustering
    Hangyuan DU, Sicong HAO, Wenjian WANG
    Journal of Computer Applications    2022, 42 (9): 2643-2651.   DOI: 10.11772/j.issn.1001-9081.2021071354
    Abstract363)   HTML52)    PDF (1000KB)(282)       Save

    Node labels are a widely available form of supervision information in complex networks and play an important role in network representation learning. Based on this fact, a Semi-supervised Representation Learning method combining Graph Auto-Encoder and Clustering (GAECSRL) was proposed. Firstly, a Graph Convolutional Network (GCN) and an inner product function were used as the encoder and the decoder respectively to construct a graph auto-encoder and form an information dissemination framework. Then, a k-means clustering module was applied to the low-dimensional representation generated by the encoder, so that the training of the graph auto-encoder and the category classification of the nodes formed a self-supervised mechanism. Finally, the discriminative information of the node labels was used to guide the category classification of the low-dimensional network representation. Network representation generation, category classification and the training of the graph auto-encoder were built into a unified optimization model, and an effective network representation integrating node label information was obtained. In simulation experiments, the GAECSRL method was applied to node classification and link prediction tasks. Experimental results show that compared with DeepWalk, node2vec, GraRep (learning Graph Representations with global structural information), Structural Deep Network Embedding (SDNE) and Planetoid (Predicting labels and neighbors with embeddings transductively or inductively from data), GAECSRL increases the Micro-F1 index by 0.9 to 24.46 percentage points and the Macro-F1 index by 0.76 to 24.20 percentage points in the node classification task; in the link prediction task, GAECSRL increases the AUC (Area Under Curve) index by 0.33 to 9.06 percentage points, indicating that the network representation obtained by GAECSRL effectively improves the performance of both node classification and link prediction.
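    The inner-product decoder used by such graph auto-encoders scores a candidate edge from the two node embeddings alone. A minimal sketch with made-up embeddings:

```python
import math

def link_prob(z_i, z_j):
    """Inner-product decoder: sigmoid of the dot product of two
    node embeddings gives an edge probability."""
    dot = sum(a * b for a, b in zip(z_i, z_j))
    return 1 / (1 + math.exp(-dot))

close = link_prob([1.0, 0.5], [0.9, 0.4])    # similar embeddings
far = link_prob([1.0, 0.5], [-0.9, -0.4])    # opposite embeddings
print(close > 0.5 > far)  # True
```

    Training pushes embeddings of linked nodes together, so the decoder's probability rises exactly for plausible edges.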

    Table and Figures | Reference | Related Articles | Metrics
    Machine reading comprehension model based on event representation
    Yuanlong WANG, Xiaomin LIU, Hu ZHANG
    Journal of Computer Applications    2022, 42 (7): 1979-1984.   DOI: 10.11772/j.issn.1001-9081.2021050719
    Abstract348)   HTML71)    PDF (916KB)(274)       Save

    To truly understand a text, it is very important to grasp the main clues of the original text during reading comprehension. Aiming at questions about main clues in machine reading comprehension, a machine reading comprehension method based on event representation was proposed. Firstly, a textual event graph, covering the representation of events, the extraction of event elements and the extraction of event relations, was extracted from the reading material by clue phrases. Secondly, after considering the time elements and emotional elements of events as well as the importance of each word in the document, the TextRank algorithm was used to select the events related to the clues. Finally, the answers to the questions were constructed based on the selected clue events. Experimental results show that on a test set composed of 339 collected clue questions, the proposed method outperforms the sentence ranking method based on the TextRank algorithm on the BiLingual Evaluation Understudy (BLEU) and Consensus-based Image Description Evaluation (CIDEr) evaluation indexes: specifically, the BLEU-4 index is increased by 4.1 percentage points and the CIDEr index by 9 percentage points.
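    TextRank itself is a PageRank-style iteration over a sentence or event graph. A minimal unweighted sketch on a toy three-node graph (the paper's variant additionally weights edges by time, emotion and word importance, which is omitted here):

```python
def textrank(neighbors, d=0.85, iters=50):
    """Unweighted TextRank: iterate PageRank-style scores over an
    undirected graph given as an adjacency dict."""
    nodes = list(neighbors)
    score = {n: 1.0 for n in nodes}
    for _ in range(iters):
        new = {}
        for n in nodes:
            new[n] = (1 - d) + d * sum(
                score[m] / len(neighbors[m]) for m in neighbors[n])
        score = new
    return score

# Tiny toy graph: event B is linked to everything else.
g = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}
s = textrank(g)
print(max(s, key=s.get))  # 'B'
```

    The highest-scoring nodes are taken as the clue events from which the answer is assembled.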

    Table and Figures | Reference | Related Articles | Metrics
    Ship detection algorithm based on improved RetinaNet
    Wenjun FAN, Shuguang ZHAO, Lizheng GUO
    Journal of Computer Applications    2022, 42 (7): 2248-2255.   DOI: 10.11772/j.issn.1001-9081.2021050831
    Abstract348)   HTML7)    PDF (4946KB)(91)    PDF(mobile) (3371KB)(48)    Save

    At present, object detection technology based on deep learning has achieved remarkable results in ship detection in Synthetic Aperture Radar (SAR) images. However, the detection of small target ships and densely arranged ships near shore remains poor. To solve this problem, a new ship detection algorithm based on an improved RetinaNet was proposed. On the basis of the traditional RetinaNet algorithm, firstly, the convolutions in the residual blocks of the feature extraction network were replaced with grouped convolutions, thereby increasing the network width and improving the feature extraction ability of the network. Then, an attention mechanism was added to the last two stages of the feature extraction network to make the network focus more on target areas and improve object detection ability. Finally, Soft Non-Maximum Suppression (Soft-NMS) was added to the algorithm to reduce the missed detection rate for densely arranged ships near shore. Experimental results on the High-Resolution SAR Images Dataset (HRSID) and the SAR Ship Detection Dataset (SSDD) show that the proposed algorithm effectively improves the detection of small target ships and near-shore ships, and outperforms current excellent object detection models such as Faster Region-based Convolutional Neural Network (R-CNN), You Only Look Once version 3 (YOLOv3) and CenterNet in both detection precision and speed.
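    Soft-NMS replaces the hard deletion of overlapping boxes with a score decay, so a dense row of ships does not suppress its neighbours outright. The Gaussian variant is sketched below (the sigma value is illustrative):

```python
import math

def soft_nms_decay(score, iou, sigma=0.5):
    """Gaussian Soft-NMS: instead of deleting an overlapping box,
    decay its score by exp(-iou^2 / sigma)."""
    return score * math.exp(-(iou ** 2) / sigma)

# A heavily overlapping box is suppressed softly, not removed.
print(round(soft_nms_decay(0.9, iou=0.8), 3))  # 0.25
```

    Boxes whose decayed score still exceeds the confidence threshold survive, which is why nearby ships with genuine overlap are kept.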

    Table and Figures | Reference | Related Articles | Metrics
    Survey on interpretability research of deep learning
    Lingmin LI, Mengran HOU, Kun CHEN, Junmin LIU
    Journal of Computer Applications    2022, 42 (12): 3639-3650.   DOI: 10.11772/j.issn.1001-9081.2021091649
    Abstract347)   HTML33)    PDF (4239KB)(278)       Save

    In recent years, deep learning has been widely used in many fields. However, due to the highly nonlinear operations in deep neural network models, the interpretability of these models is poor; they are often referred to as “black box” models and cannot be applied in some key fields with high performance requirements. Therefore, it is very necessary to study the interpretability of deep learning. Firstly, deep learning was introduced briefly. Then, centering on the interpretability of deep learning, the existing research work was analyzed from eight aspects: hidden layer visualization, Class Activation Mapping (CAM), sensitivity analysis, frequency principle, robust perturbation testing, information theory, interpretable modules and optimization methods. At the same time, the applications of deep learning in the fields of network security, recommender systems, medicine and social networks were demonstrated. Finally, the existing problems and future development directions of deep learning interpretability research were discussed.

    Table and Figures | Reference | Related Articles | Metrics
    Bimodal emotion recognition method based on graph neural network and attention
    Lubao LI, Tian CHEN, Fuji REN, Beibei LUO
    Journal of Computer Applications    2023, 43 (3): 700-705.   DOI: 10.11772/j.issn.1001-9081.2022020216
    Abstract346)   HTML38)    PDF (1917KB)(271)       Save

    To address emotion recognition from physiological signals, a bimodal emotion recognition method based on Graph Neural Network (GNN) and attention was proposed. Firstly, a GNN was used to classify ElectroEncephaloGram (EEG) signals. Secondly, an attention-based Bi-directional Long Short-Term Memory (Bi-LSTM) network was used to classify ElectroCardioGram (ECG) signals. Finally, the EEG and ECG classification results were fused by Dempster-Shafer evidence theory, thus improving the overall performance of the emotion recognition task. To verify the effectiveness of the proposed method, 20 subjects were invited to participate in an emotion elicitation experiment, and their EEG and ECG signals were collected. Experimental results show that the binary classification accuracies of the proposed method are 91.82% and 88.24% in the valence and arousal dimensions respectively, which are 2.65% and 0.40% higher than those of the single-modal EEG method, and 19.79% and 24.90% higher than those of the single-modal ECG method. It can be seen that the proposed method can effectively improve the accuracy of emotion recognition and provide decision support for medical diagnosis and other fields.
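    Dempster's combination rule fuses the two classifiers' outputs by multiplying agreeing masses and renormalising away the conflict. A minimal two-hypothesis sketch with made-up EEG/ECG mass assignments (singleton hypotheses only, no compound sets):

```python
def ds_combine(m1, m2):
    """Dempster's rule for two basic probability assignments over the
    same singleton hypotheses (conflict mass K is renormalised away)."""
    hypotheses = m1.keys()
    conflict = sum(m1[a] * m2[b]
                   for a in hypotheses for b in hypotheses if a != b)
    return {h: m1[h] * m2[h] / (1 - conflict) for h in hypotheses}

eeg = {"positive": 0.7, "negative": 0.3}  # hypothetical EEG masses
ecg = {"positive": 0.6, "negative": 0.4}  # hypothetical ECG masses
fused = ds_combine(eeg, ecg)
print(round(fused["positive"], 3))  # 0.778
```

    Note how two moderately confident, agreeing sources yield a fused belief stronger than either alone.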

    Table and Figures | Reference | Related Articles | Metrics
    Animation video generation model based on Chinese impressionistic style transfer
    Wentao MAO, Guifang WU, Chao WU, Zhi DOU
    Journal of Computer Applications    2022, 42 (7): 2162-2169.   DOI: 10.11772/j.issn.1001-9081.2021050836
    Abstract345)   HTML6)    PDF (5691KB)(83)       Save

    At present, Generative Adversarial Networks (GAN) have been used for image animation style transformation. However, most existing GAN-based animation generation models focus on extracting and generating the realistic styles of Japanese and American animations, and pay very little attention to the transfer of the impressionistic style of Chinese animations, which limits the application of GAN in the domestic animation production market. To solve this problem, a new Chinese-style animation GAN model, namely CCGAN (Chinese Cartoon GAN), was proposed for the automatic generation of animation videos with Chinese impressionistic style by integrating this style into the GAN model. Firstly, by adding inverted residual blocks to the generator, a lightweight deep neural network model was constructed to reduce the computational cost of video generation. Secondly, in order to extract and transfer the characteristics of the Chinese impressionistic style, such as sharp image edges, abstract content structure and stroke lines with ink texture, gray-scale style loss and color reconstruction loss were constructed in the generator to constrain the high-level semantic consistency in style between real images and Chinese-style sample images; moreover, gray-scale adversarial loss and edge-promoting adversarial loss were constructed in the discriminator to constrain the reconstructed images to maintain the same edge characteristics as the sample images. Finally, the Adam algorithm was used to minimize the above loss functions to realize style transfer, and the reconstructed images were combined into video. Experimental results show that, compared with representative style transfer models such as CycleGAN and CartoonGAN, the proposed CCGAN can effectively learn the Chinese impressionistic style from Chinese-style animations such as Chinese Choir while significantly reducing computational cost, indicating that CCGAN is suitable for the rapid generation of animation videos in large quantities.
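    The gray-scale style loss idea, comparing images only after dropping color, can be sketched as follows; the BT.601 luma weights and L1 distance are illustrative assumptions, not CCGAN's exact loss:

```python
def to_gray(rgb_image):
    """Luma conversion with ITU-R BT.601 weights."""
    return [[0.299 * r + 0.587 * g + 0.114 * b for r, g, b in row]
            for row in rgb_image]

def grayscale_l1_loss(img_a, img_b):
    """Compare two images in gray-scale only, so the loss constrains
    stroke/texture style without forcing the generator to copy colors."""
    ga, gb = to_gray(img_a), to_gray(img_b)
    n = sum(len(row) for row in ga)
    return sum(abs(a - b)
               for ra, rb in zip(ga, gb) for a, b in zip(ra, rb)) / n

a = [[(1.0, 0.0, 0.0)]]  # a red "image" of one pixel
b = [[(0.0, 0.0, 0.0)]]  # a black one
print(grayscale_l1_loss(a, a))  # 0.0
```

    Because color is discarded before comparison, the color reconstruction loss is needed as a separate term, which matches the two-loss design described above.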

    Table and Figures | Reference | Related Articles | Metrics
    Decision optimization of traffic scenario problem based on reinforcement learning
    Fei LUO, Mengwei BAI
    Journal of Computer Applications    2022, 42 (8): 2361-2368.   DOI: 10.11772/j.issn.1001-9081.2021061012
    Abstract338)   HTML17)    PDF (735KB)(131)       Save

    Traditional reinforcement learning algorithms have limitations in convergence speed and solution accuracy when solving the taxi path planning problem and the traffic signal control problem in traffic scenarios. Therefore, an improved reinforcement learning algorithm was proposed to solve this kind of problem. Firstly, by applying an optimized Bellman equation and the Speedy Q-Learning (SQL) mechanism, and introducing an experience pool and a direct strategy, an improved reinforcement learning algorithm, namely GSQL-DSEP (Generalized Speedy Q-Learning with Direct Strategy and Experience Pool), was proposed. Then, the GSQL-DSEP algorithm was applied to reduce the path length in the taxi path planning problem and the total waiting time of vehicles in the traffic signal control problem. Compared with algorithms such as Q-learning, SQL, Generalized Speedy Q-Learning (GSQL) and Dyna-Q, the GSQL-DSEP algorithm reduces the error by at least 18.7%, reduces the decision path length by at least 17.4%, and reduces the total waiting time of vehicles in the traffic signal control problem by at most 51.5%. Experimental results show that the GSQL-DSEP algorithm has advantages over the compared algorithms in solving traffic scenario problems.
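    The building blocks are easiest to see in a plain Q-learning backup replayed from a small experience pool; this is a generic sketch of those two ingredients, not the GSQL-DSEP update rule itself:

```python
import random

def q_update(Q, s, a, r, s_next, alpha=0.5, gamma=0.9):
    """Standard Q-learning backup: Q(s,a) += alpha * (TD target - Q(s,a))."""
    target = r + gamma * max(Q[s_next].values())
    Q[s][a] += alpha * (target - Q[s][a])

# Two-state toy MDP; replaying pooled transitions reuses experience.
Q = {"s0": {"go": 0.0, "stay": 0.0}, "s1": {"go": 0.0, "stay": 0.0}}
pool = [("s0", "go", 1.0, "s1"), ("s1", "stay", 0.0, "s0")]
random.seed(1)
for _ in range(100):
    q_update(Q, *random.choice(pool))
print(Q["s0"]["go"] > Q["s0"]["stay"])  # True
```

    Speedy Q-learning modifies the backup with a second, earlier Q estimate to accelerate convergence; the experience pool shown here is the part that lets each observed transition be reused many times.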

    Table and Figures | Reference | Related Articles | Metrics
    Traffic sign detection algorithm based on improved attention mechanism
    Xinyu ZHANG, Sheng DING, Zhipei YANG
    Journal of Computer Applications    2022, 42 (8): 2378-2385.   DOI: 10.11772/j.issn.1001-9081.2021061005
    Abstract332)   HTML27)    PDF (1664KB)(219)       Save

    In some scenes, the low resolution, occlusion and other environmental factors of traffic signs lead to missed and false detections in object detection tasks. Therefore, a traffic sign detection algorithm based on an improved attention mechanism was proposed. First of all, in response to the limited extraction of image feature information by the network, caused by the low resolution of traffic sign images affected by damage, lighting and other environmental factors, an attention module was added to the backbone network to enhance the key features of the object area. Secondly, since the local features of adjacent channels in the feature map are correlated due to overlapping receptive fields, a one-dimensional convolution of size k was used to replace the fully connected layer in the channel attention module to aggregate information from different channels and reduce the number of additional parameters. Finally, a receptive field module was introduced into the medium- and small-scale feature layers of the Path Aggregation Network (PANet) to enlarge the receptive field of the feature maps, fuse the context information of the object area and improve the network's ability to detect traffic signs. Experimental results on the CSUST Chinese Traffic Sign Detection Benchmark (CCTSDB) dataset show that the proposed improved You Only Look Once v4 (YOLOv4) algorithm introduces only a small number of parameters and its detection speed differs little from that of the original algorithm. Its mean Average Precision (mAP) reaches 96.88%, an increase of 1.48%; compared with the lightweight network YOLOv5s, the mAP of the proposed algorithm is 3.40 percentage points higher at a single-frame detection speed only 10 ms slower, and the speed reaches 40 frame/s, fully meeting the real-time requirements of object detection.
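    Replacing the fully connected layer with a size-k one-dimensional convolution over the channel descriptor follows the ECA design. A minimal sketch (a fixed averaging kernel stands in for the learned one):

```python
import math

def eca_weights(channel_means, k=3):
    """ECA-style channel attention: a 1-D convolution of size k slides
    over the per-channel descriptor, then a sigmoid gives channel weights."""
    pad = k // 2
    padded = [0.0] * pad + channel_means + [0.0] * pad
    kernel = [1.0 / k] * k  # fixed averaging kernel; learned in practice
    conv = [sum(padded[i + j] * kernel[j] for j in range(k))
            for i in range(len(channel_means))]
    return [1 / (1 + math.exp(-c)) for c in conv]

w = eca_weights([0.2, 1.5, -0.3, 0.8])
print(all(0.0 < x < 1.0 for x in w))  # True
```

    Only k weights are needed per attention module, versus the C x C/r weights of a fully connected squeeze-and-excitation block, which is the parameter saving the abstract refers to.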

    Table and Figures | Reference | Related Articles | Metrics
    Named entity recognition method combining multiple semantic features
    Yayao ZUO, Haoyu CHEN, Zhiran CHEN, Jiawei HONG, Kun CHEN
    Journal of Computer Applications    2022, 42 (7): 2001-2008.   DOI: 10.11772/j.issn.1001-9081.2021050861
    Abstract330)   HTML20)    PDF (2326KB)(147)       Save

    Aiming at the common nonlinear relationships between characters in languages, and in order to capture richer semantic features, a Named Entity Recognition (NER) method based on Graph Convolutional Network (GCN) and a self-attention mechanism was proposed. Firstly, with the help of the effective character feature extraction ability of deep learning methods, a GCN was used to learn the global semantic features among characters, and a Bidirectional Long Short-Term Memory network (BiLSTM) was used to extract the context-dependent features of the characters. Secondly, these features were fused and their internal importance was calculated by introducing a self-attention mechanism. Finally, a Conditional Random Field (CRF) was used to decode the optimal label sequence from the fused features as the entity recognition result. Experimental results show that compared with methods that only use BiLSTM or CRF, the proposed method improves recognition precision by 2.39% and 15.2% on the MicroSoft Research Asia (MSRA) dataset and the Biomedical Natural Language Processing/Natural Language Processing in Biomedical Applications (BioNLP/NLPBA) 2004 dataset respectively, indicating that this method has good sequence labeling capability on both Chinese and English datasets as well as strong generalization capability.
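    CRF decoding picks the globally best tag sequence with the Viterbi algorithm. A minimal sketch with toy scores (the tag set, emission and transition scores are made up for illustration):

```python
def viterbi(emissions, transitions, tags):
    """CRF decoding: find the highest-scoring tag sequence given
    per-position emission scores and pairwise transition scores."""
    score = dict(emissions[0])          # best score of a path ending in each tag
    paths = {t: [t] for t in tags}
    for emit in emissions[1:]:
        new_score, new_paths = {}, {}
        for t in tags:
            prev = max(tags, key=lambda p: score[p] + transitions[(p, t)])
            new_score[t] = score[prev] + transitions[(prev, t)] + emit[t]
            new_paths[t] = paths[prev] + [t]
        score, paths = new_score, new_paths
    return paths[max(tags, key=score.get)]

tags = ["B", "I", "O"]
trans = {(p, t): (-5.0 if (p == "O" and t == "I") else 0.0)
         for p in tags for t in tags}  # penalise the invalid O -> I move
emis = [{"B": 2.0, "I": 0.0, "O": 1.0},
        {"B": 0.0, "I": 1.5, "O": 1.0}]
print(viterbi(emis, trans, tags))  # ['B', 'I']
```

    The transition scores are what let the CRF forbid label sequences (such as O followed by I) that a per-token classifier would happily emit.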

    Table and Figures | Reference | Related Articles | Metrics
    Graph convolutional network method based on hybrid feature modeling
    Zhuoran LI, Zhonglin YE, Haixing ZHAO, Jingjing LIN
    Journal of Computer Applications    2022, 42 (11): 3354-3363.   DOI: 10.11772/j.issn.1001-9081.2021111981
    Abstract329)   HTML14)    PDF (3410KB)(96)       Save

    Networks contain complex information, and more ways are needed to extract the useful parts of it; however, existing single-feature Graph Neural Networks (GNNs) cannot fully describe the relevant characteristics of a network. To resolve this problem, a Hybrid feature-based Dual Graph Convolutional Network (HDGCN) was proposed. Firstly, the structure feature vectors and semantic feature vectors of nodes were obtained by a Graph Convolutional Network (GCN). Secondly, the features of nodes were aggregated selectively by an aggregation function based on an attention mechanism or a gating mechanism, so that the feature expression ability of nodes was enhanced. Finally, the hybrid feature vectors of nodes were obtained by a fusion mechanism based on a feasible dual-channel GCN, and the structure features and semantic features of nodes were modeled jointly so that the two kinds of features complemented each other and improved performance on subsequent machine learning tasks. Verification was performed on the CiteSeer, DBLP (DataBase systems and Logic Programming) and SDBLP (Simplified DBLP) datasets. Experimental results show that, compared with the GCN model trained on structure features alone, the dual-channel GCN model trained on hybrid features improves the average Micro-F1 by 2.43, 2.14, 1.86 and 2.13 percentage points and the average Macro-F1 by 1.38, 0.33, 1.06 and 0.86 percentage points when the training-set proportion is 20%, 40%, 60% and 80% respectively. The difference in accuracy between concat and mean as the fusion strategy is no more than 0.5 percentage points, showing that either can serve as the fusion strategy. HDGCN achieves higher accuracy on node classification and clustering tasks than models trained on the structure or semantic network alone, with the best results obtained when the output dimension is 64, the learning rate is 0.001, the number of graph convolutional layers is 2 and the attention vector dimension is 128.
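The attention-based fusion of the two feature channels can be sketched as follows. This is a minimal NumPy illustration, not the paper's implementation: the shared attention vector `w_att` and the per-node softmax over the two channel scores are assumptions standing in for whatever learned aggregation HDGCN actually uses.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention_fuse(h_struct, h_sem, w_att):
    """Fuse per-node structure and semantic features with attention weights.

    h_struct, h_sem: (n_nodes, dim) outputs of the two GCN channels.
    w_att: (dim,) shared attention vector (hypothetical parameter).
    Returns the (n_nodes, dim) hybrid feature matrix.
    """
    # Score each channel per node, then normalise the two scores.
    scores = np.stack([h_struct @ w_att, h_sem @ w_att], axis=1)  # (n, 2)
    alpha = softmax(scores, axis=1)                               # (n, 2)
    # Convex combination of the two channels, per node.
    return alpha[:, :1] * h_struct + alpha[:, 1:] * h_sem

rng = np.random.default_rng(0)
h1 = rng.normal(size=(5, 8))
h2 = rng.normal(size=(5, 8))
w = rng.normal(size=8)
hybrid = attention_fuse(h1, h2, w)
print(hybrid.shape)  # (5, 8)
```

Because the attention weights sum to one per node, each fused feature lies between the corresponding structure and semantic values, which is what lets the two channels supplement each other rather than overwrite one another.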

    Handwritten English text recognition based on convolutional neural network and Transformer
    Xianjie ZHANG, Zhiming ZHANG
    Journal of Computer Applications    2022, 42 (8): 2394-2400.   DOI: 10.11772/j.issn.1001-9081.2021091564

    Handwritten text recognition technology transcribes handwritten documents into editable digital documents. However, because of varying writing styles, diverse document structures and the low accuracy of character segmentation, handwritten English text recognition based on neural networks still faces many challenges. To solve these problems, a handwritten English text recognition model based on Convolutional Neural Network (CNN) and Transformer was proposed. Firstly, a CNN was used to extract features from the input image. Then, the features were fed into a Transformer encoder to obtain a prediction for each frame of the feature sequence. Finally, a Connectionist Temporal Classification (CTC) decoder was used to obtain the final prediction. Extensive experiments were conducted on the public Institut für Angewandte Mathematik (IAM) handwritten English word dataset. Experimental results show that the model achieves a Character Error Rate (CER) of 3.60% and a Word Error Rate (WER) of 12.70%, verifying its feasibility.
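The final CTC decoding step can be sketched with the standard greedy rule: take the most likely label per frame, merge consecutive repeats, then drop blanks. The label indices below are made up for illustration; the paper's actual decoder and vocabulary are not specified here.

```python
def ctc_greedy_decode(frame_ids, blank=0):
    """Collapse a per-frame argmax sequence into an output label sequence.

    Standard CTC greedy rule: merge consecutive repeats, then drop blanks.
    frame_ids: per-frame most-likely label indices (index `blank` = CTC blank).
    """
    out = []
    prev = None
    for t in frame_ids:
        # Keep a label only if it differs from the previous frame
        # and is not the blank symbol.
        if t != prev and t != blank:
            out.append(t)
        prev = t
    return out

# Frames predicting "h h - e - l l - l o" (0 is the CTC blank):
# repeats merge, blanks drop, and the blank between the two l-runs
# is what preserves the double letter in "hello".
frames = [8, 8, 0, 5, 0, 12, 12, 0, 12, 15]
print(ctc_greedy_decode(frames))  # [8, 5, 12, 12, 15]
```

This is why CTC needs no character segmentation: the per-frame predictions from the Transformer encoder are collapsed into a label sequence directly, with the blank symbol separating genuine repeated characters.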

    Multi-UAV real-time tracking algorithm based on improved PP-YOLO and Deep-SORT
    Jun MA, Zhen YAO, Cuifeng XU, Shouhong CHEN
    Journal of Computer Applications    2022, 42 (9): 2885-2892.   DOI: 10.11772/j.issn.1001-9081.2021071146

    Unmanned Aerial Vehicle (UAV) targets are small, the features distinguishing multiple UAVs are not obvious, and interference from birds and flying insects poses a great challenge to accurate detection and stable tracking of UAV targets. Aiming at the poor detection performance and unstable tracking of small UAV targets by traditional object detection algorithms, a real-time multi-UAV tracking algorithm based on improved PaddlePaddle-YOLO (PP-YOLO) and Simple Online and Realtime Tracking with a Deep association metric (Deep-SORT) was proposed. Firstly, the squeeze-and-excitation module was integrated into the PP-YOLO detection algorithm for feature extraction and detection of UAV targets. Secondly, the Mish activation function was introduced into the ResNet50-vd structure to solve the vanishing-gradient problem in back propagation and further improve detection precision. Thirdly, the Deep-SORT algorithm was used to track UAV targets in real time, with the backbone network for appearance feature extraction replaced by ResNet50, improving the original network's weak perception of small-object appearance. Finally, the loss function Margin Loss was introduced, which improved class separability while strengthening intra-class tightness and inter-class difference. Experimental results show that the detection mean Average Precision (mAP) of the proposed algorithm is 2.27 percentage points higher than that of the original PP-YOLO algorithm, and its tracking accuracy is 4.5 percentage points higher than that of the original Deep-SORT algorithm. The proposed algorithm achieves a tracking accuracy of 91.6%, tracks multiple UAV targets within 600 m in real time, and effectively solves the problem of "frame loss" during tracking.
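The Mish activation mentioned above has a closed form, x · tanh(softplus(x)). A minimal scalar implementation (with a numerically stable softplus) shows why it helps against vanishing gradients: unlike ReLU, it is smooth and passes a small signal for negative inputs.

```python
import math

def softplus(x):
    # log(1 + e^x), computed stably for large |x|.
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def mish(x):
    """Mish activation: x * tanh(softplus(x)).

    Smooth and non-monotonic; negative inputs are damped toward zero
    rather than clipped to exactly zero as ReLU would do.
    """
    return x * math.tanh(softplus(x))

print(mish(0.0))   # 0.0
print(mish(2.0))   # close to 2, since tanh(softplus(2)) is near 1
print(mish(-10.0)) # a tiny negative value, not exactly 0
```

In a framework like PaddlePaddle or PyTorch the same function would be applied elementwise to activation tensors inside the ResNet50-vd blocks.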

    Novelty detection method based on dual autoencoders and Transformer network
    Jiahang ZHOU, Hongjie XING
    Journal of Computer Applications    2023, 43 (1): 22-29.   DOI: 10.11772/j.issn.1001-9081.2021111983

    AutoEncoder (AE) based novelty detection uses reconstruction error to classify test samples as normal or novel. However, such methods produce very close reconstruction errors on normal and novel data, so some novel samples are easily misclassified as normal. To solve this problem, a novelty detection method composed of two parallel AEs and one Transformer network was proposed, namely Novelty Detection based on Dual Autoencoders and Transformer Network (DATN-ND). Firstly, the Transformer network was used to generate pseudo-novel bottleneck features from the bottleneck features of the input samples, thereby adding novel-data information to the training set. Secondly, the bottleneck features carrying novel-data information were reconstructed by the dual AEs toward normal data as much as possible, enlarging the reconstruction-error difference between novel and normal data. Compared with MemAE (Memory-augmented AE), DATN-ND improves the Area Under the Receiver Operating Characteristic curve (AUC) by 6.8, 12.0 and 2.5 percentage points on the MNIST, Fashion-MNIST and CIFAR-10 datasets respectively. Experimental results show that DATN-ND effectively widens the reconstruction-error gap between normal and abnormal data.
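The reconstruction-error scoring principle underlying all such methods can be sketched with a linear stand-in for the autoencoder: project data onto the top principal directions of the normal training set and score each sample by how badly it reconstructs. This toy model is an assumption for illustration only; DATN-ND itself uses two neural AEs plus a Transformer, not PCA.

```python
import numpy as np

def fit_linear_ae(x_train, k=1):
    """Linear stand-in for an AE: keep the top-k principal directions
    of the normal training data (hypothetical simplification)."""
    mu = x_train.mean(axis=0)
    _, _, vt = np.linalg.svd(x_train - mu, full_matrices=False)
    return mu, vt[:k]  # mean and (k, dim) encode/decode basis

def novelty_score(x, mu, basis):
    # Reconstruction error: samples off the normal manifold score high.
    recon = mu + (x - mu) @ basis.T @ basis
    return np.sum((x - recon) ** 2, axis=-1)

rng = np.random.default_rng(1)
# Normal data lies near the line y = x; a novel point sits far off it.
normal = rng.normal(size=(200, 1)) @ np.array([[1.0, 1.0]])
normal += 0.05 * rng.normal(size=normal.shape)
mu, basis = fit_linear_ae(normal, k=1)
scores = novelty_score(np.array([[2.0, 2.0], [2.0, -2.0]]), mu, basis)
print(scores[1] > scores[0])  # True: the off-manifold point scores higher
```

DATN-ND's contribution is precisely to widen the gap between these two scores, so that a single threshold separates normal from novel samples more reliably.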

    Key node mining in complex network based on improved local structural entropy
    Peng LI, Shilin WANG, Guangwu CHEN, Guanghui YAN
    Journal of Computer Applications    2023, 43 (4): 1109-1114.   DOI: 10.11772/j.issn.1001-9081.2022040562

    The identification of key nodes in a complex network plays an important role in optimizing the network structure and propagating information effectively. Local structural Entropy (LE) identifies key nodes by using the influence of the local network, rather than of individual nodes, on the whole network. However, LE does not consider highly aggregative networks or nodes that form loops with their neighbor nodes, which leads to some limitations. To address these limitations, an improved LE-based node importance evaluation method, PLE (Penalized Local structural Entropy), was first proposed, in which the Clustering Coefficient (CC) was introduced on top of LE as a penalty term to appropriately penalize highly aggregative nodes in the network. Secondly, since PLE penalizes nodes in triadic closure structures too heavily, an improvement of PLE, PLEA (Penalized Local structural Entropy Advancement), was proposed, in which a control coefficient was placed in front of the penalty term to control the penalty strength. Selective attack experiments were conducted on five real networks of different sizes. Experimental results show that, on the Western US power grid and the US Airlines network, PLEA improves identification accuracy by 26.3% and 3.2% over LE, by 380% and 5.43% over the K-Shell (KS) method, and by 14.4% and 24% over the DCL (Degree and Clustering coefficient and Location) method, respectively. The key nodes identified by PLEA cause more damage to the network, verifying both the rationality of introducing the CC as a penalty term and the effectiveness and superiority of PLEA. By integrating the number of neighbors and the local network structure of nodes while remaining simple to compute, PLEA is more effective at describing the reliability and invulnerability of large-scale networks.
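The ingredients of such a score can be sketched as follows. The exact formulas of LE, PLE and PLEA are not given in the abstract, so this is a plausible reconstruction under stated assumptions: LE is taken as the entropy of the degree distribution over a node's ego network, and the penalty as a multiplicative factor (1 − α·CC) with control coefficient α.

```python
import math
from itertools import combinations

def clustering_coefficient(adj, i):
    """Local clustering coefficient: fraction of neighbour pairs that are linked."""
    nbrs = adj[i]
    if len(nbrs) < 2:
        return 0.0
    links = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
    return 2.0 * links / (len(nbrs) * (len(nbrs) - 1))

def local_structural_entropy(adj, i):
    """Entropy of the degree distribution over the ego network of node i
    (an assumed form of LE, for illustration)."""
    ego = adj[i] | {i}
    degs = [len(adj[j]) for j in ego]
    total = sum(degs)
    return -sum((d / total) * math.log(d / total) for d in degs)

def plea(adj, i, alpha=0.5):
    # Penalise highly clustered nodes; alpha tunes the penalty strength
    # (alpha = 1 would recover the plain PLE penalty in this sketch).
    return local_structural_entropy(adj, i) * (1.0 - alpha * clustering_coefficient(adj, i))

# Toy graph: node 0 bridges the triangle {1, 2, 3} and a pendant node 4.
adj = {0: {1, 4}, 1: {0, 2, 3}, 2: {1, 3}, 3: {1, 2}, 4: {0}}
print(plea(adj, 1) > plea(adj, 4))  # True: the well-connected node ranks higher
```

Ranking all nodes by this score and removing them from the top down is exactly the selective attack experiment described above: the faster the network fragments, the better the key-node ranking.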

    Multi-scale object detection algorithm based on improved YOLOv3
    Liying ZHANG, Chunjiang PANG, Xinying WANG, Guoliang LI
    Journal of Computer Applications    2022, 42 (8): 2423-2431.   DOI: 10.11772/j.issn.1001-9081.2021060984

    To further improve the speed and precision of multi-scale object detection and to reduce the missed, wrong and repeated detections caused by small objects, an object detection algorithm based on improved You Only Look Once v3 (YOLOv3) was proposed to detect multi-scale objects automatically. Firstly, the structure of the feature extraction network was improved, and an attention mechanism was introduced into the spatial dimensions of the residual module to attend to small objects. Then, Dense Convolutional Network (DenseNet) was used to fully integrate the shallow information of the network, and depthwise separable convolution replaced the normal convolution of the backbone network, reducing the number of model parameters and improving detection speed. In the feature fusion network, bidirectional fusion of shallow and deep features was realized through a bidirectional feature pyramid structure, and 3-scale prediction was changed to 4-scale prediction, improving the learning ability for multi-scale features. For the loss function, Generalized Intersection over Union (GIoU) was selected, which increased the precision of identifying objects and reduced the object miss rate. Experimental results show that on the Pascal VOC datasets, the mean Average Precision (mAP) of the improved YOLOv3 algorithm reaches 83.26%, 5.89 percentage points higher than that of the original YOLOv3 algorithm, with a detection speed of 22.0 frame/s. On the Common Objects in COntext (COCO) dataset, the improved algorithm's mAP is 3.28 percentage points higher than that of the original YOLOv3. Meanwhile, the mAP in multi-scale object detection is also improved, verifying the effectiveness of the object detection algorithm based on the improved YOLOv3.
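The GIoU metric chosen as the loss has a standard closed form: GIoU = IoU − |C \ (A ∪ B)| / |C|, where C is the smallest box enclosing both boxes. A minimal sketch for axis-aligned boxes:

```python
def giou(box_a, box_b):
    """Generalized IoU for axis-aligned boxes given as (x1, y1, x2, y2).

    GIoU = IoU - |C \\ (A U B)| / |C|, where C is the smallest enclosing
    box. Unlike plain IoU, it stays informative (goes negative) for
    non-overlapping boxes, so 1 - GIoU still provides a training signal.
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    # Intersection (zero if the boxes are disjoint).
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = area_a + area_b - inter
    iou = inter / union
    # Smallest enclosing box C.
    c_area = (max(ax2, bx2) - min(ax1, bx1)) * (max(ay2, by2) - min(ay1, by1))
    return iou - (c_area - union) / c_area

print(giou((0, 0, 2, 2), (0, 0, 2, 2)))  # 1.0 for identical boxes
print(giou((0, 0, 1, 1), (2, 2, 3, 3)))  # negative for disjoint boxes
```

Using 1 − GIoU as the regression loss is what reduces the miss rate mentioned above: even predictions with zero overlap receive a gradient pulling them toward the target box.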

Honorary Editor-in-Chief: ZHANG Jingzhong
Editor-in-Chief: XU Zongben
Associate Editor: SHEN Hengtao XIA Zhaohui
Domestic Post Distribution Code: 62-110
Foreign Distribution Code: M4616
Address:
No. 9, 4th Section of South Renmin Road, Chengdu 610041, China
Tel: 028-85224283-803
  028-85222239-803
Website: www.joca.cn
E-mail: bjb@joca.cn