Against the background of growing emphasis on data ownership and privacy protection, federated learning, as a new machine learning paradigm, can solve the problems of data silos and privacy protection without exposing the data of the participants. Since modeling methods based on federated learning have become mainstream and achieved good results, it is significant to summarize and analyze the concepts, technologies, applications and challenges of federated learning. Firstly, the development process of machine learning and the inevitability of the emergence of federated learning were elaborated, and the definition and classification of federated learning were given. Secondly, the three federated learning methods currently recognized by the industry (horizontal federated learning, vertical federated learning and federated transfer learning) were introduced and analyzed. Thirdly, concerning the privacy protection issue of federated learning, the existing common privacy protection technologies were generalized and summarized. In addition, the recent mainstream open-source frameworks were introduced and compared, and application scenarios of federated learning were given. Finally, the challenges and future research directions of federated learning were discussed.
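To make the horizontal federated learning workflow concrete, the following is a minimal federated averaging sketch in the FedAvg style; the model, client data loaders, and hyperparameters are illustrative assumptions, not a specific framework's API.

```python
import copy
import torch
import torch.nn as nn

def fedavg_round(global_model, client_loaders, lr=0.01, local_epochs=1):
    """One round of horizontal federated learning, FedAvg-style (sketch)."""
    client_states = []
    for loader in client_loaders:
        local = copy.deepcopy(global_model)        # each client starts from the global model
        opt = torch.optim.SGD(local.parameters(), lr=lr)
        loss_fn = nn.CrossEntropyLoss()
        for _ in range(local_epochs):              # local training on private data only
            for x, y in loader:
                opt.zero_grad()
                loss_fn(local(x), y).backward()
                opt.step()
        client_states.append(local.state_dict())   # only parameters leave the client
    # server averages parameters; raw data is never exchanged
    avg = {k: torch.stack([s[k].float() for s in client_states]).mean(0)
           for k in client_states[0]}
    global_model.load_state_dict(avg)
    return global_model
```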
Federated learning was proposed to resolve the contradiction between the demand for data sharing and the requirements of privacy protection. As a form of distributed machine learning, federated learning requires a large number of model parameters to be exchanged between the participants and the central server, resulting in high communication overhead. At the same time, federated learning is increasingly deployed on mobile devices with limited communication bandwidth and limited power, and the limited network bandwidth and sharply growing number of clients make the communication bottleneck worse. To address the communication bottleneck of federated learning, the basic workflow of federated learning was analyzed first; then, from the perspective of methodology, three mainstream types of methods, based respectively on reducing the frequency of model updates, model compression and client selection, as well as special methods such as model partition, were introduced, and a deep comparative analysis of specific optimization schemes was carried out. Finally, research trends in reducing the communication overhead of federated learning were summarized and discussed.
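As a concrete instance of the model compression family of methods mentioned above, the following sketch shows top-k sparsification of a model update, where only the largest-magnitude entries (values plus indices) are transmitted instead of the dense tensor; the ratio parameter and tensor shapes are illustrative assumptions.

```python
import torch

def topk_sparsify(update, ratio=0.01):
    """Keep only the largest-magnitude fraction of an update tensor;
    sending (values, indices) instead of the dense tensor cuts upload cost."""
    flat = update.flatten()
    k = max(1, int(flat.numel() * ratio))
    _, indices = torch.topk(flat.abs(), k)          # positions of the top-k magnitudes
    return flat[indices], indices, update.shape     # signed values at those positions

def topk_restore(values, indices, shape):
    """Server side: scatter the sparse update back into a dense tensor."""
    dense = torch.zeros(shape).reshape(-1)
    dense[indices] = values
    return dense.reshape(shape)

update = torch.randn(256, 128)                      # a hypothetical layer update
vals, idx, shape = topk_sparsify(update, ratio=0.01)
restored = topk_restore(vals, idx, shape)           # sparse approximation of the update
```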
Multi-modal medical images can provide clinicians with rich information about target areas (such as tumors, organs or tissues). However, effective fusion and segmentation of multi-modal images is still a challenging problem due to the independence and complementarity of the modalities. Traditional image fusion methods have difficulty addressing this problem, which has led to widespread research on deep learning-based multi-modal medical image segmentation algorithms. Multi-modal medical image segmentation based on deep learning was reviewed in terms of principles, techniques, problems, and prospects. Firstly, the general theory of deep learning and multi-modal medical image segmentation was introduced, including the basic principles and development of deep learning and the Convolutional Neural Network (CNN), as well as the importance of the multi-modal medical image segmentation task. Secondly, the key concepts of multi-modal medical image segmentation were described, including data dimension, preprocessing, data augmentation, loss functions, and post-processing. Thirdly, multi-modal segmentation networks based on different fusion strategies were summarized and analyzed. Finally, several common problems in medical image segmentation were discussed, and a summary and prospects for future research were given.
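To illustrate the simplest of the fusion strategies such surveys discuss, input-level (early) fusion, here is a minimal sketch that stacks four hypothetical MRI modalities as channels before a segmentation network; the modality names, image size, and the tiny stand-in network are assumptions.

```python
import torch
import torch.nn as nn

# Early fusion: four MRI modalities (e.g. T1, T1c, T2, FLAIR) are stacked
# as input channels and processed by a single segmentation network.
t1, t1c, t2, flair = (torch.randn(1, 1, 128, 128) for _ in range(4))
fused = torch.cat([t1, t1c, t2, flair], dim=1)      # (N, 4, H, W)

head = nn.Sequential(                               # stand-in for a full U-Net-style model
    nn.Conv2d(4, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 2, 1),                            # 2 classes: background / lesion
)
logits = head(fused)                                # (N, 2, H, W) per-pixel class scores
```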
The multi-scale features of time series contain abundant category information, and these features have different importance for classification. However, existing univariate time series classification models conventionally extract series features by convolutions with a fixed kernel size, and are therefore unable to acquire and focus on important multi-scale features effectively. To solve this problem, a Multi-scale Convolution and Attention mechanism (MCA) based Long Short-Term Memory (LSTM) model (MCA-LSTM) was proposed, which is capable of concentrating on and fusing important multi-scale features to achieve more accurate classification. In this structure, LSTM controls the transmission of series information through memory cells and a gate mechanism, fully extracting the correlation information of the time series; the Multi-scale Convolution Module (MCM) extracts the multi-scale features of the series through Convolutional Neural Networks (CNNs) with different kernel sizes; the Attention Module (AM) fuses channel information to obtain the importance of features and assign attention weights, enabling the network to focus on important time series features. Experimental results on 65 univariate time series datasets from the UCR archive show that, compared with the state-of-the-art time series classification methods Unsupervised Scalable Representation Learning-FordA (USRL-FordA), Unsupervised Scalable Representation Learning-Combined (1-Nearest Neighbor) (USRL-Combined (1-NN)), Omni-Scale Convolutional Neural Network (OS-CNN), InceptionTime and Robust Temporal Feature Network for time series classification (RTFN), MCA-LSTM reduces the Mean Error (ME) by 7.48, 9.92, 2.43, 2.09 and 0.82 percentage points respectively, and achieves the best Arithmetic Mean Rank (AMR) and Geometric Mean Rank (GMR) of 2.14 and 3.23 respectively. These results fully demonstrate the effectiveness of MCA-LSTM in the classification of univariate time series.
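A minimal sketch of the described architecture is given below, combining an LSTM branch, parallel convolutions with kernel sizes 3, 5 and 7, and an SE-style channel attention module; the channel counts and the attention design are assumptions where the abstract leaves details open.

```python
import torch
import torch.nn as nn

class MCALSTMSketch(nn.Module):
    """Hypothetical reconstruction: LSTM branch plus multi-scale convolutions
    whose channels are re-weighted by an SE-style attention module."""
    def __init__(self, channels=32, n_classes=5):
        super().__init__()
        self.lstm = nn.LSTM(1, channels, batch_first=True)
        self.convs = nn.ModuleList(
            nn.Conv1d(1, channels, k, padding=k // 2) for k in (3, 5, 7))
        self.attn = nn.Sequential(                  # channel attention weights
            nn.Linear(3 * channels, 3 * channels // 4), nn.ReLU(),
            nn.Linear(3 * channels // 4, 3 * channels), nn.Sigmoid())
        self.fc = nn.Linear(4 * channels, n_classes)

    def forward(self, x):                           # x: (N, T) univariate series
        h, _ = self.lstm(x.unsqueeze(-1))           # temporal correlation features
        feats = torch.cat([c(x.unsqueeze(1)) for c in self.convs], dim=1)
        pooled = feats.mean(dim=2)                  # (N, 3*channels) multi-scale summary
        pooled = pooled * self.attn(pooled)         # focus on important scales
        return self.fc(torch.cat([h[:, -1], pooled], dim=1))

logits = MCALSTMSketch()(torch.randn(8, 200))       # 8 series of length 200
```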
By using complex pre-training objectives and a large number of model parameters, a Pre-Training Model (PTM) can effectively capture rich knowledge from unlabeled data. However, the development of multimodal PTMs is still in its infancy. According to the differences between modalities, most current multimodal PTMs were divided into image-text PTMs and video-text PTMs. According to the different data fusion methods, multimodal PTMs were further divided into single-stream models and two-stream models. Firstly, common pre-training tasks and the downstream tasks used in validation experiments were summarized. Secondly, common models in the area of multimodal pre-training were reviewed, and the downstream tasks of each model as well as the performance and experimental data of these models were listed in tables for comparison. Thirdly, the application scenarios of the M6 (Multi-Modality to Multi-Modality Multitask Mega-transformer), Cross-modal Prompt Tuning (CPT), VideoBERT (Video Bidirectional Encoder Representations from Transformers), and AliceMind (Alibaba's collection of encoder-decoders from Mind) models in specific downstream tasks were introduced. Finally, the challenges and future research directions of multimodal PTM work were summarized.
With the widespread application of deep learning, human beings rely increasingly on a large number of complex systems that adopt deep learning techniques. However, the black-box nature of deep learning models poses challenges to the use of these models in mission-critical applications and raises ethical and legal concerns. Therefore, making deep learning models interpretable is the first problem to be solved in making them trustworthy. As a result, research in the field of interpretable artificial intelligence has emerged, mainly focusing on explaining model decisions or behaviors explicitly to human observers. A review of interpretability for deep learning was performed to build a good foundation for further in-depth research and the establishment of more efficient and interpretable deep learning models. Firstly, the interpretability of deep learning was outlined, and the requirements and definitions of interpretability research were clarified. Then, several typical models and algorithms of interpretability research were introduced from three aspects: explaining the logic rules, the decision attribution and the internal structure representation of deep learning models. In addition, three common methods for constructing intrinsically interpretable models were pointed out. Finally, the four evaluation criteria of fidelity, accuracy, robustness and comprehensibility were introduced briefly, and possible future development directions of deep learning interpretability were discussed.
To address the problem that traditional single-factor methods cannot make full use of the relevant information of time series and therefore have poor prediction accuracy and reliability, a time series prediction model based on multimodal information fusion, namely Skip-Fusion, was proposed to fuse the text data and numerical data in multimodal data. Firstly, different types of text data were encoded by a pre-trained Bidirectional Encoder Representations from Transformers (BERT) model and one-hot encoding. Then, a single vector representation fusing multiple text features was obtained by using a pre-trained model based on a global attention mechanism. After that, the obtained single vector representation was aligned with the numerical data in time order. Finally, the fusion of text and numerical features was realized through a Temporal Convolutional Network (TCN) model, and the shallow and deep features of the multimodal data were fused again through skip connections. In experiments on a stock price series dataset, the Skip-Fusion model obtained results of 0.492 and 0.930 on Root Mean Square Error (RMSE) and daily Return (R) respectively, better than the results of existing single-modal and multimodal fusion models. Experimental results show that the Skip-Fusion model obtains a goodness of fit of 0.955 on R-squared, indicating that it can effectively carry out multimodal information fusion and achieve high prediction accuracy and reliability.
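The following is a minimal sketch of the described fusion pipeline, assuming per-day text vectors (e.g. BERT outputs) already aligned with the numerical features; the dilated temporal convolutions and the shallow-to-deep skip connection follow the abstract's description, while dimensions and layer counts are assumptions.

```python
import torch
import torch.nn as nn

class SkipFusionSketch(nn.Module):
    """Hypothetical sketch: a pre-computed per-day text vector is concatenated
    with numerical features and passed through dilated temporal convolutions;
    a skip connection re-fuses shallow and deep features before prediction."""
    def __init__(self, text_dim=768, num_dim=5, hidden=64):
        super().__init__()
        in_dim = text_dim + num_dim
        self.shallow = nn.Conv1d(in_dim, hidden, 3, padding=2, dilation=2)
        self.deep = nn.Conv1d(hidden, hidden, 3, padding=4, dilation=4)
        self.out = nn.Linear(2 * hidden, 1)             # next-step value prediction

    def forward(self, text_vecs, numeric):              # (N, T, text_dim), (N, T, num_dim)
        x = torch.cat([text_vecs, numeric], dim=-1).transpose(1, 2)
        s = torch.relu(self.shallow(x))                 # shallow temporal features
        d = torch.relu(self.deep(s))                    # deeper, wider receptive field
        fused = torch.cat([s[:, :, -1], d[:, :, -1]], dim=1)  # skip connection
        return self.out(fused).squeeze(-1)

pred = SkipFusionSketch()(torch.randn(4, 30, 768), torch.randn(4, 30, 5))
```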
Traditional stock prediction methods are mostly based on time-series models, which ignore the complex relations among stocks; these relations often go beyond pairwise connections, such as stocks in the same industry or multiple stocks held by the same fund. To solve this problem, a stock trend prediction method based on a temporal HyperGraph Convolutional neural Network (HGCN) was proposed, and a hypergraph model based on financial investment facts was constructed to fit the multiple relations among stocks. The model was composed of two major components: a Gated Recurrent Unit (GRU) network and an HGCN. The GRU network performed time-series modeling on historical data to capture long-term dependencies, while the HGCN modeled high-order relations among stocks to learn their intrinsic relation attributes, introducing the multiple relation information among stocks into traditional time-series modeling for end-to-end trend prediction. Experiments on a real dataset of the China A-share market show that, compared with existing stock prediction methods, the proposed model improves prediction performance: for example, compared with the GRU network, it achieves relative increases in ACC and F1_score of 9.74% and 8.13% respectively, and is more stable. In addition, simulation back-testing results show that the trading strategy based on the proposed model is more profitable, with an annual return of 11.30%, which is 5 percentage points higher than that of the Long Short-Term Memory (LSTM) network.
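A minimal sketch of one hypergraph convolution step is shown below, using the standard normalized propagation rule X' = sigma(D_v^{-1/2} H D_e^{-1} H^T D_v^{-1/2} X Theta) with unit edge weights; the stock count, feature size, and random incidence matrix are illustrative assumptions.

```python
import torch
import torch.nn as nn

def hypergraph_conv(X, H, theta):
    """One hypergraph convolution step. X: (n_stocks, d) node features;
    H: (n_stocks, n_edges) incidence matrix where a hyperedge groups stocks
    sharing an industry or a common fund holder; theta: a learnable nn.Linear."""
    Dv = H.sum(1).clamp(min=1)                       # node degrees
    De = H.sum(0).clamp(min=1)                       # hyperedge degrees
    Dv_inv_sqrt = Dv.pow(-0.5).diag()
    De_inv = De.pow(-1.0).diag()
    A = Dv_inv_sqrt @ H @ De_inv @ H.t() @ Dv_inv_sqrt   # normalized propagation
    return torch.relu(A @ theta(X))

# usage: features from a GRU over each stock's history, then relational smoothing
X = torch.randn(100, 32)                             # 100 stocks, 32-dim temporal features
H = (torch.rand(100, 12) > 0.8).float()              # 12 hypothetical relations
out = hypergraph_conv(X, H, nn.Linear(32, 32, bias=False))
```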
Multi-Label Text Classification (MLTC) is one of the important subtasks in the field of Natural Language Processing (NLP). To address the complex correlations among multiple labels, an MLTC method, TLA-BERT, was proposed by incorporating Bidirectional Encoder Representations from Transformers (BERT) and label semantic attention. Firstly, the contextual vector representation of the input text was learned by fine-tuning the autoencoding pre-trained model. Secondly, the labels were encoded individually by using a Long Short-Term Memory (LSTM) network. Finally, the contribution of the text to each label was explicitly highlighted by an attention mechanism in order to predict multi-label sequences. Experimental results show that, compared with the Sequence Generation Model (SGM) algorithm, the proposed method improves the F value by 2.8 and 1.5 percentage points on the Arxiv Academic Paper Dataset (AAPD) and the Reuters Corpus Volume I (RCV1)-v2 public dataset respectively.
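A minimal sketch of the label semantic attention step is given below, assuming token states from a fine-tuned BERT encoder and label states from an LSTM over label words; the scoring and classification heads are simplified assumptions.

```python
import torch

def label_attention(token_states, label_states):
    """Hypothetical sketch of label semantic attention: each label attends
    over the text tokens, yielding a label-specific text representation."""
    # token_states: (N, T, d) from a fine-tuned BERT encoder
    # label_states: (L, d), e.g. last hidden states of an LSTM over label words
    scores = torch.einsum('ntd,ld->nlt', token_states, label_states)
    alpha = scores.softmax(dim=-1)                   # attention over tokens, per label
    return torch.einsum('nlt,ntd->nld', alpha, token_states)

token_states = torch.randn(2, 50, 768)               # batch of 2 texts, 50 tokens each
label_states = torch.randn(10, 768)                  # 10 candidate labels
label_repr = label_attention(token_states, label_states)   # (2, 10, 768)
logits = (label_repr * label_states).sum(-1)          # per-label relevance score
probs = torch.sigmoid(logits)                         # independent multi-label outputs
```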
Concerning the characteristics of breast cancer in Magnetic Resonance Imaging (MRI), such as variable shapes and sizes and fuzzy boundaries, an algorithm based on a multiscale residual U Network (UNet) with attention mechanism was proposed in order to avoid mis-segmentation and improve segmentation accuracy. Firstly, multiscale residual units were used to replace two adjacent convolution blocks in the down-sampling process of UNet, so that the network could pay more attention to differences of shape and size. Then, in the up-sampling stage, cross-layer attention was used to guide the network to focus on key regions, avoiding mis-segmentation of healthy tissues. Finally, in order to enhance the ability to represent lesions, atrous spatial pyramid pooling was introduced into the network as a bridging module. Compared with UNet, the proposed algorithm improved the Dice coefficient, Intersection over Union (IoU), SPecificity (SP) and ACCuracy (ACC) by 2.26, 2.11, 4.16 and 0.05 percentage points, respectively. The experimental results show that the algorithm can improve the segmentation accuracy of lesions and effectively reduce the false positive rate of imaging diagnosis.
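A minimal sketch of the atrous spatial pyramid pooling bridge is shown below; the dilation rates and channel sizes are assumptions, since the abstract does not specify them.

```python
import torch
import torch.nn as nn

class ASPPSketch(nn.Module):
    """Minimal atrous spatial pyramid pooling, used here as a bridging module
    between the UNet encoder and decoder (rates and widths are assumptions)."""
    def __init__(self, in_ch, out_ch, rates=(1, 6, 12, 18)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(in_ch, out_ch, 3, padding=r, dilation=r) for r in rates)
        self.project = nn.Conv2d(len(rates) * out_ch, out_ch, 1)

    def forward(self, x):
        # parallel dilated convolutions capture lesions of different sizes
        feats = torch.cat([torch.relu(b(x)) for b in self.branches], dim=1)
        return self.project(feats)

bridge = ASPPSketch(256, 64)
y = bridge(torch.randn(1, 256, 32, 32))              # (1, 64, 32, 32)
```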
To address the problems of insufficient interpretability and long-sequence dependency in deep knowledge tracing models based on Recurrent Neural Network (RNN), a model named Temporal Convolutional Knowledge Tracing with Attention mechanism (ATCKT) was proposed. Firstly, embedded representations of students' historical interactions were learned in the training process. Then, an exercise-based attention mechanism was used to learn a specific weight matrix to identify and strengthen the influence of students' historical interactions on the knowledge state at each moment. Finally, the student knowledge states were extracted by a Temporal Convolutional Network (TCN), in which dilated convolutions and a deep network were used to expand the scope of sequence learning and alleviate the long-sequence dependency problem. Experimental results show that, compared with four models such as Deep Knowledge Tracing (DKT) and Convolutional Knowledge Tracing (CKT) on four datasets (ASSISTments2009, ASSISTments2015, Statics2011 and Synthetic-5), the ATCKT model has significantly improved Area Under the Curve (AUC) and Accuracy (ACC), especially on the ASSISTments2015 dataset, with increases of 6.83 to 20.14 percentage points and 7.52 to 11.22 percentage points respectively; at the same time, the training time of the proposed model is 26% less than that of the DKT model. In summary, this model can accurately capture student knowledge states and efficiently predict students' future performance.
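To show how dilated convolutions extend the scope of sequence learning in a TCN, here is a minimal causal block sketch; the channel width, kernel size, and dilation schedule are assumptions.

```python
import torch
import torch.nn as nn

class CausalBlockSketch(nn.Module):
    """One causal dilated convolution block; stacking blocks with dilations
    1, 2, 4, ... lets the receptive field cover long interaction histories."""
    def __init__(self, ch, dilation):
        super().__init__()
        self.pad = (3 - 1) * dilation               # left-pad only: no future leakage
        self.conv = nn.Conv1d(ch, ch, 3, dilation=dilation)

    def forward(self, x):                           # x: (N, ch, T) weighted interactions
        return torch.relu(self.conv(nn.functional.pad(x, (self.pad, 0))))

tcn = nn.Sequential(*[CausalBlockSketch(64, d) for d in (1, 2, 4, 8)])
states = tcn(torch.randn(8, 64, 200))               # knowledge state per time step
```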
In recent years, deep learning has been widely used in many fields. However, due to the highly nonlinear operations in deep neural network models, their interpretability is poor; they are often referred to as "black box" models and cannot be applied in some key fields with high performance requirements. Therefore, it is very necessary to study the interpretability of deep learning. Firstly, deep learning was introduced briefly. Then, around the interpretability of deep learning, the existing research work was analyzed from eight aspects, including hidden layer visualization, Class Activation Mapping (CAM), sensitivity analysis, frequency principle, robust disturbance testing, information theory, interpretable modules and optimization methods. At the same time, the applications of deep learning in the fields of network security, recommender systems, medicine and social networks were demonstrated. Finally, the existing problems and future development directions of research on deep learning interpretability were discussed.
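Of the eight aspects listed, Class Activation Mapping (CAM) is the easiest to make concrete; the sketch below computes a classic CAM for a CNN that ends in global average pooling followed by a single linear layer, with feature and class dimensions as illustrative assumptions.

```python
import torch

def class_activation_map(feature_maps, fc_weight, class_idx):
    """Classic CAM: for a CNN ending in global average pooling and one linear
    layer, the map for a class is the weighted sum of the last conv features."""
    # feature_maps: (C, H, W) from the last conv layer; fc_weight: (n_classes, C)
    cam = torch.einsum('c,chw->hw', fc_weight[class_idx], feature_maps)
    cam = torch.relu(cam)                            # keep positive evidence only
    return cam / (cam.max() + 1e-8)                  # normalize to [0, 1] for display

fmaps = torch.randn(512, 7, 7)                       # hypothetical conv output
w = torch.randn(1000, 512)                           # hypothetical classifier weights
heatmap = class_activation_map(fmaps, w, class_idx=281)
```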
In the field of deep learning, a large number of correctly labeled samples are essential for model training. However, in practical applications, labeling data incurs a high cost, and the quality of labeled samples is affected by subjective factors and by the tools and techniques of manual labeling, which inevitably introduces label noise into the annotation process. Therefore, the training data available in practical applications is subject to a certain amount of label noise, and how to train effectively on data with label noise has become a research hotspot. Focusing on label noise learning algorithms based on deep learning, firstly, the sources, types and impact of label noise were elaborated; secondly, four categories of label noise learning strategies, based on data, loss function, model and training method respectively, were analyzed according to the different elements of machine learning; then, a basic framework for learning with label noise in various application scenarios was provided; finally, some optimization ideas were given, and the challenges and future development directions of label noise learning algorithms were discussed.
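As one example of the loss-function-based strategies mentioned above, the following sketch implements generalized cross entropy (Zhang and Sabuncu, 2018), a well-known noise-robust loss; its use here is illustrative, not a method proposed in the surveyed work.

```python
import torch

def generalized_cross_entropy(logits, targets, q=0.7):
    """Generalized cross entropy: interpolates between standard CE (q -> 0)
    and the noise-tolerant MAE (q = 1), down-weighting hard samples that are
    likely to be mislabeled."""
    probs = logits.softmax(dim=1)
    p_y = probs.gather(1, targets.unsqueeze(1)).squeeze(1)   # prob of labeled class
    return ((1.0 - p_y.pow(q)) / q).mean()

loss = generalized_cross_entropy(torch.randn(16, 10), torch.randint(0, 10, (16,)))
```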
Single object tracking is an important research direction in the field of computer vision and has a wide range of applications in video surveillance, autonomous driving and other fields. Although single object tracking algorithms have been surveyed extensively, most surveys focus on correlation filter-based or deep learning-based methods. In recent years, Siamese network-based tracking algorithms have received extensive attention from researchers for their balance between accuracy and speed, but there are relatively few surveys of this type of algorithm, and they lack systematic analysis at the architectural level. In order to deeply understand single object tracking algorithms based on Siamese networks, a large number of related literatures were organized and analyzed. Firstly, the structures and applications of the Siamese network were expounded, and each tracking algorithm was introduced according to the composition of its Siamese tracking architecture. Then, the commonly used datasets and evaluation metrics in the field of single object tracking were listed, the overall and per-attribute performance of 25 mainstream tracking algorithms was compared and analyzed on the OTB 2015 (Object Tracking Benchmark) dataset, and the performance and inference speed of 23 Siamese network-based tracking algorithms on the LaSOT (Large-scale Single Object Tracking) and GOT-10K (Generic Object Tracking) test sets were listed. Finally, the research on Siamese network-based tracking algorithms was summarized, and possible future research directions of this type of algorithm were discussed.
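The common core of the surveyed Siamese trackers is a cross-correlation between template and search-region features, as in SiamFC; the sketch below shows this operation with illustrative feature shapes.

```python
import torch
import torch.nn.functional as F

def siamese_response(template_feat, search_feat):
    """Core of SiamFC-style trackers: the template feature acts as a
    convolution kernel slid over the search-region feature; the peak of the
    response map locates the target."""
    # template_feat: (1, C, 6, 6), search_feat: (1, C, 22, 22) -- example sizes
    return F.conv2d(search_feat, template_feat)      # (1, 1, 17, 17) response map

z = torch.randn(1, 256, 6, 6)                        # embedded target template
x = torch.randn(1, 256, 22, 22)                      # embedded search region
response = siamese_response(z, x)
peak = response.flatten().argmax()                   # estimated target location
```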
Event extraction is the task of extracting events that users are interested in from unstructured information and presenting them to users in a structured form. It has a wide range of applications in information collection, information retrieval, document synthesis, and question answering. From an overall perspective, event extraction algorithms can be divided into four categories: pattern matching algorithms, trigger-word-based methods, ontology-based algorithms, and cutting-edge joint model methods. In the research process, different evaluation methods and datasets can be used according to the related needs, and different event representation methods are also relevant to event extraction research. Distinguished by task type, meta-event extraction and subject event extraction are the two basic tasks of event extraction. Among them, meta-event extraction has three kinds of methods, based on pattern matching, machine learning and neural networks respectively, while there are two ways to extract subject events: based on the event framework and based on ontology. Event extraction research has achieved excellent results in single languages such as Chinese and English, but cross-language event extraction still faces many problems. Finally, the related works of event extraction were summarized and future research directions were discussed in order to provide guidance for subsequent research.
To address the shortcomings of the Sparrow Search Algorithm (SSA), namely easily falling into local optima and slow convergence, a Sparrow Search Algorithm based on Sobol sequence and Crisscross strategy (SSASC) was proposed. Firstly, the Sobol sequence was introduced in the initialization stage to enhance the diversity and ergodicity of the population. Secondly, a nonlinear inertia weight in exponential form was proposed to improve the convergence efficiency of the algorithm. Finally, the crisscross strategy was applied to improve the algorithm: horizontal crossover was used to enhance the global search ability, while vertical crossover was used to maintain the diversity of the population and prevent the algorithm from falling into local optima. Thirteen benchmark functions were selected for simulation experiments, and the performance of the algorithm was evaluated by the Wilcoxon rank-sum test and the Friedman test. In comparison experiments with other metaheuristic algorithms, the mean and standard deviation obtained by SSASC are consistently better than those of the other algorithms as the benchmark functions are extended from 10 to 100 dimensions. Experimental results show that SSASC achieves superiority in both convergence speed and solution accuracy.
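A minimal sketch of the two ingredients named in the title is given below, using SciPy's Sobol generator for initialization and an arithmetic horizontal crossover; the crossover coefficients and population size are assumptions.

```python
import numpy as np
from scipy.stats import qmc

def sobol_init(pop_size, dim, lb, ub):
    """Sobol-sequence initialization: low-discrepancy points cover the search
    space more evenly than uniform random sampling."""
    sample = qmc.Sobol(d=dim, scramble=True).random(pop_size)   # in [0, 1)^dim
    return lb + sample * (ub - lb)

def horizontal_crossover(pop):
    """Crisscross strategy, horizontal part: arithmetic crossover between
    randomly paired individuals to strengthen global search."""
    idx = np.random.permutation(len(pop))
    out = pop.copy()
    for i, j in zip(idx[0::2], idx[1::2]):
        r1, r2 = np.random.rand(pop.shape[1]), np.random.rand(pop.shape[1])
        c = np.random.uniform(-1, 1, pop.shape[1])
        out[i] = r1 * pop[i] + (1 - r1) * pop[j] + c * (pop[i] - pop[j])
        out[j] = r2 * pop[j] + (1 - r2) * pop[i] + c * (pop[j] - pop[i])
    return out

pop = sobol_init(32, 10, lb=-100.0, ub=100.0)        # 32 sparrows, 10 dimensions
pop = horizontal_crossover(pop)                      # candidates then compete with parents
```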
In view of the problems that classroom teaching scenes suffer from severe occlusion and contain many students, that current video action recognition algorithms are not suitable for classroom teaching scenes, and that there is no public dataset of student classroom actions, a classroom teaching video library and a student classroom action library were constructed, and a real-time multi-student classroom action recognition algorithm based on a deep spatiotemporal residual convolutional neural network was proposed. Firstly, real-time object detection and tracking were combined to obtain a real-time image stream of each student; then the deep spatiotemporal residual convolutional neural network was used to learn the spatiotemporal features of each student's actions, so as to realize real-time recognition of classroom actions for multiple students in classroom teaching scenes. In addition, an intelligent teaching evaluation model was constructed, and an intelligent teaching evaluation system based on the recognition of students' classroom actions was designed and implemented, which can help improve teaching quality and realize intelligent education. Experimental comparison and analysis on the classroom teaching video dataset verify that the proposed real-time classroom action recognition model for multiple students achieves a high accuracy of 88.5%, and the intelligent teaching evaluation system based on classroom action recognition also achieves good results on the classroom teaching video dataset.
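A hedged sketch of the recognition stage is given below, assuming detection and tracking have already produced a cropped frame clip per student; torchvision's r3d_18 stands in for the paper's deep spatiotemporal residual network, and the class count is an assumption.

```python
import torch
from torchvision.models.video import r3d_18

# Hypothetical sketch: after detection and tracking have produced a short clip
# of cropped frames for one student, a 3D residual network classifies the clip
# into a classroom action (the 6 action classes are an assumption).
model = r3d_18(weights=None, num_classes=6)
model.eval()

clip = torch.randn(1, 3, 16, 112, 112)               # (N, C, T, H, W) student crop
with torch.no_grad():
    action = model(clip).argmax(dim=1)                # predicted action index
```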