Project Articles

    Artificial intelligence

    Aspect sentiment triplet extraction based on aspect-aware attention enhancement
    Longtao GAO, Nana LI
    Journal of Computer Applications    2024, 44 (4): 1049-1057.   DOI: 10.11772/j.issn.1001-9081.2023040411

    For fine-grained sentiment analysis in Natural Language Processing (NLP), to explore the influence of Pre-trained Language Models (PLMs) with structural biases on the end-to-end sentiment triplet extraction task and to address the low fault tolerance of aspect semantic feature dependence common in previous studies, an Aspect-aware attention Enhanced GCN (AE-GCN) model combining an aspect-aware attention mechanism and Graph Convolutional Network (GCN) was proposed for aspect sentiment triplet extraction. Firstly, multiple types of relations were introduced for the aspect sentiment triplet extraction task. Then, these relations were embedded into the adjacency tensors between words in the sentence by using a biaffine attention mechanism. At the same time, the aspect-aware attention mechanism was introduced to obtain the sentence attention score matrix and further mine aspect-related semantic features. Next, the sentence was converted into a multi-channel graph processed by the graph convolutional network, with words treated as nodes and the relation adjacency tensors as edges, to learn relation-aware node representations. Finally, an effective word-pair representation refinement strategy was used to determine whether word pairs matched, taking into account the implicit results of aspect and opinion extraction. Experimental results show that, on the ASTE-D1 benchmark dataset, the F1 values of the proposed model on the 14res, 14lap, 15res and 16res sub-datasets are improved by 0.20, 0.21, 1.25 and 0.26 percentage points compared with the Enhanced Multi-Channel Graph Convolutional Network (EMC-GCN) model; on the ASTE-D2 benchmark dataset, the F1 values on the 14lap, 15res and 16res sub-datasets are increased by 0.42, 0.31 and 2.01 percentage points. The proposed model therefore offers clear gains in precision and effectiveness over EMC-GCN.
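
    The following sketch (PyTorch; dimensions and the number of relation types are placeholder assumptions, not the authors' code) illustrates how a biaffine attention scorer can embed multiple relation types into a word-pair adjacency tensor of the kind described above.

import torch
import torch.nn as nn

class BiaffineRelationScorer(nn.Module):
    """Scores every word pair of a sentence for each relation type, producing
    an adjacency tensor of shape (batch, seq, seq, num_relations)."""
    def __init__(self, hidden_dim, num_relations, mlp_dim=300):
        super().__init__()
        self.head_mlp = nn.Sequential(nn.Linear(hidden_dim, mlp_dim), nn.ReLU())
        self.dep_mlp = nn.Sequential(nn.Linear(hidden_dim, mlp_dim), nn.ReLU())
        # one (mlp_dim+1) x (mlp_dim+1) bilinear form per relation (bias term appended below)
        self.U = nn.Parameter(torch.randn(num_relations, mlp_dim + 1, mlp_dim + 1) * 0.01)

    def forward(self, h):                       # h: (batch, seq, hidden_dim) word encodings
        head, dep = self.head_mlp(h), self.dep_mlp(h)
        ones = h.new_ones(h.size(0), h.size(1), 1)
        head = torch.cat([head, ones], dim=-1)  # append constant term for the affine part
        dep = torch.cat([dep, ones], dim=-1)
        # b: batch, i/j: word positions, r: relation type, x/y: feature dims
        return torch.einsum('bix,rxy,bjy->bijr', head, self.U, dep)

adj_tensor = BiaffineRelationScorer(768, 10)(torch.randn(2, 12, 768))   # (2, 12, 12, 10)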

    Aspect-level sentiment analysis model based on alternating‑attention mechanism and graph convolutional network
    Xianfeng YANG, Yilei TANG, Ziqiang LI
    Journal of Computer Applications    2024, 44 (4): 1058-1064.   DOI: 10.11772/j.issn.1001-9081.2023040497

    Aspect-level sentiment analysis aims to predict the sentiment polarity of a specific target in a given text. To address the problems that the syntactic relationship between aspect words and context is often ignored and that average pooling weakens attention differences, an aspect-level sentiment analysis model based on an Alternating-Attention (AA) mechanism and Graph Convolutional Network (AA-GCN) was proposed. Firstly, the Bidirectional Long Short-Term Memory (Bi-LSTM) network was used to semantically model the context and aspect words. Secondly, a GCN based on the syntactic dependency tree was used to learn location information and dependencies, and the AA mechanism was used for multi-level interactive learning to adaptively adjust the attention paid to the target words. Finally, the final classification basis was obtained by concatenating the corrected aspect features and context features. Compared with the Target-Dependent Graph Attention Network (TD-GAT), the accuracies of the proposed model on four public datasets increased by 1.13%-2.67%, and the F1 values on five public datasets increased by 0.98%-4.89%, indicating the effectiveness of using syntactic relationships and increasing keyword attention.
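
    As a minimal sketch of the encoding pipeline described above (PyTorch; the sizes, the placeholder adjacency and the omitted alternating-attention step are assumptions, not the published AA-GCN code), a single GCN layer can propagate Bi-LSTM states along a dependency-tree adjacency matrix as follows.

import torch
import torch.nn as nn

class SyntacticGCNLayer(nn.Module):
    """One GCN layer over a dependency-tree adjacency matrix."""
    def __init__(self, dim):
        super().__init__()
        self.linear = nn.Linear(dim, dim)

    def forward(self, h, adj):                  # h: (B, T, dim); adj: (B, T, T) with self-loops
        deg = adj.sum(dim=-1, keepdim=True).clamp(min=1)
        return torch.relu(torch.bmm(adj, self.linear(h)) / deg)   # degree-normalized aggregation

bilstm = nn.LSTM(input_size=300, hidden_size=150, bidirectional=True, batch_first=True)
x = torch.randn(2, 10, 300)                     # word embeddings: 2 sentences, 10 tokens
h, _ = bilstm(x)                                # contextual states: (2, 10, 300)
adj = torch.eye(10).repeat(2, 1, 1)             # placeholder adjacency (self-loops only)
syntax_aware = SyntacticGCNLayer(300)(h, adj)   # features then passed to the AA interaction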

    Offensive speech detection with irony mechanism
    Haihan WANG, Yan ZHU
    Journal of Computer Applications    2024, 44 (4): 1065-1071.   DOI: 10.11772/j.issn.1001-9081.2023040533

    Offensive speech on the internet seriously disrupts normal network order and destroys the environment for healthy online communication. Existing detection technologies focus on distinctive surface features of the text and have difficulty discovering more implicit attack methods. To address these problems, an offensive speech detection model incorporating an irony mechanism, BSWD (Bidirectional Encoder Representation from Transformers-based Sarcasm and Word Detection), was proposed. Firstly, an irony-mechanism-based model, Sarcasm-BERT, was proposed to detect semantic conflicts in speech. Secondly, a fine-grained offensive-word feature extraction model, WordsDetect, was proposed to detect offensive words in speech. Finally, the BSWD model was obtained by fusing the above two models. The experimental results show that the accuracy, precision, recall, and F1 score of the proposed model are generally improved by 2% compared with the BERT (Bidirectional Encoder Representation from Transformers) and HateBERT methods, so BSWD significantly improves detection performance and can better detect implicit offensive speech. Compared with the SKS (Sentiment Knowledge Sharing) and BiCHAT (Bi-LSTM with deep CNN and Hierarchical ATtention) methods, BSWD has stronger generalization ability and robustness. These results verify that BSWD can effectively detect implicit offensive speech.

    Technology term recognition with comprehensive constituency parsing
    Junjie ZHU, Li YU, Shengwen LI, Changzheng ZHOU
    Journal of Computer Applications    2024, 44 (4): 1072-1079.   DOI: 10.11772/j.issn.1001-9081.2023040532

    Technology terms are used to communicate information accurately in the field of science and technology. Automatically recognizing technology terms from text can help experts and the public to discover, recognize, and apply new technologies, which is of great value, but unsupervised technology term recognition methods still have limitations such as complex rules and poor adaptability. To enhance the ability to recognize technology terms from text, an unsupervised technology term recognition method was proposed. Firstly, a syntactic structure tree was constructed through constituency parsing. Then, candidate technology terms were extracted from both top-down and bottom-up perspectives. Finally, statistical frequency and semantic information were combined to determine the most appropriate technology terms. In addition, a technology term dataset was constructed to validate the effectiveness of the proposed method. Experimental results on this dataset show that the proposed method with top-down extraction improves the F1 score by 4.55 percentage points compared to the dependency-based method. Meanwhile, the analysis of a case study in the field of 3D printing shows that the technology terms recognized by the proposed method are in line with the development of the field, and can be used to trace the development process of technology and depict its evolution path, so as to provide references for understanding, discovering, and exploring future technologies in the field.
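
    A toy sketch of the top-down candidate extraction step (Python with NLTK; the parse string, span-length limits and frequency filter are illustrative assumptions, and the semantic scoring stage is omitted).

from collections import Counter
from nltk.tree import Tree

def candidate_phrases(parse_str, label="NP"):
    """Walk a constituency parse top-down and keep noun-phrase spans
    as candidate technology terms."""
    tree = Tree.fromstring(parse_str)
    cands = []
    for sub in tree.subtrees(lambda t: t.label() == label):
        words = sub.leaves()
        if 1 < len(words) <= 6:                 # skip single words and overlong spans
            cands.append(" ".join(words).lower())
    return cands

# Hypothetical pre-parsed sentences; real parses would come from a constituency parser.
parses = ["(S (NP (JJ additive) (NN manufacturing)) (VP (VBZ enables) (NP (JJ rapid) (NN prototyping))))"]
freq = Counter(p for s in parses for p in candidate_phrases(s))
print([p for p, c in freq.most_common()])       # frequency-ranked candidate terms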

    Multimodal emotion recognition method based on multiscale convolution and self-attention feature fusion
    Tian CHEN, Conghu CAI, Xiaohui YUAN, Beibei LUO
    Journal of Computer Applications    2024, 44 (2): 369-376.   DOI: 10.11772/j.issn.1001-9081.2023020185

    Emotion recognition based on physiological signals is affected by noise and other factors, resulting in low accuracy and weak cross-individual generalization ability. To address this issue, a multimodal emotion recognition method based on ElectroEncephaloGram (EEG), ElectroCardioGram (ECG), and eye movement signals was proposed. Firstly, multi-scale convolution was applied to the physiological signals to obtain higher-dimensional signal features while reducing parameter size. Secondly, self-attention was employed in the fusion of multimodal signal features to enhance the weights of key features and reduce feature interference between modalities. Finally, a Bi-directional Long Short-Term Memory (Bi-LSTM) network was used to extract temporal information from the fused features and perform classification. Experimental results show that the proposed method achieves recognition accuracies of 90.29%, 91.38%, and 83.53% for the valence, arousal, and valence/arousal four-class recognition tasks, respectively, with improvements of 3.46-7.11 and 0.92-3.15 percentage points compared to the EEG single-modality and EEG+ECG bimodal methods. The proposed method can accurately recognize emotion with better recognition stability across individuals.
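
    A compact sketch of the processing chain described above (PyTorch; channel counts, sequence length and the four-class head are placeholder assumptions rather than the paper's configuration).

import torch
import torch.nn as nn

class MultiScaleBranch(nn.Module):
    """Parallel 1D convolutions with different kernel sizes over one modality."""
    def __init__(self, in_ch, out_ch, kernels=(3, 5, 7)):
        super().__init__()
        self.convs = nn.ModuleList(nn.Conv1d(in_ch, out_ch, k, padding=k // 2) for k in kernels)

    def forward(self, x):                       # x: (B, channels, T)
        return torch.cat([torch.relu(c(x)) for c in self.convs], dim=1)

eeg, ecg, eye = (MultiScaleBranch(c, 16) for c in (32, 1, 4))    # hypothetical channel counts
attn = nn.MultiheadAttention(embed_dim=48, num_heads=4, batch_first=True)
bilstm = nn.LSTM(48, 64, bidirectional=True, batch_first=True)

x_eeg, x_ecg, x_eye = torch.randn(8, 32, 128), torch.randn(8, 1, 128), torch.randn(8, 4, 128)
feats = torch.stack([eeg(x_eeg).mean(-1), ecg(x_ecg).mean(-1), eye(x_eye).mean(-1)], dim=1)
fused, _ = attn(feats, feats, feats)            # self-attention re-weights the three modalities
logits = nn.Linear(128, 4)(bilstm(fused)[0][:, -1])   # temporal modelling, then 4-class output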

    Chinese medical named entity recognition based on self-attention mechanism and lexicon enhancement
    Xinran LUO, Tianrui LI, Zhen JIA
    Journal of Computer Applications    2024, 44 (2): 385-392.   DOI: 10.11772/j.issn.1001-9081.2023020179

    To address the difficulty of word boundary recognition stemming from nested entities in Chinese medical texts, as well as significant semantic information loss in existing Lattice-LSTM structures with integrated lexical features, an adaptive lexical information enhancement model for Chinese Medical Named Entity Recognition (MNER) was proposed. First, the BiLSTM (Bi-directional Long-Short Term Memory) network was utilized to encode the contextual information of the character sequence and capture the long-distance dependencies. Next, potential word information of each character was modeled as character-word pairs, and the self-attention mechanism was utilized to realize internal interactions between different words. Finally, a lexicon adapter based on bilinear-attention mechanism was used to integrate lexical information into each character in the text sequence, enhancing semantic information effectively while fully utilizing the rich boundary information of words and suppressing words with low correlation. Experimental results demonstrate that the average F1 value of the proposed model increases by 1.37 to 2.38 percentage points compared to the character-based baseline model, and its performance is further optimized when combined with BERT.
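
    The sketch below (PyTorch; tensor shapes and the residual fusion are simplifying assumptions, not the published model) shows the core idea of a bilinear-attention lexicon adapter that injects matched-word information into each character state.

import torch
import torch.nn as nn

class LexiconAdapter(nn.Module):
    """Fuses each character with its K candidate words via bilinear attention."""
    def __init__(self, char_dim, word_dim):
        super().__init__()
        self.bilinear = nn.Bilinear(char_dim, word_dim, 1)
        self.proj = nn.Linear(word_dim, char_dim)

    def forward(self, char_h, word_e, word_mask):
        # char_h: (B, T, Dc); word_e: (B, T, K, Dw); word_mask: (B, T, K), 0 = padding
        K = word_e.size(2)
        c = char_h.unsqueeze(2).expand(-1, -1, K, -1).contiguous()
        scores = self.bilinear(c, word_e).squeeze(-1)            # relevance of each candidate word
        scores = scores.masked_fill(word_mask == 0, -1e9)        # suppress padded words
        alpha = torch.softmax(scores, dim=-1).unsqueeze(-1)
        word_ctx = (alpha * word_e).sum(dim=2)                   # weighted word context
        return char_h + self.proj(word_ctx)                      # lexicon-enhanced characters

out = LexiconAdapter(256, 200)(torch.randn(2, 30, 256), torch.randn(2, 30, 4, 200),
                               torch.ones(2, 30, 4))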

    Multi-robot reinforcement learning path planning method based on request-response communication mechanism and local attention mechanism
    Fuqin DENG, Huifeng GUAN, Chaoen TAN, Lanhui FU, Hongmin WANG, Tinlun LAM, Jianmin ZHANG
    Journal of Computer Applications    2024, 44 (2): 432-438.   DOI: 10.11772/j.issn.1001-9081.2023020193

    To reduce the blocking rate of multi-robot path planning in dynamic environments, a Distributed Communication and local Attention based Multi-Agent Path Finding (DCAMAPF) method was proposed within the Actor-Critic deep reinforcement learning framework, using a request-response communication mechanism and a local attention mechanism. In the Actor network, each robot requested local observation and action information from other robots in its field of view based on the request-response communication mechanism, and planned a coordinated action strategy accordingly. In the Critic network, each robot dynamically allocated attention weights, based on the local attention mechanism, to the local observation and action information of the robots in its field of view that had successfully responded. The experimental results showed that the blocking rate was reduced by approximately 6.91, 4.97, and 3.56 percentage points in a discrete initialization environment, compared with the traditional dynamic path planning method D* Lite, the latest distributed reinforcement learning method MAPPER, and the latest centralized reinforcement learning method AB-MAPPER (Attention and BicNet based MAPPER), respectively; in a centralized initialization environment, the mean blocking rate was reduced by approximately 15.86, 11.71 and 5.54 percentage points, while the occupied computing cache was also reduced. Therefore, the proposed method ensures the efficiency of path planning and is applicable to multi-robot path planning tasks in different dynamic environments.

    Video prediction model combining involution and convolution operators
    Junhong ZHU, Junyu LAI, Lianqiang GAN, Zhiyong CHEN, Huashuo LIU, Guoyao XU
    Journal of Computer Applications    2024, 44 (1): 113-122.   DOI: 10.11772/j.issn.1001-9081.2023060853

    To address the inadequate extraction of spatial features and low prediction accuracy in traditional deep-learning-based video prediction, a video prediction model Combining Involution and Convolution Operators (CICO) was proposed. The model enhanced video prediction performance in three aspects. Firstly, convolutions with varying kernel sizes were adopted to strengthen the extraction of multi-granularity spatial features and enable multi-angle representation learning of targets: larger kernels were applied to extract features from broader spatial ranges, while smaller kernels captured motion details more precisely. Secondly, large-kernel convolutions were replaced by computationally efficient involution operators with fewer parameters, achieving efficient inter-channel interaction, avoiding redundant parameters, and decreasing computational and storage costs while enhancing the predictive capacity of the model. Finally, 1×1 convolutions were introduced for linear mapping to strengthen the joint expression of distinct features, improve parameter utilization efficiency, and strengthen prediction robustness. The superiority of the proposed model was validated through comprehensive experiments on various datasets, with significant improvements over the state-of-the-art SimVP (Simpler yet Better Video Prediction) model. On the Moving MNIST dataset, the Mean Squared Error (MSE) and Mean Absolute Error (MAE) were reduced by 25.2% and 17.4%, respectively. On the Traffic Beijing dataset, the MSE was reduced by 1.2%. On the KTH dataset, the Structure Similarity Index Measure (SSIM) and Peak Signal-to-Noise Ratio (PSNR) were improved by 0.66% and 0.47%, respectively. These results show that the proposed model is effective in improving the accuracy of video prediction.
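
    For reference, a minimal involution operator is sketched below (PyTorch); kernel size, group count and reduction ratio are illustrative defaults, not the CICO settings.

import torch
import torch.nn as nn
import torch.nn.functional as F

class Involution2d(nn.Module):
    """Minimal involution: a K x K kernel is generated per spatial position
    from the input itself and shared across channel groups."""
    def __init__(self, channels, kernel_size=7, groups=4, reduction=4):
        super().__init__()
        self.k, self.g = kernel_size, groups
        self.kernel_gen = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, groups * kernel_size * kernel_size, 1))

    def forward(self, x):                                   # x: (B, C, H, W)
        B, C, H, W = x.shape
        kernel = self.kernel_gen(x).view(B, self.g, 1, self.k * self.k, H, W)
        patches = F.unfold(x, self.k, padding=self.k // 2)  # (B, C*k*k, H*W)
        patches = patches.view(B, self.g, C // self.g, self.k * self.k, H, W)
        out = (kernel * patches).sum(dim=3)                 # weighted sum over each window
        return out.view(B, C, H, W)

y = Involution2d(64)(torch.randn(2, 64, 32, 32))            # same spatial size as the input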

    Few-shot news topic classification method based on knowledge enhancement and prompt learning
    Xinyan YU, Cheng ZENG, Qian WANG, Peng HE, Xiaoyu DING
    Journal of Computer Applications    2024, 44 (6): 1767-1774.   DOI: 10.11772/j.issn.1001-9081.2023050709

    Classification methods based on fine-tuning pre-trained models usually require a large amount of annotated data, which makes them unsuitable for few-shot classification tasks. Therefore, a Knowledge enhancement and Prompt Learning (KPL) method was proposed for Chinese few-shot news topic classification. Firstly, an optimal prompt template was learned from the training set by using a pre-trained model. Then the template was integrated with the input text, effectively transforming the classification task into a cloze-filling task; at the same time, external knowledge was utilized to expand the label word space and enhance the semantic richness of the label words. Finally, the predicted label words were mapped back to the original labels. Experiments were conducted on few-shot training and validation sets randomly sampled from three news datasets, THUCNews, SHNews and Toutiao. The experimental results show that the proposed method improves the overall performance on the 1-shot, 5-shot, 10-shot and 20-shot tasks on the above datasets. Notably, a significant improvement is observed on the 1-shot task: compared with baseline few-shot classification methods, the accuracy increases by at least 7.59, 2.11 and 3.10 percentage points, respectively, confirming the effectiveness of KPL in few-shot news topic classification.
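
    The cloze-style prediction step can be sketched as follows (Hugging Face Transformers; an English masked language model, template and label words are used purely for illustration, and the paper's template search and knowledge-based label-word expansion are not reproduced).

import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

template = f"This is a piece of {tokenizer.mask_token} news : "
text = "The team clinched the championship in overtime."
label_words = {"sports": ["sports", "football"], "technology": ["technology", "science"]}

inputs = tokenizer(template + text, return_tensors="pt")
mask_pos = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero()[0].item()
with torch.no_grad():
    logits = model(**inputs).logits[0, mask_pos]          # vocabulary distribution at [MASK]

# Score the expanded label-word space and map back to the original label.
scores = {lab: max(logits[tokenizer.convert_tokens_to_ids(w)].item() for w in words)
          for lab, words in label_words.items()}
print(max(scores, key=scores.get))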

    Bird recognition algorithm based on attention mechanism
    Tianhua CHEN, Jiaxuan ZHU, Jie YIN
    Journal of Computer Applications    2024, 44 (4): 1114-1120.   DOI: 10.11772/j.issn.1001-9081.2023081042

    Aiming at the low accuracy of existing algorithms on fine-grained bird recognition tasks, a bird target detection algorithm called YOLOv5-Bird was proposed. Firstly, a mixed-domain Coordinate Attention (CA) mechanism was introduced in the backbone of YOLOv5 to increase the weights of valuable channels and distinguish target features from redundant features in the background. Secondly, Bi-level Routing Attention (BRA) modules were used to replace some of the C3 modules in the original backbone to filter out weakly correlated key-value pair information and obtain efficient long-distance dependencies. Finally, the WIoU (Wise-Intersection over Union) function was used as the loss function to enhance the localization ability of the algorithm. Experimental results show that the detection precision of YOLOv5-Bird reaches 82.8% and the recall reaches 77.0% on the self-constructed dataset, which are 4.3 and 7.6 percentage points higher than those of the original YOLOv5 algorithm. Compared with algorithms adding other attention mechanisms, YOLOv5-Bird also has performance advantages. It is verified that YOLOv5-Bird performs better in bird target detection scenarios.
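
    A sketch of a mixed-domain coordinate attention block of the kind referred to above (PyTorch; the reduction ratio and use of average pooling follow the generic CA design, not necessarily the YOLOv5-Bird implementation).

import torch
import torch.nn as nn

class CoordinateAttention(nn.Module):
    """Coordinate Attention: pools along H and W separately so that channel
    weights also carry positional information."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        mid = max(8, channels // reduction)
        self.conv1 = nn.Conv2d(channels, mid, 1)
        self.bn = nn.BatchNorm2d(mid)
        self.conv_h = nn.Conv2d(mid, channels, 1)
        self.conv_w = nn.Conv2d(mid, channels, 1)

    def forward(self, x):                              # x: (B, C, H, W)
        B, C, H, W = x.shape
        pool_h = x.mean(dim=3, keepdim=True)           # (B, C, H, 1)
        pool_w = x.mean(dim=2, keepdim=True).permute(0, 1, 3, 2)     # (B, C, W, 1)
        y = torch.relu(self.bn(self.conv1(torch.cat([pool_h, pool_w], dim=2))))
        y_h, y_w = torch.split(y, [H, W], dim=2)
        a_h = torch.sigmoid(self.conv_h(y_h))                        # per-row channel weights
        a_w = torch.sigmoid(self.conv_w(y_w.permute(0, 1, 3, 2)))    # per-column channel weights
        return x * a_h * a_w

out = CoordinateAttention(64)(torch.randn(2, 64, 40, 40))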

    Session-based recommendation model by graph neural network fused with item influence
    Xuanyu SUN, Yancui SHI
    Journal of Computer Applications    2023, 43 (12): 3689-3696.   DOI: 10.11772/j.issn.1001-9081.2022121812

    Aiming at the problem that existing session-based recommendation models find it difficult to explicitly express the influence of items on the recommendation results, a Session-based Recommendation model by graph neural network fused with Item Influence (SR-II) was proposed. Firstly, a new edge weight calculation method was proposed to construct a graph structure, in which the calculated result was used as the influence weight of the transition relationship in the graph, and the features of the graph were extracted through the influence graph gated layer by using a Graph Neural Network (GNN). Then, an improved shortcut graph was proposed to connect related items, effectively capture long-range dependencies, and enrich the information expressed by the graph structure; the features of this graph were extracted through the shortcut graph attention layer by using the attention mechanism. Finally, a recommendation model was constructed by combining the above two layers. In the experiments on the Diginetica and Gowalla datasets, the highest HR@20 of SR-II reaches 53.12% and the highest MRR@20 reaches 25.79%. On the Diginetica dataset, compared with CORE-trm (simple and effective session-based recommendation within COnsistent REpresentation space-transformer), SR-II improves HR@20 by 1.10% and MRR@20 by 1.21%; on the Gowalla dataset, compared with SR-SAN (Session-based Recommendation with Self-Attention Networks), SR-II improves HR@20 by 1.73%, and compared with LESSR (Lossless Edge-order preserving aggregation and Shortcut graph attention for Session-based Recommendation), SR-II improves MRR@20 by 1.14%. The experimental results show that SR-II outperforms the comparison models and achieves higher recommendation accuracy.
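
    The graph construction step can be illustrated as follows (plain Python/NumPy; the normalized transition count stands in for SR-II's influence weight, whose exact formula is not reproduced here).

from collections import defaultdict
import numpy as np

def session_graph(session):
    """Build a weighted directed item-transition graph from one session;
    each edge weight is a normalized transition count used as an influence weight."""
    items = sorted(set(session))
    index = {it: i for i, it in enumerate(items)}
    counts = defaultdict(float)
    for a, b in zip(session, session[1:]):
        counts[(index[a], index[b])] += 1.0
    adj = np.zeros((len(items), len(items)))
    for (i, j), c in counts.items():
        adj[i, j] = c
    out_deg = adj.sum(axis=1, keepdims=True)
    adj = np.divide(adj, out_deg, out=np.zeros_like(adj), where=out_deg > 0)
    return items, adj                      # node list and influence-weighted edges

items, adj = session_graph([101, 205, 101, 330, 205])
print(items, adj, sep="\n")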

    Hyperparameter optimization for neural network based on improved real coding genetic algorithm
    Wei SHE, Yang LI, Lihong ZHONG, Defeng KONG, Zhao TIAN
    Journal of Computer Applications    2024, 44 (3): 671-676.   DOI: 10.11772/j.issn.1001-9081.2023040441

    To address the problems of poor effects, easily falling into suboptimal solutions, and inefficiency in neural network hyperparameter optimization, an Improved Real Coding Genetic Algorithm (IRCGA) based hyperparameter optimization algorithm for neural networks was proposed, named IRCGA-DNN (IRCGA for Deep Neural Network). Firstly, a real-coded form was used to represent the values of hyperparameters, which made the search space of hyperparameters more flexible. Then, a hierarchical proportional selection operator was introduced to enhance the diversity of the solution set. Finally, improved single-point crossover and mutation operators were designed to explore the hyperparameter space more thoroughly and to improve the efficiency and quality of the optimization algorithm, respectively. Two simulation datasets were used to evaluate IRCGA's performance in damage effectiveness prediction and convergence efficiency. The experimental results on the two datasets indicate that, compared to GA-DNN (Genetic Algorithm for Deep Neural Network), the proposed algorithm reduces the convergence iterations by 8.7% and 13.6% respectively with comparable MSE (Mean Square Error), and compared to IGA-DNN (Improved Genetic Algorithm for Deep Neural Network), IRCGA-DNN reduces the convergence iterations by 22.2% and 13.6% respectively. These results show that the proposed algorithm is better in both convergence speed and prediction performance, and is suitable for hyperparameter optimization of neural networks.
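
    A compact real-coded GA loop in the spirit described above is sketched below (Python; the fitness function, bounds and operator details are simplified stand-ins for IRCGA's hierarchical proportional selection and improved crossover/mutation operators).

import random

BOUNDS = [(1e-4, 1e-1), (16, 256)]                 # search ranges: learning rate, hidden units

def fitness(ind):                                   # placeholder objective (higher is better)
    lr, hidden = ind
    return -((lr - 0.01) ** 2 + (hidden - 128) ** 2 / 1e4)

def select(pop, scores):                            # fitness-proportional selection on shifted scores
    shifted = [s - min(scores) + 1e-9 for s in scores]
    return random.choices(pop, weights=shifted, k=len(pop))

def crossover(a, b):                                # single-point crossover on real-coded genes
    point = random.randrange(1, len(a))
    return a[:point] + b[point:]

def mutate(ind, rate=0.2):                          # re-sample a gene inside its bounds
    return [random.uniform(*BOUNDS[i]) if random.random() < rate else g
            for i, g in enumerate(ind)]

pop = [[random.uniform(lo, hi) for lo, hi in BOUNDS] for _ in range(20)]
for _ in range(30):                                 # evolve for a fixed number of generations
    parents = select(pop, [fitness(p) for p in pop])
    pop = [mutate(crossover(random.choice(parents), random.choice(parents))) for _ in pop]
print(max(pop, key=fitness))                        # best hyperparameter vector found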

    Joint approach of intent detection and slot filling based on multi-task learning
    Aiguo SHANG, Xinjuan ZHU
    Journal of Computer Applications    2024, 44 (3): 690-695.   DOI: 10.11772/j.issn.1001-9081.2023040443

    With the application of pre-trained language models in Natural Language Processing (NLP) tasks, joint modeling of Intent Detection (ID) and Slot Filling (SF) has improved the performance of Spoken Language Understanding (SLU). Existing methods mostly focus on the interaction between intents and slots, neglecting the influence of modeling differential text sequences on SLU tasks. A joint method for Intent Detection and Slot Filling based on Multi-task Learning (IDSFML) was proposed. Firstly, differential texts were constructed using random mask strategy, and a neural network structure combining AutoEncoder and Attention mechanism (AEA) was designed to incorporate the features of differential text sequences into the SLU task. Secondly, a similarity distribution task was designed to make the representations of differential texts and original texts similar. Finally, three tasks of ID, SF and differential text sequence similarity distribution were jointly trained. Experimental results on Airline Travel Information Systems (ATIS) and SNIPS datasets show that, compared with the suboptimal baseline method SASGBC (Self-Attention and Slot-Gated on top of BERT with CRF), IDSFML improves the F1 scores of slot filling by 1.9 and 1.6 percentage points respectively, and improves the accuracy of intent detection by 0.2 and 0.4 percentage points respectively, enhancing the accuracy of spoken language understanding tasks.
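
    A minimal illustration of the random-mask strategy for building a differential text sequence (Python; the mask rate, mask token and the weighted joint objective in the comment are assumptions for illustration).

import random

def differential_text(tokens, mask_rate=0.15, mask_token="[MASK]"):
    """Build a differential text sequence by randomly masking tokens of the original utterance."""
    return [mask_token if random.random() < mask_rate else t for t in tokens]

tokens = "show me flights from denver to boston tomorrow".split()
print(differential_text(tokens))

# The three tasks are then trained jointly; a typical weighted objective would be
#   loss = loss_intent + loss_slot + lambda_sim * loss_similarity
# where loss_similarity pulls the representations of the original and the
# differential text together (lambda_sim is a hypothetical weight).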

    Semantic segmentation method for remote sensing images based on multi-scale feature fusion
    Ning WU, Yangyang LUO, Huajie XU
    Journal of Computer Applications    2024, 44 (3): 737-744.   DOI: 10.11772/j.issn.1001-9081.2023040439

    To improve the accuracy of semantic segmentation for remote sensing images and address the loss of small-sized target information during feature extraction by Deep Convolutional Neural Networks (DCNNs), a semantic segmentation method based on multi-scale feature fusion named FuseSwin was proposed. Firstly, an Attention Enhancement Module (AEM) was introduced in the Swin Transformer to highlight the target area and suppress background noise. Secondly, the Feature Pyramid Network (FPN) was used to fuse the detailed information and high-level semantic information of the multi-scale features to complement the features of the target. Finally, the Atrous Spatial Pyramid Pooling (ASPP) module was used to capture the contextual information of the target from the fused feature map and further improve the segmentation accuracy of the model. Experimental results demonstrate that the proposed method outperforms current mainstream segmentation methods: the mean Pixel Accuracy (mPA) and mean Intersection over Union (mIoU) of the proposed method on the Potsdam remote sensing dataset are 2.34 and 3.23 percentage points higher than those of the DeepLabV3 method, and 1.28 and 1.75 percentage points higher than those of the SegFormer method. Additionally, the proposed method was applied to identify and segment oyster rafts in high-resolution remote sensing images of the Maowei Sea in Qinzhou, Guangxi, and achieved Pixel Accuracy (PA) and Intersection over Union (IoU) of 96.21% and 91.70%, respectively.
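
    As an illustration of the context-capturing stage, a minimal ASPP module is sketched below (PyTorch; channel sizes and dilation rates are common defaults, not necessarily those of FuseSwin).

import torch
import torch.nn as nn

class ASPP(nn.Module):
    """Atrous Spatial Pyramid Pooling: parallel dilated convolutions capture
    context at several receptive-field sizes on the fused feature map."""
    def __init__(self, in_ch, out_ch, rates=(1, 6, 12, 18)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(in_ch, out_ch, 3, padding=r, dilation=r) for r in rates)
        self.project = nn.Conv2d(out_ch * len(rates), out_ch, 1)

    def forward(self, x):
        feats = [torch.relu(b(x)) for b in self.branches]
        return self.project(torch.cat(feats, dim=1))

fused = torch.randn(1, 256, 64, 64)        # hypothetical fused FPN output
context = ASPP(256, 128)(fused)            # (1, 128, 64, 64), fed to the segmentation head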

    Twice attention mechanism distantly supervised relation extraction based on BERT
    Quan YUAN, Changping CHEN, Ze CHEN, Linfeng ZHAN
    Journal of Computer Applications    2024, 44 (4): 1080-1085.   DOI: 10.11772/j.issn.1001-9081.2023040490

    Aiming at the problems of incomplete semantic information in word vectors and word polysemy faced by text feature extraction, a BERT (Bidirectional Encoder Representation from Transformer) word vector-based Twice Attention mechanism weighting algorithm for Relation Extraction (TARE) was proposed. Firstly, in the word embedding stage, a self-attention dynamic encoding algorithm was used to capture the semantic information surrounding the current word vector by constructing the Q, K and V matrices. Then, after the model output the sentence-level feature vectors, a locator was used to extract the corresponding parameters of the fully connected layer to construct the relation attention matrix. Finally, a sentence-level attention mechanism was used to assign different attention scores to the sentence-level feature vectors, improving the noise immunity of sentence-level features. The experimental results show that, compared with the Contrastive Instance Learning (CIL) algorithm for relation extraction, TARE increases the F1 value by 4.0 percentage points and the average of Precision@100, Precision@200, and Precision@300 (P@M) by 11.3 percentage points on the NYT-10m dataset. Compared with the Piecewise Convolutional Neural Network algorithm based on ATTention mechanism (PCNN-ATT), TARE increases the AUC (Area Under precision-recall Curve) value by 4.8 percentage points and the P@M value by 2.1 percentage points on the NYT-10d dataset. In various mainstream Distantly Supervised Relation Extraction (DSRE) tasks, TARE effectively improves the model's ability to learn data features.
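
    The sentence-level attention idea can be sketched as follows (PyTorch; a generic selective-attention formulation over a bag of sentences with hypothetical dimensions, rather than TARE's exact weighting).

import torch
import torch.nn as nn

class BagAttention(nn.Module):
    """Sentence-level attention for distant supervision: sentences in a bag are
    weighted by their agreement with the query vector of the candidate relation,
    which suppresses noisy sentences."""
    def __init__(self, hidden_dim, num_relations):
        super().__init__()
        self.relation_queries = nn.Embedding(num_relations, hidden_dim)
        self.classifier = nn.Linear(hidden_dim, num_relations)

    def forward(self, sent_feats, relation_id):
        # sent_feats: (num_sentences, hidden_dim) features of one entity-pair bag
        q = self.relation_queries(relation_id)              # (hidden_dim,)
        alpha = torch.softmax(sent_feats @ q, dim=0)        # attention score per sentence
        bag = (alpha.unsqueeze(-1) * sent_feats).sum(dim=0) # noise-suppressed bag feature
        return self.classifier(bag)

logits = BagAttention(768, 53)(torch.randn(5, 768), torch.tensor(7))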

    Knowledge-guided visual relationship detection model
    Yuanlong WANG, Wenbo HU, Hu ZHANG
    Journal of Computer Applications    2024, 44 (3): 683-689.   DOI: 10.11772/j.issn.1001-9081.2023040413

    The task of Visual Relationship Detection (VRD) is to further detect the relationships between target objects on the basis of target recognition, and it is a key technology for visual understanding and reasoning. Because objects interact and combine with each other, the relationships between objects are prone to combinatorial explosion, producing many entity pairs with weak correlation, which in turn lowers the recall of subsequent relationship detection. To solve these problems, a knowledge-guided visual relationship detection model was proposed. Firstly, visual knowledge was constructed: data analysis and statistics were carried out on the entity labels and relationship labels in common visual relationship detection datasets, and the interaction co-occurrence frequency between entities and relationships was obtained as visual knowledge. Then, the constructed visual knowledge was used to optimize the combination process of entity pairs, decreasing the scores of weakly correlated entity pairs and increasing the scores of strongly correlated ones; the entity pairs were then ranked by score and the low-scoring pairs were deleted. The relationship scores were also optimized in a knowledge-guided way, so as to improve the recall of the model. The effect of the proposed model was verified on the public datasets VG (Visual Genome) and VRD. In predicate classification tasks, compared with the existing model PE-Net (Prototype-based Embedding Network), the proposed model improves the recalls Recall@50 and Recall@100 by 1.84 and 1.14 percentage points respectively on the VG dataset; compared with Coacher, it improves Recall@20, Recall@50 and Recall@100 by 0.22, 0.32 and 0.31 percentage points respectively on the VRD dataset.
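
    The construction of co-occurrence statistics used as visual knowledge can be illustrated with the toy script below (Python; the annotations and the rescoring rule are illustrative assumptions, not the paper's statistics).

from collections import Counter
from itertools import combinations

# Toy annotations standing in for a visual relationship dataset.
annotations = [
    {"objects": ["person", "horse", "tree"], "triples": [("person", "ride", "horse")]},
    {"objects": ["person", "bike"],          "triples": [("person", "ride", "bike")]},
]

pair_count, rel_count = Counter(), Counter()
for img in annotations:
    for a, b in combinations(sorted(set(img["objects"])), 2):
        pair_count[(a, b)] += 1                  # entity-pair co-occurrence
    for s, r, o in img["triples"]:
        rel_count[(s, r, o)] += 1                # entity-relation co-occurrence

def rescore(subj, obj, base_score):
    """Raise scores of frequently co-occurring pairs, lower rarely seen ones."""
    freq = pair_count[tuple(sorted((subj, obj)))]
    return base_score * (1 + freq / (1 + sum(pair_count.values())))

print(rescore("person", "horse", 0.6), rescore("tree", "bike", 0.6))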

    Survey of extractive text summarization based on unsupervised learning and supervised learning
    Xiawuji, Heming HUANG, Gengzangcuomao, Yutao FAN
    Journal of Computer Applications    2024, 44 (4): 1035-1048.   DOI: 10.11772/j.issn.1001-9081.2023040537

    Compared with generative summarization methods, extractive summarization methods are easier to implement, more readable, and more widely used. At present, the literature on extractive summarization mostly analyzes and reviews specific methods or fields, and there is no multi-faceted, multi-lingual systematic review. Therefore, the meaning of text summarization was discussed, related literature was systematically reviewed, and extractive text summarization methods based on unsupervised learning and supervised learning were analyzed comprehensively from multiple perspectives. First, the development of text summarization techniques was reviewed, and different extractive text summarization methods were analyzed, including methods based on rules, Term Frequency-Inverse Document Frequency (TF-IDF), centrality, latent semantics, deep learning, graph ranking, feature engineering, and pre-training, and the advantages and disadvantages of the different algorithms were compared. Secondly, text summarization datasets in different languages and popular evaluation metrics were introduced in detail. Finally, the problems and challenges of extractive text summarization research were discussed, and solutions and research trends were presented.

    Human pose transfer model combining convolution and multi-head attention
    Hong YANG, He ZHANG, Shaoning JIN
    Journal of Computer Applications    2023, 43 (11): 3403-3410.   DOI: 10.11772/j.issn.1001-9081.2022111707

    For a given reference image of a person, the goal of Human Pose Transfer (HPT) is to generate an image of that person in an arbitrary pose. Many existing methods fail to capture the details of a person's appearance and have difficulty predicting invisible regions, especially under complex pose transformations, so it is difficult for them to generate clear and realistic person images. To address these problems, a new HPT model integrating convolution and multi-head attention was proposed. Firstly, the Convolution-Multi-Head Attention (Conv-MHA) block was constructed by fusing convolution and multi-head attention, and was used to extract rich contextual features. Secondly, to improve the learning ability of the proposed model, the HPT network was built from Conv-MHA blocks. Finally, the self-reconstruction of the reference image was introduced as an auxiliary task so that the capacity of the model was utilized more fully. The Conv-MHA-based human pose transfer model was validated on the DeepFashion and Market-1501 datasets, and the results on the DeepFashion test set show that it outperforms the state-of-the-art human pose transfer model DPTN (Dual-task Pose Transformer Network) in terms of Structural SIMilarity (SSIM), Learned Perceptual Image Patch Similarity (LPIPS) and FID (Fréchet Inception Distance). Experimental results show that the Conv-MHA module, which integrates convolution and the multi-head attention mechanism, can improve the representation ability of the model, capture the details of a person's appearance more effectively, and improve the accuracy of person image generation.
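
    A sketch of a block that combines a convolution branch with a multi-head attention branch (PyTorch; the depthwise convolution, residual sum and normalization layout are assumptions for illustration, not the published Conv-MHA block).

import torch
import torch.nn as nn

class ConvMHABlock(nn.Module):
    """Local detail via a depthwise convolution plus global context via multi-head attention."""
    def __init__(self, dim, heads=8):
        super().__init__()
        self.conv = nn.Conv2d(dim, dim, 3, padding=1, groups=dim)   # depthwise branch
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, x):                       # x: (B, C, H, W) feature map
        B, C, H, W = x.shape
        local = self.conv(x)
        seq = x.flatten(2).transpose(1, 2)      # (B, H*W, C) tokens for attention
        glob = self.attn(seq, seq, seq)[0].transpose(1, 2).reshape(B, C, H, W)
        out = x + local + glob                  # residual combination of both branches
        normed = self.norm(out.flatten(2).transpose(1, 2))           # LayerNorm over channels
        return normed.transpose(1, 2).reshape(B, C, H, W)

y = ConvMHABlock(64)(torch.randn(2, 64, 16, 16))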

    Missing value imputation algorithm using dual discriminator based on conditional generative adversarial imputation network
    Jia SU, Hong YU
    Journal of Computer Applications    2024, 44 (5): 1423-1427.   DOI: 10.11772/j.issn.1001-9081.2023050697

    Various factors in applications may cause data loss and affect the analysis of subsequent tasks, so the imputation of missing values in datasets is particularly important. Moreover, the accuracy of data imputation significantly impacts the analysis of subsequent tasks, and incorrectly imputed data may introduce more severe bias than missing data. A new missing value imputation algorithm named DDC-GAIN (Dual Discriminator based on Conditional Generative Adversarial Imputation Network) was introduced, based on the Conditional Generative Adversarial Imputation Network (C-GAIN) and a dual discriminator, in which the primary discriminator was assisted by an auxiliary discriminator in assessing the validity of predicted values; in other words, the authenticity of a generated sample was judged with global sample information, and the relationships between features were emphasized when estimating predicted values. Experimental results on four datasets show that, compared with five classical imputation algorithms, the DDC-GAIN algorithm achieves the lowest Root Mean Square Error (RMSE) under the same conditions and with large sample size; when the missing rate is 15% on the Default credit card dataset, the RMSE of DDC-GAIN is 28.99% lower than that of the best comparison algorithm, C-GAIN. This indicates that using the auxiliary discriminator to help the primary discriminator learn feature relationships is effective.

    Information retrieval method based on multi-granularity semantic fusion
    Zhengyu ZHAO, Jing LUO, Xinhui TU
    Journal of Computer Applications    2024, 44 (6): 1775-1780.   DOI: 10.11772/j.issn.1001-9081.2023050646

    Information Retrieval (IR) is a process that organizes and processes information using specific techniques and methods to meet users’ information needs. In recent years, dense retrieval methods based on pre-trained models have achieved significant success. However, these methods only utilize vector representations of text and words to calculate the relevance between query and document, ignoring the semantic information at the phrase level. To address this issue, an IR method called MSIR (Multi-Scale Information Retrieval) was proposed. IR performance was enhanced by integrating semantic information of different granularities from the query and the document. First, semantic units of three different granularities — word, phrase, and text — were constructed in the query and the document. Then, the pre-trained model was used to encode these three semantic units separately to obtain their semantic representations. Finally, these semantic representations were used to calculate the relevance between the query and the document. Comparison experiments were conducted on three classic datasets of different sizes, including Corvid-19, TREC2019 and Robust04. Compared with ColBERT (ranking model based on Contextualized late interaction over BERT (Bidirectional Encoder Representation from Transformers)), MSIR shows an approximately 8% improvement in the P@10, P@20, NDCG@10 and NDCG@20 indicators on Robust04 dataset, as well as some improvements on Corvid-19 and TREC2019 datasets. Experimental results demonstrate that MSIR can effectively integrate multi-granularity semantic information, thereby enhancing retrieval accuracy.
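
    The relevance computation over the three granularities can be sketched as follows (PyTorch; the weights and the max-then-mean aggregation are hypothetical choices, not MSIR's exact scoring function).

import torch
import torch.nn.functional as F

def multi_granularity_score(query_reps, doc_reps, weights=(0.2, 0.3, 0.5)):
    """Combine cosine relevance at word, phrase and text granularity."""
    score = 0.0
    for w, q, d in zip(weights, query_reps, doc_reps):
        # q: (Nq, D) units of the query at one granularity; d: (Nd, D) for the document
        sim = F.cosine_similarity(q.unsqueeze(1), d.unsqueeze(0), dim=-1)   # (Nq, Nd)
        score = score + w * sim.max(dim=1).values.mean()   # best-matching unit per query unit
    return score

q_word, q_phrase, q_text = torch.randn(6, 768), torch.randn(3, 768), torch.randn(1, 768)
d_word, d_phrase, d_text = torch.randn(40, 768), torch.randn(12, 768), torch.randn(1, 768)
print(multi_granularity_score((q_word, q_phrase, q_text), (d_word, d_phrase, d_text)))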

    Location control method for generated objects by diffusion model with exciting and pooling attention
    Jinsong XU, Ming ZHU, Zhiqiang LI, Shijie GUO
    Journal of Computer Applications    2024, 44 (4): 1093-1098.   DOI: 10.11772/j.issn.1001-9081.2023050634

    Due to the ambiguity of text and the lack of location information in training data, current state-of-the-art diffusion models cannot accurately control the locations of generated objects in the image under the condition of text prompts. To address this issue, a spatial condition specifying the object's location range was introduced, and an attention-guided method was proposed, based on the strong correlation between the cross-attention maps in the U-Net and the spatial layout of the image, to control the generation of the attention map and thus the locations of the generated objects. Specifically, based on the Stable Diffusion (SD) model, in the early stage of the generation of the cross-attention map in the U-Net layer, a loss was introduced to encourage high attention values inside the corresponding location range and reduce the average attention value outside the range. The noise vector in the latent space was optimized step by step in each denoising step to control the generation of the attention map. Experimental results show that the proposed method can effectively control the locations of one or more objects in the generated image, and when generating multiple objects, it can reduce object omission, redundant object generation, and object fusion.
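
    The guidance signal can be sketched as follows (PyTorch; the map size, mask and loss form are illustrative, and the stand-in attention map replaces the U-Net cross-attention through which the gradient would actually reach the latent).

import torch

def location_loss(attn_map, region_mask):
    """Encourage a high attention peak inside the allowed region and
    penalize the average attention outside it."""
    inside = (attn_map * region_mask).amax(dim=(-2, -1))
    outside = (attn_map * (1 - region_mask)).mean(dim=(-2, -1))
    return (1 - inside).mean() + outside.mean()

attn_map = torch.rand(1, 16, 16, requires_grad=True)   # stand-in for a cross-attention map
mask = torch.zeros(1, 16, 16)
mask[:, 4:12, 4:12] = 1.0                              # allowed location range of the object
loss = location_loss(attn_map, mask)
loss.backward()                                        # in the real loop the gradient flows through
print(loss.item(), attn_map.grad.shape)                # the U-Net and is used to update the latent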

    Text semantic de-duplication algorithm based on keyword graph representation
    Jinyun WANG, Yang XIANG
    Journal of Computer Applications    2023, 43 (10): 3070-3076.   DOI: 10.11772/j.issn.1001-9081.2022101495

    There are a large number of redundant texts with the same or similar semantics on the internet. Text de-duplication can solve the problem of redundant texts wasting storage space and can reduce unnecessary work for information extraction tasks. Traditional text de-duplication algorithms rely on literal overlapping information and do not make use of the semantic information of texts; at the same time, they cannot capture the interaction between sentences that are far away from each other in a long text, so their de-duplication effect is not ideal. Aiming at the problem of text semantic de-duplication, a long text de-duplication algorithm based on keyword graph representation was proposed. Firstly, a text pair was represented as a graph with keyword phrases as vertices by extracting semantic keyword phrases from the text pair. Secondly, the nodes were encoded in various ways, and a Graph Attention Network (GAT) was used to learn the relationships between nodes to obtain a graph-level vector representation of the text pair and judge whether the two texts were semantically similar. Finally, de-duplication was performed according to the semantic similarity of the text pair. Compared with traditional methods, this method uses the semantic information of texts effectively; through the graph structure, it connects distant sentences in a long text by the co-occurrence of keyword phrases, increasing the semantic interaction between different sentences. Experimental results show that the proposed algorithm performs better than traditional algorithms such as Simhash, BERT (Bidirectional Encoder Representations from Transformers) fine-tuning and Concept Interaction Graph (CIG) on both the CNSE (Chinese News Same Event) and CNSS (Chinese News Same Story) datasets: the F1 score of the proposed algorithm reaches 84.65% on the CNSE dataset and 90.76% on the CNSS dataset, indicating that the proposed algorithm can effectively improve the effect of text de-duplication tasks.

    Highway traffic flow prediction based on feature fusion graph attention network
    Chun GAO, Mengling WANG
    Journal of Computer Applications    2023, 43 (10): 3114-3120.   DOI: 10.11772/j.issn.1001-9081.2022101587

    Based on the actual spatio-temporal topology of the traffic network, a Feature Fusion Graph ATtention network (FF-GAT) model was proposed to fuse the multiple types of traffic state information observed at nodes, so as to predict highway traffic flow. First, the correlations among the vehicle speed, traffic flow and occupancy of the nodes were analyzed, and these relationships were incorporated into a multivariate temporal attention mechanism to capture the dynamic temporal correlation between different moments of traffic flow. Then, the nodes were divided into different neighborhood sets, and the spatial correlation between different neighborhoods of traffic flow was captured by the feature fusion Graph Attention neTwork (GAT). At the same time, the coupling correlation between multiple heterogeneous data was fully explored by the feature crossover network to provide effective supplementary information for predicting the target sequence. Experiments were carried out on two publicly available traffic flow datasets. Experimental results show that the FF-GAT model reduces the Root Mean Squared Error (RMSE) by 3.4% compared with the ASTGCN (Attention based Spatial-Temporal Graph Convolutional Network) model and by 3.1% compared with the GCN-GAN (Graph Convolutional Network and Generative Adversarial Network) model on the PeMSD8 dataset. It can be seen that the FF-GAT model can effectively improve prediction accuracy through feature fusion.

    Semi-supervised heterophilic graph representation learning model based on Graph Transformer
    Shibin LI, Jun GONG, Shengjun TANG
    Journal of Computer Applications    2024, 44 (6): 1816-1823.   DOI: 10.11772/j.issn.1001-9081.2023060811

    Existing Graph Convolutional Network (GCN) methods are based on the homophily assumption and cannot be directly applied to heterophilic graph representation learning, while many studies on heterophilic graph representation learning are limited by the message-passing mechanism, whose confusion and over-squeezing of node features lead to over-smoothing. To address these issues, a semi-supervised heterophilic graph representation learning model based on Graph Transformer, named HPGT (HeteroPhilic Graph Transformer), was proposed. Firstly, the path neighborhood of a node was sampled using the degree connection probability matrix; the heterophilic connection patterns of nodes on the path were then adaptively aggregated through the self-attention mechanism and encoded to obtain the structural information of nodes, and the original attribute information and structural information of nodes were used to construct the self-attention module of the Transformer layer. Secondly, the hidden representation of each node was separated from those of its neighboring nodes and updated independently, to avoid a node aggregating too much of its own information through the self-attention module, and the node representation and the neighborhood representation were then concatenated to obtain the output of a single Transformer layer; in addition, the outputs of all Transformer layers were concatenated to obtain the final node representation, preventing the loss of information in the middle layers. Finally, a linear layer and a Softmax layer were used to map the hidden representations of nodes to their predicted labels. In comparison experiments with the model without Structural Encoding (SE), the SE based on degree connection probability provides effective bias information for the self-attention modules of the Transformer layers and improves the average accuracy of HPGT by 0.99% to 11.98%. Compared with the comparison models, on the heterophilic datasets (Texas, Cornell, Wisconsin, and Actor), the node classification accuracies of HPGT are improved by 0.21% to 1.69%, and on the homophilic datasets (Cora, CiteSeer, and PubMed), the node classification accuracies reach 0.8379, 0.7467 and 0.8862, respectively. The experimental results show that HPGT has a strong ability for heterophilic graph representation learning and is particularly suitable for node classification on strongly heterophilic graphs.

    Generative label adversarial text classification model
    Xun YAO, Zhongzheng QIN, Jie YANG
    Journal of Computer Applications    2024, 44 (6): 1781-1785.   DOI: 10.11772/j.issn.1001-9081.2023050662

    Text classification is a fundamental task in Natural Language Processing (NLP), aiming to assign text data to predefined categories. The combination of the Graph Convolutional neural Network (GCN) and the large-scale pre-trained model BERT (Bidirectional Encoder Representations from Transformers) has achieved excellent results in text classification tasks. However, the undirected information transmission of GCN in large-scale heterogeneous graphs produces information noise, which affects the judgment of the model and reduces its classification ability. To solve this problem, a generative label adversarial model, the Class Adversarial Graph Convolutional Network (CAGCN), was proposed to reduce the interference of irrelevant information during classification and improve the classification performance of the model. Firstly, the graph construction method of TextGCN (Text Graph Convolutional Network) was used to build the adjacency matrix, which was combined with the GCN and BERT models as a Class Generator (CG). Secondly, a pseudo-label feature training method was used during model training to construct a cluster, and the cluster and the class generator were trained jointly. Finally, experiments were carried out on several widely used datasets. Experimental results show that the classification accuracy of the CAGCN model is 1.2, 0.1, 0.5, 1.7 and 0.5 percentage points higher than that of the RoBERTaGCN model on the widely used classification datasets 20NG, R8, R52, Ohsumed and MR, respectively.

    Dual-channel sentiment analysis model based on improved prompt learning method
    Junfeng SHEN, Xingchen ZHOU, Can TANG
    Journal of Computer Applications    2024, 44 (6): 1796-1806.   DOI: 10.11772/j.issn.1001-9081.2023060733

    Aiming at the problems of the long template iteration and update cycle and poor generalization ability in previous prompt learning methods, a dual-channel sentiment analysis model based on an improved prompt learning method was proposed. First, the serialized prompt templates and the input word vectors were fed into the attention mechanism structure, and the templates were iteratively updated as the input word vectors were updated in the multi-layer attention mechanism. Then, semantic information was extracted by the ALBERT (A Lite BERT (Bidirectional Encoder Representations from Transformers)) model in the other channel. Finally, the extracted semantic features were integrated to improve the generalization ability of the overall model. The model was tested on the Laptop and Restaurants datasets in SemEval2014, the ACL (Association for Computational Linguistics) Twitter dataset, and the SST-2 dataset created by Stanford University, achieving classification accuracies of 80.88%, 91.78%, 76.78% and 95.53%, respectively. Compared with the baseline model BERT_Large, the classification accuracy increases by 0.99%, 1.13%, 3.39% and 2.84% respectively; compared with P-tuning v2, the proposed model achieves improvements of 2.88%, 3.60% and 2.06% in classification accuracy on the Restaurants, Twitter and SST-2 datasets respectively, and reaches convergence earlier than the original method.

    Adversarial training method with adaptive attack strength
    Tong CHEN, Jiwei WEI, Shiyuan HE, Jingkuan SONG, Yang YANG
    Journal of Computer Applications    2024, 44 (1): 94-100.   DOI: 10.11772/j.issn.1001-9081.2023060854

    The vulnerability of deep neural networks to adversarial attacks has raised significant concerns about the security and reliability of artificial intelligence systems. Adversarial training is an effective approach to enhancing adversarial robustness. To address the issue that existing methods adopt fixed adversarial sample generation strategies and neglect the importance of the adversarial sample generation phase for adversarial training, an adversarial training method based on adaptive attack strength was proposed. Firstly, the clean sample and the adversarial sample were input into the model to obtain their outputs. Then, the difference between the model outputs of the clean sample and the adversarial sample was calculated. Finally, the change of this difference relative to the previous moment was measured to automatically adjust the strength of the adversarial sample. Comprehensive experimental results on three benchmark datasets demonstrate that, compared with the baseline method Adversarial Training with Projected Gradient Descent (PGD-AT), the proposed method improves the robust accuracy under AA (AutoAttack) attack by 1.92, 1.50 and 3.35 percentage points respectively, and it outperforms the state-of-the-art defense method Adversarial Training with Learnable Attack Strategy (LAS-AT) in terms of both robustness and natural accuracy. Furthermore, from the perspective of data augmentation, the proposed method can effectively address the problem of diminishing augmentation effect during adversarial training.
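
    One way to realize the adaptive adjustment is sketched below (PyTorch); the KL-divergence gap, the multiplicative update and its direction are a hedged reading of the description above, not the paper's exact schedule.

import torch
import torch.nn.functional as F

def adaptive_strength(model, x, x_adv, eps, prev_gap, k=1.5):
    """Adjust the perturbation budget from how the clean/adversarial output gap changes."""
    with torch.no_grad():
        gap = F.kl_div(F.log_softmax(model(x_adv), dim=1),
                       F.softmax(model(x), dim=1), reduction="batchmean")
    if prev_gap is not None and gap < prev_gap:     # attack got relatively weaker -> strengthen it
        eps = eps * k
    elif prev_gap is not None:                      # attack strong enough -> relax it
        eps = eps / k
    return eps, gap.item()

# Inside the adversarial-training loop (model, loader, attack() and optimizer are assumed):
# prev = None
# for x, y in loader:
#     x_adv = attack(model, x, y, eps)              # e.g. PGD with the current strength
#     eps, prev = adaptive_strength(model, x, x_adv, eps, prev)
#     loss = F.cross_entropy(model(x_adv), y); loss.backward(); optimizer.step()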

    Information diffusion prediction model based on Transformer and relational graph convolutional network
    Xiting LYU, Jinghua ZHAO, Haiying RONG, Jiale ZHAO
    Journal of Computer Applications    2024, 44 (6): 1760-1766.   DOI: 10.11772/j.issn.1001-9081.2023060884

    Aiming at the problem that it is difficult to effectively capture structural features, temporal features, and the interactions between them in the dynamic evolution of information diffusion, an information diffusion prediction model based on Transformer and Relational Graph Convolutional Network (TRGCN) was proposed. Firstly, a dynamic heterogeneous graph composed of the social network graph and the diffusion cascade graph was constructed, and the structural features of each node in this graph were extracted using the Relational Graph Convolutional Network (RGCN). Secondly, the time embedding of each node was re-encoded using a Bi-directional Long Short-Term Memory (Bi-LSTM) network, and a time decay term was introduced to give different weights to nodes at different time positions, so as to obtain the temporal features of the nodes. Finally, the structural and temporal features were input into the Transformer and merged to obtain spatial-temporal features for information diffusion prediction. The experimental results on three real datasets, Twitter, Douban and Memetracker, show that compared with the best model in the comparison experiments, TRGCN improves Hits@100 by 3.18%, 5.96% and 3.34% respectively and MAP@100 by 11.60%, 19.72% and 8.47% respectively, proving its validity and rationality.

    Fast adversarial training method based on random noise and adaptive step size
    Jinfu WU, Yi LIU
    Journal of Computer Applications    2024, 44 (6): 1807-1815.   DOI: 10.11772/j.issn.1001-9081.2023060774

    Adversarial Training (AT) and its variants have been proven to be the most effective methods for defending against adversarial attacks. However, the process of generating adversarial examples requires extensive computational resources, resulting in low model training efficiency and limited feasibility. On the other hand, Fast AT (Fast-AT) uses single-step adversarial attacks to replace multi-step attacks for accelerating the training process, but its model robustness is much lower than that of multi-step AT methods, and it is susceptible to Catastrophic Overfitting (CO). To address these issues, a Fast-AT method based on random noise and adaptive step size was proposed. Firstly, in each iteration of generating adversarial examples, random noise was added to the original input images for data augmentation. Then, the gradients of each adversarial example during the training process were accumulated, and the step size of the adversarial examples was adaptively adjusted based on the gradient information. Finally, adversarial attacks were performed according to the perturbation step size and gradient information to generate adversarial examples for model training. Various adversarial attacks were conducted on the CIFAR-10 and CIFAR-100 datasets, and compared to N-FGSM (Noise Fast Gradient Sign Method), the proposed method achieved at least a 0.35 percentage point improvement in robust accuracy. The experimental results demonstrate that the proposed method can avoid CO issue in Fast-AT and enhance the robustness of deep learning models.

    Text classification based on pre-training model and label fusion
    Hang YU, Yanling ZHOU, Mengxin ZHAI, Han LIU
    Journal of Computer Applications    2024, 44 (3): 709-714.   DOI: 10.11772/j.issn.1001-9081.2023030340

    Accurate classification of massive user text comment data has important economic and social benefits. In most current text classification methods, a text encoding method is applied directly before various classifiers, while the prompt information contained in the label text is ignored. To address these issues, a Text and Label Information Fusion Classification model based on the pre-trained model RoBERTa (Robustly optimized BERT pretraining approach), namely TLIFC-RoBERTa, was proposed. Firstly, the RoBERTa pre-trained model was used to obtain word vectors. Then, a Siamese network structure was used to train the text and label vectors respectively, and the label information was mapped to the text through interactive attention, so as to integrate the label information into the model. Finally, an adaptive fusion layer was set to tightly fuse the text representation with the label representation for classification. Experimental results on the Today Headlines and THUCNews datasets show that compared with mainstream deep learning models such as RA-Labelatt (replacing static word vectors in the Label-based attention improved model with word vectors trained by RoBERTa-wwm) and LEMC-RoBERTa (RoBERTa combined with Label-Embedding-based Multi-scale Convolution for text classification), TLIFC-RoBERTa achieves the highest accuracy and the best classification performance on user comment datasets.
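
    A sketch of the interactive-attention and adaptive-fusion stages (PyTorch; the gated fusion and the mean pooling are simplifying assumptions, not the published TLIFC-RoBERTa layers).

import torch
import torch.nn as nn

class LabelTextFusion(nn.Module):
    """Interactive attention maps label semantics onto the text, then a gated
    adaptive fusion mixes the two representations before classification."""
    def __init__(self, dim, num_classes):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads=8, batch_first=True)
        self.gate = nn.Linear(2 * dim, dim)
        self.classifier = nn.Linear(dim, num_classes)

    def forward(self, text_h, label_h):
        # text_h: (B, T, D) token states; label_h: (B, L, D) encoded label texts
        label_aware, _ = self.attn(text_h, label_h, label_h)   # text attends to the labels
        t, l = text_h.mean(dim=1), label_aware.mean(dim=1)
        g = torch.sigmoid(self.gate(torch.cat([t, l], dim=-1)))
        fused = g * t + (1 - g) * l                             # adaptive fusion layer
        return self.classifier(fused)

logits = LabelTextFusion(768, 15)(torch.randn(4, 50, 768), torch.randn(4, 15, 768))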
