Data augmentation technique incorporating label confusion for Chinese text classification
Haitao SUN, Jiayu LIN, Zuhong LIANG, Jie GUO
Journal of Computer Applications    2025, 45 (4): 1113-1119.   DOI: 10.11772/j.issn.1001-9081.2024040550

Traditional data augmentation techniques, such as synonym substitution, random insertion, and random deletion, may change the original semantics of a text and even cause the loss of critical information. Moreover, data in text classification tasks typically have both a textual part and a label part, whereas traditional data augmentation methods focus only on the textual part. To address these issues, a Label Confusion incorporated Data Augmentation (LCDA) technique was proposed to enhance data from both the textual and the label aspects. On the textual side, texts were augmented by randomly inserting and replacing punctuation marks and by completing missing end-of-sentence punctuation, which increased textual diversity while preserving all of the original content and its order. On the label side, a simulated label distribution was generated with a label confusion approach and used in place of the traditional one-hot label distribution, so as to better reflect the relationships among instances and labels as well as between labels. In experiments on few-shot datasets constructed from the THUCNews (TsingHua University Chinese News) and Toutiao Chinese news datasets, the proposed technique was combined with the TextCNN, TextRNN, BERT (Bidirectional Encoder Representations from Transformers), and RoBERTa-CNN (Robustly optimized BERT approach Convolutional Neural Network) text classification models. The experimental results show that all four models achieve significant performance improvements over their non-augmented counterparts. Specifically, on 50-THU, a dataset constructed from THUCNews, the accuracies of the four models combined with the LCDA technique are improved by 1.19, 6.87, 3.21, and 2.89 percentage points, respectively, over those without augmentation, and by 0.78, 7.62, 1.75, and 1.28 percentage points, respectively, over the same models combined with the softEDA (Easy Data Augmentation with soft labels) method. By processing both text and labels, the LCDA technique improves model accuracy significantly, especially in scenarios with limited data availability.
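The two components above can be made concrete with a short sketch. The following Python code illustrates both the punctuation-level text augmentation and the label confusion target; the punctuation set, the helper names, and the mixing weight alpha are illustrative assumptions, not the paper's implementation:

```python
import random
import numpy as np

PUNCT = ["，", "。", "、", "？", "！"]  # common Chinese punctuation marks (assumed set)

def punctuation_augment(text: str, n_ops: int = 2) -> str:
    """Insert or replace punctuation marks at random positions, then complete
    the end-of-sentence punctuation. Unlike synonym substitution or random
    deletion, no original character is removed, so all textual information
    and its order are preserved."""
    chars = list(text)
    for _ in range(n_ops):
        pos = random.randrange(len(chars))
        if chars[pos] in PUNCT:
            chars[pos] = random.choice(PUNCT)        # replace an existing mark
        else:
            chars.insert(pos, random.choice(PUNCT))  # insert a new mark
    if chars[-1] not in ("。", "？", "！"):
        chars.append("。")                           # complete end-of-sentence punctuation
    return "".join(chars)

def simulated_label_distribution(text_repr, label_embeddings, true_label, alpha=0.5):
    """Build a label-confusion target: softmax similarity between the instance
    representation and every label embedding, mixed with the one-hot vector.
    This replaces the one-hot distribution so the target reflects
    instance-label and label-label relationships."""
    sims = label_embeddings @ text_repr              # (num_labels,)
    sims = np.exp(sims - sims.max())                 # numerically stable softmax
    confusion = sims / sims.sum()
    one_hot = np.zeros_like(confusion)
    one_hot[true_label] = 1.0
    return alpha * one_hot + (1.0 - alpha) * confusion
```

The augmented text and the simulated distribution would then be paired as one training example, with the classifier trained against the soft target (for example, with a KL-divergence loss) instead of the one-hot label.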

Recommendation method based on knowledge-awareness and cross-level contrastive learning
Jie GUO, Jiayu LIN, Zuhong LIANG, Xiaobo LUO, Haitao SUN
Journal of Computer Applications    2024, 44 (4): 1121-1127.   DOI: 10.11772/j.issn.1001-9081.2023050613

As a kind of side information, Knowledge Graph (KG) can effectively improve the recommendation quality of recommendation models, but existing knowledge-awareness recommendation methods based on Graph Neural Network (GNN) suffer from unbalanced utilization of node information. To address this problem, a new recommendation method based on Knowledge-awareness and Cross-level Contrastive Learning (KCCL) was proposed. During information aggregation, sparse interaction data and a noisy knowledge graph can deviate from the true inter-node dependencies, which causes the unbalanced utilization of node information; to alleviate this, a contrastive learning paradigm was introduced into the GNN-based knowledge-awareness recommendation model. Firstly, the user-item interaction graph and the item knowledge graph were integrated into a heterogeneous graph, and the node representations of users and items were obtained by a GNN based on the graph attention mechanism. Secondly, consistent noise was added at the information propagation and aggregation layers for data augmentation, yielding node representations at different levels, and the outermost node representations were contrasted with the innermost ones for cross-level contrastive learning. Finally, the supervised recommendation task and the contrastive learning auxiliary task were jointly optimized to obtain the final representation of each node. Experimental results on the DBbook2014 and MovieLens-1m datasets show that, compared with the second-best contrastive learning method, KCCL improves Recall@10 by 3.66% and 0.66%, respectively, and NDCG@10 by 3.57% and 3.29%, respectively, which verifies its effectiveness.
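A minimal PyTorch sketch of the cross-level contrastive component may help fix ideas; the noise construction, the temperature tau, and the loss weighting below are assumptions in the style of common graph contrastive learning setups, not the paper's exact formulation:

```python
import torch
import torch.nn.functional as F

def add_consistent_noise(h: torch.Tensor, eps: float = 0.1) -> torch.Tensor:
    """Perturb node representations with random noise whose sign agrees with
    each embedding dimension, a common way to build augmented views without
    dropping nodes or edges (assumed noise scheme)."""
    noise = F.normalize(torch.rand_like(h), dim=-1) * eps
    return h + torch.sign(h) * noise

def cross_level_infonce(h_inner: torch.Tensor, h_outer: torch.Tensor,
                        tau: float = 0.2) -> torch.Tensor:
    """InfoNCE loss contrasting each node's innermost-layer representation
    with its outermost-layer representation; the other nodes in the batch
    serve as negatives."""
    z1 = F.normalize(h_inner, dim=-1)
    z2 = F.normalize(h_outer, dim=-1)
    logits = z1 @ z2.t() / tau                        # (N, N) similarity matrix
    labels = torch.arange(z1.size(0), device=z1.device)
    return F.cross_entropy(logits, labels)

# Joint optimization of the supervised recommendation task and the
# contrastive auxiliary task (lam is a hypothetical trade-off weight):
# total_loss = rec_loss + lam * cross_level_infonce(h_inner, h_outer)
```

Here h_inner and h_outer would be the node embeddings taken from the first and last propagation layers of the noised GNN, so the contrast spans levels of the same network rather than two separately corrupted graphs.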
