李龚林1,范一晨2,米宇舰2,李明1
Abstract: To address the problems that a single model for text classification is large in size and difficult to apply to the diverse, non-normative expressions found in public opinion texts, a model integration algorithm with dynamic fine-tuning and secondary weighting based on the Bagging training idea, named Bagging-DyFAS, is proposed. First, weak classifiers are trained on datasets constructed by bootstrap sampling, so that they acquire a certain amount of prior knowledge. Then, based on their performance on the development set, one round of dynamic weighting and one round of static weighting are applied, and the resulting set of weights is used to generalise the models to unlabeled data, which further improves performance on text classification tasks. Experiments on the dataset constructed in this paper show that, compared with other baseline models, the proposed method improves accuracy by at least 3 percentage points, precision by at least 3 percentage points, recall by at least 1 percentage point, and F1 score by at least 3 percentage points.
Key words: text classification, model integration, secondary weighting, dynamic weighting, public opinion analysis, pre-trained language model
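The bootstrap training and two-stage weighting pipeline described in the abstract can be illustrated with a minimal sketch. The snippet below is an assumption-laden illustration, not the paper's formulation: scikit-learn logistic regression on synthetic data stands in for the pretrained language models and public opinion texts, and the development-set accuracy weighting followed by confidence-based rescaling is a simplified stand-in for the paper's dynamic and static weighting formulas.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Placeholder data: synthetic features instead of public opinion texts.
rng = np.random.default_rng(0)
X, y = make_classification(n_samples=600, n_features=20, n_classes=3,
                           n_informative=6, random_state=0)
X_train, X_dev, y_train, y_dev = train_test_split(X, y, test_size=0.3,
                                                  random_state=0)

# 1) Train weak classifiers on bootstrap ("self-sampled") copies of the
#    training set so each one acquires slightly different prior knowledge.
classifiers = []
for _ in range(5):
    idx = rng.integers(0, len(X_train), size=len(X_train))
    clf = LogisticRegression(max_iter=1000).fit(X_train[idx], y_train[idx])
    classifiers.append(clf)

# 2) One round of weighting from development-set performance:
#    each classifier's weight is its dev accuracy, normalised to sum to 1.
dev_acc = np.array([clf.score(X_dev, y_dev) for clf in classifiers])
static_w = dev_acc / dev_acc.sum()

def predict(X_new):
    """A second, input-dependent round of weighting at inference time:
    rescale each classifier's weight by its mean prediction confidence on
    the new batch, then take a weighted soft vote over class probabilities."""
    probas = [clf.predict_proba(X_new) for clf in classifiers]
    dyn_w = np.array([p.max(axis=1).mean() for p in probas]) * static_w
    dyn_w /= dyn_w.sum()
    mixed = sum(w * p for w, p in zip(dyn_w, probas))
    return mixed.argmax(axis=1)

print("ensemble dev accuracy:", (predict(X_dev) == y_dev).mean())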
CLC Number: TP391.1; TP18
李龚林, 范一晨, 米宇舰, 李明. Bagging-DyFAS: model integration algorithm with dynamic fine-tuning [J].
URL: https://www.joca.cn/EN/