Journal of Computer Applications, 2025, Vol. 45, Issue (1): 82-89. DOI: 10.11772/j.issn.1001-9081.2024010085

• Artificial intelligence •

Hierarchical multi-label classification model for public complaints with long-tailed distribution

Xin LIU1, Dawei YANG1, Changheng SHAO2, Haiwen WANG1, Mingjiang PANG1, Yanru LI1

  1. Qingdao Institute of Software, College of Computer Science and Technology, China University of Petroleum (East China), Qingdao Shandong 266580, China
  2. College of Computer Science and Technology, Qingdao University, Qingdao Shandong 266071, China
  • Received: 2024-01-25 Revised: 2024-03-25 Accepted: 2024-03-27 Online: 2024-05-09 Published: 2025-01-10
  • Contact: Xin LIU
  • About author:YANG Dawei, born in 1997, M. S. candidate. His research interests include natural language processing, deep learning.
    SHAO Changheng, born in 1986, Ph. D. His research interests include big data, smart city.
    WANG Haiwen, born in 1993, M. S. candidate. His research interests include blockchain, natural language processing.
    PANG Mingjiang, born in 2000, M. S. candidate. His research interests include natural language processing.
    LI Yanru, born in 1998, M. S. candidate. Her research interests include natural language processing, federated learning.
  • Supported by:
    National Natural Science Foundation of China (62071491); Shandong Provincial Natural Science Foundation (ZR2020MF045)

面向长尾分布的民众诉求层次多标签分类模型

刘昕1, 杨大伟1, 邵长恒2, 王海文1, 庞铭江1, 李艳茹1

  1. 中国石油大学(华东) 青岛软件学院、计算机科学与技术学院,山东 青岛 266580
  2. 青岛大学 计算机科学技术学院,山东 青岛 266071
  • 通讯作者: 刘昕
  • 作者简介:杨大伟(1997—),男,江苏扬州人,硕士研究生,主要研究方向:自然语言处理、深度学习;
    邵长恒(1986—),男,山东枣庄人,博士,主要研究方向:大数据、智慧城市;
    王海文(1993—),男,山东淄博人,硕士研究生,主要研究方向:区块链、自然语言处理;
    庞铭江(2000—),男,山东济南人,硕士研究生,主要研究方向:自然语言处理;
    李艳茹(1998—),女,山东济南人,硕士研究生,主要研究方向:自然语言处理、联邦学习。
  • 基金资助:
    国家自然科学基金资助项目(62071491);山东省自然科学基金资助项目(ZR2020MF045)

Abstract:

Swift response to public complaints is an important means of making social governance intelligent and improving public satisfaction. Accurately analyzing complaints so that work orders are matched to the right handling departments is especially critical for fast response and efficient resolution. However, vague complaint descriptions, easily confused categories and imbalanced category proportions in public complaint data make complaint categories difficult to analyze, which reduces the efficiency and accuracy of intelligent order dispatching. To address these problems, a hierarchical multi-label classification model for complaints with an encoder-decoder structure, HMCHotline, was proposed. Firstly, fine-grained keyword prior knowledge from the complaint domain was introduced into the text encoder to suppress noise, and the spatio-temporal information in complaints was fused to improve the discriminative power of the semantic features. Secondly, the label hierarchy was used to generate hierarchy-aware and semantics-aware label embeddings, and a Transformer-based label decoder was constructed to decode labels from the semantic features of the complaints and the label embeddings. At the same time, a dynamic label table strategy based on the hierarchical dependencies among labels was introduced to restrict the decoding range of labels, thereby solving the label inconsistency problem. Finally, a Softmax grouping strategy was used to place label categories with similar numbers of samples in the same group for the Softmax operation, which alleviated the low classification accuracy caused by the long-tailed label distribution. Experimental results on the Hotline, RCV1-v2 (Reuters Corpus Volume I) and WOS (Web Of Science) datasets show that, compared with the Hierarchy-aware label semantics Matching network (HiMatch), the proposed model improves Micro-F1 by 1.65, 2.06 and 0.43 percentage points, respectively, which verifies its effectiveness.
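
The two decoding-side strategies named in the abstract, the dynamic label table and Softmax grouping, can be illustrated with a minimal Python sketch. This is not the authors' implementation. The function names (`masked_softmax`, `group_by_frequency`, `grouped_softmax`), the toy hierarchy, the sample counts and the count threshold are all assumptions made for illustration; the abstract does not state how candidate labels are tabulated or how groups are formed.

```python
import math
from collections import defaultdict


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def masked_softmax(logits, allowed):
    """Dynamic label table (assumed form): normalize only over the labels
    that are valid at this decoding step, e.g. the children of the label
    decoded at the previous step. All other labels get probability 0."""
    idx = sorted(allowed)
    probs = softmax([logits[i] for i in idx])
    out = [0.0] * len(logits)
    for i, p in zip(idx, probs):
        out[i] = p
    return out


def group_by_frequency(counts, boundaries):
    """Assign each label to a group by its training sample count.
    `boundaries` are ascending thresholds (hypothetical values); a label's
    group id is the number of thresholds its count reaches."""
    groups = defaultdict(list)
    for label, n in enumerate(counts):
        group_id = sum(n >= b for b in boundaries)
        groups[group_id].append(label)
    return dict(groups)


def grouped_softmax(logits, groups):
    """Softmax grouping (simplified): apply softmax separately inside each
    group, so rare labels compete only with labels of similar frequency."""
    out = [0.0] * len(logits)
    for members in groups.values():
        probs = softmax([logits[i] for i in members])
        for i, p in zip(members, probs):
            out[i] = p
    return out


# Toy hierarchy (hypothetical): parent label -> child labels; -1 is the root.
children = {-1: [0, 1], 0: [2, 3], 1: [4]}
logits = [1.0, 2.0, 3.0, 0.5, 0.1]

# After decoding label 0, only its children (2 and 3) remain decodable.
step_probs = masked_softmax(logits, children[0])

# Head labels (0, 1) and tail labels (2, 3, 4) are normalized separately.
groups = group_by_frequency([1000, 900, 5, 3, 7], boundaries=[100])
group_probs = grouped_softmax(logits, groups)
```

In this sketch, masking guarantees that a decoded child always belongs to the previously decoded parent, which is one way to read the label consistency goal. Per-group normalization keeps high-frequency labels from dominating the probability mass assigned to low-frequency ones.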

Key words: swift response to public complaints, intelligent order dispatching, hierarchical multi-label classification, prior knowledge, long-tailed distribution, encoder-decoder

摘要:

接诉即办是实现社会治理智能化、提高人民满意度的重要举措,其中精准分析民众诉求智能匹配工单处理部门,实现诉求的快速响应、高效办理尤为关键;然而,民众诉求数据中的诉求描述不清晰、类别混淆且比例失衡会导致诉求类别分析困难,影响了智能派单的效率与准确性。针对上述问题,提出编解码器结构的诉求层次多标签分类模型(HMCHotline)。首先,在文本编码器中引入诉求领域中的细粒度关键词先验知识以抑制噪声干扰,并融合诉求的时空信息提高语义特征的判别力;其次,利用标签层次结构生成具有层次与语义感知的标签嵌入,并构建基于Transformer模型的标签解码器,利用诉求的语义特征和标签嵌入进行标签解码;同时,在标签的层级依赖关系基础上引入动态标签表策略限制标签的解码范围,以解决标签不一致问题;最后,采用Softmax分组策略将样本数量相近的标签类别分为同组进行Softmax操作,从而缓解由标签长尾分布导致的分类准确率低的问题。在Hotline、RCV1 (Reuters Corpus Volume I)-v2和WOS (Web Of Science)数据集上的实验结果表明,相较于层次感知的标签语义匹配网络(HiMatch),所提模型的Micro-F1分别提高了1.65、2.06和0.43个百分点,验证了模型的有效性。

关键词: 接诉即办, 智能派单, 层次多标签分类, 先验知识, 长尾分布, 编解码器

CLC Number: