Journal of Computer Applications (《计算机应用》), 2025, Vol. 45, Issue (4): 1190-1198. DOI: 10.11772/j.issn.1001-9081.2024030331

• Artificial intelligence •

Speaker identification framework for novels integrating narrative units and reliable labels

LIU Tianyu (刘天宇), TAO Ye (陶冶), LU Chaofeng (鲁超峰), LIU Jiawang (刘家旺)

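  1. College of Information Science and Technology, Qingdao University of Science and Technology, Qingdao, Shandong 266061, China
  • Received: 2024-03-25 Revised: 2024-04-29 Accepted: 2024-05-06 Online: 2024-06-04 Published: 2025-04-10
  • Corresponding author: TAO Ye (陶冶)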
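  • About the authors: LIU Tianyu (born 1999), female, native of Jining, Shandong, M.S. candidate. Her research interests include natural language processing and speaker identification.
    LU Chaofeng (born 1999), male, native of Heze, Shandong, M.S. candidate. His research interests include speech synthesis and natural language processing.
    LIU Jiawang (born 1999), male, native of Jining, Shandong, M.S. candidate. His research interests include speech synthesis, natural language processing, question answering systems and knowledge graphs.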
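  • Supported by:
    National Key Research and Development Program of China (2023YFF0612100); Key Technology Research and Industrialization Demonstration Project in Qingdao (24-1-2-qljh-19-gx)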

Novel speaker identification framework based on narrative unit and reliable label

Tianyu LIU, Ye TAO, Chaofeng LU, Jiawang LIU

  1. College of Information Science and Technology, Qingdao University of Science and Technology, Qingdao, Shandong 266061, China
  • Received: 2024-03-25 Revised: 2024-04-29 Accepted: 2024-05-06 Online: 2024-06-04 Published: 2025-04-10
  • Contact: Ye TAO
  • About the authors: LIU Tianyu, born in 1999, M.S. candidate. Her research interests include natural language processing and speaker identification.
    LU Chaofeng, born in 1999, M.S. candidate. His research interests include speech synthesis and natural language processing.
    LIU Jiawang, born in 1999, M.S. candidate. His research interests include speech synthesis, natural language processing, question answering systems and knowledge graphs.
  • Supported by:
    National Key Research and Development Program of China (2023YFF0612102); Key Technology Research and Industrialization Demonstration Project in Qingdao (24-1-2-qljh-19-gx)

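Abstract:

Speaker identification (SI) in novels aims to determine who utters a quotation from the context in which the quotation appears. The task greatly helps in assigning suitable voices to different characters when producing audiobooks. Existing methods, however, mostly choose the quotation context with a fixed window size; this is inflexible and yields redundant passages, so the model struggles to capture the information that is actually useful. Moreover, because the number of quotations and the writing style differ widely between novels, a small set of annotated samples cannot make the model generalize sufficiently, while annotating datasets is costly. To solve these problems, a speaker identification framework for novels that integrates narrative units and reliable labels is proposed. First, a Narrative Unit-based Context Selection (NUCS) method selects a context of appropriate length, so that the model concentrates on the passages most closely tied to quotation attribution. Second, a Speaker Scoring Network (SSN) is built that takes the generated context as input. In addition, self-training is introduced and a Reliable Pseudo Label Selection (RPLS) algorithm is designed, which partly offsets the scarcity of labeled samples by filtering for more reliable, higher-quality pseudo-labeled samples. Finally, a Chinese Novel Speaker Identification corpus (CNSI) covering 11 Chinese novels is built and annotated. To evaluate the framework, experiments are carried out on 2 public datasets and the self-built dataset; the results show that the framework outperforms methods such as CSN (Candidate Scoring Network), E2E_SI and ChatGPT-3.5.

Key words: speaker identification, self-training, pseudo label, pre-training, context, novel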

Abstract:

Speaker Identification (SI) in novels aims to determine the speaker of a quotation from its surrounding context. This task is valuable in audiobook production, where each character must be assigned an appropriate voice. However, existing methods mainly select the context of a quotation with a fixed window size, which is inflexible and introduces redundant passages, making it difficult for the model to capture the truly useful information. In addition, because the number of quotations and the writing style vary greatly across novels, a small number of labeled samples is not enough for the model to generalize well, and dataset annotation is expensive. To address these problems, a speaker identification framework for novels that integrates narrative units and reliable labels was proposed. Firstly, a Narrative Unit-based Context Selection (NUCS) method was used to select a context of suitable length, so that the model focuses on the passages most closely related to quotation attribution. Secondly, a Speaker Scoring Network (SSN) was constructed, taking the generated context as input. Thirdly, self-training was introduced, and a Reliable Pseudo Label Selection (RPLS) algorithm was designed to screen out more reliable, higher-quality pseudo-labeled samples, compensating to some extent for the shortage of labeled samples. Finally, a Chinese Novel Speaker Identification (CNSI) corpus containing 11 Chinese novels was built and annotated. To evaluate the proposed framework, experiments were conducted on two public datasets and the self-built dataset. The results show that the proposed framework outperforms methods such as CSN (Candidate Scoring Network), E2E_SI and ChatGPT-3.5.

Key words: Speaker Identification (SI), self-training, pseudo label, pre-training, context, novel

CLC number: