Novel speaker identification framework based on narrative unit and reliable label

Journal of Computer Applications, 2025, Vol. 45, Issue (4): 1190-1198. DOI: 10.11772/j.issn.1001-9081.2024030331

• Artificial intelligence •

Tianyu LIU, Ye TAO, Chaofeng LU, Jiawang LIU

College of Information Science and Technology, Qingdao University of Science and Technology, Qingdao Shandong 266061, China

Received: 2024-03-25 Revised: 2024-04-29 Accepted: 2024-05-06 Online: 2024-06-04 Published: 2025-04-10
Contact: Ye TAO (陶冶)
About the authors:
LIU Tianyu (刘天宇), born in 1999, from Jining, Shandong, M. S. candidate. Her research interests include natural language processing and speaker identification.
LU Chaofeng (鲁超峰), born in 1999, from Heze, Shandong, M. S. candidate. His research interests include speech synthesis and natural language processing.
LIU Jiawang (刘家旺), born in 1999, from Jining, Shandong, M. S. candidate. His research interests include speech synthesis, natural language processing, question answering systems and knowledge graphs.
Supported by:
National Key Research and Development Program of China (2023YFF0612102; listed as 2023YFF0612100 in the Chinese version); Key Technology Research and Industrialization Demonstration Project in Qingdao (24-1-2-qljh-19-gx)

Chinese title: 融合叙事单元和可靠标签的小说说话人识别框架

Abstract:

Speaker Identification (SI) in novels aims to determine the speaker of a quotation from its context. The task is of great help in assigning appropriate voices to different characters during audiobook production. However, existing methods mainly select the context of a quotation with a fixed window size, which is inflexible and produces redundant segments, making it hard for the model to capture the truly useful information. In addition, because the number of quotations and the writing style differ greatly between novels, a small number of labeled samples cannot make the model generalize fully, and labeling datasets is expensive. To solve these problems, a speaker identification framework for novels that integrates narrative units and reliable labels was proposed. Firstly, a Narrative Unit-based Context Selection (NUCS) method was used to select a context of suitable length, so that the model focuses on the segment most relevant to quotation attribution. Secondly, a Speaker Scoring Network (SSN) was constructed, taking the generated context as input. In addition, self-training was introduced, and a Reliable Pseudo Label Selection (RPLS) algorithm was designed to screen out more reliable, higher-quality pseudo-labeled samples, compensating to some extent for the shortage of labeled samples. Finally, a Chinese Novel Speaker Identification corpus (CNSI) containing 11 Chinese novels was built and labeled. To evaluate the proposed framework, experiments were conducted on two public datasets and the self-built dataset. The results show that the proposed framework outperforms methods such as CSN (Candidate Scoring Network), E2E_SI and ChatGPT-3.5.

Key words: Speaker Identification (SI), self-training, pseudo label, pre-training, context, novel
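
The abstract names self-training with reliable pseudo-label selection but does not state the selection criterion. The sketch below is therefore only a generic illustration of confidence-based pseudo-label filtering, not the authors' RPLS algorithm. The function name `select_reliable`, the two thresholds and the top-two margin rule are all assumptions introduced here.

```python
from typing import List, Sequence, Tuple


def select_reliable(
    predictions: List[Sequence[float]],
    min_confidence: float = 0.9,  # hypothetical threshold, not taken from the paper
    min_margin: float = 0.3,      # hypothetical threshold, not taken from the paper
) -> List[Tuple[int, int]]:
    """Keep pseudo-labels whose top score is high and clearly ahead of the runner-up.

    Each entry of `predictions` holds one score per candidate speaker for an
    unlabeled quotation. Returns (sample_index, predicted_candidate_index) pairs.
    """
    selected = []
    for i, scores in enumerate(predictions):
        if not scores:
            continue  # no candidates, nothing to label
        # Candidate indices ordered from highest to lowest score.
        ranked = sorted(range(len(scores)), key=lambda k: scores[k], reverse=True)
        top = scores[ranked[0]]
        runner_up = scores[ranked[1]] if len(ranked) > 1 else 0.0
        if top >= min_confidence and top - runner_up >= min_margin:
            selected.append((i, ranked[0]))
    return selected


if __name__ == "__main__":
    unlabeled_scores = [[0.95, 0.03, 0.02], [0.55, 0.45], [0.92, 0.08], [1.0]]
    print(select_reliable(unlabeled_scores))  # [(0, 0), (2, 0), (3, 0)]
```

In a self-training loop of this kind, the selected pairs would be added to the labeled set before the model is retrained, while the rejected samples stay unlabeled for the next round.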

CLC Number:

TP391.1

Tianyu LIU, Ye TAO, Chaofeng LU, Jiawang LIU. Novel speaker identification framework based on narrative unit and reliable label[J]. Journal of Computer Applications, 2025, 45(4): 1190-1198.

刘天宇, 陶冶, 鲁超峰, 刘家旺. 融合叙事单元和可靠标签的小说说话人识别框架[J]. 计算机应用, 2025, 45(4): 1190-1198.

Method	Pre-trained model	Rule usage	WP	JY	CNSI
Random	None	None	37.6	33.7	42.7
Rule	None	Rules only	72.1	86.6	78.4
SVM	None	Rule-based features	61.0	94.5	—
MLP	None	Rule-based features	70.5	95.6	—
CSN	BERT-base	Post-processing	82.5	97.4	89.1
ChatGPT-3.5-turbo	GPT-3.5	None	83.8	97.9	90.7
E2E_SI	RoBERTa-wwm-large	None	80.9	98.3	86.1
SSN	BERT-wwm-ext	None	83.5	98.1	90.9
SSN	BERT-base	None	84.3	98.3	91.3

[m, n]	Average context length/characters	Average number of candidates	Accuracy/%
[-5, +5]	268	3.28	89.8
[-10, +10]	552	4.32	89.2
NUCS	176	2.86	91.3
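
The [m, n] rows above presumably correspond to the fixed-window baselines that the abstract contrasts with NUCS. As a minimal sketch of that baseline, and not of NUCS itself, the function below takes the units from m before the quotation to n after it. Treating a unit as a sentence, and the name `fixed_window_context`, are assumptions made for illustration.

```python
from typing import List


def fixed_window_context(
    units: List[str], quote_index: int, m: int = -5, n: int = 5
) -> List[str]:
    """Return units[quote_index + m .. quote_index + n], clipped to the text bounds.

    `units` is the text split into units (sentences here, an assumption) and
    `quote_index` is the position of the unit that contains the quotation.
    """
    if not 0 <= quote_index < len(units):
        raise IndexError("quote_index out of range")
    if m > 0 or n < 0:
        raise ValueError("expected m <= 0 <= n")
    start = max(0, quote_index + m)
    end = min(len(units), quote_index + n + 1)  # +1 because slicing excludes the end
    return units[start:end]


if __name__ == "__main__":
    sentences = [f"s{i}" for i in range(20)]
    window = fixed_window_context(sentences, 10)
    print(len(window), window[0], window[-1])  # 11 s5 s15
```

A window like this always returns the same number of units away from the text boundaries, whether or not they bear on the quotation. That fits the longer average contexts and larger candidate counts in the two fixed-window rows.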

Accuracy/% under different D_L
Method	D_L=1 000	D_L=2 000	D_L=5 000	D_L=10 000	D_L=20 000
SSN	85.9	86.7	88.3	89.8	91.3
SSN+ST	86.1	87.2	89.1	90.4	91.4
SSN+ST+RPLS	86.5	87.5	89.7	90.8	91.7
