Speech deception detection algorithm based on denoising autoencoder and long short-term memory network

Journal of Computer Applications, 2020, Vol. 40, Issue (2): 589-594. DOI: 10.11772/j.issn.1001-9081.2019071183

• Frontier, interdisciplinary and comprehensive applications •

Speech deception detection algorithm based on denoising autoencoder and long short-term memory network

Hongliang FU, Peizhi LEI

School of Information Science and Engineering, Henan University of Technology, Zhengzhou 450001, China

Received: 2019-07-08 Revised: 2019-09-01 Accepted: 2019-09-02 Online: 2019-09-19 Published: 2020-02-10
Contact: Peizhi LEI
About author: FU Hongliang, born in 1965, Ph.D., professor. His research interests include modern signal processing.
Supported by: the National Natural Science Foundation of China (61601170)

Abstract:

To further improve the performance of speech deception detection, a speech deception detection algorithm based on a Denoising AutoEncoder (DAE) and a Long Short-Term Memory (LSTM) network was proposed. First, a parallel structure of an optimized DAE and LSTM, named PDL (Parallel connection of DAE and LSTM), was constructed. Then, hand-crafted features were extracted from the speech and fed into the DAE to obtain more robust features. Simultaneously, the Mel spectra extracted after windowing and framing the speech were fed into the LSTM frame by frame to learn frame-level deep features. Finally, the two types of features were fused through a fully connected layer and batch normalization, and a softmax classifier was used for deception recognition. Experimental results on the CSC (Columbia-SRI-Colorado) corpus and a self-built corpus show that classification with the fused features achieves recognition accuracies of 65.18% and 68.04% respectively, up to 5.56 and 7.22 percentage points higher than those of the other compared algorithms, indicating that the proposed algorithm can effectively improve the accuracy of deception recognition.
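
The fusion step described in the abstract (concatenation of the two branch outputs, a fully connected layer, batch normalization, then softmax) can be illustrated with a small sketch. This is not the authors' implementation: the toy dimensions, the random weights, and the helper names (`dense`, `batch_norm`, `softmax`, `fuse_and_classify`, `random_matrix`) are assumptions made for illustration, and any activation functions or learned batch-normalization parameters are omitted because the abstract does not specify them.

```python
import math
import random


def dense(x, weights, bias):
    """Fully connected layer: one output per row of `weights`."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def batch_norm(batch, eps=1e-5):
    """Normalize each feature to zero mean and unit variance across the batch.
    The learned scale and shift are omitted in this sketch."""
    n, dims = len(batch), len(batch[0])
    means = [sum(x[d] for x in batch) / n for d in range(dims)]
    variances = [sum((x[d] - means[d]) ** 2 for x in batch) / n for d in range(dims)]
    return [[(x[d] - means[d]) / math.sqrt(variances[d] + eps) for d in range(dims)]
            for x in batch]


def softmax(logits):
    """Numerically stable softmax over one vector of logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_and_classify(dae_feats, lstm_feats, w_fc, b_fc, w_out, b_out):
    """Concatenate both branch outputs per utterance, then FC, batch norm, softmax."""
    fused = [d + s for d, s in zip(dae_feats, lstm_feats)]  # list concatenation
    hidden = batch_norm([dense(x, w_fc, b_fc) for x in fused])
    return [softmax(dense(h, w_out, b_out)) for h in hidden]


def random_matrix(rng, rows, cols, scale=0.5):
    """Hypothetical stand-in for trained weights or branch outputs."""
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


# Toy sizes (assumed); the paper's model uses much larger layers.
rng = random.Random(0)
batch_size, dae_dim, lstm_dim, fc_dim, n_classes = 4, 3, 3, 5, 2
dae_feats = random_matrix(rng, batch_size, dae_dim, 1.0)
lstm_feats = random_matrix(rng, batch_size, lstm_dim, 1.0)
w_fc = random_matrix(rng, fc_dim, dae_dim + lstm_dim)
w_out = random_matrix(rng, n_classes, fc_dim)

probs = fuse_and_classify(dae_feats, lstm_feats, w_fc, [0.0] * fc_dim,
                          w_out, [0.0] * n_classes)
for p in probs:
    print([round(v, 3) for v in p])  # one (truth, deception) probability pair per utterance
```

Each row of `probs` sums to 1, so the larger of the two entries gives the predicted class for that utterance.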

Key words: Denoising AutoEncoder (DAE), Long Short-Term Memory (LSTM) network, speech feature, feature fusion, deception detection

CLC number: TP391.41

Hongliang FU, Peizhi LEI. Speech deception detection algorithm based on denoising autoencoder and long short-term memory network[J]. Journal of Computer Applications, 2020, 40(2): 589-594.

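Figures and tables (13)

Tab. 1 Feature set of the 2009 international speech emotion recognition challenge

Basic feature set	Features and functionals included
LLD ($16 \times 2$)	Root-mean-square energy, fundamental frequency, zero-crossing rate, harmonics-to-noise ratio, Mel-frequency cepstral coefficients 1-12
HLSF (12)	Standard deviation, kurtosis, skewness, mean, maximum and minimum values, relative positions, range, extreme values, slope, offset, mean squared error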
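
The counts in Tab. 1 determine the length of the hand-crafted feature vector. As a quick check, assuming the factor 2 stands for each low-level descriptor plus its first-order delta (as in the INTERSPEECH 2009 challenge set) and that every contour is summarized by all 12 functionals, the arithmetic works out to the 384 inputs listed for the DAE in Tab. 3. The descriptor names below are illustrative labels, not identifiers from any toolkit.

```python
# 16 low-level descriptors (LLDs) from Tab. 1; names are illustrative only.
lld_names = ["rms_energy", "f0", "zcr", "hnr"] + [f"mfcc_{i}" for i in range(1, 13)]

n_lld = len(lld_names)          # 16 descriptors
n_contours = n_lld * 2          # each LLD plus its delta (assumed meaning of "x 2")
n_functionals = 12              # statistical functionals (HLSF) per contour

feature_dim = n_contours * n_functionals
print(n_lld, n_contours, feature_dim)  # 16 32 384
```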

Fig. 1 Mel spectrograms of truthful and deceptive speech

Fig. 2 Denoising autoencoder

Fig. 3 Structure of LSTM

Fig. 4 Overall framework of the proposed algorithm

Fig. 5 Extracting frame-level features with LSTM

Tab. 2 Number of players in games

Game	Male	Female	Total
Werewolf game	23	16	39
Killer game	40	24	64

Tab. 3 Model parameters

Network	Layer	Number of units
DAE	Input	384
	Encoder_1	512
	Encoder_2	1024
	Decoder_1	512
	Decoder_2	384
LSTM	Input	64
	Hidden	1024
	Average	1024
Total output		2048
Fully connected layer		1024
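
The layer sizes in Tab. 3 can be checked for consistency with a few lines of arithmetic. The sketch below assumes that the DAE feature passed to the fusion stage is the 1024-unit Encoder_2 output and that it is concatenated with the 1024-unit time-averaged LSTM output, which is what the 2048-unit total output suggests. The parameter counts assume plain dense layers with biases and a standard single-layer LSTM with one bias vector per gate; the table does not state these details, and some frameworks count LSTM biases differently.

```python
# Layer widths copied from Tab. 3.
dae_layers = [384, 512, 1024, 512, 384]   # input, Encoder_1, Encoder_2, Decoder_1, Decoder_2
lstm_input, lstm_hidden = 64, 1024
fc_units = 1024


def dense_params(n_in, n_out):
    """Weights plus biases of one fully connected layer."""
    return n_in * n_out + n_out


def lstm_params(n_in, n_hidden):
    """Four gates, each with input weights, recurrent weights and one bias vector."""
    return 4 * (n_hidden * (n_in + n_hidden) + n_hidden)


dae_code_dim = dae_layers[2]               # assumed fusion input from the DAE branch
fused_dim = dae_code_dim + lstm_hidden     # concatenated feature length

dae_total = sum(dense_params(a, b) for a, b in zip(dae_layers, dae_layers[1:]))
lstm_total = lstm_params(lstm_input, lstm_hidden)
fc_total = dense_params(fused_dim, fc_units)

print(fused_dim)                           # 2048, matching "Total output"
print(dae_total, lstm_total, fc_total)     # 1444224 4460544 2098176
```

Under these assumptions the LSTM branch holds roughly three times as many parameters as the DAE branch.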

Tab. 4 Recognition accuracy of different models (%)

Corpus	Model	WA	UA
CSC	PDL-DAE	62.22	58.46
	PDL-LSTM	63.51	59.98
	PDL	65.18	62.56
Killer	PDL-DAE	62.88	59.74
	PDL-LSTM	64.94	62.11
	PDL	68.04	65.35
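
Tab. 4 reports two accuracy figures per model without defining them on this page. A common convention in speech emotion and deception recognition, assumed here, is that WA (weighted accuracy) is the overall fraction of correctly classified utterances and UA (unweighted accuracy) is the mean of the per-class recalls. The sketch below computes both; the labels are hypothetical and are not taken from either corpus.

```python
def weighted_accuracy(y_true, y_pred):
    """Overall accuracy: correct predictions divided by all utterances."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def unweighted_accuracy(y_true, y_pred):
    """Mean of per-class recalls, so every class counts equally."""
    recalls = []
    for c in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == c]
        recalls.append(sum(y_pred[i] == c for i in idx) / len(idx))
    return sum(recalls) / len(recalls)


# Hypothetical example: 6 truthful (0) and 4 deceptive (1) utterances.
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 0, 0, 1, 1, 1, 0, 0]

wa = weighted_accuracy(y_true, y_pred)     # 7 of 10 correct
ua = unweighted_accuracy(y_true, y_pred)   # (5/6 + 2/4) / 2
print(round(wa, 4), round(ua, 4))          # 0.7 0.6667
```

With imbalanced classes, UA falls below WA whenever the minority class is recognized less reliably, which is the pattern seen in every row of Tab. 4.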

Fig. 6 Convergence curves on different corpora

Tab. 5 T-test of test results

Model pair	CSC	Killer
(PDL, DAE)	$< 0.001$	$< 0.001$
(PDL, LSTM)	$< 0.001$	$< 0.001$

Tab. 6 Recognition accuracy with and without the DAE (%)

Corpus	Processing method	WA	UA
CSC	Direct fusion	63.89	60.08
CSC	Proposed algorithm	65.18	62.56
Killer	Direct fusion	65.97	62.46
Killer	Proposed algorithm	68.04	65.35

Tab. 7 Comparison of recognition accuracy and per-utterance recognition time for different deception detection methods

Corpus	Method	WA/%	UA/%	Recognition time per utterance/s
CSC	SVM	59.62	53.20	$0.629 \times 10^{-3}$
	DNN	60.79	57.08	$0.351 \times 10^{-3}$
	SAE	62.27	57.86	$0.370 \times 10^{-3}$
	DBN-ELM	62.58	59.21	$0.443 \times 10^{-3}$
	CNN	63.13	60.03	$0.237 \times 10^{-2}$
	Proposed algorithm	65.18	62.56	$0.785 \times 10^{-2}$
Killer	SVM	60.82	55.68	$0.268 \times 10^{-2}$
	DNN	61.45	58.13	$0.164 \times 10^{-2}$
	SAE	61.89	59.25	$0.135 \times 10^{-2}$
	DBN-ELM	63.40	61.03	$0.144 \times 10^{-2}$
	CNN	64.02	61.56	$0.690 \times 10^{-2}$
	Proposed algorithm	68.04	65.35	$0.197 \times 10^{-1}$
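
The improvements quoted in the abstract can be reproduced from the WA column of Tab. 7. The short check below copies those values and computes the gain of the proposed algorithm over each baseline in percentage points; the dictionary layout and the function name `gains` are only illustrative.

```python
# WA values (%) copied from Tab. 7.
wa = {
    "CSC": {"SVM": 59.62, "DNN": 60.79, "SAE": 62.27,
            "DBN-ELM": 62.58, "CNN": 63.13, "Proposed": 65.18},
    "Killer": {"SVM": 60.82, "DNN": 61.45, "SAE": 61.89,
               "DBN-ELM": 63.40, "CNN": 64.02, "Proposed": 68.04},
}


def gains(scores, ours="Proposed"):
    """Percentage-point gain of the proposed algorithm over each baseline."""
    return {m: round(scores[ours] - v, 2) for m, v in scores.items() if m != ours}


max_gain = {corpus: max(gains(s).values()) for corpus, s in wa.items()}
min_gain = {corpus: min(gains(s).values()) for corpus, s in wa.items()}

print(max_gain)  # {'CSC': 5.56, 'Killer': 7.22}, both over the SVM baseline
print(min_gain)  # {'CSC': 2.05, 'Killer': 4.02}, both over the CNN baseline
```

The largest gains match the 5.56 and 7.22 figures in the abstract, which confirms that they are absolute differences in WA rather than relative improvements.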
