Journal of Computer Applications, 2020, Vol. 40, Issue (2): 589-594. DOI: 10.11772/j.issn.1001-9081.2019071183
Section: Frontier & interdisciplinary applications
Received: 2019-07-08
Revised: 2019-09-01
Accepted: 2019-09-02
Online: 2019-09-19
Published: 2020-02-10
Contact: Peizhi LEI
About author: FU Hongliang (born 1965), male, from Zhengzhou, Henan; Ph. D., professor. His research interests include modern signal processing.
Hongliang FU, Peizhi LEI. Speech deception detection algorithm based on denoising autoencoder and long short-term memory network[J]. Journal of Computer Applications, 2020, 40(2): 589-594.
URL: http://www.joca.cn/EN/10.11772/j.issn.1001-9081.2019071183
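Tab. 1 Feature set of the INTERSPEECH 2009 Emotion Challenge

Basic feature set | Features and functionals included |
---|---|
LLD (low-level descriptors) | root-mean-square energy, fundamental frequency (F0), zero-crossing rate, harmonics-to-noise ratio, Mel-frequency cepstral coefficients (MFCC) 1-12 |
HLSF (12) | standard deviation, kurtosis, skewness, mean, maximum and minimum values, relative position, range, extreme values, slope, offset, mean squared error |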
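The 12 HLSF functionals in Tab. 1 turn each variable-length LLD contour into a fixed number of values. In the INTERSPEECH 2009 set each of the 16 LLDs is also paired with its first-order delta, so 16 × 2 × 12 = 384 features, which matches the DAE input size in Tab. 3. The sketch below computes one plausible reading of the 12 functionals for a single contour in plain Python. The function name, the dictionary keys, the scaling of the relative positions to [0, 1] and the use of non-excess kurtosis are assumptions, not openSMILE's exact definitions.

```python
import math
from typing import Dict, List


def functionals(contour: List[float]) -> Dict[str, float]:
    """Map one LLD contour (one value per frame) to 12 statistics.

    Hypothetical helper: key names and normalizations are illustrative
    assumptions, not the openSMILE definitions.
    """
    n = len(contour)
    if n < 2:
        raise ValueError("contour needs at least two frames")
    mean = sum(contour) / n
    var = sum((x - mean) ** 2 for x in contour) / n
    std = math.sqrt(var)
    hi, lo = max(contour), min(contour)

    # Least-squares line x_t ~ slope * t + offset over the frame index t.
    t_mean = (n - 1) / 2
    s_tt = sum((t - t_mean) ** 2 for t in range(n))
    slope = sum((t - t_mean) * (x - mean) for t, x in enumerate(contour)) / s_tt
    offset = mean - slope * t_mean
    mse = sum((x - (slope * t + offset)) ** 2 for t, x in enumerate(contour)) / n

    # Standardized third and fourth moments (0 for a constant contour).
    skewness = sum((x - mean) ** 3 for x in contour) / n / std ** 3 if std > 0 else 0.0
    kurtosis = sum((x - mean) ** 4 for x in contour) / n / var ** 2 if var > 0 else 0.0

    return {
        "max": hi,
        "min": lo,
        "range": hi - lo,
        # Relative positions of the extremes, scaled to [0, 1] (assumption).
        "max_rel_pos": contour.index(hi) / (n - 1),
        "min_rel_pos": contour.index(lo) / (n - 1),
        "mean": mean,
        "slope": slope,
        "offset": offset,
        "mse": mse,
        "std": std,
        "skewness": skewness,
        "kurtosis": kurtosis,
    }
```

Applying such a function to each of the 32 contours of an utterance (16 LLDs and their deltas) and concatenating the results yields a 384-dimensional utterance-level vector.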
Tab. 2 Number of players in games

Game | Male | Female | Total |
---|---|---|---|
Werewolf game | 23 | 16 | 39 |
Killer game | 40 | 24 | 64 |
Tab. 3 Model parameters

Network | Layer | Number of units |
---|---|---|
DAE | Input | 384 |
 | Encoder_1 | 512 |
 | Encoder_2 | 1024 |
 | Decoder_1 | 512 |
 | Decoder_2 | 384 |
LSTM | Input | 64 |
 | Hidden layer | 1024 |
 | Average | 1024 |
 | Total output | 2048 |
 | Fully connected layer | 1024 |
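The unit counts in Tab. 3 suggest how the two branches meet: the 1024-unit Encoder_2 code of the DAE and the 1024-unit time-averaged LSTM state add up to the 2048-unit total output, which feeds a 1024-unit fully connected layer. The sketch below shows only that fusion step, under the assumption that the total output is a plain concatenation and that the fully connected layer uses ReLU. The helper names are hypothetical and training is omitted.

```python
from typing import List

Vector = List[float]

# Unit counts taken from Tab. 3.
DAE_CODE_DIM = 1024     # Encoder_2 output
LSTM_HIDDEN_DIM = 1024  # LSTM hidden layer and "Average" row
FUSED_DIM = 2048        # "Total output" row
FC_DIM = 1024           # fully connected layer


def mean_pool(frames: List[Vector]) -> Vector:
    """Average per-frame LSTM hidden states over time (the "Average" row)."""
    if not frames:
        raise ValueError("need at least one frame")
    return [sum(column) / len(frames) for column in zip(*frames)]


def fuse(dae_code: Vector, lstm_frames: List[Vector]) -> Vector:
    """Assumed feature-level fusion: concatenate DAE code and pooled LSTM state."""
    return dae_code + mean_pool(lstm_frames)


def dense_relu(x: Vector, weights: List[Vector], bias: Vector) -> Vector:
    """Fully connected layer with ReLU (the activation is an assumption).

    weights[j] holds the input weights of output unit j, so a layer of
    FC_DIM units on the fused vector needs FC_DIM rows of FUSED_DIM weights.
    """
    return [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weights, bias)]
```

Concatenation keeps the two representations side by side and leaves it to the fully connected layer to learn how to weight the static (DAE) and temporal (LSTM) evidence.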
Tab. 4 Recognition accuracy of different models

Database | Model | WA/% | UA/% |
---|---|---|---|
CSC | PDL-DAE | 62.22 | 58.46 |
 | PDL-LSTM | 63.51 | 59.98 |
 | PDL | 65.18 | 62.56 |
Killer | PDL-DAE | 62.88 | 59.74 |
 | PDL-LSTM | 64.94 | 62.11 |
 | PDL | 68.04 | 65.35 |
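The tables report two accuracies. Assuming the definitions common in speech emotion recognition, WA (weighted accuracy) is the fraction of all utterances classified correctly, while UA (unweighted accuracy) averages the per-class recalls so that a majority class cannot dominate it; this is consistent with UA being lower than WA in every row. A minimal sketch, with a hypothetical function name:

```python
from collections import Counter
from typing import Hashable, Sequence, Tuple


def wa_ua(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> Tuple[float, float]:
    """Return (weighted accuracy, unweighted accuracy) for one test set."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label sequences must be non-empty and equal length")
    pairs = list(zip(y_true, y_pred))
    # WA: overall accuracy, i.e. each utterance counts once.
    wa = sum(t == p for t, p in pairs) / len(pairs)
    # UA: mean of per-class recalls, i.e. each class counts once.
    totals = Counter(y_true)
    hits = Counter(t for t, p in pairs if t == p)
    ua = sum(hits[c] / totals[c] for c in totals) / len(totals)
    return wa, ua
```

For example, with 8 truthful and 2 deceptive utterances, a model that gets every truthful one and one deceptive one right has WA = 0.9 but UA = (1.0 + 0.5) / 2 = 0.75.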
Tab. 5 t-test results

Model pair | CSC | Killer |
---|---|---|
(PDL, DAE) | | |
(PDL, LSTM) | | |
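The numeric entries of Tab. 5 are not recoverable here, and the source does not state which t-test variant was applied. Assuming a paired t-test on accuracies that two models (for example PDL and PDL-DAE) obtain on the same repeated runs or folds, the statistic is the mean per-run difference divided by its standard error, with n − 1 degrees of freedom. The function name is hypothetical.

```python
import math
from typing import Sequence, Tuple


def paired_t(acc_a: Sequence[float], acc_b: Sequence[float]) -> Tuple[float, int]:
    """Paired t statistic and degrees of freedom for matched accuracy lists."""
    if len(acc_a) != len(acc_b) or len(acc_a) < 2:
        raise ValueError("need two equal-length lists with at least two runs")
    diffs = [a - b for a, b in zip(acc_a, acc_b)]
    n = len(diffs)
    mean_d = sum(diffs) / n
    # Sample standard deviation of the differences (n - 1 denominator).
    sd_d = math.sqrt(sum((d - mean_d) ** 2 for d in diffs) / (n - 1))
    if sd_d == 0:
        raise ValueError("differences are constant; t is undefined")
    return mean_d / (sd_d / math.sqrt(n)), n - 1
```

`scipy.stats.ttest_rel` computes the same statistic and also returns a two-sided p-value, which the standard library cannot provide without a t-distribution CDF.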
Tab. 6 Recognition accuracy with and without DAE

Database | Processing method | WA/% | UA/% |
---|---|---|---|
CSC | Direct fusion | 63.89 | 60.08 |
 | Proposed algorithm | 65.18 | 62.56 |
Killer | Direct fusion | 65.97 | 62.46 |
 | Proposed algorithm | 68.04 | 65.35 |
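Tab. 6 isolates the contribution of the DAE by comparing it with direct fusion. A denoising autoencoder in the sense of Vincent et al. [18] is trained to reconstruct the clean input from a deliberately corrupted copy. The sketch below shows that training criterion with masking noise, which is one of the corruption types Vincent et al. describe; the corruption the authors actually used, the drop probability and the helper names are assumptions.

```python
import random
from typing import Callable, List

Vector = List[float]


def mask_corrupt(x: Vector, drop_prob: float, rng: random.Random) -> Vector:
    """Masking noise: set each feature to 0 independently with probability drop_prob."""
    if not 0.0 <= drop_prob < 1.0:
        raise ValueError("drop_prob must be in [0, 1)")
    return [0.0 if rng.random() < drop_prob else v for v in x]


def mse(a: Vector, b: Vector) -> float:
    """Mean squared error between two equal-length vectors."""
    if len(a) != len(b) or not a:
        raise ValueError("vectors must be non-empty and equal length")
    return sum((p - q) ** 2 for p, q in zip(a, b)) / len(a)


def denoising_loss(x: Vector,
                   reconstruct: Callable[[Vector], Vector],
                   drop_prob: float,
                   rng: random.Random) -> float:
    """DAE criterion: reconstruct from the corrupted input, score against the clean one."""
    x_tilde = mask_corrupt(x, drop_prob, rng)
    return mse(x, reconstruct(x_tilde))
```

An identity `reconstruct` with no corruption gives zero loss, but once features are masked the identity mapping is penalized for every dropped value, which is what pushes a trained DAE to exploit dependencies among the 384 input features.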
Tab. 7 Comparison of recognition accuracy and per-utterance recognition time of different deception detection methods

Database | Method | WA/% | UA/% | Recognition time per utterance/s |
---|---|---|---|---|
CSC | SVM | 59.62 | 53.20 | 0.629×10⁻³ |
 | DNN | 60.79 | 57.08 | 0.351×10⁻³ |
 | SAE | 62.27 | 57.86 | 0.370×10⁻³ |
 | DBN-ELM | 62.58 | 59.21 | 0.443×10⁻³ |
 | CNN | 63.13 | 60.03 | 0.237×10⁻² |
 | Proposed algorithm | 65.18 | 62.56 | 0.785×10⁻² |
Killer | SVM | 60.82 | 55.68 | 0.268×10⁻² |
 | DNN | 61.45 | 58.13 | 0.164×10⁻² |
 | SAE | 61.89 | 59.25 | 0.135×10⁻² |
 | DBN-ELM | 63.40 | 61.03 | 0.144×10⁻² |
 | CNN | 64.02 | 61.56 | 0.690×10⁻² |
 | Proposed algorithm | 68.04 | 65.35 | 0.197×10⁻¹ |
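The last column of Tab. 7 is an average inference cost per utterance. The source does not describe its timing protocol; one plausible way to obtain such a number, assumed here, is to run a trained model over the whole test set and divide the wall-clock time by the number of utterances. `predict` below is a hypothetical stand-in for any trained classifier.

```python
import time
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def mean_time_per_utterance(predict: Callable[[T], object],
                            utterances: Sequence[T],
                            repeats: int = 1) -> float:
    """Average wall-clock seconds per predict() call over all utterances."""
    if not utterances or repeats < 1:
        raise ValueError("need at least one utterance and one repeat")
    start = time.perf_counter()  # monotonic, high-resolution clock
    for _ in range(repeats):
        for utt in utterances:
            predict(utt)
    elapsed = time.perf_counter() - start
    return elapsed / (repeats * len(utterances))
```

Repeating the loop smooths out timer resolution and warm-up effects. Absolute figures depend heavily on hardware and batch size, so only ratios between methods timed on the same machine are meaningful.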
References:
[1] KIRCHHUEBEL C. The acoustic and temporal characteristics of deceptive speech[D]. York, North Yorkshire: University of York, 2013: 37. 10.1016/j.apergo.2012.04.016
[2] ANAGNOSTOPOULOS C N, ILIOU T, GIANNOUKOS I. Features and classifiers for emotion recognition from speech: a survey from 2000 to 2011[J]. Artificial Intelligence Review, 2015, 43(2): 155-177. 10.1007/s10462-012-9368-5
[3] EKMAN P, O'SULLIVAN M, FRIESEN W V, et al. Invited article: face, voice, and body in detecting deceit[J]. Journal of Nonverbal Behavior, 1991, 15(2): 125-135. 10.1007/bf00998267
[4] HANSEN J H L, WOMACK B D. Feature analysis and neural network-based classification of speech under stress[J]. IEEE Transactions on Speech and Audio Processing, 1996, 4(4): 307-313. 10.1109/89.506935
[5] ZHOU Y, ZHAO H, PAN X. Lie detection from speech analysis based on K-SVD deep belief network model[C]// Proceedings of the 2015 International Conference on Intelligent Computing, LNCS 9225. Cham: Springer, 2015: 189-196.
[6] SRIVASTAVA N, DUBEY S. Deception detection using artificial neural network and support vector machine[C]// Proceedings of the 2nd International Conference on Electronics, Communication and Aerospace Technology. Piscataway: IEEE, 2018: 1205-1208. 10.1109/iceca.2018.8474706
[7] SCHULLER B, STEIDL S, BATLINER A. The INTERSPEECH 2009 emotion challenge[C]// Proceedings of the 10th Annual Conference of the International Speech Communication Association. [S.l.]: ISCA, 2009: 312-315. 10.21437/interspeech.2009-103
[8] EYBEN F, WENINGER F, GROSS F, et al. Recent developments in openSMILE, the Munich open-source multimedia feature extractor[C]// Proceedings of the 21st ACM International Conference on Multimedia. New York: ACM, 2013: 835-838. 10.1145/2502081.2502224
[9] JIA W J, ZHANG Y D. Survey on theories and methods of autoencoder[J]. Computer Systems and Applications, 2018, 27(5): 1-9.
[10] CUI J F, DENG Z P, SHEN F, et al. Single channel speech separation based on non-negative matrix factorization and long short-term memory network[J]. Science Technology and Engineering, 2019, 19(12): 206-210. 10.3969/j.issn.1671-1815.2019.12.029
[11] PORIA S, PENG H, HUSSAIN A, et al. Ensemble application of convolutional neural networks and multiple kernel learning for multimodal sentiment analysis[J]. Neurocomputing, 2017, 261: 217-230. 10.1016/j.neucom.2016.09.117
[12] CHEN S, JIN Q. Multi-modal conditional attention fusion for dimensional emotion prediction[C]// Proceedings of the 24th ACM International Conference on Multimedia. New York: ACM, 2016: 571-575. 10.1145/2964284.2967286
[13] DENG J, XU X, ZHANG Z, et al. Semi-supervised autoencoders for speech emotion recognition[J]. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2018, 26(1): 31-43. 10.1109/taslp.2017.2759338
[14] YANG Z, WANG C, ZHANG Z, et al. Random Barzilai–Borwein step size for mini-batch algorithms[J]. Engineering Applications of Artificial Intelligence, 2018, 72: 124-135. 10.1016/j.engappai.2018.03.017
[15] ENOS F, BENUS S, CAUTIN R L, et al. Personality factors in human deception detection: comparing human to machine performance[C]// Proceedings of the 9th International Conference on Spoken Language Processing. [S.l.]: ISCA, 2006: 813-816. 10.21437/interspeech.2006-278
[16] HUNG H, CHITTARANJAN G. The IDIAP wolf corpus: exploring group behaviour in a competitive role-playing game[C]// Proceedings of the 18th ACM International Conference on Multimedia. New York: ACM, 2010: 879-882. 10.1145/1873951.1874102
[17] VEXLER A, YU J. To t-test or not to t-test?: a P-values-based point of view in the ROC curve framework[J]. Journal of Computational Biology, 2018, 25(6): 541-550. 10.1089/cmb.2017.0216
[18] VINCENT P, LAROCHELLE H, LAJOIE I, et al. Stacked denoising autoencoders: learning useful representations in a deep network with a local denoising criterion[J]. Journal of Machine Learning Research, 2010, 11: 3371-3408.
[19] GUO L, WANG L, DANG J, et al. A feature fusion method based on extreme learning machine for speech emotion recognition[C]// Proceedings of the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing. Piscataway: IEEE, 2018: 2666-2670. 10.1109/icassp.2018.8462219
[20] YOO H J. Deep convolution neural networks in computer vision: a review[J]. IEIE Transactions on Smart Processing and Computing, 2015, 4(1): 35-43. 10.5573/ieiespc.2015.4.1.035