Journal of Computer Applications ›› 2023, Vol. 43 ›› Issue (3): 909-915.DOI: 10.11772/j.issn.1001-9081.2022010047
• Multimedia computing and computer simulation •
Received: 2022-01-17
Revised: 2022-06-08
Accepted: 2022-06-10
Online: 2022-07-11
Published: 2023-03-10
Contact: Yukun WANG
About author: ZHANG Qiuyu, born in 1966 in Xinji, Hebei, research fellow. His research interests include network and information security, intelligent information processing, and pattern recognition.
Qiuyu ZHANG, Yukun WANG. Speech classification model based on improved Inception network[J]. Journal of Computer Applications, 2023, 43(3): 909-915.
URL: http://www.joca.cn/EN/10.11772/j.issn.1001-9081.2022010047
Layer | Type | Kernel size | Stride | Depth | Activation |
---|---|---|---|---|---|
Input | Input layer | - | - | 3 | ReLU |
Conv1 | Convolution | 3×3 | 1 | 16 | ReLU |
Improved Inception | Convolution + pooling | 1×1, 1×3, 3×3 | 2 | 16 | ReLU |
Improved Inception | Convolution + pooling | 1×1, 1×3, 3×3 | 1 | 64 | ReLU |
Improved Inception | Convolution + pooling | 1×1, 1×3, 3×3 | 2 | 32 | ReLU |
Improved Inception | Convolution + pooling | 1×1, 1×3, 3×3 | 1 | 128 | ReLU |
Global pooling | Pooling | 3×3 | 1 | - | - |
Output | Output | - | - | - | Softmax |
Tab. 1 Parameters of each layer of improved network model
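The stride and depth columns of Tab. 1 determine how the feature map shrinks and widens through the network. The following is a minimal sketch that encodes those rows and traces the feature-map shape. It assumes "same" padding (output size = ceil(input size / stride)) and a hypothetical 128×128 input; neither is stated on this page, and the names `Layer`, `LAYERS`, and `trace_shapes` are illustrative only.

```python
from dataclasses import dataclass
from math import ceil


@dataclass(frozen=True)
class Layer:
    name: str
    stride: int
    depth: int


# Rows of Tab. 1 between the input layer and global pooling.
LAYERS = [
    Layer("conv1", 1, 16),
    Layer("inception1", 2, 16),
    Layer("inception2", 1, 64),
    Layer("inception3", 2, 32),
    Layer("inception4", 1, 128),
]


def trace_shapes(height, width, layers=LAYERS):
    """Return (name, height, width, depth) after each layer.

    Assumption: 'same' padding, so each layer divides the spatial
    size by its stride and rounds up.
    """
    shapes = []
    for layer in layers:
        height = ceil(height / layer.stride)
        width = ceil(width / layer.stride)
        shapes.append((layer.name, height, width, layer.depth))
    return shapes


if __name__ == "__main__":
    # Hypothetical 128x128 input; the real input size is not given here.
    for row in trace_shapes(128, 128):
        print(row)
```

Under these assumptions, the two stride-2 blocks each halve the spatial size, so the last improved Inception block would output a map one quarter of the input height and width with 128 channels.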
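Parameter | Value | Effect |
---|---|---|
rotation_range | 40 | Rotates the data by a random angle between 0 and the given value |
width_shift_range | 0.2 | Random horizontal shift; the maximum shift is the parameter value times the image width |
height_shift_range | 0.2 | Random vertical shift; the maximum shift is the parameter value times the image height |
shear_range | 0.2 | Shear transform: keeps the x (or y) coordinate of every point fixed and shifts the y (or x) coordinate in proportion to the parameter value |
zoom_range | 0.2 | Scales the image along both width and height according to the parameter value |
horizontal_flip | True | Randomly flips images horizontally |
fill_mode | nearest | Uses the default mode to fill the data after shift, zoom, and shear operations |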
Tab. 2 Specific methods of data enhancement
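The parameter names in Tab. 2 match the keyword arguments of Keras's `ImageDataGenerator`, so the table can plausibly be written as a configuration like the one below. This is a sketch under that assumption: the page does not name the library, and `AUGMENTATION_PARAMS`, `max_shift_pixels`, and `build_generator` are hypothetical helper names. TensorFlow is imported lazily so the parameter table can be inspected without it installed.

```python
# Values copied from Tab. 2.
AUGMENTATION_PARAMS = {
    "rotation_range": 40,
    "width_shift_range": 0.2,
    "height_shift_range": 0.2,
    "shear_range": 0.2,
    "zoom_range": 0.2,
    "horizontal_flip": True,
    "fill_mode": "nearest",
}


def max_shift_pixels(size, fraction):
    """Largest shift in pixels: the parameter value times the image size."""
    return size * fraction


def build_generator(params=AUGMENTATION_PARAMS):
    """Build a Keras ImageDataGenerator from the table values.

    Assumption: the augmentation was done with Keras; the import is
    deferred so this module loads even when TensorFlow is absent.
    """
    from tensorflow.keras.preprocessing.image import ImageDataGenerator

    return ImageDataGenerator(**params)


if __name__ == "__main__":
    # For a hypothetical 128-pixel-wide spectrogram image.
    print(max_shift_pixels(128, AUGMENTATION_PARAMS["width_shift_range"]))
```

Whether a horizontal flip is appropriate for spectrogram images depends on the task, since it reverses the time axis; the table lists it as enabled.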
Iterations | Accuracy/% | Iterations | Accuracy/% |
---|---|---|---|
15 | 83.24 | 35 | 89.92 |
20 | 85.00 | 40 | 92.55 |
25 | 88.30 | 45 | 93.03 |
30 | 87.20 | 50 | 93.48 |
Tab. 3 Classification accuracy of different iteration times
Noise intensity/dB | Classification accuracy/% |
---|---|
0 | 93.34 |
20 | 92.83 |
15 | 89.97 |
Tab. 4 Classification accuracy of different noise intensity
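A robustness test like the one in Tab. 4 needs noise mixed into the clean signal at a controlled level. The sketch below shows one common way to do this: scaling white Gaussian noise to a target signal-to-noise ratio in dB. The page does not say which noise type or dB convention was used, so both are assumptions here, and `add_noise_at_snr` and `measured_snr_db` are hypothetical helper names.

```python
import math
import random


def add_noise_at_snr(signal, snr_db, rng=None):
    """Return signal plus white Gaussian noise at the requested SNR (dB).

    Assumption: the noise level is defined as a signal-to-noise ratio,
    SNR_dB = 10 * log10(P_signal / P_noise).
    """
    if rng is None:
        rng = random.Random(0)  # fixed seed for reproducibility
    signal_power = sum(x * x for x in signal) / len(signal)
    noise_power = signal_power / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power)
    return [x + rng.gauss(0.0, sigma) for x in signal]


def measured_snr_db(clean, noisy):
    """SNR in dB actually achieved, computed from the two signals."""
    signal_power = sum(c * c for c in clean)
    noise_power = sum((n - c) ** 2 for c, n in zip(clean, noisy))
    return 10 * math.log10(signal_power / noise_power)


if __name__ == "__main__":
    # One second of a 440 Hz tone at a 16 kHz sampling rate (illustrative input).
    clean = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(16000)]
    for snr in (20, 15):
        noisy = add_noise_at_snr(clean, snr)
        print(snr, round(measured_snr_db(clean, noisy), 2))
```

Measuring the SNR after mixing is a cheap sanity check that the noise was scaled as intended before the noisy audio is converted to Log-Mel spectrograms.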
Model | Loss | Accuracy/% | Time/s |
---|---|---|---|
AlexNet | 1.143 | 65.46 | 5 895.7 |
VGG16 | 0.532 | 86.32 | 186 251.3 |
InceptionV2 | 0.524 | 89.47 | 167 654.8 |
Proposed model | 0.465 | 92.76 | 154 176.4 |
Tab. 5 Comparison of classification performance of different models
Model | Features | Network structure | Accuracy/% |
---|---|---|---|
EnvNet | Log-Mel spectrogram | 2Conv+3Pooling+2FC | 71.00 |
Dilated CNN | Log-Mel spectrogram | 2Conv+2Pooling+2FC | 78.00 |
1D CNN | Audio | 4Conv+2Pooling+3FC | 89.00 |
DS-CNN | Audio + Log-Mel | 5Conv+FC+7Conv+1Pooling+FC | 92.20 |
GoogLeNet | Spectrogram + MFCC | 9Inception+3Softmax | 93.00 |
Proposed model | Log-Mel spectrogram | 1Conv+4 improved Inception+1FC | 93.34 |
Tab. 6 Comparison of accuracy results of different network models