Journal of Computer Applications ›› 2023, Vol. 43 ›› Issue (3): 909-915.DOI: 10.11772/j.issn.1001-9081.2022010047
Special Issue: Multimedia computing and computer simulation
Received: 2022-01-17
Revised: 2022-06-08
Accepted: 2022-06-10
Online: 2022-07-11
Published: 2023-03-10
Contact: Yukun WANG
About author: ZHANG Qiuyu, born in 1966 in Xinji, Hebei, research fellow. His research interests include network and information security, intelligent information processing, and pattern recognition.
Qiuyu ZHANG, Yukun WANG. Speech classification model based on improved Inception network[J]. Journal of Computer Applications, 2023, 43(3): 909-915.
URL: https://www.joca.cn/EN/10.11772/j.issn.1001-9081.2022010047
Layer | Type | Kernel size | Stride | Depth | Activation |
---|---|---|---|---|---|
Input | Input layer | - | - | 3 | ReLU |
Conv1 | Convolutional layer | 3×3 | 1 | 16 | ReLU |
Improved Inception | Convolution + pooling | 1×1, 1×3, 3×3 | 2 | 16 | ReLU |
Improved Inception | Convolution + pooling | 1×1, 1×3, 3×3 | 1 | 64 | ReLU |
Improved Inception | Convolution + pooling | 1×1, 1×3, 3×3 | 2 | 32 | ReLU |
Improved Inception | Convolution + pooling | 1×1, 1×3, 3×3 | 1 | 128 | ReLU |
Global pooling | Pooling | 3×3 | 1 | - | - |
Output | Output | - | - | - | Softmax |
Tab. 1 Parameters of each layer of improved network model
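To make the stride column of Tab. 1 easier to read, the following is a minimal sketch that traces how the feature-map size would change from layer to layer. It assumes "same" padding in every layer and a hypothetical square input; neither is stated in the table, and the numbering of the four improved Inception modules and the names `LAYERS` and `trace_shapes` are illustrative only.

```python
import math

# Rows of Tab. 1 as (layer name, stride, depth). The input, global pooling
# and output rows are left out because they do not change the trace.
# The 1..4 numbering of the Inception modules is added here for readability.
LAYERS = [
    ("Conv1", 1, 16),
    ("Improved Inception 1", 2, 16),
    ("Improved Inception 2", 1, 64),
    ("Improved Inception 3", 2, 32),
    ("Improved Inception 4", 1, 128),
]


def trace_shapes(height, width, layers=LAYERS):
    """Return (name, height, width, depth) after each layer.

    Assumption: every layer uses "same" padding, so each spatial size
    becomes ceil(size / stride). Tab. 1 does not state the padding mode.
    """
    shapes = []
    for name, stride, depth in layers:
        height = math.ceil(height / stride)
        width = math.ceil(width / stride)
        shapes.append((name, height, width, depth))
    return shapes
```

Under these assumptions, calling `trace_shapes(128, 128)` halves the spatial size at each of the two stride-2 modules, so a 128×128 input would reach the global pooling stage as a 32×32 map with depth 128.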
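Parameter | Value | Effect |
---|---|---|
rotation_range | 40 | Rotates the data by a random angle between 0 and the specified value |
width_shift_range | 0.2 | Random horizontal shift; the maximum shift is the parameter value times the image width |
height_shift_range | 0.2 | Random vertical shift; the maximum shift is the parameter value times the image height |
shear_range | 0.2 | Shear transformation: the x (or y) coordinates of all points stay fixed while the y (or x) coordinates are shifted in proportion to the parameter value |
zoom_range | 0.2 | Scales the image along the width and height directions according to the parameter value |
horizontal_flip | True | Randomly flips images horizontally |
fill_mode | nearest | Fills the pixels left empty after shift, zoom, and shear operations using the default method |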
Tab. 2 Specific methods of data enhancement
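The settings in Tab. 2 can be collected into a single configuration, sketched below. The parameter names coincide with keyword arguments of Keras's `ImageDataGenerator`, but this page does not say which library the authors used, so that link is an assumption; the names `AUGMENTATION` and `max_shift_pixels` are hypothetical helpers for illustration.

```python
# Augmentation settings from Tab. 2, keyed by the parameter names in the table.
AUGMENTATION = {
    "rotation_range": 40,
    "width_shift_range": 0.2,
    "height_shift_range": 0.2,
    "shear_range": 0.2,
    "zoom_range": 0.2,
    "horizontal_flip": True,
    "fill_mode": "nearest",
}


def max_shift_pixels(image_width, image_height, params=AUGMENTATION):
    """Largest horizontal and vertical shift, in pixels, implied by Tab. 2.

    Follows the table's description: the maximum shift is the parameter
    value multiplied by the image width (or height).
    """
    return (
        params["width_shift_range"] * image_width,
        params["height_shift_range"] * image_height,
    )
```

For a 100×50 image this gives a maximum shift of about 20 pixels horizontally and 10 pixels vertically. If a TensorFlow installation that still ships `tf.keras.preprocessing.image.ImageDataGenerator` is available, the same dictionary could presumably be unpacked into it as `ImageDataGenerator(**AUGMENTATION)`.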
Iterations | Accuracy/% | Iterations | Accuracy/% |
---|---|---|---|
15 | 83.24 | 35 | 89.92 |
20 | 85.00 | 40 | 92.55 |
25 | 88.30 | 45 | 93.03 |
30 | 87.20 | 50 | 93.48 |
Tab. 3 Classification accuracy of different iteration times
Noise intensity/dB | Classification accuracy/% |
---|---|
0 | 93.34 |
20 | 92.83 |
15 | 89.97 |
Tab. 4 Classification accuracy of different noise intensity
Model | Loss | Accuracy/% | Time/s |
---|---|---|---|
AlexNet | 1.143 | 65.46 | 5 895.7 |
VGG16 | 0.532 | 86.32 | 186 251.3 |
InceptionV2 | 0.524 | 89.47 | 167 654.8 |
Proposed model | 0.465 | 92.76 | 154 176.4 |
Tab. 5 Comparison of classification performance of different models
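As a worked reading of Tab. 5, the sketch below computes how far the proposed model moves from a chosen baseline in accuracy and in time. The values are copied from the table; the names `RESULTS` and `compare` are hypothetical, and the time column is treated simply as "time in seconds" because the table does not say what was timed.

```python
# Rows of Tab. 5: model -> (loss, accuracy in %, time in s).
RESULTS = {
    "AlexNet": (1.143, 65.46, 5895.7),
    "VGG16": (0.532, 86.32, 186251.3),
    "InceptionV2": (0.524, 89.47, 167654.8),
    "Proposed model": (0.465, 92.76, 154176.4),
}


def compare(baseline, candidate="Proposed model", results=RESULTS):
    """Return (accuracy gain in percentage points, relative time saving in %).

    A negative time saving means the candidate took longer than the baseline.
    """
    _, base_acc, base_time = results[baseline]
    _, cand_acc, cand_time = results[candidate]
    accuracy_gain = cand_acc - base_acc
    time_saving = 100.0 * (base_time - cand_time) / base_time
    return round(accuracy_gain, 2), round(time_saving, 2)
```

With these figures, `compare("InceptionV2")` yields a gain of 3.29 percentage points and a time saving of about 8.04%, while a comparison against AlexNet gives a negative time saving, since AlexNet is much faster but far less accurate.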
Model | Feature | Network structure | Accuracy/% |
---|---|---|---|
EnvNet | Log-Mel spectrogram | 2Conv+3Pooling+2FC | 71.00 |
Dilated CNN | Log-Mel spectrogram | 2Conv+2Pooling+2FC | 78.00 |
1D CNN | Audio | 4Conv+2Pooling+3FC | 89.00 |
DS-CNN | Audio + Log-Mel | 5Conv+FC+7Conv+1Pooling+FC | 92.20 |
GoogLeNet | Spectrogram + MFCC | 9Inception+3Softmax | 93.00 |
Proposed model | Log-Mel spectrogram | 1Conv+4 improved Inception+1FC | 93.34 |
Tab. 6 Comparison of accuracy results of different network models