Journal of Computer Applications, 2024, Vol. 44, Issue (1): 123-128. DOI: 10.11772/j.issn.1001-9081.2023010062
Special Issue: Artificial intelligence
Yunyun GAO, Lasheng ZHAO, Qiang ZHANG
Received: 2023-01-30
Revised: 2023-04-01
Accepted: 2023-04-07
Online: 2023-06-06
Published: 2024-01-10
Contact: Qiang ZHANG
About author: GAO Yunyun, born in 1997 in Yantai, Shandong, M. S. candidate. Her research interests include deep learning and spoken term detection.
Yunyun GAO, Lasheng ZHAO, Qiang ZHANG. Acoustic word embedding model based on Bi-LSTM and convolutional-Transformer[J]. Journal of Computer Applications, 2024, 44(1): 123-128.
URL: https://www.joca.cn/EN/10.11772/j.issn.1001-9081.2023010062
No. | Model (with FFN) | AP/% | No. | Model (without FFN) | AP/% |
---|---|---|---|---|---|
M1 | Bi-LSTM + FFN | 88.62 | N1 | Bi-LSTM | 88.92 |
M2 | Transformer | 89.01 | N2 | Transformer | 90.92 |
M3 | 1D Conv + SE | 92.31 | N3 | 1D Conv + SE | 92.88 |
M4 | Swish -> ReLU | 92.52 | N4 | Swish -> ReLU | 93.16 |
M5 | Serial | 92.55 | N5 | Serial | 91.43 |
M6 | Proposed model (parallel) | 92.68 | N6 | Proposed model (parallel) | 93.53 |
Tab. 1 Ablation experiment results
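The effect of removing the FFN in each ablation configuration can be read off Tab. 1 by subtracting the "with FFN" AP from the "without FFN" AP. The following is a minimal sketch of that arithmetic only; the dictionary keys are illustrative English labels for rows 1 to 6, and the function name is hypothetical rather than anything taken from the paper.

```python
# AP (%) values copied from Tab. 1 as (with FFN, without FFN).
# Keys are illustrative labels for rows M1/N1 .. M6/N6, not names from the paper's code.
ablation_ap = {
    "Bi-LSTM": (88.62, 88.92),
    "Transformer": (89.01, 90.92),
    "1D Conv + SE": (92.31, 92.88),
    "Swish -> ReLU": (92.52, 93.16),
    "Serial": (92.55, 91.43),
    "Parallel (proposed)": (92.68, 93.53),
}


def ffn_removal_gain(table):
    """Return the AP change (percentage points) when the FFN is removed."""
    return {
        name: round(without_ffn - with_ffn, 2)
        for name, (with_ffn, without_ffn) in table.items()
    }


gains = ffn_removal_gain(ablation_ap)
for name, gain in gains.items():
    print(f"{name:<22} {gain:+.2f}")
```

On these numbers, every configuration except the serial one gains AP when the FFN is removed; the serial arrangement loses 1.12 percentage points.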
Model | AP/% | PRBEP/% | KL divergence |
---|---|---|---|
LSTM( | 74.15 | 70.48 | 4.6583 |
Bi-LSTM( | 88.92 | 83.60 | 5.1296 |
Bi-LSTM+Attention( | 91.52 | 86.09 | 5.6371 |
Bi-LSTM+Attention( | 92.73 | 87.28 | 6.0472 |
Proposed model( | 93.53 | 87.39 | 6.2099 |
Proposed model( | 94.36 | 88.96 | 6.4672 |
Tab. 2 Comparative experiment results of different models
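Tab. 2 reports average precision (AP) and the precision-recall break-even point (PRBEP). As a rough illustration of what these two ranking metrics measure, the sketch below computes them for a list of embedding-pair distances with same-word / different-word labels. It assumes the common convention that smaller distance means "more likely the same word"; the function names and the toy data are hypothetical, and this is not the authors' evaluation code.

```python
def _rank_by_distance(distances):
    """Indices sorted from the most similar pair (smallest distance) to the least."""
    return sorted(range(len(distances)), key=lambda i: distances[i])


def average_precision(distances, labels):
    """Mean of the precision values at each rank where a same-word pair occurs."""
    hits = 0
    precisions = []
    for rank, i in enumerate(_rank_by_distance(distances), start=1):
        if labels[i]:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / hits if hits else 0.0


def prbep(distances, labels):
    """Precision at rank R, where R is the number of same-word pairs.

    At that cut-off precision and recall share the same numerator and
    denominator, so this is the precision-recall break-even point.
    """
    n_pos = sum(1 for y in labels if y)
    if n_pos == 0:
        return 0.0
    top = _rank_by_distance(distances)[:n_pos]
    return sum(1 for i in top if labels[i]) / n_pos


# Toy data (assumed for illustration): 5 pairs, 2 of them same-word (label 1).
toy_distances = [0.1, 0.4, 0.2, 0.9, 0.3]
toy_labels = [1, 0, 0, 0, 1]

ap = average_precision(toy_distances, toy_labels)
bep = prbep(toy_distances, toy_labels)
print(f"AP = {ap:.4f}, PRBEP = {bep:.4f}")
```

In the toy data the two same-word pairs land at ranks 1 and 3, so AP is (1/1 + 2/3) / 2 and PRBEP is 1/2.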
1 | ZHANG W Q, SONG B L, CAI M, et al. A query-by-example spoken term detection method based on phonetic posteriorgram [J]. Journal of Tianjin University (Science and Technology), 2015, 48(9): 757-760. |
2 | HAZEN T J, SHEN W, WHITE C. Query-by-example spoken term detection using phonetic posteriorgram templates [C]// Proceedings of the 2009 IEEE Workshop on Automatic Speech Recognition and Understanding. Piscataway: IEEE, 2009: 421-426. 10.1109/asru.2009.5372889 |
3 | ZHANG Y, GLASS J R. Unsupervised spoken keyword spotting via segmental DTW on Gaussian posteriorgrams [C]// Proceedings of the 2009 IEEE Workshop on Automatic Speech Recognition and Understanding. Piscataway: IEEE, 2009: 398-403. 10.1109/asru.2009.5372931 |
4 | MANTEENA G, ANGUERA X. Speed improvements to information retrieval-based dynamic time warping using hierarchical K-Means clustering [C]// Proceedings of the 2013 IEEE International Conference on Acoustics, Speech and Signal Processing. Piscataway: IEEE, 2013: 8515-8519. 10.1109/icassp.2013.6639327 |
5 | LEUNG C C, WANG L, XU H, et al. Toward high-performance language-independent query-by-example spoken term detection for MediaEval 2015: post-evaluation analysis [C]// Proceedings of the INTERSPEECH 2016. [S.l.]: International Speech Communication Association, 2016: 3703-3707. 10.21437/interspeech.2016-691 |
6 | LEVIN K, HENRY K, JANSEN A, et al. Fixed-dimensional acoustic embeddings of variable-length segments in low-resource settings [C]// Proceedings of the 2013 IEEE Workshop on Automatic Speech Recognition and Understanding. Piscataway: IEEE, 2013: 410-415. 10.1109/asru.2013.6707765 |
7 | LEVIN K, JANSEN A, VAN DURME B. Segmental acoustic indexing for zero resource keyword search [C]// Proceedings of the 2015 IEEE International Conference on Acoustics, Speech, and Signal Processing. Piscataway: IEEE, 2015: 5828-5832. 10.1109/icassp.2015.7179089 |
8 | SHEN F, DU C, YU K. Acoustic word embeddings for end-to-end speech synthesis [J]. Applied Sciences, 2021, 11(19): No.9010. 10.3390/app11199010 |
9 | SHI B, SETTLE S, LIVESCU K. Whole-word segmental speech recognition with acoustic word embeddings [C]// Proceedings of the 2021 IEEE Spoken Language Technology Workshop. Piscataway: IEEE, 2021: 164-171. 10.1109/slt48900.2021.9383578 |
10 | KAMPER H. Truly unsupervised acoustic word embeddings using weak top-down constraints in encoder-decoder models [C]// Proceedings of the 2019 IEEE International Conference on Acoustics, Speech, and Signal Processing. Piscataway: IEEE, 2019: 6535-6539. 10.1109/icassp.2019.8683639 |
11 | KAMPER H, WANG W, LIVESCU K. Deep convolutional acoustic word embeddings using word-pair side information [C]// Proceedings of the 2016 IEEE International Conference on Acoustics, Speech, and Signal Processing. Piscataway: IEEE, 2016: 4950-4954. 10.1109/icassp.2016.7472619 |
12 | HUANG J, GHARBIEH W, SHIM H S, et al. Query-by-example keyword spotting system using multi-head attention and soft-triple loss [C]// Proceedings of the 2021 IEEE International Conference on Acoustics, Speech, and Signal Processing. Piscataway: IEEE, 2021: 6858-6862. 10.1109/icassp39728.2021.9414156 |
13 | SETTLE S, LIVESCU K. Discriminative acoustic word embeddings: recurrent neural network-based approaches [C]// Proceedings of the 2016 IEEE Spoken Language Technology Workshop. Piscataway: IEEE, 2016: 503-510. 10.1109/slt.2016.7846310 |
14 | CHEN G, PARADA C, SAINATH T N. Query-by-example keyword spotting using long short-term memory networks [C]// Proceedings of the 2015 IEEE International Conference on Acoustics, Speech, and Signal Processing. Piscataway: IEEE, 2015: 5236-5240. 10.1109/icassp.2015.7178970 |
15 | YUAN Y, LV Z, HUANG S, et al. Verifying deep keyword spotting detection with acoustic word embeddings [C]// Proceedings of the 2019 IEEE Automatic Speech Recognition and Understanding Workshop. Piscataway: IEEE, 2019: 613-620. 10.1109/asru46091.2019.9003781 |
16 | YUAN Y, XIE L, LEUNG C C, et al. Fast query-by-example speech search using attention-based deep binary embeddings [J]. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2020, 28: 1988-2000. 10.1109/taslp.2020.2998277 |
17 | AO C W, LEE H Y. Query-by-example spoken term detection using attention-based multi-hop networks [C]// Proceedings of the 2018 IEEE International Conference on Acoustics, Speech, and Signal Processing. Piscataway: IEEE, 2018: 6264-6268. 10.1109/icassp.2018.8462570 |
18 | ZHANG K, WU Z, JIA J, et al. Query-by-example spoken term detection using attentive pooling networks [C]// Proceedings of the 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference. Piscataway: IEEE, 2019: 1267-1272. 10.1109/apsipaasc47483.2019.9023023 |
19 | RAM D, MICULICICH L, BOURLARD H. CNN based query by example spoken term detection [C]// Proceedings of the INTERSPEECH 2018. [S.l.]: International Speech Communication Association, 2018: 92-96. 10.21437/interspeech.2018-1722 |
20 | RAM D, MICULICICH L, BOURLARD H. Neural network based end-to-end query by example spoken term detection [J]. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2020, 28: 1416-1427. 10.1109/taslp.2020.2988788 |
21 | NAIK P, GAONKAR M N, THENKANIDIYOOR V, et al. Kernel based matching and a novel training approach for CNN-based QbE-STD [C]// Proceedings of the 2020 International Conference on Signal Processing and Communications. Piscataway: IEEE, 2020: 1-5. 10.1109/spcom50965.2020.9179588 |
22 | HU J, SHEN L, SUN G. Squeeze-and-excitation networks [C]// Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2018: 7132-7141. 10.1109/cvpr.2018.00745 |
23 | YUAN Y, LEUNG C C, XIE L, et al. Query-by-example speech search using recurrent neural acoustic word embeddings with temporal context [J]. IEEE Access, 2019, 7: 67656-67665. 10.1109/access.2019.2918638 |
24 | JACOBS C, MATUSEVYCH Y, KAMPER H. Acoustic word embeddings for zero-resource languages using self-supervised contrastive learning and multilingual adaptation [C]// Proceedings of the 2021 IEEE Spoken Language Technology Workshop. Piscataway: IEEE, 2021: 919-926. 10.1109/slt48900.2021.9383594 |
25 | ZHANG Y, PARK D S, HAN W, et al. BigSSL: exploring the frontier of large-scale semi-supervised learning for automatic speech recognition [J]. IEEE Journal of Selected Topics in Signal Processing, 2022, 16(6): 1519-1532. 10.1109/jstsp.2022.3182537 |
26 | YANG Z, HIRSCHBERG J. Linguistically-informed training of acoustic word embeddings for low-resource languages [C]// Proceedings of the INTERSPEECH 2019. [S.l.]: International Speech Communication Association, 2019: 2678-2682. 10.21437/interspeech.2019-3119 |
27 | SHITOV D, PIROGOVA E, WYSOCKI T A, et al. Learning acoustic word embeddings with dynamic time warping triplet networks [J]. IEEE Access, 2020, 8: 103327-103338. 10.1109/access.2020.2999055 |
28 | LI Z, WU L, LI T, et al. Improves neural acoustic word embeddings query by example spoken term detection with Wav2Vec pretraining and circle loss [C]// Proceedings of the 12th International Symposium on Chinese Spoken Language Processing. Piscataway: IEEE, 2021: 1-5. 10.1109/iscslp49672.2021.9362065 |