| 1 | ZHANG W Q, SONG B L, CAI M, et al. A query-by-example spoken term detection method based on phonetic posteriorgram [J]. Journal of Tianjin University (Science and Technology), 2015, 48(9): 757-760. (in Chinese) |
| 2 | HAZEN T J, SHEN W, WHITE C. Query-by-example spoken term detection using phonetic posteriorgram templates [C]// Proceedings of the 2009 IEEE Workshop on Automatic Speech Recognition and Understanding. Piscataway: IEEE, 2009: 421-426. DOI: 10.1109/asru.2009.5372889 |
| 3 | ZHANG Y, GLASS J R. Unsupervised spoken keyword spotting via segmental DTW on Gaussian posteriorgrams [C]// Proceedings of the 2009 IEEE Workshop on Automatic Speech Recognition and Understanding. Piscataway: IEEE, 2009: 398-403. DOI: 10.1109/asru.2009.5372931 |
| 4 | MANTEENA G, ANGUERA X. Speed improvements to information retrieval-based dynamic time warping using hierarchical K-Means clustering [C]// Proceedings of the 2013 IEEE International Conference on Acoustics, Speech and Signal Processing. Piscataway: IEEE, 2013: 8515-8519. DOI: 10.1109/icassp.2013.6639327 |
| 5 | LEUNG C C, WANG L, XU H, et al. Toward high-performance language-independent query-by-example spoken term detection for MediaEval 2015: post-evaluation analysis [C]// Proceedings of the INTERSPEECH 2016. [S.l.]: International Speech Communication Association, 2016: 3703-3707. DOI: 10.21437/interspeech.2016-691 |
| 6 | LEVIN K, HENRY K, JANSEN A, et al. Fixed-dimensional acoustic embeddings of variable-length segments in low-resource settings [C]// Proceedings of the 2013 IEEE Workshop on Automatic Speech Recognition and Understanding. Piscataway: IEEE, 2013: 410-415. DOI: 10.1109/asru.2013.6707765 |
| 7 | LEVIN K, JANSEN A, VAN DURME B. Segmental acoustic indexing for zero resource keyword search [C]// Proceedings of the 2015 IEEE International Conference on Acoustics, Speech, and Signal Processing. Piscataway: IEEE, 2015: 5828-5832. DOI: 10.1109/icassp.2015.7179089 |
| 8 | SHEN F, DU C, YU K. Acoustic word embeddings for end-to-end speech synthesis [J]. Applied Sciences, 2021, 11(19): No. 9010. DOI: 10.3390/app11199010 |
| 9 | SHI B, SETTLE S, LIVESCU K. Whole-word segmental speech recognition with acoustic word embeddings [C]// Proceedings of the 2021 IEEE Spoken Language Technology Workshop. Piscataway: IEEE, 2021: 164-171. DOI: 10.1109/slt48900.2021.9383578 |
| 10 | KAMPER H. Truly unsupervised acoustic word embeddings using weak top-down constraints in encoder-decoder models [C]// Proceedings of the 2019 IEEE International Conference on Acoustics, Speech, and Signal Processing. Piscataway: IEEE, 2019: 6535-6539. DOI: 10.1109/icassp.2019.8683639 |
| 11 | KAMPER H, WANG W, LIVESCU K. Deep convolutional acoustic word embeddings using word-pair side information [C]// Proceedings of the 2016 IEEE International Conference on Acoustics, Speech, and Signal Processing. Piscataway: IEEE, 2016: 4950-4954. DOI: 10.1109/icassp.2016.7472619 |
| 12 | HUANG J, GHARBIEH W, SHIM H S, et al. Query-by-example keyword spotting system using multi-head attention and soft-triple loss [C]// Proceedings of the 2021 IEEE International Conference on Acoustics, Speech, and Signal Processing. Piscataway: IEEE, 2021: 6858-6862. DOI: 10.1109/icassp39728.2021.9414156 |
| 13 | SETTLE S, LIVESCU K. Discriminative acoustic word embeddings: recurrent neural network-based approaches [C]// Proceedings of the 2016 IEEE Spoken Language Technology Workshop. Piscataway: IEEE, 2016: 503-510. DOI: 10.1109/slt.2016.7846310 |
| 14 | CHEN G, PARADA C, SAINATH T N. Query-by-example keyword spotting using long short-term memory networks [C]// Proceedings of the 2015 IEEE International Conference on Acoustics, Speech, and Signal Processing. Piscataway: IEEE, 2015: 5236-5240. DOI: 10.1109/icassp.2015.7178970 |
| 15 | YUAN Y, LV Z, HUANG S, et al. Verifying deep keyword spotting detection with acoustic word embeddings [C]// Proceedings of the 2019 IEEE Automatic Speech Recognition and Understanding Workshop. Piscataway: IEEE, 2019: 613-620. DOI: 10.1109/asru46091.2019.9003781 |
| 16 | YUAN Y, XIE L, LEUNG C C, et al. Fast query-by-example speech search using attention-based deep binary embeddings [J]. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2020, 28: 1988-2000. DOI: 10.1109/taslp.2020.2998277 |
| 17 | AO C W, LEE H Y. Query-by-example spoken term detection using attention-based multi-hop networks [C]// Proceedings of the 2018 IEEE International Conference on Acoustics, Speech, and Signal Processing. Piscataway: IEEE, 2018: 6264-6268. DOI: 10.1109/icassp.2018.8462570 |
| 18 | ZHANG K, WU Z, JIA J, et al. Query-by-example spoken term detection using attentive pooling networks [C]// Proceedings of the 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference. Piscataway: IEEE, 2019: 1267-1272. DOI: 10.1109/apsipaasc47483.2019.9023023 |
| 19 | RAM D, MICULICICH L, BOURLARD H. CNN based query by example spoken term detection [C]// Proceedings of the INTERSPEECH 2018. [S.l.]: International Speech Communication Association, 2018: 92-96. DOI: 10.21437/interspeech.2018-1722 |
| 20 | RAM D, MICULICICH L, BOURLARD H. Neural network based end-to-end query by example spoken term detection [J]. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2020, 28: 1416-1427. DOI: 10.1109/taslp.2020.2988788 |
| 21 | NAIK P, GAONKAR M N, THENKANIDIYOOR V, et al. Kernel based matching and a novel training approach for CNN-based QbE-STD [C]// Proceedings of the 2020 International Conference on Signal Processing and Communications. Piscataway: IEEE, 2020: 1-5. DOI: 10.1109/spcom50965.2020.9179588 |
| 22 | HU J, SHEN L, SUN G. Squeeze-and-excitation networks [C]// Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2018: 7132-7141. DOI: 10.1109/cvpr.2018.00745 |
| 23 | YUAN Y, LEUNG C C, XIE L, et al. Query-by-example speech search using recurrent neural acoustic word embeddings with temporal context [J]. IEEE Access, 2019, 7: 67656-67665. DOI: 10.1109/access.2019.2918638 |
| 24 | JACOBS C, MATUSEVYCH Y, KAMPER H. Acoustic word embeddings for zero-resource languages using self-supervised contrastive learning and multilingual adaptation [C]// Proceedings of the 2021 IEEE Spoken Language Technology Workshop. Piscataway: IEEE, 2021: 919-926. DOI: 10.1109/slt48900.2021.9383594 |
| 25 | ZHANG Y, PARK D S, HAN W, et al. BigSSL: exploring the frontier of large-scale semi-supervised learning for automatic speech recognition [J]. IEEE Journal of Selected Topics in Signal Processing, 2022, 16(6): 1519-1532. DOI: 10.1109/jstsp.2022.3182537 |
| 26 | YANG Z, HIRSCHBERG J. Linguistically-informed training of acoustic word embeddings for low-resource languages [C]// Proceedings of the INTERSPEECH 2019. [S.l.]: International Speech Communication Association, 2019: 2678-2682. DOI: 10.21437/interspeech.2019-3119 |
| 27 | SHITOV D, PIROGOVA E, WYSOCKI T A, et al. Learning acoustic word embeddings with dynamic time warping triplet networks [J]. IEEE Access, 2020, 8: 103327-103338. DOI: 10.1109/access.2020.2999055 |
| 28 | LI Z, WU L, LI T, et al. Improves neural acoustic word embeddings query by example spoken term detection with Wav2Vec pretraining and circle loss [C]// Proceedings of the 12th International Symposium on Chinese Spoken Language Processing. Piscataway: IEEE, 2021: 1-5. DOI: 10.1109/iscslp49672.2021.9362065 |