Journal of Computer Applications ›› 2019, Vol. 39 ›› Issue (9): 2744-2748. DOI: 10.11772/j.issn.1001-9081.2019030481

• Virtual Reality and Multimedia Computing •

Insect sound feature recognition method based on three-dimensional convolutional neural network

WAN Yongjing<sup>1</sup>, WANG Bowei<sup>1</sup>, LOU Dingfeng<sup>2</sup>

  1. School of Information Science and Engineering, East China University of Science and Technology, Shanghai 200237, China;
  2. Shenzhen Customs, Shenzhen, Guangdong 518045, China
  • Received: 2019-03-22 Revised: 2019-05-24 Online: 2019-04-29 Published: 2019-09-10
  • Corresponding author: WANG Bowei
  • About the authors: WAN Yongjing (born 1975), female, from Nanchang, Jiangxi, associate professor, Ph.D.; research interests: intelligent information processing, image processing, pattern recognition, audio signal processing. WANG Bowei (born 1997), male, from Yiyang, Hunan; research interests: signal processing, data mining, machine learning. LOU Dingfeng (born 1960), male, from Nanchang, Jiangxi, research fellow; research interests: insect acoustics.
  • Supported by:

    This work is partially supported by the National Natural Science Foundation of China (61872143) and the National Undergraduate Innovation and Entrepreneurship Training Program of China (201810251064).

Abstract:

Quarantine inspection of imported wood for wood-boring insects is an important task for customs, but existing insect sound detection algorithms suffer from low accuracy and poor robustness. To address these problems, an insect sound detection method based on a Three-Dimensional Convolutional Neural Network (3D CNN) was proposed to recognize insect sound features. Firstly, the original insect audio was pre-processed by overlapped framing, and the Short-Time Fourier Transform (STFT) was applied to obtain the spectrogram of the audio. Then, the spectrogram was fed into a 3D CNN with three convolutional layers to determine whether insect sound features were present in the audio. The network was trained and tested with inputs of different framing lengths. Finally, performance was evaluated using accuracy, F1 score and the Receiver Operating Characteristic (ROC) curve. The experiments showed that training and test results were best when the overlapped framing length was 5 seconds. In this setting, the 3D CNN model achieved an accuracy of 96.0% and an F1 score of 0.96 on the test set, and its accuracy was nearly 18% higher than that of the Two-Dimensional Convolutional Neural Network (2D CNN) model. These results indicate that the proposed method can accurately extract insect sound features from audio signals and complete the insect identification task, providing an engineering solution for customs inspection and quarantine.
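The pre-processing steps named in the abstract (overlapped framing followed by an STFT spectrogram) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the 8 kHz sample rate, the 50% overlap, the Hann window, the FFT size, and the function names `overlapped_segments` and `log_spectrogram` are all assumptions made for the example. Only the 5 s segment length comes from the abstract, which also does not say how spectrograms are arranged into the 3D CNN input, so the final stacking is illustrative only.

```python
import numpy as np


def overlapped_segments(signal, sample_rate, segment_seconds=5.0, hop_seconds=2.5):
    """Split a 1-D signal into fixed-length segments that overlap in time.

    The 5 s length follows the abstract; the 2.5 s hop is an assumption.
    """
    seg_len = int(round(segment_seconds * sample_rate))
    hop_len = int(round(hop_seconds * sample_rate))
    if seg_len <= 0 or hop_len <= 0:
        raise ValueError("segment and hop lengths must be positive")
    if len(signal) < seg_len:
        return np.empty((0, seg_len), dtype=float)
    n_segments = 1 + (len(signal) - seg_len) // hop_len
    starts = hop_len * np.arange(n_segments)
    return np.stack([signal[s:s + seg_len] for s in starts]).astype(float)


def log_spectrogram(segment, n_fft=256, hop=128):
    """Log-magnitude STFT: Hann-windowed frames followed by a real FFT.

    n_fft and hop are assumed values, not taken from the paper.
    Returns an array of shape (frequency_bins, time_frames).
    """
    window = np.hanning(n_fft)
    n_frames = 1 + (len(segment) - n_fft) // hop
    frames = np.stack(
        [segment[i * hop:i * hop + n_fft] * window for i in range(n_frames)]
    )
    magnitude = np.abs(np.fft.rfft(frames, axis=1))
    # Small offset avoids log(0) on silent frames.
    return 20.0 * np.log10(magnitude.T + 1e-10)


# Synthetic stand-in for a recording: a 1 kHz tone plus weak noise, 12 s long.
sample_rate = 8000  # assumed sample rate
rng = np.random.default_rng(0)
t = np.arange(12 * sample_rate) / sample_rate
signal = np.sin(2 * np.pi * 1000.0 * t) + 0.01 * rng.standard_normal(t.size)

segments = overlapped_segments(signal, sample_rate)
spectrograms = np.stack([log_spectrogram(seg) for seg in segments])
print(segments.shape)      # (3, 40000)
print(spectrograms.shape)  # (3, 129, 311)
```

Each 5 s segment yields one spectrogram, so a classifier would see one input per segment. Shorter or longer segments change both the number of training examples and the time extent of each spectrogram, which is the trade-off the abstract's framing-length experiments explore.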

Key words: Three-Dimensional Convolutional Neural Network (3D CNN), Short-Time Fourier Transform (STFT), spectrogram, insect sound detection, acoustic signal processing

CLC number: