Abstract: Existing audio steganography detection methods achieve low accuracy when detecting audio steganography based on Syndrome-Trellis Codes (STC). To address this problem, and to exploit the strength of convolutional neural networks in extracting abstract features, a model for audio steganography detection that combines a deep residual network with eXtreme Gradient Boosting (XGBoost) was proposed. First, a fixed-parameter high-pass filter was used to preprocess the input audio, and features were extracted by three convolutional layers. The Truncated Linear Unit (TLU) activation function was applied in the first convolutional layer so that the model adapts to the distribution of steganographic signals with a low signal-to-noise ratio. Then, more abstract features were extracted by five stages of residual blocks and pooling operations. Finally, the extracted high-dimensional features were passed through fully connected layers and dropout layers and fed to the XGBoost model for classification. The proposed method was evaluated on STC steganography and on Least Significant Bit Matching (LSBM) steganography at embedding rates of 0.5, 0.2 and 0.1 bit per sample, that is, with an average of 0.5, 0.2 and 0.1 bits modified per audio sample, respectively. The experimental results show that, at these three embedding rates, the proposed model achieves average detection accuracies of 73.27%, 70.16% and 65.18% for STC steganography with a submatrix height of 7, and 86.58%, 76.08% and 72.82% for LSBM steganography. Compared with traditional steganography detection methods based on handcrafted features and with existing deep learning steganography detection methods, the average detection accuracy for both steganography algorithms is improved by more than 10 percentage points.
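
The preprocessing step and the TLU activation described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not give the filter taps or the truncation threshold, so the second-order difference kernel `[-1, 2, -1]` and the threshold `3.0` below are assumptions chosen only for illustration, and the function names `high_pass` and `tlu` are hypothetical.

```python
import numpy as np

# Assumed filter taps: a second-order difference kernel. The abstract only
# says "fixed-parameter high-pass filter" and does not specify the taps.
HPF_KERNEL = np.array([-1.0, 2.0, -1.0])


def high_pass(samples, kernel=HPF_KERNEL):
    """Apply a fixed (non-trainable) FIR high-pass filter to a 1-D signal.

    Smooth audio content is suppressed, so small +/-1 embedding changes
    make up a larger share of the filtered residual.
    """
    signal = np.asarray(samples, dtype=np.float64)
    return np.convolve(signal, kernel, mode="same")


def tlu(x, threshold=3.0):
    """Truncated Linear Unit: identity on [-threshold, threshold], clipped outside.

    The threshold value is an assumption; the abstract does not state it.
    """
    return np.clip(np.asarray(x, dtype=np.float64), -threshold, threshold)


def preprocess(samples, threshold=3.0):
    """High-pass filter the audio, then truncate the residual with TLU."""
    return tlu(high_pass(samples), threshold)
```

Under these assumptions, `preprocess` returns a residual whose values are bounded by the threshold. Unlike ReLU, TLU keeps the sign of small residual values and limits only large-magnitude ones, which is the usual motivation for using it on weak, low signal-to-noise embedding signals.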