To address the problems of insufficient acoustic feature extraction and severe feature loss during decoding in single-channel speech enhancement networks built on convolutional encoder-decoder architectures, a single-channel speech enhancement network with Multi-Channel Information Aggregation and Collaborative Decoding (MIACD) was proposed. A dual-channel encoder was used to extract the magnitude-spectrum and complex-spectrum features of the speech, which were enriched with Self-Supervised Learning (SSL) representations, and a four-layer Conformer block was employed to model the extracted features along the time and frequency dimensions. Through residual connections, the magnitude and complex features extracted by the dual-channel encoder were fed into a three-channel information aggregation decoder. In addition, a Channel-Time-Frequency Attention (CTF-Attention) mechanism was proposed to adjust the aggregated information in the decoder according to the distribution of speech energy, effectively alleviating the severe loss of acoustic information during decoding. Experimental results on the publicly available Voice Bank DEMAND dataset show that, compared with Glance and Gaze: a collaborative learning framework for single-channel speech enhancement (GaGNet), the proposed method achieves a 5.1% improvement on the objective metric WB-PESQ (Wide Band Perceptual Evaluation of Speech Quality) and reaches 96.7% on STOI (Short-Time Objective Intelligibility), validating that the proposed method effectively exploits speech information for signal reconstruction, noise suppression, and improvement of speech intelligibility.
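The abstract does not specify implementation details of the CTF-Attention mechanism. The following is a minimal sketch, assuming a PyTorch-style implementation, of how attention gates along the channel, time, and frequency axes might rescale decoder features according to the distribution of speech energy; the class name, layer choices, and tensor shapes are illustrative assumptions, not the authors' code.

```python
import torch
import torch.nn as nn

class CTFAttention(nn.Module):
    """Hypothetical channel-time-frequency attention over (B, C, T, F) features."""
    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        # Channel gate: squeeze over time and frequency, excite per channel.
        self.channel_gate = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )
        # Time gate: 1-D convolution over the per-frame energy envelope.
        self.time_gate = nn.Sequential(
            nn.Conv1d(1, 1, kernel_size=7, padding=3),
            nn.Sigmoid(),
        )
        # Frequency gate: 1-D convolution over the per-bin energy envelope.
        self.freq_gate = nn.Sequential(
            nn.Conv1d(1, 1, kernel_size=7, padding=3),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, t, f = x.shape
        # Channel attention from globally pooled energy.
        w_c = self.channel_gate(x.mean(dim=(2, 3)))            # (B, C)
        x = x * w_c.view(b, c, 1, 1)
        # Time attention from the per-frame energy distribution.
        w_t = self.time_gate(x.mean(dim=(1, 3)).unsqueeze(1))  # (B, 1, T)
        x = x * w_t.view(b, 1, t, 1)
        # Frequency attention from the per-bin energy distribution.
        w_f = self.freq_gate(x.mean(dim=(1, 2)).unsqueeze(1))  # (B, 1, F)
        x = x * w_f.view(b, 1, 1, f)
        return x

if __name__ == "__main__":
    feats = torch.randn(2, 64, 100, 257)   # (batch, channels, frames, frequency bins)
    print(CTFAttention(64)(feats).shape)   # torch.Size([2, 64, 100, 257])
```

In this sketch, each gate is computed from energy pooled over the remaining axes, so regions of the time-frequency plane with stronger speech energy receive larger weights during decoding; this is one plausible reading of "adjusting the aggregated information based on the distribution of speech energy," not a description of the published model.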