Enhancing Speech Recognition in Noisy Environments Through Multiscale-Multichannel Feature Learning and Adaptive Noise Resilience
Date Issued
2024
Author(s)
Sathishkumar, S
Ghantasala, G S Pradeep
Karthika, K
DOI
http://dx.doi.org/10.1109/DELCON64804.2024.10866923
Abstract
Speech Emotion Recognition (SER) has been researched extensively in recent years, but little work has been done to minimize the effect of environmental noise on predictions. Existing SER models primarily aim to learn the best feature representations of speech from clean datasets while neglecting the practical constraints of deployment in adverse acoustic settings. Variations in signal-to-noise ratio, background noise, and acoustic interference make existing SER systems less robust. Here, we focus on a new method that combines adaptive feature learning and noise-resilient techniques in a multiscale-multichannel framework. The objective is to help common SER models generalize better across different acoustic conditions, thus improving the accuracy and robustness of SER in noisy real-life settings. Because existing SER models remain ineffective under challenging acoustic conditions, and this gap is largely unaddressed in the literature, we utilize state-of-the-art IDCNNs and explore novel data augmentations to develop SER systems capable of handling arbitrary acoustic channels, filling a gap left by the very limited related work to date. © 2024 IEEE.
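The abstract's noise-resilience idea rests on exposing the model to speech at varying signal-to-noise ratios. The paper's actual augmentation pipeline is not specified here, so the following is only a minimal illustrative sketch of one common such technique: mixing a noise recording into clean speech at a chosen SNR (the function name and NumPy-based implementation are assumptions, not the authors' code).

```python
import numpy as np

def mix_at_snr(speech: np.ndarray, noise: np.ndarray, snr_db: float) -> np.ndarray:
    """Illustrative noise augmentation: add `noise` to `speech`,
    scaled so the mixture has the requested SNR in decibels.
    (Hypothetical helper; not from the paper.)"""
    # Tile or trim the noise so it covers the whole utterance.
    if len(noise) < len(speech):
        reps = int(np.ceil(len(speech) / len(noise)))
        noise = np.tile(noise, reps)
    noise = noise[: len(speech)]

    # Average power of each signal.
    p_speech = np.mean(speech ** 2)
    p_noise = np.mean(noise ** 2)

    # Choose a scale so that p_speech / (scale^2 * p_noise) = 10^(snr_db / 10).
    scale = np.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return speech + scale * noise
```

Training on copies of each clean utterance mixed at several SNRs (e.g. 0, 5, 10, 20 dB) is one standard way to make a recognizer robust to the SNR variation the abstract describes.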
