Enhancing Speech Recognition in Noisy Environments Through Multiscale-Multichannel Feature Learning and Adaptive Noise Resilience
Date Issued
2024
Author(s)
Sathishkumar, S
Ghantasala, G S Pradeep
Karthika, K
DOI
http://dx.doi.org/10.1109/DELCON64804.2024.10866923
Abstract
Speech Emotion Recognition (SER) has been researched extensively in recent years, but little work has been done to minimize the effect of environmental noise on predictions. Existing SER models primarily aim to learn the best feature representations of speech from clean datasets while neglecting the practical constraints of deployment in adverse acoustic settings. Variations in signal-to-noise ratio, background noise, and acoustic interference make existing SER systems less robust. Here, we focus on a new method that combines adaptive feature learning and noise-resilient techniques in a multiscale-multichannel framework. The objective is to help common SER models generalize better across different acoustic conditions, thus improving the accuracy and robustness of SER in noisy real-life settings. Because existing SER models remain ineffective under challenging acoustic conditions, and this gap is largely unaddressed in the literature, we utilize state-of-the-art IDCNNs and explore novel data augmentations to develop SER systems capable of handling arbitrary acoustic channels, filling a gap left by the very limited related work to date. © 2024 IEEE.
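The abstract's noise-resilience idea rests on exposing the model to speech at varying signal-to-noise ratios. The paper's actual augmentation pipeline is not specified here, so the following is only a minimal illustrative sketch of one common such technique: mixing a noise recording into clean speech at a chosen SNR (the function name and NumPy-based implementation are assumptions, not the authors' code).

```python
import numpy as np

def mix_at_snr(speech: np.ndarray, noise: np.ndarray, snr_db: float) -> np.ndarray:
    """Illustrative noise augmentation: add `noise` to `speech`,
    scaled so the mixture has the requested SNR in decibels.
    (Hypothetical helper; not from the paper.)"""
    # Tile or trim the noise so it covers the whole utterance.
    if len(noise) < len(speech):
        reps = int(np.ceil(len(speech) / len(noise)))
        noise = np.tile(noise, reps)
    noise = noise[: len(speech)]

    # Average power of each signal.
    p_speech = np.mean(speech ** 2)
    p_noise = np.mean(noise ** 2)

    # Choose a scale so that p_speech / (scale^2 * p_noise) = 10^(snr_db / 10).
    scale = np.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return speech + scale * noise
```

Training on copies of each clean utterance mixed at several SNRs (e.g. 0, 5, 10, 20 dB) is one standard way to make a recognizer robust to the SNR variation the abstract describes.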
