Uncertainty Quantification for Speech-To-Text in Spanish

Daniel Rodriguez-Rivas, Saul Calderon-Ramirez, Martin Solis, Walter Morales-Munoz, J. Esteban Perez-Hidalgo

Research output: Chapter in book/report/conference proceeding › Conference contribution › Peer-reviewed

Abstract

Speech-to-text is a task whose performance in practical applications has recently been boosted by the advent of deep neural network architectures. Moreover, recent speech-to-text models such as Whisper have benefited from extensive pre-training on very large datasets. However, their use in other target scenarios (less-represented languages, speech recorded in noisy environments, etc.) might yield unsatisfactory results. Uncertainty quantification in such settings is important for the safe use of speech-to-text models. This study evaluates three methods for uncertainty quantification: Monte-Carlo dropout, temperature scaling, and feature density estimation. The analysis is conducted using Spanish audio datasets to assess these methods in the context of a less-represented language. A novel metric, analogous to the expected calibration error, is introduced to measure the correlation between predicted uncertainty and word error rate. We provide a detailed description of the dataset construction and experimental parameters. The findings indicate that Whisper demonstrates strong performance with Monte-Carlo dropout and temperature scaling, while the feature density estimation method shows comparatively lower efficacy. Finally, we propose enhancements to the evaluation procedures to further reduce prediction uncertainty.
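The abstract does not define the proposed metric, so the following is only a minimal sketch of how an ECE-style comparison between predicted uncertainty and word error rate could be set up. The function name `binned_uncertainty_wer_gap`, the equal-width binning, and the assumption that both quantities are scaled to [0, 1] are illustrative choices, not the authors' definition.

```python
from typing import Sequence


def binned_uncertainty_wer_gap(
    uncertainties: Sequence[float],
    wers: Sequence[float],
    n_bins: int = 5,
) -> float:
    """Weighted mean of |mean uncertainty - mean WER| over equal-width bins.

    Hypothetical illustration: assumes one uncertainty score and one WER per
    utterance, both already scaled (or clipped) to the range [0, 1].
    """
    if len(uncertainties) != len(wers) or len(uncertainties) == 0:
        raise ValueError("inputs must be non-empty and of equal length")

    # Group utterance indices by the uncertainty bin they fall into.
    bins = [[] for _ in range(n_bins)]
    for i, u in enumerate(uncertainties):
        idx = min(int(u * n_bins), n_bins - 1)  # u == 1.0 goes in the last bin
        bins[idx].append(i)

    total = len(uncertainties)
    score = 0.0
    for members in bins:
        if not members:
            continue  # empty bins contribute nothing
        mean_u = sum(uncertainties[i] for i in members) / len(members)
        mean_w = sum(wers[i] for i in members) / len(members)
        # Weight each bin's gap by the fraction of utterances it holds.
        score += (len(members) / total) * abs(mean_u - mean_w)
    return score
```

Under these assumptions, a score near 0 means that utterances with high predicted uncertainty also tend to have a high word error rate, while a larger score means the uncertainty estimate tracks the error poorly.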

Original language: English
Title of host publication: 6th IEEE International Conference on BioInspired Processing, BIP 2024
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (electronic): 9798350353709
Status: Published - 2024
Event: 6th IEEE International Conference on BioInspired Processing, BIP 2024 - Liberia, Costa Rica
Duration: 4 Dec 2024 to 6 Dec 2024

Publication series

Name: 6th IEEE International Conference on BioInspired Processing, BIP 2024

Conference

Conference: 6th IEEE International Conference on BioInspired Processing, BIP 2024
Country/Territory: Costa Rica
City: Liberia
Period: 4 Dec 2024 to 6 Dec 2024
