
Uncertainty Quantification for Speech-To-Text in Spanish

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Speech-to-text is a task whose performance in practical applications has recently been boosted by the advent of deep neural network architectures. Moreover, recent speech-to-text models such as Whisper have benefited from extensive pre-training on very large datasets. However, their use in different target scenarios (less-represented languages, speech recorded in noisy environments, etc.) might yield unsatisfactory results. Uncertainty quantification in this context is important for the safe use of speech-to-text models. This study evaluates three methods for uncertainty quantification: Monte-Carlo dropout, temperature scaling, and feature density estimation. The analysis is conducted using Spanish audio datasets to assess these methods in the context of a less-represented language. A novel metric, analogous to the expected calibration error, is introduced to measure the correlation between predicted uncertainty and word error rate. We provide a detailed description of the dataset construction and experimental parameters. The findings indicate that Whisper demonstrates strong performance with Monte-Carlo dropout and temperature scaling, while the feature density estimation method shows comparatively lower efficacy. Finally, we propose enhancements to the evaluation procedures to further reduce prediction uncertainty.
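
The abstract does not define the proposed metric, so the following is only a minimal sketch of one possible reading, not the authors' definition. It assumes per-utterance uncertainty scores in [0, 1], groups utterances into equal-width uncertainty bins, and reports the size-weighted average of the absolute gap between mean uncertainty and mean word error rate in each bin, mirroring how the expected calibration error compares confidence with accuracy. The function name `uncertainty_calibration_gap` and the parameter `n_bins` are hypothetical.

```python
def uncertainty_calibration_gap(uncertainties, wers, n_bins=5):
    """ECE-style gap between predicted uncertainty and word error rate.

    Hypothetical illustration: utterances are binned by uncertainty
    (assumed to lie in [0, 1]); each bin contributes
    |mean uncertainty - mean WER|, weighted by its share of utterances.
    """
    if len(uncertainties) != len(wers):
        raise ValueError("inputs must have the same length")
    if not uncertainties:
        raise ValueError("inputs must be non-empty")

    bins = [[] for _ in range(n_bins)]
    for u, w in zip(uncertainties, wers):
        # An uncertainty of exactly 1.0 is placed in the last bin.
        idx = min(int(u * n_bins), n_bins - 1)
        bins[idx].append((u, w))

    total = len(uncertainties)
    gap = 0.0
    for members in bins:
        if not members:
            continue  # empty bins contribute nothing
        mean_u = sum(u for u, _ in members) / len(members)
        mean_w = sum(w for _, w in members) / len(members)
        gap += (len(members) / total) * abs(mean_u - mean_w)
    return gap


if __name__ == "__main__":
    # Toy values, not results from the paper.
    scores = [0.1, 0.1, 0.9, 0.9]
    print(uncertainty_calibration_gap(scores, [0.1, 0.1, 0.9, 0.9]))
    print(uncertainty_calibration_gap(scores, [0.9, 0.9, 0.1, 0.1]))
```

Under this reading, a lower value means the predicted uncertainty tracks the observed word error rate more closely. Word error rate can exceed 1 when a transcript contains many insertions, so a real evaluation might clip or rescale it first.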

Original language: English
Title of host publication: 6th IEEE International Conference on BioInspired Processing, BIP 2024
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9798350353709
State: Published - 2024
Event: 6th IEEE International Conference on BioInspired Processing, BIP 2024 - Liberia, Costa Rica
Duration: 4 Dec 2024 to 6 Dec 2024

Publication series

Name: 6th IEEE International Conference on BioInspired Processing, BIP 2024

Conference

Conference: 6th IEEE International Conference on BioInspired Processing, BIP 2024
Country/Territory: Costa Rica
City: Liberia
Period: 4/12/24 to 6/12/24

Keywords

  • BERT
  • Deep Learning
  • Safe Artificial Intelligence
  • Text complex prediction
  • Transformers
  • Uncertainty Estimation
