
Uncertainty Estimation for Audio Transcription Models

  • Calderón Ramírez, Saúl (Institutional academic coordinator)
  • Solís Salazar, Martín (Institutional academic collaborator)
  • Morales-Munoz, Walter (Institutional academic collaborator)
  • Pérez Hidalgo, José Esteban (Institutional academic collaborator)
  • Albornoz, Enrique (External collaborating researcher)
  • Martinez, Cesar (External collaborating researcher)
  • Costa Rica Institute of Technology
  • Universidad Nacional del Litoral

Project: Research Projects Internally funded, Basic and applied research

Project Details

Description

Deep learning models now drive real-world applications across a wide range of areas, from computer vision
and natural language processing to finance, robotics, and speech recognition, redefining how we approach
complex tasks. With their deep neural networks, these models learn directly from large datasets,
regardless of the application domain.
However, an essential challenge remains: the reliability and accuracy of the predictions these models generate
in everyday applications. The quality of these predictions depends on multiple factors, including the
quality of the training and evaluation data and the complexity of the model architecture. The
quantitative evaluation of data quality therefore becomes a pressing need. This need is even greater
for unstructured data, which prevails in applications such as Automatic Speech
Recognition (ASR), where audio signals are transformed into transcriptions
through model inference.
In this context, it is crucial to explore how to measure data quality rigorously, particularly in technical and
computational terms. A further question is how to estimate the loss of reliability in
model predictions, whether it stems from variability in input data acquisition or from intrinsic characteristics of the model
architecture. In response to these concerns, this research aims to conduct a comprehensive evaluation of the
performance of the Whisper speech recognition model, developed by OpenAI. This evaluation seeks to
quantify the influence of both the model itself and the data quality on the ASR system's performance.
This analysis is relevant both because of the need to understand and improve the reliability of speech
recognition systems in a fast-changing technological environment and because of the strategic choice of
Whisper as the subject of study. Whisper, developed by a well-known artificial intelligence company,
is a recent model with potential for industry use. Applying techniques such as Monte
Carlo Dropout and entropy evaluation allows a detailed analysis of Whisper's performance
and of its adaptation to different acoustic conditions and usage contexts. The project thus addresses the
question of the reliability and performance of deep learning-based speech recognition systems through the
rigorous quantification of data quality and the application of uncertainty assessment techniques.
This project aims to explore and propose one or more novel uncertainty estimation techniques based on
training data. The effectiveness of the proposed technique(s) will be analyzed on Spanish-language audio
recordings, given the under-representation of this language in the current state of the art in the field.
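
The Monte Carlo Dropout and entropy evaluation mentioned in the description can be illustrated with a minimal Python sketch. It is not the project's implementation and it does not use Whisper: the toy linear classifier, its weights, the feature vector, and the function names (stochastic_forward, mc_dropout_entropy) are all assumptions made for illustration. The idea shown is only the general one: keep dropout active at inference time, run several stochastic forward passes, average the predicted distributions, and use the entropy of that average as an uncertainty score.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy in nats; terms with zero probability contribute 0."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def stochastic_forward(features, weights, drop_rate, rng):
    """One forward pass of a toy linear classifier with dropout left active.

    The classifier is a hypothetical stand-in for a real network.
    """
    keep = 1.0 - drop_rate
    # Inverted dropout: keep each feature with probability `keep` and rescale it.
    masked = [x / keep if rng.random() < keep else 0.0 for x in features]
    logits = [sum(w * x for w, x in zip(row, masked)) for row in weights]
    return softmax(logits)


def mc_dropout_entropy(features, weights, drop_rate=0.5, passes=100, seed=0):
    """Average `passes` stochastic predictions and return (mean_probs, entropy)."""
    rng = random.Random(seed)
    samples = [
        stochastic_forward(features, weights, drop_rate, rng)
        for _ in range(passes)
    ]
    n_classes = len(weights)
    mean_probs = [sum(s[c] for s in samples) / passes for c in range(n_classes)]
    return mean_probs, entropy(mean_probs)


if __name__ == "__main__":
    # Illustrative values only: 3 input features, 3 output classes.
    toy_weights = [[0.8, -0.2, 0.5], [-0.3, 0.9, 0.1], [0.1, 0.1, -0.4]]
    toy_features = [1.0, -0.5, 2.0]
    probs, h = mc_dropout_entropy(toy_features, toy_weights, drop_rate=0.3)
    print("mean probabilities:", probs)
    print("predictive entropy (nats):", h)
```

A higher entropy means the averaged prediction is spread over more classes. In an ASR setting the same calculation would presumably be applied to the per-token output distributions of the model, but how the project does this is not stated in the description.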

General Objective

Evaluate the performance of the Whisper speech recognition system on the audio transcription task, using epistemic uncertainty calculations and
audio quality metrics.

Research Lines

Evaluate the performance of the Whisper speech recognition system on the audio transcription task, using epistemic uncertainty calculations and
audio quality metrics.
Status: Finished
Effective start/end date: 1/01/24 to 31/12/25

Keywords

  • Deep Learning
  • Artificial Intelligence
  • ASR
  • Whisper
