Abstract
This overview paper describes the No-Audio multimodal speech detection task for MediaEval 2020. As in the previous two editions, participants are asked to estimate the speaking status (i.e. whether a person is speaking or not) of individuals interacting freely during a crowded mingle event, using multimodal data. In contrast to conventional speech detection approaches, no audio is used for this task. Instead, the proposed automatic estimation system must exploit the natural human movements that accompany speech, captured by cameras and wearable sensors. Task participants are provided with cropped videos of individuals while interacting, captured by an overhead camera, and the tri-axial acceleration of each individual throughout the event, captured with a single badge-like device hung around the neck. This year's edition of the task also focuses on investigating possible reasons for interpersonal differences in the performances obtained.
| Original language | English |
|---|---|
| Publication | CEUR Workshop Proceedings |
| Volume | 2882 |
| Status | Published - 2020 |
| Event | Multimedia Evaluation Benchmark Workshop 2020, MediaEval 2020 - Virtual, Online. Duration: 14 Dec 2020 → 15 Dec 2020 |