Abstract
This overview paper describes the No-Audio Multimodal Speech Detection task at MediaEval 2020. As in the previous two editions, participants are asked to estimate the speaking status (i.e., whether a person is speaking or not) of individuals interacting freely during a crowded mingle event, using multimodal data. In contrast to conventional speech detection approaches, no audio is used for this task. Instead, the proposed automatic estimation system must exploit the natural human movements that accompany speech, captured by cameras and wearable sensors. Task participants are provided with cropped videos of individuals while interacting, recorded by an overhead camera, along with the tri-axial acceleration of each individual throughout the event, captured by a single badge-like device hung around the neck. This year's edition also focuses on investigating possible reasons for interpersonal differences in the performances obtained.
| Original language | English |
|---|---|
| Journal | CEUR Workshop Proceedings |
| Volume | 2882 |
| State | Published - 2020 |
| Event | Multimedia Evaluation Benchmark Workshop 2020, MediaEval 2020 - Virtual, Online |
| Duration | 14 Dec 2020 → 15 Dec 2020 |