No-audio multimodal speech detection task at MediaEval 2020

Research output: Contribution to journal › Conference article › peer-review

Abstract

This overview paper describes the No-Audio multimodal speech detection task at MediaEval 2020. As in the previous two editions, task participants are asked to estimate the speaking status (i.e. whether a person is speaking or not) of individuals interacting freely during a crowded mingle event, from multimodal data. In contrast to conventional speech detection approaches, no audio is used for this task. Instead, the proposed automatic estimation systems must exploit the natural human movements that accompany speech, captured by cameras and wearable sensors. Task participants are provided with cropped videos of individuals while interacting, captured by an overhead camera, and the tri-axial acceleration of each individual throughout the event, captured with a single badge-like device hung around the neck. This year's edition of the task also focuses on investigating possible reasons for interpersonal differences in the performances obtained.
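The core estimation problem described above maps tri-axial wearable acceleration to a binary speaking-status label per time window. The following is a minimal illustrative sketch only, not the task's baseline: the windowing scheme, the movement-energy feature, and the variance threshold are all assumptions made here for illustration.

```python
import numpy as np

def speaking_status(accel, window=20, threshold=0.1):
    """Illustrative sketch: label high-movement windows as 'speaking'.

    accel: array of shape (N, 3), tri-axial acceleration samples.
    Returns one binary label per non-overlapping window of `window` samples.
    The variance threshold is an arbitrary placeholder, not a value
    provided by the task.
    """
    magnitude = np.linalg.norm(accel, axis=1)        # combine the three axes
    n = len(magnitude) // window
    windows = magnitude[: n * window].reshape(n, window)
    energy = windows.var(axis=1)                     # movement energy per window
    return (energy > threshold).astype(int)

# Toy example: a still segment followed by a high-movement segment.
rng = np.random.default_rng(0)
still = rng.normal(0.0, 0.05, size=(40, 3))
moving = rng.normal(0.0, 1.0, size=(40, 3))
labels = speaking_status(np.vstack([still, moving]))
```

A real submission would instead learn this mapping (and fuse it with the video modality), but the sketch shows the input/output shape of the problem: a sensor stream in, a per-window binary speaking label out.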

Original language: English
Journal: CEUR Workshop Proceedings
Volume: 2882
State: Published - 2020
Event: Multimedia Evaluation Benchmark Workshop 2020, MediaEval 2020 - Virtual, Online
Duration: 14 Dec 2020 - 15 Dec 2020
