LeBenchmark: A Reproducible Framework for Assessing Self-Supervised Representation Learning from Speech - Multidisciplinary Institute in Artificial intelligence - Grenoble Alpes

Conference Papers Year : 2021

Solène Evain

Function : Author
PersonId : 737268
IdHAL : solene-evain
ORCID : 0000-0003-1766-8894

Groupe d’Étude en Traduction Automatique/Traitement Automatisé des Langues et de la Parole

Ha Nguyen

Function : Author

Groupe d’Étude en Traduction Automatique/Traitement Automatisé des Langues et de la Parole

Laboratoire Informatique d'Avignon

Hang Le

Function : Author

Groupe d’Étude en Traduction Automatique/Traitement Automatisé des Langues et de la Parole

Marcely Zanon Boito

Function : Author
PersonId : 752406
IdHAL : mzboito

Groupe d’Étude en Traduction Automatique/Traitement Automatisé des Langues et de la Parole

Salima Mdhaffar

Function : Author

Laboratoire Informatique d'Avignon

Sina Alisamir

Function : Author

Groupe d’Étude en Traduction Automatique/Traitement Automatisé des Langues et de la Parole

Ziyi Tong

Function : Author

Groupe d’Étude en Traduction Automatique/Traitement Automatisé des Langues et de la Parole

Natalia Tomashenko

Function : Author
PersonId : 17002
IdHAL : natalia-tomashenko
IdRef : 223393304

Laboratoire Informatique d'Avignon

Marco Dinarelli

Function : Author
PersonId : 12699
IdHAL : marco-dinarelli
IdRef : 22461939X

Groupe d’Étude en Traduction Automatique/Traitement Automatisé des Langues et de la Parole

Titouan Parcollet

Function : Author
PersonId : 174514
IdHAL : titouan-parcollet
ORCID : 0000-0003-0672-1346

Laboratoire Informatique d'Avignon

Alexandre Allauzen

Function : Author
PersonId : 171266
IdHAL : alexandre-allauzen
IdRef : 078187621

Laboratoire d'analyse et modélisation de systèmes pour l'aide à la décision

Yannick Estève

Function : Author
PersonId : 11645
IdHAL : yannick-esteve
ORCID : 0000-0002-3656-8883
IdRef : 070531668

Laboratoire Informatique d'Avignon

Benjamin Lecouteux

Function : Author
PersonId : 7847
IdHAL : benjamin-lecouteux
ORCID : 0000-0003-3000-6190
IdRef : 135355060

Groupe d’Étude en Traduction Automatique/Traitement Automatisé des Langues et de la Parole

François Portet

Function : Author
PersonId : 1069
IdHAL : francois-portet
ORCID : 0000-0003-2542-0661
IdRef : 098179160

Groupe d’Étude en Traduction Automatique/Traitement Automatisé des Langues et de la Parole

Solange Rossato

Function : Author
PersonId : 746390
IdHAL : solange-rossato

Groupe d’Étude en Traduction Automatique/Traitement Automatisé des Langues et de la Parole

Fabien Ringeval

Function : Author
PersonId : 13134
IdHAL : fabien-ringeval
ORCID : 0000-0002-9213-4529
IdRef : 154573078

Groupe d’Étude en Traduction Automatique/Traitement Automatisé des Langues et de la Parole

Didier Schwab

Function : Author
PersonId : 4261
IdHAL : didier-schwab
ORCID : 0000-0002-2462-8148
IdRef : 069192359

Université Grenoble Alpes

Laboratoire d'Informatique de Grenoble

Groupe d’Étude en Traduction Automatique/Traitement Automatisé des Langues et de la Parole

Laurent Besacier

Function : Author

Naver Labs Europe [Meylan]

Abstract

Self-Supervised Learning (SSL) using large amounts of unlabeled data has been successfully explored for image and natural language processing. Recent works have also investigated SSL from speech and were notably successful at improving performance on downstream tasks such as automatic speech recognition (ASR). While these works suggest that it is possible to reduce dependence on labeled data for building efficient speech systems, their evaluation was mostly done on ASR, using multiple heterogeneous experimental settings (most of them for English). This calls into question the objective comparison of SSL approaches and the evaluation of their impact on building speech systems. In this paper, we propose LeBenchmark: a reproducible framework for assessing SSL from speech. It includes not only ASR (high- and low-resource) tasks but also spoken language understanding, speech translation, and emotion recognition. We also focus on speech technologies in a language other than English: French. SSL models of different sizes are trained on carefully sourced and documented datasets. Experiments show that SSL is beneficial for most but not all tasks, which confirms the need for exhaustive and reliable benchmarks to evaluate its real impact. LeBenchmark is shared with the scientific community for reproducible research in SSL from speech.

Keywords

Self-Supervised Representation Learning; ASR; SLU; Speech Translation; Automatic Emotion Recognition

Domains

Artificial Intelligence [cs.AI]

Main file

FLOWBERT_IS2021(2).pdf (164.81 KB)

Origin : Files produced by the author(s)

https://hal.science/hal-03317730

Submitted on : Thursday, November 25, 2021, 12:19:49 PM

Last modification on : Thursday, April 4, 2024, 6:24:01 PM

Dates and versions

hal-03317730 , version 1 (07-08-2021)

hal-03317730 , version 2 (05-11-2021)

hal-03317730 , version 3 (25-11-2021)

Identifiers

HAL Id : hal-03317730 , version 3

Cite

Solène Evain, Ha Nguyen, Hang Le, Marcely Zanon Boito, Salima Mdhaffar, et al. LeBenchmark: A Reproducible Framework for Assessing Self-Supervised Representation Learning from Speech. INTERSPEECH 2021: Conference of the International Speech Communication Association, Aug 2021, Brno, Czech Republic. ⟨hal-03317730v3⟩

Collections

UNIV-AVIGNON UGA CNRS UNIV-DAUPHINE LIG LIG_TDCGE_GETALP LAMSADE-DAUPHINE PSL LIA MIAI ANR LIG_SIDCH
