Tackling Language Modelling Bias in Support of Linguistic Diversity - Equipe DECIDE, from data to decision
Conference paper. Year: 2024

Tackling Language Modelling Bias in Support of Linguistic Diversity

Abstract

Current AI-based language technologies (language models, machine translation systems, multilingual dictionaries and corpora) are known to focus on the world’s 2-3% most widely spoken languages. Research efforts of the past decade have attempted to expand this coverage to ‘under-resourced languages.’ The goal of our paper is to bring attention to a corollary phenomenon that we call language modelling bias: multilingual language processing systems often exhibit a hardwired, yet usually involuntary and hidden representational preference towards certain languages. We define language modelling bias as uneven per-language performance under similar test conditions. We show that bias stems not only from technology but also from ethically problematic research and development methodologies that disregard the needs of language communities. Moving towards diversity-aware alternatives, we present an initiative that aims at reducing language modelling bias within lexical resources through both technology design and methodology, based on an eye-level collaboration with local communities.
Main file
FAccT_PREPRINT__Tackling_Language_Modelling_Bias_to_Support_Linguistic_Diversity-1.pdf (543.8 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-04564896, version 1 (01-05-2024)

Identifiers

  • HAL Id: hal-04564896, version 1

Cite

Gábor Bella, Paula Helm, Gertraud Koch, Fausto Giunchiglia. Tackling Language Modelling Bias in Support of Linguistic Diversity. FAccT 2024, ACM, Jun 2024, Rio de Janeiro, Brazil. ⟨hal-04564896⟩