Tackling Language Modelling Bias in Support of Linguistic Diversity

Gábor Bella; Paula Helm; Gertraud Koch; Fausto Giunchiglia

Conference paper. Year: 2024

Gábor Bella

Role: Corresponding author
PersonId: 1343031
IdHAL: gabor-bella
ORCID: 0000-0002-3868-1740

Equipe DECIDE

Département de Science des Données

Paula Helm

Role: Author
PersonId: 1343029
ORCID: 0000-0002-2719-9721

University of Amsterdam [Amsterdam] = Universiteit van Amsterdam

Gertraud Koch

Role: Author

University of Hamburg

Fausto Giunchiglia

Role: Author
PersonId: 1019906

Università degli Studi di Trento = University of Trento

Abstract

Current AI-based language technologies (language models, machine translation systems, multilingual dictionaries and corpora) are known to focus on the world’s 2-3% most widely spoken languages. Research efforts of the past decade have attempted to expand this coverage to ‘under-resourced languages.’ The goal of our paper is to bring attention to a corollary phenomenon that we call language modelling bias: multilingual language processing systems often exhibit a hardwired, yet usually involuntary and hidden representational preference towards certain languages. We define language modelling bias as uneven per-language performance under similar test conditions. We show that bias stems not only from technology but also from ethically problematic research and development methodologies that disregard the needs of language communities. Moving towards diversity-aware alternatives, we present an initiative that aims at reducing language modelling bias within lexical resources through both technology design and methodology, based on an eye-level collaboration with local communities.

Keywords

language modeling bias; linguistic diversity; low-resource languages; natural language processing; Value-sensitive design

Domains

Artificial Intelligence [cs.AI]; Computation and Language [cs.CL]; Ethics

Main file

FAccT_PREPRINT__Tackling_Language_Modelling_Bias_to_Support_Linguistic_Diversity-1.pdf (543.8 KB)

Origin: Files produced by the author(s)

Contributor: Gábor Bella

https://hal.science/hal-04564896

Submitted on: Wednesday, 1 May 2024, 00:28:29

Last modified on: Monday, 4 November 2024, 16:36:05

Dates and versions

hal-04564896, version 1 (1 May 2024)

Licence

Attribution - NonCommercial - NoDerivatives

Identifiers

HAL Id: hal-04564896, version 1

Cite

Gábor Bella, Paula Helm, Gertraud Koch, Fausto Giunchiglia. Tackling Language Modelling Bias in Support of Linguistic Diversity. FAccT 2024, ACM, Jun 2024, Rio de Janeiro, Brazil. ⟨hal-04564896⟩

Collections

UNIV-BREST CNRS LAB-STICC_UBO ETHIQUE ENIB LAB-STICC LAB-STICC_DECIDE LAB-STICC_DMID

83 views

266 downloads
