Improving Speaker Assignment in Speaker-Attributed ASR for Real Meeting Applications - Department of Natural Language Processing & Knowledge Discovery
Conference paper, Year: 2024

Abstract

Past studies on end-to-end meeting transcription have focused on model architecture and have mostly been evaluated on simulated meeting data. We present a novel study that aims to optimize the use of a Speaker-Attributed ASR (SA-ASR) system on real-life meeting data, exemplified by the AMI meeting corpus, for improved speaker assignment of speech segments. First, we propose a pipeline tailored to real-life applications that combines Voice Activity Detection (VAD), Speaker Diarization (SD), and SA-ASR. Second, we advocate fine-tuning the SA-ASR model on VAD output segments, since the model is also applied to VAD segments at test time, and show that this yields a relative reduction in Speaker Error Rate (SER) of up to 28%. Finally, we explore strategies to improve the extraction of the speaker embedding templates that the SA-ASR system takes as inputs. We show that extracting them from SD output rather than from annotated speaker segments yields a relative SER reduction of up to 20%.
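The abstract reports its gains as relative SER reductions. As a minimal sketch of how such a figure is usually computed, assuming the standard definition (baseline minus improved, divided by baseline): the function name and the numbers below are hypothetical illustrations and are not taken from the paper.

```python
def relative_reduction(baseline: float, improved: float) -> float:
    """Return the relative reduction of an error rate as a fraction of the baseline."""
    if baseline <= 0:
        raise ValueError("baseline error rate must be positive")
    return (baseline - improved) / baseline


# Hypothetical error rates for illustration only; NOT results from the paper.
baseline_ser = 10.0
improved_ser = 7.2
print(f"relative SER reduction: {relative_reduction(baseline_ser, improved_ser):.0%}")
```

Note that a relative reduction says nothing about the absolute error rates themselves, so the two should be read together when comparing systems.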
Main file
Odyssey2024_LatexTemplate.pdf (987.94 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-04495886 , version 1 (08-03-2024)
hal-04495886 , version 2 (04-09-2024)

Cite

Can Cui, Imran Ahamad Sheikh, Mostafa Sadeghi, Emmanuel Vincent. Improving Speaker Assignment in Speaker-Attributed ASR for Real Meeting Applications. The Speaker and Language Recognition Workshop Odyssey 2024, Jun 2024, Quebec, Canada. ⟨hal-04495886v2⟩