TIB: A Dataset for Abstractive Summarization of Long Multimodal Videoconference Records

Théo Gigant; Frédéric Dufaux; Camille Guinaudeau; Marc Decombas

doi:10.1145/3617233.3617238

Conference paper, Year: 2023


Théo Gigant

Role: Author
PersonId: 1261551
IdHAL: gigant
ORCID: 0009-0003-6392-8519

Laboratoire des signaux et systèmes

JustAI

Frédéric Dufaux

Role: Author
PersonId: 11239
IdHAL: fdufaux
ORCID: 0000-0001-6388-4112
IdRef: 169586170

Laboratoire des signaux et systèmes

Camille Guinaudeau

Role: Author
PersonId: 20609
IdHAL: camille-guinaudeau
ORCID: 0000-0001-7249-8715
IdRef: 173844340

Laboratoire Interdisciplinaire des Sciences du Numérique

Traitement du Langage Parlé - LISN

Marc Decombas

Role: Author
PersonId: 931123

JustAI

Abstract

Large language models and multimodal language-vision models give impressive results on currently available summarization benchmarks, but they are not designed to handle long multimodal documents. Most summarization datasets consist of either mono-modal documents or short multimodal documents. To support the development of models that can understand and summarize real-world videoconference records, which are typically around one hour long, we propose a dataset of 9,103 videoconference records extracted from the archive of the German National Library of Science and Technology (TIB), along with their abstracts. Additionally, we process the content with automatic tools to provide transcripts and key frames. Finally, we present abstractive summarization experiments that serve as a baseline for future research on multimodal approaches.

Keywords

multimedia dataset; multimodal documents; automatic summarization

Domains

Multimedia [cs.MM]; Artificial Intelligence [cs.AI]

Main file

tib_dataset_preprint_230728.pdf (2.68 MB)

Origin: Files produced by the author(s)

Contributor: Frédéric Dufaux

https://hal.science/hal-04168911

Submitted on: Friday, 28 July 2023, 15:01:16

Last modified on: Tuesday, 16 July 2024, 11:56:03

Long-term archiving on: Sunday, 29 October 2023, 18:34:57

Dates and versions

hal-04168911, version 1 (28-07-2023)

Identifiers

HAL Id: hal-04168911, version 1
DOI: 10.1145/3617233.3617238

Cite

Théo Gigant, Frédéric Dufaux, Camille Guinaudeau, Marc Decombas. TIB: A Dataset for Abstractive Summarization of Long Multimodal Videoconference Records. 20th International Conference on Content-based Multimedia Indexing (CBMI 2023), ACM, Sep 2023, Orléans, France. ⟨10.1145/3617233.3617238⟩. ⟨hal-04168911⟩


Collections

CNRS INRIA SUP_LSS SUP_TELECOMS CENTRALESUPELEC UNIV-PARIS-SACLAY LISN GS-COMPUTER-SCIENCE GS-SPORT-HUMAN-MOVEMENT LISN-TLP HUB-IA

