Large-scale Machine-Learning analysis of scientific PDF for monitoring the production and the openness of research data and software in France - Productions du Comité pour la science ouverte
Preprint, Working Paper. Year: 2023

Abstract

There is currently no standard way to reference research datasets and software in scientific communication. Emerging editorial workflows and supporting infrastructures dedicated to research datasets and software remain poorly adopted in current publishing practices and are highly fragmented. To better track the production of research datasets and software, we present a text mining method applied to scientific publications at scale and implemented at the French national level. Our approach relies on state-of-the-art Machine Learning and document engineering techniques to ensure reliable accuracy across multiple research areas and document types. The annotations produced by our system are used by the French Open Science Monitor (BSO) platform to track the production and openness of research datasets and software, in the context of the second National Plan for Open Science. The source code and data of the French Open Science Monitor, as well as the associated tools and training data, are available under open licences.
Main file
BSO3_preprint_20230625.pdf (843.22 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-04121339, version 1 (09-06-2023)
hal-04121339, version 2 (13-06-2023)
hal-04121339, version 3 (25-06-2023)

Identifiers

  • HAL Id: hal-04121339, version 3

Cite

Aricia Bassinet, Laetitia Bracco, Anne L'Hôte, Eric Jeangirard, Patrice Lopez, et al. Large-scale Machine-Learning analysis of scientific PDF for monitoring the production and the openness of research data and software in France. 2023. ⟨hal-04121339v3⟩
2246 views
450 downloads
