
To Be or Not To Be a Verbal Multiword Expression: A Quest for Discriminating Features

Caroline Pasquer 1 Agata Savary 1 Jean-Yves Antoine 1 Carlos Ramisch 2 Nicolas Labroche 1 Arnaud Giacometti 1
1 BDTLN - Bases de données et traitement des langues naturelles
LIFAT - Laboratoire d'Informatique Fondamentale et Appliquée de Tours
2 TALEP - Traitement Automatique du Langage Ecrit et Parlé
LIS - Laboratoire d'Informatique et Systèmes
Abstract : Automatic identification of multiword expressions (MWEs) is a prerequisite for semantically-oriented downstream applications. This task is challenging because MWEs, especially verbal ones (VMWEs), exhibit surface variability. However, this variability is usually more restricted than in regular (non-VMWE) constructions, which leads to various variability profiles. We use this fact to determine the optimal set of features which could be used in a supervised classification setting to solve a subproblem of VMWE identification: the identification of occurrences of previously seen VMWEs. Surprisingly, a simple custom frequency-based feature selection method proves more efficient than other standard methods such as the Chi-squared test, information gain or decision trees. An SVM classifier using the optimal set of only 6 features outperforms the best systems from a recent shared task on the French seen data.
Document type :
Preprints, Working Papers, ...
Contributor : Caroline Pasquer
Submitted on : Thursday, July 23, 2020 - 6:26:36 PM
Last modification on : Monday, December 14, 2020 - 5:38:41 PM
Long-term archiving on : Tuesday, December 1, 2020 - 6:28:59 AM


Files produced by the author(s)


  • HAL Id : hal-02905874, version 1


Caroline Pasquer, Agata Savary, Jean-Yves Antoine, Carlos Ramisch, Nicolas Labroche, et al. To Be or Not To Be a Verbal Multiword Expression: A Quest for Discriminating Features. 2020. ⟨hal-02905874⟩


