Conference paper. Year: 2019

Dirichlet Process Mixture Models made Scalable and Effective by means of Massive Distribution

Abstract

Clustering with accurate results has become a topic of high interest. The Dirichlet Process Mixture (DPM) is a model used for clustering, with the advantage of discovering the number of clusters automatically and offering nice properties such as its potential convergence to the actual clusters in the data. These advantages come at the price of prohibitive response times, which impair its adoption and make centralized DPM approaches inefficient. We propose DC-DPM, a parallel clustering solution that gracefully scales to millions of data points while remaining DPM compliant, which is the challenge of distributing this process. Our experiments, on both synthetic and real-world data, illustrate the high performance of our approach on millions of data points. The centralized algorithm does not scale and reaches its limit at 100K data points, where it needs more than 7 hours. In this case, our approach needs less than 30 seconds.
Main file
ACM_SigConf_SAC2019.pdf (7.13 MB)
Origin: Files produced by the author(s)

Dates and versions

hal-01999453, version 1 (30-01-2019)

Identifiers

HAL Id: hal-01999453
DOI: 10.1145/3297280.3297327

Cite

Khadidja Meguelati, Bénédicte Fontez, Nadine Hilgert, Florent Masseglia. Dirichlet Process Mixture Models made Scalable and Effective by means of Massive Distribution. SAC 2019 - 34th ACM/SIGAPP Symposium on Applied Computing, Apr 2019, Limassol, Cyprus. pp.502-509, ⟨10.1145/3297280.3297327⟩. ⟨hal-01999453⟩
328 views
862 downloads
