How fast can one resize a distributed file system? - Joint Laboratory on Extreme Scale Computing
Journal article, Journal of Parallel and Distributed Computing, Year: 2020

Abstract

Efficient resource utilization becomes a major concern as large-scale distributed computing infrastructures keep growing in size. Malleability, the ability of resource managers to dynamically increase or decrease the amount of resources allocated to a job, is a promising way to save energy and costs. However, state-of-the-art parallel and distributed storage systems have not been designed with malleability in mind, mainly because of the supposedly high cost of the data transfers required by resizing operations. Nevertheless, as network and storage technologies evolve, old assumptions about potential bottlenecks can be revisited. In this study, we evaluate the viability of malleability as a design principle for a distributed storage system. Specifically, we model the minimal duration of the commission and decommission operations. To show how our models can be used in practice, we evaluate the performance of these operations in HDFS, a relevant state-of-the-art distributed file system. We show that the existing decommission mechanism of HDFS performs well when the network is the bottleneck, but can be accelerated by up to a factor of 3 when storage is the limiting factor. We also show that the commission in HDFS can be substantially accelerated. With the insights provided by our model, we suggest improvements to speed up both operations in HDFS. We discuss how the proposed models can be generalized to distributed file systems with different assumptions, and what perspectives are open for the design of efficient malleable distributed file systems.
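To make the idea of a bottleneck-dependent minimal duration concrete, here is a minimal sketch of a back-of-the-envelope bound for a decommission. It is not the model from the paper: the function name, its parameters, and the simplifying assumptions (the data to move exists only on the leaving nodes, every node has the same bandwidths, and the load is perfectly balanced) are all hypothetical choices made for illustration.

```python
def decommission_lower_bound(data_bytes, leaving, remaining,
                             net_bw, read_bw, write_bw):
    """Illustrative lower bound (in seconds) on a decommission.

    Hypothetical simplifications, not taken from the paper:
    - data_bytes is stored only on the leaving nodes (no other replicas),
    - all nodes share the same per-node bandwidths (bytes per second),
    - transfers are perfectly balanced across nodes.
    """
    if leaving <= 0 or remaining <= 0:
        raise ValueError("need at least one leaving and one remaining node")
    # Each leaving node can emit no faster than its disk read or its network.
    send_rate = leaving * min(net_bw, read_bw)
    # Each remaining node can absorb no faster than its network or disk write.
    recv_rate = remaining * min(net_bw, write_bw)
    # The slower side of the transfer bounds the whole operation.
    return data_bytes / min(send_rate, recv_rate)
```

Under these assumptions, whichever of the four aggregate rates is smallest determines the bound, which is one way to read the distinction the abstract draws between network-bound and storage-bound resizing.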
Main file
JPDC-Cheriere-Dorier-Antoniu-2020.pdf (326.49 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-02961875 , version 1 (08-10-2020)

Cite

Nathanael Cheriere, Matthieu Dorier, Gabriel Antoniu. How fast can one resize a distributed file system? Journal of Parallel and Distributed Computing, 2020, 140, pp.80-98. ⟨10.1016/j.jpdc.2020.02.001⟩. ⟨hal-02961875⟩
150 views
185 downloads