A Review of Data Placement and Replication Strategies Based on Machine Learning - Optimisation Dynamique de Requêtes Réparties à grande échelle
Conference paper, Year: 2024

Abstract

The global increase in data volumes has created a need for scalable distributed systems that can provide a satisfactory quality of service. Data placement and replication are well-known techniques that improve performance, fault tolerance and availability. These techniques often rely on threshold-based activation mechanisms whose appropriate settings vary with the nature of the workload and the underlying system architecture. Hence, setting and adjusting those thresholds usually requires human intervention. In this context, machine learning offers a promising way to define such thresholds automatically and adapt them to different workloads and architectures. In this paper, we study the data placement and replication strategies proposed in the literature that employ machine learning. We classify these strategies by the machine learning method used, the platform on which they are deployed, their dynamicity and the objectives they achieve. We describe the approach applied by each strategy as well as its possible limitations. In addition, we provide insights into the experimental environments and metrics used to evaluate the strategies. We highlight the need to design data placement and replication strategies that respond better to the modern needs of distributed systems. We also motivate the use of machine learning to achieve autonomy in distributed systems.

Main file
A_Review_of_Data_Placement_and_Replication_Strategies_Based_on_Machine_Learning__Camera_Ready___v1.pdf (333.3 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-04721852, version 1 (07-10-2024)

Identifiers

  • HAL Id: hal-04721852, version 1

Cite

Amir Najjar, Riad Mokadem, Jean-Marc Pierson. A Review of Data Placement and Replication Strategies Based on Machine Learning. The 30th International Conference on Parallel and Distributed Systems (ICPADS 2024), Oct 2024, Belgrade, Serbia. ⟨hal-04721852⟩