Iso-level CAFT: how to tackle the combination of communication overhead reduction and fault tolerance scheduling - LARA (open access to scientific and technical reports)
Report (Research Report), Year: 2008

Abstract

To schedule precedence task graphs in a more realistic framework, we introduce an efficient fault-tolerant scheduling algorithm that is both contention-aware and capable of supporting ε arbitrary fail-silent (fail-stop) processor failures. The design of the proposed algorithm, which we call Iso-Level CAFT, is motivated by (i) the search for a better load balance and (ii) the generation of fewer communications. These goals are achieved by scheduling a chunk of ready tasks simultaneously, which provides a global view of the potential communications. Our goal is to minimize the total execution time, or latency, while tolerating an arbitrary number of processor failures. Our approach is based on an active replication scheme to mask failures, so that there is no need to detect and handle such failures. Major achievements include a low complexity and a drastic reduction in the number of additional communications induced by the replication mechanism. The experimental results fully demonstrate the usefulness of Iso-Level CAFT.
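As a rough illustration of two ideas named in the abstract (handling a whole chunk of ready tasks at once, and masking failures by active replication), the Python sketch below groups a task graph into levels and places eps + 1 replicas of each task on distinct processors, which is what tolerating eps fail-stop failures requires. It is not the Iso-Level CAFT algorithm from the report: communication costs and link contention are ignored, the greedy processor choice is a stand-in, and the names `levels`, `schedule`, `eps`, `cost` and `preds` are hypothetical.

```python
from collections import defaultdict


def levels(tasks, preds):
    """Group tasks by depth in the precedence graph (entry tasks are level 0)."""
    depth = {}

    def visit(t):
        if t not in depth:
            depth[t] = 1 + max((visit(p) for p in preds.get(t, ())), default=-1)
        return depth[t]

    groups = defaultdict(list)
    for t in tasks:
        groups[visit(t)].append(t)
    return [groups[k] for k in sorted(groups)]


def schedule(tasks, preds, cost, num_procs, eps):
    """Place eps + 1 replicas of each task on distinct processors, level by level.

    Assumption: communication costs and link contention are ignored.
    """
    if eps + 1 > num_procs:
        raise ValueError("need at least eps + 1 processors")
    free_at = [0.0] * num_procs   # time at which each processor becomes idle
    finish = {}                   # (task, processor) -> finish time of that replica
    placement = {}                # task -> processors holding one of its replicas
    for chunk in levels(tasks, preds):
        # Longest tasks first inside a chunk: a simple load-balancing heuristic.
        for t in sorted(chunk, key=lambda t: -cost[t]):
            # Pessimistic data-ready time: the slowest replica of any predecessor.
            data_ready = max(
                (finish[p, q] for p in preds.get(t, ()) for q in placement[p]),
                default=0.0,
            )
            # Pick the eps + 1 processors on which the replica can start earliest.
            chosen = sorted(
                range(num_procs), key=lambda q: max(free_at[q], data_ready)
            )[: eps + 1]
            for q in chosen:
                start = max(free_at[q], data_ready)
                finish[t, q] = free_at[q] = start + cost[t]
            placement[t] = chosen
    return placement, max(finish.values())


# Hypothetical three-task graph: "c" depends on "a" and "b".
tasks = ["a", "b", "c"]
preds = {"c": ["a", "b"]}
cost = {"a": 2, "b": 1, "c": 1}
placement, makespan = schedule(tasks, preds, cost, num_procs=3, eps=1)
print(placement, makespan)
```

Because every replica runs regardless of failures, no detection step appears anywhere in the sketch. The price is the extra work and, in a real setting, the extra communications between replicas, which the abstract names as what Iso-Level CAFT reduces.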
Main file
LIP-RR_2008-25.pdf (572.31 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-02102781, version 1 (17-04-2019)

Identifiers

  • HAL Id: hal-02102781, version 1

Cite

Anne Benoit, Mourad Hakem, Yves Robert. Iso-level CAFT: how to tackle the combination of communication overhead reduction and fault tolerance scheduling. [Research Report] LIP RR-2008-25, Laboratoire de l'informatique du parallélisme. 2008, 2+16p. ⟨hal-02102781⟩