Iso-level CAFT: how to tackle the combination of communication overhead reduction and fault tolerance scheduling
Abstract
To schedule precedence task graphs in a more realistic framework, we introduce an efficient fault-tolerant scheduling algorithm that is both contention-aware and capable of supporting arbitrary fail-silent (fail-stop) processor failures. The design of the proposed algorithm, which we call Iso-Level CAFT, is motivated by (i) the search for a better load balance and (ii) the generation of fewer communications. These goals are achieved by scheduling a chunk of ready tasks simultaneously, which gives a global view of the potential communications. Our goal is to minimize the total execution time, or latency, while tolerating an arbitrary number of processor failures. Our approach is based on an active replication scheme to mask failures, so that there is no need to detect and handle such failures. Major achievements include low complexity and a drastic reduction in the number of additional communications induced by the replication mechanism. The experimental results demonstrate the usefulness of Iso-Level CAFT.
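The two ideas named in the abstract, scheduling a chunk of ready tasks together and masking failures by active replication, can be illustrated with a small Python sketch. It is not the Iso-Level CAFT algorithm itself: it ignores communication costs and contention entirely, and the names `iso_levels`, `replicate`, `eps` (the number of failures to tolerate) and the least-loaded placement rule are assumptions made for illustration only.

```python
from collections import defaultdict


def iso_levels(succ):
    """Group the tasks of a DAG by level: entry tasks are at level 0 and
    every other task sits one level below its deepest predecessor."""
    tasks = set(succ) | {v for vs in succ.values() for v in vs}
    indeg = {t: 0 for t in tasks}
    for vs in succ.values():
        for v in vs:
            indeg[v] += 1
    level = {t: 0 for t in tasks}
    ready = sorted(t for t in tasks if indeg[t] == 0)
    order = []
    while ready:  # Kahn-style topological traversal
        t = ready.pop()
        order.append(t)
        for v in succ.get(t, ()):
            level[v] = max(level[v], level[t] + 1)
            indeg[v] -= 1
            if indeg[v] == 0:
                ready.append(v)
    groups = defaultdict(list)
    for t in order:
        groups[level[t]].append(t)
    return [sorted(groups[lv]) for lv in sorted(groups)]


def replicate(levels, cost, num_procs, eps):
    """Place eps + 1 replicas of every task on distinct processors.

    Hypothetical placement rule (not from the paper): within each level,
    take the costliest task first and put its replicas on the currently
    least-loaded processors. Communications are ignored.
    """
    if eps + 1 > num_procs:
        raise ValueError("need at least eps + 1 processors")
    load = [0.0] * num_procs
    placement = {}
    for chunk in levels:
        for t in sorted(chunk, key=lambda t: -cost[t]):
            procs = sorted(range(num_procs), key=lambda p: (load[p], p))[: eps + 1]
            for p in procs:
                load[p] += cost[t]
            placement[t] = sorted(procs)
    return placement, load


# Diamond-shaped task graph: a -> {b, c} -> d (made-up costs).
succ = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
cost = {"a": 2, "b": 3, "c": 1, "d": 2}
levels = iso_levels(succ)
placement, load = replicate(levels, cost, num_procs=3, eps=1)
print(levels)
print(placement)
```

Because every task runs on `eps + 1` distinct processors, any `eps` fail-stop failures still leave at least one replica of each task alive, which is why no failure detection is needed in an active replication scheme. Handling a whole level at once is what lets a scheduler compare placements across the ready tasks before committing to any of them.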
Origin | Files produced by the author(s) |
---|