Iso-level CAFT : how to tackle the combination of communication overhead reduction and fault tolerance scheduling

Abstract : To schedule precedence task graphs in a more realistic framework, we introduce an efficient fault-tolerant scheduling algorithm that is both contention-aware and capable of supporting arbitrary fail-silent (fail-stop) processor failures. The design of the proposed algorithm, which we call Iso-Level CAFT, is motivated by (i) the search for a better load balance and (ii) the generation of fewer communications. These goals are achieved by scheduling a chunk of ready tasks simultaneously, which enables a global view of the potential communications. Our goal is to minimize the total execution time, or latency, while tolerating an arbitrary number of processor failures. Our approach is based on an active replication scheme to mask failures, so that there is no need for detecting and handling such failures. Major achievements include a low complexity and a drastic reduction of the number of additional communications induced by the replication mechanism. The experimental results fully demonstrate the usefulness of Iso-Level CAFT.
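The two ideas named in the abstract, scheduling a whole set of ready tasks at once and masking failures through active replication, can be illustrated with a minimal Python sketch. This is not the Iso-Level CAFT algorithm from the report: it ignores communication costs and contention entirely, and the function name `replicated_level_schedule`, its parameters, and the tie-breaking rules are assumptions made for illustration. It assumes that tolerating `eps` fail-stop failures is done by placing `eps + 1` replicas of each task on distinct processors, and it pessimistically lets a task start only after every replica of every predecessor has finished.

```python
def replicated_level_schedule(deps, cost, num_procs, eps):
    """Schedule a task graph level by level with eps + 1 replicas per task.

    deps: dict mapping each task to the list of its predecessors
    cost: dict mapping each task to its execution time
    Returns a dict: task -> list of (processor, start, finish) replicas.
    Hypothetical illustration only; communications are not modelled.
    """
    if eps + 1 > num_procs:
        raise ValueError("need at least eps + 1 processors")
    proc_free = [0.0] * num_procs   # time at which each processor becomes idle
    replicas = {}                   # tasks scheduled so far
    remaining = set(cost)
    while remaining:
        # One level: every task whose predecessors are all already scheduled.
        ready = sorted(t for t in remaining
                       if all(p in replicas for p in deps.get(t, [])))
        if not ready:
            raise ValueError("dependency cycle in task graph")
        # Handle the whole level together, longest task first.
        for task in sorted(ready, key=lambda t: -cost[t]):
            # Pessimistic data-ready time: latest finish among all replicas
            # of all predecessors, so any surviving replica has the data.
            data_ready = max(
                (finish for p in deps.get(task, [])
                 for _, _, finish in replicas[p]),
                default=0.0,
            )
            # Pick the eps + 1 distinct processors with the earliest start.
            by_start = sorted(
                range(num_procs),
                key=lambda p: (max(proc_free[p], data_ready), p),
            )
            placed = []
            for p in by_start[:eps + 1]:
                start = max(proc_free[p], data_ready)
                proc_free[p] = start + cost[task]
                placed.append((p, start, proc_free[p]))
            replicas[task] = placed
        remaining.difference_update(ready)
    return replicas


# Small example: c depends on a and b, one failure tolerated on 3 processors.
deps = {"a": [], "b": [], "c": ["a", "b"]}
cost = {"a": 2, "b": 3, "c": 1}
schedule = replicated_level_schedule(deps, cost, num_procs=3, eps=1)
makespan = max(finish for reps in schedule.values() for _, _, finish in reps)
```

Because every task has replicas on `eps + 1` distinct processors, the loss of any `eps` processors still leaves one copy of each task, which is the sense in which active replication masks failures without detection. The report's contribution, reducing the extra communications that such replication induces, is outside the scope of this sketch.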
Document type :
Reports

https://hal-lara.archives-ouvertes.fr/hal-02102781
Contributor : Colette Orange
Submitted on : Wednesday, April 17, 2019 - 4:15:32 PM
Last modification on : Friday, April 19, 2019 - 1:38:14 AM

File

LIP-RR_2008-25.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : hal-02102781, version 1

Citation

Anne Benoit, Mourad Hakem, Yves Robert. Iso-level CAFT : how to tackle the combination of communication overhead reduction and fault tolerance scheduling. [Research Report] LIP RR-2008-25, Laboratoire de l'informatique du parallélisme. 2008, 2+16p. ⟨hal-02102781⟩

Metrics

Record views : 15
File downloads : 24