NEMO 4.0 performance: how to identify and reduce unnecessary communications
Abstract
A non-intrusive instrumentation of the NEMO code and the development of a simplified configuration (called BENCH) provided information about the cost and structure of MPI communications. This helped us identify the most appropriate incremental developments that the model needs to enhance its scalability. We prioritised the reduction of the extra calculations and communications required at the North Polar folding, the grouping of boundary exchanges, and the replacement of global communications by alternative algorithms. An appreciable speed-up (x2 in some cases) is measured. The scalability limit is pushed below a size of 7x7 grid points per sub-domain, showing that the limitation of the North Polar folding solution is comparable to the presumed limitation of an icosahedral grid. We consider that neither scalability nor an increase in horizontal resolution is the major source of future performance gains, whereas the potential of additional developments that accelerate cache access (horizontal domain tiling and single-precision computations) is evaluated favourably. Noting the limited technological gain between the two machines we operated, one 6 months old and the other 4 years and 6 months old, and in order to avoid a future net decrease in computing performance, we recommend that new expensive code developments be limited from now on to what technology and engineering are able to sustain.
Domains
Climatology
Origin: Files produced by the author(s)