A. Agarwal, D. Kranz, and V. Natarajan, Automatic Partitioning of Parallel Loops and Data Arrays for Distributed Shared-Memory Multiprocessors, IEEE Transactions on Parallel and Distributed Systems, vol.6, issue.9, pp.943-962, 1995.

J. M. Anderson and M. S. Lam, Global Optimizations for Parallelism and Locality on Scalable Parallel Machines, Proceedings of the ACM SIGPLAN '93 Conference on Programming Language Design and Implementation, pp.112-125, 1993.

C. Calvin and F. Desprez, Minimizing Communication Overhead Using Pipelining for Multi-Dimensional FFT on Distributed Memory Machines, Parallel Computing'93, pp.65-72, 1993.

The general problem of determining optimal permutations in the multi-dimensional case is very hard.

W. H. Chou and S. Y. Kung, Scheduling Partitioned Algorithms on Processor Arrays with Limited Communication Supports, Proceedings of the International Conference on Application Specific Array Processors (ASAP), pp.53-64, 1993.

S. Coleman and K. McKinley, Tile Size Selection using Cache Organization and Data Layout, Proceedings of the ACM SIGPLAN '95 Conference on Programming Language Design and Implementation, vol.30, pp.279-290, 1995.

F. Desprez, P. Ramet, and J. Roman, Optimal Grain Size Computation for Pipelined Algorithms, Europar'96 Parallel Processing, volume 1123 of Lecture Notes in Computer Science, pp.165-172, 1996.
URL : https://hal.archives-ouvertes.fr/inria-00346485

F. Desprez, J. Dongarra, F. Rastello, and Y. Robert, Determining the Idle Time of a Tiling: New Results, Proceedings of the Conference on Parallel Architectures and Compilation Techniques (PACT '97), pp.307-321, 1997.

E. H. D'Hollander, Partitioning and Labeling of Loops by Unimodular Transformations, IEEE Transactions on Parallel and Distributed Systems, vol.3, issue.4, pp.465-476, 1992.

M. Dion, Alignement et Distribution en Parallélisation Automatique, 1996.

M. Dion, T. Risset, and Y. Robert, Resource-constrained Scheduling of Partitioned Algorithms on Processor Arrays, Proceedings of Euromicro Workshop on Parallel and Distributed Processing, pp.571-580, 1995.

F. Irigoin and R. Triolet, Supernode Partitioning, 15th Symposium on Principles of Programming Languages, pp.319-329, 1988.

W. K. Kaplow and B. K. Szymanski, Tiling for Parallel Execution - Optimizing Node Cache Performance, Workshop on Challenges in Compiling for Scaleable Parallel Systems, Eighth IEEE Symposium on Parallel and Distributed Processing, 1996.

S. Y. Kung, VLSI Array Processors, 1988.

W. Li, Compiler Cache Optimizations for Banded Matrix Problems, Conference proceedings of the 1995 International Conference on Supercomputing, pp.21-30, 1995.

MPI Forum, MPI: A Message Passing Interface Standard, 1995.

H. Ohta, Y. Saito, M. Kainaga, and H. Ono, Optimal Tile Size Adjustment in Compiling General DOACROSS Loop Nests, Conference proceedings of the 1995 International Conference on Supercomputing, pp.270-279, 1995.

J. Ramanujam and P. Sadayappan, Tiling Multidimensional Iteration Spaces for Multicomputers, Journal of Parallel and Distributed Computing, vol.16, pp.108-120, 1992.

F. Rastello, A. Rao, and S. Pande, Optimal Task Ordering in Linear Tiles for Minimizing Loop Completion Time, 1998.

Stanford University, This manual is a part of the SUIF compiler documentation set, 1994.

P. Tang and J. N. Zigman, Reducing Data Communication Overhead for DOACROSS Loop Nests, Conference proceedings of the 1994 International Conference on Supercomputing, pp.44-53, 1994.
DOI : 10.1145/181181.181261

M. E. Wolf and M. S. Lam, A Data Locality Optimizing Algorithm, Proceedings of the ACM SIGPLAN '91 Conference on Programming Language Design and Implementation, pp.30-44, 1991.

M. J. Wolfe, More Iteration Space Tiling, Proceedings of Supercomputing '89, pp.655-664, 1989.

J. Xue, On Tiling as a Loop Transformation, Parallel Processing Letters, vol.7, issue.4, pp.409-424, 1997.