… defined in F) and one just before the ENDDO (the tail of I_j), is an optimal solution for F′. No additional barrier is needed compared to F if and only if …

… If F has no optimal barrier placement with a barrier just before the ENDDO of L, a rightmost placement is obtained by placing a barrier just before the tail of each interval in GD(i) (plus an extra barrier before the tail of LAST(i) if LAST(i) = i), where I_i is the interval with rightmost tail.
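The placement rule above amounts to a greedy stabbing of dependence intervals by their tails. Below is a minimal sketch, not the paper's GD/LAST construction: it assumes each dependence inside one loop body is given as an interval (head, tail) over integer program points, that a barrier at point p cuts an interval when head < p <= tail, and that every interval must be cut; the name rightmost_barriers and the coordinates are illustrative.

def rightmost_barriers(intervals):
    """Place each barrier as far right as possible while keeping the count minimum."""
    barriers = []
    # Greedy by increasing tail: a barrier just before the smallest remaining
    # tail cuts every interval that contains that point.
    for head, tail in sorted(intervals, key=lambda iv: iv[1]):
        if barriers and head < barriers[-1]:   # already cut by the last barrier
            continue
        barriers.append(tail)                  # barrier just before this tail
    return barriers

if __name__ == "__main__":
    # Three dependences: the first two can share one barrier, the third cannot.
    F = [(1, 4), (2, 5), (6, 9)]
    print(rightmost_barriers(F))               # [4, 9]

Sorting by tails makes each barrier land at the latest point that still cuts the earliest uncut interval, which is why the positions are as far right as possible while the count stays minimum.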

… This proves that u is to the left of the tail of some interval in a cycle of D. Conversely, if this is the case, there is an optimal solution for F that also cuts I_j, so F′ needs no more barriers than F. In other words, the rightmost barrier in an optimal barrier placement for F is just before the rightmost tail of an interval in a cycle of D; there is no need to consider other intervals.

To study the optimal barrier placements for F in a loop L with respect to an incoming dependence, i.e., a dependence whose tail v is in L, we treat it as an internal dependence I_i = (u, v) for L, where u is just to the right of the DO of L (i.e., to the left of any other endpoint in F), and we study F′ = F ∪ {I_i}, thanks to Lemmas 6 and 7. Below, we assume that I_i does not contain an interval of F (i.e., I_i is minimal in F′); otherwise it is always cut by an optimal barrier placement for F, and the rightmost such placement can be found thanks to Lemmas 6 and 7 applied to F. Note that if I_i is minimal in F′, …

Symmetrically, for the interval I_j whose tail v is just to the left of the ENDDO of L (i.e., to the right of any other endpoint in F), we identify the rightmost position for u for which F′ = F ∪ {I_j} needs no more barriers than F, and not one more. Let D be the graph defined by the function NEXT for F′. Note that i = FIRST(F′) and NEXT(j) = i. Suppose that the minimum number of barriers for F is enough for F′, i.e., that F and F′ need the same minimum number of barriers …
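To illustrate the reduction used above, here is a small follow-up sketch under the same interval model as the previous example: an incoming dependence whose tail v lies in L is modeled as an internal interval whose head is pinned just to the right of the DO (position 0, to the left of every other endpoint), and F′ needs either the same minimum number of barriers as F or exactly one more. The helper simply compares the two optima with the same greedy count; it does not reimplement the paper's NEXT/FIRST/LAST analysis, and all names are illustrative.

def min_barriers(intervals):
    """Minimum number of barriers: greedy stabbing by increasing tail."""
    count, last = 0, None
    for head, tail in sorted(intervals, key=lambda iv: iv[1]):
        if last is None or head >= last:   # not cut by the last barrier placed
            count, last = count + 1, tail  # place a barrier just before this tail
    return count

def needs_extra_barrier(F, v, do_position=0):
    """True iff F' = F + [(DO, v)] needs one more barrier than F."""
    incoming = (do_position, v)            # head pinned just to the right of the DO
    return min_barriers(F + [incoming]) == min_barriers(F) + 1

if __name__ == "__main__":
    F = [(1, 4), (2, 5), (6, 9)]
    print(needs_extra_barrier(F, v=3))     # False: a barrier at point 3 also cuts (1,4) and (2,5)
    print(needs_extra_barrier(F, v=1))     # True: cutting (0,1) needs a barrier at point 1, which cuts nothing in F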
