, defined in F) and one just before the ENDDO (the tail of I j ), is an optimal solution for F . No additional barrier is needed compared to F if and only

, CIF F has no optimal barrier placement with a barrier just before the ENDDO of L, a rightmost placement is obtained by placing a barrier just before the tail of each interval in GD(i) (plus an extra barrier before the tail of LAST(i) if LAST(i) = i) where I i is the interval with rightmost tail

, This proves that u is to the left of the tail of some interval in a cycle of D. Conversely, if this is the case, there is an optimal solution for F that cuts also I j , thus F needs only ? l (F) barriers. In other words, the rightmost barrier in an optimal barrier placement for F is just before the rightmost tail of an interval in a cycle of D. There is no need to consider other intervals. To study the optimal barrier placements for F in a loop L with respect to an incoming dependence, i.e., a dependence whose tail v is in L, we treat it as an internal dependence I i = (u, v) for L, where u is just to the right of the DO of L (i.e., to the left of any other endpoint in F) and we study F = F ? {I i }, thanks to Lemmas 6 and 7. Below, we assume that I i does not contain an interval in F (i.e., is minimal in F ), otherwise it is always cut by an optimal barrier placement for F, and the rightmost such placement can be found thanks to Lemmas 6 and 7 applied to F. Note that if I i is minimal in F, to the right of any other endpoint in F) and we identify the rightmost position for u for which F = F ? {I j } needs only ? l (F) barriers and not ? l (F) + 1. Let D be the graph defined by the function NEXT for F . Note that i = FIRST(F ) and NEXT(j) = i. Suppose that ? l (F) barriers are enough for F , i.e., ? l (F) = ? l (F )

Adaptive backoff synchronization techniques, Proceedings of the 16th Annual International Symposium on Computer Architecture (ISCA'89), pp.396-406, 1989. ,

Barrier inference, Proceedings of the 25th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (PoPL'98), pp.342-354, 1998. ,

Automatic decomposition of scientific programs for parallel execution, Proceedings of the 14th ACM SIGACT-SIGPLAN Symposium on Principles of Programming Languages (PoPL'87), pp.63-76, 1987. ,

Parallel programming in Split-C, Proceedings of Supercomputing'93, pp.262-273, 1993. ,

A global approach to detection of parallelism, 1987. ,

Introduction to UPC and language specification, IDA Center for Computing Sciences, 1999. ,

,

Algorithmic Graph Theory and Perfect Graphs, 1980. ,

Data-Parallel Programming on MIMD Computers, 1991. ,

Linear time algorithms on circular-arc graphs, Information Processing Letters, vol.40, issue.3, pp.123-129, 1991. ,

Algorithms for scalable synchronization on shared-memory multiprocessors, ACM Transactions on Computer Systems, vol.9, issue.1, pp.21-65, 1991. ,

Co-array Fortran for parallel programming, SIGPLAN Fortran Forum, vol.17, issue.2, pp.1-31, 1998. ,

Compile time barrier synchronization minimization, IEEE Transactions on Parallel and Distributed Systems, vol.13, issue.6, pp.529-543, 2002. ,

Compiler optimizations for eliminating barrier synchronization, PPoPP'95: Proceedings of the fifth ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, pp.144-155, 1995. ,

High Performance Compilers for Parallel Computing, 1996. ,

Titanium: A high-performance Java dialect, Concurrency: Practice and Experience, vol.10, pp.825-836, 1998. ,