[gomp4, committed] Only use transform_to_exit_first_loop_alt for kernels

Tom de Vries Tom_deVries@mentor.com
Mon Jun 1 14:00:00 GMT 2015


Hi,

this patch fixes the compilation of self-dependent loops in OpenACC 
kernels regions.


First, consider a simple vector addition:
...
#pragma acc kernels copyin (a[0:N], b[0:N]) copyout (c[0:N])
    {
      for (COUNTERTYPE ii = 0; ii < N; ii++)
        c[ii] = a[ii] + b[ii];
    }
...

Until now (or rather, until the introduction of 
transform_to_exit_first_loop_alt), the loop body of such a loop was 
parallelized and executed by different gangs, but the last iteration of 
the loop body was executed by all the gangs (due to 
transform_to_exit_first_loop). This did not lead to wrong results for 
this loop, because executing the statement 'c[N-1] = a[N-1] + b[N-1]' 
once or 32 times did not change the results.

For self-dependent loops, however, we do get wrong results:
...
#pragma acc kernels copy (c[0:N])
    {
      for (COUNTERTYPE ii = 0; ii < N; ii++)
        c[ii] = c[ii] + ii + 1;
    }
...
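
To make the difference concrete, here is a minimal host-side sketch of the 
old behaviour. It is only a model, not GCC code: the function names, the 
round-robin distribution of iterations over gangs, and N = 8 are all 
assumptions chosen for illustration. Iterations 0..N-2 are each executed by 
exactly one gang, and then every gang executes iteration N-1.

```c
#include <assert.h>

#define N 8   /* illustrative array length (assumption) */

/* Hypothetical model, not GCC code: iterations 0..N-2 are distributed
   round-robin over the gangs, then every gang runs iteration N-1.  */
void
model_vector_add (const int *a, const int *b, int *c, int num_gangs)
{
  assert (num_gangs > 0);

  /* Parallelized part: each iteration is executed by exactly one gang.  */
  for (int g = 0; g < num_gangs; g++)
    for (int ii = g; ii < N - 1; ii += num_gangs)
      c[ii] = a[ii] + b[ii];

  /* Last iteration: executed once per gang.  Repeating it is harmless
     because the statement does not read what it writes.  */
  for (int g = 0; g < num_gangs; g++)
    c[N - 1] = a[N - 1] + b[N - 1];
}

void
model_self_dependent (int *c, int num_gangs)
{
  assert (num_gangs > 0);

  /* Parallelized part: each iteration is executed by exactly one gang.  */
  for (int g = 0; g < num_gangs; g++)
    for (int ii = g; ii < N - 1; ii += num_gangs)
      c[ii] = c[ii] + ii + 1;

  /* Last iteration: executed once per gang.  Each repetition adds N to
     c[N-1] again, so the result depends on the number of gangs.  */
  for (int g = 0; g < num_gangs; g++)
    c[N - 1] = c[N - 1] + (N - 1) + 1;
}
```

In this model, calling model_self_dependent on a zero-initialized array with 
32 gangs leaves c[N-1] at 32 * N rather than N, while model_vector_add gives 
the same result for any gang count.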

The introduction of transform_to_exit_first_loop_alt fixed that 
correctness problem for this example. But 
transform_to_exit_first_loop_alt does not always succeed. This patch 
makes parallelization fail if transform_to_exit_first_loop_alt fails, 
making sure we don't run into the same problem again.


Furthermore, the patch replaces pass_copy_prop with pass_fre. I found 
this necessary at some point and added it to the patch, but I can't 
reproduce the necessity now, so I'll revert that bit asap.


Committed to gomp-4_0-branch.

Thanks,
- Tom

-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0002-Only-use-transform_to_exit_first_loop_alt-for-kernel.patch
Type: text/x-patch
Size: 4205 bytes
Desc: not available
URL: <http://gcc.gnu.org/pipermail/gcc-patches/attachments/20150601/a2dc630b/attachment.bin>

