[gomp4, committed] Only use transform_to_exit_first_loop_alt for kernels
Tom de Vries
Tom_deVries@mentor.com
Mon Jun 1 14:00:00 GMT 2015
Hi,
this patch fixes the compilation of self-dependent loops in OpenACC
kernels regions.
First, consider a simple vector addition:
...
#pragma acc kernels copyin (a[0:N], b[0:N]) copyout (c[0:N])
{
for (COUNTERTYPE ii = 0; ii < N; ii++)
c[ii] = a[ii] + b[ii];
}
...
Until now (or rather, until the introduction of
transform_to_exit_first_loop_alt), the loop body of such a loop was
parallelized and executed by different gangs, but the last iteration of
the loop body was executed by all the gangs (due to
transform_to_exit_first_loop). This did not lead to wrong results for
this loop, because executing the statement 'c[N-1] = a[N-1] + b[N-1]'
once or 32 times did not change the results.
For self-dependent loops, however, we do get wrong results, because each
extra execution of the last iteration updates c[N-1] again:
...
#pragma acc kernels copy (c[0:N])
{
for (COUNTERTYPE ii = 0; ii < N; ii++)
c[ii] = c[ii] + ii + 1;
}
...
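To make the failure concrete, here is a minimal host-only sketch of the
behaviour described above. It is not GCC code: the function names and the
num_gangs parameter are hypothetical, and the gangs are modelled
sequentially (a real run would also involve races between gangs). It only
assumes what the text states, namely that iterations 0..N-2 run once each
and that iteration N-1 is run by every gang.

```c
#include <assert.h>

/* Reference: the self-dependent loop executed once per iteration.  */
void run_sequential(int *c, int n)
{
  for (int ii = 0; ii < n; ii++)
    c[ii] = c[ii] + ii + 1;
}

/* Hypothetical model of the old behaviour: iterations 0..n-2 are
   executed exactly once (by whichever gang owns them), but the last
   iteration is executed by all num_gangs gangs.  */
void run_last_iteration_by_all_gangs(int *c, int n, int num_gangs)
{
  assert(n > 0 && num_gangs > 0);

  /* Parallelized part: each iteration runs once.  */
  for (int ii = 0; ii < n - 1; ii++)
    c[ii] = c[ii] + ii + 1;

  /* Last iteration: repeated once per gang, so c[n-1] is
     incremented num_gangs times instead of once.  */
  for (int gang = 0; gang < num_gangs; gang++)
    c[n - 1] = c[n - 1] + (n - 1) + 1;
}
```

With c[ii] initialized to ii, N = 8 and 32 gangs, run_sequential leaves
c[7] == 15, while the model leaves c[7] == 7 + 32 * 8 == 263. All other
elements agree. For the vector addition the repeated statement rewrites
the same value each time, which is why that loop was unaffected.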
The introduction of transform_to_exit_first_loop_alt fixed that
correctness problem for this example. But
transform_to_exit_first_loop_alt does not always succeed. This patch
makes parallelization fail if transform_to_exit_first_loop_alt fails,
making sure we don't run into the same problem again.
Furthermore, the patch replaces pass_copy_prop with pass_fre. I found
this necessary at some point and added it to the patch, but I can't
reproduce the necessity now, so I'll revert that bit asap.
Committed to gomp-4_0-branch.
Thanks,
- Tom
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0002-Only-use-transform_to_exit_first_loop_alt-for-kernel.patch
Type: text/x-patch
Size: 4205 bytes
Desc: not available
URL: <http://gcc.gnu.org/pipermail/gcc-patches/attachments/20150601/a2dc630b/attachment.bin>