This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
Re: [PATCH, 6/8] Add pass_copy_prop in pass_oacc_kernels
- From: Tom de Vries <Tom_deVries at mentor dot com>
- To: Richard Biener <rguenther at suse dot de>
- Cc: Thomas Schwinge <thomas at codesourcery dot com>, GCC Patches <gcc-patches at gcc dot gnu dot org>, Jakub Jelinek <jakub at redhat dot com>
- Date: Tue, 2 Jun 2015 14:49:39 +0200
- Subject: Re: [PATCH, 6/8] Add pass_copy_prop in pass_oacc_kernels
- Authentication-results: sourceware.org; auth=none
- References: <546743BC dot 5070804 at mentor dot com> <54678BE8 dot 7010201 at mentor dot com> <54746A4F dot 30309 at mentor dot com> <87wq15z1wg dot fsf at kepler dot schwinge dot homeip dot net> <alpine dot LSU dot 2 dot 11 dot 1504220941390 dot 20496 at zhemvz dot fhfr dot qr>
On 22-04-15 09:42, Richard Biener wrote:
> > >This patch adds pass_loop_ccp to pass group pass_oacc_kernels.
> > >
> > >We need this pass to simplify the loop body, and allow pass_parloops to detect
> > >that loop iterations are independent.
> > >
> >
> >As suggested here (https://gcc.gnu.org/ml/gcc-patches/2014-11/msg02993.html )
> >I've replaced the pass_ccp with pass_copyprop, which performs trivial constant
> >propagation in addition to copy propagation.
> >
> >Bootstrapped and reg-tested as before.
> >
> >OK for trunk?
> I've recently wondered why we do copy propagation after LIM and I don't
> remember. Can you remind me? Can you add testcases that fail before
> this kind of patches and pass afterwards?
For the attached test-case, we manage to parallelize with pass_copy_prop (but
then run into an ICE):
...
PASS: c-c++-common/goacc/kernels-loop-reduction.c scan-tree-dump-not parloops_oacc_kernels "FAILED:"
PASS: c-c++-common/goacc/kernels-loop-reduction.c scan-tree-dump-times parloops_oacc_kernels "SUCCESS: may be parallelized" 1
FAIL: c-c++-common/goacc/kernels-loop-reduction.c (internal compiler error)
FAIL: c-c++-common/goacc/kernels-loop-reduction.c (test for excess errors)
...
Without pass_copy_prop we don't manage to parallelize:
...
FAIL: c-c++-common/goacc/kernels-loop-reduction.c scan-tree-dump-not parloops_oacc_kernels "FAILED:"
FAIL: c-c++-common/goacc/kernels-loop-reduction.c scan-tree-dump-times parloops_oacc_kernels "SUCCESS: may be parallelized" 1
PASS: c-c++-common/goacc/kernels-loop-reduction.c (test for excess errors)
...
In more detail, before pass_copy_prop, we have:
...
  <bb 7>:
  # D__lsm.14_3 = PHI <D__lsm.14_9(15), D__lsm.14_21(6)>
  ...
  sum.3_39 = D__lsm.14_3;
  sum.4_40 = _37 + sum.3_39;
  D__lsm.14_9 = sum.4_40;
  ...
  if (ii_43 <= 524287)
    goto <bb 15>;
  else
    goto <bb 8>;

  <bb 15>:
  goto <bb 7>;
...
And after pass_copy_prop, we have:
...
  <bb 7>:
  # D__lsm.14_3 = PHI <sum.4_40(8), D__lsm.14_21(6)>
  ...
  sum.4_40 = D__lsm.14_3 + _37;
  ...
  if (ii_43 <= 524287)
    goto <bb 8>;
  else
    goto <bb 9>;

  <bb 8>:
  goto <bb 7>;
...
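Read as C, the two dumps correspond roughly to the two loop bodies sketched
below. This is only an illustration: the function names sum_with_copies and
sum_direct and all local variable names are made up here, a[i] stands in for
_37 (presumably the loaded array element), and the real transformation
happens on GIMPLE SSA names rather than on C source. The sketch just shows
that the copy chain computes the same value as the single addition that
remains after copy propagation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical analogue of the loop body before pass_copy_prop: the
   running value is routed through two copies on every iteration.  */
unsigned int
sum_with_copies (const unsigned int *a, size_t n)
{
  assert (a != NULL || n == 0);
  unsigned int lsm = 0;                 /* plays the role of D__lsm.14_3 */
  for (size_t i = 0; i < n; i++)
    {
      unsigned int sum3 = lsm;          /* sum.3_39 = D__lsm.14_3 */
      unsigned int sum4 = a[i] + sum3;  /* sum.4_40 = _37 + sum.3_39 */
      lsm = sum4;                       /* D__lsm.14_9 = sum.4_40 */
    }
  return lsm;
}

/* Hypothetical analogue of the loop body after pass_copy_prop: the
   result of the single addition feeds the next iteration directly.  */
unsigned int
sum_direct (const unsigned int *a, size_t n)
{
  assert (a != NULL || n == 0);
  unsigned int lsm = 0;                 /* plays the role of D__lsm.14_3 */
  for (size_t i = 0; i < n; i++)
    lsm = lsm + a[i];                   /* sum.4_40 = D__lsm.14_3 + _37 */
  return lsm;
}
```

Both functions return the same value for any input; only the shape of the
loop-carried dependence differs.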
The testcase has not been committed, because reductions are not handled yet
(which explains the ICE).
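For context on why the loop has to be treated as a reduction: as written,
each iteration of the sum loop reads the value produced by the previous one,
so the iterations are only independent once the accumulation is split up. A
common way to do that is to give each worker a private partial sum over a
chunk of the iteration space and to combine the partial sums afterwards. The
C sketch below shows that idea sequentially. It is not how parloops or the
kernels lowering implements reductions, and chunked_sum and nchunks are names
invented for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Sum a[0..n) by splitting the range into nchunks pieces, summing each
   piece into its own partial result, and then combining the partials.
   Each chunk touches only its own partial, so the chunks could run
   independently of one another.  */
unsigned int
chunked_sum (const unsigned int *a, size_t n, size_t nchunks)
{
  assert (nchunks > 0);
  unsigned int total = 0;
  size_t chunk = (n + nchunks - 1) / nchunks;   /* ceil (n / nchunks) */

  for (size_t c = 0; c < nchunks; c++)
    {
      size_t lo = c * chunk;
      size_t hi = lo + chunk < n ? lo + chunk : n;
      unsigned int partial = 0;                 /* private to this chunk */

      for (size_t i = lo; i < hi; i++)
        partial += a[i];

      total += partial;                         /* combine step */
    }
  return total;
}
```

Because unsigned addition wraps and is associative, regrouping the additions
this way gives the same result as the sequential loop, including when the
sum overflows.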
Thanks,
- Tom
/* { dg-additional-options "-O2" } */
/* { dg-additional-options "-ftree-parallelize-loops=32" } */
/* { dg-additional-options "-fdump-tree-parloops_oacc_kernels-all" } */
#include <stdlib.h>

#define N (1024 * 512)
#define COUNTERTYPE unsigned int

int
main (void)
{
  unsigned int *__restrict a;
  unsigned int sum = 0;
  unsigned int sum2 = 0;

  a = (unsigned int *)malloc (N * sizeof (unsigned int));

  for (COUNTERTYPE i = 0; i < N; i++)
    a[i] = i * 2;

#pragma acc kernels copy (sum) copyin (a[0:N])
  {
    for (COUNTERTYPE ii = 0; ii < N; ii++)
      sum += a[ii];
  }

  for (COUNTERTYPE i = 0; i < N; i++)
    sum2 += a[i];

  if (sum != sum2)
    abort ();

  free (a);

  return 0;
}
/* Check that only one loop is analyzed, and that it can be parallelized. */
/* { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 1 "parloops_oacc_kernels" } } */
/* { dg-final { scan-tree-dump-not "FAILED:" "parloops_oacc_kernels" } } */
/* { dg-final { cleanup-tree-dump "parloops_oacc_kernels" } } */