This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
[gomp4] fix kernel reductions
- From: Nathan Sidwell <nathan at acm dot org>
- To: GCC Patches <gcc-patches at gcc dot gnu dot org>
- Cc: Tom de Vries <tom at codesourcery dot com>
- Date: Tue, 12 Jan 2016 09:10:20 -0500
- Subject: [gomp4] fix kernel reductions
This patch fixes an ICE (internal compiler error) encountered with the Houston
testsuite when kernel optimizations are enabled.
The reduction is implemented as a compare-and-swap loop, but for kernels that
lowering happens later than the omp code usually performs it. At that point
loops are required to have simple latches, and the loop built here has a
non-simple one. This patch restores the invariant by splitting the latch's back
edge (which is what force_single_succ_latches does when run over the loop
structure).
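
For readers unfamiliar with the shape of the loop mentioned above, here is a
minimal C11 sketch of a compare-and-swap reduction on a double. It is not the
code GCC emits; the names atomic_add_double and reduce_sum are hypothetical and
exist only for illustration. The point to notice is that the block containing
the compare-and-swap has two ways out (retry or exit), which is why it does not
qualify as a simple latch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>

/* The sketch reinterprets a double as a 64-bit integer so that an
   integer compare-and-swap can be used on it.  */
static_assert (sizeof (double) == sizeof (uint64_t),
               "double must be 64 bits wide for this sketch");

/* Hypothetical helper: atomically perform *target += value.  */
static void
atomic_add_double (_Atomic uint64_t *target, double value)
{
  uint64_t old_bits = atomic_load (target);
  uint64_t new_bits;

  do
    {
      double old_val, new_val;

      /* Compute the candidate result from the last value we saw.  */
      memcpy (&old_val, &old_bits, sizeof old_val);
      new_val = old_val + value;
      memcpy (&new_bits, &new_val, sizeof new_bits);

      /* If *target still holds old_bits, store new_bits and leave the
         loop.  Otherwise old_bits is refreshed with the current
         contents and the back edge is taken.  */
    }
  while (!atomic_compare_exchange_weak (target, &old_bits, new_bits));
}

/* Hypothetical driver: sum N elements of A through the atomic helper.  */
static double
reduce_sum (const double *a, int n)
{
  _Atomic uint64_t sum_bits;
  uint64_t bits;
  double sum = 0.0;

  memcpy (&bits, &sum, sizeof bits);
  atomic_init (&sum_bits, bits);

  for (int i = 0; i < n; i++)
    atomic_add_double (&sum_bits, a[i]);

  bits = atomic_load (&sum_bits);
  memcpy (&sum, &bits, sizeof sum);
  return sum;
}
```

Calling reduce_sum on an array of four doubles accumulates them one at a time
through the retry loop; in a real offloaded kernel many threads would be
calling the helper concurrently on the same target.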
Applied to gomp4.
nathan
2016-01-08 Nathan Sidwell <nathan@acm.org>
gcc/
* omp-low.c (expand_omp_atomic_pipeline): Pay attention to
LOOPS_HAVE_SIMPLE_LATCHES state.
2016-01-12 Nathan Sidwell <nathan@acm.org>
gcc/testsuite/
* gcc.dg/goacc/kern-1.c: New.
Index: omp-low.c
===================================================================
--- omp-low.c (revision 232179)
+++ omp-low.c (revision 232180)
@@ -12370,6 +12370,9 @@ expand_omp_atomic_pipeline (basic_block
loop->header = loop_header;
loop->latch = store_bb;
add_loop (loop, loop_header->loop_father);
+ if (loops_state_satisfies_p (LOOPS_HAVE_SIMPLE_LATCHES))
+ /* Split the edge from store_bb to loop_header */
+ split_edge (e);
if (gimple_in_ssa_p (cfun))
update_ssa (TODO_update_ssa_no_phi);
Index: gcc.dg/goacc/kern-1.c
===================================================================
--- gcc.dg/goacc/kern-1.c (revision 0)
+++ gcc.dg/goacc/kern-1.c (working copy)
@@ -0,0 +1,23 @@
+/* { dg-additional-options "-fopenacc -O2 -ftree-parallelize-loops=32" } */
+
+/* The reduction on sum could cause an ICE with a non-simple latch loop. */
+
+int printf (char const *, ...);
+
+int
+main ()
+{
+ int i;
+ double a[1000], sum = 0;
+
+
+#pragma acc kernels pcopyin(a[0:1000])
+#pragma acc loop reduction(+:sum)
+ for(int i=0; i<1000; i++) {
+ sum += a[i];
+ }
+
+ printf ("%lf\n", sum);
+
+ return 0;
+}
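
As a footnote to the patch, the effect of splitting the latch's back edge can
be pictured on a toy control flow graph. The sketch below is an assumption-laden
model, not GCC's data structures: toy_cfg, toy_loop, toy_split_edge and
toy_make_latch_simple are hypothetical names, and a latch is treated as simple
when its only successor is the loop header.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_BLOCKS 8
#define MAX_SUCCS 2

/* Toy CFG: blocks are indices, each with up to two successors.  */
struct toy_cfg
{
  int n_blocks;
  int n_succs[MAX_BLOCKS];
  int succs[MAX_BLOCKS][MAX_SUCCS];
};

/* Toy loop descriptor: header and latch block indices.  */
struct toy_loop
{
  int header;
  int latch;
};

/* In this model a latch is simple if its single successor is the header.  */
static bool
latch_is_simple (const struct toy_cfg *cfg, const struct toy_loop *loop)
{
  return cfg->n_succs[loop->latch] == 1
         && cfg->succs[loop->latch][0] == loop->header;
}

/* Split the edge SRC->DEST by routing it through a new empty block.
   Returns the index of the new block.  */
static int
toy_split_edge (struct toy_cfg *cfg, int src, int dest)
{
  bool found = false;
  int new_bb;

  assert (cfg->n_blocks < MAX_BLOCKS);
  new_bb = cfg->n_blocks++;

  for (int i = 0; i < cfg->n_succs[src]; i++)
    if (cfg->succs[src][i] == dest)
      {
        cfg->succs[src][i] = new_bb;
        found = true;
        break;
      }
  assert (found);

  cfg->n_succs[new_bb] = 1;
  cfg->succs[new_bb][0] = dest;
  return new_bb;
}

/* If the latch is not simple, split its back edge and make the new
   forwarder block the latch.  */
static void
toy_make_latch_simple (struct toy_cfg *cfg, struct toy_loop *loop)
{
  if (!latch_is_simple (cfg, loop))
    loop->latch = toy_split_edge (cfg, loop->latch, loop->header);
}
```

With blocks 0 (load), 1 (loop header), 2 (store, branching to 1 or 3) and
3 (exit), the latch starts out as block 2 with two successors. After
toy_make_latch_simple, a new block 4 sits on the back edge and becomes the
latch, with the header as its only successor.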