This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [RFC, PR66873] Use graphite for parloops


On 21/07/15 02:21, Tom de Vries wrote:
On 20/07/15 20:31, Sebastian Pop wrote:
Tom de Vries wrote:
So I wondered, why not always use the graphite dependency analysis
in parloops. (Of course you could use -floop-parallelize-all, but
that also changes the heuristic). So I wrote a patch for parloops to
use graphite dependency analysis by default (so without
-floop-parallelize-all), but while testing found out that all the
reduction test-cases started failing because the modifications
graphite makes to the code messes up the parloops reduction
analysis.

Then I came up with this patch, which:
- first runs a parloops pass, restricted to reduction loops only,

I would prefer to fix graphite to catch the reduction loop and avoid
running an
extra pass before graphite for that case.

Can you please specify which file is
failing to be parallelized?  Are they all those testcases that you
update the flags?

Yep, f.i. autopar/reduc-1.c.

Also it seems to me that you are missing -ffast-math to parallelize
all these
loops: without that flag graphite would not mark reductions as
associative/commutative operations and they would not be recognized as
parallel.

For an unsigned int reduction, we need don't need -ffast-math, so we
don't have to specify it for parloops. It seems graphite is too strict
in that, since it won't do any reductions without -fassociate-math.

But indeed, with -ffast-math -ftree-parallelize-loops=2
-floop-parallelize-all we are able to parallelize the 3 reduction loops
in autopar/reduc-1.c

Is that something the current parloops detection is not too strict about?

Parloops uses vect_is_simple_reduction_1, which has some extensive
testing to see if reordering of operations is allowed.

Hmm, vect_is_simple_reduction_1 seems to miss the check for TYPE_OVERFLOW_WRAPS.

The testing of
graphite seems to be limited to testing fassociative-math, which makes
me suspect that tests are missing there, f.i. TYPE_OVERFLOW_TRAPS.

I could not enable both fassociative-math and -ftrapv, so I guess that case was covered implicitly.

Attached patch:
- adds the missing TYPE_OVERFLOW_WRAPS check in
  vect_is_simple_reduction_1
- enabled reductions in graphite, when safe
- introduces a macro FIXED_POINT_TYPE_OVERFLOW_WRAPS_P

Currently bootstrapping and reg-testing on x86_64.

Thanks,
- Tom

Fix reduction safety checks

2015-07-21  Tom de Vries  <tom@codesourcery.com>

	* tree.h (FIXED_POINT_TYPE_OVERFLOW_WRAPS_P): Define.
	* tree-ssa-reassoc.c (can_reassociate_p): Rewrite using
	FIXED_POINT_TYPE_OVERFLOW_WRAPS_P.
	* graphite-sese-to-poly.c (is_reduction_operation_p): Limit
	flag_associative_math to SCALAR_FLOAT_TYPE_P.  Honour
	TYPE_OVERFLOW_TRAPS and TYPE_OVERFLOW_WRAPS for INTEGRAL_TYPE_P.
	Only allow wrapping fixed-point otherwise.
	(build_poly_scop): Always call
	rewrite_commutative_reductions_out_of_ssa.
	* tree-vect-loop.c (vect_is_simple_reduction_1): Honour
	TYPE_OVERFLOW_WRAPS for INTEGRAL_TYPE_P. Rewrite using
	FIXED_POINT_TYPE_OVERFLOW_WRAPS_P.

	* gcc.dg/autopar/outer-4.c: Change reduction type to unsigned.
	* gcc.dg/autopar/outer-5.c: Same.
	* gcc.dg/autopar/outer-6.c: Same.
	* gcc.dg/autopar/reduc-2.c: Add -fwrapv to dg-options.
	* gcc.dg/autopar/reduc-8.c: Same.
	* gcc.dg/autopar/reduc-2char.c: Add -fwrapv to dg-options. Update
	scan-tree-dumps.
	* gcc.dg/autopar/reduc-2short.c: Same.
---
 gcc/graphite-sese-to-poly.c                 | 21 +++++++++++++++-----
 gcc/testsuite/gcc.dg/autopar/outer-4.c      |  6 +++---
 gcc/testsuite/gcc.dg/autopar/outer-5.c      |  8 ++++----
 gcc/testsuite/gcc.dg/autopar/outer-6.c      |  8 ++++----
 gcc/testsuite/gcc.dg/autopar/reduc-2.c      |  2 +-
 gcc/testsuite/gcc.dg/autopar/reduc-2char.c  |  6 +++---
 gcc/testsuite/gcc.dg/autopar/reduc-2short.c |  6 +++---
 gcc/testsuite/gcc.dg/autopar/reduc-8.c      |  2 +-
 gcc/tree-ssa-reassoc.c                      |  3 ++-
 gcc/tree-vect-loop.c                        | 30 +++++++++++++++++++++--------
 gcc/tree.h                                  | 12 ++++++++++++
 11 files changed, 71 insertions(+), 33 deletions(-)

diff --git a/gcc/graphite-sese-to-poly.c b/gcc/graphite-sese-to-poly.c
index 8960c3f..cb2204e 100644
--- a/gcc/graphite-sese-to-poly.c
+++ b/gcc/graphite-sese-to-poly.c
@@ -2604,9 +2604,21 @@ is_reduction_operation_p (gimple stmt)
   gcc_assert (is_gimple_assign (stmt));
   code = gimple_assign_rhs_code (stmt);
 
-  return flag_associative_math
-    && commutative_tree_code (code)
-    && associative_tree_code (code);
+  if (!commutative_tree_code (code)
+      || !associative_tree_code (code))
+    return false;
+
+  tree type = TREE_TYPE (gimple_assign_lhs (stmt));
+
+  if (SCALAR_FLOAT_TYPE_P (type))
+    return flag_associative_math;
+
+  if (INTEGRAL_TYPE_P (type))
+    return (!TYPE_OVERFLOW_TRAPS (type)
+	    && TYPE_OVERFLOW_WRAPS (type));
+
+  return (FIXED_POINT_TYPE_P (type)
+	  && FIXED_POINT_TYPE_OVERFLOW_WRAPS_P (type));
 }
 
 /* Returns true when PHI contains an argument ARG.  */
@@ -3147,8 +3159,7 @@ build_poly_scop (scop_p scop)
   if (!scop_ivs_can_be_represented (scop))
     return;
 
-  if (flag_associative_math)
-    rewrite_commutative_reductions_out_of_ssa (scop);
+  rewrite_commutative_reductions_out_of_ssa (scop);
 
   build_sese_loop_nests (region);
   /* Record all conditions in REGION.  */
diff --git a/gcc/testsuite/gcc.dg/autopar/outer-4.c b/gcc/testsuite/gcc.dg/autopar/outer-4.c
index 6fd37c5..a57a0e4 100644
--- a/gcc/testsuite/gcc.dg/autopar/outer-4.c
+++ b/gcc/testsuite/gcc.dg/autopar/outer-4.c
@@ -3,14 +3,14 @@
 
 void abort (void);
 
-int g_sum=0;
-int x[500][500];
+unsigned int g_sum=0;
+unsigned int x[500][500];
 
 __attribute__((noinline))
 void parloop (int N)
 {
   int i, j;
-  int sum;
+  unsigned int sum;
 
   /* Double reduction is currently not supported, outer loop is not 
      parallelized.  Inner reduction is detected, inner loop is 
diff --git a/gcc/testsuite/gcc.dg/autopar/outer-5.c b/gcc/testsuite/gcc.dg/autopar/outer-5.c
index 6a0ae91..c1bda6a 100644
--- a/gcc/testsuite/gcc.dg/autopar/outer-5.c
+++ b/gcc/testsuite/gcc.dg/autopar/outer-5.c
@@ -3,9 +3,9 @@
 
 void abort (void);
 
-int x[500][500];
-int y[500];
-int g_sum=0;
+unsigned int x[500][500];
+unsigned int y[500];
+unsigned int g_sum=0;
 
 __attribute__((noinline))
 void init (int i, int j)
@@ -17,7 +17,7 @@ __attribute__((noinline))
 void parloop (int N)
 {
   int i, j;
-  int sum;
+  unsigned int sum;
 
   /* Inner cycle is currently not supported, outer loop is not 
      parallelized.  Inner reduction is detected, inner loop is 
diff --git a/gcc/testsuite/gcc.dg/autopar/outer-6.c b/gcc/testsuite/gcc.dg/autopar/outer-6.c
index 6bef7cc..9f90ba6 100644
--- a/gcc/testsuite/gcc.dg/autopar/outer-6.c
+++ b/gcc/testsuite/gcc.dg/autopar/outer-6.c
@@ -3,9 +3,9 @@
 
 void abort (void);
 
-int x[500][500];
-int y[500];
-int g_sum=0;
+unsigned int x[500][500];
+unsigned int y[500];
+unsigned int g_sum=0;
 
 __attribute__((noinline))
 void init (int i, int j)
@@ -17,7 +17,7 @@ __attribute__((noinline))
 void parloop (int N)
 {
   int i, j;
-  int sum;
+  unsigned int sum;
 
   /* Outer loop reduction, outerloop is parallelized.  */ 
   sum=0;
diff --git a/gcc/testsuite/gcc.dg/autopar/reduc-2.c b/gcc/testsuite/gcc.dg/autopar/reduc-2.c
index 3ad16e4..ad78241 100644
--- a/gcc/testsuite/gcc.dg/autopar/reduc-2.c
+++ b/gcc/testsuite/gcc.dg/autopar/reduc-2.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -ftree-parallelize-loops=4 -fdump-tree-parloops-details -fdump-tree-optimized" } */
+/* { dg-options "-O2 -ftree-parallelize-loops=4 -fdump-tree-parloops-details -fdump-tree-optimized -fwrapv" } */
 
 #include <stdarg.h>
 #include <stdlib.h>
diff --git a/gcc/testsuite/gcc.dg/autopar/reduc-2char.c b/gcc/testsuite/gcc.dg/autopar/reduc-2char.c
index 072489f..857a3cf 100644
--- a/gcc/testsuite/gcc.dg/autopar/reduc-2char.c
+++ b/gcc/testsuite/gcc.dg/autopar/reduc-2char.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -ftree-parallelize-loops=4 -fdump-tree-parloops-details -fdump-tree-optimized" } */
+/* { dg-options "-O2 -ftree-parallelize-loops=4 -fdump-tree-parloops-details -fdump-tree-optimized -fwrapv" } */
 
 #include <stdarg.h>
 #include <stdlib.h>
@@ -60,7 +60,7 @@ int main (void)
 }
 
 
-/* { dg-final { scan-tree-dump-times "Detected reduction" 2 "parloops" } } */
-/* { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 3 "parloops" } } */
+/* { dg-final { scan-tree-dump-times "Detected reduction" 3 "parloops" } } */
+/* { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 4 "parloops" } } */
 
 
diff --git a/gcc/testsuite/gcc.dg/autopar/reduc-2short.c b/gcc/testsuite/gcc.dg/autopar/reduc-2short.c
index 4dbbc8a..dcf537e 100644
--- a/gcc/testsuite/gcc.dg/autopar/reduc-2short.c
+++ b/gcc/testsuite/gcc.dg/autopar/reduc-2short.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -ftree-parallelize-loops=4 -fdump-tree-parloops-details -fdump-tree-optimized" } */
+/* { dg-options "-O2 -ftree-parallelize-loops=4 -fdump-tree-parloops-details -fdump-tree-optimized -fwrapv" } */
 
 #include <stdarg.h>
 #include <stdlib.h>
@@ -59,6 +59,6 @@ int main (void)
 }
 
 
-/* { dg-final { scan-tree-dump-times "Detected reduction" 2 "parloops" } } */
-/* { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 3 "parloops" } } */
+/* { dg-final { scan-tree-dump-times "Detected reduction" 3 "parloops" } } */
+/* { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 4 "parloops" } } */
 
diff --git a/gcc/testsuite/gcc.dg/autopar/reduc-8.c b/gcc/testsuite/gcc.dg/autopar/reduc-8.c
index 16fb954..05f1126 100644
--- a/gcc/testsuite/gcc.dg/autopar/reduc-8.c
+++ b/gcc/testsuite/gcc.dg/autopar/reduc-8.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -ftree-parallelize-loops=4 -fdump-tree-parloops-details -fdump-tree-optimized" } */
+/* { dg-options "-O2 -ftree-parallelize-loops=4 -fdump-tree-parloops-details -fdump-tree-optimized -fwrapv" } */
 
 #include <stdlib.h>
 
diff --git a/gcc/tree-ssa-reassoc.c b/gcc/tree-ssa-reassoc.c
index efb813c..b48ae1e 100644
--- a/gcc/tree-ssa-reassoc.c
+++ b/gcc/tree-ssa-reassoc.c
@@ -4229,7 +4229,8 @@ can_reassociate_p (tree op)
 {
   tree type = TREE_TYPE (op);
   if ((INTEGRAL_TYPE_P (type) && TYPE_OVERFLOW_WRAPS (type))
-      || NON_SAT_FIXED_POINT_TYPE_P (type)
+      || (FIXED_POINT_TYPE_P (type)
+	  && FIXED_POINT_TYPE_OVERFLOW_WRAPS_P (type))
       || (flag_associative_math && FLOAT_TYPE_P (type)))
     return true;
   return false;
diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index 9145dbf..e014be2 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -2613,16 +2613,30 @@ vect_is_simple_reduction_1 (loop_vec_info loop_info, gimple phi,
 			"reduction: unsafe fp math optimization: ");
       return NULL;
     }
-  else if (INTEGRAL_TYPE_P (type) && TYPE_OVERFLOW_TRAPS (type)
-	   && check_reduction)
+  else if (INTEGRAL_TYPE_P (type) && check_reduction)
     {
-      /* Changing the order of operations changes the semantics.  */
-      if (dump_enabled_p ())
-	report_vect_op (MSG_MISSED_OPTIMIZATION, def_stmt,
-			"reduction: unsafe int math optimization: ");
-      return NULL;
+      if (TYPE_OVERFLOW_TRAPS (type))
+	{
+	  /* Changing the order of operations changes the semantics.  */
+	  if (dump_enabled_p ())
+	    report_vect_op (MSG_MISSED_OPTIMIZATION, def_stmt,
+			    "reduction: unsafe int math optimization"
+			    " (overflow traps): ");
+	  return NULL;
+	}
+      if (!TYPE_OVERFLOW_WRAPS (type))
+	{
+	  /* Changing the order of operations changes the semantics.  */
+	  if (dump_enabled_p ())
+	    report_vect_op (MSG_MISSED_OPTIMIZATION, def_stmt,
+			    "reduction: unsafe int math optimization"
+			    " (overflow doesn't wrap): ");
+	  return NULL;
+	}
     }
-  else if (SAT_FIXED_POINT_TYPE_P (type) && check_reduction)
+  else if (FIXED_POINT_TYPE_P (type)
+	   && !FIXED_POINT_TYPE_OVERFLOW_WRAPS_P (type)
+	   && check_reduction)
     {
       /* Changing the order of operations changes the semantics.  */
       if (dump_enabled_p ())
diff --git a/gcc/tree.h b/gcc/tree.h
index 6df2217..f2ae669 100644
--- a/gcc/tree.h
+++ b/gcc/tree.h
@@ -501,6 +501,18 @@ extern void omp_clause_range_check_failed (const_tree, const char *, int,
 
 #define FIXED_POINT_TYPE_P(TYPE)	(TREE_CODE (TYPE) == FIXED_POINT_TYPE)
 
+/* Nonzero if fixed-point type TYPE wraps at overflow.
+
+   GCC support of fixed-point types as specified by the draft technical report
+   (N1169 draft of ISO/IEC DTR 18037) is incomplete: Pragmas to control overflow
+   and rounding behaviors are not implemented.
+
+   So, if not saturating, we assume modular wrap-around (see Annex E.4 Modwrap
+   overflow).  */
+
+#define FIXED_POINT_TYPE_OVERFLOW_WRAPS_P(TYPE) \
+  (NON_SAT_FIXED_POINT_TYPE_P (TYPE))
+
 /* Nonzero if TYPE represents a scalar floating-point type.  */
 
 #define SCALAR_FLOAT_TYPE_P(TYPE) (TREE_CODE (TYPE) == REAL_TYPE)
-- 
1.9.1


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]