User account creation filtered due to spam.

Bug 62630 - [5/6/7/8 Regression] gcc.dg/graphite/vect-pr43423.c XFAILed
Summary: [5/6/7/8 Regression] gcc.dg/graphite/vect-pr43423.c XFAILed
Status: NEW
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization (show other bugs)
Version: 5.0
: P4 normal
Target Milestone: 7.2
Assignee: Not yet assigned to anyone
URL:
Keywords: missed-optimization, xfail
Depends on:
Blocks: graphite
  Show dependency treegraph
 
Reported: 2014-09-01 13:21 UTC by Rainer Orth
Modified: 2017-05-02 15:57 UTC (History)
8 users (show)

See Also:
Host:
Target:
Build:
Known to work: 4.9.2
Known to fail: 5.0
Last reconfirmed: 2014-09-01 00:00:00


Attachments
vect dump (1.32 KB, text/plain)
2014-09-01 13:26 UTC, Rainer Orth
Details
use VRP to interpret an expr and compute its range (1.91 KB, patch)
2015-02-18 15:06 UTC, Richard Biener
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Rainer Orth 2014-09-01 13:21:37 UTC
Since ca. 20140818 (r214099), gcc.dg/graphite/vect-pr43423.c on SPARC.  I'm
attaching the tree dump.

  Rainer
Comment 1 Rainer Orth 2014-09-01 13:26:25 UTC
Created attachment 33425 [details]
vect dump
Comment 2 ktkachov 2014-09-01 13:32:28 UTC
Also appears on arm and aarch64
Comment 3 Richard Biener 2014-11-24 13:13:25 UTC
Also x86_64.  Should be at least investigated.
Comment 4 Richard Biener 2015-01-19 13:10:22 UTC
7: note: === vect_analyze_data_refs ===
Creating dr for a[_56]
analyze_innermost:
failed: evolution of offset is not affine.
        base_address:
        offset from base address:
        constant offset from base address:
        step:
        aligned to:
        base_object: a
        Access function 0: (int) {(unsigned int) graphite_IV.5_50, +, 1}_3;

the graphive IVs are signed long
  <bb 8>:
  _46 = (signed long) mid_6(D);
...

  <bb 9>:
  graphite_IV.5_50 = MAX_EXPR <_46, 0>;
  _51 = (signed long) n_5(D);
  _52 = _51 + -1;

  <bb 10>:
  # graphite_IV.5_53 = PHI <graphite_IV.5_50(9), graphite_IV.5_54(11)>
  _56 = (int) graphite_IV.5_53;
  _55 = a[_56];
  _57 = c[_56];
  _58 = _55 + _57;
  _64 = (int) graphite_IV.5_53;
  a[_64] = _58;
  graphite_IV.5_54 = graphite_IV.5_53 + 1;
  if (graphite_IV.5_53 < _52)
    goto <bb 11>;
  else
    goto <bb 13>;

  <bb 11>:
  goto <bb 10>;

(instantiate_scev
  (instantiate_below = 9)
  (evolution_loop = 3)
  (chrec = (ssizetype) (int) {(unsigned int) graphite_IV.5_50, +, 1}_3)
  (res = (ssizetype) (int) {(unsigned int) graphite_IV.5_50, +, 1}_3))

so it appears to SCEV that the evolution may wrap.

With GCC 4.9 we choose signed int as GRAPHITE IV.

Looks like the IV type is statically determined by graphite_expression_type_precision as

/* We always try to use signed 128 bit types, but fall back to smaller types
   in case a platform does not provide types of these sizes. In the future we
   should use isl to derive the optimal type for each subexpression.  */

static int max_mode_int_precision =
  GET_MODE_PRECISION (mode_for_size (MAX_FIXED_MODE_SIZE, MODE_INT, 0));
static int graphite_expression_type_precision = 128 <= max_mode_int_precision ?
                                                128 : max_mode_int_precision;

Not sure how this doesn't end up with __int128_t on x86_64, but ...

It's obviously a bad choice to use __int128_t (or even signed long) everywhere.

Let's XFAIL this testcase if nobody fixes the underlying issue.  What does it
take to "use ISL to derive the optimal type for each subexpression"?  Is
that even possible?
Comment 5 Jakub Jelinek 2015-02-12 13:17:15 UTC
Any progress on this?  Or are we just going to xfail the testcase?
Comment 6 Richard Biener 2015-02-17 10:48:56 UTC
I think we have to XFAIL the testcase - the graphite folks seem to be gone again.
Comment 7 Mircea Namolaru 2015-02-17 20:12:37 UTC
Graphite generates MAX/MIN expressions.

I've modified Graphite to use the original types of "n" and "mid" in MIN and MAX, and to not generate the casts of "n" and "mid" to a longer signed INT before MIN/MAX, and the vectorization succeeded.

It seems that it is not a Graphite problem but a scalar evolution one. Scalar evolution is not able to handle MIN/MAX expressions in the presence of casts. Beside vectorization also further unrolling is prevented.
Comment 8 Richard Biener 2015-02-18 11:22:55 UTC
(In reply to Mircea Namolaru from comment #7)
> Graphite generates MAX/MIN expressions.
> 
> I've modified Graphite to use the original types of "n" and "mid" in MIN and
> MAX, and to not generate the casts of "n" and "mid" to a longer signed INT
> before MIN/MAX, and the vectorization succeeded.
> 
> It seems that it is not a Graphite problem but a scalar evolution one.
> Scalar evolution is not able to handle MIN/MAX expressions in the presence
> of casts. Beside vectorization also further unrolling is prevented.

Can you share a patch with that modification?  I'd like to look at the differences it makes with respect to SCEV / niter analysis.  Note that
neither SCEV nor niter analysis handle MIN/MAX_EXPRs explicitely.  It might
be that

  <bb 2>:
  if (n_5(D) > 0)
    goto <bb 4>;
...
  <bb 4>:
  _28 = (signed long) n_5(D);
  _29 = (signed long) mid_6(D);
  _30 = MIN_EXPR <_28, _29>;
  _31 = _30 > 0;
  if (_31 != 0)

can be simplified to mid_6(D) > 0 by expansion/folding in some way though
if there are no casts in the way.  Not sure.  I suppose ISL doesn't get to
know that n > 0 if the loop enters (and doesn't exploit that knowledge)?
Comment 9 Richard Biener 2015-02-18 11:34:01 UTC
Note that as far as vectorization is concerned the issue is that the IV used to
perform the memory accesses is not affine:

  <bb 6>:
  # graphite_IV.4_34 = PHI <0(5), graphite_IV.4_35(7)>
  _37 = (int) graphite_IV.4_34;
  _36 = a[_37];
  _38 = b[_37];
  _39 = _36 + _38;
  _45 = (int) graphite_IV.4_34;
  a[_45] = _39;
  graphite_IV.4_35 = graphite_IV.4_34 + 1;
  if (graphite_IV.4_34 < _33)
    goto <bb 7>;
  else
    goto <bb 8>;

  <bb 7>:
  goto <bb 6>;

the IV _37 is (int) { 0, +, 1 } with graphite_IV being signed long.  Thus
_37 may wrap if graphite_IV becomes larger than INT_MAX.  I don't see how
this is related to the MIN/MAX_EXPRs.  Thus it would be nice if ISL could
compute bounds for the IVs (and if we can input bounds on the original IVs,
of course).

VRP may be able to see that _33 is computed as

  if (n_5(D) > 0)
    goto <bb 4>;
...
  <bb 4>:
  _28 = (signed long) n_5(D);
  _29 = (signed long) mid_6(D);
  _30 = MIN_EXPR <_28, _29>;
  _31 = _30 > 0;
  if (_31 != 0)
    goto <bb 5>;
...
  <bb 5>:
  _32 = MIN_EXPR <_28, _29>;
  _33 = _32 + -1;

and thus _33 is [0, INT_MAX - 1] but it isn't able to do that (and VRP
runs too late anyway).  VRP misses that for _30 > 0 it can infer
that mid_6 > 0, but that's probably a too difficult thing to asses
with it working as SSA propagator and ASSERT_EXPR insertion happening
before actual propagation (it needs to know n_5 > 0 to be able to do
that insertion).  A DOM style propagation would have less issues with
deriving proper value-ranges here.
Comment 10 Mircea Namolaru 2015-02-18 11:57:21 UTC
On my Intel x86-64 platform changed in graphite-isl-ast-to-gimple.c:

- static int graphite_expression_type_precision = 128 <= max_mode_int_precision ?
-						  128 : max_mode_int_precision;
+ static int graphite_expression_type_precision = 32;

The computation are done on INT, and you get this code for basic block 4 (and
vectorization performed):

  _28 = MIN_EXPR <n_5(D), mid_6(D)>;
  _29 = _28 > 0;
  if (_29 != 0)
    goto <bb 5>;
  else
    goto <bb 8>;

In my opinion the casts introduced cause problems to scalar evolution (and you are right,
MIN/MAX are not the problem). 

I will look into two directions, and choose the quickest one fixing the regression.
1) not to generate casts in Graphite, if correctness is not affected (as in this case). 
But determining when the use of a longer size signed type is required is not so simple. 
2) modify handling of casts in scalar evolution. But I am not familiar with this code.

----- Original Message -----
> From: "rguenth at gcc dot gnu.org" <gcc-bugzilla@gcc.gnu.org>
> To: "mircea namolaru" <mircea.namolaru@inria.fr>
> Sent: Wednesday, February 18, 2015 12:22:55 PM
> Subject: [Bug tree-optimization/62630] [5 regression] gcc.dg/graphite/vect-pr43423.c FAILs
> 
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=62630
> 
> --- Comment #8 from Richard Biener <rguenth at gcc dot gnu.org> ---
> (In reply to Mircea Namolaru from comment #7)
> > Graphite generates MAX/MIN expressions.
> > 
> > I've modified Graphite to use the original types of "n" and "mid" in MIN
> > and
> > MAX, and to not generate the casts of "n" and "mid" to a longer signed INT
> > before MIN/MAX, and the vectorization succeeded.
> > 
> > It seems that it is not a Graphite problem but a scalar evolution one.
> > Scalar evolution is not able to handle MIN/MAX expressions in the presence
> > of casts. Beside vectorization also further unrolling is prevented.
> 
> Can you share a patch with that modification?  I'd like to look at the
> differences it makes with respect to SCEV / niter analysis.  Note that
> neither SCEV nor niter analysis handle MIN/MAX_EXPRs explicitely.  It might
> be that
> 
>   <bb 2>:
>   if (n_5(D) > 0)
>     goto <bb 4>;
> ...
>   <bb 4>:
>   _28 = (signed long) n_5(D);
>   _29 = (signed long) mid_6(D);
>   _30 = MIN_EXPR <_28, _29>;
>   _31 = _30 > 0;
>   if (_31 != 0)
> 
> can be simplified to mid_6(D) > 0 by expansion/folding in some way though
> if there are no casts in the way.  Not sure.  I suppose ISL doesn't get to
> know that n > 0 if the loop enters (and doesn't exploit that knowledge)?
> 
> --
> You are receiving this mail because:
> You are on the CC list for the bug.
>
Comment 11 Richard Biener 2015-02-18 13:03:15 UTC
Btw, graphite generates expressions like MIN_EXPR <(long)n, (long)mid> < 0
that fold does not simplify.  Adding a match.pd pattern to shorten min/max
expressions we end up with

  <bb 4>:
  _28 = MIN_EXPR <n_5(D), mid_6(D)>;
  _29 = _28 > 0;
  if (_29 != 0)
    goto <bb 5>;
  else
    goto <bb 8>;

  <bb 5>:
  _30 = MIN_EXPR <n_5(D), mid_6(D)>;
  _31 = (signed long) _30;
  _32 = _31 + -1;

but still niter analysis isn't able to constrain the max iterations to
sth that SCEV could later use to prove that (int) graphite_IV doesn't wrap.

Analyzing # of iterations of loop 2
  exit condition [0, + , 1](no_overflow) < (signed long) _30 + -1
  bounds on difference of bases: -9223372036854775808 ... 9223372036854775806
  result:
    zero if _30 <= 0
    # of iterations (unsigned long) ((signed long) _30 + -1), bounded by 9223372036854775806
/space/rguenther/src/svn/trunk/gcc/testsuite/gcc.dg/graphite/vect-pr43423.c:12:17: note: Symbolic number of iterations is (unsigned long) MAX_EXPR <_30, 1>

so niter analysis can be improved here as _30 is of type int and even if
_30 were INT_MIN the upper bound would be 2147483647.

We can improve that by stripping sign-/zero-extensions from vars in
bound_difference.

With those two improvements we can vectorize the first loop.
Comment 12 Richard Biener 2015-02-18 15:06:24 UTC
Created attachment 34801 [details]
use VRP to interpret an expr and compute its range

The attached patch is a prototype that tries to replace niter analysis
bound_difference value-range estimation code by VRP.  I've chosen to replace
the case with non-equal variable part only (which is what needs improvement
for this testcase), but in theory most of the code should be replaced.
The bound-using-guard code needs refinement (the niter parts go to greater
lengths, expanding simple operations and whatnot - in theory we can go up
SSA use->def chains in the recursion as well, it just has a cost).

Note that without the MIN/MAX shortening this patch doesn't help as niter
analysis doesn't present VRP with expanded enough expressions and of course
all the IL has no value-ranges associated as graphite re-wrote it completely.
Comment 13 Richard Biener 2015-02-18 15:25:39 UTC
Ah, expand_simple_operations will not expand the MIN/MAX_EXPRs.  If we change
that the patch makes data-ref analysis fail differently (we correctly can
then compute bounds for the loops!), as we still may wrap with

  (int) {(unsigned int) graphite_IV.5_50, +, 1}_3;

(non-zero initial value) even with an niter bound of 2147483646.

That said, the loop conditions and guards are set up in a funny enough way
to confuse GCC ...

Index: gcc/tree-ssa-loop-niter.c
===================================================================
--- gcc/tree-ssa-loop-niter.c   (revision 220784)
+++ gcc/tree-ssa-loop-niter.c   (working copy)
@@ -1674,6 +1687,12 @@ expand_simple_operations (tree expr, tre
       ee = expand_simple_operations (e, stop);
       return fold_build2 (code, TREE_TYPE (expr), ee, e1);
 
+    case MIN_EXPR:
+    case MAX_EXPR:
+      e1 = expand_simple_operations (gimple_assign_rhs1 (stmt));
+      ee = expand_simple_operations (gimple_assign_rhs2 (stmt));
+      return fold_build2 (code, TREE_TYPE (expr), e1, ee);
+
     default:
       return expr;
     }

(not a good idea I think)

The match.pd pattern:

Index: gcc/match.pd
===================================================================
--- gcc/match.pd        (revision 220784)
+++ gcc/match.pd        (working copy)
@@ -567,6 +567,18 @@ (define_operator_list inverted_tcc_compa
       && operand_equal_p (@1, TYPE_MAX_VALUE (type), OEP_ONLY_CONST))
   @1))
 
+/* We can remove extensions of min/max operands and shorten the operation.  */
+(for minmax (min max)
+ (simplify
+  (minmax (convert @0) (convert? @1))
+  (if (INTEGRAL_TYPE_P (type)
+       && TYPE_PRECISION (TREE_TYPE (@0)) < TYPE_PRECISION (type)
+       && (((GENERIC && TYPE_MAIN_VARIANT (TREE_TYPE (@0)) == TYPE_MAIN_VARIANT (TREE_TYPE (@1)))
+           || (GIMPLE && types_compatible_p (TREE_TYPE (@0), TREE_TYPE (@1))))
+           || (TREE_CODE (@1) == INTEGER_CST
+               && int_fits_type_p (@1, TREE_TYPE (@0))))
+       && TYPE_PRECISION (TREE_TYPE (@0)) == GET_MODE_PRECISION (TYPE_MODE (TREE_TYPE (@0))))
+   (convert (minmax @0 (convert @1))))))

should probably be restricted to apply when feeding another conversion
or a comparison (all shorten_* operations are probably profitable when
feeding comparisons as well).
Comment 14 Mircea Namolaru 2015-02-18 23:34:38 UTC
It seems to me that scalar evolution succeeds to determine
the number of iterations for the case of signed longs. Looking
in vectorization dump, first a symbolic expression for the number of 
iterations of a loop is found, and then vect_analyze_refs is entered.
The problem is that the code expect an offset of a load to be an induction variable, 
but in our case an offset is only a cast of an induction variable, like
below:

 _56 = (intD.6) graphite_IV.5_53;
 _55 = aD.1830[_56];

The offset is found not to be an affine expression, and vectorization don't
succeed. But as the offset is a cast of an induction variable, it has the same
behaviour as an induction variable even if formally is not one. It seems to me
that somehow extending the code to support casts of induction variables
will solve our this problem.
Comment 15 Richard Biener 2015-02-19 10:19:45 UTC
(In reply to Mircea Namolaru from comment #14)
> It seems to me that scalar evolution succeeds to determine
> the number of iterations for the case of signed longs. Looking
> in vectorization dump, first a symbolic expression for the number of 
> iterations of a loop is found, and then vect_analyze_refs is entered.
> The problem is that the code expect an offset of a load to be an induction
> variable, 
> but in our case an offset is only a cast of an induction variable, like
> below:
> 
>  _56 = (intD.6) graphite_IV.5_53;
>  _55 = aD.1830[_56];
> 
> The offset is found not to be an affine expression, and vectorization don't
> succeed. But as the offset is a cast of an induction variable, it has the
> same
> behaviour as an induction variable even if formally is not one.

This is not true - the cast of the induction variable is not an affine IV
as the cast introduces wrapping if the IV exceeds the range of the casted-to
type.  number-of-iteration analysis can come to the rescue here but it has
a very hard time, especially on the 2nd loop.  It would be good to improve
it anyway but we can hardly blame it solely for the problems ;)

> It seems to
> me
> that somehow extending the code to support casts of induction variables
> will solve our this problem.
Comment 16 Mircea Namolaru 2015-02-20 00:45:46 UTC
Yes, but it seems to me that the cast (not in the original code) should not
be generated at all if it could not be guaranteed that the casted-to type is larger 
enough to accommodate it. Otherwise you introduce a cast from a longer signed type
to a shorter signed one whose behaviour is undefined by the C standard and was not
in the original code.

So the cast in the following code is problematic (when
graphite_IV, a signed long is not in the range of a signed int).

       _56 = (intD.6) graphite_IV.5_53;
       _55 = aD.1830[_56];

The solution to fix this is to made Graphite not to generate
casts like this. An alternative is to infer the range of
graphite_IV like you do and remove the cast (but this seems more complicated
and risky as the analysis may not succeed and the problematic cast is not
removed).
Comment 17 joseph@codesourcery.com 2015-02-20 16:43:13 UTC
On Fri, 20 Feb 2015, mircea.namolaru at inria dot fr wrote:

> Yes, but it seems to me that the cast (not in the original code) should 
> not be generated at all if it could not be guaranteed that the casted-to 
> type is larger enough to accommodate it. Otherwise you introduce a cast 
> from a longer signed type to a shorter signed one whose behaviour is 
> undefined by the C standard and was not in the original code.

Such casts have defined semantics in GNU C (and presumably in GIMPLE).  
(In ISO C, "either the result is implementation-defined or an 
implementation-defined signal is raised" - not undefined behavior.)
Comment 18 Mircea Namolaru 2015-02-23 17:56:10 UTC
I've succeeded to explain why these casts are generated, and they seem correct.
Graphite introduces new induction variables with a larger size type (then the type of original
induction variable), to make sure that accommodate the iteration count. Otherwise an overflow 
not in the original program occurs. On the other side, to not hide overflows for computations 
using the original induction variables, you still need to use them.
This explain the casts to variables with smaller types. These cast will cause an overflow (i.e.
a value not in the range of the smaller type), only when an overflow in the original 
problem occurs. There were introduced to mimics the behaviour of the original program in case
of overflows.

So, my understanding is that there is nothing wrong with Graphite or scalar evolution. The 
vectorization succeeded because no larger size type was used,  but this was unsafe. To make it 
work again, you need some supplementary analysis to determine that casts are redundant. This is 
not a simple problem, but with Richard patch the vectorization works. Unfortunately, don't 
see other simpler solutions.

It would be possible maybe to catch in Graphite that the transformation is graphite identity,
and the original lower bounds of loops are zero. This will ensure that a larger size type is not needed
for induction variable. But seems like a modification intended to make this test works, and nothing more.

Btw, the only potential problem found may be with the code for gather in vectorization. After the attempt 
to use the vector load fails, the vectorizer detects an opportunity for a gather instruction, but as don't 
find a suitable one (this depends on architecture) vectorization fails. It seems to me that the analysis for gather don't take into account the possibility of overflows. For this test, I could modify the code and use 
as gather instruction a load vector (even this was found not to be safe). The vectorization would succeed.
But not entirely sure about this ...
Comment 19 Jakub Jelinek 2015-03-09 14:39:58 UTC
I think in any case a graphite issue is not severe enough to justify P1.
Comment 20 Richard Biener 2015-03-25 12:53:42 UTC
XFAILed.
Comment 21 Richard Biener 2015-03-25 12:54:43 UTC
Author: rguenth
Date: Wed Mar 25 12:54:12 2015
New Revision: 221662

URL: https://gcc.gnu.org/viewcvs?rev=221662&root=gcc&view=rev
Log:
2015-03-25  Richard Biener  <rguenther@suse.de>

	PR tree-optimization/62630
	* gcc.dg/graphite/vect-pr43423.c: XFAIL.

Modified:
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/testsuite/gcc.dg/graphite/vect-pr43423.c
Comment 22 Eric Botcazou 2015-12-08 17:47:34 UTC
This is target independent.
Comment 23 Jakub Jelinek 2016-04-27 10:57:13 UTC
GCC 6.1 has been released.
Comment 24 Jakub Jelinek 2017-05-02 15:57:49 UTC
GCC 7.1 has been released.