Bug 58742 - pointer arithmetic simplification

Status: RESOLVED FIXED
Product: gcc
Component: middle-end
Version: 4.9.0
Severity: normal
Priority: P3
Keywords: missed-optimization
Reporter: Marc Glisse <glisse>
Assignee: Richard Biener <rguenth>
CC: paulo
Target Milestone: ---
Known to work: 4.9.0
Last reconfirmed: 2013-10-21 00:00:00

Description Marc Glisse 2013-10-15 21:47:49 UTC
int* f(int*b,int*e){
  return b+(e-b);
}

gives the following .optimized dump:

  e.0_2 = (long int) e_1(D);
  b.1_4 = (long int) b_3(D);
  _5 = e.0_2 - b.1_4;
  _6 = _5 /[ex] 4;
  _7 = (long unsigned int) _6;
  _8 = _7 * 4;
  _9 = b_3(D) + _8;
  return _9;

I believe the desired result is obvious enough...

At the asm level, this gives:

	subq	%rdi, %rsi
	movq	%rsi, %rax
	andq	$-4, %rax
	addq	%rdi, %rax

which seems to indicate that the exactness of the division was somehow lost (otherwise the andq would disappear, and sub+add could then be simplified).
Comment 1 Marc Glisse 2013-10-18 11:57:17 UTC
Interestingly, clang-3.4 and icc-13.1.3 don't fare any better than gcc here.
Comment 2 Richard Biener 2013-10-21 09:06:57 UTC
I'll have a look.
Comment 3 Richard Biener 2013-10-21 09:24:34 UTC
This case is a signed exact division followed by an unsigned multiply.  We do
that to avoid introducing undefined signed overflow.

signed exact division and unsigned multiply still cancel out, though; handling
of these is in extract_muldiv (ugh).  I'm not going to enhance that but will
pattern-match this case instead.  Testing:

Index: gcc/fold-const.c
===================================================================
--- gcc/fold-const.c    (revision 203885)
+++ gcc/fold-const.c    (working copy)
@@ -11002,6 +11002,13 @@ fold_binary_loc (location_t loc,
                                fold_build2_loc (loc, MULT_EXPR, type,
                                             build_int_cst (type, 2) , arg1));
 
+         /* ((T) (X /[ex] C)) * C cancels out if the conversion is
+            sign-changing only.  */
+         if (TREE_CODE (arg1) == INTEGER_CST
+             && TREE_CODE (arg0) == EXACT_DIV_EXPR
+             && operand_equal_p (arg1, TREE_OPERAND (arg0, 1), 0))
+           return fold_convert_loc (loc, type, TREE_OPERAND (arg0, 0));
+
          strict_overflow_p = false;
          if (TREE_CODE (arg1) == INTEGER_CST
              && 0 != (tem = extract_muldiv (op0, arg1, code, NULL_TREE,

Note that this simplification worked in 4.1.2 but stopped working in 4.2.0.
Likely a wrong-code fix for extract_muldiv disabled this case (the way
that function works makes special-casing a pattern like the above impossible).
Comment 4 Richard Biener 2013-10-21 09:38:48 UTC
The patch will fix the regression part; what is left to optimize is the
pointer offset association, which would best be done in GIMPLE
reassoc, which does not yet associate POINTER_PLUS_EXPR.  It could
do that with "transparently" handling

  ptr p+ (offset + ...)

as

  (sizetype) ptr + (offset + ...)

and in the final chain, if ptr prevailed, associate it first and
re-instantiate the POINTER_PLUS_EXPR, otherwise keep the sizetype
result and convert it to the pointer type (sizetype arith has
wrapping overflow so you cannot simply take any pointer in the
addition chain as the base for a POINTER_PLUS_EXPR).
Comment 5 Richard Biener 2013-10-21 11:34:07 UTC
Author: rguenth
Date: Mon Oct 21 11:34:04 2013
New Revision: 203890

URL: http://gcc.gnu.org/viewcvs?rev=203890&root=gcc&view=rev
Log:
2013-10-21  Richard Biener  <rguenther@suse.de>

	PR middle-end/58742
	* fold-const.c (fold_binary_loc): Fold ((T) (X /[ex] C)) * C
	to (T) X for sign-changing conversions (or no conversion).

	* c-c++-common/fold-divmul-1.c: New testcase.

Added:
    trunk/gcc/testsuite/c-c++-common/fold-divmul-1.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/fold-const.c
    trunk/gcc/testsuite/ChangeLog
Comment 6 Marc Glisse 2013-10-22 08:25:54 UTC
Thanks. For reference, as noted by Jakub and confirmed by Richard, we will also need at some point a gimple version for:

int *
fx (int *b, int *e)
{
  ptrdiff_t p = e - b;
  return b + p;
}

(or s/ptrdiff_t/long long/, etc.)
Comment 7 Richard Biener 2013-10-23 11:42:52 UTC
Regression fixed on trunk so far.
Comment 8 Richard Biener 2013-10-24 13:28:35 UTC
(In reply to Richard Biener from comment #4)
> The patch will fix the regression part, to be left to optimize is the
> pointer offset association bits which should best be done in GIMPLE
> reassoc which doesn't yet associate POINTER_PLUS_EXPR yet.  It could
> do that with "transparently" handling
> 
>   ptr p+ (offset + ...)
> 
> as
> 
>   (sizetype) ptr + (offset + ...)
> 
> and in the final chain, if ptr prevailed, associate it first and
> re-instantiate the POINTER_PLUS_EXPR, otherwise keep the sizetype
> result and convert it to the pointer type (sizetype arith has
> wrapping overflow so you cannot simply take any pointer in the
> addition chain as the base for a POINTER_PLUS_EXPR).

Not that easy as 'ptr' and '(sizetype) ptr' are not trivially computed
equal by reassoc (it relies on CSE and has no value-numbering on its own).
Comment 9 Richard Biener 2013-11-18 15:13:17 UTC
Author: rguenth
Date: Mon Nov 18 15:13:14 2013
New Revision: 204965

URL: http://gcc.gnu.org/viewcvs?rev=204965&root=gcc&view=rev
Log:
2013-11-18  Richard Biener  <rguenther@suse.de>

	Backport from mainline
	2013-10-21  Richard Biener  <rguenther@suse.de>

	PR tree-optimization/58794
	* fold-const.c (operand_equal_p): Compare FIELD_DECL operand
	of COMPONENT_REFs with OEP_CONSTANT_ADDRESS_OF left in place.

	* c-c++-common/torture/pr58794-1.c: New testcase.
	* c-c++-common/torture/pr58794-2.c: Likewise.

	2013-10-21  Richard Biener  <rguenther@suse.de>

	PR middle-end/58742
	* fold-const.c (fold_binary_loc): Fold ((T) (X /[ex] C)) * C
	to (T) X for sign-changing conversions (or no conversion).

	* c-c++-common/fold-divmul-1.c: New testcase.

	2013-11-06  Richard Biener  <rguenther@suse.de>

	PR tree-optimization/58653
	* tree-predcom.c (ref_at_iteration): Rewrite to generate
	a MEM_REF.
	(prepare_initializers_chain): Adjust.

	* gcc.dg/tree-ssa/predcom-6.c: New testcase.
	* gcc.dg/tree-ssa/predcom-7.c: Likewise.

	PR tree-optimization/59047
	* tree-predcom.c (ref_at_iteration): Handle bitfield accesses
	properly.

	* gcc.dg/torture/pr59047.c: New testcase.

	2013-10-15  Richard Biener  <rguenther@suse.de>

	PR tree-optimization/58143
	* tree-ssa-loop-im.c (arith_code_with_undefined_signed_overflow):
	New function.
	(rewrite_to_defined_overflow): Likewise.
	(move_computations_dom_walker::before_dom): Rewrite stmts
	with undefined signed overflow that are not always executed
	into unsigned arithmetic.

	* gcc.dg/torture/pr58143-1.c: New testcase.
	* gcc.dg/torture/pr58143-2.c: Likewise.
	* gcc.dg/torture/pr58143-3.c: Likewise.

Added:
    branches/gcc-4_8-branch/gcc/testsuite/c-c++-common/fold-divmul-1.c
    branches/gcc-4_8-branch/gcc/testsuite/c-c++-common/torture/pr58794-1.c
    branches/gcc-4_8-branch/gcc/testsuite/c-c++-common/torture/pr58794-2.c
    branches/gcc-4_8-branch/gcc/testsuite/gcc.dg/torture/pr58143-1.c
    branches/gcc-4_8-branch/gcc/testsuite/gcc.dg/torture/pr58143-2.c
    branches/gcc-4_8-branch/gcc/testsuite/gcc.dg/torture/pr58143-3.c
    branches/gcc-4_8-branch/gcc/testsuite/gcc.dg/torture/pr59047.c
    branches/gcc-4_8-branch/gcc/testsuite/gcc.dg/tree-ssa/predcom-6.c
    branches/gcc-4_8-branch/gcc/testsuite/gcc.dg/tree-ssa/predcom-7.c
Modified:
    branches/gcc-4_8-branch/gcc/ChangeLog
    branches/gcc-4_8-branch/gcc/fold-const.c
    branches/gcc-4_8-branch/gcc/testsuite/ChangeLog
    branches/gcc-4_8-branch/gcc/tree-predcom.c
    branches/gcc-4_8-branch/gcc/tree-ssa-loop-im.c
Comment 10 Richard Biener 2013-11-18 15:15:25 UTC
Fixed for 4.8.3, not backporting to 4.7.
Comment 11 Marc Glisse 2013-11-18 15:21:36 UTC
Er, we are still missing a gimple version of that, aren't we? Or would you prefer a different PR?
Comment 12 Richard Biener 2013-11-18 15:33:46 UTC
Ah yes, sorry.  I consider the regression fixed; we can track the original
optimization separately, as well as fixing the regression in a more "gimple" way.
Comment 13 Richard Biener 2013-11-20 10:40:41 UTC
*** Bug 59209 has been marked as a duplicate of this bug. ***
Comment 14 Marc Glisse 2013-11-20 10:57:31 UTC
So we don't forget it, PR 59209 asks to simplify:

(ptr+size)-ptr to size

not just:

ptr1+(ptr2-ptr1) to ptr2
Comment 15 Marc Glisse 2014-01-21 12:15:06 UTC
Another example: http://stackoverflow.com/q/21253690/1918193

where we have (-DVERSION=2):
  _128 = img$_M_impl$_M_start_130 + 4000000;
  pretmp_146 = (long intD.12) _128;
  pretmp_145 = (long intD.12) img$_M_impl$_M_start_130;
  pretmp_76 = pretmp_146 - pretmp_145;
  pretmp_121 = pretmp_76 /[ex] 4;
  pretmp_120 = (size_typeD.24047) pretmp_121;

We miss that pretmp_120 is a constant; VRP thus fails to eliminate the range checks, vectorization doesn't happen, and the code is more than 4 times slower than it should be.

If the desired reassoc version is hard, a simple forwprop pattern match would already go a long way towards alleviating this issue.
Comment 16 rguenther@suse.de 2014-01-28 10:13:00 UTC
On Tue, 21 Jan 2014, glisse at gcc dot gnu.org wrote:

> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58742
> 
> --- Comment #15 from Marc Glisse <glisse at gcc dot gnu.org> ---
> [...]
> 
> If the desired reassoc version is hard, a simple forwprop pattern matching
> would already go a long way to alleviate this issue.

I agree.  I have a patch.
Comment 17 Richard Biener 2014-01-28 14:54:24 UTC
Author: rguenth
Date: Tue Jan 28 14:53:52 2014
New Revision: 207194

URL: http://gcc.gnu.org/viewcvs?rev=207194&root=gcc&view=rev
Log:
2014-01-28  Richard Biener  <rguenther@suse.de>

	PR tree-optimization/58742
	* tree-ssa-forwprop.c (associate_plusminus): Handle
	pointer subtraction of the form (T)(P + A) - (T)P.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/tree-ssa-forwprop.c
Comment 18 Richard Biener 2014-01-28 14:55:50 UTC
I'll tackle the EXACT_DIV_EXPR / MULT_EXPR cancellation next.  Seems easy enough.
Comment 19 Richard Biener 2014-01-28 15:24:17 UTC
The original identity will remain unoptimized at the GIMPLE level.  That is,

;; Function fx (fx, funcdef_no=0, decl_uid=1743, symbol_order=0)

fx (int * b, int * e)
{
  long int e.0_2;
  long int b.1_4;
  long int _5;
  long unsigned int _6;
  int * _7;

  <bb 2>:
  e.0_2 = (long int) e_1(D);
  b.1_4 = (long int) b_3(D);
  _5 = e.0_2 - b.1_4;
  _6 = (long unsigned int) _5;
  _7 = b_3(D) + _6;
  return _7;

}

that is to be pattern-matched separately, in associate_pointerplus.  Of course
that's only one piece of the association work, as b+5+(e-b) will then still
not be optimized.
Comment 20 Richard Biener 2014-01-29 11:09:32 UTC
Author: rguenth
Date: Wed Jan 29 11:08:59 2014
New Revision: 207232

URL: http://gcc.gnu.org/viewcvs?rev=207232&root=gcc&view=rev
Log:
2014-01-29  Richard Biener  <rguenther@suse.de>

	PR tree-optimization/58742
	* tree-ssa-forwprop.c (associate_plusminus): Return true
	if we changed sth, defer EH cleanup to ...
	(ssa_forward_propagate_and_combine): ... here.  Call simplify_mult.
	(simplify_mult): New function.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/tree-ssa-forwprop.c
Comment 21 Richard Biener 2014-01-29 12:25:02 UTC
Ok, I have a patch that handles the remaining cases (including the duplicate
bug).  But for the (p + sz) - p case it can only optimize sizeof (*p) == 1
without range information; for example, for

__SIZE_TYPE__
fx (int *a, __SIZE_TYPE__ sz)
{
  int *b = a + sz;
  return b - a;
}

we get

fx (int * a, long unsigned int sz)
{
  int * b;
  long unsigned int _2;
  long int _7;
  long int _8;
  long unsigned int _9;

  <bb 2>:
  _2 = sz_1(D) * 4;
  _7 = (long int) _2;
  _8 = _7 /[ex] 4;
  _9 = (long unsigned int) _8;
  return _9;

}

as the result, which is basically (sz * 4) /[ex] 4, as we don't know whether
the multiplication by 4 overflows (well, the C language may say it
doesn't, but the IL does not reflect this).  If we make 'sz' a signed
int then VRP could later optimize this (but it doesn't currently),
or forwprop could use range information.  On RTL we manage to optimize
the signed int input case.
Comment 22 Richard Biener 2014-01-29 14:46:16 UTC
Author: rguenth
Date: Wed Jan 29 14:45:44 2014
New Revision: 207239

URL: http://gcc.gnu.org/viewcvs?rev=207239&root=gcc&view=rev
Log:
2014-01-29  Richard Biener  <rguenther@suse.de>

	PR tree-optimization/58742
	* tree-ssa-forwprop.c (associate_pointerplus): Rename to
	associate_pointerplus_align.
	(associate_pointerplus_diff): New function.
	(associate_pointerplus): Likewise.  Call associate_pointerplus_align
	and associate_pointerplus_diff.

	* gcc.dg/pr58742-1.c: New testcase.
	* gcc.dg/pr58742-2.c: Likewise.
	* gcc.dg/pr58742-3.c: Likewise.

Added:
    trunk/gcc/testsuite/gcc.dg/pr58742-1.c
    trunk/gcc/testsuite/gcc.dg/pr58742-2.c
    trunk/gcc/testsuite/gcc.dg/pr58742-3.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/tree-ssa-forwprop.c
Comment 23 Richard Biener 2014-01-29 14:46:24 UTC
Fixed for 4.9.
Comment 24 Marc Glisse 2014-01-31 17:14:52 UTC
Thank you.
Sadly, for the example in comment #15, this is not quite enough: I need to add forwprop+ccp right before the VRP1 pass (and then the range check is eliminated, the vectorizer works, and performance is the same as without range checking). Indeed, we learn that size is (start+4000000)-start quite late (we need to inline, look through MEM_REFs, etc. -> FRE2), so the previous forwprop pass is too early. VRP2 is too late if we hope to vectorize, and in any case it fails to remove the range checks, because it is confused by the new shape of the loops (possibly related to PR 25643, or not). The VRP2 failure looks funny with these consecutive lines:

  # ivtmp.80_92 = PHI <ivtmp.80_53(9), ivtmp.80_83(8)>
  # RANGE [10101, 989898] NONZERO 0x000000000000fffff
  _23 = ivtmp.80_92;
  if (ivtmp.80_92 > 999999)

Really, we don't know that the comparison returns false?


For the overflow in sizeof(*p) * sz, would it make sense to have the front-end generate, when it sees p+sz: if ((long)sz > LONG_MAX/sizeof(*p)) __builtin_unreachable(); (or abort, or a sanitizer call, depending on options), and a similar check for large negative values? It feels very heavy for such a common operation, but if the FE is the only one with the information, I am not sure how else to pass it down to GIMPLE.


I might file a low priority enhancement PR about extending reassoc to pointers, that would still cover some cases (and it wouldn't make the forwprop transformation useless because of single-use restrictions).
Comment 25 Richard Biener 2014-02-03 09:28:47 UTC
(In reply to Marc Glisse from comment #24)
> Thank you.
> Sadly, for the example in comment #15, this is not quite enough, I need to
> add forwprop+ccp right before the VRP1 pass (and then the range check is
> eliminated, the vectorizer works and perfs are the same as without range
> checking).

VERSION=0 and VERSION=1 are the same speed for me now, VERSION=2 is a lot
slower still.

> Indeed, we learn that size is (start+4000000)-start quite late
> (need to inline, look through mem_refs, etc -> FRE2) so the previous
> forwprop pass is too early.

Yeah, the issue is that while FRE does some expression simplification it
doesn't wire into a common gimple pattern matcher (something I'd like to
fix for 4.10).  That is, the simplification forwprop performs should be
done by FRE already.  See tree-ssa-sccvn.c:simplify_binary_expression.

> VRP2 is too late if we hope to vectorize, and in
> any case it fails to remove the range checks, because it is confused by the
> new shape of the loops (possibly related to PR 25643, or not). The VRP2
> failure looks funny with these consecutive lines:
> 
>   # ivtmp.80_92 = PHI <ivtmp.80_53(9), ivtmp.80_83(8)>
>   # RANGE [10101, 989898] NONZERO 0x000000000000fffff
>   _23 = ivtmp.80_92;
>   if (ivtmp.80_92 > 999999)
> 
> Really, we don't know that the comparison returns false?

Well, _23 is simply dead at this point and VRP computed _92 to be
varying.

> 
> For the overflow in sizeof(*p) * sz, would it make sense to have the
> front-end generate, when it sees p+sz: if((long)sz>LONG_MAX/sizeof(*p))
> __builtin_unreachable() (or abort or a sanitizer call depending on options),
> and a similar check for large negative values? It feels very heavy for such
> a common operation, but if the FE is the only one with the information, I am
> not sure how else to pass it down to gimple.

From the no-undefined-overflow branch I'd take the idea of adding op
variants with known no overflow.  That is, add MULTNV_EXPR, PLUSNV_EXPR,
MINUSNV_EXPR that can be used on unsigned types, too (you'd of course
have to define what overflow means there: if a - b does not overflow
then a + (-b) will, since negating x always overflows when x is not zero).

The idea of no-undefined-overflow branch was to make all ops wrapping
by default (even signed type arithmetic) and make frontends explicitely
use non-overflowing ops when language semantics says they are not
overflowing.

> I might file a low priority enhancement PR about extending reassoc to
> pointers, that would still cover some cases (and it wouldn't make the
> forwprop transformation useless because of single-use restrictions).
Comment 26 Marc Glisse 2014-02-03 10:24:24 UTC
(In reply to Richard Biener from comment #25)
> VERSION=0 and VERSION=1 are the same speed for me now,

They aren't quite for me (2.5 vs 2.7) but

> VERSION=2 is a lot slower still.

that's the part I am concerned with here.
 
> Yeah, the issue is that while FRE does some expression simplification it
> doesn't wire into a common gimple pattern matcher (something I'd like to
> fix for 4.10).  That is, the simplification forwprop performs should be
> done by FRE already.  See tree-ssa-sccvn.c:simplify_binary_expression.

Ah, ok, that makes sense. I assume it would also have basic CCP-like functionality (forwprop can create constants but doesn't always fold the resulting constant operations). Looking forward to that!

> > VRP2 is too late if we hope to vectorize, and in
> > any case it fails to remove the range checks, because it is confused by the
> > new shape of the loops (possibly related to PR 25643, or not). The VRP2
> > failure looks funny with these consecutive lines:
> > 
> >   # ivtmp.80_92 = PHI <ivtmp.80_53(9), ivtmp.80_83(8)>
> >   # RANGE [10101, 989898] NONZERO 0x000000000000fffff
> >   _23 = ivtmp.80_92;
> >   if (ivtmp.80_92 > 999999)
> > 
> > Really, we don't know that the comparison returns false?
> 
> Well, _23 is simply dead at this point and VRP computed _92 to be
> varying.

Yes. I just meant that, as a hack, for two SSA_NAMEs defined in the same BB where one is a copy of the other, we could merge their range info (in both directions), and it might in this special case work around the fact that VRP2 is confused by the loop. But that would be too fragile and hackish.

> From the no-undefined-overflow branch I'd take the idea of adding op
> variants with known no overflow.  That is, add MULTNV_EXPR, PLUSNV_EXPR,
> MINUSNV_EXPR that can be used on unsigned types, too (you'd of course
> have to define what overflow means there - if a - b does not overflow
> then a + (-b) will - negate of x will always overflow if x is not zero).

Ah, yes, I'd forgotten about those. I always wondered if it is better to have many different tree codes or a single one with "options". Like MULT_EXPR with a parameter saying what happens on overflow: undefined, saturate, wrap, other (it seems hard to handle "jump to this location" in the same scheme). Or COMPARISON_EXPR with several bools telling what the return value is if a<b, a==b, a>b, or one is NaN, and whether it can raise exceptions (we don't have the corresponding 32 tree codes). Or the 5 DIV_EXPR variants (counting only integers). I guess it doesn't really matter.
Comment 27 rguenther@suse.de 2014-02-03 10:38:27 UTC
On Mon, 3 Feb 2014, glisse at gcc dot gnu.org wrote:

> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58742
> 
> --- Comment #26 from Marc Glisse <glisse at gcc dot gnu.org> ---
> [...]
> 
> Ah, yes, I'd forgotten about those. I always wondered if it is better
> to have many different tree codes or a single one with "options". Like
> MULT_EXPR with a parameter saying what happens on overflow: undefined,
> saturate, wrap, other (it seems hard to handle "jump to this location"
> in the same scheme). Or COMPARISON_EXPR with several bools telling what
> the return value is if a<b, a==b, a>b, or one is NaN, and whether it can
> raise exceptions (we don't have the corresponding 32 tree codes). Or the
> 5 DIV_EXPR variants (counting only integers). I guess it doesn't really
> matter.

It matters for convenience with existing code like fold-const.c
which takes decomposed expression trees.  You'd need to add a bunch
of flags there and pass them through appropriately.  Much easier
to encode it in enum tree_code directly.

Richard.