Bug 69526 - ivopts candidate strangeness
Summary: ivopts candidate strangeness
Status: NEW
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization
Version: 6.0
Importance: P3 minor
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords: missed-optimization
Depends on:
Blocks:
 
Reported: 2016-01-28 08:52 UTC by rdapp
Modified: 2021-12-22 08:44 UTC
2 users

See Also:
Host:
Target: s390, x86
Build:
Known to work:
Known to fail:
Last reconfirmed: 2016-01-28 00:00:00


Attachments
Tentative patch for VRP and loop-doloop (4.52 KB, patch)
2016-03-31 11:56 UTC, rdapp
VRP/match.pd patch (2.72 KB, patch)
2016-05-20 14:30 UTC, rdapp
Updated patch and test, match.pd/VRP (2.99 KB, patch)
2016-06-27 13:22 UTC, rdapp
Updated patch and test, match.pd/VRP (3.35 KB, patch)
2016-07-11 13:17 UTC, rdapp

Description rdapp 2016-01-28 08:52:51 UTC
While inspecting loop generation code on s390, I saw ivopts choose an IV candidate that does something peculiar. On x86 the same candidate is never chosen because of different cost estimations, but the behavior can be provoked on x86 by "forcing" GCC to use the same candidate as on s390. I did this to confirm my suspicions, and I'll show the x86 version here:

This source

void v(unsigned long *in, unsigned long *out, unsigned int n)
{
  int i;

  for (i = 0; i < n; i++)
  {
    out[i] = in[i];
  }
}

results in the following assembly, when ivopts candidate 7 is used:

v:
  testl %edx, %edx
  je .L1
  leal -1(%rdx), %eax
  leaq 8(,%rax,8), %rcx
  xorl %eax, %eax
.L3:
  movq (%rdi,%rax), %rdx
  movq %rdx, (%rsi,%rax)
  addq $8, %rax
  cmpq %rcx, %rax
  jne .L3
.L1:
  rep ret

Should the following be happening?

  leal -1(%rdx), %eax
  leaq 8(,%rax,8), %rcx

i.e. %eax = n - 1
     %rcx = 8 * ((n - 1) + 1)

The pattern can already be observed in ivopts' GIMPLE:

<bb 4>:
_15 = n_5(D) + 4294967295;
_2 = (sizetype) _15;
_1 = _2 + 1;
_24 = _1 * 8;

Why do we need the - 1 and the subsequent + 1 when %eax is zeroed afterwards anyway? Granted, this exact situation won't ever be observed on x86, as another ivopts candidate is chosen, but on s390 it amounts to three instructions.

If I see it correctly, the n - 1 comes from estimating the number of loop iterations, while the +1 is then correctly added by cand_value_at() because the loop counter is incremented before the exit test. Perhaps this is intended behavior and there is nothing wrong with it?

Regards
 Robin
Comment 1 Richard Biener 2016-01-28 09:20:28 UTC
_15 = n_5(D) + 4294967295;
_2 = (sizetype) _15;
_1 = _2 + 1;

cannot be simplified to (sizetype)n_5(D) because n_5 - 1 might underflow
and then adding 1 in a wider type doesn't cancel the operation.

So yes, this is expected but unfortunate.  I don't think we do any costing
on the code we insert on the loop entry edge to break ties between
equal cost candidates.  Well, if that is what happens.

The real issue above is that we sometimes end up using sizetype variables
for no good reason.
Comment 2 rdapp 2016-01-28 09:31:25 UTC
Ok, that's sensible but why is the - 1 necessary in the first place?

n_5 - 1 can only underflow if n_5 == 0 which is checked by

testl %edx, %edx

before. Unfortunately, this check is in a previous basic block and will therefore not be taken into account by ivopts (because, strictly speaking, it has no influence on the induction variable)?
Comment 3 bin cheng 2016-01-28 10:10:52 UTC
(In reply to rdapp from comment #2)
> Ok, that's sensible but why is the - 1 necessary in the first place?
> 
> n_5 - 1 can only underflow if n_5 == 0 which is checked by
> 
> testl %edx, %edx
> 
> before. This is in a previous basic block, unfortunately, and will not be
> regarded by ivopts then (because strictly speaking, it has no influence on
> the induction variable)?

Yes, one way out is to use loop preconditions to improve range information and then use that information to prove non-overflow.  This has already been applied to loop niter analysis and the scev overflow check.  I once opened a PR about this in terms of the loop-header bloat issue (will update the PR number because I don't have it now).  Another point: I think Richard planned to improve range analysis wrt. control flow?  That could be helpful here in terms of compilation time.
Comment 4 bin cheng 2016-01-28 11:43:12 UTC
(In reply to rdapp from comment #2)
> Ok, that's sensible but why is the - 1 necessary in the first place?
> 
> n_5 - 1 can only underflow if n_5 == 0 which is checked by
> 
> testl %edx, %edx
> 
> before. This is in a previous basic block, unfortunately, and will not be
> regarded by ivopts then (because strictly speaking, it has no influence on
> the induction variable)?

Actually, ivopts does take preheader costs into consideration.  The cost just needs to be amortized over the number of loop iterations when compiling for speed, after which it gets less accurate.  For example, an instruction cost is generally 4 while the assumed average loop niter is 5, so the amortized cost becomes 0 this way.
Comment 5 rdapp 2016-01-28 15:12:23 UTC
I still don't quite get why the "n - 1" is needed. Do we need it to possibly have an exit condition like

 if (i != n-1) or 
 if (i <= n-1)?

Am I missing something really obvious here?
Comment 6 bin cheng 2016-01-28 15:31:45 UTC
(In reply to rdapp from comment #5)
> I still don't quite get why the "n - 1" is needed. Do we need it to possibly
> have an exit condition like
> 
>  if (i != n-1) or 
>  if (i <= n-1)?
> 
> Am I missing something really obvious here?

The dump before ivopt is as below:

  <bb 2>:
  if (n_5(D) != 0)
    goto <bb 4>;
  else
    goto <bb 3>;

  <bb 3>:
  return;

  <bb 4>:

  <bb 5>:
  # i_18 = PHI <0(4), i_14(7)>
  _6 = (long unsigned int) i_18;
  _7 = _6 * 8;
  _9 = out_8(D) + _7;
  _11 = in_10(D) + _7;
  _12 = *_11;
  *_9 = _12;
  i_14 = i_18 + 1;
  i.0_4 = (unsigned int) i_14;
  if (i.0_4 < n_5(D))
    goto <bb 7>;
  else
    goto <bb 6>;

  <bb 6>:
  goto <bb 3>;

  <bb 7>:
  goto <bb 5>;


It comes from loop niter analysis, as in may_eliminate_iv, we have:

(gdb) call debug_generic_expr(desc->niter)
n_5(D) + 4294967295
Comment 7 rdapp 2016-01-28 15:40:22 UTC
(In reply to amker from comment #6)

> It comes from loop niter analysis, as in may_eliminate_iv, we have:
> 
> (gdb) call debug_generic_expr(desc->niter)
> n_5(D) + 4294967295

and this is correct? I.e. the number of iterations is n - 1? I'd naively expect 
  desc->niter = n_5(D)

(Again, it might be necessary for some reason that escapes me currently.)
Comment 8 bin cheng 2016-01-28 15:47:42 UTC
(In reply to rdapp from comment #7)
> (In reply to amker from comment #6)
> 
> > It comes from loop niter analysis, as in may_eliminate_iv, we have:
> > 
> > (gdb) call debug_generic_expr(desc->niter)
> > n_5(D) + 4294967295
> 
> and this is correct? I.e. the number of iterations is n - 1? I'd naively
> expect 
>   desc->niter = n_5(D)
> 
> (Again, it might be necessary for some reason escapes me currently)

It confused me too, but niter in tree_niter_desc means the number of times the latch is executed, as commented in tree-ssa-loop.h:

  tree niter;		/* The expression giving the number of iterations of
			   a loop (provided that assumptions == true and
			   may_be_zero == false), more precisely the number
			   of executions of the latch of the loop.  */
Comment 9 Richard Biener 2016-03-18 09:56:31 UTC
Note that VRP figures out that

_15 = n_18 + 4294967295;
Found new range for _15: [0, 4294967294]

(so no underflow)

_2 = (sizetype) _15;
Found new range for _2: [0, 4294967294]
_1 = _2 + 1;
Found new range for _1: [1, 4294967295]

but VRP doesn't have any transform that would take advantage of this
(like computing _2 + 1 in unsigned int and only then casting to sizetype).
Comment 10 rdapp 2016-03-31 11:56:30 UTC
Created attachment 38144 [details]
Tentative patch for VRP and loop-doloop

Meanwhile I found the time to implement a pattern for VRP which seems to do the job on tree level. Although the patch is larger now (and not many cases are covered yet), I suppose I agree that this fits better in VRP than in ivopts and might even catch more cases than before.

The fix on RTL level is nonetheless necessary since doloop calculates the number of iterations independently of the code before and also adds a +1.
I cleaned up the fix a little by introducing niterp1_expr (niter plus 1) but there's still room for improvement.

No regressions so far, improvement ideas welcome.
Comment 11 Richard Biener 2016-03-31 12:25:21 UTC
(In reply to rdapp from comment #10)
> Created attachment 38144 [details]
> Tentative patch for VRP and loop-doloop
> 
> Meanwhile I found the time to implement a pattern for VRP which seems to do
> the job on tree level. Although larger now (and not many cases are covered
> yet) I suppose I agree that it fits better in VRP than in ivopts and might
> even catch more cases than before.
> 
> The fix on RTL level is nonetheless necessary since doloop calculates the
> number of iterations independently of the code before and also adds a +1.
> I cleaned up the fix a little by introducing niterp1_expr (niter plus 1) but
> there's still room for improvement.
> 
> No regressions so far, improvement ideas welcome.

It's a bit too ad-hoc (looking at the VRP part).  I think what we'd need to
do is

  a) in VRP have a general helper telling you whether UNOP A or A BINOP B
  overflows/underflows

  b) in the existing patterns that would handle this and similar situations
  in match.pd:

    /* (A +- CST) +- CST -> A + CST  */
    (for outer_op (plus minus)
     (for inner_op (plus minus)
...

  use such a generic predicate to handle cases with (A +- CST) wrapped
  inside a conversion (it should already be fine to handle this case
  if A +- CST cannot overflow because TYPE_OVERFLOW_UNDEFINED).

  c) use fold_stmt (or gimple_simplify directly) in vrp_fold_stmt to
  invoke the folding machinery - eventually simply unconditionally
  call fold_stmt in tree-ssa-propagate.c:substitute_and_fold_dom_walker::before_dom_children rather
than only when we propagated into a stmt.

Even when doing the "ad-hoc" thing initially, splitting out a) (implementing
only a subset of operations) would be a useful thing to do.  Note that
during VRP itself you can use its own lattice while when not in VRP you'd
have to rely on SSA_NAME_RANGE_INFO.
Comment 12 Richard Biener 2016-03-31 12:30:08 UTC
So, we don't optimize at -O2

long foo (int a)
{
  return (long)(a + 1) - 1;
}

Note that (T)(A +- CST1) +- CST2 -> (T)A +- CST3 thus the combined
addition in general needs to be done in the larger type _unless_ the
absolute constant value of CST3 is equal or smaller than CST1.

Handling the special case of CST1 == CST2 might be easiest here.
Comment 13 rdapp 2016-05-20 14:30:21 UTC
Created attachment 38535 [details]
VRP/match.pd patch

Found some time again and hacked together a fix for match.pd and VRP. The patch exports a VRP function no_range_overflow that is called from match.pd via tree.c. I realize that VRP hasn't exported any functions so far; if there's a better/more idiomatic way to do it, please tell me. Further improvements or "code smell" hints are very welcome.

Currently this should fix cases like

long foo(int i)
{
  return (long) (i + 1) - 2;
}

Commutativity of a PLUS_EXPR is not yet being exploited and I'm relying on TYPE_OVERFLOW_UNDEFINED so far. The overflow checking could also be done more elegantly I suppose.

This diff is extracted from the full patch which also includes the RTL level fix which I haven't included here since it has not changed. I hope the splitting didn't introduce new bugs but there are no regressions on s390x for the full patch.
Comment 14 rguenther@suse.de 2016-05-23 09:40:46 UTC
On Fri, 20 May 2016, rdapp at linux dot vnet.ibm.com wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69526
> 
> --- Comment #13 from rdapp at linux dot vnet.ibm.com ---
> Created attachment 38535 [details]
>   --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=38535&action=edit
> VRP/match.pd patch
> 
> Found some time again and hacked together a fix for match.pd and VRP. The patch
> exports a VRP function no_range_overflow that is called from match.pd via
> tree.c. I realize that VRP didn't export any functions so far and if there's a
> better/idiomatic way to do it, please tell me. Further improvements or "code
> smell" hints very welcome.
> 
> Currently this should fix cases like
> 
> long foo(int i)
> {
>   return (long) (i + 1) - 2;
> }
> 
> Commutativity of a PLUS_EXPR is not yet being exploited and I'm relying on
> TYPE_OVERFLOW_UNDEFINED so far. The overflow checking could also be done more
> elegantly I suppose.
> 
> This diff is extracted from the full patch which also includes the RTL level
> fix which I haven't included here since it has not changed. I hope the
> splitting didn't introduce new bugs but there are no regressions on s390x for
> the full patch.

So I think you should use get_range_info () instead of exporting stuff
from VRP.  Plus change the pattern to be more explicit about the
inner operation (the comment suggests you care about +- only). Thus

(for outer_op (plus minus)
 (for inner_op (plus minus)
  (simplify
   (outer_op (convert (inner_op SSA_NAME@0 INTEGER_CST@1)) INTEGER_CST@2))

VRP only has ranges for types that would combine with INTEGER_CSTs,
not other constants so using INTEGER_CST is better here.  The above
has the advantage that you don't need to lookup SSA def stmts in
your manually written code.  For the range of @0 you'd then do

  (with
    {
      wide_int rmin, rmax;
      enum value_range_type rtype = get_range_info (@0, &rmin, &rmax);

and you'd use wide-int operations to check for overflow rather than
const_binop and looking for TREE_OVERFLOW.  Both wi::add and wi::sub
have variants with a overflow flag.

As you rely on range-info the transform is indeed GIMPLE only but
please follow other cases and wrap the whole pattern in

#if GIMPLE
#endif

so it does not produce dead code in generic-match.c
Comment 15 rdapp 2016-06-07 14:03:37 UTC
Thanks for the suggestions. The omission of the inner op was actually more or less on purpose as I intended to capture the
(convert @inner)
in order to access @inner's value range as a whole rather than re-calculating what VRP already figured out.

Is there a simple method to access @inner when
capturing
(outer_op (convert (inner_op SSA_NAME@0 INTEGER_CST@1)) INTEGER_CST@2))
                   ^---------------------------------^
                                 @inner
or, even better, both: @inner as well as @0 and @1 at the same time? (Apart from looking through use stmts.)

In my case VRP determines @0's range as ~[0,0] and @inner's as [0,INT_MAX-1]. VRP probably canonicalized the anti-range to a normal range and performed other simplifications in order to arrive at [0,INT_MAX-1]. If I cannot get @inner's VRP info with the capture above, would there be another way to obtain it?

The TREE_OVERFLOW/const_binop code is copied from the (A +- CST) +- CST -> A + CST pattern and the result of const_binop is used in the final simplification. The overflow check could of course be done via wi::add/sub but wouldn't I just replicate what int_const_binop_1 is doing on top of it when I need the result anyway?
Comment 16 Marc Glisse 2016-06-07 14:51:02 UTC
(In reply to rdapp from comment #15)
> Is there a simple method to access @inner when
> capturing
> (outer_op (convert (inner_op SSA_NAME@0 INTEGER_CST@1)) INTEGER_CST@2))
>                    ^---------------------------------^
>                                  @inner
> or, even-better both, @inner as well as @0 and @1, at the same time? (Apart
> from looking through use stmts)

(outer_op (convert (inner_op@3 SSA_NAME@0 INTEGER_CST@1)) INTEGER_CST@2))

Does @3 do what you want here? (I didn't follow the discussion)
Comment 17 rdapp 2016-06-07 15:22:55 UTC
(In reply to Marc Glisse from comment #16)
> (In reply to rdapp from comment #15)
> > Is there a simple method to access @inner when
> > capturing
> > (outer_op (convert (inner_op SSA_NAME@0 INTEGER_CST@1)) INTEGER_CST@2))
> >                    ^---------------------------------^
> >                                  @inner
> > or, even-better both, @inner as well as @0 and @1, at the same time? (Apart
> > from looking through use stmts)
> 
> (outer_op (convert (inner_op@3 SSA_NAME@0 INTEGER_CST@1)) INTEGER_CST@2))
> 
> Does @3 do what you want here? (I didn't follow the discussion)

looks good, thanks :)
Comment 18 rguenther@suse.de 2016-06-08 14:27:33 UTC
On Tue, 7 Jun 2016, rdapp at linux dot vnet.ibm.com wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69526
> 
> --- Comment #15 from rdapp at linux dot vnet.ibm.com ---
> Thanks for the suggestions. The omission of the inner op was actually more or
> less on purpose as I intended to capture the
> (convert @inner)
> in order to access @inner's value range as a whole rather than re-calculating
> what VRP already figured out.
> 
> Is there a simple method to access @inner when
> capturing
> (outer_op (convert (inner_op SSA_NAME@0 INTEGER_CST@1)) INTEGER_CST@2))
>                    ^---------------------------------^
>                                  @inner
> or, even-better both, @inner as well as @0 and @1, at the same time? (Apart
> from looking through use stmts)
> 
> In my case VRP determines @0's range as ~[0,0] and @inner's as [0,INT_MAX-1].
> VRP probably canonicalized the anti-range to a normal range and performed other
> simplifications in order to arrive at [0,INT_MAX-1]. If I cannot get @inner's
> VRP info with the capture above, would there be another way to obtain it?
> 
> The TREE_OVERFLOW/const_binop code is copied from the (A +- CST) +- CST -> A +
> CST pattern and the result of const_binop is used in the final simplification.
> The overflow check could of course be done via wi::add/sub but wouldn't I just
> replicate what int_const_binop_1 is doing on top of it when I need the result
> anyway?

int_const_binop_1 is legacy, it is better to avoid building an INTEGER_CST
if you throw away the result due to TREE_OVERFLOW.
Comment 19 rdapp 2016-06-23 16:07:46 UTC
(In reply to rguenther@suse.de from comment #18)
>

The match.pd patch is slowly taking shape... Two things, however, I'm still unsure about:

1.
An existing rule in match.pd checks for overflow of the combined constant
in (a +- CST1) +- CST2 and does not perform the simplification when an overflow occurs. My case looks quite similar; however, CST1 and CST2 are/might be unsigned ints/longs. I assume this is because non-overflow cannot be proved by the C frontend, which therefore internally uses an unsigned representation?

(cast) (a +- CST1) +- CST2

We can prove the non-overflow of (a +- CST1) via VRP but an overflow check of (CST1 +- CST2) would fail since ((UINT_MAX - 1) + 1) does overflow.

So, in general, should we care about overflow of the combined operation at all after having established that the inner operation does not overflow?
If the overflow is well-defined and we overflow, the result should be valid. If the overflow is undefined, we could do anything, in particular perform this optimization, which wouldn't even be too unexpected from a user's perspective. Wouldn't any overflow in the combined constant have occurred anyway, even without combining?

I think there are only two cases to discern, regardless of overflow:
 
 - abs(CST1 +- CST2) < abs(CST1), then we can simplify to (cast)(a +- (CST1 +- CST2))

 - else, we can simplify to (cast)(a) +- (CST1 +- CST2)

2.
Is there an idiomatic/correct way to check a VR_RANGE for overflow? Does it suffice to check if the range includes +-INF or +-INF(OVF)? I suspect other, naive methods like checking if min < max will fail, since the ranges are canonicalized.
Comment 20 bin cheng 2016-06-23 20:32:11 UTC
(In reply to rdapp from comment #19)
> (In reply to rguenther@suse.de from comment #18)
> >
> 
> The match.pd patch is slowly assuming shape... Two things however I'm still
> unsure about:
> 
> 1.
> An existing rule in match.pd checks for overflow of the combined constant
> in (a +- CST1) +- CST2 and does not perform the simplification when an
> overflow occurs. My case looks quite similar, however CST1 and CST2
> are/might be unsigned ints/longs. I assume this is because non-overflow
> cannot be proved by the C frontend and it therefore internally uses unsigned
> representation?
> 
> (cast) (a +- CST1) +- CST2
> 
> We can prove the non-overflow of (a +- CST1) via VRP but an overflow check
> of (CST1 +- CST2) would fail since ((UINT_MAX - 1) + 1) does overflow.
IIUC we can simply handle signed and unsigned types differently, given that a has int/uint type.
For the unsigned case, (unsigned long long)(a + cst1) + cst2 with a an unsigned int:
it can be simplified into (unsigned long long)a + cst3 iff a + cst1 doesn't overflow/wrap.  Since unsigned types can wrap, simplifying (unsigned long long)cst1 + cst2 into cst3 is always safe.
For the signed case, (long long)(a + cst1) + cst2 with a a signed int:
it can be simplified into (long long)a + cst3 iff (long long)cst1 + cst2 doesn't overflow.  We don't need to prove that (a + cst1) doesn't overflow.
> 
> So, in general, should we care about overflow of the combined operation at
> all after having established the inner operation does not overflow?
> If the overflow is well-defined and we overflow, the result should be valid.
> If the overflow is undefined, we could do anything, in particular optimize
> this which wouldn't even be too unexpected from a user's perspective.
> Wouldn't any overflow in the combined constant be caused anyway, even
> without combining?
> 
> I think there are only two cases two discern, regardless of overflow:
>  
>  - abs(CST1 +- CST2) < abs(CST1), then we can simplify to (cast)(a +- (CST1
> +- CST2))
> 
>  - else, we can simplify to (cast)(a) +- (CST1 +- CST2)
> 
> 2.
> Is there an idiomatic/correct way to check a VR_RANGE for overflow? Does it
> suffice to check if the range includes +-INF or +-INF(OVF)? I suspect other,
> naive methods like checking if min < max will fail, since the ranges are
> canonicalized.
Comment 21 rdapp 2016-06-24 05:40:24 UTC
(In reply to amker from comment #20)
> IIUC we can simply handle signed/unsigned type differently.  Given a has
> int/uint type. 
> For unsigned case: (unsigned long long)(a + cst1) + cst2  and a is unsigned
> int.
> It can be simplified into (unsigned long long)a + cst3 iff a + cst1 doesn't
> overflow/wrap.  Since unsigned type can wrap, simplify (unsigned long
> long)cst1 + cst2 into cst3 is always safe.
> For signed case: (long long)(a + cst) + cst2  and a is signed int.
> It can be simplified into (long long)a + cst3 iff (long long)cst1 + cst2
> doesn't overflow.  We don't need to prove (a+cst) doesn't overflow.

OK, so the stipulation seems to be: assume no signed overflow if the operation was already present, but don't provoke a potentially new overflow by combining constants that previously would not have been combined. Makes sense. Your cases can be improved a little, as Richard also pointed out: when abs(cst3) < abs(cst1), the combined operation can be done in the inner type (and the signs match, so the overflow proof still holds).
Comment 22 bin cheng 2016-06-24 07:39:30 UTC
(In reply to rdapp from comment #21)
> (In reply to amker from comment #20)
> > IIUC we can simply handle signed/unsigned type differently.  Given a has
> > int/uint type. 
> > For unsigned case: (unsigned long long)(a + cst1) + cst2  and a is unsigned
> > int.
> > It can be simplified into (unsigned long long)a + cst3 iff a + cst1 doesn't
> > overflow/wrap.  Since unsigned type can wrap, simplify (unsigned long
> > long)cst1 + cst2 into cst3 is always safe.
> > For signed case: (long long)(a + cst) + cst2  and a is signed int.
> > It can be simplified into (long long)a + cst3 iff (long long)cst1 + cst2
> > doesn't overflow.  We don't need to prove (a+cst) doesn't overflow.
> 
> ok, the stipulation seems to be assume no signed overflow if the operation
> was already present but don't provoke a potentially new overflow by
> combining constants that previously would not have been. Makes sense. Your
> cases can be improved a little as Richard also pointed out: When abs(cst3) <
> abs(cst1) the combined operation can be done in the inner type (and the
> signs match so the overflow proof still holds).

Hmm, combining (long long)cst1 + cst2 is compile-time work, as long as it doesn't overflow.  It doesn't matter in which type the combination is done, does it?  Oh, do you mean simplifying the original expression into (long long)(a + cst3)?
Comment 23 bin cheng 2016-06-24 11:22:47 UTC
(In reply to rdapp from comment #19)
> (In reply to rguenther@suse.de from comment #18)
> 2.
> Is there an idiomatic/correct way to check a VR_RANGE for overflow? Does it
> suffice to check if the range includes +-INF or +-INF(OVF)? I suspect other,
> naive methods like checking if min < max will fail, since the ranges are
> canonicalized.

Hmm, not sure if I misunderstood your point.
I think there is no *overflow* concept for a VR_RANGE (or a variable); overflow is always a semantic concept for an (arithmetic) operation.  Take the loop below as an example:

unsigned int i;
for (i = 4; i != 2; i += 2)
  {
    //....
  }

Variable i has VR_RANGE [0, MAX_VALUE - 1], but it does overflow in the loop.
Comment 24 rdapp 2016-06-27 13:22:37 UTC
Created attachment 38775 [details]
Updated patch and test, match.pd/VRP

(In reply to amker from comment #23)
> Hmm, not sure if I mis-understood your point.
> I think there is no *overflow* concept for a VR_RANGE (or a variable),
> overflow is always a semantic concept for an (arithmetic) operation.  Take
> below loop as an example:
> 
> unsigned int i;
> for (i = 4; i != 2; i += 2)
>   {
>     //....
>   }
> 
> Variable i has VR_RANGE [0, MAX_VALUE - 1], but it does overflow in the loop.

Yes, a variable by itself cannot overflow, only an operation can. VRP, for example, has a function extract_range_from_binary_expr_1 that performs some overflow checking internally. It does so by comparing the minima and maxima of the respective ranges of both operands. I assumed an overflow there would somehow be propagated into the ranges, hence the question of how to access the cast expression in match.pd.

I guess I was overcomplicating things; getting the range of both operands and checking whether their common minimum is smaller than the maximum should suffice for now.

Attached is the current version of the patch, no regressions on s390x and x86-64. The test could be made a bit more specific I suppose, currently it counts the number of "gimple_simplified"s.
Comment 25 rdapp 2016-07-11 13:17:13 UTC
Created attachment 38875 [details]
Updated patch and test, match.pd/VRP

Some cleanups and fixes; the patch can now handle more cases in the inner type. Also added more test cases.