Bug 113879 - missed optimization - not exploiting known range of integers
Summary: missed optimization - not exploiting known range of integers
Status: UNCONFIRMED
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization (show other bugs)
Version: 14.0
: P3 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords: missed-optimization
Depends on:
Blocks: VRP
  Show dependency treegraph
 
Reported: 2024-02-11 20:01 UTC by Martin Uecker
Modified: 2024-05-23 23:03 UTC (History)
2 users (show)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Martin Uecker 2024-02-11 20:01:31 UTC
This is similar to PR105855 but without sanitizer.

In the following example the loop is not vectorized because  the overflow check (which is not needed) is not removed.  It works when adding the first check before the loop.  But the information about j < INT_MAX can be derived directly from j < i + 4. 

void f(int i, float * restrict a, float * restrict b) 
{   
#if 0
    if (INT_MAX - 4 < i)
        __builtin_unreachable();
#endif
    for (int j = i; j < i + 4; ) {
        a[j] = b[j] + 1.;
#if 1
        if (INT_MAX - 1 < j)
            __builtin_trap();
#endif 
        j++;
    }
}

https://godbolt.org/z/xnEbh5zfv
Comment 1 Richard Biener 2024-02-12 09:04:12 UTC
VRP has difficulties with cycles.
Comment 2 Andrew Macleod 2024-02-12 16:01:32 UTC
Not so much a cycle issue as a backward propagation issue.

 <bb 2> :
  goto <bb 6>; [INV]

  <bb 3> :
  _1 = (long unsigned int) j_10;
<..>
  if (j_10 >= -1)
    goto <bb 4>; [INV]
  else
    goto <bb 5>; [INV]

  <bb 4> :
  __builtin_trap ();

  <bb 5> :
  j_18 = j_10 + 1;

  <bb 6> :
  # j_10 = PHI <i_12(D)(2), j_18(5)>
  _9 = i_12(D) + 3;
  if (_9 >= j_10)
    goto <bb 3>; [INV]
  else
    goto <bb 7>; [INV]

  <bb 7> :
  return;

'i' is only ever referenced twice.  Very first thing on the edge from 2->6 as the initial value for j_10, and then in the calculation of _9 which is then used in the branch against j_10.

That initial value of i_12 we have no way of knowing can't be INT_MAX.  Its only later during the calculation _9 = i_12 + 3 that we can infer that i_12 must be INT_MAX-3. 

We certainly know after the branch that 
6->3  (T) i_12(D) :     [irange] int [-INF, 2147483644]

Going back to examine the initial value use on the edge 2->6 is not something that we would normally expect to have to do.  Its similar to the __builtin_unreachable() problem in that we can infer the global range of i_12 based on that addition statement, but detecting that is the case in general circumstance is not trivial as we have to go look at all earlier uses to make sure they are post dominated by the statement.   

There is also the additional option that I do no believe we currently register an inferred range on the statement 
  _9 = i_12(D) + 3;
for i_12.   Adding an inferred range for every arithmetic statement would come with a cost.. not sure exactly what it would be, but we weren't expecting inferred ranges from the majority of statements. we don't normally need that because as you can see, we get the ranges right after the branches anyway.  I could do a dry run and see what the time differential is. 
 

We could consider adding those inferred ranges at -O3.  we could also consider an enhancement that works like the builtin_unreachable() removal pass, but looks at all inferred ranges in the function as well to see if they are applicable in a global context and adjusts the global value if appropriate.  Which they would be in a case like this.
Comment 3 GCC Commits 2024-05-23 20:53:07 UTC
The master branch has been updated by Andrew Macleod <amacleod@gcc.gnu.org>:

https://gcc.gnu.org/g:efc4255d4393cba3d2232a7152799e1b161c3062

commit r15-802-gefc4255d4393cba3d2232a7152799e1b161c3062
Author: Andrew MacLeod <amacleod@redhat.com>
Date:   Thu May 2 12:23:18 2024 -0400

    Add inferred ranges for range-ops based statements.
    
    Gimple_range_fold contains some shorthand fold_range routines for
    easy user consumption of that range-ops interface, but there is no equivalent
    routines for op1_range and op2_range.  This patch provides basic versions.
    
    Any range-op entry which has an op1_range or op2_range implemented can
    potentially also provide inferred ranges.  This is a step towards
    PR 113879.  Default is currently OFF for performance reasons as it
    dramtically increases the number of inferred ranges.
    
            PR tree-optimization/113879
            * gimple-range-fold.cc (op1_range): New.
            (op2_range): New.
            * gimple-range-fold.h (op1_range): New prototypes.
            (op2_range): New prototypes.
            * gimple-range-infer.cc (gimple_infer_range::add_range): Do not
            add an inferred range if it is VARYING.
            (gimple_infer_range::gimple_infer_range): Add inferred ranges
            for any range-op statements if requested.
            * gimple-range-infer.h (gimple_infer_range): Add parameter.