Bug 101240 - [missed optimization] Transitivity of less-than and less-or-equal
Summary: [missed optimization] Transitivity of less-than and less-or-equal
Status: NEW
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization (show other bugs)
Version: 12.0
: P3 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords: missed-optimization
Depends on:
Blocks: VRP
  Show dependency treegraph
 
Reported: 2021-06-28 12:03 UTC by Kyrylo Bohdanenko
Modified: 2023-07-13 18:27 UTC (History)
3 users (show)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2021-07-22 00:00:00


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Kyrylo Bohdanenko 2021-06-28 12:03:52 UTC
Consider the following C++ code

#define ALWAYS_TRUE(x) do { if (x) __builtin_unreachable(); } while (false)

int divide(int a, int b) {
  ALWAYS_TRUE(a == b);
  return a / b;
}

void test_array(unsigned (&arr)[3]) {
  ALWAYS_TRUE(a[0] < a[1] && a[1] < a[2]);
  return a[0] < a[2];
}

The first function is optimiozed away:

divide(int, int):
        mov     eax, 1
        ret

While the second still does the comparison:

test_array(unsigned int (&) [3]):
        mov     eax, DWORD PTR [rdi+8]
        cmp     DWORD PTR [rdi], eax
        setb    al
        ret

It would be nice if GCC could deduce that 

a < b && b < c --> a < c

And optimize that second function (provided no other UB is involved)
Comment 1 Richard Biener 2021-06-29 07:01:34 UTC
EVRP sees

  <bb 2> :
  _1 = (*a_8(D))[0];
  _2 = (*a_8(D))[1];
  if (_1 < _2)
    goto <bb 3>; [INV]
  else
    goto <bb 5>; [INV]

  <bb 3> :
  _4 = (*a_8(D))[2];
  if (_2 < _4)
    goto <bb 4>; [INV]
  else
    goto <bb 5>; [INV]

  <bb 4> :
  __builtin_unreachable ();

  <bb 5> :
  _6 = (*a_8(D))[2];
  _9 = _1 < _6;

so predicate analysis should handle this (but it does not).  Btw, I fixed
the testcase for you:

#define ALWAYS_TRUE(x) do { if (x) __builtin_unreachable(); } while (false)
bool test_array(unsigned (&a)[3]) {
  ALWAYS_TRUE(a[0] < a[1] && a[1] < a[2]);
  return a[0] < a[2];
}
Comment 2 Richard Biener 2021-06-29 07:04:45 UTC
Note will require a VRP pass after PRE since only there we'll be able to
CSE the a[2] load.
Comment 3 Andrew Macleod 2021-11-05 21:31:42 UTC
in VRP2 we see:

 <bb 2> [local count: 1073741824]:
  _1 = (*a_6(D))[0];
  _2 = (*a_6(D))[1];
  pretmp_8 = (*a_6(D))[2];
  if (_1 < _2)
    goto <bb 3>; [50.00%]
  else
    goto <bb 5>; [50.00%]

  <bb 3> [local count: 536870913]:
  if (_2 < pretmp_8)
    goto <bb 4>; [0.00%]
  else
    goto <bb 5>; [100.00%]

  <bb 4> [count: 0]:
  __builtin_unreachable ();

  <bb 5> [local count: 1073741824]:
  _7 = _1 < pretmp_8;
  return _7;

There isnt much VRP can do about this because if we take the path 2->5, we have no way of resolving _1 < pretmp_8 because the comparison has not been made yet.

interestingly, if this were to be threaded, 
2->3 would register _1 < _2
3->5 would register _2 >= ptrtmp_8
then in bb5'  _1 < pretmp_8 should be answered by the path oracle as TRUE and then be folded to return true.  at least it should :-)

I guess the next question would be... why doesn't the threader think this is worth threading?
Comment 4 Aldy Hernandez 2021-11-05 22:04:09 UTC
(In reply to Andrew Macleod from comment #3)
> in VRP2 we see:
> 
>  <bb 2> [local count: 1073741824]:
>   _1 = (*a_6(D))[0];
>   _2 = (*a_6(D))[1];
>   pretmp_8 = (*a_6(D))[2];
>   if (_1 < _2)
>     goto <bb 3>; [50.00%]
>   else
>     goto <bb 5>; [50.00%]
> 
>   <bb 3> [local count: 536870913]:
>   if (_2 < pretmp_8)
>     goto <bb 4>; [0.00%]
>   else
>     goto <bb 5>; [100.00%]
> 
>   <bb 4> [count: 0]:
>   __builtin_unreachable ();
> 
>   <bb 5> [local count: 1073741824]:
>   _7 = _1 < pretmp_8;
>   return _7;
> 
> There isnt much VRP can do about this because if we take the path 2->5, we
> have no way of resolving _1 < pretmp_8 because the comparison has not been
> made yet.
> 
> interestingly, if this were to be threaded, 
> 2->3 would register _1 < _2
> 3->5 would register _2 >= ptrtmp_8
> then in bb5'  _1 < pretmp_8 should be answered by the path oracle as TRUE
> and then be folded to return true.  at least it should :-)
> 
> I guess the next question would be... why doesn't the threader think this is
> worth threading?

The threader only looks at paths that end in a block with multiple exits:

 FOR_EACH_BB_FN (bb, m_fun)
    if (EDGE_COUNT (bb->succs) > 1)
      maybe_thread_block (bb);

so...a switch or a cond.
Comment 5 Andrew Pinski 2023-07-13 18:27:58 UTC
here is a better testcase:
```
int test_array(unsigned (&a)[3]) {
  int t = (a[0]+a[1]+a[2]);
  ALWAYS_TRUE(a[0] < a[1] && a[1] < a[2]);
  return (a[0] < a[2])+t;
}
```
(just to make sure VRP is only dealing with SSA names).

We should be able to optimize the above to just
return a[0]+a[1]+a[2]+1;