Consider the following C++ code #define ALWAYS_TRUE(x) do { if (x) __builtin_unreachable(); } while (false) int divide(int a, int b) { ALWAYS_TRUE(a == b); return a / b; } void test_array(unsigned (&arr)[3]) { ALWAYS_TRUE(a[0] < a[1] && a[1] < a[2]); return a[0] < a[2]; } The first function is optimiozed away: divide(int, int): mov eax, 1 ret While the second still does the comparison: test_array(unsigned int (&) [3]): mov eax, DWORD PTR [rdi+8] cmp DWORD PTR [rdi], eax setb al ret It would be nice if GCC could deduce that a < b && b < c --> a < c And optimize that second function (provided no other UB is involved)
EVRP sees <bb 2> : _1 = (*a_8(D))[0]; _2 = (*a_8(D))[1]; if (_1 < _2) goto <bb 3>; [INV] else goto <bb 5>; [INV] <bb 3> : _4 = (*a_8(D))[2]; if (_2 < _4) goto <bb 4>; [INV] else goto <bb 5>; [INV] <bb 4> : __builtin_unreachable (); <bb 5> : _6 = (*a_8(D))[2]; _9 = _1 < _6; so predicate analysis should handle this (but it does not). Btw, I fixed the testcase for you: #define ALWAYS_TRUE(x) do { if (x) __builtin_unreachable(); } while (false) bool test_array(unsigned (&a)[3]) { ALWAYS_TRUE(a[0] < a[1] && a[1] < a[2]); return a[0] < a[2]; }
Note will require a VRP pass after PRE since only there we'll be able to CSE the a[2] load.
in VRP2 we see: <bb 2> [local count: 1073741824]: _1 = (*a_6(D))[0]; _2 = (*a_6(D))[1]; pretmp_8 = (*a_6(D))[2]; if (_1 < _2) goto <bb 3>; [50.00%] else goto <bb 5>; [50.00%] <bb 3> [local count: 536870913]: if (_2 < pretmp_8) goto <bb 4>; [0.00%] else goto <bb 5>; [100.00%] <bb 4> [count: 0]: __builtin_unreachable (); <bb 5> [local count: 1073741824]: _7 = _1 < pretmp_8; return _7; There isnt much VRP can do about this because if we take the path 2->5, we have no way of resolving _1 < pretmp_8 because the comparison has not been made yet. interestingly, if this were to be threaded, 2->3 would register _1 < _2 3->5 would register _2 >= ptrtmp_8 then in bb5' _1 < pretmp_8 should be answered by the path oracle as TRUE and then be folded to return true. at least it should :-) I guess the next question would be... why doesn't the threader think this is worth threading?
(In reply to Andrew Macleod from comment #3) > in VRP2 we see: > > <bb 2> [local count: 1073741824]: > _1 = (*a_6(D))[0]; > _2 = (*a_6(D))[1]; > pretmp_8 = (*a_6(D))[2]; > if (_1 < _2) > goto <bb 3>; [50.00%] > else > goto <bb 5>; [50.00%] > > <bb 3> [local count: 536870913]: > if (_2 < pretmp_8) > goto <bb 4>; [0.00%] > else > goto <bb 5>; [100.00%] > > <bb 4> [count: 0]: > __builtin_unreachable (); > > <bb 5> [local count: 1073741824]: > _7 = _1 < pretmp_8; > return _7; > > There isnt much VRP can do about this because if we take the path 2->5, we > have no way of resolving _1 < pretmp_8 because the comparison has not been > made yet. > > interestingly, if this were to be threaded, > 2->3 would register _1 < _2 > 3->5 would register _2 >= ptrtmp_8 > then in bb5' _1 < pretmp_8 should be answered by the path oracle as TRUE > and then be folded to return true. at least it should :-) > > I guess the next question would be... why doesn't the threader think this is > worth threading? The threader only looks at paths that end in a block with multiple exits: FOR_EACH_BB_FN (bb, m_fun) if (EDGE_COUNT (bb->succs) > 1) maybe_thread_block (bb); so...a switch or a cond.
here is a better testcase: ``` int test_array(unsigned (&a)[3]) { int t = (a[0]+a[1]+a[2]); ALWAYS_TRUE(a[0] < a[1] && a[1] < a[2]); return (a[0] < a[2])+t; } ``` (just to make sure VRP is only dealing with SSA names). We should be able to optimize the above to just return a[0]+a[1]+a[2]+1;