While experimenting with __builtin_unreachable I found that in some cases adding it pessimizes the code. Consider the following code (also at https://goo.gl/WmR8PX): -- enum Side { Bid, Ask }; struct Foo { int a; int b; }; int test(Side side, const Foo &foo) { if (side == Bid) return foo.a; return foo.b; } int test_with_unreach(Side side, const Foo &foo) { if (side == Bid) return foo.a; if (side != Ask) __builtin_unreachable(); return foo.b; } -- In the non-unreachable case `test`, the code generates the cmove I'd expect: -- test(Side, Foo const&): mov eax, DWORD PTR [rsi+4] test edi, edi cmove eax, DWORD PTR [rsi] ret -- In the unreachable case, GCC resorts back to branching: -- test_with_unreach(Side, Foo const&): test edi, edi je .L9 mov eax, DWORD PTR [rsi+4] ret .L9: mov eax, DWORD PTR [rsi] ret -- It's not really clear to me how much of a pessimization this is; but it was surprising that the unreachability had such an effect. I was hoping to prove to the compiler that the only valid inputs were "Bid" and "Ask" and as such it could actually generate something like: -- mov eax, DWORD PTR[rsi+eax*4] ret --
Without the unreachable PHI-OPT will host the loads (via hoist_adjacent_loads) but not with it. Thus we end up with the if-convertible <bb 2>: _5 = foo_4(D)->a; _6 = foo_4(D)->b; if (side_2(D) == 0) goto <bb 4>; else goto <bb 3>; <bb 3>: <bb 4>: # _1 = PHI <_5(2), _6(3)> compared to <bb 2>: if (side_2(D) == 0) goto <bb 3>; else goto <bb 4>; <bb 3>: _5 = foo_4(D)->a; goto <bb 5>; <bb 4>: _6 = foo_4(D)->b; <bb 5>: # _1 = PHI <_5(3), _6(4)> which is not if-convertible (by RTL if-conversion which doesn't perform this trick itself).
Thanks for updating the bug. As a corollary, moving the unreachability above the returns yields the same code as the non-unreachable: https://goo.gl/MdULOs -- int test_with_unreach_First(Side side, const Foo &foo) { if (side != Ask && side != Bid) __builtin_unreachable(); if (side == Bid) return foo.a; return foo.b; } -- test_with_unreach_First(Side, Foo const&): mov eax, DWORD PTR [rsi+4] test edi, edi cmove eax, DWORD PTR [rsi] ret -- For what it's worth I've been unable to coax either clang or icc (13.0.1 anyway) into the code I'd ideally like (the rsi+eax*4 case).
Note I think the underlying issue is recorded in PR 102008. Also note aarch64 produces a csel for both case.
Fixed in GCC 13 by vrp2 removing the __builtin_unreachable at that point and allowing a later phiopt to do the job.
Amazing: thank you Andrew!