void f(int*q){ int*p=(int*)__builtin_assume_aligned(q,32); for(int i=0;i<(1<<20);++i){ if(p+i==0)__builtin_unreachable(); p[i]=i; } } If I remove the line with __builtin_unreachable, this is vectorized at -O3. But as is, we have control flow in the loop and the vectorizer won't even try.
We remove paths ending in __builtin_unreachable () at some point but arguably too late for the vectorizer. An (easy?) way out for the vectorizer would be to ignore that path (that is, do not generate vectorized code for it). But then first its various pieces would need to deal with loops with multiple basic-blocks.
Another optimization prevented by __builtin_unreachable: extern void f(); void g(int i){ if(i>0){ f(); __builtin_unreachable(); } } misses that f is a tail call and generates at -O3 on x86_64: testl %edi, %edi jg .L6 rep ret .L6: pushq %rax xorl %eax, %eax call f instead of (removing the builtin): testl %edi, %edi jle .L1 xorl %eax, %eax ( in C++ this line disappears ) jmp f .L1: rep ret (it inverted the comparison, but my point is "call" vs "jmp")
__builtin_unreachable is processed by "fab" optimization pass (fold all builtins / tree-ssa-ccp.c) in optimize_unreachable function. This pass is executed after vectorization so vectorizer gets __builtin_unreachable. Moreover __builtin_unreachable processing is one of the last pass on gimple. It leads that __builtin_unreachable can prevent not only vectorization but also many others optimization. So I suggest processing __builtin_unreachable immediately after "cfg" pass (cfg buiding). If we don't have any objections I can make patch for this problem.
Created attachment 30914 [details] Fisrt patch
Created attachment 30915 [details] Second patch
I have created two patches to fix this problem. The first patch (bug_fix_58359_builit_unreachable.patch) just moves functionality of optimize_unreachable from "fab" pass to "cfg" pass The second patch (bug_fix_58359_builit_unreachable.AGGRESSIVE.patch) is more aggressive variant. Origininal implementation of optimize_unreachable doesn't delete basic block if there is FORCED_LABEL, non debug statemnt, or call function before __built_unreachable in this basic block. I think we can't delete basic block if it contains some statement X before __built_unreachable. This statement X can potentially transfer control from this basic block and can't return. It's possible in two cases: if statement X is procedure call (without return) or assembler instruction. (See also __built_unreachable description)
(In reply to Anatoly Sinyavin from comment #3) > So I suggest processing __builtin_unreachable immediately after "cfg" pass > (cfg buiding). That seems awfully early. Don't we want to at least wait until after VRP1, maybe even DOM1 or later? __builtin_unreachable should be made not to hurt other optimizations too much, but it shouldn't be made useless either.
(In reply to Marc Glisse from comment #7) > (In reply to Anatoly Sinyavin from comment #3) > > So I suggest processing __builtin_unreachable immediately after "cfg" pass > > (cfg buiding). > > That seems awfully early. Don't we want to at least wait until after VRP1, > maybe even DOM1 or later? __builtin_unreachable should be made not to hurt > other optimizations too much, but it shouldn't be made useless either. I changed my opinion and offered new solution couple of weeks ago. New patch was proposed in 'gcc-patches@gcc.gnu.org' (Subject: "PR __builtin_unreachable prevents vectorization/58359 / bug fix / discussion") I attached this new patch here also (patch_and_test_results.tar.bz2). Body of this letter: >>> Hello colleagues, Bug fix for PR __builtin_unreachable prevents vectorization/58359 __builtin_unreachable is processed by "fab" optimization pass (fold all builtins / tree-ssa-ccp.c) in optimize_unreachable function. This pass is executed after vectorization so vectorizer gets __builtin_unreachable and it can't handle. We need to fold __builtin_unreachable before vectorization. I recommended (see comments & patches in 58359) to do it immediately after "cfg" pass (cfg buiding) but it's bad solution because it prevents some optimizations. For instance "gcc.dg/builtin-unreachable-2.c" failed. // part of gcc.dg/builtin-unreachable-2.c if (i > 1) __builtin_unreachable(); if (i > 1) foo (); // user expresses fact that "i > 1" // is always false so call can be removed User can express "always false" conditions via __builtin_unreachable so gcc can use this info for optimization purposes. I think good place is "ifcvt" pass. I have created two patches to fix this problem. The first patch (new_bug_fix_58359_builit_unreachable.patch) just moves functionality of optimize_unreachable from "fab" pass to "ifcvt" pass. The second patch (new_bug_fix_58359_builit_unreachable.AGGRESSIVE.patch) is more aggressive variant. Origininal implementation of optimize_unreachable doesn't delete basic block if there is FORCED_LABEL, non debug statemnt, or call function before __built_unreachable in this basic block. I think we can't delete basic block if it contains some statement X before __built_unreachable. This statement X can potentially transfer control from this basic block and can't return. It's possible in two cases: if statement X is procedure call (without return) or assembler instruction. (See also __built_unreachable description) Test cases: - test in PR 58359 - existing test in gcc test suit "gcc.dg/builtin-unreachable-2.c" Test report: - compiler without my changes original.log - compiler with "new_bug_fix_58359_builit_unreachable_2.patch" builtin_non_aggressive.log - compiler with "new_bug_fix_58359_builit_unreachable_2_aggressive.patch" builtin_aggressive.log There are no regressions Attached file: patch_and_test_results.tar.bz2 conatins patches & test results Many thanks, Anatoly S <<<
Created attachment 31082 [details] Patch for new solution
That is still wrong, __builtin_unreachable is still very much useful even at the RTL level (where we expand it as basic blocks without successors). Perhaps if-conversion (for LOOP_VERSIONED loop only) can just drop the __builtin_unreachable () from the to be vectorized loop?
(In reply to Jakub Jelinek from comment #10) > That is still wrong, __builtin_unreachable is still very much useful even at > the RTL level (where we expand it as basic blocks without successors). > Perhaps if-conversion (for LOOP_VERSIONED loop only) can just drop the > __builtin_unreachable () from the to be vectorized loop? It isn't just vectorization, or even just loop optimizations that are hindered by spurious control flow. Most of the __builtin_unreachable I see are of the form: if(x<0)__builtin_unreachable(); (often hidden below a macro called ASSUME or a similar name) For those, it is tempting to say that replacing the GIMPLE_COND, leading to a block that contains only a call to __builtin_unreachable, with an ASSERT_EXPR, or range information on this SSA_NAME (now that we store it), or anything without control flow, wouldn't lose any information. In reality, it would still lose it sometimes, but I don't know how often/bad that is. Just thinking of lowering *some* of the calls earlier... (though it wouldn't help with comment #2, where tree-tailcall.c should probably be taught about __builtin_unreachable)
Does it mean that my solution is not accepted? If it's so I am going to think about two approaches - vectorizer should ignore that path (Richard Biener 2013-09-09 08:22:53 UTC) - replacing the GIMPLE_COND, leading to a block that contains only a call to __builtin_unreachable, with an ASSERT_EXPR, or range information on this SSA_NAME (now that we store it), or anything without control flow (Marc Glisse 2013-10-24 12:03:18 UTC)
In the original testcase, it is very similar to PR 101993 where it is a check for pointer being null.
Note the original testcase is fixed in GCC 12.1.0. The one in comment #2 is not but I think it is a seperate issue ...