Bug 58359 - __builtin_unreachable prevents vectorization
Summary: __builtin_unreachable prevents vectorization
Status: NEW
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization (show other bugs)
Version: 4.9.0
: P3 enhancement
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords: missed-optimization, needs-bisection
Depends on: 101993
Blocks: vectorizer
  Show dependency treegraph
 
Reported: 2013-09-08 08:20 UTC by Marc Glisse
Modified: 2023-04-18 14:48 UTC (History)
1 user (show)

See Also:
Host:
Target:
Build:
Known to work: 12.1.0
Known to fail:
Last reconfirmed: 2013-09-09 00:00:00


Attachments
Fisrt patch (1.48 KB, patch)
2013-09-27 14:46 UTC, Anatoly Sinyavin
Details | Diff
Second patch (1.44 KB, patch)
2013-09-27 14:46 UTC, Anatoly Sinyavin
Details | Diff
Patch for new solution (3.92 KB, application/x-bzip)
2013-10-23 15:56 UTC, Anatoly Sinyavin
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Marc Glisse 2013-09-08 08:20:22 UTC
void f(int*q){
  int*p=(int*)__builtin_assume_aligned(q,32);
  for(int i=0;i<(1<<20);++i){
    if(p+i==0)__builtin_unreachable();
    p[i]=i;
  }
}

If I remove the line with __builtin_unreachable, this is vectorized at -O3. But as is, we have control flow in the loop and the vectorizer won't even try.
Comment 1 Richard Biener 2013-09-09 08:22:53 UTC
We remove paths ending in __builtin_unreachable () at some point but arguably
too late for the vectorizer.

An (easy?) way out for the vectorizer would be to ignore that path (that is,
do not generate vectorized code for it).

But then first its various pieces would need to deal with loops with multiple
basic-blocks.
Comment 2 Marc Glisse 2013-09-17 13:31:25 UTC
Another optimization prevented by __builtin_unreachable:

extern void f();
void g(int i){
  if(i>0){
    f();
    __builtin_unreachable();
  }
}

misses that f is a tail call and generates at -O3 on x86_64:

	testl	%edi, %edi
	jg	.L6
	rep ret
.L6:
	pushq	%rax
	xorl	%eax, %eax
	call	f

instead of (removing the builtin):

	testl	%edi, %edi
	jle	.L1
	xorl	%eax, %eax   ( in C++ this line disappears )
	jmp	f
.L1:
	rep ret

(it inverted the comparison, but my point is "call" vs "jmp")
Comment 3 Anatoly Sinyavin 2013-09-20 14:20:23 UTC
__builtin_unreachable is processed by "fab" optimization pass (fold all builtins / tree-ssa-ccp.c) in optimize_unreachable function. This pass is executed after vectorization so vectorizer gets __builtin_unreachable.
	Moreover __builtin_unreachable processing is one of the last pass on gimple. It leads that __builtin_unreachable can prevent not only vectorization but also many others optimization.

	So I suggest processing __builtin_unreachable immediately after "cfg" pass (cfg buiding).

	If we don't have any objections I can make patch for this problem.
Comment 4 Anatoly Sinyavin 2013-09-27 14:46:03 UTC
Created attachment 30914 [details]
Fisrt patch
Comment 5 Anatoly Sinyavin 2013-09-27 14:46:35 UTC
Created attachment 30915 [details]
Second patch
Comment 6 Anatoly Sinyavin 2013-09-27 14:48:08 UTC
I have created two patches to fix this problem.

   The first patch (bug_fix_58359_builit_unreachable.patch) just moves functionality of optimize_unreachable from "fab" pass to "cfg" pass

   The second patch (bug_fix_58359_builit_unreachable.AGGRESSIVE.patch) is more aggressive variant. Origininal implementation of optimize_unreachable doesn't delete basic block if there is FORCED_LABEL, non debug statemnt, or call function before __built_unreachable in this basic block.
   I think we can't delete basic block if it contains some statement X before 
__built_unreachable. This statement X can potentially transfer control from this basic block and can't return. It's possible in two cases: if statement X is procedure call (without return) or assembler instruction. (See also __built_unreachable description)
Comment 7 Marc Glisse 2013-10-23 14:56:19 UTC
(In reply to Anatoly Sinyavin from comment #3)
> So I suggest processing __builtin_unreachable immediately after "cfg" pass
> (cfg buiding).

That seems awfully early. Don't we want to at least wait until after VRP1, maybe even DOM1 or later? __builtin_unreachable should be made not to hurt other optimizations too much, but it shouldn't be made useless either.
Comment 8 Anatoly Sinyavin 2013-10-23 15:55:00 UTC
(In reply to Marc Glisse from comment #7)
> (In reply to Anatoly Sinyavin from comment #3)
> > So I suggest processing __builtin_unreachable immediately after "cfg" pass
> > (cfg buiding).
> 
> That seems awfully early. Don't we want to at least wait until after VRP1,
> maybe even DOM1 or later? __builtin_unreachable should be made not to hurt
> other optimizations too much, but it shouldn't be made useless either.

I changed my opinion and offered new solution couple of weeks ago. New patch was proposed in 'gcc-patches@gcc.gnu.org' (Subject: "PR __builtin_unreachable prevents vectorization/58359 / bug fix / discussion")

I attached this new patch here also (patch_and_test_results.tar.bz2). Body of this letter:

>>>

Hello colleagues,

Bug fix for PR __builtin_unreachable prevents vectorization/58359

__builtin_unreachable is processed by "fab" optimization pass (fold 
all builtins / tree-ssa-ccp.c) in optimize_unreachable function. 
This pass is executed after vectorization so vectorizer gets 
__builtin_unreachable and it can't handle.

We need to fold __builtin_unreachable before vectorization. I 
recommended (see comments & patches in 58359) to do it immediately
after "cfg" pass (cfg buiding) but it's bad solution because it 
prevents some optimizations. 
For instance "gcc.dg/builtin-unreachable-2.c" failed. 

// part of gcc.dg/builtin-unreachable-2.c
if (i > 1)
                __builtin_unreachable();
if (i > 1)
                foo ();   // user expresses fact that "i > 1"
                               // is always false so call can be removed

User can express "always false" conditions via __builtin_unreachable
so gcc can use this info for optimization purposes.

I think good place is "ifcvt" pass. I have created two patches to
fix this problem.

The first patch (new_bug_fix_58359_builit_unreachable.patch) just
moves functionality of optimize_unreachable 
from "fab" pass to "ifcvt" pass.

The second patch (new_bug_fix_58359_builit_unreachable.AGGRESSIVE.patch)
is more aggressive variant.
Origininal implementation of optimize_unreachable doesn't delete
basic block if there is FORCED_LABEL, non debug statemnt, or call
function before __built_unreachable in this basic block. I think we
can't delete basic block if it contains some statement X before
__built_unreachable. This statement X can potentially transfer control
from this basic block and can't return. It's possible in two cases:
if statement X is procedure call (without return) or assembler instruction.
(See also __built_unreachable description)

Test cases:
    - test in PR 58359
    - existing test in gcc test suit "gcc.dg/builtin-unreachable-2.c"

Test report:
    - compiler without my changes
                original.log

    - compiler with "new_bug_fix_58359_builit_unreachable_2.patch"
                builtin_non_aggressive.log

    - compiler with "new_bug_fix_58359_builit_unreachable_2_aggressive.patch"
                builtin_aggressive.log

    There are no regressions

Attached file:
    patch_and_test_results.tar.bz2 conatins patches & test results


Many thanks,
Anatoly S

<<<
Comment 9 Anatoly Sinyavin 2013-10-23 15:56:22 UTC
Created attachment 31082 [details]
Patch for new solution
Comment 10 Jakub Jelinek 2013-10-24 08:52:39 UTC
That is still wrong, __builtin_unreachable is still very much useful even at the RTL level (where we expand it as basic blocks without successors).
Perhaps if-conversion (for LOOP_VERSIONED loop only) can just drop the __builtin_unreachable () from the to be vectorized loop?
Comment 11 Marc Glisse 2013-10-24 12:03:18 UTC
(In reply to Jakub Jelinek from comment #10)
> That is still wrong, __builtin_unreachable is still very much useful even at
> the RTL level (where we expand it as basic blocks without successors).
> Perhaps if-conversion (for LOOP_VERSIONED loop only) can just drop the
> __builtin_unreachable () from the to be vectorized loop?

It isn't just vectorization, or even just loop optimizations that are hindered by spurious control flow. Most of the __builtin_unreachable I see are of the form:

if(x<0)__builtin_unreachable();

(often hidden below a macro called ASSUME or a similar name)

For those, it is tempting to say that replacing the GIMPLE_COND, leading to a block that contains only a call to __builtin_unreachable, with an ASSERT_EXPR, or range information on this SSA_NAME (now that we store it), or anything without control flow, wouldn't lose any information. In reality, it would still lose it sometimes, but I don't know how often/bad that is.

Just thinking of lowering *some* of the calls earlier... (though it wouldn't help with comment #2, where tree-tailcall.c should probably be taught about __builtin_unreachable)
Comment 12 Anatoly Sinyavin 2013-12-06 12:22:01 UTC
Does it mean that my solution is not accepted? 

If it's so I am going to think about two approaches
	- vectorizer should ignore that path (Richard Biener 2013-09-09 08:22:53 UTC)
	- replacing the GIMPLE_COND, leading to a block that contains only 
	a call to __builtin_unreachable, with an ASSERT_EXPR, or range information on this SSA_NAME (now that we store it), or anything without control flow (Marc Glisse 2013-10-24 12:03:18 UTC)
Comment 13 Andrew Pinski 2021-08-24 23:22:57 UTC
In the original testcase, it is very similar to PR 101993 where it is a check for pointer being null.
Comment 14 Andrew Pinski 2023-04-18 14:48:42 UTC
Note the original testcase is fixed in GCC 12.1.0.

The one in comment #2 is not but I think it is a seperate issue ...