[Bug tree-optimization/93440] scalar unrolled loop makes vectorized code unreachable

ikonomisma at googlemail dot com gcc-bugzilla@gcc.gnu.org
Fri Jan 31 17:00:00 GMT 2020


https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93440

--- Comment #6 from ikonomisma at googlemail dot com ---
(In reply to Richard Biener from comment #5)
> OK, so I can compile the testcase now but I fail to see the error.  We're
> doing pointer difference compares and those should work out fine?
> 
> We're also doing many checks but you probably refer to the very first test?
> 
> _Z12brokenvectorRKSt6vectorIiSaIiEES3_:
> .LFB2470:
>         .cfi_startproc
>         movq    (%rsi), %rdx
>         movq    8(%rdi), %rsi
>         xorl    %r8d, %r8d
>         movq    (%rdi), %rax
>         movq    %rsi, %rcx
>         subq    %rax, %rcx
>         cmpq    $12, %rcx
>         jle     .L18
> 
> that's created from
> 
>   <bb 2> [local count: 1073741824]:
>   _7 = MEM[(int * *)b_2(D)];
>   _6 = MEM[(int * *)a_3(D) + 8B];
>   _4 = MEM[(int * *)a_3(D)];
>   _10 = _6 - _4;
>   if (_10 > 12)
> 
> what are the actual pointers here?

So the structure of the code is like this:

- function label
- function prologue
- test whether 12 bytes or fewer (3 ints or fewer) are to be processed; if so,
jump to the SIMD vector prologue
- unrolled scalar loop
- test whether 12 bytes or fewer remain to be processed
- jump back to the scalar loop if more than that remains
- SIMD vector prologue, which tests whether enough of the vector remains
unprocessed to warrant vectorized execution. This will effectively never be the
case, because the scalar loop only exits once at most 12 bytes are left.
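
The layout in the list above can be modelled with a small counting sketch. This
is a hypothetical model, not the code GCC emitted: only the 12-byte threshold
comes from the listing, while the unroll factor of 4, the minimum of 4 ints for
the SIMD path, and the names PathCounts and model_dispatch are assumptions made
for illustration.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical model of the branch layout described above.
struct PathCounts {
    std::size_t scalar_elems = 0;  // ints consumed by the unrolled scalar loop
    std::size_t vector_elems = 0;  // ints consumed by the SIMD path
    std::size_t tail_elems = 0;    // ints left over for a scalar tail
};

PathCounts model_dispatch(std::size_t n_ints) {
    constexpr std::size_t kIntSize = 4;     // sizeof(int) on x86-64
    constexpr std::size_t kThreshold = 12;  // bytes, from "cmpq $12, %rcx"
    constexpr std::size_t kUnroll = 4;      // ASSUMED unroll factor
    constexpr std::size_t kVectorMin = 4;   // ASSUMED ints needed for SIMD

    PathCounts counts;
    std::size_t remaining = n_ints;

    // Entry test and loop back-edge: stay in the scalar loop while more
    // than 12 bytes remain; otherwise fall through to the SIMD prologue.
    while (remaining * kIntSize > kThreshold) {
        assert(remaining >= kUnroll);  // > 12 bytes implies >= 4 ints
        counts.scalar_elems += kUnroll;
        remaining -= kUnroll;
    }

    // SIMD prologue: at most 3 ints are left here, so under the assumed
    // minimum this branch is not taken for any input size.
    if (remaining >= kVectorMin) {
        counts.vector_elems = remaining;
    } else {
        counts.tail_elems = remaining;
    }
    return counts;
}
```

Under these assumptions vector_elems stays zero for every input size, which is
the behaviour described above: the scalar loop consumes everything the SIMD
code could have handled.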


To see the problem, you could call the function (non-inlined) from a test
program with a reasonably large vector. Run it under gdb, set a breakpoint on
one of the instructions in the SIMD vector code, and run. You'll find that the
SIMD code never gets executed.
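
A minimal harness for that experiment might look like the sketch below. The
body of the function under test is not part of this message, so
brokenvector_standin is a hypothetical placeholder that only mirrors the
parameter types visible in the mangled name; its return type, its loop body,
and the helper run_large_input are assumptions. Substitute the function from
the attached testcase.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// HYPOTHETICAL stand-in for the function under test. Only the two
// const std::vector<int>& parameters are taken from the mangled name;
// the body and the return type are placeholders.
__attribute__((noinline))  // keep it out of line so gdb can break inside it
int brokenvector_standin(const std::vector<int>& a, const std::vector<int>& b) {
    assert(a.size() == b.size());
    int sum = 0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        sum += a[i] * b[i];
    }
    return sum;
}

// Builds two vectors of n ints and calls the function once, so that a
// breakpoint inside it is reached with a reasonably large input.
int run_large_input(std::size_t n) {
    std::vector<int> a(n, 1);
    std::vector<int> b(n, 2);
    return brokenvector_standin(a, b);
}
```

Call run_large_input from main with, say, a few thousand elements and build
with the same optimization flags as in the report. In gdb, "disassemble" on the
function shows the instruction addresses, and "break *ADDRESS" followed by
"run" sets a breakpoint on one of the SIMD instructions.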

