[Bug c/89670] __builtin_ctz(_mm256_movemask_epi8(foo)) assumed to be <31 ?

jakub at gcc dot gnu.org gcc-bugzilla@gcc.gnu.org
Tue Mar 12 08:13:00 GMT 2019


https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89670

--- Comment #14 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
(In reply to Jörn Engel from comment #13)
> None of those examples convince me.  If you or I know that a zero-argument
> is impossible, but the compiler doesn't know, wouldn't that still be UB? 
> And if the compiler knows, it can remove the branch either way.

The current design is good.
As has been said, what the various hw instructions do for a zero input varies
a lot: the result can be the bitsize of the corresponding type, -1, some larger
value, or completely undefined.  E.g. the x86 bsf instruction leaves the
content of the destination register unmodified if used with 0.

Try:

int foo (int x) { return __builtin_ctz (x); }
int bar (int x) { return x ? __builtin_ctz (x) : 32; }
int baz (int x) { return x ? __builtin_ctz (x) : -1; }

Without -mbmi, gcc emits:
        xorl    %eax, %eax
        rep bsfl        %edi, %eax
        ret
for foo, and
        xorl    %eax, %eax
        movl    $32, %edx
        rep bsfl        %edi, %eax
        testl   %edi, %edi
        cmove   %edx, %eax
        ret
for bar and
        testl   %edi, %edi
        je      .L8
        xorl    %eax, %eax
        rep bsfl        %edi, %eax
        ret
.L8:
        movl    $-1, %eax
        ret
for baz.  If __builtin_ctz were well defined for 0, we could not emit the simple
first sequence unless the optimizers could prove that 0 is not possible.  The
value chosen for 0 would also probably need to be consistent across all arches,
which would mean generating worse code wherever the chosen value doesn't match
what the hw does.
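
Callers that cannot rule out a zero argument can make the choice explicit
themselves, in the same way as bar and baz above.  A minimal sketch, assuming
GCC or Clang (for __builtin_ctz); the helper name ctz_or and its if_zero
parameter are hypothetical and not part of any GCC interface:

```c
/* Hypothetical helper, not a GCC builtin: count trailing zeros of x,
   returning the caller-chosen value if_zero when x is 0.  The explicit
   test keeps __builtin_ctz from ever being called with 0, which is
   undefined.  */
static inline int
ctz_or (unsigned int x, int if_zero)
{
  return x ? __builtin_ctz (x) : if_zero;
}

/* Same shape as bar above: the bit width of the type for a zero input.  */
static inline int
ctz_or_width (unsigned int x)
{
  return ctz_or (x, (int) (sizeof (unsigned int) * 8));
}
```

Only call sites that really can see 0 pay for the extra test; everywhere else
plain __builtin_ctz keeps the short sequence shown for foo.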

