[Bug c/89670] __builtin_ctz(_mm256_movemask_epi8(foo)) assumed to be <31 ?
jakub at gcc dot gnu.org
gcc-bugzilla@gcc.gnu.org
Tue Mar 12 08:13:00 GMT 2019
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89670
--- Comment #14 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
(In reply to Jörn Engel from comment #13)
> None of those examples convince me. If you or I know that a zero-argument
> is impossible, but the compiler doesn't know, wouldn't that still be UB?
> And if the compiler knows, it can remove the branch either way.
The current design is good.
As has been said, what the various hw instructions do for a zero input varies
a lot: the result can be the bitsize of the corresponding type, -1, some larger
value, or completely undefined. E.g. the x86 bsf instruction leaves the content
of the destination register unmodified if used with 0.
Try:
int foo (int x) { return __builtin_ctz (x); }
int bar (int x) { return x ? __builtin_ctz (x) : 32; }
int baz (int x) { return x ? __builtin_ctz (x) : -1; }
Without -mbmi, gcc emits:
xorl %eax, %eax
rep bsfl %edi, %eax
ret
for foo, and
xorl %eax, %eax
movl $32, %edx
rep bsfl %edi, %eax
testl %edi, %edi
cmove %edx, %eax
ret
for bar and
testl %edi, %edi
je .L8
xorl %eax, %eax
rep bsfl %edi, %eax
ret
.L8:
movl $-1, %eax
ret
for baz. If __builtin_ctz were well defined for 0, we could not emit the simple
first sequence unless the optimizers figured out that 0 is not possible. In
addition, the value chosen for 0 would probably need to be consistent across
all arches, which would mean generating worse code wherever the chosen value
doesn't match what the hw does.
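For code that needs a defined result for a zero argument (such as a 32-bit
mask that may be empty), one option is to spell the zero case out at the call
site, as bar and baz do. The following is only a minimal sketch: the helper
names ctz32_nonzero and ctz32_or are hypothetical, not part of GCC, and it
assumes a compiler that provides __builtin_ctz (GCC or Clang) with a 32-bit
unsigned int.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: the caller promises x != 0, so the bare builtin
   is used. The assert documents the precondition in debug builds and
   disappears when NDEBUG is defined. */
static inline int ctz32_nonzero(uint32_t x)
{
    assert(x != 0);
    return __builtin_ctz(x);
}

/* Hypothetical helper: x may be 0, and the caller chooses the result
   for that case (e.g. 32 for "no bit set", or -1). The builtin is only
   evaluated for nonzero x, so there is no undefined behavior. */
static inline int ctz32_or(uint32_t x, int if_zero)
{
    return x ? __builtin_ctz(x) : if_zero;
}
```

With this split, callers that can rule out 0 keep the short sequence shown
for foo, and only callers that cannot pay for the extra test, with the
fallback value they actually want.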