Why does __builtin_ctz clear eax on amd64 targets?
Mason
slash.tmp@free.fr
Tue Oct 3 13:53:00 GMT 2017
Hello,
Consider the following code:
int my_ctz(unsigned int arg) { return __builtin_ctz(arg); }
which "gcc-7 -O -S -march=skylake" compiles to:
my_ctz:
	xorl	%eax, %eax
	tzcntl	%edi, %eax
	ret
I don't understand why GCC clears eax before executing tzcnt.
(Actually, this happens for other built-ins as well: clz, popcount.)
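To look at the clz and popcount cases the same way, one could compile companion functions next to my_ctz and check each one for a leading xorl. This is a minimal sketch; the names my_clz and my_popcount are hypothetical, chosen only to mirror my_ctz, and the instruction each built-in maps to depends on the -march setting.

```c
/* Same shape as my_ctz, one function per built-in, so that
   "gcc-7 -O -S -march=skylake" output can be compared side by side. */

/* __builtin_ctz: result is undefined when arg == 0. */
int my_ctz(unsigned int arg) { return __builtin_ctz(arg); }

/* __builtin_clz: result is undefined when arg == 0. */
int my_clz(unsigned int arg) { return __builtin_clz(arg); }

/* __builtin_popcount: defined for every input, including 0. */
int my_popcount(unsigned int arg) { return __builtin_popcount(arg); }
```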
tzcnt (or bsf) writes its result to eax anyway, so the previous contents of eax should not matter.
http://www.felixcloutier.com/x86/TZCNT.html
http://www.felixcloutier.com/x86/BSF.html
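One difference between the two instructions in the pages above is the zero input: tzcnt returns the operand size (32 for a 32-bit operand), while bsf leaves its destination undefined. GCC documents __builtin_ctz(0) as undefined, matching the bsf behaviour. A minimal sketch of a wrapper with tzcnt-style semantics in GNU C follows; the name tzcnt32 is hypothetical, and whether GCC folds the branch into a single tzcnt under -march=skylake is something to check in the -S output rather than assume.

```c
/* Hypothetical helper: trailing-zero count with tzcnt semantics,
   i.e. a zero input yields 32 instead of an undefined result. */
static inline int tzcnt32(unsigned int x)
{
    /* __builtin_ctz is only called with a nonzero argument. */
    return x ? __builtin_ctz(x) : 32;
}
```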
Does it have to do with partial register write stalls?
Probably not, because the zeroing remains even when the call
is inlined, where gcc can see that there are no partial register writes.
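For the inlined case, a caller such as the one below could serve as a test: it sums the positions of all set bits by taking the trailing-zero count and then clearing the lowest set bit. This is a sketch; sum_of_set_bit_positions is a hypothetical name, and the point is only to give my_ctz a caller it can be inlined into, so the loop body can be inspected for the xorl before tzcntl.

```c
/* my_ctz as in the original post, made static inline so it can be
   inlined into the loop below. */
static inline int my_ctz(unsigned int arg) { return __builtin_ctz(arg); }

/* Sum of the bit indices of all set bits in x. */
int sum_of_set_bit_positions(unsigned int x)
{
    int sum = 0;
    while (x != 0) {
        sum += my_ctz(x); /* x is nonzero here, so ctz is well defined */
        x &= x - 1;       /* clear the lowest set bit */
    }
    return sum;
}
```

Compiling this with "gcc-7 -O -S -march=skylake" should show whether the xorl survives inside the loop once the call has been inlined.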
Regards.
More information about the Gcc-help mailing list