This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.
[Bug middle-end/50168] __builtin_ctz() and intrinsics __bsr(), __bsf() generate suboptimal code on x86_64
- From: "lh_mouse at 126 dot com" <gcc-bugzilla at gcc dot gnu dot org>
- To: gcc-bugs at gcc dot gnu dot org
- Date: Fri, 28 Jul 2017 07:29:16 +0000
- Subject: [Bug middle-end/50168] __builtin_ctz() and intrinsics __bsr(), __bsf() generate suboptimal code on x86_64
- Auto-submitted: auto-generated
- References: <bug-50168-4@http.gcc.gnu.org/bugzilla/>
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=50168
Liu Hao <lh_mouse at 126 dot com> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |lh_mouse at 126 dot com
--- Comment #11 from Liu Hao <lh_mouse at 126 dot com> ---
I think it is okay to zero-extend the result rather than sign-extend it.
Zero-extension is a no-op, because on x86_64, writing to the lower 32 bits of a
GPR implicitly zeroes its upper 32 bits (the exception being the `NOP`
instruction, which is actually encoded as `XCHG EAX, EAX`). Sign-extending an
_undefined_ result doesn't produce a defined result anyway, and for any nonzero
input the result lies in 0..31, so both extensions yield the same value.
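To illustrate why the choice of extension does not matter for defined inputs, here is a minimal sketch. The helper names `ctz_zero_extended`, `ctz_sign_extended` and `check_extensions_agree` are hypothetical and exist only for this example; it assumes a compiler that provides `__builtin_ctz` (GCC or Clang) and a 32-bit `unsigned`. It deliberately never passes zero, since the result is undefined in that case.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: widen the result by zero-extension. */
static uint64_t ctz_zero_extended(unsigned n) {
    return (uint64_t)(unsigned)__builtin_ctz(n);
}

/* Hypothetical helper: widen the result by sign-extension,
   going through a signed 64-bit type first. */
static uint64_t ctz_sign_extended(unsigned n) {
    return (uint64_t)(int64_t)__builtin_ctz(n);
}

/* For every single-bit input, the count equals the bit position
   and both widenings agree. Assumes `unsigned` is 32 bits wide. */
static void check_extensions_agree(void) {
    for (unsigned shift = 0; shift < 32; ++shift) {
        unsigned n = 1u << shift;
        assert(ctz_zero_extended(n) == shift);
        assert(ctz_zero_extended(n) == ctz_sign_extended(n));
    }
}
```

Because the result never exceeds 31 for a nonzero argument, its sign bit is always clear, so the two helpers should be interchangeable wherever the input is valid.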
Clang generates the optimal code, as follows:
source:
```c
unsigned long long my_ctz(unsigned n){
    return (unsigned)__builtin_ctz(n);
}
```
clang 4.0.0 with `-O3`:
```
my_ctz(unsigned int):
    bsfl %edi, %eax
    retq
```
gcc 7.1 with `-O3`:
```
my_ctz(unsigned int):
    xorl %eax, %eax
    rep bsfl %edi, %eax
    cltq    # This conversion from `int` to `unsigned long long` is redundant.
    ret
```