This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.



[Bug middle-end/50168] __builtin_ctz() and intrinsics __bsr(), __bsf() generate suboptimal code on x86_64


https://gcc.gnu.org/bugzilla/show_bug.cgi?id=50168

Liu Hao <lh_mouse at 126 dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |lh_mouse at 126 dot com

--- Comment #11 from Liu Hao <lh_mouse at 126 dot com> ---
I think it is okay to zero-extend the result, because sign-extending an
_undefined_ result doesn't produce a defined result after all. The
zero-extension is a no-op: on x86-64, writing to the lower 32 bits of a GPR
implicitly zeroes its upper 32 bits (the one exception is `NOP`, which is
encoded as `XCHG EAX, EAX` and leaves the upper bits untouched).
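
To illustrate the point for the defined cases, here is a minimal sketch, assuming GCC or Clang (for `__builtin_ctz`). The helper names `ctz_zero_extended`, `ctz_sign_extended` and `check_extensions_agree` are made up for this example. Every defined result lies in the range 0..31, so the sign bit is never set and both extensions yield the same 64-bit value:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: widen the result by zero-extension. */
uint64_t ctz_zero_extended(unsigned n) {
    return (uint64_t)(unsigned)__builtin_ctz(n);
}

/* Hypothetical helper: widen the result by sign-extension. */
uint64_t ctz_sign_extended(unsigned n) {
    return (uint64_t)(int64_t)__builtin_ctz(n);
}

/* Checks every single-bit input. Zero is skipped on purpose,
   because __builtin_ctz(0) is undefined. */
void check_extensions_agree(void) {
    for (unsigned bit = 0; bit < 32; ++bit) {
        unsigned n = 1u << bit;
        assert(ctz_zero_extended(n) == bit);
        assert(ctz_sign_extended(n) == bit);
    }
}
```

Calling `check_extensions_agree()` should pass without tripping an assertion. It only covers single-bit inputs, so it is an illustration rather than a proof.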

Clang generates the optimal code, as follows:

source:
```c
unsigned long long my_ctz(unsigned n){
    return (unsigned)__builtin_ctz(n);
}
```

clang 4.0.0 with `-O3`:
```
my_ctz(unsigned int):
        bsfl    %edi, %eax
        retq
```

gcc 7.1 with `-O3`:
```
my_ctz(unsigned int):
        xorl    %eax, %eax
        rep bsfl        %edi, %eax
        cltq           # This conversion from `int` to `unsigned long long` is redundant.
        ret
```
