#include <stdint.h> uint32_t countLeadingZeros32(uint32_t x) { if (x == 0) return 32; return (31 - __builtin_clz(x)) ^ 31; } On x86, with `-mlzcnt`, GCC outputs this: countLeadingZeros32(unsigned int): mov eax, 32 test edi, edi je .L1 mov eax, 31 lzcnt edi, edi sub eax, edi xor eax, 31 .L1: ret LLVM instead outputs this: countLeadingZeros32(unsigned int): lzcnt eax, edi ret
There are two issues, both are tree level issues, though the second one works on the RTL level just fine. Right now we have: _1 = __builtin_clz (x_5(D)); _2 = 31 - _1; _3 = _2 ^ 31; But the _3 can be optimized to just _1.
The second issue can be seen with: #include <stdint.h> uint32_t countLeadingZeros32(uint32_t x) { if (x == 0) return 32; return (__builtin_clz(x)) ; } This gets optimized for aarch64 at the rtl level but not for x86_64 with -mlzcnt.
Filed PR 104378 for the (31 - x) ^ 31 issue.
(In reply to Andrew Pinski from comment #2) > The second issue can be seen with: > #include <stdint.h> > > uint32_t countLeadingZeros32(uint32_t x) > { > if (x == 0) > return 32; > return (__builtin_clz(x)) ; > } cond_removal_in_builtin_zero_pattern should have optimized the above but does not for some reason. Let me take a look.
(In reply to Andrew Pinski from comment #4) > cond_removal_in_builtin_zero_pattern should have optimized the above but > does not for some reason. > Let me take a look. So one problem is we have: <bb 2> [local count: 1073741824]: if (x_3(D) == 0) goto <bb 4>; [21.72%] else goto <bb 3>; [78.28%] <bb 3> [local count: 840525097]: _1 = __builtin_clz (x_3(D)); _4 = (uint32_t) _1; <bb 4> [local count: 1073741824]: # _2 = PHI <32(2), _4(3)> Which we don't handle in cond_removal_in_builtin_zero_pattern, this similar to PR 99997 and PR 101822, that is the code which added to fix PR 71016 is getting in the way.
The cast issue is basically PR 101822.
I have a fix for the secondary issue which does not cause PR 71016 to show up again. Basically we should allow nop conversions always in factor_out_conditional_operation .
Created attachment 56117 [details] Patch which I am testing to fix the second issue
The trunk branch has been updated by Andrew Pinski <pinskia@gcc.gnu.org>: https://gcc.gnu.org/g:0fc13e8c0e39c51e82deb93f324d9d86ad8d7460 commit r14-4889-g0fc13e8c0e39c51e82deb93f324d9d86ad8d7460 Author: Andrew Pinski <pinskia@gmail.com> Date: Sun Oct 15 19:15:38 2023 +0000 Improve factor_out_conditional_operation for conversions and constants In the case of a NOP conversion (precisions of the 2 types are equal), factoring out the conversion can be done even if int_fits_type_p returns false and even when the conversion is defined by a statement inside the conditional. Since it is a NOP conversion there is no zero/sign extending happening which is why it is ok to be done here; we were trying to prevent an extra sign/zero extend from being moved away from definition which no-op conversions are not. Bootstrapped and tested on x86_64-linux-gnu with no regressions. gcc/ChangeLog: PR tree-optimization/104376 PR tree-optimization/101541 * tree-ssa-phiopt.cc (factor_out_conditional_operation): Allow nop conversions even if it is defined by a statement inside the conditional. gcc/testsuite/ChangeLog: PR tree-optimization/101541 * gcc.dg/tree-ssa/phi-opt-39.c: New test.