This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.



[Bug tree-optimization/86544] Popcount detection generates different code on C and C++


https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86544

--- Comment #1 from kugan at gcc dot gnu.org ---
(In reply to ktkachov from comment #0)
> Great to see that GCC now detects the popcount loop in PR 82479!
> I am seeing some curious differences between gcc and g++ though.
> int
> pc (unsigned long long b)
> {
>     int c = 0;
> 
>     while (b) {
>         b &= b - 1;
>         c++;
>     }
> 
>     return c;
> }
> 
> If compiled with gcc -O3 on aarch64 this gives:
> pc:
>         fmov    d0, x0
>         cnt     v0.8b, v0.8b
>         addv    b0, v0.8b
>         umov    w0, v0.b[0]
>         ret
> 
> whereas if compiled with g++ -O3 it gives:
> _Z2pcy:
> .LFB0:
>         .cfi_startproc
>         fmov    d0, x0
>         cmp     x0, 0
>         cnt     v0.8b, v0.8b
>         addv    b0, v0.8b
>         umov    w0, v0.b[0]
>         and     x0, x0, 255
>         csel    w0, w0, wzr, ne
>         ret
> 
> which is suboptimal. It seems that phiopt3 manages to optimise the C version
> better. The GIMPLE dumps just before the phiopt pass are:
> For the C (good version):
> 
>   int c;
>   int _7;
> 
>   <bb 2> [local count: 118111601]:
>   if (b_4(D) != 0)
>     goto <bb 3>; [89.00%]
>   else
>     goto <bb 4>; [11.00%]
> 
>   <bb 3> [local count: 105119324]:
>   _7 = __builtin_popcountl (b_4(D));
> 
>   <bb 4> [local count: 118111601]:
>   # c_12 = PHI <0(2), _7(3)>
>   return c_12;
> 
> 
> For the C++ (bad version):
> 
>   int c;
>   int _7;
> 
>   <bb 2> [local count: 118111601]:
>   if (b_4(D) == 0)
>     goto <bb 4>; [11.00%]
>   else
>     goto <bb 3>; [89.00%]
> 
>   <bb 3> [local count: 105119324]:
>   _7 = __builtin_popcountl (b_4(D));
> 
>   <bb 4> [local count: 118111601]:
>   # c_12 = PHI <0(2), _7(3)>
>   return c_12;
> 
> As you can see, the order of the gotos is swapped and the jump condition is
> inverted.
> 
> It seems to me that the two are equivalent and GCC could be doing a better
> job of optimising.
> 
> Can we improve phiopt to handle this more effectively?

Thanks for the test case. I will look at it.
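
For reference, the two guarded shapes in the dumps can be written out in C next to the original loop. This is only a sketch: the function names and the check helper are made up for illustration, and it uses __builtin_popcountll (the unsigned long long variant) instead of the __builtin_popcountl seen in the aarch64 dumps, so that it does not depend on the width of long. Since the popcount of 0 is 0, both guarded forms should compute the same value as an unguarded call, which is presumably what phiopt relies on in the C case.

```c
#include <assert.h>

/* The loop from the report: clears the lowest set bit each iteration. */
static int pc_loop(unsigned long long b)
{
    int c = 0;
    while (b) {
        b &= b - 1;
        c++;
    }
    return c;
}

/* Shape of the C dump: test "b != 0", popcount on the true edge. */
static int pc_guard_ne(unsigned long long b)
{
    int c = 0;
    if (b != 0)
        c = __builtin_popcountll(b);
    return c;
}

/* Shape of the C++ dump: test "b == 0", popcount on the false edge. */
static int pc_guard_eq(unsigned long long b)
{
    int c;
    if (b == 0)
        c = 0;
    else
        c = __builtin_popcountll(b);
    return c;
}

/* What both guarded forms reduce to, since popcount(0) == 0. */
static int pc_direct(unsigned long long b)
{
    return __builtin_popcountll(b);
}

/* Hypothetical helper: asserts all four forms agree on a few inputs. */
static void check_forms_agree(void)
{
    const unsigned long long samples[] = {
        0ULL, 1ULL, 2ULL, 0xFFULL, 0x8000000000000000ULL, ~0ULL
    };
    for (unsigned i = 0; i < sizeof samples / sizeof samples[0]; i++) {
        unsigned long long b = samples[i];
        int expected = pc_loop(b);
        assert(pc_guard_ne(b) == expected);
        assert(pc_guard_eq(b) == expected);
        assert(pc_direct(b) == expected);
    }
}
```

Calling check_forms_agree() from a small driver exercises the zero case and a few others. It only demonstrates that the forms are equivalent at the source level and says nothing about which one a given GCC version will simplify.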
