This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.
[Bug tree-optimization/86544] Popcount detection generates different code on C and C++
- From: "kugan at gcc dot gnu.org" <gcc-bugzilla at gcc dot gnu dot org>
- To: gcc-bugs at gcc dot gnu dot org
- Date: Tue, 17 Jul 2018 09:30:07 +0000
- Subject: [Bug tree-optimization/86544] Popcount detection generates different code on C and C++
- Auto-submitted: auto-generated
- References: <bug-86544-4@http.gcc.gnu.org/bugzilla/>
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86544
--- Comment #1 from kugan at gcc dot gnu.org ---
(In reply to ktkachov from comment #0)
> Great to see that GCC now detects the popcount loop in PR 82479!
> I am seeing some curious differences between gcc and g++ though.
> int
> pc (unsigned long long b)
> {
>   int c = 0;
>
>   while (b) {
>     b &= b - 1;
>     c++;
>   }
>
>   return c;
> }
>
> If compiled with gcc -O3 on aarch64 this gives:
> pc:
>         fmov    d0, x0
>         cnt     v0.8b, v0.8b
>         addv    b0, v0.8b
>         umov    w0, v0.b[0]
>         ret
>
> whereas if compiled with g++ -O3 it gives:
> _Z2pcy:
> .LFB0:
>         .cfi_startproc
>         fmov    d0, x0
>         cmp     x0, 0
>         cnt     v0.8b, v0.8b
>         addv    b0, v0.8b
>         umov    w0, v0.b[0]
>         and     x0, x0, 255
>         csel    w0, w0, wzr, ne
>         ret
>
> which is suboptimal. It seems that phiopt3 manages to optimise the C version
> better. The GIMPLE dumps just before the phiopt pass are:
> For the C (good version):
>
>   int c;
>   int _7;
>
>   <bb 2> [local count: 118111601]:
>   if (b_4(D) != 0)
>     goto <bb 3>; [89.00%]
>   else
>     goto <bb 4>; [11.00%]
>
>   <bb 3> [local count: 105119324]:
>   _7 = __builtin_popcountl (b_4(D));
>
>   <bb 4> [local count: 118111601]:
>   # c_12 = PHI <0(2), _7(3)>
>   return c_12;
>
>
> For the C++ (bad version):
>
>   int c;
>   int _7;
>
>   <bb 2> [local count: 118111601]:
>   if (b_4(D) == 0)
>     goto <bb 4>; [11.00%]
>   else
>     goto <bb 3>; [89.00%]
>
>   <bb 3> [local count: 105119324]:
>   _7 = __builtin_popcountl (b_4(D));
>
>   <bb 4> [local count: 118111601]:
>   # c_12 = PHI <0(2), _7(3)>
>   return c_12;
>
> As you can see, the order of the gotos and the jump conditions are inverted.
>
> It seems to me that the two are equivalent and GCC could be doing a better
> job of optimising.
>
> Can we improve phiopt to handle this more effectively?
Thanks for the test case. I will look at it.
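For reference, the loop in the test case is the clear-lowest-set-bit idiom: b - 1 flips the lowest set bit of b and every zero bit below it, so b &= b - 1 clears exactly that one bit, and the loop body runs once per set bit. That is why it can be recognised as a popcount. A minimal sketch, assuming GCC or Clang (for __builtin_popcountll), that keeps the original pc function and adds a hypothetical check_pc helper comparing it with the builtin:

```c
#include <assert.h>

/* The loop from the test case: each iteration of b &= b - 1 clears the
   lowest set bit of b, so the body runs once per set bit.  */
int
pc (unsigned long long b)
{
  int c = 0;

  while (b)
    {
      b &= b - 1;   /* e.g. 0b1100 & 0b1011 == 0b1000 */
      c++;
    }

  return c;
}

/* Hypothetical helper (not part of the test case): check the loop
   against GCC's __builtin_popcountll for a single input value.  */
void
check_pc (unsigned long long b)
{
  assert (pc (b) == __builtin_popcountll (b));
}
```

For example, check_pc (0xF0ULL) passes because both sides return 4, and pc (0) returns 0 without entering the loop.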
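The equivalence of the two quoted dumps can be illustrated at the source level. Both select 0 when b is zero and the popcount otherwise; they differ only in whether the test is b != 0 or b == 0 and in which arm is the fall-through. Since the popcount of 0 is 0, the guard is redundant in either orientation and the PHI could collapse to the unconditional call, which matches the branch-free C output. The sketch below is a hypothetical illustration (the names pc_guard_ne, pc_guard_eq, pc_unguarded and check_equivalent are made up). It uses __builtin_popcountll rather than the dumps' __builtin_popcountl so it does not depend on long being 64 bits, and it makes no claim about how phiopt itself implements the transformation:

```c
#include <assert.h>

/* Source-level analogue of the C dump: test b != 0, popcount on the
   taken arm, 0 otherwise.  */
int
pc_guard_ne (unsigned long long b)
{
  return b != 0 ? __builtin_popcountll (b) : 0;
}

/* Source-level analogue of the C++ dump: the same selection with the
   condition inverted (b == 0) and the arms swapped.  */
int
pc_guard_eq (unsigned long long b)
{
  return b == 0 ? 0 : __builtin_popcountll (b);
}

/* Because the popcount of 0 is 0, both guarded forms compute the same
   value as the unconditional call; this is the form the C output
   (no cmp/csel) corresponds to.  */
int
pc_unguarded (unsigned long long b)
{
  return __builtin_popcountll (b);
}

/* Check that both guard orientations agree with the unguarded form.  */
void
check_equivalent (unsigned long long b)
{
  assert (pc_guard_ne (b) == pc_unguarded (b));
  assert (pc_guard_eq (b) == pc_unguarded (b));
}
```

Checking both orientations against the same unguarded result is the point of the example: the inverted condition changes the CFG shape a pass sees, not the value computed, so a transformation that handles one form should in principle be able to handle the other.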