This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.
[Bug libstdc++/85466] Performance is slow when doing 'branchless' conditional style math operations
- From: "cpphackster at gmail dot com" <gcc-bugzilla at gcc dot gnu dot org>
- To: gcc-bugs at gcc dot gnu dot org
- Date: Sat, 21 Apr 2018 05:31:08 +0000
- Subject: [Bug libstdc++/85466] Performance is slow when doing 'branchless' conditional style math operations
- Auto-submitted: auto-generated
- References: <bug-85466-4@http.gcc.gnu.org/bugzilla/>
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85466
--- Comment #20 from Daniel Elliott <cpphackster at gmail dot com> ---
Cool, just tried that. It gets GCC down to:
GCC:
-------------------------------------------------------
ifStandard        596892 ns
ifNoConditional   148075 ns   <--- with "result[n] = tab[item > .5f];" trick
Clang: (no change)
ifStandard         88777 ns
ifNoConditional    89818 ns
-------------------------------------------------------
Clang is still about 1.65x faster on ifNoConditional (148075 ns vs 89818 ns).
I had a look at the assembly. My limited understanding makes me think the
difference is in the ucomiss: clang compares two registers
(ucomiss %xmm0,%xmm1), while GCC compares against a memory operand
(ucomiss 0x218b4(%rip),%xmm0). Feel free to correct me if I am wrong.
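For readers following along, here is a minimal sketch of the two forms being compared. It is not the benchmark source from this bug: the function names, the use of std::vector, and the table values 0.0f and 1.0f are assumptions made for illustration. Only the expression "tab[item > .5f]" comes from the comment above.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical output values; the real table contents are not shown in this thread.
constexpr float kLow = 0.0f;
constexpr float kHigh = 1.0f;

// Conventional form: the compiler is free to emit a branch or a select.
std::vector<float> selectWithTernary(const std::vector<float>& input) {
    std::vector<float> result(input.size());
    for (std::size_t n = 0; n < input.size(); ++n) {
        const float item = input[n];
        result[n] = (item > .5f) ? kHigh : kLow;
    }
    return result;
}

// Table-lookup form: the comparison yields a bool, which converts to
// 0 or 1 and is used directly as an index into a two-entry table.
std::vector<float> selectWithTable(const std::vector<float>& input) {
    const float tab[2] = {kLow, kHigh};
    std::vector<float> result(input.size());
    for (std::size_t n = 0; n < input.size(); ++n) {
        const float item = input[n];
        result[n] = tab[item > .5f];
    }
    return result;
}
```

In both listings, the ucomiss / seta pair followed by an indexed load appears to correspond to this lookup: seta materializes the comparison as 0 or 1, and the following mov or movss uses it as the table index.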
clang:
          movss   0x61a80(%r15,%rcx,1),%xmm1
 22.95%   xor     %eax,%eax
          ucomiss %xmm0,%xmm1
 13.81%   seta    %al
 22.55%   mov     0x4335d0(,%rax,4),%eax
  4.31%   mov     %eax,0x61a80(%rbx,%rcx,1)
 22.03%   movss   0x61a84(%rbx,%rcx,1),%xmm1
  0.40%   movss   %xmm1,0xc(%rsp)
 13.93%   add     $0x4,%rcx
          jne     404b50 <ifNoConditional(benchmark::State&)+0x180>
gcc:
 14.45%   movss   0x0(%r13,%rax,1),%xmm0
  0.18%   xor     %edx,%edx
 21.27%   ucomiss 0x218b4(%rip),%xmm0    # 426bf4 <_IO_stdin_used+0x34>
 16.84%   seta    %dl
 21.79%   movss   0x8(%rsp,%rdx,4),%xmm0
  1.41%   movss   %xmm0,(%r12,%rax,1)
 23.94%   add     $0x4,%rax
          cmp     $0x61a80,%rax
          jne     405330 <ifNoConditional(benchmark::State&)+0x160>