This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.



[Bug libstdc++/85466] Performance is slow when doing 'branchless' conditional style math operations


https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85466

--- Comment #20 from Daniel Elliott <cpphackster at gmail dot com> ---
Cool, just tried that.

It gets gcc down to:

GCC:
-------------------------------------------------------
ifStandard          596892 ns       
ifNoConditional     148075 ns <--- with "result[n] = tab[item > .5f];" trick

Clang (no change):
ifStandard           88777 ns   
ifNoConditional      89818 ns      

------------------------------------------------------
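
For readers who have not followed the whole thread, here is a minimal sketch of what the two benchmarked variants presumably look like. Only the benchmark names and the line "result[n] = tab[item > .5f];" come from this comment; the function signatures, the lo/hi parameters and the use of std::vector are assumptions made for illustration, not the actual benchmark source.

```cpp
#include <cstddef>
#include <vector>

// Branchy variant: one conditional per element.
// (Assumed shape; the real benchmark body is not shown in this comment.)
std::vector<float> ifStandard(const std::vector<float>& input,
                              float lo, float hi) {
    std::vector<float> result(input.size());
    for (std::size_t n = 0; n < input.size(); ++n) {
        if (input[n] > .5f)
            result[n] = hi;
        else
            result[n] = lo;
    }
    return result;
}

// Table-lookup variant: the comparison yields a bool, which converts to
// 0 or 1 and indexes a two-entry table, so the source has no if/else.
// lo and hi are hypothetical placeholder values for the two outcomes.
std::vector<float> ifNoConditional(const std::vector<float>& input,
                                   float lo, float hi) {
    const float tab[2] = {lo, hi};
    std::vector<float> result(input.size());
    for (std::size_t n = 0; n < input.size(); ++n) {
        const float item = input[n];
        result[n] = tab[item > .5f];
    }
    return result;
}
```

Both functions return the same values for the same input. The lookup form matches the xor / ucomiss / seta / indexed-load sequence visible in both compilers' output for ifNoConditional.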

Clang is still about 1.65x faster (148075 ns vs 89818 ns). I had a look at the
assembly. My limited understanding makes me think that gcc's ucomiss is not
fully vectorized and clang's is (clang's ucomiss %xmm0,%xmm1 vs gcc's
ucomiss 0x218b4(%rip),%xmm0). Feel free to correct me if I am wrong.


clang:

       movss  0x61a80(%r15,%rcx,1),%xmm1
22.95% xor    %eax,%eax
       ucomiss %xmm0,%xmm1
13.81% seta   %al
22.55% mov    0x4335d0(,%rax,4),%eax
4.31%  mov    %eax,0x61a80(%rbx,%rcx,1)
22.03% movss  0x61a84(%rbx,%rcx,1),%xmm1
0.40%  movss  %xmm1,0xc(%rsp)
13.93% add    $0x4,%rcx
       jne    404b50 <ifNoConditional(benchmark::State&)+0x180>


gcc:

14.45% movss  0x0(%r13,%rax,1),%xmm0
0.18%  xor    %edx,%edx
21.27% ucomiss 0x218b4(%rip),%xmm0        # 426bf4 <_IO_stdin_used+0x34>
16.84% seta   %dl
21.79% movss  0x8(%rsp,%rdx,4),%xmm0
1.41%  movss  %xmm0,(%r12,%rax,1)
23.94% add    $0x4,%rax
       cmp    $0x61a80,%rax
       jne    405330 <ifNoConditional(benchmark::State&)+0x160>
