This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.


[Bug libstdc++/85466] Performance is slow when doing 'branchless' conditional style math operations

From: "cpphackster at gmail dot com" <gcc-bugzilla at gcc dot gnu dot org>
To: gcc-bugs at gcc dot gnu dot org
Date: Sat, 21 Apr 2018 05:31:08 +0000
Subject: [Bug libstdc++/85466] Performance is slow when doing 'branchless' conditional style math operations
Auto-submitted: auto-generated
References: <bug-85466-4@http.gcc.gnu.org/bugzilla/>

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85466

--- Comment #20 from Daniel Elliott <cpphackster at gmail dot com> ---
Cool, just tried that.

It gets GCC down to:

GCC:
-------------------------------------------------------
ifStandard          596892 ns       
ifNoConditional     148075 ns <--- with "result[n] = tab[item > .5f];" trick

Clang (no change):
ifStandard           88777 ns   
ifNoConditional      89818 ns      

------------------------------------------------------

Clang is still 1.64x faster on ifNoConditional (148075 ns vs 89818 ns). I had
a look at the assembly. My limited understanding makes me think that GCC's
ucomiss is not fully vectorized and Clang's is (Clang's register-to-register
ucomiss %xmm0,%xmm1 vs GCC's memory-operand ucomiss 0x218b4(%rip),%xmm0).
Feel free to correct me if I am wrong.
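
For readers without the benchmark source at hand, here is a minimal sketch of
the two variants being compared. The function names mirror the benchmark
labels above, but the bodies, the parameters lo and hi, and the use of
std::vector are assumptions made for illustration, not the actual code
attached to the bug. Only the expression result[n] = tab[item > .5f] is taken
from this thread.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for the benchmark bodies; the real code in the
// bug report may differ.

// Branching version: written with an explicit if/else per element.
void ifStandard(const std::vector<float>& in, std::vector<float>& result,
                float lo, float hi) {
    assert(result.size() >= in.size());
    for (std::size_t n = 0; n < in.size(); ++n) {
        if (in[n] > .5f)
            result[n] = hi;
        else
            result[n] = lo;
    }
}

// Table-lookup version: the comparison yields a bool (0 or 1) that is used
// as an index into a two-entry table, so the source contains no branch.
void ifNoConditional(const std::vector<float>& in, std::vector<float>& result,
                     float lo, float hi) {
    assert(result.size() >= in.size());
    const float tab[2] = {lo, hi};
    for (std::size_t n = 0; n < in.size(); ++n) {
        const float item = in[n];
        result[n] = tab[item > .5f];
    }
}
```

Both functions should produce identical output for the same input; whether
the compiler turns either one into a branch, a setcc plus indexed load (as in
the listings below), or something else depends on the compiler and flags.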


clang:

       movss  0x61a80(%r15,%rcx,1),%xmm1
22.95% xor    %eax,%eax
       ucomiss %xmm0,%xmm1
13.81% seta   %al
22.55% mov    0x4335d0(,%rax,4),%eax
4.31%  mov    %eax,0x61a80(%rbx,%rcx,1)
22.03% movss  0x61a84(%rbx,%rcx,1),%xmm1
0.40%  movss  %xmm1,0xc(%rsp)
13.93% add    $0x4,%rcx
       jne    404b50 <ifNoConditional(benchmark::State&)+0x180>


gcc:

14.45% movss  0x0(%r13,%rax,1),%xmm0
0.18%  xor    %edx,%edx
21.27% ucomiss 0x218b4(%rip),%xmm0        # 426bf4 <_IO_stdin_used+0x34>
16.84% seta   %dl
21.79% movss  0x8(%rsp,%rdx,4),%xmm0
1.41%  movss  %xmm0,(%r12,%rax,1)
23.94% add    $0x4,%rax
       cmp    $0x61a80,%rax
       jne    405330 <ifNoConditional(benchmark::State&)+0x160>

References:
- [Bug c++/85466] New: Performance is slow when doing 'branchless' conditional style math operations
  - From: cpphackster at gmail dot com
