This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.



[Bug libstdc++/85466] Performance is slow when doing 'branchless' conditional style math operations


https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85466

--- Comment #11 from James Greenhalgh <jgreenhalgh at gcc dot gnu.org> ---
With Jonathon's suggested change copied into the original poster's framework
(without -fno-trapping-math), the Clang hot loop (score: 165065,
http://quick-bench.com/6NaD8ay0f8qMh9n0aMriYEiuKNA ) is:

0.16%  movups 0x61a80(%r15,%rax,4),%xmm6
1.15%  movups 0x61a90(%r15,%rax,4),%xmm7
0.60%  movaps %xmm1,%xmm3
5.44%  cmpltps %xmm6,%xmm3
0.44%  movaps %xmm1,%xmm6
0.40%  cmpltps %xmm7,%xmm6
0.44%  movaps %xmm5,%xmm7
4.97%  andps  %xmm3,%xmm7
0.20%  andnps %xmm4,%xmm3
0.36%  orps   %xmm7,%xmm3
1.04%  movaps %xmm5,%xmm7
4.97%  andps  %xmm6,%xmm7
0.11%  andnps %xmm4,%xmm6
4.95%  orps   %xmm7,%xmm6
5.53%  movups %xmm3,0x61a80(%rbx,%rax,4)
0.47%  movups %xmm6,0x61a90(%rbx,%rax,4)
4.42%  movups 0x61aa0(%r15,%rax,4),%xmm3
20.42% movups 0x61ab0(%r15,%rax,4),%xmm6
1.00%  movaps %xmm1,%xmm7
0.49%  cmpltps %xmm3,%xmm7
9.79%  movaps %xmm1,%xmm3
0.16%  cmpltps %xmm6,%xmm3
2.26%  movaps %xmm5,%xmm6
0.60%  andps  %xmm7,%xmm6
4.20%  andnps %xmm4,%xmm7
1.18%  orps   %xmm6,%xmm7
2.22%  movaps %xmm5,%xmm6
0.47%  andps  %xmm3,%xmm6
4.24%  andnps %xmm4,%xmm3
4.88%  movups %xmm7,0x61aa0(%rbx,%rax,4)
0.27%  orps   %xmm6,%xmm3
5.22%  movups %xmm3,0x61ab0(%rbx,%rax,4)
6.02%  add    $0x10,%rax
       jne    405b30 <ifStandard(benchmark::State&)+0x4a0>
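
For readers less used to SSE, the repeated cmpltps / andps / andnps / orps group in the listing above is a branch-free select applied to four floats at a time. The sketch below models a single lane at the bit level. It is only a reading of the disassembly, not the benchmark source: the names mask_select, threshold, if_true and if_false are hypothetical and stand for whatever %xmm1, %xmm5 and %xmm4 hold in the loop.

```cpp
#include <cstdint>
#include <cstring>

// Reinterpret a float as its 32-bit pattern and back (well-defined via memcpy).
static std::uint32_t bits_of(float f) {
    std::uint32_t u;
    std::memcpy(&u, &f, sizeof u);
    return u;
}

static float float_of(std::uint32_t u) {
    float f;
    std::memcpy(&f, &u, sizeof f);
    return f;
}

// One-lane model of the vector sequence. All parameter names are assumptions.
float mask_select(float x, float threshold, float if_true, float if_false) {
    // cmpltps: all-ones mask when threshold < x, all-zeros otherwise
    std::uint32_t mask = (threshold < x) ? 0xFFFFFFFFu : 0u;
    std::uint32_t t = bits_of(if_true) & mask;     // andps
    std::uint32_t f = bits_of(if_false) & ~mask;   // andnps
    return float_of(t | f);                        // orps
}
```

Because the comparison only produces a mask, no lane ever takes a branch, which is what lets the loop process sixteen floats per iteration.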

The GCC hot loop (score: 2385754,
http://quick-bench.com/ehLe-aqkpXkkx2sHLd6TWq_p4g4 ) is:

0.56%  movss  0x0(%rbp,%rdx,1),%xmm0
1.47%  xor    %eax,%eax
2.00%  subss  %xmm2,%xmm0
7.02%  ucomiss %xmm1,%xmm0
6.77%  seta   %al
4.96%  xor    %ecx,%ecx
0.25%  ucomiss %xmm0,%xmm1
0.84%  pxor   %xmm0,%xmm0
0.09%  seta   %cl
5.40%  sub    %ecx,%eax
3.22%  cvtsi2ss %eax,%xmm0
9.87%  ucomiss %xmm0,%xmm1
6.53%  ja     4053a8 <ifNoConditional(benchmark::State&)+0x1d8>
10.24% mulss  %xmm4,%xmm0
11.55% addss  %xmm3,%xmm0
5.46%  movss  %xmm0,(%rbx,%rdx,1)
2.00%  add    $0x4,%rdx
       cmp    $0x61a80,%rdx
       jne    405350 <ifNoConditional(benchmark::State&)+0x180>
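
The scalar shape of this loop can be read off the listing above: two ucomiss/seta pairs and an integer sub build a sign value, cvtsi2ss turns it back into a float, and mulss/addss then scale and offset it. The sketch below is a hedged reconstruction from the disassembly only, assuming %xmm1 holds 0.0f. The names sign_of, scaled_sign, offset, scale and bias are hypothetical, and the path taken by the ja branch is not shown in the listing, so it is not modelled here.

```cpp
// Comparison-based sign: yields -1.0f, 0.0f or +1.0f (0.0f for NaN).
inline float sign_of(float v) {
    int s = (v > 0.0f) - (v < 0.0f);   // seta %al, seta %cl, sub %ecx,%eax
    return static_cast<float>(s);      // cvtsi2ss %eax,%xmm0
}

// Fall-through path of the loop body for one element; parameter names are assumptions.
inline float scaled_sign(float x, float offset, float scale, float bias) {
    float s = sign_of(x - offset);     // subss, then the sign computation
    return s * scale + bias;           // mulss, addss
}
```

Each element goes through two flag-setting compares, an int-to-float conversion and a conditional jump, one float at a time, where the Clang loop uses packed compares and bitwise selects.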

Daniel Elliott, does that better match your expectations? If so, I think this
can be resolved as a missed optimization of invalid code.
