[Bug tree-optimization/79389] [7 Regression] 30% performance regression in SciMark2 MonteCarlo
ubizjak at gmail dot com
gcc-bugzilla@gcc.gnu.org
Mon Feb 6 14:43:00 GMT 2017
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79389
--- Comment #3 from Uroš Bizjak <ubizjak at gmail dot com> ---
Please note that clang if-converts:
if ( x*x + y*y <= 1.0)
under_curve ++;
to SETcc + ADD:
5,64 │ movsd (%rsp),%xmm1
│ mulsd %xmm1,%xmm1
│ mulsd %xmm0,%xmm0
23,59 │ addsd %xmm1,%xmm0
18,54 │ ucomisd 0xb32(%rip),%xmm0 # 403190 <TINY_LU_SIZE+0x258>
10,88 │ setbe %al
12,31 │ movzbl %al,%eax
10,68 │ add %eax,%ebp
7,11 │ dec %ebx
│ ↑ jne 30
OTOH, gcc emits LEA + CMOVcc to conditionally increment under_curve:
7,46 │ movsd 0x8(%rsp),%xmm1
│ lea 0x1(%rbx),%eax
│ movsd 0x861(%rip),%xmm2 # 403448 <RESOLUTION_DEFAULT+0x48>
│ mulsd %xmm0,%xmm0
25,81 │ mulsd %xmm1,%xmm1
│ addsd %xmm0,%xmm1
22,23 │ comisd %xmm1,%xmm2
8,43 │ cmovae %eax,%ebx
21,60 │ add $0x1,%ebp
│ cmp %ebp,%r13d
│ ↑ jne 30
The LEA insn puts ebx+1 in eax, and CMOVcc then selects between ebx and
ebx+1. Considering that CMOVcc is problematic for some x86 targets, IMO the
clang code is more "universal" for these short loops, since it avoids CMOVcc.
Note also that clang reverses the loop to use a DEC insn instead of ADD/CMP.
Putting loop reversal aside, this is a case of missing if-conversion.
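
For illustration, the three shapes discussed above can be written out at the
source level. This is only a sketch: the function names and the array-based
inputs are hypothetical (the benchmark draws x and y from its random number
generator), and a compiler is free to emit any of these shapes for any of the
three functions, so the source form does not guarantee a particular insn
sequence.

```c
#include <stddef.h>

/* Branchy form, matching the quoted source: increment only when the
   point lies inside the unit circle. */
int count_branchy(const double *xs, const double *ys, size_t n)
{
    int under_curve = 0;
    for (size_t i = 0; i < n; i++) {
        double x = xs[i], y = ys[i];
        if (x * x + y * y <= 1.0)
            under_curve++;
    }
    return under_curve;
}

/* SETcc + ADD shape: the comparison yields 0 or 1, which is added
   unconditionally. */
int count_setcc_add(const double *xs, const double *ys, size_t n)
{
    int under_curve = 0;
    for (size_t i = 0; i < n; i++) {
        double x = xs[i], y = ys[i];
        under_curve += (x * x + y * y <= 1.0);
    }
    return under_curve;
}

/* LEA + CMOVcc shape: compute under_curve + 1 up front, then select
   between the old and the incremented value. */
int count_select(const double *xs, const double *ys, size_t n)
{
    int under_curve = 0;
    for (size_t i = 0; i < n; i++) {
        double x = xs[i], y = ys[i];
        int incremented = under_curve + 1;
        under_curve = (x * x + y * y <= 1.0) ? incremented : under_curve;
    }
    return under_curve;
}
```

All three functions return the same count for the same input; they differ
only in how the conditional increment is expressed. Comparing the output of
"gcc -O2 -S" and "clang -O2 -S" on such a file is one way to see which shape
each compiler picks.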