[Bug tree-optimization/79389] [7 Regression] 30% performance regression in SciMark2 MonteCarlo

Mon Feb 6 14:43:00 GMT 2017

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79389

--- Comment #3 from Uroš Bizjak <ubizjak at gmail dot com> ---
Please note that clang if-converts:

            if ( x*x + y*y <= 1.0)
                 under_curve ++;

to SETcc + ADD:

  5,64 │      movsd  (%rsp),%xmm1
       │      mulsd  %xmm1,%xmm1
       │      mulsd  %xmm0,%xmm0
 23,59 │      addsd  %xmm1,%xmm0
 18,54 │      ucomis 0xb32(%rip),%xmm0        # 403190 <TINY_LU_SIZE+0x258>
 10,88 │      setbe  %al
 12,31 │      movzbl %al,%eax
 10,68 │      add    %eax,%ebp
  7,11 │      dec    %ebx
       │    ↑ jne    30

OTOH, gcc emits LEA + CMOVcc (here CMOVAE) to conditionally increment under_curve:

  7,46 │      movsd  0x8(%rsp),%xmm1
       │      lea    0x1(%rbx),%eax
       │      movsd  0x861(%rip),%xmm2        # 403448 <RESOLUTION_DEFAULT+0x48>
       │      mulsd  %xmm0,%xmm0
 25,81 │      mulsd  %xmm1,%xmm1
       │      addsd  %xmm0,%xmm1
 22,23 │      comisd %xmm1,%xmm2
  8,43 │      cmovae %eax,%ebx
 21,60 │      add    $0x1,%ebp
       │      cmp    %ebp,%r13d
       │    ↑ jne    30

The LEA insn computes ebx+1 into eax, and CMOVAE then selects between ebx and
ebx+1. Considering that CMOV performs poorly on some x86 targets, IMO clang's
code is more "universal" for these short loops, since it avoids CMOV altogether.

Note also that clang reverses the loop so that it counts down with a single DEC
insn instead of ADD/CMP.

Putting loop reversal aside, this is a case of missing if-conversion (to
SETcc + ADD).