This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.



[Bug tree-optimization/79390] 10% performance drop in SciMark2 LU after r242550


https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79390

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |missed-optimization
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2017-02-06
                 CC|                            |rguenth at gcc dot gnu.org
     Ever confirmed|0                           |1

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
Ok, so what I see is (good assembly):

.L4:
        movq    (%r15,%rax,8), %rcx
        vmovsd  (%rcx,%rbx), %xmm0
        vandpd  %xmm3, %xmm0, %xmm0
        vucomisd        %xmm1, %xmm0
        vmaxsd  %xmm1, %xmm0, %xmm1
        cmova   %eax, %edx
        addq    $1, %rax
        cmpl    %eax, %r14d
        jg      .L4

vs.

.L4:
        movq    (%r15,%rax,8), %rdx
        movl    %eax, %edi
        addq    $1, %rax
        vmovsd  (%rdx,%rbx), %xmm0
        vandpd  %xmm3, %xmm0, %xmm0
        vucomisd        %xmm1, %xmm0
        jbe     .L56
        cmpl    %eax, %r14d
        jle     .L68
        vmovapd %xmm0, %xmm1
        movl    %edi, %r8d
        jmp     .L4
        .p2align 4,,10
        .p2align 3
.L56:
        cmpl    %eax, %r14d
        jg      .L4
...
.L68:
        movl    %edi, %r8d
        jmp     .L8
...

which is split-paths running amok again, now that the IL is no longer if-converted at the GIMPLE level:

  <bb 6> [16.86%]:
  # jp_62 = PHI <j_94(5), jp_176(8)>
  # t_184 = PHI <t_70(5), t_175(8)>
  # ivtmp.59_327 = PHI <ivtmp.59_329(5), ivtmp.59_328(8)>
  i_185 = (int) ivtmp.59_327;
  _180 = MEM[base: A_69(D), index: ivtmp.59_327, step: 8, offset: 0B];
  _179 = _180 + _2;
  _178 = *_179;
  ab_177 = ABS_EXPR <_178>;
  if (ab_177 > t_184)
    goto <bb 7>; [50.00%]
  else
    goto <bb 8>; [50.00%]

  <bb 7> [8.43%]:

  <bb 8> [16.86%]:
  # jp_176 = PHI <jp_62(6), i_185(7)>
  # t_175 = PHI <t_184(6), ab_177(7)>
  ivtmp.59_328 = ivtmp.59_327 + 1;
  if (ivtmp.59_328 != _339)
    goto <bb 6>; [85.00%]


so I wonder whether -fno-split-paths restores the performance?  It's the
threading heuristic again, btw., and both predecessors of the joiner block (bb 8)
are empty.

The loop is basically a max-index reduction.
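
For reference, a minimal C sketch of such a max-index reduction, reconstructed from the GIMPLE dump above rather than copied from the benchmark. The function names pivot_row and pivot_row_split are hypothetical. The second function is a hand-written approximation of the control flow in the second assembly listing (the loop exit test duplicated into both arms), not actual compiler output.

```c
#include <math.h>

/* Hypothetical helper: return the row index in [j, m) whose entry in
   column j has the largest absolute value (first one wins on ties).
   The names jp, t, ab and i follow the SSA names in the dump. */
int pivot_row(double **A, int m, int j)
{
    int jp = j;
    double t = fabs(A[j][j]);

    for (int i = j + 1; i < m; i++) {
        double ab = fabs(A[i][j]);
        /* Both arms are empty apart from selecting the new jp and t,
           so this can become vmaxsd + cmova when if-converted. */
        if (ab > t) {
            jp = i;
            t = ab;
        }
    }
    return jp;
}

/* Hand-written approximation of the shape after path splitting: the
   joiner block (increment + exit test) is duplicated into both arms,
   which leaves a hard-to-predict branch inside the loop. */
int pivot_row_split(double **A, int m, int j)
{
    int jp = j;
    double t = fabs(A[j][j]);
    int i = j + 1;

    if (i >= m)
        return jp;

    for (;;) {
        double ab = fabs(A[i][j]);
        int cur = i++;
        if (ab > t) {
            /* "taken" copy of the joiner */
            jp = cur;
            t = ab;
            if (i >= m)
                break;
        } else {
            /* "not taken" copy of the joiner */
            if (i >= m)
                break;
        }
    }
    return jp;
}
```

Both functions compute the same result; only the control flow differs. Building the first form with and without -fno-split-paths and comparing the generated loop is one way to check whether the pass is responsible.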
