This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.

[Bug tree-optimization/79390] 10% performance drop in SciMark2 LU after r242550

From: "rguenth at gcc dot gnu.org" <gcc-bugzilla at gcc dot gnu dot org>
To: gcc-bugs at gcc dot gnu dot org
Date: Mon, 06 Feb 2017 13:51:50 +0000
Subject: [Bug tree-optimization/79390] 10% performance drop in SciMark2 LU after r242550
Auto-submitted: auto-generated
References: <bug-79390-4@http.gcc.gnu.org/bugzilla/>

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79390

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |missed-optimization
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2017-02-06
                 CC|                            |rguenth at gcc dot gnu.org
     Ever confirmed|0                           |1

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
Ok, so what I see is (good assembly):

.L4:
        movq    (%r15,%rax,8), %rcx
        vmovsd  (%rcx,%rbx), %xmm0
        vandpd  %xmm3, %xmm0, %xmm0
        vucomisd        %xmm1, %xmm0
        vmaxsd  %xmm1, %xmm0, %xmm1
        cmova   %eax, %edx
        addq    $1, %rax
        cmpl    %eax, %r14d
        jg      .L4

vs. the slower code:

.L4:
        movq    (%r15,%rax,8), %rdx
        movl    %eax, %edi
        addq    $1, %rax
        vmovsd  (%rdx,%rbx), %xmm0
        vandpd  %xmm3, %xmm0, %xmm0
        vucomisd        %xmm1, %xmm0
        jbe     .L56
        cmpl    %eax, %r14d
        jle     .L68
        vmovapd %xmm0, %xmm1
        movl    %edi, %r8d
        jmp     .L4
        .p2align 4,,10
        .p2align 3
.L56:
        cmpl    %eax, %r14d
        jg      .L4
...
.L68:
        movl    %edi, %r8d
        jmp     .L8
...

which is split-paths running amok again, this time on IL that is no longer
if-converted at the GIMPLE level:

  <bb 6> [16.86%]:
  # jp_62 = PHI <j_94(5), jp_176(8)>
  # t_184 = PHI <t_70(5), t_175(8)>
  # ivtmp.59_327 = PHI <ivtmp.59_329(5), ivtmp.59_328(8)>
  i_185 = (int) ivtmp.59_327;
  _180 = MEM[base: A_69(D), index: ivtmp.59_327, step: 8, offset: 0B];
  _179 = _180 + _2;
  _178 = *_179;
  ab_177 = ABS_EXPR <_178>;
  if (ab_177 > t_184)
    goto <bb 7>; [50.00%]
  else
    goto <bb 8>; [50.00%]

  <bb 7> [8.43%]:

  <bb 8> [16.86%]:
  # jp_176 = PHI <jp_62(6), i_185(7)>
  # t_175 = PHI <t_184(6), ab_177(7)>
  ivtmp.59_328 = ivtmp.59_327 + 1;
  if (ivtmp.59_328 != _339)
    goto <bb 6>; [85.00%]


So I wonder: does -fno-split-paths restore the performance?  It's the
threading heuristic again, btw., and both preds of the joiner are empty.

The loop is basically a max-index reduction.
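
For readers without the benchmark source at hand, a max-index reduction of this shape can be sketched in C as below. This is a reconstruction from the names in the GIMPLE dump (A, t, jp, ab), not a copy of the SciMark2 source; the function names pivot_row and pivot_row_select and their signatures are hypothetical. The second variant spells out the select form that corresponds to the cmova/vmaxsd sequence in the good assembly; whether a given compiler actually emits that sequence is up to its optimizers.

```c
#include <math.h>

/* Hypothetical helper: return the row index i in [j, M) with the largest
   |A[i][j]|.  The first maximum wins because the comparison is strict. */
int pivot_row(double **A, int M, int j)
{
    int jp = j;
    double t = fabs(A[j][j]);

    for (int i = j + 1; i < M; i++) {
        double ab = fabs(A[i][j]);
        if (ab > t) {        /* the branch seen in <bb 6> of the dump */
            jp = i;
            t = ab;
        }
    }
    return jp;
}

/* Same reduction written with selects instead of a branch; this mirrors
   the two PHI nodes in <bb 8> (one for jp, one for t). */
int pivot_row_select(double **A, int M, int j)
{
    int jp = j;
    double t = fabs(A[j][j]);

    for (int i = j + 1; i < M; i++) {
        double ab = fabs(A[i][j]);
        jp = (ab > t) ? i : jp;   /* index select; must precede the update of t */
        t  = (ab > t) ? ab : t;   /* running maximum */
    }
    return jp;
}
```

Compiling such a loop with and without -fno-split-paths and comparing the generated assembly would be one way to check the question raised above.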

References:
- [Bug tree-optimization/79390] New: 10% performance drop in SciMark2 LU after r242550
  - From: krister.walfridsson at gmail dot com
