This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.
[Bug tree-optimization/79390] 10% performance drop in SciMark2 LU after r242550
- From: "rguenth at gcc dot gnu.org" <gcc-bugzilla at gcc dot gnu dot org>
- To: gcc-bugs at gcc dot gnu dot org
- Date: Mon, 06 Feb 2017 13:51:50 +0000
- Subject: [Bug tree-optimization/79390] 10% performance drop in SciMark2 LU after r242550
- Auto-submitted: auto-generated
- References: <bug-79390-4@http.gcc.gnu.org/bugzilla/>
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79390
Richard Biener <rguenth at gcc dot gnu.org> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |missed-optimization
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2017-02-06
                 CC|                            |rguenth at gcc dot gnu.org
     Ever confirmed|0                           |1
--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
OK, so this is what I see (good assembly):
.L4:
	movq (%r15,%rax,8), %rcx
	vmovsd (%rcx,%rbx), %xmm0
	vandpd %xmm3, %xmm0, %xmm0
	vucomisd %xmm1, %xmm0
	vmaxsd %xmm1, %xmm0, %xmm1
	cmova %eax, %edx
	addq $1, %rax
	cmpl %eax, %r14d
	jg .L4
vs.
.L4:
	movq (%r15,%rax,8), %rdx
	movl %eax, %edi
	addq $1, %rax
	vmovsd (%rdx,%rbx), %xmm0
	vandpd %xmm3, %xmm0, %xmm0
	vucomisd %xmm1, %xmm0
	jbe .L56
	cmpl %eax, %r14d
	jle .L68
	vmovapd %xmm0, %xmm1
	movl %edi, %r8d
	jmp .L4
	.p2align 4,,10
	.p2align 3
.L56:
	cmpl %eax, %r14d
	jg .L4
...
.L68:
	movl %edi, %r8d
	jmp .L8
...
which is the split-paths pass going amok again on IL that is no longer if-converted at the GIMPLE level:
  <bb 6> [16.86%]:
  # jp_62 = PHI <j_94(5), jp_176(8)>
  # t_184 = PHI <t_70(5), t_175(8)>
  # ivtmp.59_327 = PHI <ivtmp.59_329(5), ivtmp.59_328(8)>
  i_185 = (int) ivtmp.59_327;
  _180 = MEM[base: A_69(D), index: ivtmp.59_327, step: 8, offset: 0B];
  _179 = _180 + _2;
  _178 = *_179;
  ab_177 = ABS_EXPR <_178>;
  if (ab_177 > t_184)
    goto <bb 7>; [50.00%]
  else
    goto <bb 8>; [50.00%]

  <bb 7> [8.43%]:

  <bb 8> [16.86%]:
  # jp_176 = PHI <jp_62(6), i_185(7)>
  # t_175 = PHI <t_184(6), ab_177(7)>
  ivtmp.59_328 = ivtmp.59_327 + 1;
  if (ivtmp.59_328 != _339)
    goto <bb 6>; [85.00%]
So I wonder whether -fno-split-paths restores the performance. It's the
threading heuristic again, by the way, and both predecessors of the joiner
block are empty. The loop is basically a max-index reduction.
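To make the transformation concrete, here is a minimal C sketch of that loop in three forms. It is a reconstruction from the GIMPLE above, not the SciMark2 source. The function names, the `abs_val` helper and the row-pointer layout `double **A` are assumptions made for illustration. `find_pivot` is the source-level max-index reduction. `find_pivot_select` is roughly what the good code computes after if-conversion. `find_pivot_split` applies by hand roughly what path splitting does.

```c
/* Hypothetical helper standing in for fabs() so the sketch needs no libm. */
static double abs_val(double x) { return x < 0.0 ? -x : x; }

/* Source-level form: max-index reduction over column j, rows j..M-1,
   as in an LU partial-pivoting step (A is an array of row pointers). */
int find_pivot(double **A, int M, int j)
{
    int jp = j;
    double t = abs_val(A[j][j]);
    for (int i = j + 1; i < M; i++) {
        double ab = abs_val(A[i][j]);
        if (ab > t) {
            jp = i;  /* new pivot row */
            t = ab;  /* new largest magnitude */
        }
    }
    return jp;
}

/* If-converted form: both updates become selects, roughly what the
   cmova (index) and vmaxsd (maximum) in the good loop compute. */
int find_pivot_select(double **A, int M, int j)
{
    int jp = j;
    double t = abs_val(A[j][j]);
    for (int i = j + 1; i < M; i++) {
        double ab = abs_val(A[i][j]);
        int gt = ab > t;
        jp = gt ? i : jp;
        t = gt ? ab : t;
    }
    return jp;
}

/* Path-split form: the latch (increment and exit test, bb 8 above) is
   duplicated into both arms of the if, so no join block is left for
   if-conversion, similar in shape to the branchy bad loop. */
int find_pivot_split(double **A, int M, int j)
{
    int jp = j;
    double t = abs_val(A[j][j]);
    int i = j + 1;
    if (i >= M)
        return jp;
    for (;;) {
        double ab = abs_val(A[i][j]);
        if (ab > t) {
            jp = i;
            t = ab;
            if (++i >= M)  /* latch copy on the "new maximum" path */
                return jp;
        } else {
            if (++i >= M)  /* latch copy on the "unchanged" path */
                return jp;
        }
    }
}
```

Because the comparison is a strict `>`, all three keep the first row that reaches the largest magnitude and return the same index. Only the control flow differs: the select form has no data-dependent branch in the loop body, while the split form branches on the comparison and carries two copies of the exit test.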