This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.
[Bug target/69274] [6 Regression] 435.gromacs performance regression after r231814 on x86 Haswell and bdver2
- From: "rguenth at gcc dot gnu.org" <gcc-bugzilla at gcc dot gnu dot org>
- To: gcc-bugs at gcc dot gnu dot org
- Date: Thu, 04 Feb 2016 15:44:13 +0000
- Subject: [Bug target/69274] [6 Regression] 435.gromacs performance regression after r231814 on x86 Haswell and bdver2
- Auto-submitted: auto-generated
- References: <bug-69274-4 at http dot gcc dot gnu dot org/bugzilla/>
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69274
--- Comment #5 from Richard Biener <rguenth at gcc dot gnu.org> ---
Samples: 2M of event 'cycles', Event count (approx.): 1928893785632
36.40% gromacs_base.am gromacs_base.amd64-m64-gcc42-nn [.] inl1130_
28.60% gromacs_peak.am gromacs_peak.amd64-m64-gcc42-nn [.] inl1130_
7.51% gromacs_base.am gromacs_base.amd64-m64-gcc42-nn [.] search_neighbour
7.38% gromacs_peak.am gromacs_peak.amd64-m64-gcc42-nn [.] search_neighbour
2.00% gromacs_base.am gromacs_base.amd64-m64-gcc42-nn [.] inl1100_
2.00% gromacs_peak.am gromacs_peak.amd64-m64-gcc42-nn [.] inl1100_
so the hot function is inl1130_, and that's innerf.f
Ok, so I spot one non-scheduling/RA difference:
- vmovss 52(%rsp), %xmm6
- vsubss -4(%r13,%rdi,4), %xmm6, %xmm6
-.LVL250:
+ vsubss -4(%r13,%rdi,4), %xmm6, %xmm4
+.LVL253:
leaq (%r15,%rsi,4), %r12
.loc 1 662 0
- vmulss %xmm4, %xmm4, %xmm2
- vmovss %xmm4, 24(%rsp)
- vmovss %xmm5, 20(%rsp)
- vmovss %xmm6, 16(%rsp)
- vfmadd231ss %xmm5, %xmm5, %xmm2
- vfmadd231ss %xmm6, %xmm6, %xmm2
-.LVL251:
+ vmulss %xmm2, %xmm2, %xmm1
+ vmovss %xmm2, 24(%rsp)
+ vmovss %xmm3, 20(%rsp)
+ vmovss %xmm4, 16(%rsp)
+ vfmadd231ss %xmm3, %xmm3, %xmm1
+ vmovaps %xmm1, %xmm7
+ vfmadd231ss %xmm4, %xmm4, %xmm7
+.LVL254:
thus there seems to be some more spilling.