This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
Re: [PING][PATCH][REVISED] Fix PR middle-end/PR28690, modify swap_commutative_operands_p
- From: "H.J. Lu" <hjl at lucon dot org>
- To: Pat Haugen <pthaugen at us dot ibm dot com>
- Cc: Peter Bergner <bergner at vnet dot ibm dot com>, Paolo Bonzini <bonzini at gnu dot org>, Dave Korn <dave dot korn at artimi dot com>, gcc-patches at gcc dot gnu dot org, Ian Lance Taylor <iant at google dot com>, Rask Ingemann Lambertsen <rask at sygehus dot dk>, Richard Guenther <richard dot guenther at gmail dot com>
- Date: Wed, 11 Jul 2007 15:32:08 -0700
- Subject: Re: [PING][PATCH][REVISED] Fix PR middle-end/PR28690, modify swap_commutative_operands_p
- References: <20070626230126.GA14180@lucon.org> <OFCCCEC8E3.B30CC185-ON86257315.006E3FA2-86257315.006FC7FB@us.ibm.com>
On Wed, Jul 11, 2007 at 03:19:11PM -0500, Pat Haugen wrote:
> "H. J. Lu" <hjl@lucon.org> wrote on 06/26/2007 06:01:26 PM:
>
> > I got the following on Linux/Intel64:
> >
> > New: r125740 + r125920 patch + PR28690 patch
> > Old: r125740 + r125920 patch
> >
> > (New - Old)/Old
> > 200.sixtrack -7.37606%
>
> I did some looking into sixtrack using HJ's binaries. Looks like most of
> the degradation is coming from a single loop. I've included oprofile
> annotations of the two different versions of the loop measuring cycles, but
> the main difference appears isolated to the following lines. Hopefully
> someone with more knowledge of the architecture than myself can comment on
> the reason one code sequence is better than the other.
>
>
> Hits %
> ------ ------
> Base:
> 50709 2.8611 : 4afd4b: movsd %xmm2,15326069(%rip) # 134d8c8 <crkveuk.2248>
> 2 1.1e-04 : 4afd53: mulsd 0x35eed00(,%rax,8),%xmm0
> 101904 5.7495 : 4afd5c: addsd %xmm1,%xmm0
>
> Patched:
> 100275 5.4151 : 4b01ba: movapd %xmm7,%xmm1
> 12062 0.6514 : 4b01be: mulsd 0x3441800(,%rax,8),%xmm1
> 96240 5.1972 : 4b01c7: addsd %xmm1,%xmm0
>
>
I suspect that the change makes it harder for the out-of-order (OOO)
scheduler to hide memory/insn latency.
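To put a rough number on the difference between the two sequences, here is a minimal sketch that sums the oprofile hit counts quoted above and applies the same (New - Old)/Old ratio used for the SPEC result. The dictionary names and the relative_change helper are only illustrative. Note that the profiler can attribute a sample to a neighboring instruction, so the per-instruction counts, and therefore these totals, are only indicative.

```python
# Sample counts ("Hits") copied from the oprofile annotation quoted above.
# The variable and function names here are illustrative, not from the thread.
base = {"movsd": 50709, "mulsd": 2, "addsd": 101904}
patched = {"movapd": 100275, "mulsd": 12062, "addsd": 96240}


def relative_change(new, old):
    """Return (New - Old)/Old, the ratio used for the SPEC numbers."""
    return (new - old) / old


base_total = sum(base.values())
patched_total = sum(patched.values())
delta = relative_change(patched_total, base_total)

print(f"base hits:       {base_total}")
print(f"patched hits:    {patched_total}")
print(f"(New - Old)/Old: {delta:+.2%}")
```

Here a positive ratio means more samples landed on the patched sequence, whereas the negative ratio reported for 200.sixtrack is a drop in the benchmark result.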
There are several changes in the patch. Is it possible to break
them down into smaller, independent pieces so that we can evaluate
them individually?
Thanks.
H.J.