[PING][PATCH][REVISED] Fix PR middle-end/PR28690, modify swap_commutative_operands_p

H.J. Lu hjl@lucon.org
Wed Jul 11 23:33:00 GMT 2007


On Wed, Jul 11, 2007 at 03:19:11PM -0500, Pat Haugen wrote:
> "H. J. Lu" <hjl@lucon.org> wrote on 06/26/2007 06:01:26 PM:
> 
> > I got the following results on Linux/Intel64:
> >
> > New: r125740 + r125920 patch + PR28690 patch
> > Old: r125740 + r125920 patch
> >
> >                              (New - Old)/Old
> > 200.sixtrack                     -7.37606%
> 
> I did some looking into sixtrack using HJ's binaries. Looks like most of
> the degradation is coming from a single loop.  I've included oprofile
> annotations of the two different versions of the loop measuring cycles, but
> the main difference appears isolated to the following lines.  Hopefully
> someone with more knowledge of the architecture than myself can comment on
> the reason one code sequence is better than the other.
> 
> 
> Hits      %
> ------  ------
> Base:
>  50709  2.8611 :  4afd4b:       movsd  %xmm2,15326069(%rip)        # 134d8c8 <crkveuk.2248>
>      2 1.1e-04 :  4afd53:       mulsd  0x35eed00(,%rax,8),%xmm0
> 101904  5.7495 :  4afd5c:       addsd  %xmm1,%xmm0
> 
> Patched:
> 100275  5.4151 :  4b01ba:       movapd %xmm7,%xmm1
>  12062  0.6514 :  4b01be:       mulsd  0x3441800(,%rax,8),%xmm1
>  96240  5.1972 :  4b01c7:       addsd  %xmm1,%xmm0
> 
> 

I suspect that the change makes it harder for the out-of-order (OOO)
scheduler to hide memory and instruction latency.

There are several changes in the patch. Is it possible to break
them down into smaller, independent pieces so that we can evaluate
them individually?

Thanks.


H.J.


