This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.

Re: An unusual Performance approach using Synthetic registers

From: "Marcel Cox" <marcel_cox at hotmail dot com>
To: gcc at gcc dot gnu dot org
Date: Wed, 8 Jan 2003 09:37:04 +0100
Subject: Re: An unusual Performance approach using Synthetic registers
References: <DAV14qQkvUmmB61ZAe30000ca12@hotmail.com> <Pine.LNX.4.21.0301071341070.18968-100000@mail.kloo.net>

> The stack reordering pass posted by Sanjiv Gupta can do this.
> This was posted about a few days ago in gcc-patches.

Thanks for the pointer. I do not actively follow the gcc-patches list, so I
didn't know about it. In any case, that work suggests there is at least some
benefit in rearranging the stack layout.

> What you're describing is actually bad on the Pentium, and probably
> subsequent implementations as well.
> The Pentium can dual-issue loads as long as they reference separate cache
> ways. So, manually sorting the stack so contiguous accesses are localized
> increases the probability of the loads accessing the same cache way, thus
> decreasing the probability of single-issuing.
>
> You guys really should read the processor manual instead of making
> incorrect assumptions about what features would improve the code quality.
>

I don't know whether it is bad for Pentium II or later processors. Intel's
optimization guides for the Pentium II/III and for the Pentium 4 don't seem
to mention that multiple accesses to the same cache line are bad for
performance.
http://developer.intel.com/design/pentiumii/manuals/245127.htm
http://developer.intel.com/design/pentium4/manuals/248966.htm


> > 2) Running the RA over the stack slots will cause the slots to be reused
> > when the life range of variables does not overlap. This even increases the
> > compactness that already gives the benefit of point 1. Also, overall
> > reducing stack usage will always be a small gain.
>
> The stack slots are already reused.

Are you sure about that? I thought it was mentioned on this list some time
ago that this was not the case. I guess I will try to write a small test
program to check this.
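
A minimal sketch of what such a test could look like, assuming two local
arrays whose lifetimes do not overlap. All names here (sum_two_phases,
slots_were_shared, report_slot_sharing) are hypothetical, and the outcome
depends on the compiler version and flags, so this only shows how one might
check, not what GCC does.

```c
#include <stdint.h>
#include <stdio.h>

/* Addresses recorded while each array is still alive (hypothetical names). */
static uintptr_t addr_a, addr_b;

/* Two arrays with non-overlapping lifetimes; volatile keeps them in memory. */
int sum_two_phases(int n)
{
    int total = 0;
    {
        volatile int a[64];
        for (int i = 0; i < 64; i++)
            a[i] = i + n;
        for (int i = 0; i < 64; i++)
            total += a[i];
        addr_a = (uintptr_t)(volatile void *)a;
    }
    {
        volatile int b[64];
        for (int i = 0; i < 64; i++)
            b[i] = 2 * i;
        for (int i = 0; i < 64; i++)
            total += b[i];
        addr_b = (uintptr_t)(volatile void *)b;
    }
    return total;
}

/* Nonzero if both arrays started at the same address, i.e. shared a slot. */
int slots_were_shared(void)
{
    return addr_a == addr_b;
}

/* Prints both recorded addresses and whether they coincide. */
void report_slot_sharing(void)
{
    printf("a=%p b=%p -> %s\n", (void *)addr_a, (void *)addr_b,
           slots_were_shared() ? "slot reused" : "separate slots");
}
```

Calling sum_two_phases() and then report_slot_sharing() from main, built at
different optimization levels, would show whether the two arrays end up in
the same slot. Looking at the frame size in the output of gcc -S is another
way to check.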

> > 3) The "compact" memory access pattern and the reuse of stack slots might
> > increase the opportunity for the processor to use "shortcut" features in
> > memory access. For example successive writes to the same memory location
> > might be optimised to a single write, or read access to a memory location
> > may be fast if there is still a pending write on the same location
>
> The first feature you're describing is called "dead write elimination" and
> is already done in gcc.

There are a number of cases where the compiler can't eliminate the store:
for example, a store from within a loop followed by another store after the
loop, or a store/load/store cycle where the second store occurs before the
first one has retired.
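
The first case can be sketched in C as follows. This is only an illustration
with hypothetical names (last_seen, scan, matches_last_seen, demo_scan). The
store inside the loop cannot, in general, be dropped just because a later
store overwrites the same location, since the call through visit may read
the value in between.

```c
/* External linkage, so an unknown callee may legally read it. */
int last_seen;

/* Example callee that does read the global between the two stores. */
int matches_last_seen(int x)
{
    return x == last_seen;
}

int scan(const int *v, int n, int (*visit)(int))
{
    int hits = 0;
    for (int i = 0; i < n; i++) {
        last_seen = v[i];      /* store inside the loop */
        hits += visit(v[i]);   /* callee may load last_seen */
    }
    last_seen = -1;            /* store after the loop, same location */
    return hits;
}

/* Runs scan() on a fixed three-element array. */
int demo_scan(void)
{
    static const int v[] = { 3, 1, 4 };
    return scan(v, 3, matches_last_seen);
}
```

Whether the processor can merge the last in-loop store with the one after
the loop is a property of the hardware and is not visible at this level.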

> The second feature you're describing is called "write FIFO snooping" and
> is a hardware feature.

Intel calls it "store-forwarding".


Marcel

