This is the mail archive of the
gcc@gcc.gnu.org
mailing list for the GCC project.
Re: An unusual Performance approach using Synthetic registers
- From: Andy Walker <ja_walker at earthlink dot net>
- To: dewar at gnat dot com (Robert Dewar),lord at emf dot net
- Cc: denisc at overta dot ru,gcc at gcc dot gnu dot org
- Date: Sun, 5 Jan 2003 22:39:08 -0600
- Subject: Re: An unusual Performance approach using Synthetic registers
- References: <20030105130259.D1FEBF2D87@nile.gnat.com>
Again, Thank You for your time and consideration in making responses.
First: you have made several requests that I somehow demonstrate that
something like the Synthetic register approach is workable. Fair enough. I
will make a stab at it, if for no other reason than that you are the only
person to show me the courtesy of answering simple questions.
How about this for a thought experiment: I call up Bell Labs and have them do
it for me. As far as I am concerned, their corporate integrity is at the
top, and a paper from them is routinely more reliable than most reviewed
works in scientific journals. I will ask them to have one of their
researchers modify an old version of gcc to use a few memory locations as
artificial registers. Then run some timing tests on compiled code just to
see if it makes any difference. If they report that it makes an improvement,
I will go ahead with my investigations. If not, I will not need to waste
any more posts to this list about a silly, naive, and useless approach.
OK. Time is up. The dear souls at Bell Labs used their crystal ball,
figured out that I would need this information, and have conveniently posted
it here: http://cm.bell-labs.com/cm/cs/what/smlnj/compiler-notes/k32.ps .
I am relieved to report that their investigation was a success. Lal George's
implementation is different from mine in several respects. I am satisfied
that it is near enough to "Synthetic registers" to validate my investigation.
No guarantee of success or value, but a solid indication that value might
exist.
On Sunday 05 January 2003 07:02 am, Robert Dewar wrote:
<snip>
> Remember that
> "retrieving from memory" is *EXACTLY* the same code sequence as reading
> a synthetic register, assuming both are on the current stack frame.
I am not at all convinced of this. I surmise that Reload knows nothing about
the meaning of a piece of data in a stack slot. I conclude that because RTL
does not keep that information. Reload's only option, then, is to physically
move the data back into the register and try the specified instruction. This
should pretty well obliterate any previous attempts at pipeline/instruction
scheduling, and generate a tremendous number of pipeline stalls. And gcc
does.
Simulated comparison of a loop end:

w/o Synth
    ...
    mov eax,[StackSlot27]   ; Load the increment from spill.
    mov edx,[StackSlot23]   ; Load the index -- spilled for lack of registers.
    lea ecx,[eax + edx]     ; Nicely optimized "add".
    mov edx,[StackSlot28]   ; Load the loop limit from spill.
    cmp ecx,edx             ; Compare the index to the loop limit.
    ...

w/ Synth
    ...
    add ecx,[ebp - 20]      ; Add synthreg 27, the increment, to the index.
    cmp ecx,[ebp - 16]      ; Compare the index to the loop limit in synthreg 28.
    ...
This is my concept. Is it reality? I will not know until I have tried it.
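
One way to put the question to gcc itself would be a loop of roughly the
following shape. This is only a sketch under stated assumptions: the function
name pressure_kernel and its body are hypothetical, made up for illustration
and not taken from any real program, and whether the index, increment, or
limit actually get spilled depends on the gcc version, the target, and the
options used.

```c
/* Hypothetical kernel (illustration only). The loop index, increment and
   limit stay live together with six accumulators and the array pointer,
   which is more values than a 32-bit x86 has general registers for. */
unsigned long pressure_kernel(const unsigned *a, int limit, int step)
{
    unsigned long s0 = 0, s1 = 0, s2 = 0, s3 = 0, s4 = 0, s5 = 0;
    int i;

    /* step is assumed to be positive; a[] must hold at least limit items. */
    for (i = 0; i + 5 < limit; i += step) {
        s0 += a[i];
        s1 += a[i + 1] ^ s0;
        s2 += a[i + 2] ^ s1;
        s3 += a[i + 3] ^ s2;
        s4 += a[i + 4] ^ s3;
        s5 += a[i + 5] ^ s4;
    }
    /* The loop end (add step to i, compare against limit) is the part
       that corresponds to the simulated comparison above. */
    return s0 ^ s1 ^ s2 ^ s3 ^ s4 ^ s5;
}
```

Compiling this with something like "gcc -O2 -S" and reading the loop end in
the generated assembly would show whether the spill reloads look like the
first listing or whether memory operands are already used as in the second.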
<snip>
> Once again, I would just love to see one (1) example of what is being
> talked about here.
Me too.
> Let's see a small kernel in source, the current GCC code
> being generated, and the amazing improved code that can be generated with
> synthetic registers (which are nothing more than local memory locations).
> At this stage I really can't imagine such an example, so, assuming this is
> a failure of my imagination (I am not the only one with this handicap),
> please enlighten with one convincing example :-)
IIUYC, you want me to hand compile a small kernel source, and compare it to
GCC, after I have repeatedly stated that the smaller the module, the less
value there is in Synthetic registers? Or would you prefer that I hand
compile a large kernel source, wildly guessing all along as to how gcc will
REALLY do the allocations, to really demonstrate any value of the
approach?
Thank you for your suggestion, but no.
Andy