This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.

Re: An unusual Performance approach using Synthetic registers

From: Andy Walker <ja_walker at earthlink dot net>
To: dewar at gnat dot com (Robert Dewar),lord at emf dot net
Cc: denisc at overta dot ru,gcc at gcc dot gnu dot org
Date: Sun, 5 Jan 2003 22:39:08 -0600
Subject: Re: An unusual Performance approach using Synthetic registers
References: <20030105130259.D1FEBF2D87@nile.gnat.com>

Again, thank you for your time and consideration in responding.

First: you have made several requests that I somehow demonstrate that 
something like the Synthetic register approach is workable.  Fair enough.  I 
will make a stab at it, if for no other reason than that you are the only 
person to show me the courtesy of answering simple questions.  

How about this for a thought experiment: I call up Bell Labs and have them do 
it for me.  As far as I am concerned, their corporate integrity is at the 
top, and a paper from them is routinely more reliable than most reviewed 
works in scientific journals.  I will ask them to have one of their 
researchers modify an old version of gcc to use a few memory locations as 
artificial registers.  Then run some timing tests on compiled code just to 
see if it makes any difference.  If they report that it makes an improvement, 
I will go ahead with my investigations.  If not, I will not need to waste 
any more posts to this list on a silly, naive, and useless approach.  

OK.  Time is up.  The dear souls at Bell Labs used their crystal ball, 
figured out that I would need this information, and have conveniently posted 
it here: http://cm.bell-labs.com/cm/cs/what/smlnj/compiler-notes/k32.ps .
I am relieved to report that their investigation was a success.  Lal George's 
implementation differs from mine in several respects, but I am satisfied 
that it is near enough to "Synthetic registers" to validate my investigation. 
That is no guarantee of success or value, but it is a solid indication that 
value might exist.

On Sunday 05 January 2003 07:02 am, Robert Dewar wrote:
<snip>
> Remember that
> "retrieving from memory" is *EXACTLY* the same code sequence as reading
> a synthetic register, assuming both are on the current stack frame.

I am not at all convinced of this.  I surmise that Reload knows nothing about 
the meaning of a piece of data in a stack slot.  I conclude that because RTL 
does not keep that information.  Reload's only option, then, is to physically 
move the data back into the register and try the specified instruction.  This 
should pretty well obliterate any previous attempts at pipeline/instruction 
scheduling, and generate a tremendous number of pipeline stalls.  And gcc 
does.  

Simulated comparison of a loop end:

w/o Synth
...
mov eax,[StackSlot27]   ; Load the increment from spill.
mov edx,[StackSlot23]   ; Load the index -- spilled for lack of registers.
lea ecx,[eax + edx]     ; Nicely optimized "add".
mov edx,[StackSlot28]   ; Load the loop limit from spill.
cmp ecx,edx             ; Compare the index to the loop limit.
...

w/ Synth
...
add ecx,[ebp - 20]      ; Add synthreg 27, the increment, to the index.
cmp ecx,[ebp - 16]      ; Compare the index to the loop limit in synthreg 28.
...

This is my concept.  Is it reality?  I will not know until I have tried it.  
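
For anyone who wants to try the comparison with a real compiler rather than 
by hand, here is a minimal C sketch of a loop of the kind simulated above.  
The function name (strided_sum), its shape, and its constants are hypothetical 
illustrations; none of them come from this thread or from GCC.  The loop keeps 
eight values live (base pointer, index, increment, limit, four accumulators) 
against the six or seven general-purpose registers typically available on 
32-bit x86.  Whether a given GCC version actually spills any of them can only 
be settled by reading the generated assembly.

```c
#include <stddef.h>

/* Hypothetical test kernel (an assumption for illustration only).
 * Sums four interleaved, weighted streams of a[], advancing the index
 * by `step` each iteration and stopping at `limit`. */
long strided_sum(const long *a, size_t limit, size_t step)
{
    long s0 = 0, s1 = 0, s2 = 0, s3 = 0;   /* four live accumulators */

    if (step == 0)                          /* avoid an endless loop */
        return 0;

    /* Loop end: add the increment to the index, compare to the limit. */
    for (size_t i = 0; i + 3 < limit; i += step) {
        s0 += a[i];
        s1 += a[i + 1] * 3;
        s2 += a[i + 2] * 5;
        s3 += a[i + 3] * 7;
    }
    return s0 + s1 + s2 + s3;
}
```

Compiling this with something like "gcc -O2 -m32 -masm=intel -S" and reading 
the loop tail would show whether the index, increment, and limit are reloaded 
from stack slots or used directly as memory operands.  The result will vary 
with GCC version, target, and flags.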

<snip>
> Once again, I would just love to see one (1) example of what is being
> talked about here. 

Me too.

> Let's see a small kernel in source, the current GCC code
> being generated, and the amazing improved code that can be generated with
> synthetic registers (which are nothing more than local memory locations).
> At this stage I really can't imagine such an example, so, assuming this is
> a failure of my imagination (I am not the only one with this handicap),
> please enlighten with one convincing example :-)

If I understand you correctly, you want me to hand-compile a small kernel 
source and compare it to GCC, after I have repeatedly stated that the smaller 
the module, the less value there is in Synthetic registers?  Or would you 
prefer that I hand-compile a large kernel source, wildly guessing all along 
as to how gcc will REALLY do the allocations, to really demonstrate any value 
of the approach?

Thank you for your suggestion, but no.   

Andy

Follow-Ups:
- Re: An unusual Performance approach using Synthetic registers
  - From: Michael S. Zick

References:
- Re: An unusual Performance approach using Synthetic registers
  - From: Robert Dewar
