This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



Re: An unusual Performance approach using Synthetic registers


On Sunday 05 January 2003 07:33 am, Tom Lord wrote:
>        Remember that "retrieving from memory" is *EXACTLY*
>        the same code sequence as reading a synthetic register,
>        assuming both are on the current stack frame.

Better not be, or I have wasted a lot of time.

> Two replies:
>
> 1) I don't fully understand why synthregs aren't a common area rather
>    than part of stack frames.  A common area _adds_ code to
>    save/restore synthregs -- but it also increases the number and
>    frequency of references to synthregs.  I don't think L1 is the only
>    cache that can be used better by synthregs.

Threading processes that use synthregs from a common area gets complicated,
and I am avoiding complication.

I am implementing a relatively simple and straightforward approach, with a 
primary goal of getting what Robert keeps requesting: real, measurable, 
output code.  If it is a complete failure, I will quit.  Experience tells me 
it will be a partial success.  I will learn from it what none of us knew, and 
build on it.

Another common area objection: I want to link synthreg-compiled routines with 
non-synthreg-compiled routines, so that synthreg routines can call 
non-synthreg routines which in turn call synthreg routines.  Now I have a 
problem of "who's on first?"  Every synthreg routine using a common area 
would have to test whether the common area exists (has been allocated) yet.  
Worse, that test could easily be a memory access to (horrors) a location that 
is not in any cache.

Finally, I do not see any benefit to common area Synthetic registers.  If I 
get the alignment right, the Synthetic registers will fully occupy two, or 
maybe one, cache blocks.  The very first access will probably cost a 
processor stall/memory fetch.  However, exactly the same cost will be paid 
by the multiple register store instructions used to save common area 
Synthetics.  Where would we save the contents of the common area Synthetic 
registers?  In exactly the same memory location I am using for the 
frame-based Synthetic registers.  Once the first access has occurred, the 
Synthetic registers will be in L1 cache, and will stay there because they 
are constantly being hit during program execution.

My estimate is that the _only_ difference will be the extra execution time 
for the register save/restore processes.

>
> 2) Same code sequence (or "worse"), yes.  Same cache interaction, no.
>
>    > It might eventually lead to some hw advances: give synthregs with
>    > absolute locations cache preference.  Or, if synthregs are on the
>    > stack, give locations near the frame pointer cache preference (or is
>    > that done already?).
>
>    I don't see that as a good idea at all. The stack frame indeed will
>    almost always be in cache with current designs, and locking cache
>    seems a bad idea.
>
> If you're right -- then storing additional non-intermediate values
> on the stack (as stack-based synthregs) may very well be a win.
>
> If I'm right, then the net effect of the proposed HW changes is to
> bump the number of registers, but to have some registers accessed by
> shorter code sequences than others.
>
> Either way, synthregs (plausibly at least) wins.
>
<snip>
> But, sketchingly, let's think of a function that manipulates a dozen
> C++ objects, each with a vtable.  It also manipulates some fields in
> each object.  The vtable pointers are going to be used lots of times --
> each field, just once.  (Maybe the fields are array elements and
> we're talking about a loop here.)
>
> I can't fit all those vtable pointers in regs, but I can fit them in
> synthregs.  Do you agree that the reg allocator, applied to synthregs,
> will keep those vtable pointers in synthregs?
>
> So now the generated code (looking just at the instruction count) with
> synthregs will be slightly _worse_ than the code without synthregs --
> but if the synthregs really do wind up with noticeably better cache
> performance, it'll run faster.

My goal is that the generated code with Synthetic registers will be 
significantly better (looking just at the instruction count).  In your 
example, if the fields have individual values, then Reload imposes an 
overhead.  It is a two-step process: move the spilled data back into a 
register, then use it.  With a Synthetic, you just use it.

Andy

