This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



Re: An unusual Performance approach using Synthetic registers


On Sunday 29 December 2002 08:18 pm, Daniel Egger wrote:
> Am Mon, 2002-12-30 um 00.45 schrieb Michael S.Zick:
> > Daniel, have you considered that the "16 used registers" observation
> > could be an artifact?
>
> Yes.
No offense intended, I really did presume that you had.

>
> > Presume an arbitrarily sophisticated optimization algorithm working
> > on a symbolic machine with an infinite number of registers...
> >
> > Wouldn't the "16 used registers" observation fail as the size of the
> > source file grew without bound?
>
> Depends on the code. Numerical applications tend to need far more
> temporary values than, say, a notepad.
>
> > Similarly with the size and complexity of a single expression statement.
> > Those things are usually limited in size and complexity to what the
> > human mind can comprehend.
>
> The problem here is that you can surely recursively inline the full
> application into the main function, compile it as one chunk and then
> be happy about the maximum use of registers; however, at the same time
> you have absolutely blown code reuse and performance because of cache abuse.
>
Indeed.
Every one of those things.

In fact, I would like to quote your entire, clear description of the result
of recursively inlining the entire application (including libc, libgcc,
libstdc++, etc.) into the main function, BUT:

NOT as an argument against doing it where the optimizations on the symbolic
machine with an infinite set of registers can get hold of it; INSTEAD,

as an argument that an entire pass (just prior to hard register /
synthetic register / stack slot assignment, with spill & reload of whatever
is left over) is missing from the compiler design.

It is at this point that the compiler should address (no pun intended) 
things like cache line utilization and locality, memory footprint (and
implied code reuse), etc.

For the purpose of this thread, let's skip over the point that there are
much better ways of achieving the goal of exposing the entire
application to the optimization pass(es) than simply, recursively
inlining the entire application into the main function.
Just pretend we did something realistic with a similar result.

I'll refer to that as "effectively exposing the entire application" to
the optimization pass(es).

Presume further that the optimization pass(es) on the symbolic machine
with an infinite number of registers has done its thing.

At that point we have arrived at the front door of the port-specific
"back end", where (currently) the task is to translate the symbolic machine,
with its fully usage-optimized infinite register set, into whatever our
silicon really has, subject to any artificial restrictions such as ABI definitions.

At this point we insert the compiler pass that I feel is missing.

For ease of discussion and visualization, presume that this symbolic machine
arrives in the form of a tree.

This pass traverses the tree, looking for leaves, twigs, small branches, and
major limbs that have the same patterns (or similar ones, after transformation).
Leaves, twigs, etc. occurring within a loop count once for each iteration of
that loop.
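As a rough illustration of that counting step, here is a minimal Python sketch. It assumes a hypothetical toy IR in which a node is either a leaf string (a virtual register or constant) or a tuple `(op, *children)`, and a loop is written `("loop", trip_count, body)`. "Similar after transformation" is reduced to one assumed normalization, sorting the operands of commutative operators; a real pass would need a far richer notion of equivalence.

```python
from collections import Counter

# Hypothetical toy IR (an assumption, not GCC's): a node is a leaf str or a
# tuple (op, *children); a loop is ("loop", trip_count, body).
COMMUTATIVE = {"+", "*"}


def canonical(node):
    """Normalise a subtree so that, e.g., a+b and b+a compare equal."""
    if isinstance(node, str):
        return node
    op, *kids = node
    if op == "loop":
        trip, body = kids
        return ("loop", trip, canonical(body))
    kids = [canonical(k) for k in kids]
    if op in COMMUTATIVE:
        kids.sort(key=repr)  # one assumed "similar after transformation" rule
    return (op, *kids)


def size(node):
    """Number of operation nodes in a subtree; leaves count as 0."""
    if isinstance(node, str):
        return 0
    op, *kids = node
    if op == "loop":
        return size(kids[1])
    return 1 + sum(size(k) for k in kids)


def count_patterns(root, min_size=2):
    """Count each subtree of at least min_size ops, weighted by loop trips."""
    counts = Counter()

    def walk(node, weight):
        if isinstance(node, str):
            return
        op, *kids = node
        if op == "loop":
            trip, body = kids
            walk(body, weight * trip)  # each iteration is one more occurrence
            return
        if size(node) >= min_size:
            counts[node] += weight
        for k in kids:
            walk(k, weight)

    walk(canonical(root), 1)
    return counts


if __name__ == "__main__":
    tree = ("seq",
            ("*", ("+", "a", "b"), "c"),
            ("loop", 10, ("*", "c", ("+", "b", "a"))))
    for pattern, weight in count_patterns(tree).most_common():
        print(weight, pattern)
```

Here `(a+b)*c` outside the loop and `c*(b+a)` inside a 10-iteration loop normalize to the same pattern, so it is counted with weight 11 (1 + 10).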

At some point the pass "decides" that a certain pattern in the tree occurs
often enough that it should be UN-INLINED and turned into a function call.

The cost of all the prologue, epilogue, register bashing about to meet an ABI
specification, and other havoc that has to be done to the beautiful symbolic
machine code is considered in that "decision" process.

All of that takes care of code reuse and cache line assignment.
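One way to picture that "decision" is a single score that weighs code-size savings against the per-call overhead. The Python sketch below is only a toy cost model: the `CallCost` fields, their default values, and the `time_weight` knob are hypothetical numbers chosen for illustration, not measurements of any real target or ABI.

```python
from dataclasses import dataclass


@dataclass
class CallCost:
    """Hypothetical per-target costs, in instructions (all assumed values)."""
    call_site: int = 2    # argument moves + call instruction at each use
    frame: int = 4        # prologue + epilogue of the new outlined function
    abi_shuffle: int = 2  # extra moves to satisfy the ABI's register rules


def outline_gain(body_size, static_uses, dynamic_uses, cost, time_weight=0.1):
    """Estimated benefit of un-inlining a pattern; positive means outline it.

    Size gain: static_uses inline copies shrink to one copy of the body plus
    a call sequence at each use. Time penalty: every dynamic execution now
    pays the call, ABI and frame overhead, scaled by time_weight (an assumed
    tuning knob trading code size against speed).
    """
    inlined_size = static_uses * body_size
    per_call = cost.call_site + cost.abi_shuffle
    outlined_size = body_size + cost.frame + static_uses * per_call
    size_gain = inlined_size - outlined_size
    time_penalty = dynamic_uses * (per_call + cost.frame) * time_weight
    return size_gain - time_penalty


if __name__ == "__main__":
    c = CallCost()
    print(outline_gain(20, 5, 5, c) > 0)     # large, cold, repeated: outline
    print(outline_gain(20, 2, 1000, c) > 0)  # hot loop body: keep inline
```

With these assumed costs, a 20-instruction pattern used five times in cold code scores positive, while the same pattern executed a thousand times inside a loop scores strongly negative and stays inline.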

With this design, any "inline function (.....)" and any common functions that the
programmer factored out of the body of his code are only a HINT to this
new pass of what the common instruction groupings should be.*

They still, in source form, meet their PRIMARY intended goal of making
the source comprehensible to humans.

Then you assign hard registers, synthetic registers, and stack slots with
spill & reload of whatever is left over.
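A minimal Python sketch of that final step, under loudly stated assumptions: each virtual register is reduced to one half-open live interval plus a usage weight (for instance, loop-weighted use counts); a "synthetic register" is modelled simply as a second, bounded tier of memory slots between the hard registers and the unbounded stack; and placement is greedy, highest weight first, into the first tier with a free slot. It only chooses locations; emitting the actual spill and reload instructions is left out.

```python
def overlaps(a, b):
    """True if half-open live intervals [start, end) a and b overlap."""
    return a[0] < b[1] and b[0] < a[1]


def place(iv, pool):
    """Put iv in the first slot of pool with no overlapping interval.

    Returns the slot index, or None if every slot is busy (or pool is empty).
    """
    for i, used in enumerate(pool):
        if not any(overlaps(iv, u) for u in used):
            used.append(iv)
            return i
    return None


def assign_locations(vregs, n_hard, n_synth):
    """vregs: dict name -> (start, end, weight). Returns name -> location.

    Tiers are tried in order: hard registers, then (assumed) synthetic
    registers, then stack slots, which are opened on demand.
    """
    hard = [[] for _ in range(n_hard)]
    synth = [[] for _ in range(n_synth)]
    stack = []
    result = {}
    for name in sorted(vregs, key=lambda v: -vregs[v][2]):
        start, end, _ = vregs[name]
        iv = (start, end)
        i = place(iv, hard)
        if i is not None:
            result[name] = f"hard{i}"
            continue
        i = place(iv, synth)
        if i is not None:
            result[name] = f"synth{i}"
            continue
        i = place(iv, stack)
        if i is None:  # stack slots are unbounded: open a new one
            stack.append([iv])
            i = len(stack) - 1
        result[name] = f"stack{i}"
    return result


if __name__ == "__main__":
    vregs = {"a": (0, 10, 100), "b": (0, 10, 50),
             "c": (0, 10, 10), "d": (10, 20, 5)}
    print(assign_locations(vregs, n_hard=1, n_synth=1))
```

With one hard and one synthetic register, the hottest value `a` gets the hard register, `b` the synthetic one, `c` is left over for a stack slot, and `d` (live only after `a` dies) reuses the hard register.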

*Other than the use of the new "never-inline" attribute.

Mike

