This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.
Re: An unusual Performance approach using Synthetic registers
Thank you. You have clearly stated some things that I have been implying.
On Sunday 05 January 2003 06:24 am, Tom Lord wrote:
> dewar:
>
> > This is a bit of an odd statement. In practice on a machine
> > like the x86, the current stack frame will typically be
> > resident in L1 cache, and that's where the register allocator
> > spills to. What some of us still don't see is the difference
> > in final resulting code between your "synthetic registers" and
> > normal spill locations from the register allocator.
>
>
> Register spills clearly don't equal synthetic registers.
>
> Presumably, the number of locations dedicated to register spills never
> exceeds (approximately) the maximum number of simultaneously live
> _intermediate_ values minus the number of general purpose registers.
> Any non-intermediate value (i.e., one that has a main memory
> location), rather than being spilled, will be written to its location.
> If that value is later re-used, it will be retrieved from memory.
>
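
The bound described above amounts to a one-line calculation. A minimal
sketch in C; the function name and the idea of passing the counts in as
plain numbers are made up for illustration, and the bound is only
approximate, as the text says:

```c
#include <stddef.h>

/* Approximate upper bound on spill slots: live intermediates that do
 * not fit in the general purpose registers.  Hypothetical helper, not
 * anything GCC provides. */
static size_t spill_slot_bound(size_t max_live_intermediates, size_t gprs)
{
    return max_live_intermediates > gprs
               ? max_live_intermediates - gprs
               : 0;
}
```

With, say, 12 simultaneously live intermediates and 6 allocatable
registers, this gives 6 spill slots. A synthetic register file is not
limited by that number.
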
> The number of synthetic registers can be much larger than the number
> of simultaneously live intermediate values.
>
> So, with synthetic registers, some values that are not intermediates
> can be retained (in synthetic registers). Without synthetic
> registers, the next time those values are used, they have to be
> fetched from (non-special) memory.
>
> In other words, with synthregs, the CPU can ship some value off to
> memory and not care how long it takes to get there or to get back from
> there -- because it also ships it off to the synthreg, which it
> hypothetically has faster access to.
>
> In practice, that means that synthregs will store some values in
> memory twice: once in the location the program text says they go in;
> again in the synthetic register. If the synthetic register is indeed
> cache-favored, maybe there's a performance win there -- and if so, a
> register allocator is the right algorithm to decide which values to
> keep duplicated in synthetic registers (so the proposed implementation
> strategy is sensible).
>
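
The "stored twice" behaviour can be sketched in C as a write-through
store. Everything here is an assumption made for illustration: the
array `synthreg`, its size, and the helper names `store_through` and
`reload` are hypothetical, and a compiler would emit the equivalent
loads and stores directly rather than call functions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical synthetic register file: a small fixed block of memory
 * that the compiler treats as extra registers. */
enum { NUM_SYNTHREGS = 32 };
static long synthreg[NUM_SYNTHREGS];

/* Write the value to the location the program text names, and keep a
 * duplicate in a synthetic register. */
static void store_through(long *home, size_t sr, long value)
{
    assert(sr < NUM_SYNTHREGS);
    *home = value;         /* the "real" memory location */
    synthreg[sr] = value;  /* the duplicate, hoped to stay in cache */
}

/* Later uses read the duplicate instead of going back to *home. */
static long reload(size_t sr)
{
    assert(sr < NUM_SYNTHREGS);
    return synthreg[sr];
}
```

Note that the duplicate goes stale if the home location is written
through some other pointer, so the allocator could only keep values
there when it can prove that does not happen.
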
> (Another weird interaction is intermediate values that can be
> recalculated instead of being kept -- I don't know if GCC ever makes
> that trade-off -- if it does, it needs to be tuned for synthregs.)
>
> So, does that hypothesis (that synthreg access is faster than general
> memory access) hold? Quite possibly. For example, a re-used synthreg
> inherits cache-presence (at all levels, not just L1) from the previous
> uses. Synthregs may win for some apps for more than just L1 reasons.
>
> This brings in new alignment issues, too. If you can, you might want
> to make sure that your allocator locates its metadata where it will
> cache-collide with the synthregs, to help push allocated memory out of
> those locations (presuming here that allocator metadata is relatively
> infrequently accessed). It's probably not all that hard to do this
> "by accident". Just in general: do things to protect the
> cache-presence of the synthregs.
>
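
Whether two addresses "cache-collide" depends on which cache set each
one maps to. A rough sketch in C, assuming a set-associative cache with
64-byte lines and 128 sets; both numbers are placeholders, not the
geometry of any particular CPU, and the helper names are hypothetical:

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed cache geometry, for illustration only. */
enum { LINE_BYTES = 64, NUM_SETS = 128 };

/* Index of the cache set an address maps to. */
static size_t cache_set(uintptr_t addr)
{
    return (size_t)((addr / LINE_BYTES) % NUM_SETS);
}

/* Two addresses compete for the same set when their indices match. */
static int same_set(uintptr_t a, uintptr_t b)
{
    return cache_set(a) == cache_set(b);
}
```

Under these assumptions, addresses that differ by a multiple of
64 * 128 = 8192 bytes land in the same set, so allocator metadata placed
at such an offset from the synthregs would occupy the ways that
ordinary heap data would otherwise compete for. On a physically indexed
cache this reasoning applies to physical addresses, so it carries over
to virtual addresses only for the index bits inside the page offset.
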
> It might eventually lead to some hardware advances: give synthregs with
> absolute locations cache preference. Or, if synthregs are on the
> stack, give locations near the frame pointer cache preference (or is
> that done already?).
>
> I'd therefore guess it will be a very system-specific optimization --
> but that it will win often enough to be useful. And given what I
> understand about trends in architecture, the cases in which it will
> win will sharply increase over time.
>
> No?
>
> -t
>
> p.s.: some arch-related thinking about non-disruptive ways to improve
> gcc's revision control practices:
>
> http://lists.fifthvision.net/pipermail/arch-users/2003-January/001856.html
>
> and some of the follow-ups. It's a pretty "noisy" list,
> though.