This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: Add an "early rematerialisation" pass


Jeff Law <law@redhat.com> writes:
> On 12/14/2017 12:26 PM, Richard Sandiford wrote:
>>>> How does it relate to what LRA can do?  AFAIK LRA doesn't try to find
>>>> any global optimal solution and previous hardreg assignments may work
>>>> against it?
>> 
>> Yeah, both of those are problems.  But the more important problem is
>> that it can't increase the live ranges of input registers as easily.
>> Doing it before RA means that IRA gets to see the new ranges.
> LRA does not work on a global basis.  It's somewhere between basic block
> and extended basic block in its scope.
>
> Remat (along with caller-saves) is really just a case of range splitting
> in my mind.  So you really want the pass to either directly integrate
> with IRA or run prior to IRA.
>
> You can do splitting in response to failure to get a hard register and
> try to hook back into IRA to color those new objects.  I had reasonably
> good success with that approach when I was looking at the allocators
> prior to LRA.
>
> Basically I let IRA do its thing.  When it was done I walked through the
> IL splitting ranges to make hard registers available at key points.
> Then I'd call back into IRA (using existing mechanisms) to try
> allocation again for the allocnos that had not been colored and any new
> ones.  The key was there was some very simple and easy range splitting
> you could do on already allocated allocnos that in turn would free up
> hard registers.
>
> That kind of model doesn't seem to fit here terribly well.  It's not a
> lack of hard regs that's the problem, but simply not having any hard
> regs available across calls.  So splitting the range of some allocno
> that did get a hard register isn't going to help color any of the
> allocnos that did not get a register.
>
>
> If we go back further (circa 1998) we did a pre-allocation range
> splitting pass.  We had it working marginally OK, but never really as
> well as we wanted.
>
> In that model we looked at pseudos that were likely going to be hard to
> allocate and split them into multiple new pseudos.  We tracked the
> relationship between the new and original pseudo so that reload could
> shove them back together in cases where that made the most sense.  We
> had copyin/copyout insns to move back and forth between the range copies
> and the original pseudo as needed.
>
> I don't remember the heuristics that drove when/where to split.
> Meissner might since my recollection is that he did the major lifting
> there.
>
> But again, I don't think that model works here either.  It did nothing
> WRT remat.
>
> I know we pondered remat in the context of revamping caller-saves in the
> early 90s to help Sparc FP.  But my recollection was that once we had
> caller-saves handling the basics well, the performance gains were enough
> that digging into remat was never really explored.
>
> Anyway, that's a bit of history.  IMHO remat has to run prior to
> allocation or integrated with allocation.   In general I'd expect
> running before and independent of IRA to be easier to implement, but
> slightly less performant than tightly integrated with IRA.
>
> In addition to potentially avoiding spilling, we have an added benefit
> for SVE that we avoid variable sized stack frames if we can eliminate
> *all* instances of SVE registers live across calls.
>
> I'm guessing that they're relatively rare to begin with based on
> comments within the actual code.

Yeah, spills due to excess register pressure in vectorised code are
relatively rare, and like you say, it's good to avoid variable-sized
frames if we can.  A lot of the remaining cases are due to duplicated
invariants being spilled, where it would be better to spill the duplicated
scalar instead.  That's future work.

The problem of SVE values becoming artificially live across calls is
actually relatively common though.  Adding early remat was important
in quite a few benchmarks.
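For readers following the thread, here is a toy sketch of the general idea. It is not the patch itself. Before register allocation, a value that is cheap to recompute and is live across a call gets re-emitted after the call under a fresh pseudo, so the allocator never sees the original live across the call. The sketch assumes straight-line code in which each pseudo is defined once. It also assumes a hypothetical cost model in which only constants with no register inputs count as rematerialisable. `Insn`, `early_remat` and `live_across_calls` are illustrative names, not GCC internals.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Insn:
    dest: str | None                      # pseudo defined by this insn, if any
    op: str                               # e.g. "const", "add", "call", "use"
    srcs: list[str] = field(default_factory=list)
    imm: int | None = None                # immediate operand for "const"


# Hypothetical cost model: only constants with no register inputs are cheap
# enough to recompute, so rematerialising them never extends another range.
def is_rematerialisable(insn: Insn) -> bool:
    return insn.op == "const" and not insn.srcs


def last_uses(insns: list[Insn]) -> dict[str, int]:
    last: dict[str, int] = {}
    for i, insn in enumerate(insns):
        for s in insn.srcs:
            last[s] = i
    return last


def live_across_calls(insns: list[Insn]) -> set[str]:
    """Pseudos defined before some call and still used after it."""
    last = last_uses(insns)
    defined: set[str] = set()
    live: set[str] = set()
    for i, insn in enumerate(insns):
        if insn.op == "call":
            live |= {p for p in defined if last.get(p, -1) > i}
        if insn.dest is not None:
            defined.add(insn.dest)
    return live


def early_remat(insns: list[Insn]) -> list[Insn]:
    """Toy pre-RA remat over straight-line, single-definition code."""
    last = last_uses(insns)               # computed on the original names
    defs: dict[str, Insn] = {}            # original pseudo -> defining insn
    current: dict[str, str] = {}          # original pseudo -> name to use now
    counter = 0
    out: list[Insn] = []
    for i, insn in enumerate(insns):
        out.append(Insn(insn.dest, insn.op,
                        [current.get(s, s) for s in insn.srcs], insn.imm))
        if insn.op == "call":
            # Every pseudo defined before the call and still used after it
            # is live across the call; recompute the cheap ones afterwards.
            for p, d in defs.items():
                if last.get(p, -1) > i and is_rematerialisable(d):
                    counter += 1
                    new = f"{p}.r{counter}"
                    out.append(Insn(new, d.op, list(d.srcs), d.imm))
                    current[p] = new      # later uses read the fresh pseudo
        if insn.dest is not None:
            defs[insn.dest] = insn
            current[insn.dest] = insn.dest
    return out


prog = [
    Insn("v", "const", imm=42),           # e.g. a duplicated invariant
    Insn("x", "add", ["v", "v"]),
    Insn(None, "call"),
    Insn(None, "use", ["v", "x"]),
]
out = early_remat(prog)
print(sorted(live_across_calls(prog)))    # ['v', 'x']
print(sorted(live_across_calls(out)))     # ['x']
```

Because the toy only rematerialises definitions without register inputs, it sidesteps the harder case mentioned above, where remat extends the live ranges of input registers. That is also why `x` stays live across the call: recomputing it would need `v`.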

Thanks,
Richard

