Add an "early rematerialisation" pass

Richard Sandiford richard.sandiford@linaro.org
Thu Dec 14 23:22:00 GMT 2017


Richard Biener <richard.guenther@gmail.com> writes:
> On December 14, 2017 8:26:49 PM GMT+01:00, Richard Sandiford
> <richard.sandiford@linaro.org> wrote:
>>Jeff Law <law@redhat.com> writes:
>>> On 12/14/2017 04:09 AM, Richard Biener wrote:
>>>> On Fri, Nov 17, 2017 at 4:58 PM, Richard Sandiford
>>>> <richard.sandiford@linaro.org> wrote:
>>>>> This patch looks for pseudo registers that are live across a call
>>>>> and for which no call-preserved hard registers exist.  It then
>>>>> recomputes the pseudos as necessary to ensure that they are no
>>>>> longer live across a call.  The comment at the head of the file
>>>>> describes the approach.
>>>>>
>>>>> A new target hook selects which modes should be treated in this way.
>>>>> By default none are, in which case the pass is skipped very early.
>>>>>
>>>>> It might also be worth looking for cases like:
>>>>>
>>>>>    C1: R1 := f (...)
>>>>>    ...
>>>>>    C2: R2 := f (...)
>>>>>    C3: R1 := C2
>>>>>
>>>>> and giving the same value number to C1 and C3, effectively treating
>>>>> it like:
>>>>>
>>>>>    C1: R1 := f (...)
>>>>>    ...
>>>>>    C2: R2 := f (...)
>>>>>    C3: R1 := f (...)
>>>>>
>>>>> Another (much more expensive) enhancement would be to apply value
>>>>> numbering to all pseudo registers (not just rematerialisation
>>>>> candidates), so that we can handle things like:
>>>>>
>>>>>   C1: R1 := f (...R2...)
>>>>>   ...
>>>>>   C2: R1 := f (...R3...)
>>>>>
>>>>> where R2 and R3 hold the same value.  But the current pass seems
>>>>> to catch the vast majority of cases.
>>>>>
>>>>> Tested on aarch64-linux-gnu (with and without SVE), x86_64-linux-gnu
>>>>> and powerpc64le-linux-gnu.  OK to install?
>>>> 
>>>> Can you tell anything about the complexity of the algorithm?
>>
>>Have to get back to you on that one. :-)
>>
>>>> How does it relate to what LRA can do?  AFAIK LRA doesn't try to find
>>>> any global optimal solution and previous hardreg assignments may work
>>>> against it?
>>
>>Yeah, both of those are problems.  But the more important problem is
>>that it can't increase the live ranges of input registers as easily.
>>Doing it before RA means that IRA gets to see the new ranges.
>>
>>>> That said - I would have expected remat to be done before the
>>>> first scheduling pass?  Even before pass_sms (not sure
>>>> what pass_live_range_shrinkage does).  Or be integrated
>>>> with scheduling and its register pressure cost model.
>>
>>SMS shouldn't be a problem.  Early remat wouldn't introduce new
>>instructions into a loop unless the loop also had a call, which would
>>prevent SMS.  And although it's theoretically possible that it could
>>remove instructions from a loop, that would only happen if:
>>
>>  (a) the instruction actually computes the same value every time, so
>>      could have been moved outside the loop; and
>>
>>  (b) the result is only used after a following call (and in particular
>>      isn't used within the loop itself)
>>
>>(a) is a missed optimisation and (b) seems unlikely.
>>
>>Integrating remat into scheduling would make it much less powerful,
>>since scheduling does only limited code motion between blocks.
>>
>>Doing it before scheduling would be good in principle, but there
>>would then need to be a fake dependency between the call and remat
>>instructions to stop the scheduler moving the remat instructions
>>back before the call.  Adding early remat was a way of avoiding such
>>fake dependencies in "every" pass, but it might be that scheduling
>>is one case in which the dependencies make sense.
>>
>>Either way, being able to run the pass before scheduling seems
>>like a future enhancement, blocked on a future enhancement to
>>the scheduler.
>>
>>>> Also I would have expected the approach to apply to all modes,
>>>> just the cost of spilling is different.  But if you can, say, reduce
>>>> register pressure by one by rematerializing a bit-not then that
>>>> should be always profitable, no?  postreload-cse will come to
>>>> the rescue anyhow.
>>
>>But that would then mean taking the register pressure into account when
>>deciding whether to rematerialise.  On the one hand, that would make it
>>hard to do before scheduling (which decides the final pre-RA pressure).
>>It would also make it a significantly different algorithm, since it
>>wouldn't be a standard availability problem any more.
>>
>>For that use case, pressure-dependent remat in the scheduler might
>>be a better approach, like you were suggesting.  The early remat
>>pass is specifically for the extreme case of no registers being
>>call-preserved, where it's more important that we don't miss
>>remat opportunities, and more important that we treat it as
>>a global problem.
>
> On x86_64 all xmm registers are caller saved for example. That means all
> FP regs and all vectors. (yeah, stupid ABI decision....)

OK.  The patch uses a target hook to select the modes -- basing it off
whether they're variable-length is just the default.  So if this turns
out to be a win for x86_64 or for SPARC (was originally going to reply
to that in Jeff's message, sorry), then the target could opt in if it
wants to.
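
As a rough illustration of the opt-in, here is a toy Python model.  It is
not GCC code: Mode, default_select_modes, x86_64_select_modes and
live_across_call are all names made up for this sketch, the x86_64 policy
is only an assumption about what such a target might choose, and the
liveness check handles straight-line code only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mode:
    name: str
    variable_length: bool   # e.g. an SVE vector mode
    float_or_vector: bool   # assumption: values of this mode live in xmm regs


def default_select_modes(modes):
    """Default policy from the text: select only variable-length modes."""
    return {m.name for m in modes if m.variable_length}


def x86_64_select_modes(modes):
    """Hypothetical opt-in: select every FP and vector mode."""
    return {m.name for m in modes if m.float_or_vector}


def live_across_call(insns, reg_modes, selected):
    """Return registers in a selected mode whose value is used after a call.

    insns is straight-line code made of ("def", reg), ("use", reg)
    and ("call",) tuples.  reg_modes maps each register to a mode name.
    """
    defined = set()     # registers that currently hold a value
    crossed = set()     # registers whose current value was set before a call
    candidates = set()
    for insn in insns:
        kind = insn[0]
        if kind == "def":
            defined.add(insn[1])
            crossed.discard(insn[1])    # new value, not yet across a call
        elif kind == "call":
            crossed |= defined
        elif kind == "use":
            reg = insn[1]
            if reg in crossed and reg_modes[reg] in selected:
                candidates.add(reg)
    return candidates


MODES = [
    Mode("SI", variable_length=False, float_or_vector=False),
    Mode("DF", variable_length=False, float_or_vector=True),
    Mode("V4SF", variable_length=False, float_or_vector=True),
    Mode("VNx4SI", variable_length=True, float_or_vector=True),
]

# R1 is a DF value that is set before a call and used after it.
INSNS = [("def", "R1"), ("call",), ("use", "R1")]
REG_MODES = {"R1": "DF"}
```

With the default selection, R1 is not a candidate because DF is not a
variable-length mode.  With the hypothetical x86_64 selection it is, and
the pass would then consider recomputing it after the call.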

Thanks,
Richard


