Add an "early rematerialisation" pass

Thu Dec 14 19:26:00 GMT 2017

Jeff Law <law@redhat.com> writes:
> On 12/14/2017 04:09 AM, Richard Biener wrote:
>> On Fri, Nov 17, 2017 at 4:58 PM, Richard Sandiford
>> <richard.sandiford@linaro.org> wrote:
>>> This patch looks for pseudo registers that are live across a call
>>> and for which no call-preserved hard registers exist.  It then
>>> recomputes the pseudos as necessary to ensure that they are no
>>> longer live across a call.  The comment at the head of the file
>>> describes the approach.
>>>
>>> A new target hook selects which modes should be treated in this way.
>>> By default none are, in which case the pass is skipped very early.
>>>
>>> It might also be worth looking for cases like:
>>>
>>>    C1: R1 := f (...)
>>>    ...
>>>    C2: R2 := f (...)
>>>    C3: R1 := C2
>>>
>>> and giving the same value number to C1 and C3, effectively treating
>>> it like:
>>>
>>>    C1: R1 := f (...)
>>>    ...
>>>    C2: R2 := f (...)
>>>    C3: R1 := f (...)
>>>
>>> Another (much more expensive) enhancement would be to apply value
>>> numbering to all pseudo registers (not just rematerialisation
>>> candidates), so that we can handle things like:
>>>
>>>   C1: R1 := f (...R2...)
>>>   ...
>>>   C2: R1 := f (...R3...)
>>>
>>> where R2 and R3 hold the same value.  But the current pass seems
>>> to catch the vast majority of cases.
>>>
>>> Tested on aarch64-linux-gnu (with and without SVE), x86_64-linux-gnu
>>> and powerpc64le-linux-gnu.  OK to install?
>> 
>> Can you tell anything about the complexity of the algorithm?

Have to get back to you on that one. :-)

>> How does it relate to what LRA can do?  AFAIK LRA doesn't try to find
>> any global optimal solution and previous hardreg assignments may work
>> against it?

Yeah, both of those are problems.  But the more important problem is
that it can't increase the live ranges of input registers as easily.
Doing it before RA means that IRA gets to see the new ranges.

>> That said - I would have expected remat to be done before the
>> first scheduling pass?  Even before pass_sms (not sure
>> what pass_live_range_shrinkage does).  Or be integrated
>> with scheduling and it's register pressure cost model.

SMS shouldn't be a problem.  Early remat wouldn't introduce new
instructions into a loop unless the loop also had a call, which would
prevent SMS.  And although it's theoretically possible that it could
remove instructions from a loop, that would only happen if:

  (a) the instruction actually computes the same value every time, so
      could have been moved outside the loop; and

  (b) the result is only used after a following call (and in particular
      isn't used within the loop itself)

(a) is a missed optimisation and (b) seems unlikely.

Integrating remat into scheduling would make it much less powerful,
since scheduling does only limited code motion between blocks.

Doing it before scheduling would be good in principle, but there
would then need to be a fake dependency between the call and remat
instructions to stop the scheduler moving the remat instructions
back before the call.  Adding early remat was a way of avoiding such
fake dependencies in "every" pass, but it might be that scheduling
is one case in which the dependencies make sense.

Either way, being able to run the pass before scheduling seems
like a future enhancement, blocked on a future enhancement to
the scheduler.

>> Also I would have expected the approach to apply to all modes,
>> just the cost of spilling is different.  But if you can, say, reduce
>> register pressure by one by rematerializing a bit-not then that
>> should be always profitable, no?  postreload-cse will come to
>> the rescue anyhow.

But that would then mean taking the register pressure into account when
deciding whether to rematerialise.  On the one hand would make it hard
to do before scheduling (which decides the final pre-RA pressure).
It would also make it a significantly different algorithm, since it
wouldn't be a standard availability problem any more.

For that use case, pressure-dependent remat in the scheduler might
be a better approach, like you were suggesting.  The early remat
pass is specifically for the extreme case of no registers being
call-preserved, where it's more important that we don't miss
remat opportunities, and more important that we treat it as
a global problem.

Thanks,
Richard