This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: Add an "early rematerialisation" pass
- From: Richard Biener <richard dot guenther at gmail dot com>
- To: Richard Sandiford <richard dot sandiford at linaro dot org>,Jeff Law <law at redhat dot com>
- Cc: GCC Patches <gcc-patches at gcc dot gnu dot org>
- Date: Thu, 14 Dec 2017 20:32:07 +0100
- Subject: Re: Add an "early rematerialisation" pass
- Authentication-results: sourceware.org; auth=none
- References: <87efowyjzw.fsf@linaro.org> <CAFiYyc0KPO96WP5wioirPS_3wh1=6tUdK5q=_uhdRHHLr3DBtA@mail.gmail.com> <0030fd16-580a-f82b-c589-fa3d3370217c@redhat.com> <87wp1pp0ue.fsf@linaro.org>
On December 14, 2017 8:26:49 PM GMT+01:00, Richard Sandiford <richard.sandiford@linaro.org> wrote:
>Jeff Law <law@redhat.com> writes:
>> On 12/14/2017 04:09 AM, Richard Biener wrote:
>>> On Fri, Nov 17, 2017 at 4:58 PM, Richard Sandiford
>>> <richard.sandiford@linaro.org> wrote:
>>>> This patch looks for pseudo registers that are live across a call
>>>> and for which no call-preserved hard registers exist. It then
>>>> recomputes the pseudos as necessary to ensure that they are no
>>>> longer live across a call. The comment at the head of the file
>>>> describes the approach.
>>>>
>>>> A new target hook selects which modes should be treated in this
>way.
>>>> By default none are, in which case the pass is skipped very early.
>>>>
>>>> It might also be worth looking for cases like:
>>>>
>>>> C1: R1 := f (...)
>>>> ...
>>>> C2: R2 := f (...)
>>>> C3: R1 := C2
>>>>
>>>> and giving the same value number to C1 and C3, effectively treating
>>>> it like:
>>>>
>>>> C1: R1 := f (...)
>>>> ...
>>>> C2: R2 := f (...)
>>>> C3: R1 := f (...)
>>>>
>>>> Another (much more expensive) enhancement would be to apply value
>>>> numbering to all pseudo registers (not just rematerialisation
>>>> candidates), so that we can handle things like:
>>>>
>>>> C1: R1 := f (...R2...)
>>>> ...
>>>> C2: R1 := f (...R3...)
>>>>
>>>> where R2 and R3 hold the same value. But the current pass seems
>>>> to catch the vast majority of cases.
>>>>
>>>> Tested on aarch64-linux-gnu (with and without SVE),
>x86_64-linux-gnu
>>>> and powerpc64le-linux-gnu. OK to install?
>>>
>>> Can you tell anything about the complexity of the algorithm?
>
>Have to get back to you on that one. :-)
>
>>> How does it relate to what LRA can do? AFAIK LRA doesn't try to
>find
>>> any global optimal solution and previous hardreg assignments may
>work
>>> against it?
>
>Yeah, both of those are problems. But the more important problem is
>that it can't increase the live ranges of input registers as easily.
>Doing it before RA means that IRA gets to see the new ranges.
>
>>> That said - I would have expected remat to be done before the
>>> first scheduling pass? Even before pass_sms (not sure
>>> what pass_live_range_shrinkage does). Or be integrated
>>> with scheduling and it's register pressure cost model.
>
>SMS shouldn't be a problem. Early remat wouldn't introduce new
>instructions into a loop unless the loop also had a call, which would
>prevent SMS. And although it's theoretically possible that it could
>remove instructions from a loop, that would only happen if:
>
> (a) the instruction actually computes the same value every time, so
> could have been moved outside the loop; and
>
> (b) the result is only used after a following call (and in particular
> isn't used within the loop itself)
>
>(a) is a missed optimisation and (b) seems unlikely.
>
>Integrating remat into scheduling would make it much less powerful,
>since scheduling does only limited code motion between blocks.
>
>Doing it before scheduling would be good in principle, but there
>would then need to be a fake dependency between the call and remat
>instructions to stop the scheduler moving the remat instructions
>back before the call. Adding early remat was a way of avoiding such
>fake dependencies in "every" pass, but it might be that scheduling
>is one case in which the dependencies make sense.
>
>Either way, being able to run the pass before scheduling seems
>like a future enhancement, blocked on a future enhancement to
>the scheduler.
>
>>> Also I would have expected the approach to apply to all modes,
>>> just the cost of spilling is different. But if you can, say, reduce
>>> register pressure by one by rematerializing a bit-not then that
>>> should be always profitable, no? postreload-cse will come to
>>> the rescue anyhow.
>
>But that would then mean taking the register pressure into account when
>deciding whether to rematerialise. On the one hand would make it hard
>to do before scheduling (which decides the final pre-RA pressure).
>It would also make it a significantly different algorithm, since it
>wouldn't be a standard availability problem any more.
>
>For that use case, pressure-dependent remat in the scheduler might
>be a better approach, like you were suggesting. The early remat
>pass is specifically for the extreme case of no registers being
>call-preserved, where it's more important that we don't miss
>remat opportunities, and more important that we treat it as
>a global problem.
On x86_64 all xmm registers are caller saved for example. That means all FP regs and all vectors. (yeah, stupid ABI decision....)
Richard.
>Thanks,
>Richard