This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: Add an "early rematerialisation" pass


On 12/14/2017 04:09 AM, Richard Biener wrote:
> On Fri, Nov 17, 2017 at 4:58 PM, Richard Sandiford
> <richard.sandiford@linaro.org> wrote:
>> This patch looks for pseudo registers that are live across a call
>> and for which no call-preserved hard registers exist.  It then
>> recomputes the pseudos as necessary to ensure that they are no
>> longer live across a call.  The comment at the head of the file
>> describes the approach.
>>
>> A new target hook selects which modes should be treated in this way.
>> By default none are, in which case the pass is skipped very early.
>>
>> It might also be worth looking for cases like:
>>
>>    C1: R1 := f (...)
>>    ...
>>    C2: R2 := f (...)
>>    C3: R1 := C2
>>
>> and giving the same value number to C1 and C3, effectively treating
>> it like:
>>
>>    C1: R1 := f (...)
>>    ...
>>    C2: R2 := f (...)
>>    C3: R1 := f (...)
>>
>> Another (much more expensive) enhancement would be to apply value
>> numbering to all pseudo registers (not just rematerialisation
>> candidates), so that we can handle things like:
>>
>>   C1: R1 := f (...R2...)
>>   ...
>>   C2: R1 := f (...R3...)
>>
>> where R2 and R3 hold the same value.  But the current pass seems
>> to catch the vast majority of cases.
>>
>> Tested on aarch64-linux-gnu (with and without SVE), x86_64-linux-gnu
>> and powerpc64le-linux-gnu.  OK to install?
> 
> Can you tell anything about the complexity of the algorithm?  How
> does it relate to what LRA can do?  AFAIK LRA doesn't try to find
> any global optimal solution and previous hardreg assignments
> may work against it?
> 
> That said - I would have expected remat to be done before the
> first scheduling pass?  Even before pass_sms (not sure
> what pass_live_range_shrinkage does).  Or be integrated
> with scheduling and its register pressure cost model.
> 
> Also I would have expected the approach to apply to all modes,
> just the cost of spilling is different.  But if you can, say, reduce
> register pressure by one by rematerializing a bit-not then that
> should be always profitable, no?  postreload-cse will come to
> the rescue anyhow.
> 
> Disclaimer: didn't look at the pass implementation, RTL isn't my
> primary expertise.
I'd just started looking at this yesterday.

At a high level, yes, we could consider doing this on other modes.  It
certainly would have helped Sparc in the past (I have no idea if it
still would).  The basic problem was that all FP registers were call
clobbered, so our FP performance really sucked.

We eventually fixed it by dramatically improving the basic caller-saves
costing model, avoiding unnecessary saves/restores, and saving/restoring in
wider modes (it tended to use word_mode chunks, even for things like
64-bit FP values).

Something like an aggressive remat pass would likely have helped as well.
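
To make the idea from the quoted patch description concrete, here is a
toy sketch of rematerialising across a call.  It is not the code in the
patch: the Insn class, the rematerialise and live_across_call helpers,
and the straight-line "block" are all hypothetical, and the sketch
assumes each candidate's defining expression can be recomputed anywhere
without depending on other registers.  The real pass works on RTL over
a whole CFG.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Insn:
    kind: str        # "set", "call" or "use" (toy instruction kinds)
    reg: str = ""    # destination for "set", source for "use"
    expr: str = ""   # right-hand side for "set", callee name for "call"


def live_across_call(block, reg):
    """True if some use of `reg` is reached through a call with no
    intervening definition of `reg`."""
    crossed = False  # a call has been seen since the last def of reg
    for insn in block:
        if insn.kind == "set" and insn.reg == reg:
            crossed = False
        elif insn.kind == "call":
            crossed = True
        elif insn.kind == "use" and insn.reg == reg and crossed:
            return True
    return False


def rematerialise(block, candidates):
    """Return a new block in which no register in `candidates` is live
    across a call: its definition is repeated before the first use
    that follows each call."""
    defs = {}      # candidate reg -> most recent defining Insn
    valid = set()  # candidates whose value survives at this point
    out = []
    for insn in block:
        if insn.kind == "set" and insn.reg in candidates:
            defs[insn.reg] = insn
            valid.add(insn.reg)
        elif insn.kind == "call":
            # Assumption: no call-preserved hard register can hold a
            # candidate, so every candidate value is lost here.
            valid.clear()
        elif insn.kind == "use":
            if insn.reg in defs and insn.reg not in valid:
                out.append(defs[insn.reg])  # recompute before the use
                valid.add(insn.reg)
        out.append(insn)
    return out


block = [
    Insn("set", "R1", "f (...)"),
    Insn("use", "R1"),
    Insn("call", expr="foo"),
    Insn("use", "R1"),
    Insn("use", "R1"),
]
new_block = rematerialise(block, {"R1"})
```

In this model R1 is recomputed once after the call and both later uses
share the new copy.  Definitions that become dead as a result are left
in place; the sketch assumes a later cleanup would remove them.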

Things have changed quite a lot since those days (1990-ish), thankfully.

There are other targets that have register classes which are call
clobbered and very painful to spill -- but they tend not to be heavily
used classes.  I'm thinking of things like loop counter registers, shift
amount registers and the like.

Anyway, hoping to dig further into the remat bits today.  And there are
the fully-predicated loop bits that are still unreviewed as well -- want
to take those, Richi?


Jeff

