Improving reload inheritance code generation and predictability

Fri Nov 19 16:19:00 GMT 2010

On 11/18/10 14:27, Vladimir Makarov wrote:
>>
>
> I like this idea and thought about trying it long ago too.  Because of
> better inheritance I think it should show some code size improvement
> and probably some performance improvement too, besides better debugging.
There's a definite code size improvement.

>
> I am afraid only that it will take some compilation time too (which 
> will be probably compensated partially by less final insns processing) 
> and IMHO that is not because of insn traversing but mostly because of 
> usage of DF-infrastructure.
I'm also more concerned about the DF scanning than the BB scan when we
need a reload register.  Obviously for something with huge blocks (say
our friend fpppp) scanning the insns in the BB could get expensive, but
we could clamp the number of insns scanned with a PARAM value.
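
As a rough illustration of the clamp, here is a minimal sketch in plain C
rather than GCC's RTL types.  struct insn, scan_block_clamped and the
max_scan argument are hypothetical names used only for this example; a
real patch would read the limit from a new PARAM and walk real insns.

```c
#include <stddef.h>

/* Hypothetical stand-in for an insn: a bitmask of the hard regs it
   uses.  This is not a GCC data structure.  */
struct insn
{
  unsigned int regs_used;
};

/* Scan at most MAX_SCAN insns of a block, starting at index START, and
   return the subset of CANDIDATES (a bitmask of spill regs) that no
   scanned insn uses.  *SCANNED receives the number of insns examined.  */
static unsigned int
scan_block_clamped (const struct insn *insns, size_t n_insns, size_t start,
                    size_t max_scan, unsigned int candidates,
                    size_t *scanned)
{
  size_t count = 0;

  for (size_t i = start; i < n_insns && count < max_scan; i++, count++)
    candidates &= ~insns[i].regs_used;

  *scanned = count;
  return candidates;
}
```

With a clamp like this the cost per reload register request is bounded
by max_scan no matter how large the block is; the price is that the
choice may be made on partial information.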

Anyway, I quickly inserted some counters to measure some data and ran a 
bootstrap (without java).

The first thing I note is that only 56% of the source files we compile
even end up calling allocate_reload_reg.  I did not track the total
number of functions compiled.  56% is low enough that lazily
initializing the DF data is probably worth it since DF scans the entire
insn stream.   If we could lazily initialize DF within a block only,
then that'd probably save even more.
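
The lazy initialization idea could look roughly like the sketch below.
It is plain C with hypothetical names (struct scan_state, expensive_init,
ensure_scan_info); it does not use the real DF entry points, it only
shows the "pay on first use" pattern.

```c
#include <stdbool.h>

/* Hypothetical per-function state; not a GCC structure.  */
struct scan_state
{
  bool initialized;
  int init_count;   /* How many times the expensive setup ran.  */
};

/* Stand-in for the whole-function scan that we would like to avoid
   when no reload register is ever requested.  */
static void
expensive_init (struct scan_state *s)
{
  s->init_count++;
  s->initialized = true;
}

/* Call this at the point where a reload register is first needed.
   The expensive setup runs only on the first call.  */
static void
ensure_scan_info (struct scan_state *s)
{
  if (!s->initialized)
    expensive_init (s);
}
```

Functions that never reach the allocation path never call
ensure_scan_info, so they never pay for the setup.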

Within the files that called allocate_reload_reg, we had 207003 calls to 
allocate_reload_reg and we scanned 2071962 insns in the loop, or 10 
insns per call.  That seemed rather high to me as I was expecting a scan 
rate of 5-7 insns per call.

Two related obvious improvements came to mind.  If there is only one
spill reg, then scanning is totally unnecessary.  Likewise, if there is
only one spill reg left to find during a scan, we can stop the scan.  In
both cases the remaining reg is the most desirable reg.  These two
improvements get us down to 7.5 insns scanned per call to
allocate_reload_reg.  Still more than I would have expected.
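
To make the two early exits concrete, here is a minimal sketch in plain
C.  It assumes the scan drops a candidate spill reg each time an insn
references it, so the last candidate standing is the preferred one; that
model, and the names struct scan_insn, more_than_one_reg_p and
pick_spill_reg, are assumptions for illustration and not the actual
patch.

```c
#include <stddef.h>

/* Hypothetical stand-in for an insn: a bitmask of the spill regs it
   references.  This is not a GCC data structure.  */
struct scan_insn
{
  unsigned int regs_referenced;
};

/* Nonzero if MASK has two or more bits set.  */
static int
more_than_one_reg_p (unsigned int mask)
{
  return (mask & (mask - 1)) != 0;
}

/* Narrow SPILL_REGS (a bitmask of candidates) by scanning INSNS.  The
   loop is skipped entirely when there is only one candidate, and it
   stops as soon as only one candidate is left.  *SCANNED receives the
   number of insns examined.  */
static unsigned int
pick_spill_reg (const struct scan_insn *insns, size_t n_insns,
                unsigned int spill_regs, size_t *scanned)
{
  unsigned int remaining = spill_regs;
  size_t i = 0;

  while (i < n_insns && more_than_one_reg_p (remaining))
    {
      unsigned int left = remaining & ~insns[i].regs_referenced;

      i++;
      if (left == 0)
        break;  /* Tie: this insn references every remaining candidate.  */
      remaining = left;
    }

  *scanned = i;
  return remaining;
}
```

For a block whose insns reference regs 0x1, none, 0x2, 0x4, 0x8 in that
order, a candidate set of 0x7 is settled after three insns, and a
candidate set of 0x4 is settled without scanning at all.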

libgcc's bid_round results in 918 calls and 60627 insns scanned (* 3 
since libgcc is built 3 times during a bootstrap), which represents more 
than 10% of the total insns scanned.  If we factored out bid_round's
effects we'd be looking at about 6.5 insns scanned per call, which seems
about right.

>
> Some time ago I analyzed how much memory is used by DF during an IRA
> snapshot.  It was about 25% vs 7% allocated by IRA for its IR (% of
> all heap memory).  Touching this huge footprint will worsen code
> locality and result in slow code.
>
> Reload does not use DF and even automatic insn rescanning is switched 
> off.  I believe that if reload were rewritten to use DF, it could 
> result in much slower code.  This is just speculation on my part,
> which is really hard to confirm or reject.
Note that we still have DF structures lying around because ira doesn't 
call df_finish prior to calling reload.  So the memory increase should 
be minimal (basically just the increase due to insns inserted by 
caller-saves and the like).

The alternative would be to deep scan each insn in the loop.

Jeff