This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.

Re: Improving reload inheritance code generation and predictability

From: Jeff Law <law at redhat dot com>
To: Vladimir Makarov <vmakarov at redhat dot com>
Cc: gcc-patches <gcc-patches at gcc dot gnu dot org>
Date: Fri, 19 Nov 2010 11:10:18 -0700
Subject: Re: Improving reload inheritance code generation and predictability
References: <4CE53C3D.6020001@redhat.com> <4CE59A33.707@redhat.com> <4CE69D7F.4000400@redhat.com> <4CE6B2FC.7080800@redhat.com>

On 11/19/10 10:25, Vladimir Makarov wrote:

On 11/19/2010 10:53 AM, Jeff Law wrote:
On 11/18/10 14:27, Vladimir Makarov wrote:

I like this idea; I thought about trying it myself long ago. Because of the better inheritance, I think it should show some code size improvement, and probably some performance improvement too, in addition to better debugging.
There's a definite code size improvement.

My only worry is that it will cost some compilation time too (probably partially compensated by having fewer final insns to process), and IMHO that cost comes not from traversing insns but mostly from using the DF infrastructure.
I'm also more concerned about the DF scanning than the BB scan when we need a reload register. Obviously, for something with huge blocks (say our friend fpppp), scanning the insns in the BB could get expensive, and we could clamp the number of insns scanned with a PARAM value.

Anyway, I quickly inserted some counters to measure some data and ran a bootstrap (without java).

The first thing I note is that only 56% of the source files we compile even end up calling allocate_reload_reg. I did not track the total number of functions compiled. 56% is low enough that lazily initializing the DF data is probably worth it, since DF scans the entire insn stream. If we could lazily initialize DF within a block only, that would probably save even more.
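A minimal sketch of the lazy-initialization idea, assuming a per-function flag guards the expensive setup. Every name here (`scan_state`, `ensure_df_ready`, `allocate_reload_reg_sketch`) is a hypothetical stand-in and not GCC's DF API; the counter only models the cost of a full DF scan.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-function state; not a real GCC structure.  */
struct scan_state {
  bool df_ready;       /* has the expensive setup run yet?  */
  int df_setup_count;  /* how many times the setup ran (model only)  */
};

/* Run the (modelled) whole-function DF scan only on first use.  */
void ensure_df_ready(struct scan_state *s)
{
  assert(s != NULL);
  if (!s->df_ready) {
    s->df_setup_count++;  /* placeholder for the costly scan  */
    s->df_ready = true;
  }
}

/* Stand-in for the point where a reload register is requested.
   Functions that never reach this point never pay for the setup.  */
void allocate_reload_reg_sketch(struct scan_state *s)
{
  ensure_df_ready(s);
  /* ... the choice of a spill register would follow here ...  */
}
```

With this shape, a function that never needs a reload register leaves `df_setup_count` at zero, and repeated requests within one function pay for the setup once.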

Within the files that called allocate_reload_reg, we had 207003 calls to allocate_reload_reg and we scanned 2071962 insns in the loop, or 10 insns per call. That seemed rather high to me as I was expecting a scan rate of 5-7 insns per call.

Two related, obvious improvements came to mind. If there is only one spill reg, scanning is unnecessary. And if only one candidate spill reg is left during a scan, we can stop the scan. In both cases the remaining reg is the most desirable one. These two improvements get us down to 7.5 insns scanned per call to allocate_reload_reg, which is still more than I would have expected.
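The two early exits, together with the clamp on insns scanned mentioned earlier, can be sketched roughly as follows. This is a simplified model, not the patch itself: it assumes each upcoming insn is reduced to a bitmask of the hard registers it references, and that a candidate is dropped as soon as an insn references it. The names `regmask`, `pick_spill_reg` and `max_scan` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned int regmask;  /* bit i set => hard register i  */

static int count_regs(regmask m)
{
  int n = 0;
  while (m) { m &= m - 1; n++; }  /* clear the lowest set bit  */
  return n;
}

static int lowest_reg(regmask m)
{
  int r = 0;
  while (!(m & 1u)) { m >>= 1; r++; }
  return r;
}

/* Pick a spill register from CANDIDATES by scanning upcoming insns.
   INSN_REFS[i] is the set of hard regs referenced by the i-th insn.
   The scan stops when one candidate is left, when the insns run out,
   or after MAX_SCAN insns (the clamp).  *SCANNED gets the insn count.  */
int pick_spill_reg(regmask candidates, const regmask *insn_refs,
                   size_t n_insns, size_t max_scan, size_t *scanned)
{
  size_t i = 0;
  assert(candidates != 0);
  while (count_regs(candidates) > 1 && i < n_insns && i < max_scan) {
    regmask remaining = candidates & ~insn_refs[i];
    i++;
    if (remaining == 0)
      break;  /* one insn references every remaining candidate  */
    candidates = remaining;
  }
  *scanned = i;
  return lowest_reg(candidates);  /* arbitrary tie-break  */
}
```

With a single candidate the loop body never runs, so no insns are scanned. With candidates {0, 1, 2} and insns referencing reg 0, reg 3, reg 2, reg 1 in that order, the scan stops after the third insn and returns reg 1 without looking at the fourth.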

libgcc's bid_round results in 918 calls and 60627 insns scanned (* 3 since libgcc is built 3 times during a bootstrap), which represents more than 10% of the total insns scanned. If we factored out bid_round's effects we'd be looking at 6.5 insns scanned per call which seems about right.

Interesting.

Some time ago I analyzed how much memory DF uses during an IRA snapshot. It was about 25% of all heap memory, versus 7% allocated by IRA for its IR. Touching this huge footprint will worsen code locality and result in a slower compiler.

Reload does not use DF, and even automatic insn rescanning is switched off. I believe that if reload were rewritten to use DF, it could result in a much slower compiler. These are just my speculations, which are really hard to confirm or reject.
Note that we still have DF structures lying around because ira doesn't call df_finish prior to calling reload. So the memory increase should be minimal (basically just the increase due to insns inserted by caller-saves and the like).

This is not about the memory increase. It is about DF data evicting RTL data from the caches.

I wonder if I could convince the DF machinery to do an incremental update to add the caller-save insns and deal with the change from pseudo->hard regs external from DF. That might mitigate this issue.

I just measured compilation time with your patch on all_cp2k_gfortran.f90 (> 400K lines of Fortran):
without patch                                     219.20user
with only df calls in reload (see patch below)    221.39user
with all your patch                               221.17user
So the 1% degradation comes only from the patch touching DF data (not from scanning insns to find a reload reg, as one might think). Better inheritance might improve compilation time because fewer insns are generated, although it is hard to say, since the difference between the last two lines is too small.
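For reference, the 1% figure follows directly from the user times listed above. A small helper (the name `percent_slowdown` is just for illustration) makes the arithmetic explicit:

```c
#include <assert.h>

/* Relative slowdown of MEASURED_SEC against BASELINE_SEC, in percent.  */
double percent_slowdown(double baseline_sec, double measured_sec)
{
  assert(baseline_sec > 0.0);
  return (measured_sec - baseline_sec) / baseline_sec * 100.0;
}
```

Calling `percent_slowdown(219.20, 221.39)` gives about 1.0% for the DF-only variant, and `percent_slowdown(219.20, 221.17)` gives about 0.9% for the full patch. The 0.22s gap between the two patched runs is about 0.1% of the baseline.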

I'd consider a 1% degradation way too much for this patch to be acceptable.

I think for -O0, we could just fall back to the round-robin and avoid the scan.
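A rough sketch of that fallback, assuming the scan-based choice sits behind a callback so that it is never invoked when not optimizing. The names `spill_picker`, `pick_round_robin`, `pick_spill_index` and `demo_scan` are hypothetical, and the round-robin here is a plain modular counter rather than reload's actual bookkeeping.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical state: number of spill regs and the last index handed out.
   Initialize LAST to -1 so the first pick is index 0.  */
struct spill_picker {
  int n_spills;
  int last;
};

/* Callback type for the insn-scanning choice; returns a spill index.  */
typedef int (*scan_fn)(void *ctx);

/* Cheap choice: cycle through the spill regs in order.  */
int pick_round_robin(struct spill_picker *p)
{
  assert(p != NULL && p->n_spills > 0);
  p->last = (p->last + 1) % p->n_spills;
  return p->last;
}

/* At optimization level 0, skip the scan entirely.  */
int pick_spill_index(struct spill_picker *p, int optimize_level,
                     scan_fn scan, void *ctx)
{
  if (optimize_level == 0)
    return pick_round_robin(p);
  return scan(ctx);
}

/* Demo callback: counts its invocations and always picks index 0.  */
int demo_scan(void *ctx)
{
  int *calls = ctx;
  (*calls)++;
  return 0;
}
```

Because the scan is only reached through the callback, an -O0 compile pays nothing for it, neither the insn walk nor whatever data setup the scan would need.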

jeff

