This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: Improving reload inheritance code generation and predictability


On 11/19/2010 10:53 AM, Jeff Law wrote:
On 11/18/10 14:27, Vladimir Makarov wrote:


I like this idea and also thought long ago to try it too. Because of better inheritance I think it should show some code size improvement and probably some performance improvement too besides better debugging.
There's a definite code size improvement.


I am afraid only that it will take some compilation time too (which will be probably compensated partially by less final insns processing) and IMHO that is not because of insn traversing but mostly because of usage of DF-infrastructure.
I'm also more concerned about the DF scanning than the BB scan when we need a reload register. Obviously for something with huge blocks (say our friend fpppp) scanning the insns in the BB could get expensive and we could clamp the number of insns scanned on a PARAM value.

Anyway, I quickly inserted some counters to measure some data and ran a bootstrap (without java).

The first thing I note only 56% of the source files we compile even end up calling allocate_reload_reg. I did not track total number of function's compiled. 56% is low enough that lazily initializing the DF data is probably worth it since DF scans the entire insn stream. If we could lazily initialize DF within a block only, then that'd probably save even more.

Within the files that called allocate_reload_reg, we had 207003 calls to allocate_reload_reg and we scanned 2071962 insns in the loop, or 10 insns per call. That seemed rather high to me as I was expecting a scan rate of 5-7 insns per call.

Two related obvious improvements came to mind. If there is only one spill reg, then scanning is totally unnecessary and if there is only one spill reg left to find during a scan, we can stop the scan, in both cases the remaining reg is the most desirable reg and scanning insns is totally unnecessary. These two improvements get us down to 7.5 insns scanned per call to allocate_reload_reg. Still more than I would have expected.

libgcc's bid_round results in 918 calls and 60627 insns scanned (* 3 since libgcc is built 3 times during a bootstrap), which represents more than 10% of the total insns scanned. If we factored out bid_round's effects we'd be looking at 6.5 insns scanned per call which seems about right.


Interesting.



Some time ago I analyzed how many memory is used by DF during an IRA snapshot. It was about 25% vs 7% allocated by IRA for its IR (% of all heap memory). Touching this huge footprint will worse code locality and result in slow code.


Reload does not use DF and even automatic insn rescanning is switched off. I believe that if reload were rewritten to use DF, it could result in much slower code. This is just some my speculations which really hard to confirm or reject.
Note that we still have DF structures lying around because ira doesn't call df_finish prior to calling reload. So the memory increase should be minimal (basically just the increase due to insns inserted by caller-saves and the like).

That is not about memory increase. It is about DF data expelling rtl data from caches.

I've just did some measurements of compilation time on your patch on all_cp2k_gfortran.f90 (> 400K lines of fortran)

without patch                                     219.20user
with only df calls in reload (see patch below)    221.39user
with all your patch                               221.17user

So 1% of degradation is only because the patch touches DF-data (not scanning insns in finding reload reg as someone can think). Better inheritance might improve compilation time because less insns are generated (although it is hard to say the difference on two last lines is too small).

That data was for -O2. I guess that the compilation time degradation will be even more for -O0.

Index: reload1.c
===================================================================
--- reload1.c   (revision 166913)
+++ reload1.c   (working copy)
@@ -1045,6 +1045,11 @@
        }
     }

+   df_finish_pass (true);
+   df_scan_alloc (NULL);
+   df_scan_blocks ();
+   df_analyze ();
+
   /* Use the reload registers where necessary
      by generating move instructions to move the must-be-register
      values into or out of the reload registers.  */
@@ -1061,6 +1066,8 @@
       gcc_assert (verify_initial_elim_offsets ());
     }

+   df_finish_pass (true);
+
   /* If we were able to eliminate the frame pointer, show that it is no
      longer live at the start of any basic block.  If it ls live by
      virtue of being in a pseudo, that pseudo will be marked live

The alternative would be to deep scan each insn in the loop.



Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]