Tree inlining for the C front end (part 3 of 3)

Alexandre Oliva aoliva@redhat.com
Sat Sep 29 15:33:00 GMT 2001


On Sep 25, 2001, Gerald Pfeifer <pfeifer@dbai.tuwien.ac.at> wrote:

> On 24 Sep 2001, Alexandre Oliva wrote:
>> And finally, this patch enables tree inlining in the C front end.
>> [...]
>> Here's patch #3/3.  Bootstrapped and tested on athlon-pc-linux-gnu,
>> along with patches #1 and #2.  Ok to install?

> I firmly believe this should *NOT* be installed before it has passed
> serious compile-time and run-time (=code quality) performance bench-
> marking.

> We really, really, really should avoid having a huge regression in
> these respects as we had for the C++ frontend in GCC 3.0 and 3.0.1.

> The absolutely minimal set of tests required IMNSHO are timings for a
> compiler bootstrap (with and without this inliner), SPEC results, and
> at least another application benchmark (build time and run-time).

Ok, so I went ahead and verified that my patches had indeed disabled
-finline-functions, which I fixed with the patch below.  While I was
at it, I improved memory usage by only saving trees of inlinable
functions, which reduced the gap between bootstrap with and without my
patch.  I have also fixed all regressions in the testsuite, going back
to the same passes and failures as without my patches.

Here are today's numbers for bootstrap times of the entire GCC tree
(i.e., all languages, libraries included), with `make -j8 bootstrap',
starting from a just-configured build tree.

Bootstrap with none of my patches and standard BOOT_CFLAGS:
real    41m29.371s
user    36m29.430s
sys     4m7.420s

Bootstrap with patches 1 and 2 only, standard BOOT_CFLAGS:
real    41m32.907s
user    36m30.400s
sys     4m11.080s

Bootstrap with all 4 patches and standard BOOT_CFLAGS:
real    42m2.477s
user    37m2.970s
sys     4m7.510s

Bootstrap with patches 1 and 2 only and BOOT_CFLAGS='-O3 -g'.
real    43m48.287s
user    38m50.090s
sys     4m7.500s

Bootstrap with all 4 patches and BOOT_CFLAGS='-O3 -g'
real    44m30.794s
user    39m25.950s
sys     4m13.910s


That's a 1.3% increase in bootstrap time (all libraries, all
languages) for standard BOOT_CFLAGS, and 1.6% for a -O3 bootstrap.
Much better, eh? :-)


Here are some SPEC95 cint results, averaged over 3 runs, first with
patches 1 and 2 only (i.e., no tree inlining in C, just infrastructure
changes), then with all of 4 patches:

                     Base      Base      Base      Peak      Peak      Peak
   Benchmarks    Ref Time  Run Time     Ratio  Ref Time  Run Time     Ratio
   ------------  --------  --------  --------  --------  --------  --------
   099.go            4600      87.7      52.4      4600      88.2      52.1 
   124.m88ksim       1900      37.2      51.1      1900      41.2      46.1 
   126.gcc           1700      33.9      50.1      1700      34.0      50.0 
   129.compress      1800      72.7      24.8      1800      72.4      24.9 
   130.li            1900      38.5      49.3      1900      36.4      52.2 
   132.ijpeg         2400      55.5      43.2      2400      56.9      42.2 
   134.perl          1900      29.0      65.6      1900      28.9      65.8 
   147.vortex        2700      67.2      40.2      2700      65.4      41.3 
   SPECint_base95 (Geom. Mean)           45.6                              
   SPECint95 (Geom. Mean)                                              45.3
   ------------  --------  --------  --------  --------  --------  --------
   099.go            4600      87.8      52.4      4600      88.5      52.0 
   124.m88ksim       1900      37.2      51.1      1900      41.1      46.2 
   126.gcc           1700      33.9      50.1      1700      34.1      49.9 
   129.compress      1800      72.9      24.7      1800      73.3      24.5 
   130.li            1900      38.3      49.6      1900      36.4      52.2 
   132.ijpeg         2400      55.5      43.2      2400      56.9      42.2 
   134.perl          1900      28.9      65.9      1900      28.9      65.8 
   147.vortex        2700      67.4      40.0      2700      65.2      41.4 
   SPECint_base95 (Geom. Mean)           45.6                              
   SPECint95 (Geom. Mean)                                              45.3

No significant differences in performance, as expected, and a
hopefully acceptable increase in build time.

Some notes:

- I have mixed feelings about disabling inlining of functions that
  call __builtin_return_address() (see the #if 0 block in
  inline_forbidden_p().  It seems to me that we shouldn't be messing
  up with such functions.  In particular, if some functions in
  gcc.c-torture/execute/20010122-1.c are reordered, they end up
  inlined and produce incorrect results.  The patch below doesn't
  address this issue, though; it just uses a mechanism other than
  calling alloca() to prevent inlining of a function in those
  functions that would be currently inlined while they shouldn't.

- Some changes in tree-inline.c were necessary to cope with
  differences between C++ and C.  For example, C functions without
  prototypes may be called with excess or missing arguments; I've
  arranged for trailing parameters to be declared uninitialized, and
  for trailing arguments to be evaluated and discarded.  Also, type
  promotion of return types is different in C and in C++; the type of
  the DECL_RESULT of a C function is the promoted type, but the inline
  site expects the actual return type, so I introduced a conversion if
  types don't match.  Also, I found out varargs_function_p() is not
  enough to detect C functions with variable number of arguments, so I
  moved the function back to the C++ front-end, and moved its test
  into the language-specific section of the inlinability test.  While
  I was at it, I introduced the test for
  function_attribute_inlinable_p() in the C++ inlinability tests too.
  I wonder if I should move some more tests from the C version too,
  such as the test for calls to setjmp(), va_start, return_address,
  non-local gotos, etc.  I also plan on integrating the move of
  varags_function_p() back into the C++ front-end into some previous
  patch for purposes of installing them; perhaps some other changes
  may also be put in together.  Of course, I'll test whatever patch I
  check in individually.

- save_for_inline() is still called, so that we can defer the output
  of functions and still choose not to emit them when not necessary,
  but I have arranged for it to not do much in case we know we won't
  inline the function.  In particular, I've arranged for it to not
  mess up with TREE_READONLY of parameters, which broke
  gcc.dg/980709-1.c because the declaration of addr was marked as
  readonly, despite my efforts to get the code to recognize the
  (mem(addressof(reg))) as (reg) when marking the parameter as
  updated.  The reg was, for some reason, beyond the max_parm_reg, and
  I couldn't figure out how to fix max_parm_reg.  I decided I was just
  beating a dead horse, and there was no reason to try to keep the RTL
  inliner working, so I just went ahead and disabled that piece of
  code.

- The change in cse.c fixes the problem of cse moving the only use of
  a LABEL, stored in a register, without moving the REG_LABEL along
  with it, causing the compiler to enter an endless loop.  It turned
  out that the problem was that the label was already deleted, and we
  don't want REG_LABEL for deleted labels, but check_for_label_ref
  still demanded them; such a demand would never be fulfilled, though.

This new patch was bootstrapped and regression tested on
athlon-pc-linux-gnu, along with the 3 previously-posted patches.  Are
they ok to install?



More information about the Gcc-patches mailing list