Tree inlining for the C front end (part 3 of 3)
Alexandre Oliva
aoliva@redhat.com
Sat Sep 29 15:33:00 GMT 2001
On Sep 25, 2001, Gerald Pfeifer <pfeifer@dbai.tuwien.ac.at> wrote:
> On 24 Sep 2001, Alexandre Oliva wrote:
>> And finally, this patch enables tree inlining in the C front end.
>> [...]
>> Here's patch #3/3. Bootstrapped and tested on athlon-pc-linux-gnu,
>> along with patches #1 and #2. Ok to install?
> I firmly believe this should *NOT* be installed before it has passed
> serious compile-time and run-time (=code quality) performance bench-
> marking.
> We really, really, really should avoid having a huge regression in
> these respects as we had for the C++ frontend in GCC 3.0 and 3.0.1.
> The absolutely minimal set of tests required IMNSHO are timings for a
> compiler bootstrap (with and without this inliner), SPEC results, and
> at least another application benchmark (build time and run-time).
Ok, so I went ahead and verified that my patches had indeed disabled
-finline-functions, which I fixed with the patch below. While I was
at it, I improved memory usage by only saving trees of inlinable
functions, which reduced the gap between bootstrap with and without my
patch. I have also fixed all regressions in the testsuite, going back
to the same passes and failures as without my patches.
Here are today's numbers for bootstrap times of the entire GCC tree
(i.e., all languages, libraries included), with `make -j8 bootstrap',
starting from a just-configured build tree.
Bootstrap with none of my patches and standard BOOT_CFLAGS:
real 41m29.371s
user 36m29.430s
sys 4m7.420s
Bootstrap with patches 1 and 2 only, standard BOOT_CFLAGS:
real 41m32.907s
user 36m30.400s
sys 4m11.080s
Bootstrap with all 4 patches and standard BOOT_CFLAGS:
real 42m2.477s
user 37m2.970s
sys 4m7.510s
Bootstrap with patches 1 and 2 only and BOOT_CFLAGS='-O3 -g'.
real 43m48.287s
user 38m50.090s
sys 4m7.500s
Bootstrap with all 4 patches and BOOT_CFLAGS='-O3 -g'
real 44m30.794s
user 39m25.950s
sys 4m13.910s
That's a 1.3% increase in bootstrap time (all libraries, all
languages) for standard BOOT_CFLAGS, and 1.6% for a -O3 bootstrap.
Much better, eh? :-)
Here are some SPEC95 cint results, averaged over 3 runs, first with
patches 1 and 2 only (i.e., no tree inlining in C, just infrastructure
changes), then with all of 4 patches:
Base Base Base Peak Peak Peak
Benchmarks Ref Time Run Time Ratio Ref Time Run Time Ratio
------------ -------- -------- -------- -------- -------- --------
099.go 4600 87.7 52.4 4600 88.2 52.1
124.m88ksim 1900 37.2 51.1 1900 41.2 46.1
126.gcc 1700 33.9 50.1 1700 34.0 50.0
129.compress 1800 72.7 24.8 1800 72.4 24.9
130.li 1900 38.5 49.3 1900 36.4 52.2
132.ijpeg 2400 55.5 43.2 2400 56.9 42.2
134.perl 1900 29.0 65.6 1900 28.9 65.8
147.vortex 2700 67.2 40.2 2700 65.4 41.3
SPECint_base95 (Geom. Mean) 45.6
SPECint95 (Geom. Mean) 45.3
------------ -------- -------- -------- -------- -------- --------
099.go 4600 87.8 52.4 4600 88.5 52.0
124.m88ksim 1900 37.2 51.1 1900 41.1 46.2
126.gcc 1700 33.9 50.1 1700 34.1 49.9
129.compress 1800 72.9 24.7 1800 73.3 24.5
130.li 1900 38.3 49.6 1900 36.4 52.2
132.ijpeg 2400 55.5 43.2 2400 56.9 42.2
134.perl 1900 28.9 65.9 1900 28.9 65.8
147.vortex 2700 67.4 40.0 2700 65.2 41.4
SPECint_base95 (Geom. Mean) 45.6
SPECint95 (Geom. Mean) 45.3
No significant differences in performance, as expected, and a
hopefully acceptable increase in build time.
Some notes:
- I have mixed feelings about disabling inlining of functions that
call __builtin_return_address() (see the #if 0 block in
inline_forbidden_p(). It seems to me that we shouldn't be messing
up with such functions. In particular, if some functions in
gcc.c-torture/execute/20010122-1.c are reordered, they end up
inlined and produce incorrect results. The patch below doesn't
address this issue, though; it just uses a mechanism other than
calling alloca() to prevent inlining of a function in those
functions that would be currently inlined while they shouldn't.
- Some changes in tree-inline.c were necessary to cope with
differences between C++ and C. For example, C functions without
prototypes may be called with excess or missing arguments; I've
arranged for trailing parameters to be declared uninitialized, and
for trailing arguments to be evaluated and discarded. Also, type
promotion of return types is different in C and in C++; the type of
the DECL_RESULT of a C function is the promoted type, but the inline
site expects the actual return type, so I introduced a conversion if
types don't match. Also, I found out varargs_function_p() is not
enough to detect C functions with variable number of arguments, so I
moved the function back to the C++ front-end, and moved its test
into the language-specific section of the inlinability test. While
I was at it, I introduced the test for
function_attribute_inlinable_p() in the C++ inlinability tests too.
I wonder if I should move some more tests from the C version too,
such as the test for calls to setjmp(), va_start, return_address,
non-local gotos, etc. I also plan on integrating the move of
varags_function_p() back into the C++ front-end into some previous
patch for purposes of installing them; perhaps some other changes
may also be put in together. Of course, I'll test whatever patch I
check in individually.
- save_for_inline() is still called, so that we can defer the output
of functions and still choose not to emit them when not necessary,
but I have arranged for it to not do much in case we know we won't
inline the function. In particular, I've arranged for it to not
mess up with TREE_READONLY of parameters, which broke
gcc.dg/980709-1.c because the declaration of addr was marked as
readonly, despite my efforts to get the code to recognize the
(mem(addressof(reg))) as (reg) when marking the parameter as
updated. The reg was, for some reason, beyond the max_parm_reg, and
I couldn't figure out how to fix max_parm_reg. I decided I was just
beating a dead horse, and there was no reason to try to keep the RTL
inliner working, so I just went ahead and disabled that piece of
code.
- The change in cse.c fixes the problem of cse moving the only use of
a LABEL, stored in a register, without moving the REG_LABEL along
with it, causing the compiler to enter an endless loop. It turned
out that the problem was that the label was already deleted, and we
don't want REG_LABEL for deleted labels, but check_for_label_ref
still demanded them; such a demand would never be fulfilled, though.
This new patch was bootstrapped and regression tested on
athlon-pc-linux-gnu, along with the 3 previously-posted patches. Are
they ok to install?
More information about the Gcc-patches
mailing list