This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: [PATCH] New c++ inliner
- To: gcc-patches at gcc dot gnu dot org
- Subject: Re: [PATCH] New c++ inliner
- From: Loren James Rittle <rittle at latour dot rsch dot comm dot mot dot com>
- Date: Tue, 17 Jul 2001 03:46:21 -0500 (CDT)
- Cc: nathan at codesourcery dot com
- Organization: Networks and Infrastructure Lab (IL02/2240), Motorola Labs
In article <3B4F1363.4BB4F5C5@codesourcery.com> you write:
> here's a patch which changes the C++ AST inliner. There are a number of
> changes, and a number of pieces of future work indicated. However, the
> highlights are
> a) no horrible compile time & memory degradation.
> b) produced object code is faster and smaller [...]
Hi Nathan,
Here are my early experiences with your patch (I added it to an
already bootstrapped tree, quickstrapped it and re-installed the
compiler only). I have a non-public C++ code base that solves a class
of symbolic math problems known as crypto addition, or cryptarithms
(e.g. ``the + earth + venus + saturn + uranus = neptune'': find the
digit that each letter stands for). The code was written to be a
straightforward CPU-bound use of STL. It was never explicitly tuned
for any particular implementation.
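For readers who have not met this kind of problem, here is a minimal sketch of an STL-based brute-force solver. It is NOT the benchmark code above (which is non-public); it only illustrates the style of CPU-bound STL work involved. It assumes that each letter maps to a distinct digit, that no word may begin with 0, and that there are at most 10 distinct letters. The names Assignment, word_value and solve are hypothetical.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <numeric>
#include <string>
#include <vector>

// Hypothetical names throughout; an illustration, not the benchmark code.
// Letter -> digit map, indexed by (unsigned char) letter.
using Assignment = std::array<int, 256>;

// Decimal value of word `w` under the assignment `digit`.
long long word_value(const std::string& w, const Assignment& digit) {
    long long v = 0;
    for (unsigned char c : w) v = v * 10 + digit[c];
    return v;
}

// Brute force: try every injective letter->digit assignment until the
// addends sum to `total`. Assumes at most 10 distinct letters and that no
// word may start with the digit 0. On success, `digit` holds the solution.
bool solve(const std::vector<std::string>& addends, const std::string& total,
           Assignment& digit) {
    std::vector<std::string> words = addends;
    words.push_back(total);

    // Collect the distinct letters used anywhere in the puzzle.
    std::string letters;
    for (const auto& w : words) letters += w;
    std::sort(letters.begin(), letters.end());
    letters.erase(std::unique(letters.begin(), letters.end()), letters.end());
    const std::size_t k = letters.size();
    if (k > 10) return false;

    std::vector<int> d(10);
    std::iota(d.begin(), d.end(), 0);  // 0..9: the lexicographically first order
    do {
        // The first k entries of d are the digits for letters[0..k).
        for (std::size_t i = 0; i < k; ++i)
            digit[static_cast<unsigned char>(letters[i])] = d[i];

        bool leading_zero = false;
        for (const auto& w : words)
            if (!w.empty() && digit[static_cast<unsigned char>(w[0])] == 0)
                leading_zero = true;

        if (!leading_zero) {
            long long sum = 0;
            for (const auto& w : addends) sum += word_value(w, digit);
            if (sum == word_value(total, digit)) return true;
        }
        // Reversing the unused tail makes next_permutation advance the first
        // k digits, so each partial assignment is visited exactly once.
        std::reverse(d.begin() + static_cast<std::ptrdiff_t>(k), d.end());
    } while (std::next_permutation(d.begin(), d.end()));
    return false;
}
```

Calling solve({"the", "earth", "venus", "saturn", "uranus"}, "neptune", digits) poses the puzzle quoted above; it has ten distinct letters, so the search may visit up to 10! = 3,628,800 assignments.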
Table notes: the code generator targeted i386; the actual machine is
an i686. -static was used in all cases. size(1) was used to obtain the
binary size (in bytes) of the object file before linking. Times are in
seconds as reported by the shell's built-in time command (u is user
time, s is system time); execution time is user time only.
compiler / options                   compile time   binary size   execution time
2.95.2                               2.7u+0.3s      25283         64.2u
  -O                                 3.8u+0.3s      15867         13.8u
  -O2                                5.3u+0.3s      15637         13.4u
  -O3                                5.3u+0.3s      15384         12.8u
3.0                                  8.0u+0.4s      39627         59.7u
  -O                                 10.4u+0.4s     23803         11.1u
  -O2                                13.0u+0.3s     23239         10.5u
  -O3                                33.7u+0.8s     64811         10.4u
  -O3 -finline-limit=64000           82.6u+4.4s     111051        9.1u
mainline                             8.9u+0.4s      38735         58.5u
  -O                                 11.7u+0.3s     25035         10.8u
  -O2                                15.0u+0.4s     25131         10.3u
  -O3                                44.9u+0.9s     70927         10.0u
IMHO, this is a nice example since it already clearly beats 2.95.2 in
terms of execution time with default parameters (the STL
implementation did change but not radically). BTW, from past detailed
study, I can confirm that the compile time and binary size bloat is
mainly due to the libstdc++-v2 to libstdc++-v3 transition. Now, we
turn to recompiling with your proposed patch:
mainline+optimize4.patch             9.0u+0.4s      38735         58.6u
  -O                                 9.6u+0.4s      19507         14.2u
  -O2                                12.3u+0.4s     19847         13.6u
  -O3                                12.8u+0.4s     20019         13.5u
I wondered what I would have to do, for this example, to obtain the
-O3 performance seen above without the patch.
  -O3 --param max-inline-ast=100     12.5u+0.5s     18759         12.5u
  -O3 --param max-inline-ast=200     15.5u+0.4s     23843         11.0u
  -O3 --param max-inline-ast=400     20.2u+0.4s     33519         10.1u
  -O3 --param max-inline-ast=800     21.4u+0.6s     35007         9.4u
  -O3 --param max-inline-ast=1600    42.9u+0.7s     66103         8.7u
  -O3 --param max-inline-ast=3200    50.0u+1.0s     74723         8.5u
  -O3 --param max-inline-ast=6400    76.7u+1.2s     89727         7.9u
  -O3 --param max-inline-ast=12800   102.8u+3.5s    107203        8.0u
(Wow. I know that by cranking -finline-limit up I never saw this code
perform this well, but maybe I never cranked it high enough.)
I consider that last move to be "negative improvement according to all
metrics" thus I stopped. I did not attempt to tune with the other new
parameter since this one appeared to be the "big hammer". ;-)
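Calling the last step "negative improvement according to all metrics" amounts to checking that the new setting is worse than the previous one in compile time, object size and run time simultaneously. A minimal sketch of that check follows; the Result struct and worse_in_all function are hypothetical, and treating compile time as user + system seconds is an assumption made here for illustration.

```cpp
// Hypothetical record of one table row; lower is better for every field.
struct Result {
    double compile_s;   // assumed: user + system compile time, seconds
    long   size_bytes;  // object file size as reported by size(1)
    double run_s;       // user execution time, seconds
};

// True when `next` is worse than `prev` in every metric, i.e. raising the
// parameter bought nothing and cost everything.
bool worse_in_all(const Result& next, const Result& prev) {
    return next.compile_s > prev.compile_s &&
           next.size_bytes > prev.size_bytes &&
           next.run_s > prev.run_s;
}
```

With the 6400 and 12800 rows above (77.9 s vs 106.3 s, 89727 vs 107203 bytes, 7.9 s vs 8.0 s), worse_in_all returns true. For 6400 versus 3200 it returns false, because run time still improved.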
But then, consider that the performance of the code compiled with -O2
was fairly good before your patch. How do I get that back?
  -O2 --param max-inline-ast=800     15.7u+0.4s     24639         10.6u
  -O2 --param max-inline-ast=1600    15.6u+0.5s     24571         10.4u
[higher values of max-inline-ast not seen to hurt compile time or help
executable run time.]
I do not know whether my use of STL in this example is representative
of all other uses, but it might be nice if max-inline-ast could be set
by default to a value that covers the nominal STL cases. I think you
have max-inline-ast set too low by default. I look at it this way:
with max-inline-ast={200, 400, 800}, the optimizing and inlining
compiler at -O3 still compiles 2 to 3 times faster than with the old
algorithm (for this one example; more data is obviously needed, but I
did study results from Gerald's code).
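The "2 to 3 times faster" figure is the ratio of compile times between the unpatched mainline at -O3 (44.9u+0.9s) and the patched compiler at -O3 with max-inline-ast set to 200, 400 and 800. A small sketch of that arithmetic follows; the constant and function names are hypothetical, and using user + system time is an assumption (user time alone gives nearly the same ratios).

```cpp
#include <array>

// Compile times in seconds (user + system), taken from the table rows.
constexpr double kMainlineO3 = 44.9 + 0.9;  // mainline -O3, without the patch

// Patched -O3 with --param max-inline-ast = 200, 400, 800.
constexpr std::array<double, 3> kPatchedO3 = {15.5 + 0.4, 20.2 + 0.4, 21.4 + 0.6};

// How many times faster the new compile is: old time / new time.
constexpr double speedup(double old_s, double new_s) { return old_s / new_s; }
```

The three ratios come out at roughly 2.9, 2.2 and 2.1.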
As I told you in earlier private e-mail, thank you for being so
complete in describing how you propose to change the algorithm. As
usual, great work, Nathan.
Regards,
Loren