This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: [PATCH] New c++ inliner
- To: Nathan Sidwell <nathan at codesourcery dot com>
- Subject: Re: [PATCH] New c++ inliner
- From: Nathan Sidwell <nathan at acm dot org>
- Date: Mon, 16 Jul 2001 15:41:04 +0100
- CC: gcc-patches at gcc dot gnu dot org, Gerald Pfeifer <pfeifer at dbai dot tuwien dot ac dot at>, mark at codesourcery dot com, Daniel Berlin <dan at cgsoftware dot com>, Jason Merrill <jason_merrill at redhat dot com>
- References: <3B4F1363.4BB4F5C5@codesourcery.com>
- Reply-To: nathan at compsci dot bristol dot ac dot uk
Hi,
here are the numbers I gathered from Gerald's program. I'm attaching
a tar gzipped set of postscript graphs. These are original-N.ps and
new-N.ps
for N = 0..5
original == gcc 3.1
new == gcc-3.1 + inliner patch
N == 0 is executable size
N == 1..4 execution time
All code was compiled with -O2 -finline-functions -static. The
environment was a 600MHz P3 with 192MB memory running 6.2 Redhat
Gnu/linux. I did not recompile the libraries when varying the
inlining parameters.
For the original graphs I varied the -finline-limit parameter from 64
to 8192. I also established a no-inlining case with -finline-limit=0
For the new graphs I varied both --param max-inline-ast over the set
[0, 3, 6, 12, 25, 50, 100, 200, 400] and --param arg-inline-ast over
the set [0, 1, 2, 4, 8, 16, 32, 64]. The no inlining case is with
both those at zero.
All graphs are normalized against the no-inlining case. All graphs are
logarithmic along X for the 2D graphs and X&Y for the 3D graphs.
I mapped zero onto -1.
The scaling between -finline-limit and --param max-inline-ast is 10 to
1 (i.e -finline-limit=10 is equivalent to --param max-inline-ast=1).
All the 2D graphs have a straight line from (-1,1) to (6,y). That's
because I did not measure any other settings in that range. The curves
look stable enough to see that nothing interesting occurs there.
The execution times are the times for various input data, and different
code paths are exercised for each test. It is a DLV is a computational
logic/logic programming system and more information is available at
http://www.dbai.tuwien.ac.at/proj/dlv/. (The source is not freely
available, though.) I'm sure Gerald will answer questions about what
each problem each one is solving.
You'll notice that original-0.ps (the original size graph), has a dip
at about 2^8. Beyond that the size increases again. The execution speeds
show a minimum reached at about 2^10, in some cases getting worse with
higher numbers, and not in others.
The new graphs show a size dip at about (2^5, 2^4), however there is a
reasonably flat bottomed valley forming an arc around the origin.
The execution times all show a flat plateau with a circular
peak centered around the origin.
comparing absolute size & speed, here are the numbers
O2, no-inline O2 O2 inline-functions
gcc 2.95 3890 3544 3530 (size in K)
9.77 4.99 5.20 stratcomp1-all
100 19.4 19.4 primeimpl2
gcc 3.1 4500 5048 6786, (size in K)
72.2 55.3 54.7 stratcomp1-all
214 23.4 27.7 primeimpl2
3.1+patch 4495 3509 (size in K, best)
75.9 55.0 stratcomp1-all, best
218 18.3 primeimpl2, best
you'll notice that 3.1 is significantly slower that 2.95, I don't
know if this is caused by the difference in the stl's or not (or
anything like IO sync degradation) The execution speeds
with the patch are not different to that without it, but the size
is better.
Mark Mitchell wrote:
> > 3) fully optimize each of those callees. This includes the
> > step of inlining into the body of the callee. There is a new
> > flag for FUNCTION_DECLS indicating whether it has already under
> > gone this optimization step.
>
> I am concerned about this. In particular, the reason we did top-down
> inlining was that if you have A calls B calls C calls D and all
> are inline, then you get four copies of D, three of C, two of B, i.e.,
> a quadratic-ish explosion in copying. By not inlining into B (unless it
> is output directly) we eliminate this problem. I understand your
> response to this issue, but I'm still concerned.
Ok, I've thought of a way round this, Rather than complete step3,
we can generate the inlinable call-site list, optimize those callees
(to the same level as I'm about to describe), then estimate what the
expanded size would be by summing the sizes of the inlinable functions.
When we do the actual inlining, we copy the callee + its inlining
plan, then continue much as the current inliner does. This will
give us bottom-up inlining, without the memory expansion you're
worried about. Our estimate of the expanded size will come adrift
if inlining introduces eliminable unreachable code.
You might well be right about the DECL_NUM_STMT's estimate, though
if we have unreachable code elimination, constant propagation and
a constant parameter, our estimate might start getting inaccurate.
Jason Merrill wrote:
> Thoughts:
Yup, all things which should be on the todo list.
Anyway, can I put this one in? or would you prefer a version
with the above described expanding first?
nathan
--
Dr Nathan Sidwell :: Computer Science Department :: Bristol University
Never hand someone a gun unless you are sure where they will point it
nathan@acm.org http://www.cs.bris.ac.uk/~nathan/ nathan@cs.bris.ac.uk
inline.tgz