
Re: [PATCH] New c++ inliner


Hi,
here are the numbers I gathered from Gerald's program. I'm attaching
a tar-gzipped set of PostScript graphs. These are original-N.ps and
new-N.ps for N = 0..5.

original == gcc 3.1
new == gcc 3.1 + inliner patch
N == 0 is executable size
N == 1..4 are execution times

All code was compiled with -O2 -finline-functions -static. The
environment was a 600 MHz P3 with 192 MB of memory running Red Hat
GNU/Linux 6.2. I did not recompile the libraries when varying the
inlining parameters.

For the original graphs I varied the -finline-limit parameter from 64
to 8192. I also established a no-inlining case with -finline-limit=0.

For the new graphs I varied both --param max-inline-ast over the set
[0, 3, 6, 12, 25, 50, 100, 200, 400] and --param arg-inline-ast over
the set [0, 1, 2, 4, 8, 16, 32, 64]. The no-inlining case has both of
those at zero.

All graphs are normalized against the no-inlining case. The 2D graphs
use a logarithmic X axis and the 3D graphs logarithmic X and Y axes;
since zero cannot be plotted on a log scale, I mapped zero onto -1.
The scaling between -finline-limit and --param max-inline-ast is 10 to
1 (i.e. -finline-limit=10 is equivalent to --param max-inline-ast=1).
All the 2D graphs have a straight line from (-1,1) to (6,y). That's
because I did not measure any other settings in that range. The curves
look stable enough to see that nothing interesting occurs there.

The execution times are the times for various input data, and different
code paths are exercised for each test. The program is DLV, a computational
logic/logic programming system; more information is available at
http://www.dbai.tuwien.ac.at/proj/dlv/. (The source is not freely
available, though.) I'm sure Gerald will answer questions about what
problem each test is solving.

You'll notice that original-0.ps (the original size graph) has a dip
at about 2^8; beyond that the size increases again. The execution speeds
show a minimum at about 2^10, in some cases getting worse at higher
settings and in others not.

The new graphs show a size dip at about (2^5, 2^4); however, there is a
reasonably flat-bottomed valley forming an arc around the origin.
The execution times all show a flat plateau with a circular
peak centered around the origin.

Comparing absolute size and speed, here are the numbers:
                              -O2 no-inline     -O2     -O2 -finline-functions
gcc 2.95
  size (K)                        3890          3544          3530
  stratcomp1-all                  9.77          4.99          5.20
  primeimpl2                       100          19.4          19.4

gcc 3.1
  size (K)                        4500          5048          6786
  stratcomp1-all                  72.2          55.3          54.7
  primeimpl2                       214          23.4          27.7

3.1 + patch (best settings)
  size (K)                        4495                        3509
  stratcomp1-all                  75.9                        55.0
  primeimpl2                       218                        18.3

You'll notice that 3.1 is significantly slower than 2.95. I don't
know whether this is caused by the difference in the STLs or not (or
anything like IO sync degradation). The execution speeds
with the patch are no different from those without it, but the size
is better.

Mark Mitchell wrote:
> > 3) fully optimize each of those callees. This includes the
> > step of inlining into the body of the callee. There is a new
> > flag for FUNCTION_DECLS indicating whether it has already undergone
> > this optimization step.
> 
> I am concerned about this.  In particular, the reason we did top-down
> inlining was that if you have A calls B calls C calls D and all
> are inline, then you get four copies of D, three of C, two of B, i.e.,
> a quadratic-ish explosion in copying.  By not inlining into B (unless it
> is output directly) we eliminate this problem.  I understand your
> response to this issue, but I'm still concerned.
OK, I've thought of a way round this. Rather than completing step 3,
we can generate the inlinable call-site list, optimize those callees
(to the same level as I'm about to describe), then estimate what the
expanded size would be by summing the sizes of the inlinable functions.
When we do the actual inlining, we copy the callee plus its inlining
plan, then continue much as the current inliner does. This will
give us bottom-up inlining without the memory expansion you're
worried about. Our estimate of the expanded size will come adrift
if inlining introduces eliminable unreachable code.
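To make the scheme concrete, here's a rough sketch of the two phases.
The names and structures are hypothetical (this is not GCC code), it
assumes an acyclic call graph, and a single size threshold stands in
for the real inlining limits:

    class Function:
        def __init__(self, name, own_size, call_sites):
            self.name = name
            self.own_size = own_size        # size of the body itself
            self.call_sites = call_sites    # callees at inlinable call sites
            self.plan = None                # callees we decide to inline
            self.estimated_size = None      # own_size + planned callees' sizes

    def plan_function(fn, limit):
        """Phase 1: decide what to inline and estimate the expanded size
        by summing callee sizes, without copying any bodies."""
        if fn.estimated_size is not None:
            return fn.estimated_size
        fn.plan = []
        size = fn.own_size
        for callee in fn.call_sites:
            callee_size = plan_function(callee, limit)   # bottom-up
            if callee_size <= limit:
                fn.plan.append(callee)
                size += callee_size
        fn.estimated_size = size
        return size

    def expand_function(fn):
        """Phase 2: only when a function is actually output, copy each
        planned callee together with its own plan and recurse, much as
        the current inliner does."""
        body = [fn.name]
        for callee in fn.plan:
            body.extend(expand_function(callee))
        return body

    # The A calls B calls C calls D chain from above, each body of size 5:
    d = Function("D", 5, [])
    c = Function("C", 5, [d])
    b = Function("B", 5, [c])
    a = Function("A", 5, [b])
    plan_function(a, limit=100)     # no bodies are copied while planning
    print(a.estimated_size)         # 20
    print(expand_function(a))       # copies happen only for emitted bodies

Nothing is duplicated until a function is actually emitted, which is
what avoids the copy explosion for long inline call chains.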

You might well be right about the DECL_NUM_STMT's estimate, though
if we have unreachable-code elimination, constant propagation, and
a constant parameter, our estimate might start getting inaccurate.

Jason Merrill wrote:
> Thoughts:
Yup, all things which should be on the todo list.

Anyway, can I put this one in? Or would you prefer a version
with the above-described expansion done first?

nathan

-- 
Dr Nathan Sidwell :: Computer Science Department :: Bristol University
Never hand someone a gun unless you are sure where they will point it
nathan@acm.org  http://www.cs.bris.ac.uk/~nathan/  nathan@cs.bris.ac.uk

Attachment: inline.tgz

