This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



recursive inlining (gcc-3.0.1)


Hi,

during my own tests with 3.0.1 (the TBCI C++ NumLib stuff I reported a few
days ago), I had to raise -finline-limit to 2500 to get reasonable
runtime performance in all my tests. With double as the datatype, the huge
jump happened between 900 and 1000; for complex numbers it happened between
2100 and 2200. So for my own applications I would use something like 2500 as
the default and a higher value (>5000) for people who say that they really
want to optimize.

Now, I guess most people would prefer lower values, so that overly heavy
recursive inlining does not inflate compilation time, and I guess this is
why the limit was set to such a low value (600).
I think this is a problem for C++ applications.

I wonder why such high values are needed.
One reason, I guess, is that at this stage we have not yet eliminated
instructions that will be optimized away later, so the count is not quite
correct.
Maybe somebody can find a way to estimate this in advance or to move the
inlining decision to a later stage. I hope we at least don't count function
entry and exit code ...

I also have another idea about what could go wrong, from looking at
cp/optimize.c:

  /* Even if this function is not itself too big to inline, it might
     be that we've done so much inlining already that we don't want to
     risk inlining any more.  */
  if ((DECL_NUM_STMTS (fn) + id->inlined_stmts) * INSNS_PER_STMT
      > MAX_INLINE_INSNS)
    inlinable = 0;

Suppose I have fn A that calls fn B, which calls fn C in a loop.
Now we compile A. We decide to inline B because it's small enough.
C itself would also be small enough to be integrated into B, but because
we have already inlined B, we exceed the limit.
The question is what happens:
(1) Integrate B in A, but leave calls to C
(2) Don't integrate B in A, but (later) integrate C in B
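
To make the scenario concrete, here is a toy C++ model of the cumulative
check quoted above. It is only a sketch: the struct, the function name and
the statement counts are hypothetical, the value 10 for INSNS_PER_STMT is an
assumption, and none of this is GCC's actual data structure or control flow.

```cpp
#include <string>
#include <vector>

// Toy model only; these names are hypothetical and are not GCC's own.
constexpr int kInsnsPerStmt = 10;     // assumed value of INSNS_PER_STMT
constexpr int kMaxInlineInsns = 600;  // the 3.0.1 default limit mentioned above

struct Callee {
    std::string name;
    int num_stmts;  // stands in for DECL_NUM_STMTS (fn)
};

// Walks the callees in the order they are considered while compiling one
// caller and returns the names of those that pass the cumulative check.
std::vector<std::string> inlined_callees(const std::vector<Callee>& candidates) {
    std::vector<std::string> inlined;
    int inlined_stmts = 0;  // stands in for id->inlined_stmts
    for (const Callee& fn : candidates) {
        if ((fn.num_stmts + inlined_stmts) * kInsnsPerStmt > kMaxInlineInsns)
            continue;  // corresponds to inlinable = 0: the call is left in place
        inlined_stmts += fn.num_stmts;
        inlined.push_back(fn.name);
    }
    return inlined;
}
```

With made-up sizes of 40 statements for B and 30 for C, calling
inlined_callees({{"B", 40}, {"C", 30}}) yields only B, which is outcome (1):
C is rejected purely because of what has been inlined before it, although
inlined_callees({{"C", 30}}) on its own would accept it.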

Obviously, (2) is preferable. In this example it's obvious because of the
loop, but this is a rather common case. In any case, if we count the number
of function calls that need to be made, (2) will never result in a higher
number than (1) as long as B calls C at least once per invocation, so it's
the better solution in general.
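
The call-count argument can be written down as a small sketch. The names are
hypothetical, and it assumes that one invocation of A calls B exactly once
and that B's loop calls C n times.

```cpp
// Hypothetical call counts for a single invocation of A.
struct CallCounts {
    long option1;  // calls executed if B is inlined into A and C is not inlined
    long option2;  // calls executed if C is inlined into B and B is not inlined
};

constexpr CallCounts calls_per_invocation_of_a(long n) {
    // (1) No call to B remains, but each of the n loop iterations calls C.
    // (2) A calls B once; the loop body contains C inline, so it makes no calls.
    return CallCounts{n, 1};
}
```

For n = 1000 this gives 1000 calls under (1) and a single call under (2).
The two options tie at n = 1, and (1) only comes out ahead in the corner
case where the loop body never runs (n = 0).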

I suspect gcc currently does (1), and that's why I see such poor performance
with a low -finline-limit.

Anybody familiar with gcc code able to confirm or contradict my suspicion?

If I'm right: can we change this? Considering that we drastically lowered
-finline-limit from 10k to 600 going from 3.0 to 3.0.1, I would consider
such a change appropriate for 3.0.2. If it's not too involved ...

Regards,
-- 
Kurt Garloff                   <kurt@garloff.de>         [Eindhoven, NL]
Physics: Plasma simulations  <K.Garloff@Phys.TUE.NL>  [TU Eindhoven, NL]
Linux: SCSI, Security          <garloff@suse.de>    [SuSE Nuernberg, DE]
 (See mail header or public key servers for PGP2 and GPG public keys.)
