This is the mail archive of the mailing list for the GCC project.

Re: -finline-functions tuning (Using on ET)

On Fri, Aug 31, 2001 at 08:18:38AM +0200, Olaf Petzold wrote:
> Hello Kurt,
> as I promised here some results on blitz-20001213. The file is:

I'll look into it. It would be nice if g++-3.0.2 produced good code
with Blitz. It might be necessary to use -finline-limit-X with a
relatively high limit, though.

> code for reflect produced by g++-2.96-0.48mdk:
> # g++ -O2 -I. -S
> ----8<----
[38 lines cut]
> ---->8----
> Looks good. 


> Code produced by -O is not worth showing...

No wonder.

> # /opt/gcc/bin/g++ --version
> 2.95.3 
> # /opt/gcc/bin/g++ -O -I. -S
> ----8<----
[69 lines cut]
> ---->8----

> # /opt/gcc/bin/g++ -O2 -I. -S
> ----8<----
[71 lines cut]
> ---->8----
> Slightly better, but a lot of indirect addressing.

Still, 2.95.3 does not do so badly. It has no recursive inline limiting,
which may make your compile time and memory requirements explode, but it
does not fail to inline.

> Let's start with the g++-3.0.1-inline-heuristic-v2
> # /opt/gcc3/bin/g++ --version
> 3.0.1
> # /opt/gcc3/bin/g++ -O2 -I. -S
> ----8<----
[awful code deleted!]
> [...etc...]
> ---->8----

Not nice!

> # /opt/gcc3/bin/g++ -O3 -I. -S

-O3 should not make any difference, I'd guess.

> ----8<----
[108 lines of code deleted]
> ---->8----
> Some Xpr aren't inlined even with -O3 and keyword inline

To be expected: the reason -O2 did not inline the code was the inlining
limit, not a missing inline keyword.

> # /opt/gcc3/bin/g++ -O3 -finline-limit=3000 -I. -S
> ----8<----
[46 lines of rather good code deleted]
> ---->8----
> Some indirect calls are still there (but better code than before).

You mean indirect access, don't you?

I'd say that this result is acceptable, though I wonder why the 2.96 code
looked slightly better.
I don't think the inliner makes the difference for this.

> # /opt/gcc/bin/g++ -O2 -finline-limit=3000 -I. -S
    ^^^^^^^^^^^^^^^^                   ^
> doesn't make any difference related to reflect. The inline-limit seems
> to be between 1500 and 2000 for this case. Your ideas I haven't checked yet.

2.95.x did not understand -finline-limit=X, only -finline-limit-X.

> Well, let's start with the g++-rec-inline-heuristics-v3 and
> gcc-inline-func-acct-v1 patches applied (the next day):

The latter patch should not be relevant for your case.
It just says that functions which are considered for inlining only as a
consequence of -finline-functions (aka -O3) are allowed at most half the
size of those which are marked inline by the keyword (or by being defined
inside the class declaration).
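
A minimal sketch of the three kinds of function this distinction is about.
The names (Vec, get, scale, sum) are made up for illustration; they come
neither from Blitz nor from the patch.

```cpp
#include <cassert>

// Hypothetical example type; not taken from Blitz.
struct Vec {
    double v[3];

    // Defined inside the class body: implicitly inline, the same as if
    // the keyword had been written.
    double get(int i) const {
        assert(i >= 0 && i < 3);
        return v[i];
    }
};

// Marked with the keyword: an explicitly inline function.
inline double scale(double x, double factor) { return x * factor; }

// No keyword and not defined in a class: only a candidate for inlining
// when -finline-functions (implied by -O3) is in effect. As described
// above, the acct patch allows such functions half the size.
double sum(const Vec& a) { return a.get(0) + a.get(1) + a.get(2); }
```

Here get and scale would fall under the full limit, while sum would only be
considered under -O3 and then with the halved limit.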

> # /opt/gcc3/bin/g++ -O2  -I. -S
> ----8<----
[86 lines deleted]
> ---->8----
> Doesn't look good. Similar results with

No; slightly better than before, but far from acceptable.

> # /opt/gcc3/bin/g++ -O3  -I. -S
> So let's go forward with
> # /opt/gcc/bin/g++ -O2 -finline-limit=3000 -I. -S
           ^^                          ^
> cc1plus: Invalid option `-finline-limit=3000'
> Oops ????? Something's gone wrong?

You called gcc-2.95.3 (and the parameter is -finline-limit-X there).

> Well, the info pages show -fkeep-inline-functions as well:

Should not make any difference.

> # /opt/gcc/bin/g++ -O2 -fkeep-inline-functions  -I. -S

That's 2.95 again, isn't it? 
[2.95.3 code deleted].

Could you check how far you have to increase the inline-limit with 3.0.1 and
patch v3, please, to get acceptable code? (The gcc-inline-func-acct-v1.diff
is irrelevant in your case, AFAICT, but should of course also not do any
harm. It just throttles inlining for -finline-functions selected functions
a bit more than for inline declared ones.)

Question to the gcc experts: When accounting for the already inlined code in
the C++ tree inliner, I add the number of statements minus one, as we save
the call statement. Should we subtract more? I guess we can't in the general
case, though I've found that inlining functions below 13 statements
decreases code size in my tests (ix86).
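
The "statements minus one" bookkeeping can be sketched as a toy model. This
is not the code from the patch: InlineAccount, record and estimated_insns are
hypothetical names, and the insns-per-statement factor is supplied by the
caller.

```cpp
#include <cassert>

// Toy model only; names and structure are assumptions, not the patch.
struct InlineAccount {
    long inlined_stmts;  // statements accounted for so far

    InlineAccount() : inlined_stmts(0) {}

    // Record one inlined call whose callee body has `callee_stmts`
    // statements. The call statement itself disappears, hence the -1.
    void record(long callee_stmts) {
        assert(callee_stmts >= 1);
        inlined_stmts += callee_stmts - 1;
    }

    // Rough instruction estimate from a caller-supplied
    // insns-per-statement guess.
    long estimated_insns(long insns_per_stmt) const {
        return inlined_stmts * insns_per_stmt;
    }
};
```

In this model a one-statement callee adds nothing to the account, which is
the case where inlining only replaces the call by the body.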

I guess the Blitz code expands to a number of statements > 1 again and
again, and in the end we end up lowering the limit because we think we've
been doing too much inlining. Yet the code finally collapses to only very
few assembly instructions.
* Why are that many statements produced?
  I would expect a double operator () (const int i) { return vec[i]; }
  to be exactly one statement and something like a dot product with 3D
  vectors to be something like 6.
* It seems that in reality more statements are produced, as everything up
  to and including 13 statements always gets inlined, unless we have
  already inlined more than 7680 statements (76800 estimated insns). Why?
* Is there a way to look at the type of statements and to guess how many
  assembly instructions they'll expand into?
  Currently, we estimate 10 per statement, whereas the dot product, e.g.,
  needs something like 8 assembly instructions.
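
For reference, a minimal sketch of the kind of code the bullets above talk
about. Vec3, dot and check_dot are illustrative names only, not Blitz code.
How many statements the tree inliner counts for these bodies is exactly the
open question, so the sketch makes no claim about that.

```cpp
#include <cassert>

// Hypothetical 3D vector; not taken from Blitz.
struct Vec3 {
    double vec[3];

    // The accessor from the first bullet: a single return statement.
    double operator()(const int i) const { return vec[i]; }
};

// A 3D dot product built from accessor calls: three multiplies and two
// adds at the source level.
inline double dot(const Vec3& a, const Vec3& b) {
    return a(0) * b(0) + a(1) * b(1) + a(2) * b(2);
}

// Small self-check: 1*4 + 2*5 + 3*6 = 32.
inline void check_dot() {
    Vec3 a = {{1.0, 2.0, 3.0}};
    Vec3 b = {{4.0, 5.0, 6.0}};
    assert(dot(a, b) == 32.0);
}
```

If every accessor call is inlined, dot should reduce to a handful of loads,
multiplies and adds, which is why a failure to inline hurts so much here.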

Indirect addressing, I fear, is outside the scope of what tuning the
inliner can do for you. Somebody else's call!
I assume it's not catastrophic for your performance, unlike the failure to
inline code.

Thanks for your results!
Kurt Garloff                   <>         [Eindhoven, NL]
Physics: Plasma simulations  <K.Garloff@Phys.TUE.NL>  [TU Eindhoven, NL]
Linux: SCSI, Security          <>    [SuSE Nuernberg, DE]
 (See mail header or public key servers for PGP2 and GPG public keys.)
