openbabel doesn't build on ppc64-linux with 4.3 (it works with 4.1): with
-mminimal-toc -O2 -fPIC the .toc1 section overflows (it is bigger than 64KB).
About 2/3 of the .toc1 entries are, as with 4.1, various pointers into
.rodata.str1.8 (i.e. string literals), but newly there are over 3600 pointers
into .text.

This comes down to:

static __attribute__ ((__unused__)) const char*
SWIG_Perl_ErrorType(int code) {
  const char* type = 0;
  switch(code) {
  case -12: type = "MemoryError"; break;
  case -2: type = "IOError"; break;
  case -3: type = "RuntimeError"; break;
  case -4: type = "IndexError"; break;
  case -5: type = "TypeError"; break;
  case -6: type = "ZeroDivisionError"; break;
  case -7: type = "OverflowError"; break;
  case -8: type = "SyntaxError"; break;
  case -9: type = "ValueError"; break;
  case -10: type = "SystemError"; break;
  case -11: type = "AttributeError"; break;
  default: type = "RuntimeError";
  }
  return type;
}

extern "C" int puts (const char *);

void f1 (int code) { puts (SWIG_Perl_ErrorType (code)); }
void f2 (int code) { puts (SWIG_Perl_ErrorType (code)); }
void f3 (int code) { puts (SWIG_Perl_ErrorType (code)); }
void f4 (int code) { puts (SWIG_Perl_ErrorType (code)); }
void f5 (int code) { puts (SWIG_Perl_ErrorType (code)); }

where 4.1 doesn't inline the SWIG_Perl_ErrorType function at -O2, but 4.3 does
because of -finline-small-functions. SWIG_Perl_ErrorType isn't very small
though; especially with -fPIC it is fairly expensive: a bigger SWITCH_EXPR
which needs to use a jump table, and then a dozen string literal loads.
estimate_num_insns guesses 1 though :(.
mine.
Subject: Re: New: Inlining heuristics issue

I should also add that this is one of the examples where Martin's switch
optimization pass would do miracles. If it were run before inlining, we would
end up with one static array.

Honza
Subject: Re: New: Inlining heuristics issue

Hi,
I am testing the attached patch. It simply accounts two instructions for
each case label; I guess it does not make much sense to try to do something
smarter until we move lowering of switch constructs to tree level.

Interestingly enough, the function manages to be small when just one switch
is accounted. I wonder if removing "* 2" still fixes the real testcase (i.e.
the extreme growth is caused by too much cascaded inlining).

The inlining costs are quite biased: they do not take into account the cost
of address operations, assuming that real work dominates the CPU time. That
is probably still the case, but we get extreme corner cases like this one in
other metrics.

Honza

Index: tree-inline.c
===================================================================
*** tree-inline.c	(revision 131386)
--- tree-inline.c	(working copy)
*************** estimate_num_insns_1 (tree *tp, int *wal
*** 2387,2395 ****
        break;
  
      case SWITCH_EXPR:
!       /* TODO: Cost of a switch should be derived from the number of
!          branches.  */
!       d->count += d->weights->switch_cost;
        break;
  
      /* Few special cases of expensive operations.  This is useful
--- 2387,2400 ----
        break;
  
      case SWITCH_EXPR:
!       /* Take into account cost of the switch + guess 2 conditional jumps for
!          each case label.
! 
!          TODO: once switch expansion algorithm is sufficiently separated
!          from RTL expansion, we might ask it for real cost of the switch
!          construct.  */
!       d->count += (d->weights->switch_cost
!                    + TREE_VEC_LENGTH (SWITCH_LABELS (x)) * 2);
        break;
  
      /* Few special cases of expensive operations.  This is useful
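The rule in the patch can be modelled outside GCC as a small function. This is only a sketch: the Weights struct and switch_cost_estimate are hypothetical stand-ins, and the value passed for switch_cost in the usage note is an assumed figure, not GCC's actual weight.

```cpp
// Hypothetical stand-in for the inliner's weight table; only the field
// name switch_cost is taken from the patch.
struct Weights
{
  int switch_cost;
};

// Cost charged for one switch under the proposed rule: the base switch
// cost plus two guessed conditional jumps per case label.  num_labels
// plays the role of TREE_VEC_LENGTH (SWITCH_LABELS (x)) in the patch.
int switch_cost_estimate(const Weights &w, int num_labels)
{
  return w.switch_cost + num_labels * 2;
}
```

For the testcase in this report, assuming the default label is counted among the labels (12 in total) and a switch_cost of 1, the estimate is 1 + 12 * 2 = 25 instead of the flat switch_cost.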
I think we want to account PHI nodes as real copies instead (d->count += PHI_NUM_ARGS (...)) -- if they involve real operands (not VOPs).
Subject: Re: Inlining heuristics issue

> I think we want to account PHI nodes as real copies instead (d->count +=
> PHI_NUM_ARGS (...)) -- if they involve real operands (not VOPs).

We ignore the cost of MODIFY_EXPR on the assumption that it will quite
likely get optimized out (that is important for simple containers). So
ignoring PHI nodes is consistent with that, but on the other hand PHIs
occur less often in containers and are more likely to stay after inlining.
I will add code for that.

Honza
Subject: Re: Inlining heuristics issue

On Tue, 8 Jan 2008, hubicka at ucw dot cz wrote:

> ------- Comment #5 from hubicka at ucw dot cz 2008-01-08 12:15 -------
> Subject: Re: Inlining heuristics issue
>
> > I think we want to account PHI nodes as real copies instead (d->count +=
> > PHI_NUM_ARGS (...)) -- if they involve real operands (not VOPs).
>
> We ignore cost of MODIFY_EXPR in assumption that it will get quite
> likely optimized out (that is important for simple containers). So
> ignoring PHI nodes is consistent in this manner, but on the other hand
> PHIs are less often in containers and more likely to stay after
> inlining. I will add code for that.

Well, while register copies are likely to be optimized out, by definition
a PHI node copy cannot be optimized out (unless, of course, clever
optimization applies).

Richard.
Subject: Re: Inlining heuristics issue

With some experimentation, the PHI change makes the costs go up in quite
weird ways, not taking into account that most PHIs are eliminated by
coalescing. So I will stay with the SWITCH cost * 2 approach.

Honza
Created attachment 14900
openbabel_perl.ii.bz2

Here is the original testcase; compile it on ppc64-linux, or with a
-> ppc64-linux cross compiler, using -O2 -m64 -fPIC -mminimal-toc.
Yes, many PHIs are eliminated by coalescing. If you want to experiment at
some point for more accuracy, you can look at the PHI arguments. Any argument
which has a different base name than the LHS of the PHI, or is a constant,
*will* be a copy if the PHI remains.
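The counting rule described above can be sketched on a toy model. The PhiArg and Phi structs below are hypothetical and do not correspond to GCC's internal representation; base names are modelled as plain strings purely for illustration.

```cpp
#include <string>
#include <vector>

// Hypothetical model of one PHI argument: either a constant, or an SSA
// name identified here only by the base name of its variable.
struct PhiArg
{
  bool is_constant;
  std::string base_name;  // ignored when is_constant is true
};

// Hypothetical model of a PHI node: the base name of its result (LHS)
// and its incoming arguments.
struct Phi
{
  std::string lhs_base_name;
  std::vector<PhiArg> args;
};

// Number of arguments that would become real copies if the PHI remains:
// constants, and names whose base differs from the base of the LHS.
int estimated_phi_copies(const Phi &phi)
{
  int copies = 0;
  for (const PhiArg &arg : phi.args)
    if (arg.is_constant || arg.base_name != phi.lhs_base_name)
      ++copies;
  return copies;
}
```

A PHI whose result has base name "type" and whose arguments are a constant, a name based on "type" and a name based on "tmp" would be counted as two copies under this rule.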
Subject: Re: Inlining heuristics issue

> yes, many PHIs are eliminated by coalescing. If you wanted to experiment at
> some point for more accuracy, you can look at the PHI arguments. Any argument
> which has a different base name than the LHS of the PHI, or is a constant,
> *will* be a copy if the PHI remains.

Hmm, good point, figuring out PHI arguments with constant or different
bases is easy. Still, for a reason we account for neither constants nor copy
instructions, in an attempt to guess what will remain important after
optimization, and I am not sure we want to make PHIs an exception. This
function is a typical example that has a chance to optimize well after
inlining, so the decision is not that unreasonable.

I guess for now I will stay with the simple SWITCH statement cost estimate
(we don't want to copy too giant a SWITCH, and we don't want to misestimate
very simple SWITCHes as we do now) and play with this more with my patch
that separates speed and size estimates for the inliner, which I have in the
queue for the next stage1.

Honza
Subject: Bug 34708

Author: hubicka
Date: Wed Jan 9 19:19:40 2008
New Revision: 131433

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=131433
Log:
	PR tree-optimization/34708
	* tree-inline.c (estimate_num_insns_1): Compute cost of SWITCH_EXPR
	based on number of case labels.
	(init_inline_once): Remove switch_cost.
	* tree-inline.h (eni_weights_d): Remove switch_cost.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/tree-inline.c
    trunk/gcc/tree-inline.h
Fixed on mainline (at least judging by the number of references, checked
with a cross compiler).