Bug 34708 - Inlining heuristics issue
Summary: Inlining heuristics issue
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization (show other bugs)
Version: 4.3.0
: P3 normal
Target Milestone: ---
Assignee: Jan Hubicka
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2008-01-07 21:18 UTC by Jakub Jelinek
Modified: 2008-01-12 13:34 UTC (History)
4 users (show)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2008-01-07 22:09:34


Attachments
openbabel_perl.ii.bz2 (283.32 KB, application/x-bzip2)
2008-01-08 13:02 UTC, Jakub Jelinek
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Jakub Jelinek 2008-01-07 21:18:19 UTC
openbabel doesn't build on ppc64-linux with 4.3 (works with 4.1), with -mminimal-toc -O2 -fPIC .toc1 section overflows (is bigger than 64KB).
About 2/3 of .toc1 entries are as with 4.1 various pointers to .rodata.str1.8
(i.e. string literals), but newly there are over 3600 pointers into .text.

This comes down to:

static __attribute__ ((__unused__)) const char*
SWIG_Perl_ErrorType(int code) {
  const char* type = 0;
  switch(code) {
  case -12:
    type = "MemoryError";
    break;
  case -2:
    type = "IOError";
    break;
  case -3:
    type = "RuntimeError";
    break;
  case -4:
    type = "IndexError";
    break;
  case -5:
    type = "TypeError";
    break;
  case -6:
    type = "ZeroDivisionError";
    break;
  case -7:
    type = "OverflowError";
    break;
  case -8:
    type = "SyntaxError";
    break;
  case -9:
    type = "ValueError";
    break;
  case -10:
    type = "SystemError";
    break;
  case -11:
    type = "AttributeError";
    break;
  default:
    type = "RuntimeError";
  }
  return type;
}

extern "C" int puts (const char *);

void f1 (int code)
{
  puts (SWIG_Perl_ErrorType (code));
}

void f2 (int code)
{
  puts (SWIG_Perl_ErrorType (code));
}

void f3 (int code)
{
  puts (SWIG_Perl_ErrorType (code));
}

void f4 (int code)
{
  puts (SWIG_Perl_ErrorType (code));
}

void f5 (int code)
{
  puts (SWIG_Perl_ErrorType (code));
}

where 4.1 doesn't inline the SWIG_Perl_ErrorType function at -O2, but 4.3 does
because of -finline-small-functions.  SWIG_Perl_ErrorType isn't very small
though, especially with -fPIC it is fairly expensive, a bigger SWITCH_EXPR
which needs to use a jump table and then a dozen of string literal loads.
estimate_num_insns guesses 1 though :(.
Comment 1 Jan Hubicka 2008-01-07 22:09:34 UTC
mine.
Comment 2 Jan Hubicka 2008-01-07 22:38:59 UTC
Subject: Re:   New: Inlining heuristics issue

I should also add that this is one of examples where Martin's switch
optimization pass would do miracles.  If organized before inlining we
would end up with one static array.

Honza
Comment 3 Jan Hubicka 2008-01-07 23:46:30 UTC
Subject: Re:   New: Inlining heuristics issue

Hi,
I am testing the attached patch.  It simply accounts two instructions
for each case label, I guess it does not make much sense to try to do
something smarter until we move lowering of swithc constructs to tree
level.

Interestingly enough the function manages to be small when just one
switch is accounted.  I wonder if removing "* 2" still fixes the real
testcase (ie the extreme growth is caused by too much of cascaded
inlining).

The inlining costs are quite biassed not taking into account the cost of
address operations assuming that real work dominates the CPU time, that
is probably still the case but we get to extreme side cases like this in
other metricts..

Honza

Index: tree-inline.c
===================================================================
*** tree-inline.c	(revision 131386)
--- tree-inline.c	(working copy)
*************** estimate_num_insns_1 (tree *tp, int *wal
*** 2387,2395 ****
        break;
  
      case SWITCH_EXPR:
!       /* TODO: Cost of a switch should be derived from the number of
! 	 branches.  */
!       d->count += d->weights->switch_cost;
        break;
  
      /* Few special cases of expensive operations.  This is useful
--- 2387,2400 ----
        break;
  
      case SWITCH_EXPR:
!       /* Take into account cost of the switch + guess 2 conditional jumps for
!          each case label.
!      
! 	 TODO: once switch expansion algorithm is sufficiently separated
! 	 from RTL expansion, we might ask it for real cost of the switch
! 	 construct.  */
!       d->count += (d->weights->switch_cost
! 		   + TREE_VEC_LENGTH (SWITCH_LABELS (x)) * 2);
        break;
  
      /* Few special cases of expensive operations.  This is useful
Comment 4 Richard Biener 2008-01-08 10:26:19 UTC
I think we want to account PHI nodes as real copies instead (d->count += PHI_NUM_ARGS (...)) -- if they involve real operands (not VOPs).
Comment 5 Jan Hubicka 2008-01-08 12:15:46 UTC
Subject: Re:  Inlining heuristics issue

> I think we want to account PHI nodes as real copies instead (d->count +=
> PHI_NUM_ARGS (...)) -- if they involve real operands (not VOPs).

We ignore cost of MODIFY_EXPR in assumption that it will get quite
likely optimized out (that is important for simple containers).  So
ignoring PHI nodes is consistent in this manner, but on the other hand
PHIs are less often in containers and more likely to stay after
inlining.  I will add code for that.

Honza
Comment 6 rguenther@suse.de 2008-01-08 12:21:25 UTC
Subject: Re:  Inlining heuristics issue

On Tue, 8 Jan 2008, hubicka at ucw dot cz wrote:

> ------- Comment #5 from hubicka at ucw dot cz  2008-01-08 12:15 -------
> Subject: Re:  Inlining heuristics issue
> 
> > I think we want to account PHI nodes as real copies instead (d->count +=
> > PHI_NUM_ARGS (...)) -- if they involve real operands (not VOPs).
> 
> We ignore cost of MODIFY_EXPR in assumption that it will get quite
> likely optimized out (that is important for simple containers).  So
> ignoring PHI nodes is consistent in this manner, but on the other hand
> PHIs are less often in containers and more likely to stay after
> inlining.  I will add code for that.

Well, while register copies are likely to be optimized out, by definition
a PHI node copy cannot be optimized out (unless, of course, clever
optimization applies).

Richard.
Comment 7 Jan Hubicka 2008-01-08 12:22:09 UTC
Subject: Re:  Inlining heuristics issue

With some experimentation, the PHI change makes the costs to go up in
quite weird ways not taking into account that most of PHIs are
elliminated by coalescing.  So I will stay with the SWITCH cost * 2
approach.

Honza
Comment 8 Jakub Jelinek 2008-01-08 13:02:04 UTC
Created attachment 14900 [details]
openbabel_perl.ii.bz2

Here is the original testcase, compile on ppc64-linux or with -> ppc64-linux
cross, with -O2 -m64 -fPIC -mminimal-toc.
Comment 9 Andrew Macleod 2008-01-08 13:27:43 UTC
yes, many PHIs are eliminated by coalescing. If you wanted to experiment at some point for more accuracy, you can look at the PHI arguments. Any argument which has a different base name than the LHS of the PHI, or is a constant, *will* be a copy if the PHI remains.  
Comment 10 Jan Hubicka 2008-01-08 14:14:00 UTC
Subject: Re:  Inlining heuristics issue

> yes, many PHIs are eliminated by coalescing. If you wanted to experiment at
> some point for more accuracy, you can look at the PHI arguments. Any argument
> which has a different base name than the LHS of the PHI, or is a constant,
> *will* be a copy if the PHI remains.  

Hmm, good point, figuring out PHI arguemnts with constant or different
bases is easy.  Still we for reason don't account constants nor copy
instructions in attempt to guess what will remain important after
optimization and I am not sure we want to make PHIs an exception.

This function is typical example that has chance to optimize well after
inlining, so the decision is not that unreasonable.  I guess for now I
will stay with the simple SWITCH statement cost estimate (we don't want
to copy too giant SWITCH and we don't want to overestimate very simple
SWITChes as we do now) and play with this more with my patch that
separate speed and size estiamtes for inliner I have in queue for next
stage1.

Honza
Comment 11 Jan Hubicka 2008-01-09 19:20:26 UTC
Subject: Bug 34708

Author: hubicka
Date: Wed Jan  9 19:19:40 2008
New Revision: 131433

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=131433
Log:

	PR tree-optimization/34708
	* tree-inline.c (estimate_num_insns_1): Compute cost of SWITCH_EXPR
	based on number of case labels.
	(init_inline_once): Remove switch_cost.
	* tree-inline.h (eni_weights_d): Remove switch_cost.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/tree-inline.c
    trunk/gcc/tree-inline.h

Comment 12 Jan Hubicka 2008-01-12 13:34:38 UTC
Fixed on mainline (at least checking number of references from cross compiler).