Bug 40874 - Function object abstraction penalty with inline functions.
Summary: Function object abstraction penalty with inline functions.
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization
Version: 4.4.2
Importance: P3 enhancement
Target Milestone: 4.7.0
Assignee: Richard Biener
URL:
Keywords: missed-optimization
Depends on:
Blocks:
 
Reported: 2009-07-27 15:06 UTC by Dave Abrahams
Modified: 2011-05-23 14:36 UTC
CC List: 6 users

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2009-07-29 08:06:04


Description Dave Abrahams 2009-07-27 15:06:04 UTC
The following program shows that wrapping a simple class around a function pointer costs important optimizations.  If you change the #if 1 to #if 0, you'll see that the call is completely optimized away.  

#if 1
struct pf
{
  inline pf(int(*x)(int)) : x(x) {}

  inline int operator()(int a) const
  {
    return x(a);
  }
  
  int (*x)(int);
};
#else
typedef int(*pf)(int);
#endif

inline int g(int x) { return x/x - 1; }

int main(int argc, char* argv[])
{
  pf f(g);
  return f(3);
}
Comment 1 Andrew Pinski 2009-07-27 15:36:36 UTC
So one problem is that nothing after SRA (before inlining) constant props &g into the call expression.
Comment 2 Andrew Pinski 2009-07-27 15:45:22 UTC
(In reply to comment #1)
> So one problem is that nothing after SRA (before inlining) constant props &g
> into the call expression.

So there is no abstraction penalty really, just a missed inlining.
Comment 3 Dave Abrahams 2009-07-27 16:26:03 UTC
The missing inlining is the cause, abstraction penalty is the symptom.
Comment 4 Paolo Carlini 2009-07-27 17:57:26 UTC
I would say let's add Martin (and Honza) to CC; he made a lot of improvements for similar issues (which I pointed out to Honza a long time ago).
Comment 5 Richard Biener 2009-07-28 11:22:45 UTC
After early SRA we get

  f$x_8 = g;
  D.2142_6 = f$x_8;
  D.2141_7 = D.2142_6 (3);

which now misses a constant propagation of &g into the call which is why
inlining doesn't catch this opportunity.  Put one in and the abstraction
goes away.

Putting FRE into early optimizations would also catch this.
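
For illustration only, here is a source-level sketch of the three stages discussed above. It is hand-written C++, not compiler output; the function names `via_wrapper`, `after_sra`, `after_propagation` and `self_check` are made up for this example, and `f_x` / `tmp` merely stand in for `f$x_8` and `D.2142_6` in the dump.

```cpp
#include <cassert>

inline int g(int x) { return x / x - 1; }

// Wrapper from the report.
struct pf {
  pf(int (*x)(int)) : x(x) {}
  int operator()(int a) const { return x(a); }
  int (*x)(int);
};

// Source as written: the call goes through the wrapper object.
int via_wrapper() {
  pf f(g);
  return f(3);
}

// Roughly what is left after early SRA: the member has become a scalar,
// but the call is still indirect through a copied pointer.
int after_sra() {
  int (*f_x)(int) = g;    // plays the role of f$x_8 = g
  int (*tmp)(int) = f_x;  // plays the role of D.2142_6 = f$x_8
  return tmp(3);          // indirect call: the inliner cannot see the callee
}

// After propagating &g into the call: a direct call the inliner can handle.
int after_propagation() {
  return g(3);
}

// All three forms compute the same value; only the optimizer's view differs.
void self_check() {
  assert(via_wrapper() == after_propagation());
  assert(after_sra() == after_propagation());
}
```

The point of the sketch is that `after_sra` and `after_propagation` are semantically identical, so whether `g` gets inlined depends only on whether some pass rewrites the former into the latter before the inliner runs.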
Comment 6 Dave Abrahams 2009-07-28 18:42:23 UTC
The next step would be to verify that the penalty is eliminated when using boost::function / tr1::function.
Comment 7 Paolo Carlini 2009-07-28 19:38:56 UTC
One step at a time, Dave ;)
Comment 8 Martin Jambor 2009-07-28 21:33:34 UTC
I can confirm that if we schedule pass_ccp right after pass_sra_early,
g gets inlined.  Moreover, if we schedule one more pass_forwprop right
afterwards, even the testcase for PR 3713, comment #12 gets optimized
as it should :-)

So, like with PR 3713, we either have to schedule ccp or add some
specific pattern matching to the inlining preparation phase.  I guess
that people will find running one more ccp and fwprop unacceptable and
so some pattern matching will have to be done anyway for the other PR
(and we already do some awkward stuff like that for indirect member
pointer calls).  Perhaps we can match both, this one would be very
easy.  (Or is scheduling the two extra passes an option?)
Comment 9 rguenther@suse.de 2009-07-29 08:05:59 UTC
Subject: Re: Function object abstraction penalty with inline functions.

On Tue, 28 Jul 2009, paolo dot carlini at oracle dot com wrote:

> ------- Comment #7 from paolo dot carlini at oracle dot com  2009-07-28 19:38 -------
> One step at a time, Dave ;)

Indeed ;)  Just adding a CCP pass is likely not going to happen here.
Moving / adding FRE to early optimizations might get done though.

Dependent on time available for evaluation ...

Richard.
Comment 10 Richard Biener 2009-07-29 08:06:04 UTC
I'll take this for now.
Comment 11 Jan Hubicka 2009-07-29 08:08:52 UTC
Subject: Re:  Function object abstraction penalty with inline functions.

> I'll take this for now.
My preferred way of fixing this would be to include an FRE pass.
Unfortunately, my last benchmarks with FRE added early weren't showing much of
a win on our benchmark suite...  Still, it seems the right thing to do.

Honza
Comment 12 rguenther@suse.de 2009-07-29 08:12:34 UTC
Subject: Re: Function object abstraction penalty with inline functions.

On Tue, 28 Jul 2009, jamborm at gcc dot gnu dot org wrote:

> ------- Comment #8 from jamborm at gcc dot gnu dot org  2009-07-28 21:33 -------
> I can confirm that if we schedule pass_ccp right after pass_sra_early,
> g gets inlined.  Moreover, if we schedule one more pass_forwprop right
> afterwards, even the testcase for PR 3713, comment #12 gets optimized
> as it should :-)
> 
> So, like with PR 3713, we either have to schedule ccp or add some
> specific pattern matching to the inlining preparation phase.  I guess
> that people will find running one more ccp and fwprop unacceptable and
> so some pattern matching will have to be done anyway for the other PR
> (and we already do some awkward stuff like that for indirect member
> pointer calls).  Perhaps we can match both, this one would be very
> easy.  (Or is scheduling the two extra passes an option?)

Not really.  Or maybe it is ... at least scheduling FRE is still on
the list of possible things to do (can you check if that fixes 3713 as
well?)

Richard.
Comment 13 Martin Jambor 2009-07-29 10:16:44 UTC
(In reply to comment #12)
> ... at least scheduling FRE is still on the list of possible things
> todo (can you check if that fixes 3713 as well?)
> 

No, it doesn't (unlike the testcase above, for which FRE is enough).
We have to remove the if-statement in (foo is a function):

  p$__pfn_25 = foo;
  D.1739_3 = (int) p$__pfn_25;
  D.1740_4 = D.1739_3 & 1;
  if (D.1740_4 != 0)
    goto <bb 3>;
  else
    goto <bb 4>;

Not even FRE combined with a subsequent fwprop (in their current form)
can make this happen :-/
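
For readers wondering where the `& 1` test comes from: assuming a target that follows the generic Itanium C++ ABI, a pointer to member function holds either the function's address (non-virtual case) or 1 plus the vtable offset (virtual case), so a call through such a pointer first tests the low bit. The sketch below only illustrates that kind of source; it is not the PR 3713 testcase, and `S`, `foo`, `bar`, `call_nonvirtual`, `call_virtual` and `check_calls` are made-up names.

```cpp
#include <cassert>

struct S {
  int foo(int a) { return a + 1; }          // non-virtual member
  virtual int bar(int a) { return a + 2; }  // virtual member, for contrast
  virtual ~S() {}
};

// The pointer value is a known constant (&S::foo), so the low-bit test
// guarding the virtual-dispatch path could in principle be folded away
// and the call made direct.
int call_nonvirtual(S &s) {
  int (S::*p)(int) = &S::foo;
  return (s.*p)(3);
}

// Same shape, but here the pointer really does denote a virtual function.
int call_virtual(S &s) {
  int (S::*p)(int) = &S::bar;
  return (s.*p)(3);
}

// Both calls must behave like ordinary member calls.
void check_calls() {
  S s;
  assert(call_nonvirtual(s) == s.foo(3));
  assert(call_virtual(s) == s.bar(3));
}
```

In `call_nonvirtual` the if-statement shown in the dump corresponds to the low-bit check; removing it requires the optimizer to know that the address of a function, cast to an integer, has its low bit clear.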
Comment 14 rguenther@suse.de 2009-07-29 10:57:01 UTC
Subject: Re: Function object abstraction penalty with inline functions.

On Wed, 29 Jul 2009, jamborm at gcc dot gnu dot org wrote:

> ------- Comment #13 from jamborm at gcc dot gnu dot org  2009-07-29 10:16 -------
> (In reply to comment #12)
> > ... at least scheduling FRE is still on the list of possible things
> > todo (can you check if that fixes 3713 as well?)
> > 
> 
> No, it doesn't (unlike the testcase above, for which FRE is enough).
> We have to remove the if-statement in (foo is a function):
> 
>   p$__pfn_25 = foo;
>   D.1739_3 = (int) p$__pfn_25;
>   D.1740_4 = D.1739_3 & 1;
>   if (D.1740_4 != 0)
>     goto <bb 3>;
>   else
>     goto <bb 4>;
> 
> Not even FRE combined with a subsequent fwprop (in their current form)
> can make this happen :-/

FRE has to be taught the & 1 simplification by looking through the
int cast.

Richard.
Comment 15 Richard Biener 2011-05-23 14:33:11 UTC
Works on trunk (but not during early inlining).  From early inlining we now get

<bb 2>:
  f.x = g;
  D.2126_6 = f.x;
  D.2127_7 = D.2126_6 (3);

and early FRE turns that into a direct call:

  f$x_8 = g;
  D.2126_6 = f$x_8;
  D.2125_7 = g (3);

which is then inlined by IPA inlining:

int main(int, char**) (int argc, char * * argv)
{
<bb 2>:
  return 0;

}

Fixed.
Comment 16 Richard Biener 2011-05-23 14:36:33 UTC
Author: rguenth
Date: Mon May 23 14:36:28 2011
New Revision: 174068

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=174068
Log:
2011-05-23  Richard Guenther  <rguenther@suse.de>

	PR tree-optimization/40874
	* g++.dg/tree-ssa/pr40874.C: New testcase.

Added:
    trunk/gcc/testsuite/g++.dg/tree-ssa/pr40874.C
Modified:
    trunk/gcc/testsuite/ChangeLog