Bug 59948 - Optimize std::function
Summary: Optimize std::function
Status: NEW
Alias: None
Product: gcc
Classification: Unclassified
Component: ipa (show other bugs)
Version: 4.9.0
: P3 enhancement
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords: missed-optimization
Depends on:
Blocks:
 
Reported: 2014-01-26 11:31 UTC by Marc Glisse
Modified: 2019-10-05 04:07 UTC (History)
4 users (show)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2016-08-12 00:00:00


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Marc Glisse 2014-01-26 11:31:29 UTC
#include <functional>

inline int f(int i){return i-1;}
int m(){
  std::function<int(int)> h=f;
  return h(1);
}

clang manages to optimize this to just:
int m(){return 0;}

However, g++ is stuck with a much longer code (comments inline):

int m() ()
{
  void * D.29797;
  struct function h;
  bool (*<T5503>) (union _Any_data &, const union _Any_data &, _Manager_operation) _8;
  bool (*<T5503>) (union _Any_data &, const union _Any_data &, _Manager_operation) _9;
  int (*<T590e>) (const union _Any_data &, int) _24;
  int _26;

  <bb 2>:
  MEM[(struct _Function_base *)&h]._M_manager = 0B;
  if (f != 0B)
// Shouldn't we know that f!=0? It is defined just above.
    goto <bb 3>;
  else
    goto <bb 4>;

  <bb 3>:
  MEM[(int (*<T5912>) (int) *)&h] = f;
  h._M_invoker = _M_invoke;
  h.D.26519._M_manager = _M_manager;
  if (_M_manager == 0B)
// Same, shouldn't we know that _M_manager!=0?
    goto <bb 4>;
  else
    goto <bb 5>;

  <bb 4>:
  std::__throw_bad_function_call ();

  <bb 5>:
  _24 = h._M_invoker;
// Why not _M_invoke directly, we can only arrive here from bb3
// and nothing can clobber h._M_invoker. Then we might have a chance
// to inline the next line.
  _26 = _24 (&h.D.26519._M_functor, 1);

  <bb 6>:
  _8 = MEM[(struct _Function_base *)&h]._M_manager;
// I guess we need to fix the call to _24 above to know
// there is nothing clobbering h._M_manager.
  if (_8 != 0B)
    goto <bb 7>;
  else
    goto <bb 8>;

  <bb 7>:
  _8 (&MEM[(struct _Function_base *)&h]._M_functor, &MEM[(struct _Function_base *)&h]._M_functor, 3);

  <bb 8>:
  h ={v} {CLOBBER};
  h ={v} {CLOBBER};
  return _26;

<L3>:
  _9 = MEM[(struct _Function_base *)&h]._M_manager;
  if (_9 != 0B)
    goto <bb 10>;
  else
    goto <bb 11>;

  <bb 10>:
  _9 (&MEM[(struct _Function_base *)&h]._M_functor, &MEM[(struct _Function_base *)&h]._M_functor, 3);

  <bb 11>:
  h ={v} {CLOBBER};
  _10 = __builtin_eh_pointer (2);
  __builtin_unwind_resume (_10);

}

Trying to modify std::function by removing tests, compiling with -fno-exceptions, etc, I got to situations with f(1) (not inlined), or we still had an uninlined call to _M_manager (which, inlined, would reduce to nothing), but never a clean return 0.

If this simple example gets fixed, please check on the example in http://gcc.gnu.org/ml/gcc/2014-01/msg00261.html

Related issues:
PR 45631
PR 45632
PR 47413
(PR 56551)
Comment 1 Marc Glisse 2014-01-26 13:24:53 UTC
(In reply to Marc Glisse from comment #0)
>   if (f != 0B)
> // Shouldn't we know that f!=0? It is defined just above.

This happens because the test in fold-const.c is:
          return !VAR_OR_FUNCTION_DECL_P (base) || !DECL_WEAK (base);
and f is marked as weak. If I remove 'inline' or put 'static' then this code is folded to true. However, I can't easily do the same for _M_manager. The condition in fold-const.c probably needs to be weakened (no pun), as I strongly doubt it is legal to replace an inline f with 0.

Note that cp/class.c contains this:
          /* ??? Probably should check DECL_WEAK here.  */
          if (t && DECL_P (t))
            *nonnull = 1;

which we may want to keep in sync.

With f static, we get the shorter but still not so good:

int m() ()
{
  struct function h;
  int _21;

  <bb 2>:
  MEM[(int (*<T58fa>) (int) *)&h] = f;
  h._M_invoker = _M_invoke;
  h.D.26465._M_manager = _M_manager;
  if (_M_manager == 0B)
// Can't make it non-weak
    goto <bb 3>;
  else
    goto <bb 4>;

  <bb 3>:
  std::__throw_bad_function_call ();

  <bb 4>:
  _21 = f (1);
// Did we notice the function is really f too late for inlining?
  std::_Function_base::_Base_manager<int (*)(int)>::_M_manager (&MEM[(struct _Function_base *)&h]._M_functor, &MEM[(struct _Function_base *)&h]._M_functor, 3);
// Again, not inlined
  h ={v} {CLOBBER};
  h ={v} {CLOBBER};
//Maybe we could remove one of the clobbers somewhere along the way?
  return _21;

}

If I change fold-const.c (the condition is probably wrong, I just wanted to move forward):
-  return !VAR_OR_FUNCTION_DECL_P (base) || !DECL_WEAK (base);
+  return !VAR_OR_FUNCTION_DECL_P (base) || !DECL_WEAK (base) || DECL_DECLARED_INLINE_P (base);

It removes the test _M_manager == 0, but it still inlines neither f nor _M_manager.
Comment 2 Richard Biener 2014-01-29 13:27:26 UTC
Honza will know the correct condition ...

As for optimization the issue is that

  <bb 2>:
  MEM[(struct _Function_base *)&h]._M_manager = 0B;
  MEM[(int (*<T5869>) (int) *)&h] = f;
  h._M_invoker = _M_invoke;
  h.D.26161._M_manager = _M_manager;
  _4 = std::function<int(int)>::operator() (&h, 1);

  <bb 3>:
  _5 = _4;
  _11 = MEM[(struct _Function_base *)&h]._M_manager;
  if (_11 != 0B)
    goto <bb 4>;
  else
    goto <bb 5>;

  <bb 4>:
  _11 (&MEM[(struct _Function_base *)&h]._M_functor, &MEM[(struct _Function_base *)&h]._M_functor, 3);

the call to std::function<int(int)>::operator() (&h, 1) may clobber h and
thus the _M_manager field.  This function is not early-inlined at -O[23]
and thus the call is not made direct before IPA inlining (or earlier).

Considering inline candidate _Res std::function<_Res(_ArgTypes ...)>::operator()(_ArgTypes ...) const [with _Res = int; _ArgTypes = {int}].
   Estimating body: _Res std::function<_Res(_ArgTypes ...)>::operator()(_ArgTypes ...) const [with _Res = int; _ArgTypes = {int}]/360
   Known to be false: not inlined
   size:10 time:21
   Estimating body: _Res std::function<_Res(_ArgTypes ...)>::operator()(_ArgTypes ...) const [with _Res = int; _ArgTypes = {int}]/360
   Known to be false: not inlined
   size:10 time:21
  will not early inline: int m()/273->_Res std::function<_Res(_ArgTypes ...)>::operator()(_ArgTypes ...) const [with _Res = int; _ArgTypes = {int}]/360, growth 6 exceeds --param early-inlining-insns divided by number of calls

forcing that with --param early-inlining-insns=100 arrives at

  <bb 2>:
  MEM[(struct _Function_base *)&h]._M_manager = 0B;
  MEM[(int (*<T5869>) (int) *)&h] = f;
  h._M_invoker = _M_invoke;
  h.D.26161._M_manager = _M_manager;
  _14 = std::_Function_handler<int(int), int (*)(int)>::_M_invoke (&h.D.26161._M_functor, 1);

  <bb 3>:
  _15 = MEM[(struct _Function_base *)&h]._M_manager;

which has the same problem - just with another function call which the
early inliner doesn't see.
Comment 3 Jan Hubicka 2014-02-04 02:35:15 UTC
The code in fold-const for nonzero check is really broken.  I have somewerhe WIP symtab patch for doing this, but it is not completely trivial to hook it into fold-const when symtab is not built yet - just as in this case and in some cases for LTO you really want to know if symbol is defined...
Comment 4 Jan Hubicka 2014-02-04 03:00:43 UTC
About the inlining issue,  am not really sure how to handle this without iterating optimizers and inliner like llvm does (Maxim had patch for this).
I wonder if we can't just declare the operator () always_inline to make sure it is early inlined?

It gets me to _M_empty call that we fail to inline because we do not inline into alwaysinlines (because of cycles) but then we do not iterate the early inliner anymore so we do not inline after inlining always inlines.  This is quite broken :(

Marking those two always inline should get things possible to be analyzed...
Comment 5 Marc Glisse 2014-05-02 14:06:20 UTC
(In reply to Jan Hubicka from comment #3)
> The code in fold-const for nonzero check is really broken.  I have somewerhe
> WIP symtab patch for doing this, but it is not completely trivial to hook it
> into fold-const when symtab is not built yet - just as in this case and in
> some cases for LTO you really want to know if symbol is defined...

For this case I don't think you really need to do it in fold-const, it should be fine to handle it elsewhere later (VRP may not be enough though), or to do it in fold-const but skip it if symtab is not ready. But if LTO has stronger requirements...

(In reply to Jan Hubicka from comment #4)
> About the inlining issue,  am not really sure how to handle this without
> iterating optimizers and inliner like llvm does (Maxim had patch for this).
> I wonder if we can't just declare the operator () always_inline to make sure
> it is early inlined?

I'd rather avoid relying on always_inline, so it still works if people use boost::function or other similar code instead. But maybe temporarily...

> It gets me to _M_empty call that we fail to inline because we do not inline
> into alwaysinlines (because of cycles) but then we do not iterate the early
> inliner anymore so we do not inline after inlining always inlines.  This is
> quite broken :(

Any news on Maxim's patch? There was a discussion in 2011, but I can't find anything more recent, and things have changed a bit since then...
Comment 6 Jan Hubicka 2014-09-26 16:54:13 UTC
The nonzero check rewrite is already in tree, so .optimized dump is now better:

int m() ()
{
  struct function h;
  int _23;

  <bb 2>:
  MEM[(int (*<T5861>) (int) *)&h] = f;
  h._M_invoker = _M_invoke;
  h.D.26130._M_manager = _M_manager;
  _23 = f (1);
  std::_Function_base::_Base_manager<int (*)(int)>::_M_manager (&MEM[(struct _Function_base *)&h]._M_functor, &MEM[(struct _Function_base *)&h]._M_functor, 3);
  h ={v} {CLOBBER};
  return _23;
}

We fail to inline f because at .release_ssa time it is still passed via memory:

int m() ()
{
  struct function h;
  bool (*<T5408>) (union _Any_data &, const union _Any_data &, _Manager_operation) _2;
  int _4;
  bool (*<T5408>) (union _Any_data &, const union _Any_data &, _Manager_operation) _5;

  <bb 2>:
  MEM[(struct _Function_base *)&h]._M_manager = 0B;
  MEM[(int (*<T5861>) (int) *)&h] = f;
  h._M_invoker = _M_invoke;
  h.D.26130._M_manager = _M_manager;
  _4 = std::function<int(int)>::operator() (&h, 1);

Where operator () is:

_Res std::function<_Res(_ArgTypes ...)>::operator()(_ArgTypes ...) const [with _Res = int; _ArgTypes = {int}] (const struct function * const this, int __args#0)
{
  bool (*<T543f>) (union _Any_data &, const union _Any_data &, _Manager_operation) _3;
  int (*<T585d>) (const union _Any_data &, int) _4;
  const union _Any_data * _5;
  int _7;

  <bb 2>:
  _3 = MEM[(bool (*<T5408>) (union _Any_data &, const union _Any_data &, _Manager_operation) *)this_1(D) + 16B];
...

We have

  Jump functions of caller  int m()/290:
    callsite  int m()/290 -> _Res std::function<_Res(_ArgTypes ...)>::operator()(_ArgTypes ...) const [with _Res = int; _ArgTypes = {int}]/361 :
       param 0: UNKNOWN
         Aggregate passed by reference:
           offset: 0, cst: f
           offset: 128, cst: _M_manager
           offset: 192, cst: _M_invoke
       param 1: CONST: 1

So we know that f is passed in the aggregate.

We also know operator () is using it:

  Jump functions of caller  _Res std::function<_Res(_ArgTypes ...)>::operator()(_ArgTypes ...) const [with _Res = int; _ArgTypes = {int}]/361:
    callsite  _Res std::function<_Res(_ArgTypes ...)>::operator()(_ArgTypes ...) const [with _Res = int; _ArgTypes = {int}]/361 -> void std::__throw_bad_function_call()/539 :
    indirect aggregate callsite, calling param 0, offset 192, by reference, for stmt _7 = _4 (_5, __args#0_9(D));
       param 0: UNKNOWN
       param 1: PASS THROUGH: 1, op nop_expr

Later we propagate this call into _M_invoke and we lost track of f() that is invoked by it.  We do have jump functoin here too:

  Jump functions of caller  static _Res std::_Function_handler<_Res(_ArgTypes ...), _Functor>::_M_invoke(const std::_Any_data&, _ArgTypes ...) [with _Res = int; _Functor = int (*)(int); _A
    indirect aggregate callsite, calling param 0, offset 0, by reference, for stmt _5 = _3 (__args#0_6(D));
       param 0: PASS THROUGH: 1, op nop_expr

Martin, what is going wrong here?
Comment 7 Marc Glisse 2015-02-25 22:27:14 UTC
There has been huge progress in gcc-5:

int m() ()
{
  struct function h;

  <bb 2>:
  MEM[(int (*<T5c6a>) (int) *)&h] = f;
  h._M_invoker = _M_invoke;
  h.D.27699._M_manager = _M_manager;
  std::_Function_base::_Base_manager<int (*)(int)>::_M_manager (&MEM[(struct _Function_base *)&h]._M_functor, &MEM[(struct _Function_base *)&h]._M_functor, 3);
  h ={v} {CLOBBER};
  return 0;

}

Apparently we managed to inline f! For this simple testcase, all that is missing is inlining _M_manager to realize that when the last argument is 3 it does nothing. Then we can go back to more complicated examples like the one linked at the end of comment #0.