Bug 46507 - std::get and devirtualization on non-automatic variables
Summary: std::get and devirtualization on non-automatic variables
Status: ASSIGNED
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization (show other bugs)
Version: 4.6.0
: P3 normal
Target Milestone: ---
Assignee: Jan Hubicka
URL:
Keywords: missed-optimization
Depends on:
Blocks:
 
Reported: 2010-11-16 17:57 UTC by miles
Modified: 2014-01-16 18:02 UTC (History)
5 users (show)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2014-01-16 00:00:00


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description miles 2010-11-16 17:57:40 UTC
My understanding is that std::tuple should offer largely the same optimization opportunities as std::pair (while being more flexible).

However gcc-4.6 seems to miss some optimizations with std::tuple that it doesn't with pair.

E.g. the following code ("tp.cc"):

---- start ----
   #include <tuple>
   #include <utility>

   struct A
   {
     virtual void f () const;
   };

   void arg_tuple_test (const std::tuple<A> &t)
   {
     std::get<0>(t).f ();
   }

   void extern_tuple_test ()
   {
     extern const std::tuple<A> &t;
     std::get<0>(t).f ();
   }

   void arg_pair_test (const std::pair<A,A> &t)
   {
     t.first.f ();
   }
---- end ----

compiled with:

  g++-snapshot -std=c++0x -O2 -march=amdfam10 -S tp.cc

results in the following assembly:

   arg_tuple_test(std::tuple<A> const&):
	   movq	(%rdi), %rax
	   movq	(%rax), %rax
	   jmp	*%rax

   extern_tuple_test():
	   movq	t(%rip), %rdi
	   movq	(%rdi), %rax
	   movq	(%rax), %rax
	   jmp	*%rax

   arg_pair_test(std::pair<A, A> const&):
	   jmp	A::f() const

	   .ident	"GCC: (Debian 20101114-1) 4.6.0 20101114 (experimental) [trunk revision 166728]"


It seems like all of these functions should use a direct jump to A::f, but the tuple versions do not.

Note that when I tried this same test yesterday, on a different machine (but the same compiler version), "extern_tuple_test" (but not "arg_tuple_test") _did_ result in a direct jump to A::f!  So maybe something funny is going on...


Thanks,

-Miles
Comment 1 Paolo Carlini 2010-11-16 19:04:37 UTC
Interesting. Not that I think it would make a huge difference, but probably this is a more "fair" comparison:

   #include <tuple>
   #include <utility>

   struct A
   {
     virtual void f () const;
   };

   void arg_tuple_test (const std::tuple<A,A> &t)
   {
     std::get<0>(t).f ();
   }

   void extern_tuple_test ()
   {
     extern const std::tuple<A, A> &t;
     std::get<0>(t).f ();
   }

   void arg_pair_test (const std::pair<A,A> &t)
   {
     std::get<0>(t).f ();
   }
Comment 2 miles 2010-11-16 19:06:02 UTC
> Note that when I tried this same test yesterday, on a different machine (but
> the same compiler version), "extern_tuple_test" (but not "arg_tuple_test")
> _did_ result in a direct jump to A::f!  So maybe something funny is going on...

Actually this is wrong:  the version of "extern_tuple_test" which worked yesterday was really this:

   void extern_tuple_test ()
   {
     extern const std::tuple<A> t;
     std::get<0>(t).f ();
   }

resulting in:

   extern_tuple_test():
	movl	$t, %edi
	jmp	A::f() const

Still, it seems like "arg_tuple_test" (and the previous version of "extern_tuple_tst") should also received the same optimization.
Comment 3 Jonathan Wakely 2010-11-16 19:27:48 UTC
Paolo, your example *does* show a significant difference:  the missed optimisation is in std::get() and not tuple itself, as shown by 

#include <tuple>
#include <utility>

struct A
{
    virtual void f () const;
};

void arg_tuple_test (const std::pair<A,A> &t)
{
    std::get<0>(t).f ();
}

void arg_pair_test (const std::pair<A,A> &t)
{
    t.first.f ();
}

The second function jumps directly to A::f
Comment 4 Marc Glisse 2013-09-14 21:35:43 UTC
In t.first.f(), the front-end knows that t.first is really an object of type A, not something derived, so it doesn't need to do a virtual call (I am just guessing, I didn't check in the FE sources). On the other hand, std::get returns an A&, which as far as the front-end knows could really be an object of a derived type. The compiler can only know better after inlining. However, gcc now knows how to devirtualize a number of calls, so there is hope.

Martin, do you have an opinion on the testcase in comment 3? We get:

  _2 = &t_1(D)->first;
  _4 = MEM[(const struct type *)t_1(D)]._vptr.A;
  _5 = *_4;
  OBJ_TYPE_REF(_5;_2->0) (_2); [tail call]

vs

  _2 = &t_1(D)->first;
  A::f (_2); [tail call]
Comment 5 Marc Glisse 2013-10-12 20:13:35 UTC
I just read this comment in ipa-prop.c: "only for automatically allocated objects but at the moment we devirtualize only these". And indeed, if we define an automatic variable of type A and call arg_tuple_test on it (and that call is inlined), then the call to f is devirtualized. So this is a known limitation of the current devirtualization code. I don't think there is anything we can do at the library level.
Comment 6 Martin Jambor 2013-11-08 14:41:13 UTC
(In reply to Marc Glisse from comment #4)
> Martin, do you have an opinion on the testcase in comment 3? We get:
> 
>   _2 = &t_1(D)->first;
>   _4 = MEM[(const struct type *)t_1(D)]._vptr.A;
>   _5 = *_4;
>   OBJ_TYPE_REF(_5;_2->0) (_2); [tail call]

Assuming that first is a non-artificial (as in DECL_ARTIFICIAL) field,
we should be able to devirtualize this but as you have already
noticed, our current type-based middle-end intraprocedural
devirtualization machinery only works on automatic variables, but this
example shows another interesting situation.  Thanks for the testcase,
I will try to make it work after I cleanup the the code to better
interoperate with Honza's recent devirtualization work.
Comment 7 Jan Hubicka 2014-01-16 06:10:40 UTC
OK, GIMPLE representation is pre-inlining as follows:
void arg_tuple_test(const std::pair<A, A>&) (const struct pair & t)
{
  const struct type & _2;
  const struct type & _3;
  int (*__vtbl_ptr_type) () * _5;
  int (*__vtbl_ptr_type) () _6;

  <bb 2>:
  _2 = std::get<0ul, A, A> (t_1(D));
  _3 = _2;
  _5 = _3->_vptr.A;
  _6 = *_5;
  OBJ_TYPE_REF(_6;(const struct A)_3->0) (_3);
  return;

}
Here I do not think we can devirtualize, because std::get can return a derived
type in general.

After inlining we get:
void arg_tuple_test(const std::pair<A, A>&) (const struct pair & t)
{
  int (*__vtbl_ptr_type) () * _3;
  int (*__vtbl_ptr_type) () _4;
  const struct A & _6;

  <bb 2>:
  _6 = &t_1(D)->first;
  _3 = MEM[(const struct type *)t_1(D)]._vptr.A;
  _4 = *_3;
  OBJ_TYPE_REF(_4;(const struct A)_6->0) (_6); [tail call]
  return;

}
Here we look all the way to figuring out that the object comes as parameter of arg_tuple_test and give up on this point.
I think GIMPLE typing rules does not really allow us to take the type from COMPONENT_REF in statement defining _6, so it seems we completely lose track of the type info here. (and it indeed looks like a common case)

Richard?
Comment 8 Jan Hubicka 2014-01-16 06:17:56 UTC
Hmm, also if there is anything in standard that prevents user to pass object of type A by reference to type B (where A is not derived from B) then we can get the type info from type of parameter t.
Comment 9 rguenther@suse.de 2014-01-16 08:51:31 UTC
On Thu, 16 Jan 2014, hubicka at gcc dot gnu.org wrote:

> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46507
> 
> --- Comment #8 from Jan Hubicka <hubicka at gcc dot gnu.org> ---

void arg_tuple_test(const std::pair<A, A>&) (const struct pair & t)
{
  int (*__vtbl_ptr_type) () * _3;
  int (*__vtbl_ptr_type) () _4;
  const struct A & _6;

  <bb 2>:
  _6 = &t_1(D)->first;
  _3 = MEM[(const struct type *)t_1(D)]._vptr.A;
  _4 = *_3;
  OBJ_TYPE_REF(_4;(const struct A)_6->0) (_6); [tail call]
  return;

}
Here we look all the way to figuring out that the object comes as 
parameter
of
arg_tuple_test and give up on this point.
I think GIMPLE typing rules does not really allow us to take the type from
COMPONENT_REF in statement defining _6, so it seems we completely lose 
track
of
the type info here. (and it indeed looks like a common case)

Richard?

That is correct.  &t_1(D)->first is merely pointer offsetting and
we may not forward it to form t_1(D)->first._vptr.A for example
(there were wrong-code bugs because of such forwarding in the past).

In GIMPLE you don't have type information on pointed to types, you only
have type info for values.

> Hmm, also if there is anything in standard that prevents user to pass 
> object of type A by reference to type B (where A is not derived from B) 
> then we can get the type info from type of parameter t.

"Standards" do not apply to GIMPLE so you have to be very careful
in this area (one reason why I am not exactly comfortable about
all this devirt and ODR stuff done in the middle-end as opposed
to in a frontend).

Richard.
Comment 10 Jan Hubicka 2014-01-16 18:02:24 UTC
I agree we want to do as much as possible in FE.

ODR decisions are basically done in frontend now - we basically use mangling of virtual tables given to us by C++ FE.  Eventually we will want to get ODR information on all types, not only polymorphic ones.  Then I plan to go with Jason's suggested way of calling decl_assembler name on types to get mangling from C++ FE rather than my initial attempt to actually compare types for equivalence by their high level names that hits interesting issues with templates. It is left in code mostly as backup if the other solution blows up our memory footprint.

I do not think we can do type inheritance analysis + devirtualization purely by frotnend, since it is naturally an inter-procuedural problem that needs to be handled by LTO. We already have sane way to represent the type inheritance tree by binfos.  What we  need to work on is a sane and safe way representing the relevant type information that is given to you by C++ standard but thrown away on way to GIMPLE so the propagation can work.

I was thinking of type assert expressions that can be inserted by FE on places where dynamic type is known for non-trivial reasons. Those can be used as starting points for the propagation of type information and removed after IPA. This would allow us to preserve information while inlining and also make it easier to deal with dynamic type changes. For simple operations, like in this testcase, this approach may be however impractical producing too many new gimple statements.

I am slowly getting to stage where current devirt infrastructure seems to work as expected and seems to use most of information already there in gimple that seems to be safe to assume.

My plan was to basically wait for this release cycle, see how many bugs shows up so I am sure I do not miss something obvious (well, like the virtual inheritance fun).

I think only parts we miss now is to
 - move ipa-cp to new infrastructure
 - move dynamic type change code to ipa-devirt and make it work with contextes
 - understand ctors/dtors calls, since we already ave flag marking them specially.

I also collected testcases we need to handle. I will try to sit down and prepare some proposal about what we need to preserve in gimple.
I think we also want central place (probably in ipa-devirt toplevel comment) documenting all the information we take into account and assumptions we make.
So far they are mostly trivial (i.e. static vars/automatic vars/values passed by invisible references are known to have their type and not derived type) except for Martin's dynamic changing code.