Bug 45791 - Missed devirtualization
Summary: Missed devirtualization
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization (show other bugs)
Version: 4.6.0
: P3 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords:
Depends on:
Blocks: mozillametabug
  Show dependency treegraph
 
Reported: 2010-09-25 15:34 UTC by Jan Hubicka
Modified: 2021-01-07 17:38 UTC (History)
3 users (show)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2010-12-14 17:45:52


Attachments
partial fix (538 bytes, patch)
2010-09-25 16:44 UTC, Jan Hubicka
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Jan Hubicka 2010-09-25 15:34:55 UTC
Compiling
// PR rtl-optimization/36185
// { dg-do run }
// { dg-options "-O2 -fgcse-sm" }

struct Base {
        virtual ~Base() {}
        virtual void f() = 0;
};
struct Derived : Base {
        Derived();
        virtual void f() {}
};
struct Foo {
        Foo(Base&);
};
Derived::Derived() {
        Foo foo(*this);
}
Foo::Foo(Base& base) {
        base.f();
}
int main() {
        Derived d;
}

makes einline to produce:
  MEM[(struct Base *)&d]._vptr.Base = &_ZTV4Base[2];
  d.D.2114._vptr.Base = &_ZTV7Derived[2];
  D.2243_5 = &d.D.2114;
  D.2241_6 = MEM[(struct Base *)&d]._vptr.Base;
  D.2242_7 = MEM[(int (*__vtbl_ptr_type) (void) *)D.2241_6 + 16B];
  OBJ_TYPE_REF(D.2242_7;D.2243_5->2) (D.2243_5);
this should get devirtualized but doesn't
Comment 1 Jan Hubicka 2010-09-25 16:44:26 UTC
Created attachment 21881 [details]
partial fix

The problem seems to be that folding of obj_refs never sees actual ADDR_EXPR there.  Perhaps forwprop is supposed to propagate it there, but it doesn't and doing so would be bit late for inlining anyway.

The attached patch solves this part of problem.  It gets just the simple case, in general we should do SSA walk to try to prove that all definitions leads to same virtual call.

We still however miss devirtualization in earlier function:

;; Function Derived::Derived() (_ZN7DerivedC2Ev)

Derived::Derived() (struct Derived * const this)
{
  int (*__vtbl_ptr_type) (void) * D.2236;
  int (*__vtbl_ptr_type) (void) D.2235;
  struct Base * D.2214;

<bb 2>:
  MEM[(struct Base *)this_1(D)]._vptr.Base = &_ZTV4Base[2];
  this_1(D)->D.2114._vptr.Base = &_ZTV7Derived[2];
  D.2214_3 = &this_1(D)->D.2114;
  D.2236_9 = MEM[(struct Base *)this_1(D)]._vptr.Base;
  D.2235_10 = MEM[(int (*__vtbl_ptr_type) (void) *)D.2236_9 + 16B];
  OBJ_TYPE_REF(D.2235_10;D.2214_3->2) (D.2214_3);

<bb 3>:
  return;

<L0>:
  MEM[(struct Base *)this_1(D)]._vptr.Base = &_ZTV4Base[2];

<L1>:
  resx 1

}

Here IMO we don't see the fact that we are just constructing the object so we know the type.  Perhaps we can special case this pointer of constructors or so?
This seems rather important as this prevents sane inlining of constructors.

Honza
Comment 2 Jan Hubicka 2010-09-25 16:52:25 UTC
The 800 missed devirtualizations on Mozilla seems to be mostly AddRef that I can imagine is exactly this case of reference counting in constructor.
Comment 3 Jan Hubicka 2010-09-25 22:10:15 UTC
Hmm,
normally we should see it from COMPONENT_REF:
  while (true)
    {
      if (TREE_CODE (ref) == COMPONENT_REF)
        {
          tree par_type;
          tree binfo, base_binfo;
          tree field = TREE_OPERAND (ref, 1);

          if (!DECL_ARTIFICIAL (field))
            {
              tree type = TREE_TYPE (field);
              if (TREE_CODE (type) == RECORD_TYPE)
                return TYPE_BINFO (type);
              else
                return NULL_TREE;
            }
but we don't since it has DECL_ARTIFICIAL set.  What is the logic here?
Also what about i.e. ARRAY_REF and arrays of objects and COMPONENT_REFs translated to MEM_REFs?

Honza
Comment 4 Jan Hubicka 2010-09-25 22:12:39 UTC
Note that the patch attached solves one indirect call in the testcase but has no effect on mozilla.
Comment 5 Jan Hubicka 2010-09-25 22:17:41 UTC
Another testcase where we devirtualize via folding is:
// { dg-do assemble  }
// { dg-options "-g -O2" }

//  Copyright (C) 1999 Free Software Foundation, Inc.
//  Contributed by Nathan Sidwell 21 Nov 1999 <nathan@acm.org>

// This causes assember relocation errors

struct X
{
  virtual ~X () {}
};

struct Y
{
  Y (){};
};

void foo ()
{
  X *x = new X;
  x->~X ();
  Y ys[2];
}

compiled with -O2 we get
  x_3 = operator new (8);
  # DEBUG this => x_3
  x_3->_vptr.X = &_ZTV1X[2];
  # DEBUG x => x_3 
  D.2142_7 = (int (*__vtbl_ptr_type) (void)) __comp_dtor ;
  OBJ_TYPE_REF(D.2142_7;x_3->0) (x_3);
that gets folded only in ccp3. We need FRE to fold:
  x_3->_vptr.X = &_ZTV1X[2];
  # DEBUG x => x_3
  D.2141_6 = &_ZTV1X[2];
  D.2142_7 = *D.2141_6;
  OBJ_TYPE_REF(D.2142_7;x_3->0) (x_3);
into
  x_3 = operator new (8);
  # DEBUG this => x_3
  x_3->_vptr.X = &_ZTV1X[2];
  # DEBUG x => x_3
  D.2141_6 = x_3->_vptr.X;
  D.2142_7 = *D.2141_6;
  OBJ_TYPE_REF(D.2142_7;x_3->0) (x_3);
Comment 6 Martin Jambor 2010-10-11 17:15:02 UTC
Hi,

(In reply to comment #3)
> Hmm,
> normally we should see it from COMPONENT_REF:

I don't understand the sentence above.

>   while (true)
>     {
>       if (TREE_CODE (ref) == COMPONENT_REF)
>         {
>           tree par_type;
>           tree binfo, base_binfo;
>           tree field = TREE_OPERAND (ref, 1);
> 
>           if (!DECL_ARTIFICIAL (field))
>             {
>               tree type = TREE_TYPE (field);
>               if (TREE_CODE (type) == RECORD_TYPE)
>                 return TYPE_BINFO (type);
>               else
>                 return NULL_TREE;
>             }
> but we don't since it has DECL_ARTIFICIAL set.  What is the logic here?
> Also what about i.e. ARRAY_REF and arrays of objects and COMPONENT_REFs
> translated to MEM_REFs?

See testcase g++.dg/ipa/ivinline-5.C.  We have to differentiate
between fields which represent ancestors and field which are really
put there by the user.  MEM_REFs would indeed pose a problem here, we
would have to find the field by (a simpler version of) something like
build_user_friendly_ref_for_offset in tree-sra.c.  ARRAY_REFs cannot
represent ancestors and so are not a problem.

But you wrote the field is artificial so the code above should not be
an issue.  In fact, the code specifically does make sure it does not
traverse BINFOs of the first ancestors with virtual methods because
they do not have their own list of virtual methods
(http://gcc.gnu.org/ml/gcc-patches/2010-04/msg01458.html).
Comment 7 Martin Jambor 2010-10-11 18:40:42 UTC
And by the way, I'm afraid we really need to somehow address PR 45934
before we attempt to fold more O_T_Rs coming from
constructors/destructors.
Comment 8 Martin Jambor 2010-12-14 17:45:52 UTC
I've just confirmed that main of the testcase from the initial bug
description is optimized to nothing even by just the early optimizers
on trunk.  My dynamic-type change detection patches postpone that a
little bit, unfortunately (and inevitably) but the final result is the
same.  I believe we have testcases already for this.

As far as the testcase from comment #5 is concerned, that is quite
another matter because the object is dynamically allocated there.  If
the constructor is inlined, we may do this with improved folding of
O_T_R according to its first parameter.  If it is not, we would need
to be able to track the object interprocedurally to verify nothing bad
happens to it (like a call to a destructor followed by a call to
placement new).  And of course we would have to solve the "operator
new is not malloc" problem.
Comment 9 Jan Hubicka 2010-12-14 23:15:16 UTC
OK, main() code seems to optimize out that is an imrovement. Is it optimized away with your patch pre-IPA too? 

Derived() is also devirtualizable:

Derived::Derived() (struct Derived * const this)
{
  int (*__vtbl_ptr_type) (void) * D.2237;
  int (*__vtbl_ptr_type) (void) D.2236;
  struct Base * D.2215;

<bb 2>:
  D.2215_2 = &this_1(D)->D.2115;
  D.2215_2->_vptr.Base = &_ZTV4Base[2];
  this_1(D)->D.2115._vptr.Base = &_ZTV7Derived[2];
  D.2215_3 = &this_1(D)->D.2115;
  D.2237_9 = D.2215_3->_vptr.Base;
  D.2236_10 = MEM[(int (*__vtbl_ptr_type) (void) *)D.2237_9 + 16B];
  OBJ_TYPE_REF(D.2236_10;D.2215_3->2) (D.2215_3);

We fail to devirtualize pre-IPA.  After IPA we compile into:

<bb 2>:
  MEM[(struct Base *)this_1(D)]._vptr.Base = &_ZTV4Base[2];
  this_1(D)->D.2115._vptr.Base = &_ZTV7Derived[2];
  D.2215_3 = &this_1(D)->D.2115;
  D.2236_10 = (int (*__vtbl_ptr_type) (void)) f;

a type mismatch?
Comment 10 Jan Hubicka 2010-12-14 23:17:13 UTC
Eh,
wanted to paste:
  D.2236_10 = (int (*__vtbl_ptr_type) (void)) f;
  OBJ_TYPE_REF(D.2236_10;D.2215_3->2) (D.2215_3);
I told ccp should IMO optimize it, but doesn't. I guess it is because VRP1 works out the constant but then it doesn't know about OBJ_TYPE_REF folding.
Comment 11 Martin Jambor 2010-12-14 23:35:29 UTC
I believe Richi's comment #14 in PR 46076 applies here as well.
Comment 12 Jan Hubicka 2010-12-15 00:10:45 UTC
No, this is different, since OBJ_TYPE_REF is sitting here and it imply type conversion in the way we implement it right now.
There is no type mismatch in between the original address and the call argument, just the intermediate pointer is of wrong type.  VRP should do the same as CCP, see the code I added into CCP.
Comment 13 Jan Hubicka 2010-12-15 00:12:54 UTC
... but obviously the problem still is that we don't devirtualize this early enough for inlining.  The low level code should be able to do so if FRE+CCP was added as early pass or FRE was also extended to fold the OBJ_TYEP_REF calls away and added as early pass.

Now when Richi fixed it, it should even be possible. However, can't this be also handled by type based code?
Comment 14 Martin Jambor 2010-12-15 16:07:31 UTC
(In reply to comment #9)
> OK, main() code seems to optimize out that is an imrovement. Is it optimized
> away with your patch pre-IPA too? 

Yes.  Just before IPA, in fact.

> 
> Derived() is also devirtualizable:
> 

It could be done intraprocedurally if, unlike for automatically
allocated decls, we considered all calls as potentially changing the
dynamic type in unknown ways while doing the dynamic type change
detection.  This then however becomes essentially the same thing as
devirtualization based on the constant folding, perhaps only more
complicated.

It could be optimized by IPA-CP if we can confirm that all callers are
automatically allocated (or that we generally have them under
control).  This should be relatively easy, I will have a look at that.
Mainly because we may be able to widen scope of objects under our
control later.
Comment 15 Matthijs Kooijman 2014-02-24 09:44:08 UTC
I ran into another variant of this problem, which I reduced to the following testcase. I found the problem on 4.8.2, but it is already fixed in trunk / gcc-4.9 (Debian 4.9-20140218-1). Still, it might be useful to have the testcase here for reference.

class Base { };

class Sub : public Base {
public: 
        virtual void bar();
};

Sub foo;
Sub * const pointer = &foo;
Sub* function() { return &foo; };

int main() {
        // Gcc 4.8.2 devirtualizes this:
        pointer->bar();
        // but not this:
        function()->bar();
}
Comment 16 Jan Hubicka 2014-09-25 20:29:19 UTC
With new polymorphic call code we could handle the testcase from comment 15 by simple propagation of contextes down during early optimization. This may be easy enough to be worth to implement.

Still general solution is to add return functions to ipa-prop.
Comment 17 Eric Gallager 2019-05-25 22:31:19 UTC
This is the last bug still open blocking bug 45375
Comment 18 Martin Jambor 2021-01-07 17:38:56 UTC
The release_ssa dump of function main in the testcase from bug
description is just "return 0;"

The release_ssa dump of function main the testcase from comment #15
is:

int main ()
{
  <bb 2> [local count: 1073741824]:
  Sub::bar (&foo);
  Sub::bar (&foo);
  return 0;

}
...so devirtualized in my book (main is cold, so they are not inlined,
of course).


The release_ssa dump of function foo in the testcase from comment #5
is:

void foo ()
{
  void * _3;

  <bb 2> [local count: 1073741824]:
  _3 = operator new (8);
  MEM[(struct X *)_3]._vptr.X = &MEM <int (*) ()[4]> [(void *)&_ZTV1X + 16B];
  X::~X (_3);
  return;

}
...so also devirtualzied before IPA.

Therefore I am going to close this old bug as done.  If anyone finds a
different testcase where the early optimizers should do a better job
to help devirtualziation, please file a new bug.