Compiling

// PR rtl-optimization/36185
// { dg-do run }
// { dg-options "-O2 -fgcse-sm" }

struct Base
{
  virtual ~Base() {}
  virtual void f() = 0;
};

struct Derived : Base
{
  Derived();
  virtual void f() {}
};

struct Foo
{
  Foo(Base&);
};

Derived::Derived()
{
  Foo foo(*this);
}

Foo::Foo(Base& base)
{
  base.f();
}

int main()
{
  Derived d;
}

makes einline produce:

  MEM[(struct Base *)&d]._vptr.Base = &_ZTV4Base[2];
  d.D.2114._vptr.Base = &_ZTV7Derived[2];
  D.2243_5 = &d.D.2114;
  D.2241_6 = MEM[(struct Base *)&d]._vptr.Base;
  D.2242_7 = MEM[(int (*__vtbl_ptr_type) (void) *)D.2241_6 + 16B];
  OBJ_TYPE_REF(D.2242_7;D.2243_5->2) (D.2243_5);

This should get devirtualized, but doesn't.
Created attachment 21881 [details]
partial fix

The problem seems to be that folding of obj_refs never sees an actual ADDR_EXPR there. Perhaps forwprop is supposed to propagate it there, but it doesn't, and doing so would be a bit late for inlining anyway.

The attached patch solves this part of the problem. It handles just the simple case; in general we should do an SSA walk to try to prove that all definitions lead to the same virtual call.

We still, however, miss devirtualization in the earlier function:

;; Function Derived::Derived() (_ZN7DerivedC2Ev)

Derived::Derived() (struct Derived * const this)
{
  int (*__vtbl_ptr_type) (void) * D.2236;
  int (*__vtbl_ptr_type) (void) D.2235;
  struct Base * D.2214;

<bb 2>:
  MEM[(struct Base *)this_1(D)]._vptr.Base = &_ZTV4Base[2];
  this_1(D)->D.2114._vptr.Base = &_ZTV7Derived[2];
  D.2214_3 = &this_1(D)->D.2114;
  D.2236_9 = MEM[(struct Base *)this_1(D)]._vptr.Base;
  D.2235_10 = MEM[(int (*__vtbl_ptr_type) (void) *)D.2236_9 + 16B];
  OBJ_TYPE_REF(D.2235_10;D.2214_3->2) (D.2214_3);

<bb 3>:
  return;

<L0>:
  MEM[(struct Base *)this_1(D)]._vptr.Base = &_ZTV4Base[2];

<L1>:
  resx 1

}

Here IMO we don't see the fact that we are just constructing the object, so we know the type. Perhaps we can special-case the this pointer of constructors or so? This seems rather important, as it prevents sane inlining of constructors.

Honza
The 800 missed devirtualizations on Mozilla seem to be mostly AddRef, which I imagine is exactly this case of reference counting in a constructor.
Hmm, normally we should see it from the COMPONENT_REF:

  while (true)
    {
      if (TREE_CODE (ref) == COMPONENT_REF)
	{
	  tree par_type;
	  tree binfo, base_binfo;
	  tree field = TREE_OPERAND (ref, 1);

	  if (!DECL_ARTIFICIAL (field))
	    {
	      tree type = TREE_TYPE (field);
	      if (TREE_CODE (type) == RECORD_TYPE)
		return TYPE_BINFO (type);
	      else
		return NULL_TREE;
	    }

but we don't, since it has DECL_ARTIFICIAL set. What is the logic here? Also, what about e.g. ARRAY_REF and arrays of objects, and COMPONENT_REFs translated to MEM_REFs?

Honza
Note that the attached patch solves one indirect call in the testcase but has no effect on Mozilla.
Another testcase where we devirtualize via folding is:

// { dg-do assemble }
// { dg-options "-g -O2" }
// Copyright (C) 1999 Free Software Foundation, Inc.
// Contributed by Nathan Sidwell 21 Nov 1999 <nathan@acm.org>

// This causes assembler relocation errors

struct X
{
  virtual ~X () {}
};

struct Y
{
  Y (){};
};

void foo ()
{
  X *x = new X;
  x->~X ();
  Y ys[2];
}

Compiled with -O2 we get:

  x_3 = operator new (8);
  # DEBUG this => x_3
  x_3->_vptr.X = &_ZTV1X[2];
  # DEBUG x => x_3
  D.2142_7 = (int (*__vtbl_ptr_type) (void)) __comp_dtor ;
  OBJ_TYPE_REF(D.2142_7;x_3->0) (x_3);

which gets folded only in ccp3. We need FRE to fold:

  x_3->_vptr.X = &_ZTV1X[2];
  # DEBUG x => x_3
  D.2141_6 = &_ZTV1X[2];
  D.2142_7 = *D.2141_6;
  OBJ_TYPE_REF(D.2142_7;x_3->0) (x_3);

into:

  x_3 = operator new (8);
  # DEBUG this => x_3
  x_3->_vptr.X = &_ZTV1X[2];
  # DEBUG x => x_3
  D.2141_6 = x_3->_vptr.X;
  D.2142_7 = *D.2141_6;
  OBJ_TYPE_REF(D.2142_7;x_3->0) (x_3);
Hi,

(In reply to comment #3)
> Hmm,
> normally we should see it from COMPONENT_REF:

I don't understand the sentence above.

> while (true)
>   {
>     if (TREE_CODE (ref) == COMPONENT_REF)
>       {
>         tree par_type;
>         tree binfo, base_binfo;
>         tree field = TREE_OPERAND (ref, 1);
>
>         if (!DECL_ARTIFICIAL (field))
>           {
>             tree type = TREE_TYPE (field);
>             if (TREE_CODE (type) == RECORD_TYPE)
>               return TYPE_BINFO (type);
>             else
>               return NULL_TREE;
>           }
> but we don't since it has DECL_ARTIFICIAL set. What is the logic here?
> Also what about i.e. ARRAY_REF and arrays of objects and COMPONENT_REFs
> translated to MEM_REFs?

See testcase g++.dg/ipa/ivinline-5.C. We have to differentiate between fields which represent ancestors and fields which are really put there by the user.

MEM_REFs would indeed pose a problem here; we would have to find the field by (a simpler version of) something like build_user_friendly_ref_for_offset in tree-sra.c. ARRAY_REFs cannot represent ancestors and so are not a problem.

But you wrote the field is artificial, so the code above should not be an issue. In fact, the code specifically makes sure it does not traverse BINFOs of the first ancestors with virtual methods, because they do not have their own list of virtual methods (http://gcc.gnu.org/ml/gcc-patches/2010-04/msg01458.html).
And by the way, I'm afraid we really need to somehow address PR 45934 before we attempt to fold more O_T_Rs coming from constructors/destructors.
I've just confirmed that main of the testcase from the initial bug description is optimized to nothing even by just the early optimizers on trunk. My dynamic-type change detection patches postpone that a little bit, unfortunately (and inevitably), but the final result is the same. I believe we already have testcases for this.

As far as the testcase from comment #5 is concerned, that is quite another matter, because the object is dynamically allocated there. If the constructor is inlined, we may do this with improved folding of O_T_R according to its first parameter. If it is not, we would need to be able to track the object interprocedurally to verify nothing bad happens to it (like a call to a destructor followed by a call to placement new). And of course we would have to solve the "operator new is not malloc" problem.
OK, the main() code seems to be optimized out; that is an improvement. Is it optimized away with your patch pre-IPA too?

Derived() is also devirtualizable:

Derived::Derived() (struct Derived * const this)
{
  int (*__vtbl_ptr_type) (void) * D.2237;
  int (*__vtbl_ptr_type) (void) D.2236;
  struct Base * D.2215;

<bb 2>:
  D.2215_2 = &this_1(D)->D.2115;
  D.2215_2->_vptr.Base = &_ZTV4Base[2];
  this_1(D)->D.2115._vptr.Base = &_ZTV7Derived[2];
  D.2215_3 = &this_1(D)->D.2115;
  D.2237_9 = D.2215_3->_vptr.Base;
  D.2236_10 = MEM[(int (*__vtbl_ptr_type) (void) *)D.2237_9 + 16B];
  OBJ_TYPE_REF(D.2236_10;D.2215_3->2) (D.2215_3);

We fail to devirtualize pre-IPA. After IPA we compile into:

<bb 2>:
  MEM[(struct Base *)this_1(D)]._vptr.Base = &_ZTV4Base[2];
  this_1(D)->D.2115._vptr.Base = &_ZTV7Derived[2];
  D.2215_3 = &this_1(D)->D.2115;
  D.2236_10 = (int (*__vtbl_ptr_type) (void)) f;

a type mismatch?
Eh, I wanted to paste:

  D.2236_10 = (int (*__vtbl_ptr_type) (void)) f;
  OBJ_TYPE_REF(D.2236_10;D.2215_3->2) (D.2215_3);

I said CCP should IMO optimize it, but it doesn't. I guess it is because VRP1 works out the constant but then doesn't know about OBJ_TYPE_REF folding.
I believe Richi's comment #14 in PR 46076 applies here as well.
No, this is different, since an OBJ_TYPE_REF is sitting here, and it implies a type conversion in the way we implement it right now. There is no type mismatch between the original address and the call argument; just the intermediate pointer is of the wrong type. VRP should do the same as CCP, see the code I added into CCP.
... but obviously the problem still is that we don't devirtualize this early enough for inlining. The low-level code should be able to do so if FRE+CCP were added as an early pass, or FRE were also extended to fold the OBJ_TYPE_REF calls away and added as an early pass. Now that Richi has fixed it, it should even be possible. However, can't this also be handled by type-based code?
(In reply to comment #9)
> OK, the main() code seems to be optimized out; that is an improvement. Is it
> optimized away with your patch pre-IPA too?

Yes. Just before IPA, in fact.

> Derived() is also devirtualizable:

It could be done intraprocedurally if, unlike for automatically allocated decls, we considered all calls as potentially changing the dynamic type in unknown ways while doing the dynamic type change detection. This then, however, becomes essentially the same thing as devirtualization based on constant folding, perhaps only more complicated.

It could be optimized by IPA-CP if we can confirm that all callers are automatically allocated (or that we generally have them under control). This should be relatively easy; I will have a look at that. Mainly because we may be able to widen the scope of objects under our control later.
I ran into another variant of this problem, which I reduced to the following testcase. I found the problem on 4.8.2, but it is already fixed in trunk / gcc-4.9 (Debian 4.9-20140218-1). Still, it might be useful to have the testcase here for reference.

class Base {
};

class Sub : public Base {
public:
  virtual void bar();
};

Sub foo;
Sub * const pointer = &foo;
Sub* function() { return &foo; }

int main() {
  // Gcc 4.8.2 devirtualizes this:
  pointer->bar();
  // but not this:
  function()->bar();
}
With the new polymorphic call code we could handle the testcase from comment 15 by simple propagation of contexts down during early optimization. This may be easy enough to be worth implementing. Still, the general solution is to add return functions to ipa-prop.
This is the last bug still open blocking bug 45375
The release_ssa dump of function main in the testcase from the bug description is just "return 0;"

The release_ssa dump of function main in the testcase from comment #15 is:

int main ()
{
  <bb 2> [local count: 1073741824]:
  Sub::bar (&foo);
  Sub::bar (&foo);
  return 0;
}

...so devirtualized in my book (main is cold, so they are not inlined, of course).

The release_ssa dump of function foo in the testcase from comment #5 is:

void foo ()
{
  void * _3;

  <bb 2> [local count: 1073741824]:
  _3 = operator new (8);
  MEM[(struct X *)_3]._vptr.X = &MEM <int (*) ()[4]> [(void *)&_ZTV1X + 16B];
  X::~X (_3);
  return;
}

...so also devirtualized before IPA. Therefore I am going to close this old bug as done. If anyone finds a different testcase where the early optimizers should do a better job to help devirtualization, please file a new bug.