There are some missed devirtualization opportunities involving vectors. The issue is discussed on Stack Overflow here: https://stackoverflow.com/questions/73257739/missed-optimization-stdvectortpop-back-not-qualifying-destructor-call/

Because we know the type of the objects in the vector (they have to be _Tp), there is no need to invoke the virtual methods and we can devirtualize them. I believe this applies not just to the destructor, but also to other member functions.

This missed optimization does not only affect vector. When creating an array, the virtual destructor is invoked even though it should be quite clear to the compiler that it can resolve the type of the objects in the array at compile time.

There is a possibility of using placement new and "spoofing" the type, https://godbolt.org/z/3MdMsdKha (thanks Jonathan Wakely), but I'm not sure if this is legal C++ and whether it applies to raw arrays.

I think this might be a problem with the IPA component; I might have mislabelled the issue because there isn't an option for IPA. Please let me know if this makes sense or if I have some misconception.

A sample program that shows this behavior is below.

/* { dg-do run } */
/* Virtual calls should be devirtualized because we know the dynamic type
   of objects in the vector at compile time */
/* { dg-options "-O3 -fdump-tree-ssa" } */

#include <vector>

using std::vector;

class A
{
public:
  virtual ~A() { }
};

class B : public A
{
public:
  virtual ~B() { }
};

int main()
{
  vector<B> arr;
  // B b[40];
  return 0;
}

/* { dg-final { scan-tree-dump-times "OBJ_TYPE_REF" 0 "ssa" } } */
I don't think it should be checking the ssa dump (which is the output right after going into SSA form) but rather the optimized dump.
I am not 100% sure that all of the objects in a vector<C> have to be of type C, because you could do some tricks with placement new.

> if this applies to raw arrays

It does apply to raw arrays.
I'm giving the example of an array for now, because GCC's treatment of the destructor is inconsistent and depends on the length of the array. Clang, on the other hand, is able to devirtualize the destructor in the array no matter its length.

Virtual call: https://godbolt.org/z/f33Gh5EGM
Devirtualized call: https://godbolt.org/z/jPz3Ka1cd

We know it is devirtualized because the destructor of the derived object is not called and the compiler assumes that the items in the array are all Base objects.

So currently GCC does perform this kind of "devirtualization" (I wouldn't really call it devirtualization for an array, because it is similar to declaring Derived d; on the stack), but it depends on the array length, and I think it should be devirtualized for all array lengths.

I changed my test to dump tree optimized, like below (let me know if there are issues with the example, because I am not familiar with hacking GCC).

/* { dg-do run } */
/* Virtual calls should be devirtualized because we know the dynamic type
   of objects in the vector at compile time */
/* { dg-options "-O3 -fdump-tree-optimized -fno-inline" } */

#include <vector>
#include <iostream>
#include <memory>

using std::vector;

class A
{
public:
  virtual ~A() { }
};

class B : public A
{
public:
  virtual ~B() { }
};

int main()
{
  B b[3];
  return 0;
}

/* { dg-final { scan-tree-dump-times "OBJ_TYPE_REF" 0 "optimized" } } */
Would anyone be able to direct me to the portion of the code responsible for this threshold between length-2 and length-3 arrays? Is this the responsibility of the C++ front end, or is it related to the optimizer?
(In reply to Ng YongXiang from comment #4)
> Would anyone be able to direct me to which portion of the code is
> responsible for this threshold between len 2 & 3 array? Is this the
> responsibility of the c++ frontend? or is it still related to the optimizer.

The front end has nothing to do with the difference between 2 and 3. The difference between 2 and 3 is unrolling of the loop in the optimizers.

See bug 109770 and bug 80899 for other cases of placement new causing issues.
That is interesting. Thanks for the reply. However, I'd argue that the two bugs mentioned are different from what I am proposing. The two bugs linked access virtual functions via a pointer (delete p; val->f();) and are invoked by the user directly, so I think it makes sense for the compiler to access the vtable. However, the array issue I'm raising is caused by the destruction of an array with automatic storage duration, and the destruction code is generated by the compiler, not by the user via some kind of pointer or reference. Moreover, Clang gets this right and devirtualizes the destruction of the array, while GCC still doesn't.

https://godbolt.org/z/f33Gh5EGM

#include <iostream>
#include <memory>

struct A
{
    A() { std::cout << "A()" << std::endl; }
    virtual ~A() { std::cout << "~A()" << std::endl; }
};

struct B : public A
{
    B() { std::cout << "B()" << std::endl; }
    virtual ~B() { std::cout << "~B()" << std::endl; }
};

int main()
{
    A a[3];
    a[0].~A();
    ::new(std::addressof(a[0])) B();
}
Created attachment 55256 [details]
Arrays should do non virtual lookup when destructing individual elements

This is a draft patch outlining a possible solution: call the non-virtual destructor of elements when arrays are destructed (Clang already does this; see https://godbolt.org/z/f33Gh5EGM).
Just added a patch to illustrate the array destruction issue. What do you think?
Would anyone be willing to provide some feedback regarding the attachment (https://gcc.gnu.org/bugzilla/attachment.cgi?id=55256&action=diff) that I have created? Thanks.
The trunk branch has been updated by Jason Merrill <jason@gcc.gnu.org>:

https://gcc.gnu.org/g:a47e615fbf9c6f4b24e5032df5d720b6bf9b63b5

commit r14-2853-ga47e615fbf9c6f4b24e5032df5d720b6bf9b63b5
Author: Ng YongXiang <yongxiangng@gmail.com>
Date:   Thu Jul 27 08:06:14 2023 +0800

    c++: devirtualization of array destruction [PR110057]

    PR c++/110057
    PR ipa/83054

    gcc/cp/ChangeLog:

            * init.cc (build_vec_delete_1): Devirtualize array destruction.

    gcc/testsuite/ChangeLog:

            * g++.dg/warn/pr83054.C: Remove devirtualization warning.
            * g++.dg/lto/pr89335_0.C: Likewise.
            * g++.dg/tree-ssa/devirt-array-destructor-1.C: New test.
            * g++.dg/tree-ssa/devirt-array-destructor-2.C: New test.
            * g++.dg/warn/pr83054-2.C: New test.

    Signed-off-by: Ng Yong Xiang <yongxiangng@gmail.com>
One way this could be implemented is the following.

* Implement built-ins `__builtin_vptr(object)` and `__builtin_vptr(Type)` that obtain the pointer to the vtable (or NULL if the object type has no virtual method).
* In the implementation of `vector.pop_back()` etc., add the following:

  if (__builtin_vptr(this->_M_impl._M_finish) != __builtin_vptr(_Tp))
    __builtin_unreachable();

* The optimizer should be able to figure out the rest.

What do you think about this approach?

--------

Alternatively, we could also use the existing `typeid()`, although the optimizer is not smart enough to optimize the following code yet.

```
#include<cstdio>
#include<typeinfo>
struct Base{
	Base() {}
	virtual void f(){ std::printf("Base\n"); }
};
struct Derived final: Base{
	Derived() {}
	virtual void f(){ std::printf("Derived\n"); }
};
void f(Base* b){
	if(!b or typeid(*b)!=typeid(Base)) __builtin_unreachable();
	b->f();
}
```
(In reply to user202729 from comment #11)
> if (__builtin_vptr(this->_M_impl._M_finish) != __builtin_vptr(_Tp))
>   __builtin_unreachable();

This is more or less what the existing speculative devirtualization code (ipa-devirt.cc) does. It might be nice to be able to force speculative devirtualization in some circumstances, perhaps in the destroy_at function.

But for vector specifically, and probably other containers as well, we know exactly the type that we're destroying, so the library should use a scoped destructor call (i.e. ptr->_Tp::~_Tp()) to force a non-virtual call. So the vector situation seems to me like a library issue.
That would only be valid for std::allocator and other allocators where std::allocator_traits<A>::destroy uses std::destroy_at. As it happens, I've just refactored std::allocator_traits today, which makes it possible to test for exactly that condition.
So something like this, and then use it in containers instead of _Alloc_traits::destroy:

  template<typename _Tp, typename _Allocator>
    _GLIBCXX20_CONSTEXPR
    void
    _Destroy_static_type(_Tp* __p, _Allocator& __alloc)
    {
#if __cplusplus >= 201103L
      if constexpr (__allocator_traits_base::__has_destroy<_Allocator, _Tp>)
	allocator_traits<_Allocator>::destroy(__alloc, __p);
      else
#endif
      __p->_Tp::~_Tp();
    }
(In reply to Jonathan Wakely from comment #14) LGTM.
(In reply to Jonathan Wakely from comment #14)
> So something like this, and then use it in containers instead of
> _Alloc_traits::destroy
>
> template<typename _Tp, typename _Allocator>
>   _GLIBCXX20_CONSTEXPR
>   void
>   _Destroy_static_type(_Tp* __p, _Allocator& __alloc)
>   {
> #if __cplusplus >= 201103L
>     if constexpr (__allocator_traits_base::__has_destroy<_Allocator, _Tp>)
>       allocator_traits<_Allocator>::destroy(__alloc, __p);
>     else
> #endif
>     __p->_Tp::~_Tp();
>   }

That sounds like a good idea, thanks. I thought about this some time earlier (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115413#c3) but I did not know how to check the implementation of `destroy()`.

Nevertheless, this does not handle the case of devirtualization in vector access (e.g. `v[0].f()` where v is a vector and the method f is virtual). What do you think?

Apart from that, do you think it is common for a user to override `destroy()` (in a template specialization) but still call the destructor inside it? In that case the proposal would not optimize anything.
(In reply to user202729 from comment #16)
> That sound like a good idea, thanks. I thought about this some time earlier
> (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115413#c3) but I did not know
> how to check the implementation of `destroy()`.

You only need to know whether Alloc::destroy exists or not.

> Nevertheless, this does not handle the case of devirtualization in vector
> access (e.g. `v[0].f()` where v is a vector and method f is virtual). What
> do you think?

I don't think that optimization would be valid. Users could do disgusting things like this (as long as sizeof(Base) == sizeof(Derived)):

std::vector<Base> v(1);
v[0].~Base();
::new((void*)v.data()) Derived();
v[0].f();
v[0].~Derived();
::new((void*)v.data()) Base();

I'm not even 100% convinced that the destructor optimization is valid, but I think any time there is pointer arithmetic on the vector's storage, the dynamic types of objects in the array must match the static types. Arguably the v[0] access above already requires that, so maybe the code above isn't valid, but I don't think it's entirely clear from the standard.

This certainly seems to be allowed:

std::vector<Base> v(1);
Base* p = v.data();
p->~Base();
::new((void*)p) Derived();
p->f();
p->~Derived();
::new((void*)p) Base();

If we can be sure that v[0] requires the static and dynamic types to match, then we could do something like:

      reference
      operator[](size_type __n) _GLIBCXX_NOEXCEPT
      {
	__glibcxx_requires_subscript(__n);
#ifdef __cpp_rtti
	if (typeid(this->_M_impl._M_start[__n]) != typeid(value_type))
	  __builtin_unreachable();
#endif
	return *(this->_M_impl._M_start + __n);
      }

I have no idea whether that would help anything, or if the use of typeid would actually pessimize things.

> Apart from that, being do you think it is common for the user to override
> `destroy()` (in a template specialization) but still call the destructor
> inside it? In that case the proposal would not optimize that case.
I don't really care about that, we can't handle every case.
(In reply to Jonathan Wakely from comment #17)
> I don't think that optimization would be valid. Users could do disgusting
> things like this (as long as sizeof(Base) == sizeof(Derived)):
>
> std::vector<Base> v(1);
> v[0].~Base();
> ::new((void*)v.data()) Derived();
> v[0].f();
> v[0].~Derived();
> ::new((void*)v.data()) Base();

According to "n.m." on Stack Overflow, that code is illegal: https://stackoverflow.com/questions/76380436/is-placement-new-of-derived-type-within-a-vector-array-of-base-type-legal#comment134688951_76380436

So I guess we're safe if the comment is correct. (The same goes for the vector destructor case.)

> This certainly seems to be allowed:
>
> std::vector<Base> v(1);
> Base* p = v.data();
> p->~Base();
> ::new((void*)p) Derived();
> p->f();
> p->~Derived();
> ::new((void*)p) Base();

The clause seems to apply just as well: you cannot access the newly constructed `Derived` object through `p`.

Further reading: https://stackoverflow.com/questions/62642542/were-all-implementations-of-stdvector-non-portable-before-stdlaunder

> If we can be sure that v[0] requires the static and dynamic types to match,
> then we could do something like:
>
> reference
> operator[](size_type __n) _GLIBCXX_NOEXCEPT
> {
>   __glibcxx_requires_subscript(__n);
> #ifdef __cpp_rtti
>   if (typeid(this->_M_impl._M_start[__n]) != typeid(value_type))
>     __builtin_unreachable();
> #endif
>   return *(this->_M_impl._M_start + __n);
> }
>
> I have no idea whether that would help anything, or if the use of typeid
> would actually pessimize things.

It turns out that the compiler can currently prove (with optimization enabled) that `typeid::operator==` is pure, so it wouldn't pessimize things. If you want to be extra careful, I suppose `#if __OPTIMIZE__` is an option.
(In reply to user202729 from comment #18)
> (In reply to Jonathan Wakely from comment #17)
> The clause seems to apply just as well, you cannot access the newly
> constructed `Derived` object through `p`.

That's easily solved by accessing the new object through the pointer returned by the new expression:

std::vector<Base> v(1);
Base* p = v.data();
p->~Base();
p = ::new((void*)p) Derived();
p->f();
p->~Base();
::new((void*)p) Base();

By the time anything is accessed through the vector's internal pointers, an object of the original type has been restored at that location, which meets all the requirements of [basic.life] p7.

But that means v[0].f() wouldn't be valid, only p->f(), so adding information about the dynamic type to vector::operator[] would be allowed.
(In reply to Jonathan Wakely from comment #19)
> That's easily solved by accessing the new object through the pointer
> returned by the new expression:
>
> std::vector<Base> v(1);
> Base* p = v.data();
> p->~Base();
> p = ::new((void*)p) Derived();
> p->f();
> p->~Base();
> ::new((void*)p) Base();
>
> By the time anything is accessed through the vector's internal pointers, an
> object of the original type has been restored at that location, which meets
> all the requirements of [basic.life] p7.

Trying code related to this, do you think the following is valid C++?

void k(){
	Base b;
	b.~Base();
	Base* p=new (&b) Derived;
	p->~Base(); // this cannot be devirtualized to Base::~Base()
	new (p) Base;
}

Currently GCC devirtualizes the commented call to Base::~Base() anyway. https://godbolt.org/z/bsnEdxMeK
Yes, I think that's valid, so shouldn't be devirtualized.
The master branch has been updated by Sam James <sjames@gcc.gnu.org>:

https://gcc.gnu.org/g:21fc6d35f27cdf4c56e9f093894f6728c55ecb0e

commit r15-2440-g21fc6d35f27cdf4c56e9f093894f6728c55ecb0e
Author: Sam James <sam@gentoo.org>
Date:   Tue Jul 30 21:44:12 2024 +0100

    testsuite: fix 'dg-do-compile' typos

    We want 'dg-do compile', not 'dg-do-compile'. Fix that.

    	PR target/69194
    	PR c++/92024
    	PR c++/110057

    	* c-c++-common/Wshadow-1.c: Fix 'dg-do compile' typo.
    	* g++.dg/tree-ssa/devirt-array-destructor-1.C: Likewise.
    	* g++.dg/tree-ssa/devirt-array-destructor-2.C: Likewise.
    	* gcc.target/arm/pr69194.c: Likewise.
(In reply to Jonathan Wakely from comment #14)
> So something like this, and then use it in containers instead of
> _Alloc_traits::destroy
>
> template<typename _Tp, typename _Allocator>
>   _GLIBCXX20_CONSTEXPR
>   void
>   _Destroy_static_type(_Tp* __p, _Allocator& __alloc)
>   {
> #if __cplusplus >= 201103L
>     if constexpr (__allocator_traits_base::__has_destroy<_Allocator, _Tp>)
>       allocator_traits<_Allocator>::destroy(__alloc, __p);
>     else
> #endif
>     __p->_Tp::~_Tp();
>   }

Patch posted: https://gcc.gnu.org/pipermail/gcc-patches/2024-October/664288.html