There are some missed devirtualization opportunities involving vectors. The issue is discussed on Stack Overflow here: https://stackoverflow.com/questions/73257739/missed-optimization-stdvectortpop-back-not-qualifying-destructor-call/

Because we know the type of the objects in the vector (they have to be _Tp), there is no need to invoke the virtual methods and we can devirtualize them. I believe this applies not just to the destructor, but also to other member functions.

This missed optimization does not only affect vector. When creating an array, the virtual destructor is invoked even though it should be quite clear to the compiler that it can resolve the type of the objects in the array at compile time.

There is a possibility of using placement new and "spoofing" the type, https://godbolt.org/z/3MdMsdKha (thanks Jonathan Wakely), but I'm not sure if this is legal C++ and whether it applies to raw arrays.

I think this might be a problem with the IPA component; I might have mislabelled the issue because there isn't an option for IPA. Please let me know if this makes sense or if I have some misconception.

A sample program that shows this behavior is below.

/* { dg-do run } */
/* Virtual calls should be devirtualized because we know the dynamic type
   of objects in the vector at compile time */
/* { dg-options "-O3 -fdump-tree-ssa" } */

#include <vector>

using std::vector;

class A
{
public:
  virtual ~A() { }
};

class B : public A
{
public:
  virtual ~B() { }
};

int main()
{
  vector<B> arr;
  // B b[40];
  return 0;
}

/* { dg-final { scan-tree-dump-times "OBJ_TYPE_REF" 0 "ssa" } } */
I don't think it should be checking the ssa dump (which is the output right after going into SSA form) but rather the optimized dump.
I am not 100% sure that all of the objects in a vector<C> have to be of type C, because you could do some tricks with placement new.

> if this applies to raw arrays

It does apply to raw arrays.
I'm giving the example of an array for now, because GCC's treatment of the destructor is inconsistent and depends on the length of the array. Clang, on the other hand, is able to devirtualize the destructor in the array no matter its length.

Virtual call: https://godbolt.org/z/f33Gh5EGM
Devirtualized call: https://godbolt.org/z/jPz3Ka1cd

We know it is devirtualized because the destructor of the derived object is not called and the compiler assumes that the items in the array are all Base objects.

So currently GCC does perform this kind of "devirtualization" (I wouldn't really call it devirtualization for an array, because it is similar to declaring Derived d; on the stack), but it depends on the array length, and I think it should be devirtualized for all array lengths.

I changed my test to dump tree optimized, like below (let me know if there are issues with the example, because I am not familiar with hacking GCC).

/* { dg-do run } */
/* Virtual calls should be devirtualized because we know the dynamic type
   of objects in the vector at compile time */
/* { dg-options "-O3 -fdump-tree-optimized -fno-inline" } */

#include <vector>
#include <iostream>
#include <memory>

using std::vector;

class A
{
public:
  virtual ~A() { }
};

class B : public A
{
public:
  virtual ~B() { }
};

int main()
{
  B b[3];
  return 0;
}

/* { dg-final { scan-tree-dump-times "OBJ_TYPE_REF" 0 "optimized" } } */
Would anyone be able to direct me to the portion of the code responsible for this threshold between length-2 and length-3 arrays? Is this the responsibility of the C++ front end, or is it related to the optimizer?
(In reply to Ng YongXiang from comment #4)
> Would anyone be able to direct me to which portion of the code is
> responsible for this threshold between len 2 & 3 array? Is this the
> responsibility of the c++ frontend? or is it still related to the optimizer.

The front end has nothing to do with the difference between 2 and 3. The difference between 2 and 3 is unrolling of the loop in the optimizers.

See bug 109770 and bug 80899 for other cases of placement new causing issues.
That is interesting. Thanks for the reply. However, I'd argue that the two bugs mentioned are different from what I am proposing. The two bugs linked access virtual functions via a pointer (delete p; val->f();) and are invoked by the user directly, so I think it makes sense for the compiler to access the vtable. However, the array issue I'm raising is caused by the destruction of an array with automatic storage duration, and the destruction code is generated by the compiler, not by the user via some kind of pointer or reference. Moreover, Clang gets this right and devirtualizes the destruction of the array, while GCC still doesn't.

https://godbolt.org/z/f33Gh5EGM

#include <iostream>
#include <memory>

struct A
{
    A() { std::cout << "A()" << std::endl; }
    virtual ~A() { std::cout << "~A()" << std::endl; }
};

struct B : public A
{
    B() { std::cout << "B()" << std::endl; }
    virtual ~B() { std::cout << "~B()" << std::endl; }
};

int main()
{
    A a[3];
    a[0].~A();
    ::new(std::addressof(a[0])) B();
}
Created attachment 55256 [details]
Arrays should do non virtual lookup when destructing individual elements

This is a draft patch outlining a possible solution: call the non-virtual destructor of elements when arrays are destructed (Clang already does this; see https://godbolt.org/z/f33Gh5EGM).
Just added a patch to illustrate the array destruction issue. What do you think?
Would anyone be willing to provide some feedback regarding the attachment (https://gcc.gnu.org/bugzilla/attachment.cgi?id=55256&action=diff) that I have created? Thanks.
The trunk branch has been updated by Jason Merrill <jason@gcc.gnu.org>:

https://gcc.gnu.org/g:a47e615fbf9c6f4b24e5032df5d720b6bf9b63b5

commit r14-2853-ga47e615fbf9c6f4b24e5032df5d720b6bf9b63b5
Author: Ng YongXiang <yongxiangng@gmail.com>
Date:   Thu Jul 27 08:06:14 2023 +0800

    c++: devirtualization of array destruction [PR110057]

    PR c++/110057
    PR ipa/83054

    gcc/cp/ChangeLog:

            * init.cc (build_vec_delete_1): Devirtualize array destruction.

    gcc/testsuite/ChangeLog:

            * g++.dg/warn/pr83054.C: Remove devirtualization warning.
            * g++.dg/lto/pr89335_0.C: Likewise.
            * g++.dg/tree-ssa/devirt-array-destructor-1.C: New test.
            * g++.dg/tree-ssa/devirt-array-destructor-2.C: New test.
            * g++.dg/warn/pr83054-2.C: New test.

    Signed-off-by: Ng Yong Xiang <yongxiangng@gmail.com>
One way this could be implemented is the following.

* Implement built-ins `__builtin_vptr(object)` and `__builtin_vptr(Type)` that obtain the pointer to the vtable (or NULL if the object type has no virtual method).
* In the implementation of `vector.pop_back()` etc., add the following:

  if (__builtin_vptr(this->_M_impl._M_finish) != __builtin_vptr(_Tp))
    __builtin_unreachable();

* The optimizer should be able to figure out the rest.

What do you think about this approach?

--------

Alternatively, we could also use the existing `typeid()`, although the optimizer is not smart enough to optimize the following code yet.

```
#include<cstdio>
#include<typeinfo>
struct Base{
	Base() {}
	virtual void f(){ std::printf("Base\n"); }
};
struct Derived final: Base{
	Derived() {}
	virtual void f(){ std::printf("Derived\n"); }
};
void f(Base* b){
	if(!b or typeid(*b)!=typeid(Base)) __builtin_unreachable();
	b->f();
}
```
(In reply to user202729 from comment #11)
> if (__builtin_vptr(this->_M_impl._M_finish) != __builtin_vptr(_Tp))
>   __builtin_unreachable();

This is more or less what the existing speculative devirtualization code (ipa-devirt.cc) does. It might be nice to be able to force speculative devirtualization in some circumstances, perhaps in the destroy_at function.

But for vector specifically, and probably other containers as well, we know exactly the type that we're destroying, so the library should use a scoped destructor call (i.e. ptr->_Tp::~_Tp()) to force a non-virtual call. So the vector situation seems to me like a library issue.
That would only be valid for std::allocator and other allocators where std::allocator_traits<A>::destroy uses std::destroy_at. As it happens, I've just refactored std::allocator_traits today, which makes it possible to test for exactly that condition.
So something like this, and then use it in containers instead of _Alloc_traits::destroy:

  template<typename _Tp, typename _Allocator>
    _GLIBCXX20_CONSTEXPR
    void
    _Destroy_static_type(_Tp* __p, _Allocator& __alloc)
    {
#if __cplusplus >= 201103L
      if constexpr (__allocator_traits_base::__has_destroy<_Allocator, _Tp>)
	allocator_traits<_Allocator>::destroy(__alloc, __p);
      else
#endif
      __p->_Tp::~_Tp();
    }
(In reply to Jonathan Wakely from comment #14) LGTM.
(In reply to Jonathan Wakely from comment #14)
> So something like this, and then use it in containers instead of
> _Alloc_traits::destroy
>
> template<typename _Tp, typename _Allocator>
>   _GLIBCXX20_CONSTEXPR
>   void
>   _Destroy_static_type(_Tp* __p, _Allocator& __alloc)
>   {
> #if __cplusplus >= 201103L
>     if constexpr (__allocator_traits_base::__has_destroy<_Allocator, _Tp>)
>       allocator_traits<_Allocator>::destroy(__alloc, __p);
>     else
> #endif
>     __p->_Tp::~_Tp();
>   }

That sounds like a good idea, thanks. I thought about this some time earlier (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115413#c3) but I did not know how to check the implementation of `destroy()`.

Nevertheless, this does not handle the case of devirtualization in vector access (e.g. `v[0].f()` where v is a vector and the method f is virtual). What do you think?

Apart from that, do you think it is common for a user to override `destroy()` (in a template specialization) but still call the destructor inside it? In that case the proposal would not optimize anything.
(In reply to user202729 from comment #16)
> That sound like a good idea, thanks. I thought about this some time earlier
> (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115413#c3) but I did not know
> how to check the implementation of `destroy()`.

You only need to know whether Alloc::destroy exists or not.

> Nevertheless, this does not handle the case of devirtualization in vector
> access (e.g. `v[0].f()` where v is a vector and method f is virtual). What
> do you think?

I don't think that optimization would be valid. Users could do disgusting things like this (as long as sizeof(Base) == sizeof(Derived)):

std::vector<Base> v(1);
v[0].~Base();
::new((void*)v.data()) Derived();
v[0].f();
v[0].~Derived();
::new((void*)v.data()) Base();

I'm not even 100% convinced that the destructor optimization is valid, but I think any time there is pointer arithmetic on the vector's storage, the dynamic types of objects in the array must match the static types. Arguably the v[0] access above already requires that, so maybe the code above isn't valid, but I don't think it's entirely clear from the standard.

This certainly seems to be allowed:

std::vector<Base> v(1);
Base* p = v.data();
p->~Base();
::new((void*)p) Derived();
p->f();
p->~Derived();
::new((void*)p) Base();

If we can be sure that v[0] requires the static and dynamic types to match, then we could do something like:

      reference
      operator[](size_type __n) _GLIBCXX_NOEXCEPT
      {
	__glibcxx_requires_subscript(__n);
#ifdef __cpp_rtti
	if (typeid(this->_M_impl._M_start[__n]) != typeid(value_type))
	  __builtin_unreachable();
#endif
	return *(this->_M_impl._M_start + __n);
      }

I have no idea whether that would help anything, or if the use of typeid would actually pessimize things.

> Apart from that, being do you think it is common for the user to override
> `destroy()` (in a template specialization) but still call the destructor
> inside it? In that case the proposal would not optimize that case.
I don't really care about that, we can't handle every case.
(In reply to Jonathan Wakely from comment #17)
> I don't think that optimization would be valid. Users could do disgusting
> things like this (as long as sizeof(Base) == sizeof(Derived)):
>
> std::vector<Base> v(1);
> v[0].~Base();
> ::new((void*)v.data()) Derived();
> v[0].f();
> v[0].~Derived();
> ::new((void*)v.data()) Base();

According to "n.m." on Stack Overflow, that code is illegal: https://stackoverflow.com/questions/76380436/is-placement-new-of-derived-type-within-a-vector-array-of-base-type-legal#comment134688951_76380436

So I guess we're safe if the comment is correct. (The same goes for the vector destructor case.)

> This certainly seems to be allowed:
>
> std::vector<Base> v(1);
> Base* p = v.data();
> p->~Base();
> ::new((void*)p) Derived();
> p->f();
> p->~Derived();
> ::new((void*)p) Base();

The clause seems to apply just as well: you cannot access the newly constructed `Derived` object through `p`.

Further reading: https://stackoverflow.com/questions/62642542/were-all-implementations-of-stdvector-non-portable-before-stdlaunder

> If we can be sure that v[0] requires the static and dynamic types to match,
> then we could do something like:
>
> reference
> operator[](size_type __n) _GLIBCXX_NOEXCEPT
> {
>   __glibcxx_requires_subscript(__n);
> #ifdef __cpp_rtti
>   if (typeid(this->_M_impl._M_start[__n]) != typeid(value_type))
>     __builtin_unreachable();
> #endif
>   return *(this->_M_impl._M_start + __n);
> }
>
> I have no idea whether that would help anything, or if the use of typeid
> would actually pessimize things.

It turns out that the compiler can currently prove (with optimization enabled) that `typeid::operator==` is pure, so it wouldn't pessimize things. If you want to be extra careful, I suppose `#if __OPTIMIZE__` is an option.
(In reply to user202729 from comment #18)
> (In reply to Jonathan Wakely from comment #17)
> The clause seems to apply just as well, you cannot access the newly
> constructed `Derived` object through `p`.

That's easily solved by accessing the new object through the pointer returned by the new expression:

std::vector<Base> v(1);
Base* p = v.data();
p->~Base();
p = ::new((void*)p) Derived();
p->f();
p->~Base();
::new((void*)p) Base();

By the time anything is accessed through the vector's internal pointers, an object of the original type has been restored at that location, which meets all the requirements of [basic.life] p7.

But that means v[0].f() wouldn't be valid, only p->f(), so adding information about the dynamic type to vector::operator[] would be allowed.
(In reply to Jonathan Wakely from comment #19)
> That's easily solved by accessing the new object through the pointer
> returned by the new expression:
>
> std::vector<Base> v(1);
> Base* p = v.data();
> p->~Base();
> p = ::new((void*)p) Derived();
> p->f();
> p->~Base();
> ::new((void*)p) Base();
>
> By the time anything is accessed through the vector's internal pointers, an
> object of the original type has been restored at that location, which meets
> all the requirements of [basic.life] p7.

Trying code related to this, do you think the following is valid C++?

void k(){
	Base b;
	b.~Base();
	Base* p=new (&b) Derived;
	p->~Base(); // this cannot be devirtualized to Base::~Base()
	new (p) Base;
}

Currently GCC devirtualizes the commented call to Base::~Base() anyway. https://godbolt.org/z/bsnEdxMeK
Yes, I think that's valid, so shouldn't be devirtualized.
The master branch has been updated by Sam James <sjames@gcc.gnu.org>:

https://gcc.gnu.org/g:21fc6d35f27cdf4c56e9f093894f6728c55ecb0e

commit r15-2440-g21fc6d35f27cdf4c56e9f093894f6728c55ecb0e
Author: Sam James <sam@gentoo.org>
Date:   Tue Jul 30 21:44:12 2024 +0100

    testsuite: fix 'dg-do-compile' typos

    We want 'dg-do compile', not 'dg-do-compile'. Fix that.

    	PR target/69194
    	PR c++/92024
    	PR c++/110057

    	* c-c++-common/Wshadow-1.c: Fix 'dg-do compile' typo.
    	* g++.dg/tree-ssa/devirt-array-destructor-1.C: Likewise.
    	* g++.dg/tree-ssa/devirt-array-destructor-2.C: Likewise.
    	* gcc.target/arm/pr69194.c: Likewise.
(In reply to Jonathan Wakely from comment #14)
> So something like this, and then use it in containers instead of
> _Alloc_traits::destroy
>
> template<typename _Tp, typename _Allocator>
>   _GLIBCXX20_CONSTEXPR
>   void
>   _Destroy_static_type(_Tp* __p, _Allocator& __alloc)
>   {
> #if __cplusplus >= 201103L
>     if constexpr (__allocator_traits_base::__has_destroy<_Allocator, _Tp>)
>       allocator_traits<_Allocator>::destroy(__alloc, __p);
>     else
> #endif
>     __p->_Tp::~_Tp();
>   }

Patch posted: https://gcc.gnu.org/pipermail/gcc-patches/2024-October/664288.html