115413 – Missing optimization: devirtualize the call in "if(typeid(*a)==typeid(A)) a->f();" structure

Bug 115413 - Missing optimization: devirtualize the call in "if(typeid(*a)==typeid(A)) a->f();" structure

Summary: Missing optimization: devirtualize the call in "if(typeid(*a)==typeid(A)) a->...

Status:	UNCONFIRMED

Alias:	None

Product:	gcc
Classification:	Unclassified
Component:	c++ (show other bugs)
Version:	15.0

Importance:	P3 normal
Target Milestone:	---
Assignee:	Not yet assigned to anyone

URL:
Keywords:	missed-optimization

Depends on:
Blocks:

Reported:	2024-06-10 09:18 UTC by user202729
Modified:	2024-07-11 10:14 UTC (History)
CC List:	3 users (show)

See Also:	115410 110057
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed:

Attachments
Code to reproduce the example where gcc devirtualizes the call. (723 bytes, application/zip) 2024-06-10 09:18 UTC, user202729	Details
View All Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description user202729 2024-06-10 09:18:46 UTC

Created attachment 58397 [details]
Code to reproduce the example where gcc devirtualizes the call.

As mentioned in the title.

I think code of the form

    if(typeid(*a)==typeid(A)) a->f();

where `f` is a virtual function in `A` can be optimized to eliminate the virtualization.

(Of course in practice it won't gain much performance because `type_info` comparison involves a string comparison which is slow, but if the `else` branch is `__builtin_unreachable()` it could potentially result in a performance gain.)

As pointed out in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115410 , however, the address of `a->f` may not be the same as `A::f` if `a` comes from a shared library, however gcc already does that optimization in some cases e.g.

    A a = create_object();
    a.f();

where `create_object` is loaded with `dlopen()`, then the address of `a.f` will actually be different from `A::f`, nevertheless gcc optimizes it to call `A::f` anyway. Because of ODR, the behavior must be identical.

Example: in the linked file, compile with

    g++ -fPIC -shared -o libshared.so shared_lib.cpp -Wno-pmf-conversions
    g++ test4.cpp -ldl -Wno-pmf-conversions
    ./a.out


then the 4 addresses will not be identical. (Note that the program technically invokes undefined behavior by referencing the address of `(void*)(&MyClass::hello)` which I believe violate ODR; nevertheless, it just serves to show which copy of `hello()` actually gets called.)

Comment 1 user202729 2024-06-11 08:17:52 UTC

In practice, one way to implement this is to rewrite all instances of

    typeid(a) == typeid(b)

into roughly the following (assuming the virtual pointer table has exactly 1 pointer to function)

    typeid(a) == typeid(b) && (a._vptr[0] == b._vptr[0] ? true : __builtin_unreachable())

I think this would work fine, except if the user wants to use the gcc extension https://gcc.gnu.org/onlinedocs/gcc/Bound-member-functions.html to print out the value of the function pointer.

On that note, would it be desirable to devirtualize call that explicitly extracts the function pointer?

    #include<typeinfo>
    
    struct A{
        virtual void f(int);
    };
    
    void f(A& b){
        if (typeid(b) == typeid(A))
            ((void (*)(A*, int)) (b.*&A::f)) (&b, 1); // devirtualize this?
    }

It is feasible someone would want to optimize this (because after all, the original purpose of the gcc extension is to manually optimize the code); however, it would be extremely confusing if the user were to e.g. print out the value of the function pointer before calling it in order to e.g. set a breakpoint at the address in gdb.

Comment 2 Jason Merrill 2024-06-25 02:04:58 UTC

If you're going to write code like this, why not

if(typeid(*a)==typeid(A)) a->A::f();

to force the non-virtual call?

Comment 3 user202729 2024-06-25 02:32:19 UTC

(In reply to Jason Merrill from comment #2)
> If you're going to write code like this, why not
> 
> if(typeid(*a)==typeid(A)) a->A::f();
> 
> to force the non-virtual call?

The practical reason is that that was inspired from https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110057 . The knowledge of the object's runtime type and the virtual function call are at different levels, and only the optimizer can inline the function.

I can't think of any better way to address the issue, and I don't think the optimization generates incorrect code anyway.

An idea I had was:

--- a/libstdc++-v3/include/bits/stl_vector.h
+++ b/libstdc++-v3/include/bits/stl_vector.h
@@ -1325,7 +1325,13 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
       {
         __glibcxx_requires_nonempty();
         --this->_M_impl._M_finish;
-        _Alloc_traits::destroy(this->_M_impl, this->_M_impl._M_finish);
+
+        if (std::is_same<std::allocator<_Tp>, _Alloc>::value) {
+          this->_M_impl._M_finish->_Tp::~_Tp();
+        } else {
+          _Alloc_traits::destroy(this->_M_impl, this->_M_impl._M_finish);
+        }
+
         _GLIBCXX_ASAN_ANNOTATE_SHRINK(1);
       }


but that only works for std::allocator, and it is also incorrect (the user can partially specialize std::allocator as well: https://stackoverflow.com/q/61151170)

Comment 4 user202729 2024-07-11 10:14:20 UTC

I wrote a patch in https://gcc.gnu.org/pipermail/gcc-patches/2024-June/655504.html which tries to address this issue.

Any review (or other suggestions on how to better implement the underlying feature https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110057 where the vector destructor can "hint" the deallocator that there is no virtualization going on) would be appreciated.

Thank you.