Bug 115413 - Missing optimization: devirtualize the call in "if(typeid(*a)==typeid(A)) a->f();" structure
Summary: Missing optimization: devirtualize the call in "if(typeid(*a)==typeid(A)) a->...
Status: UNCONFIRMED
Alias: None
Product: gcc
Classification: Unclassified
Component: c++ (show other bugs)
Version: 15.0
: P3 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords: missed-optimization
Depends on:
Blocks:
 
Reported: 2024-06-10 09:18 UTC by user202729
Modified: 2024-07-11 10:14 UTC (History)
3 users (show)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed:


Attachments
Code to reproduce the example where gcc devirtualizes the call. (723 bytes, application/zip)
2024-06-10 09:18 UTC, user202729
Details

Note You need to log in before you can comment on or make changes to this bug.
Description user202729 2024-06-10 09:18:46 UTC
Created attachment 58397 [details]
Code to reproduce the example where gcc devirtualizes the call.

As mentioned in the title.

I think code of the form

    if(typeid(*a)==typeid(A)) a->f();

where `f` is a virtual function in `A` can be optimized to eliminate the virtualization.

(Of course in practice it won't gain much performance because `type_info` comparison involves a string comparison which is slow, but if the `else` branch is `__builtin_unreachable()` it could potentially result in a performance gain.)

As pointed out in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115410 , however, the address of `a->f` may not be the same as `A::f` if `a` comes from a shared library, however gcc already does that optimization in some cases e.g.

    A a = create_object();
    a.f();

where `create_object` is loaded with `dlopen()`, then the address of `a.f` will actually be different from `A::f`, nevertheless gcc optimizes it to call `A::f` anyway. Because of ODR, the behavior must be identical.

Example: in the linked file, compile with

    g++ -fPIC -shared -o libshared.so shared_lib.cpp -Wno-pmf-conversions
    g++ test4.cpp -ldl -Wno-pmf-conversions
    ./a.out


then the 4 addresses will not be identical. (Note that the program technically invokes undefined behavior by referencing the address of `(void*)(&MyClass::hello)` which I believe violate ODR; nevertheless, it just serves to show which copy of `hello()` actually gets called.)
Comment 1 user202729 2024-06-11 08:17:52 UTC
In practice, one way to implement this is to rewrite all instances of

    typeid(a) == typeid(b)

into roughly the following (assuming the virtual pointer table has exactly 1 pointer to function)

    typeid(a) == typeid(b) && (a._vptr[0] == b._vptr[0] ? true : __builtin_unreachable())

I think this would work fine, except if the user wants to use the gcc extension https://gcc.gnu.org/onlinedocs/gcc/Bound-member-functions.html to print out the value of the function pointer.

On that note, would it be desirable to devirtualize call that explicitly extracts the function pointer?

    #include<typeinfo>
    
    struct A{
        virtual void f(int);
    };
    
    void f(A& b){
        if (typeid(b) == typeid(A))
            ((void (*)(A*, int)) (b.*&A::f)) (&b, 1); // devirtualize this?
    }

It is feasible someone would want to optimize this (because after all, the original purpose of the gcc extension is to manually optimize the code); however, it would be extremely confusing if the user were to e.g. print out the value of the function pointer before calling it in order to e.g. set a breakpoint at the address in gdb.
Comment 2 Jason Merrill 2024-06-25 02:04:58 UTC
If you're going to write code like this, why not

if(typeid(*a)==typeid(A)) a->A::f();

to force the non-virtual call?
Comment 3 user202729 2024-06-25 02:32:19 UTC
(In reply to Jason Merrill from comment #2)
> If you're going to write code like this, why not
> 
> if(typeid(*a)==typeid(A)) a->A::f();
> 
> to force the non-virtual call?

The practical reason is that that was inspired from https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110057 . The knowledge of the object's runtime type and the virtual function call are at different levels, and only the optimizer can inline the function.

I can't think of any better way to address the issue, and I don't think the optimization generates incorrect code anyway.

An idea I had was:

--- a/libstdc++-v3/include/bits/stl_vector.h
+++ b/libstdc++-v3/include/bits/stl_vector.h
@@ -1325,7 +1325,13 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
       {
         __glibcxx_requires_nonempty();
         --this->_M_impl._M_finish;
-        _Alloc_traits::destroy(this->_M_impl, this->_M_impl._M_finish);
+
+        if (std::is_same<std::allocator<_Tp>, _Alloc>::value) {
+          this->_M_impl._M_finish->_Tp::~_Tp();
+        } else {
+          _Alloc_traits::destroy(this->_M_impl, this->_M_impl._M_finish);
+        }
+
         _GLIBCXX_ASAN_ANNOTATE_SHRINK(1);
       }


but that only works for std::allocator, and it is also incorrect (the user can partially specialize std::allocator as well: https://stackoverflow.com/q/61151170)
Comment 4 user202729 2024-07-11 10:14:20 UTC
I wrote a patch in https://gcc.gnu.org/pipermail/gcc-patches/2024-June/655504.html which tries to address this issue.

Any review (or other suggestions on how to better implement the underlying feature https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110057 where the vector destructor can "hint" the deallocator that there is no virtualization going on) would be appreciated.

Thank you.