Bug 86912

Summary: Function pointer imposes an optimization barrier
Product: gcc Reporter: Antony Polukhin <antoshkka>
Component: middle-endAssignee: Not yet assigned to anyone <unassigned>
Status: NEW ---    
Severity: normal CC: antoshkka, dimhen, falemagn, mail, msebor
Priority: P3 Keywords: missed-optimization
Version: 9.0   
Target Milestone: ---   
See Also: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80603
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93411
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78113
Host: Target:
Build: Known to work:
Known to fail: Last reconfirmed: 2020-01-24 00:00:00

Description Antony Polukhin 2018-08-10 12:42:44 UTC
Consider the following minimized example of std::variant visitation:

struct A {};
struct B : A {};
struct C : A {};
struct D : A {};
struct E : A {};
struct X : A {};

template <class T>
struct helper {
    static A& call(void* value) {
        return *static_cast<T*>(value);
    }
};

A& get_base(int index, void* value) {
    using f_ptr = A&(*)(void*);
    constexpr f_ptr visitors[] = {
        helper<A>::call, helper<B>::call, helper<C>::call,
        helper<D>::call, helper<E>::call, helper<X>::call,
    };
    return visitors[index](value);
}


For the above code GCC generates very suboptimal assembly that fills an array with pointers and does the jmp:
<...>
get_base(int, void*):
  mov QWORD PTR [rsp-56], OFFSET FLAT:(anonymous namespace)::helper<A>::call(void*)
  movsx rax, edi
  mov rdi, rsi
  mov QWORD PTR [rsp-48], OFFSET FLAT:(anonymous namespace)::helper<B>::call(void*)
  mov QWORD PTR [rsp-40], OFFSET FLAT:(anonymous namespace)::helper<C>::call(void*)
  mov QWORD PTR [rsp-32], OFFSET FLAT:(anonymous namespace)::helper<D>::call(void*)
  mov QWORD PTR [rsp-24], OFFSET FLAT:(anonymous namespace)::helper<E>::call(void*)
  mov QWORD PTR [rsp-16], OFFSET FLAT:(anonymous namespace)::helper<X>::call(void*)
  mov rax, QWORD PTR [rsp-56+rax*8]
  jmp rax


Optimal assembly should be the following:
get_base(int, void*):
  mov rax, rsi
  ret


If we rewrite `get_base` to avoid calls by function pointer, the assembly becomes optimal:
A& get_base(int index, void* value) {
    if (index == 0) return helper<A>::call(value);
    if (index == 1) return helper<B>::call(value);
    if (index == 2) return helper<C>::call(value);
    if (index == 3) return helper<D>::call(value);
    if (index == 4) return helper<E>::call(value);
    if (index == 5) return helper<X>::call(value);
}
Comment 1 Nikita Kniazev 2019-03-24 13:14:01 UTC
Making `visitors` a `static constexpr` turns the output to almost the same as Clang produces for this code (except an unnecessary mov).
Comment 2 Kim Nilsson 2020-01-24 09:24:54 UTC
Some of our projects make heavy use of std::variant and we are beginning to notice a discrepancy in performance between GCC and Clang, specifically due to the lack of this optimization. Is anyone going to take a look at this, or should we take the time to upload a patch somewhere?

Thank you
Comment 3 Martin Sebor 2020-01-24 10:20:25 UTC
I'll confirm this but I'm not sure if the problem is substantively different than something as simple as the following C program where GCC doesn't fold the array reference to zero.  Even declaring the array at file scope doesn't help.  (I'm pretty sure there is a report of this optimization opportunity somewhere in Bugzilla.)

$ cat u.c && gcc -O2 -S -Wall -fdump-tree-optimized=/dev/stdout u.c
int f (int i)
{
  const int a[] = { 0, 0, 0 };
  return a[i];   // should be folded to return 0;
}

;; Function f (f, funcdef_no=0, decl_uid=1930, cgraph_uid=1, symbol_order=0)

f (int i)
{
  const int a[3];
  int _6;

  <bb 2> [local count: 1073741824]:
  MEM <unsigned long> [(int *)&a] = 0;
  a[2] = 0;
  _6 = a[i_5(D)];
  a ={v} {CLOBBER};
  return _6;

}
Comment 4 Martin Sebor 2020-01-28 16:54:12 UTC
See also pr80603 and its duplicate pr93411 that I raised for the missing optimization in comment #3.