Consider the following minimized example of std::variant visitation: struct A {}; struct B : A {}; struct C : A {}; struct D : A {}; struct E : A {}; struct X : A {}; template <class T> struct helper { static A& call(void* value) { return *static_cast<T*>(value); } }; A& get_base(int index, void* value) { using f_ptr = A&(*)(void*); constexpr f_ptr visitors[] = { helper<A>::call, helper<B>::call, helper<C>::call, helper<D>::call, helper<E>::call, helper<X>::call, }; return visitors[index](value); } For the above code GCC generates very suboptimal assembly that fills an array with pointers and does the jmp: <...> get_base(int, void*): mov QWORD PTR [rsp-56], OFFSET FLAT:(anonymous namespace)::helper<A>::call(void*) movsx rax, edi mov rdi, rsi mov QWORD PTR [rsp-48], OFFSET FLAT:(anonymous namespace)::helper<B>::call(void*) mov QWORD PTR [rsp-40], OFFSET FLAT:(anonymous namespace)::helper<C>::call(void*) mov QWORD PTR [rsp-32], OFFSET FLAT:(anonymous namespace)::helper<D>::call(void*) mov QWORD PTR [rsp-24], OFFSET FLAT:(anonymous namespace)::helper<E>::call(void*) mov QWORD PTR [rsp-16], OFFSET FLAT:(anonymous namespace)::helper<X>::call(void*) mov rax, QWORD PTR [rsp-56+rax*8] jmp rax Optimal assembly should be the following: get_base(int, void*): mov rax, rsi ret If we rewrite `get_base` to avoid calls by function pointer, the assembly becomes optimal: A& get_base(int index, void* value) { if (index == 0) return helper<A>::call(value); if (index == 1) return helper<B>::call(value); if (index == 2) return helper<C>::call(value); if (index == 3) return helper<D>::call(value); if (index == 4) return helper<E>::call(value); if (index == 5) return helper<X>::call(value); }
Making `visitors` a `static constexpr` turns the output to almost the same as Clang produces for this code (except an unnecessary mov).
Some of our projects make heavy use of std::variant and we are beginning to notice a discrepancy in performance between GCC and Clang, specifically due to the lack of this optimization. Is anyone going to take a look at this, or should we take the time to upload a patch somewhere? Thank you
I'll confirm this but I'm not sure if the problem is substantively different than something as simple as the following C program where GCC doesn't fold the array reference to zero. Even declaring the array at file scope doesn't help. (I'm pretty sure there is a report of this optimization opportunity somewhere in Bugzilla.) $ cat u.c && gcc -O2 -S -Wall -fdump-tree-optimized=/dev/stdout u.c int f (int i) { const int a[] = { 0, 0, 0 }; return a[i]; // should be folded to return 0; } ;; Function f (f, funcdef_no=0, decl_uid=1930, cgraph_uid=1, symbol_order=0) f (int i) { const int a[3]; int _6; <bb 2> [local count: 1073741824]: MEM <unsigned long> [(int *)&a] = 0; a[2] = 0; _6 = a[i_5(D)]; a ={v} {CLOBBER}; return _6; }
See also pr80603 and its duplicate pr93411 that I raised for the missing optimization in comment #3.