[Bug middle-end/35560] Missing CSE/PRE for memory operations involved in virtual call.

Fri Dec 30 21:46:30 GMT 2022

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=35560

Witold Baryluk <witold.baryluk+gcc at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |witold.baryluk+gcc at gmail dot com

--- Comment #15 from Witold Baryluk <witold.baryluk+gcc at gmail dot com> ---
I know this is a pretty old bug, but I was exploring some assembly output from
gcc and clang on godbolt, and stumbled into the same issue.

https://godbolt.org/z/qPzMhWse1

class A {
public:
    virtual int f7(int x) const;
};

int g(const A * const a, int x) {
    int r = 0;
    for (int i = 0; i < 10000; i++)
        r += a->f7(x);
    return r;
}

(The same happens without the loop, when just calling a->f7 multiple times.)
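
For reference, a loop-free variant could look roughly like the sketch below.
The name g2 is made up here, and the definition of A::f7 at the bottom is only
there so the snippet links and runs; in the godbolt test f7 is left undefined
in the translation unit, and with a visible definition the compiler may
optimize differently.

```cpp
// Loop-free variant of the test case; g2 is a hypothetical name.
class A {
public:
    virtual int f7(int x) const;
};

int g2(const A * const a, int x) {
    // Two back-to-back virtual calls on the same object. Per the
    // observation above, the vtable pointer and the f7 slot are
    // loaded again for the second call when f7 is opaque.
    return a->f7(x) + a->f7(x);
}

// Assumption: a trivial body, added only to make the sketch runnable.
int A::f7(int x) const { return x + 1; }
```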

g(A const*, int):
        push    r13
        mov     r13d, esi
        push    r12
        xor     r12d, r12d
        push    rbp
        mov     rbp, rdi
        push    rbx
        mov     ebx, 10000
        sub     rsp, 8
.L2:
        mov     rax, QWORD PTR [rbp+0]       # a vtable deref
        mov     esi, r13d
        mov     rdi, rbp
        call    [QWORD PTR [rax]]            # f7 indirect call
        add     r12d, eax
        dec     ebx
        jne     .L2

        add     rsp, 8
        pop     rbx
        pop     rbp
        mov     eax, r12d
        pop     r12
        pop     r13
        ret

I was expecting the two loads, mov rax, QWORD PTR [rbp+0] and the
[QWORD PTR [rax]] behind the call, to be hoisted out of the loop (function
pointer loaded into a register once, then an indirect call through that
register on each iteration).
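
To make the expected transformation concrete, here is a rough sketch that
models the object with a hand-rolled vtable instead of a real virtual
function. Obj, VTable, g_reload, g_hoisted, f7_impl and kVTable are all
hypothetical names, not anything from GCC or the test case. The hoisted form
is only equivalent if the callee never changes the object's vtable pointer,
which is an assumption of this sketch.

```cpp
#include <cassert>

// Hand-rolled stand-in for a class with one virtual function.
// All names here are hypothetical, for illustration only.
struct Obj;

struct VTable {
    int (*f7)(const Obj *self, int x);  // slot 0, playing the role of A::f7
};

struct Obj {
    const VTable *vptr;  // corresponds to what [rbp+0] holds in the listing
    int bias;
};

// Shape of the current code: vptr and the slot are reloaded every iteration.
int g_reload(const Obj *a, int x) {
    int r = 0;
    for (int i = 0; i < 10000; i++)
        r += a->vptr->f7(a, x);
    return r;
}

// Shape I was hoping for: both loads done once, call through a local.
// Assumption: the callee never modifies a->vptr.
int g_hoisted(const Obj *a, int x) {
    const VTable *vt = a->vptr;            // vtable deref, once
    int (*fn)(const Obj *, int) = vt->f7;  // slot load, once
    int r = 0;
    for (int i = 0; i < 10000; i++)
        r += fn(a, x);
    return r;
}

// One concrete "override" so the sketch can be run.
static int f7_impl(const Obj *self, int x) { return x + self->bias; }
static const VTable kVTable = { f7_impl };

// Quick self-check: both versions agree for a well-behaved callee.
inline void check() {
    Obj o{&kVTable, 1};
    assert(g_reload(&o, 2) == g_hoisted(&o, 2));
}
```

With an Obj whose bias is 1 and x = 2, each call returns 3, so both functions
return 30000.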

A bit sad.

Has there been any recent work on this optimization?

Are there at least some cases where it is valid to do CSE here, or to change
the code so the loads are moved out of the loop?