Failure to optimize?

Tue Jan 12 13:32:20 GMT 2021

I have a code snippet that GCC doesn't optimize the way I think it should,
and I'm wondering why:
https://godbolt.org/z/1qKvax

bar2() is a variant of bar1() that I tweaked by hand to avoid branches. I
haven't run any benchmarks, but I would expect the branchless bar2() to
perform better than bar1(). However, GCC does not automatically optimize
bar1() into something like bar2(): the generated code for bar1() and bar2()
is different, and the code for bar1() contains a branch.

I'm generally trying to get an idea of how smart GCC's optimizer is and how
much hand-holding I should provide. Could someone help me understand why
GCC didn't generate the same branchless code for bar1() and bar2()? Or does
avoiding branches here perhaps not help performance at all?

Josh

*SOURCE*
char const* foo();

int cursor = 0;

char const* bar1() {
    char const* result = foo();
    if (result)
        ++cursor;
    return result;
}

char const* bar2() {
    char const* result = foo();
    cursor += !!result;  // !!result is 1 when result is non-null, 0 otherwise
    return result;
}
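Since no benchmarks were run yet, here is a rough sketch of how bar1() and bar2() might be timed against each other. It assumes a hypothetical stub foo() that replays a random pattern of null and non-null pointers; foo(), make_pattern(), time_ms() and the g_* globals are illustrative names, not part of the original code, and __attribute__((noinline)) is a GCC/Clang extension used to keep the calls out-of-line. The result is likely to depend heavily on how predictable the null/non-null pattern is, so the sketch lets that probability vary.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical stand-in for the external foo(): it replays a pre-generated
// pattern of null and non-null pointers.
constexpr std::size_t kPatternSize = std::size_t{1} << 16;  // power of two, so the index wraps with a mask
static std::vector<char const*> g_pattern;
static std::size_t g_next = 0;
static char const g_text[] = "x";

// noinline (GCC/Clang extension) keeps foo() out-of-line, roughly like the
// original where foo() is only declared and lives in another translation unit.
__attribute__((noinline)) char const* foo() {
    char const* r = g_pattern[g_next];
    g_next = (g_next + 1) & (kPatternSize - 1);  // wrap around without a branch
    return r;
}

int cursor = 0;

// bar1() and bar2() copied from the source above, kept out-of-line so the
// timing loop calls them instead of inlining them.
__attribute__((noinline)) char const* bar1() {
    char const* result = foo();
    if (result)
        ++cursor;
    return result;
}

__attribute__((noinline)) char const* bar2() {
    char const* result = foo();
    cursor += !!result;
    return result;
}

// Refill the pattern. p_nonnull is the probability that foo() returns non-null:
// 0.0 or 1.0 makes bar1()'s branch trivially predictable, 0.5 makes it random.
void make_pattern(double p_nonnull, unsigned seed) {
    std::mt19937 rng(seed);
    std::bernoulli_distribution coin(p_nonnull);
    g_pattern.assign(kPatternSize, nullptr);
    for (auto& p : g_pattern)
        p = coin(rng) ? g_text : nullptr;
    g_next = 0;
}

// Call f() n times and return the elapsed wall-clock time in milliseconds.
double time_ms(char const* (*f)(), std::size_t n) {
    auto start = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < n; ++i)
        f();
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

For example, after building with -O2, one could call make_pattern(0.5, 1) and compare time_ms(bar1, 100000000) with time_ms(bar2, 100000000), then repeat with a probability of 1.0. Both timings include the cost of the stub foo(), and a single run is noisy, so any difference should be treated as indicative only.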

*GENERATED CODE*
bar1():
        sub     rsp, 8
        call    foo()
        test    rax, rax
        je      .L1
        add     DWORD PTR cursor[rip], 1
.L1:
        add     rsp, 8
        ret
bar2():
        sub     rsp, 8
        call    foo()
        cmp     rax, 1
        sbb     DWORD PTR cursor[rip], -1
        add     rsp, 8
        ret
cursor:
        .zero   4
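The generated bar2() replaces the branch with a carry-flag trick: cmp rax, 1 followed by sbb on cursor. The sketch below models those two instructions in C++ to show that they compute cursor + !!result. The function names are hypothetical, cursor is modeled as a 32-bit unsigned value (the DWORD operand) so that wraparound is well defined, and only the arithmetic is modeled, not the memory access.

```cpp
#include <cassert>
#include <cstdint>

// Models `cmp rax, 1` followed by `sbb DWORD PTR cursor[rip], -1`.
std::uint32_t model_bar2_update(std::uint32_t cursor, std::uint64_t rax) {
    // cmp rax, 1 computes rax - 1 and sets the carry flag (CF) when the
    // unsigned subtraction borrows, i.e. when rax < 1. That only happens for
    // rax == 0, so CF is 1 exactly when foo() returned a null pointer.
    std::uint32_t carry = (rax < 1) ? 1u : 0u;

    // sbb dst, src computes dst - src - CF. The immediate -1 is sign-extended
    // to 0xFFFFFFFF, and subtracting that modulo 2^32 is the same as adding 1,
    // so the result is cursor + 1 - CF.
    std::uint32_t const minus_one = 0xFFFFFFFFu;
    return cursor - minus_one - carry;
}

// What bar1() and bar2() compute at the source level: cursor + !!result.
std::uint32_t reference_update(std::uint32_t cursor, std::uint64_t rax) {
    return cursor + (rax != 0 ? 1u : 0u);
}
```

One difference the model does not show: sbb with a memory destination is a read-modify-write, so bar2() loads and stores cursor on every call, whereas bar1() touches cursor only when foo() returns a non-null pointer.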