Failure to optimize?
Josh Chia (謝任中)
joshchia@gmail.com
Tue Jan 12 13:32:20 GMT 2021
I have a code snippet that GCC doesn't optimize the way I think it should, and I'm wondering why:
https://godbolt.org/z/1qKvax
bar2() is a variant of bar1() that I manually tweaked to avoid branches. I haven't done any benchmarks, but I would expect the branchless bar2() to perform better than bar1(). However, GCC does not automatically optimize bar1() into the equivalent of bar2(): the generated code for bar1() differs from that for bar2() and contains a branch.
I'm generally trying to get an idea of how smart GCC's optimizer is and how much hand-holding I should provide. Could someone help me understand why GCC didn't generate the same branchless code for bar1() as for bar2()? Or does avoiding the branch here perhaps not help performance at all?
Josh
*SOURCE*
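char const* foo();  // declared only; its definition is not shown
int cursor = 0;

char const* bar1() {
    char const* result = foo();
    if (result)          // branch: increment only when foo() returned non-null
        ++cursor;
    return result;
}

char const* bar2() {
    char const* result = foo();
    cursor += !!result;  // branch-free: !!result is 1 for non-null, 0 for null
    return result;
}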
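You can confirm that bar1() and bar2() are interchangeable by driving both with the same sequence of foo() results. The sketch below assumes a hypothetical stub foo() that returns nullptr on every third call. The stub, the `calls` counter, and the `run()` and `check_equivalent()` helpers are illustrative additions, not part of the original code.

```cpp
#include <cassert>

// Hypothetical stand-in for the external foo(): returns nullptr on every
// third call (calls 0, 3, 6, ...) and a string literal otherwise.
static int calls = 0;
char const* foo() {
    return (calls++ % 3 == 0) ? nullptr : "token";
}

int cursor = 0;

// Branchy version, as in the original post.
char const* bar1() {
    char const* result = foo();
    if (result)
        ++cursor;
    return result;
}

// Branch-free version: !!result converts the pointer to 0 or 1.
char const* bar2() {
    char const* result = foo();
    cursor += !!result;
    return result;
}

// Resets the stub and cursor, calls f() n times, and returns the final cursor.
int run(char const* (*f)(), int n) {
    calls = 0;
    cursor = 0;
    for (int i = 0; i < n; ++i)
        f();
    return cursor;
}

// Asserts that bar1() and bar2() leave cursor in the same state after n calls
// and returns that common value.
int check_equivalent(int n) {
    int a = run(bar1, n);
    int b = run(bar2, n);
    assert(a == b);
    return a;
}
```

For example, `run(bar1, 9)` and `run(bar2, 9)` both return 6. Calls 0, 3 and 6 return nullptr, and the other six do not.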
*GENERATED CODE*
bar1():
        sub     rsp, 8
        call    foo()
        test    rax, rax
        je      .L1
        add     DWORD PTR cursor[rip], 1
.L1:
        add     rsp, 8
        ret
bar2():
        sub     rsp, 8
        call    foo()
        cmp     rax, 1
        sbb     DWORD PTR cursor[rip], -1
        add     rsp, 8
        ret
cursor:
        .zero   4
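The `cmp`/`sbb` pair in bar2() is how the branch-free increment is encoded. The sketch below models those two instructions in C++ to show the flag arithmetic. It is an illustrative model only: `sbb_increment` is a hypothetical helper name, and 32-bit unsigned arithmetic stands in for the DWORD memory operand.

```cpp
#include <cstdint>

// Models `cmp rax, 1` followed by `sbb DWORD PTR cursor[rip], -1`.
// cmp rax, 1 computes rax - 1 (discarding the result) and sets the carry
// flag (CF) when rax < 1 as an unsigned comparison, i.e. only when rax == 0.
// sbb dst, src computes dst - src - CF; with src = -1 (0xFFFFFFFF as a
// 32-bit value) that is dst + 1 - CF: +1 for non-null, +0 for null.
std::uint32_t sbb_increment(std::uint32_t cursor, std::uint64_t rax) {
    std::uint32_t cf = (rax < 1) ? 1u : 0u;               // CF from cmp rax, 1
    return cursor - static_cast<std::uint32_t>(-1) - cf;  // sbb cursor, -1
}
```

Note that this sequence always performs a read-modify-write of cursor, whereas bar1() writes cursor only when foo() returns a non-null pointer.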
More information about the Gcc-help mailing list