[Bug rtl-optimization/85605] New: Potentially missing optimization under x64 and ARM: seemingly unnecessary branch in codegen
sergey.ignatchenko at ithare dot com
gcc-bugzilla@gcc.gnu.org
Wed May 2 09:14:00 GMT 2018
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85605
Bug ID: 85605
Summary: Potentially missing optimization under x64 and ARM:
seemingly unnecessary branch in codegen
Product: gcc
Version: 7.3.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: rtl-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: sergey.ignatchenko at ithare dot com
Target Milestone: ---
Code:
==========
#include <stdint.h>
#include <type_traits>
template<class T,class T2>
inline bool cmp(T a, T2 b) {
    return a<0 ? true : T2(a) < b;
}
template<class T,class T2>
inline bool cmp2(T a, T2 b) {
    return (a<0) | (T2(a) < b);
}
bool f(int a, int b) {
    return cmp(int64_t(a), unsigned(b));
}
bool f2(int a, int b) {
    return cmp2(int64_t(a), unsigned(b));
}
====
Functions cmp() and cmp2() appear to be equivalent (at least under the "as if"
rule, since reading and casting the arguments has no observable side effects).
However, under GCC/x64, cmp() generates code with a branch, while the
seemingly equivalent cmp2() manages to do without branching:
===============
f(int, int):
        testl   %edi, %edi
        movl    $1, %eax
        js      .L1
        cmpl    %edi, %esi
        seta    %al
.L1:
        rep ret
f2(int, int):
        movl    %edi, %edx
        shrl    $31, %edx
        cmpl    %edi, %esi
        seta    %al
        orl     %edx, %eax
        ret
===============
And f2() is expected to be significantly faster than f() in most usage
scenarios (*NB: if you feel it is necessary to create a test case illustrating
the detriment of branching, please let me know, but hopefully it is quite
obvious*).
Per Godbolt, similar behavior is observed under both GCC/x64 and GCC/ARM;
however, Clang manages to do without branching for both f() and f2().
*Godbolt link*: https://godbolt.org/g/ktovvP
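For reference, a minimal sketch of how the equivalence claim for cmp() and
cmp2() could be spot-checked. The helpers count_mismatches() and
check_equivalence() are hypothetical names introduced here for illustration
and are not part of the original test case; the sample values are an arbitrary
choice covering negative, zero, and boundary inputs:

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Same definitions as in the report.
template<class T, class T2>
inline bool cmp(T a, T2 b) {
    return a < 0 ? true : T2(a) < b;
}

template<class T, class T2>
inline bool cmp2(T a, T2 b) {
    return (a < 0) | (T2(a) < b);
}

// Hypothetical helper: counts the sample pairs on which cmp() and cmp2()
// return different results when called the way f() and f2() call them.
inline int count_mismatches() {
    const int samples[] = {
        std::numeric_limits<int>::min(), -2, -1, 0, 1, 2, 1000,
        std::numeric_limits<int>::max()
    };
    int mismatches = 0;
    for (int a : samples) {
        for (int b : samples) {
            const bool r1 = cmp(int64_t(a), unsigned(b));
            const bool r2 = cmp2(int64_t(a), unsigned(b));
            if (r1 != r2) {
                ++mismatches;
            }
        }
    }
    return mismatches;
}

// Hypothetical helper: aborts if any sampled pair disagrees.
inline void check_equivalence() {
    assert(count_mismatches() == 0);
}
```

This only samples a few inputs, so it is a sanity check rather than a proof;
the argument for equivalence is that for a<0 both forms yield true, and for
a>=0 both reduce to T2(a) < b.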