Bug 85605

Summary: Potentially missing optimization under x64 and ARM: seemingly unnecessary branch in codegen
Product: gcc Reporter: sergey.ignatchenko
Component: tree-optimization Assignee: Not yet assigned to anyone <unassigned>
Status: UNCONFIRMED ---    
Severity: normal Keywords: missed-optimization
Priority: P3    
Version: 7.3.0   
Target Milestone: ---   
Host: Target: x86_64-*-*, i?86-*-*, arm, aarch64*-*-*
Build: Known to work:
Known to fail: Last reconfirmed:

Description sergey.ignatchenko 2018-05-02 09:13:54 UTC


#include <stdint.h>
#include <type_traits>

template<class T,class T2>
inline bool cmp(T a, T2 b) {
  return a<0 ? true : T2(a) < b;
}

template<class T,class T2>
inline bool cmp2(T a, T2 b) {
  return (a<0) | (T2(a) < b);
}

bool f(int a, int b) {
    return cmp(int64_t(a), unsigned(b));
}

bool f2(int a, int b) {
    return cmp2(int64_t(a), unsigned(b));
}

Functions cmp and cmp2 seem to be equivalent (at least under the "as if" rule, since the side effects of reading and casting are non-observable). However, under GCC/x64, cmp() generates code with a branch, while the seemingly equivalent cmp2() manages to do without branching:


f(int, int):
  testl %edi, %edi
  movl $1, %eax
  js .L1
  cmpl %edi, %esi
  seta %al
.L1:
  rep ret

f2(int, int):
  movl %edi, %edx
  shrl $31, %edx
  cmpl %edi, %esi
  seta %al
  orl %edx, %eax
  ret


And f2() is expected to be significantly faster than f() in most usage scenarios (*NB: if you feel it is necessary to create a case to illustrate the detriment of branching, please let me know, but hopefully it is quite obvious*).

Per Godbolt, similar behavior is observed under both GCC/x64 and GCC/ARM; however, Clang manages to do without branching for both f() and f2().

*Godbolt link*: https://godbolt.org/g/ktovvP
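
The equivalence claimed in the description can be spot-checked with a small sketch like the one below. It is not part of the original test case: f_and_f2_agree() is a hypothetical helper, and the sample values are only an assumed set of boundary cases, not an exhaustive proof.

```cpp
#include <stdint.h>
#include <climits>

// cmp and cmp2 exactly as in the report.
template<class T, class T2>
inline bool cmp(T a, T2 b) {
  return a < 0 ? true : T2(a) < b;
}

template<class T, class T2>
inline bool cmp2(T a, T2 b) {
  return (a < 0) | (T2(a) < b);
}

bool f(int a, int b)  { return cmp(int64_t(a), unsigned(b)); }
bool f2(int a, int b) { return cmp2(int64_t(a), unsigned(b)); }

// Hypothetical helper (not in the report): compares f and f2 on a
// handful of boundary values and returns true if they always agree.
bool f_and_f2_agree() {
  const int samples[] = {INT_MIN, INT_MIN + 1, -2, -1, 0,
                         1, 2, INT_MAX - 1, INT_MAX};
  for (int a : samples)
    for (int b : samples)
      if (f(a, b) != f2(a, b))
        return false;
  return true;
}
```

Calling f_and_f2_agree() from main() and checking that it returns true is enough to confirm agreement on these samples; it says nothing about the relative speed of the two forms.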
Comment 1 sergey.ignatchenko 2018-05-02 09:32:54 UTC
Command line switches (see also Godbolt link above): -O3 -fomit-frame-pointer
Comment 2 Andrew Pinski 2018-05-02 16:35:39 UTC
There might be a duplicate of this bug already but:
CMP1 ? true : CMP2;

Can be transformed into:
CMP1 | CMP2;

This needs PHIOPT to do the optimization, either via match or manually.
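
As a rough illustration of why such a rewrite is only valid when CMP2 has no observable side effects (the "as if" condition mentioned in the description), the sketch below contrasts the conditional form `CMP1 ? true : CMP2` with a non-short-circuiting `|` form. The counter and all function names are hypothetical and exist only to make the evaluation of CMP2 visible; they do not come from GCC or from the report.

```cpp
// Hypothetical counter standing in for an observable side effect of CMP2.
static int evaluations = 0;

// Stand-in for CMP2 that records every evaluation.
static bool cmp2_with_side_effect(unsigned a, unsigned b) {
  ++evaluations;
  return a < b;
}

// Conditional form: CMP2 is evaluated only when CMP1 (a < 0) is false.
bool conditional_form(int a, unsigned b) {
  return a < 0 ? true : cmp2_with_side_effect(unsigned(a), b);
}

// Bitwise form: both operands of | are always evaluated.
bool bitwise_form(int a, unsigned b) {
  return (a < 0) | cmp2_with_side_effect(unsigned(a), b);
}

// Returns how many times CMP2 ran during one call of the given form.
int count_evaluations(bool (*form)(int, unsigned), int a, unsigned b) {
  evaluations = 0;
  form(a, b);
  return evaluations;
}
```

Both forms return the same value for every input, but for a negative `a` only the bitwise form evaluates CMP2. When CMP2 is a plain comparison of already-available values, as in the reported test case, that difference is not observable.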