Bug 85605 - Potentially missing optimization under x64 and ARM: seemingly unnecessary branch in codegen
Status: UNCONFIRMED
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization
Version: 7.3.0
Importance: P3 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords: missed-optimization
Depends on:
Blocks:
 
Reported: 2018-05-02 09:13 UTC by sergey.ignatchenko
Modified: 2018-05-02 16:35 UTC
CC List: 0 users

See Also:
Host:
Target: x86_64-*-*, i?86-*-*, arm, aarch64*-*-*
Build:
Known to work:
Known to fail:
Last reconfirmed:


Description sergey.ignatchenko 2018-05-02 09:13:54 UTC
Code:

==========

#include <stdint.h>
#include <type_traits>

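template<class T,class T2>
inline bool cmp(T a, T2 b) {
  // ?: form: T2(a) < b is only evaluated when a >= 0
  return a<0 ? true : T2(a) < b;
}

template<class T,class T2>
inline bool cmp2(T a, T2 b) {
  // bitwise | form: both comparisons are always evaluated
  return (a<0) | (T2(a) < b);
}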

bool f(int a, int b) {
    return cmp(int64_t(a), unsigned(b));
}

bool f2(int a, int b) {
    return cmp2(int64_t(a), unsigned(b));
}

====

Functions cmp and cmp2 seem to be equivalent (at least under the "as if" rule, since reading and converting the arguments has no observable side effects). However, under GCC/x64, cmp() generates code with a branch, while the seemingly equivalent cmp2() manages without one:

===============

f(int, int):
  testl %edi, %edi
  movl $1, %eax
  js .L1
  cmpl %edi, %esi
  seta %al
.L1:
  rep ret

f2(int, int):
  movl %edi, %edx
  shrl $31, %edx
  cmpl %edi, %esi
  seta %al
  orl %edx, %eax
  ret

===============
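The equivalence claim can be spot-checked with a small sketch. For a < 0 both helpers return true; for a >= 0 the left operand of | is false, so both reduce to T2(a) < b. The code below repeats cmp and cmp2 from the report and compares them over a hypothetical set of edge-case arguments, converted the same way f() and f2() convert theirs. The name check_cmp_equivalence is illustrative and not part of the original test case, and since it only samples inputs it is a sanity check rather than a proof.

```cpp
#include <cassert>
#include <climits>
#include <cstdint>

// cmp and cmp2 as in the report.
template<class T, class T2>
inline bool cmp(T a, T2 b) {
  return a < 0 ? true : T2(a) < b;
}

template<class T, class T2>
inline bool cmp2(T a, T2 b) {
  return (a < 0) | (T2(a) < b);
}

// Hypothetical check: asserts that cmp and cmp2 agree on every pair drawn
// from a small set of edge values, converted exactly as f() and f2() do.
void check_cmp_equivalence() {
  const int vals[] = {INT_MIN, INT_MIN + 1, -2, -1, 0, 1, 2, INT_MAX - 1, INT_MAX};
  for (int a : vals) {
    for (int b : vals) {
      assert(cmp(std::int64_t(a), unsigned(b)) ==
             cmp2(std::int64_t(a), unsigned(b)));
    }
  }
}
```

Calling check_cmp_equivalence() from a test driver should complete silently; any disagreement would abort through assert.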

f2() is expected to be significantly faster than f() in most usage scenarios, since it contains no conditional branch that could be mispredicted (*NB: if you feel it is necessary to create a case illustrating the cost of branching, please let me know, but hopefully it is quite obvious*).

Per Godbolt, similar behavior is observed under both GCC/x64 and GCC/ARM; however, Clang manages without branching for both f() and f2().

*Godbolt link*: https://godbolt.org/g/ktovvP
Comment 1 sergey.ignatchenko 2018-05-02 09:32:54 UTC
Command-line switches (see also the Godbolt link above): -O3 -fomit-frame-pointer
Comment 2 Andrew Pinski 2018-05-02 16:35:39 UTC
There might already be a duplicate of this bug, but:
CMP1 ? true : CMP2;

Can be transformed into:
CMP1 | CMP2

This needs PHIOPT (GCC's tree-ssa-phiopt pass) to do the optimization, either via match.pd patterns or manually.
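A minimal source-level sketch of that rewrite, using plain bool inputs; select_form, or_form, guarded_division and check_identity are hypothetical names chosen for illustration. The identity holds for all four input combinations. A general caveat, not stated in the comment above: the rewritten form evaluates CMP2 unconditionally, so the transformation is only valid when CMP2 has no side effects and cannot trap. In this bug CMP2 is an unsigned comparison of two already-computed values, which satisfies that condition; guarded_division shows a case that does not.

```cpp
#include <cassert>

// Original shape: CMP1 ? true : CMP2 (CMP2 is only needed when CMP1 is false).
bool select_form(bool cmp1, bool cmp2) { return cmp1 ? true : cmp2; }

// Rewritten shape: CMP1 | CMP2 (no conditional jump is needed to compute it).
bool or_form(bool cmp1, bool cmp2) { return cmp1 | cmp2; }

// Hypothetical counterexample: CMP2 divides by x, which is undefined behaviour
// for x == 0, so it must stay behind the CMP1 guard and cannot be speculated.
bool guarded_division(int x) { return x == 0 ? true : (100 / x) < 3; }

// Exhaustively checks the boolean identity over all four input combinations.
void check_identity() {
  for (int i = 0; i < 2; ++i) {
    for (int j = 0; j < 2; ++j) {
      assert(select_form(i != 0, j != 0) == or_form(i != 0, j != 0));
    }
  }
}
```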