Bug 94892 - (x >> 31) + 1 not getting narrowed to compare
Summary: (x >> 31) + 1 not getting narrowed to compare
Status: NEW
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization (show other bugs)
Version: 10.0
: P3 enhancement
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords: missed-optimization
Depends on:
Blocks: 19987
  Show dependency treegraph
 
Reported: 2020-04-30 17:25 UTC by Gabriel Ravier
Modified: 2023-08-24 21:46 UTC (History)
2 users (show)

See Also:
Host:
Target: x86_64-*-*
Build:
Known to work:
Known to fail:
Last reconfirmed: 2021-12-15 00:00:00


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Gabriel Ravier 2020-04-30 17:25:03 UTC
inline int sign(int x)
{
    return (x >> 31) | ((unsigned)-x >> 31);
}

bool f(int x)
{
    return sign(x) > -1;
}

With -O3, LLVM produces this :

f(int):
  test edi, edi
  setns al
  ret

GCC produces this :

f(int):
  sar edi, 31
  lea eax, [rdi+1]
  ret

Changing `f` to `(x >> 31) + 1` results in it being optimized optimally
Comment 1 Andrew Pinski 2020-04-30 18:34:40 UTC
GCC's code gen is actually better than LLVM's here I think.
Comment 2 Gabriel Ravier 2020-04-30 19:32:50 UTC
In that case, then, GCC is generating sub-optimal code for `(x >> 31) + 1` alone since it optimises that to the same thing as LLVM
Comment 3 Harald van Dijk 2020-05-01 19:28:00 UTC
Changing the test slightly to

  inline int sign(int x)
  {
    return (x >> 31) | ((unsigned)-x >> 31);
  }

  void f1(void);
  void f2(void);
  void f(int x)
  {
    if (sign(x) > -1)
      f1();
    else
      f2();
  }

shows at -O3 with LLVM:

  f:
    test    edi, edi
    js      .LBB0_2
    jmp     f1
  .LBB0_2:
    jmp     f2

whereas GCC produces:

  f:
    mov     eax, edi
    sar     edi, 31
    neg     eax
    shr     eax, 31
    or      edi, eax
    cmp     edi, -1
    je      .L2
    jmp     f1
  .L2:
    jmp     f2

In that example, LLVM is doing much better.
Comment 4 Gabriel Ravier 2020-08-10 00:19:27 UTC
Harald, for the specific code you wrote, I now see this from GCC :

f(int):
  test edi, edi
  js .L2
  jmp f1()
.L2:
  jmp f2()

So for that specific code, GCC has improved, though for the original test case it has not. I must note, though, that this still means that either :
- GCC fails to properly optimize `(x >> 31) + 1`
- GCC fails to properly optimize my original test case
Comment 5 Andrew Pinski 2021-12-15 00:23:43 UTC
Confirmed.
Testcase:

  inline int sign(int x)
  {
    return (x >> 31) | ((unsigned)-x >> 31);
  }
bool f1(int x)
{
    return sign(x) > -1;
}
bool f3(int x)
{
    return (x>>31)+1;
}

  void f1(void);
  void f2(void);
  void f(int x)
  {
    if (sign(x) > -1)
      f1();
    else
      f2();
  }
  void f0(int x)
  {
    if (f3(x))
      f1();
    else
      f2();
  }
  void fn1(int x)
  {
    if (x>0)
      f1();
    else
      f2();
  }



f, f0 and fn1 should all produce the same result on the gimple level but don't.

(x >> 31) | ((unsigned)-x >> 31) is the same as x == 0 ? 0 : -(x < 0) really.
Which we should simplify into.
Though COND_EXPR is not as well handled by the rest of GCC really.
Comment 6 Andrew Pinski 2021-12-15 00:31:12 UTC
Two things we should simplify:
  _4 = _3 >> 31;
  _4 != -1

Into:
  _3 >= 0 (if _3 is signed, otherwise false)

(this will solve f0)


And:
  _4 = x_3(D) >> 31;
  _7 = -x_3(D);
  _8 = (unsigned int) _7;
  _9 = _8 >> 31;
  _10 = (int) _9;
  _11 = _4 | _10;

Into:
_4 = x_3(D) >> 31; // keep around
_t = _4 | 1;
_11 = x != 0 ? _t : 0;
Comment 7 Andrew Pinski 2023-05-28 23:15:43 UTC
(In reply to Andrew Pinski from comment #6)
> Two things we should simplify:
>   _4 = _3 >> 31;
>   _4 != -1
> 
> Into:
>   _3 >= 0 (if _3 is signed, otherwise false)
> 
> (this will solve f0)

See bug 85234 comment #5 on handle that one (g and g2).