Bug 101179 - y % (x ? 16 : 4) and y % (4 << (2 * (bool)x)) produce different code
Summary: y % (x ? 16 : 4) and y % (4 << (2 * (bool)x)) produce different code
Status: NEW
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization (show other bugs)
Version: 12.0
: P3 enhancement
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords: missed-optimization
: 101251 101252 108446 (view as bug list)
Depends on:
Blocks: VRP
  Show dependency treegraph
 
Reported: 2021-06-23 11:05 UTC by Jonathan Wakely
Modified: 2023-05-06 21:52 UTC (History)
4 users (show)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2021-06-23 00:00:00


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Jonathan Wakely 2021-06-23 11:05:32 UTC
These produces different assembly:

int f1(int y)
{
    const bool x = y % 100 == 0;
    return y % (x ? 16 : 4) == 0;
}

int f2(int y)
{
    const bool x = y % 100 == 0;
    return y % (4 << (x * 2)) == 0;
}

Since they do the same calculation, I would expect them to produce the same code.

Currently f1 produces slightly smaller code for aarch64 and x86_64.

With Clang they produce the same code (but using cmov which might not be optimal).
Comment 1 Jonathan Wakely 2021-06-23 13:58:08 UTC
On IRC Richi said: "VRP has code to do that but maybe for some reason shifts are not handled"
Comment 2 Andrew Pinski 2021-06-23 17:51:38 UTC
(x ? 16 : 4)

to:
(4 << (x * 2))

Should be easy to add to match.pd's
/* A few simplifications of "a ? CST1 : CST2". */

And PHI-OPT will use it without you doing anything extra.
Comment 3 Jonathan Wakely 2021-06-23 17:55:34 UTC
the ?: one seems to produce better code currently though, so I'm not sure transforming it to the shift is what we want.
Comment 4 Andrew Pinski 2021-06-23 18:05:39 UTC
But here are two other functions which all should have the same code gen as the original two:
int f3(int y)
{
    const bool x = y % 100 == 0;
    return (x ? y%16 : y%4) == 0;
}

int f4(int y)
{
    const bool x = y % 100 == 0;
    return (x ? (y%16) == 0 : (y%4) == 0);
}

Only the last one produces the best code.
Comment 5 Andrew Pinski 2021-06-23 20:10:07 UTC
(In reply to Andrew Pinski from comment #4) 
> Only the last one produces the best code.

So for clang, f1-f3 produces the same code but f4 is bad.
It was only fixed in clang 10.
Comment 6 Richard Biener 2021-06-24 06:40:24 UTC
(In reply to Jonathan Wakely from comment #1)
> On IRC Richi said: "VRP has code to do that but maybe for some reason shifts
> are not handled"

Specifically simplify_using_ranges::simplify has

      /* Convert:
         LHS = CST BINOP VAR
         Where VAR is two-valued and LHS is used in GIMPLE_COND only
         To:
         LHS = VAR == VAL1 ? (CST BINOP VAL1) : (CST BINOP VAL2)

         Also handles:
         LHS = VAR BINOP CST
         Where VAR is two-valued and LHS is used in GIMPLE_COND only
         To:
         LHS = VAR == VAL1 ? (VAL1 BINOP CST) : (VAL2 BINOP CST) */

restrictions that stand in the way:

      if (TREE_CODE_CLASS (rhs_code) == tcc_binary
...
          && single_imm_use (lhs, &use_p, &use_stmt)
          && gimple_code (use_stmt) == GIMPLE_COND)

VRP1 sees

  <bb 2> [local count: 1073741824]:
  _1 = y_7(D) % 100;
  x_8 = _1 == 0;
  _2 = (int) x_8;
  _3 = _2 * 2;
  _4 = 4 << _3;
  _5 = y_7(D) % _4;
  _6 = _5 == 0; 
  _9 = (int) _6;
  return _9; 

so it would consider _2 * 2 but its single use is in a shift, the
condition is after another op, the modulo, and there the condition
is in an assignment, not in a GIMPLE_COND.
Comment 7 Andrew Pinski 2021-07-01 20:43:07 UTC
*** Bug 101252 has been marked as a duplicate of this bug. ***
Comment 8 Andrew Pinski 2021-07-01 20:43:15 UTC
*** Bug 101251 has been marked as a duplicate of this bug. ***
Comment 9 Andrew Pinski 2021-07-01 20:49:34 UTC
Here is one more function which should be the same:
int f0(int y)
{
    const bool x = y % 100 == 0;
    return (y & ((4 << (x * 2)) - 1)) != 0;
}
Comment 10 Andrew Pinski 2023-01-18 16:51:42 UTC
*** Bug 108446 has been marked as a duplicate of this bug. ***
Comment 11 Andrew Pinski 2023-05-06 21:52:41 UTC
Similar to PR 71336, difference is the difference is a power of 2 there while here they are power of 2s