Bug 106138 - Inefficient code generation: logical AND of disjoint booleans from equal and bitwise AND not optimized to constant false
Summary: Inefficient code generation: logical AND of disjoint booleans from equal and ...
Status: NEW
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization (show other bugs)
Version: 12.1.0
: P3 enhancement
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords: missed-optimization
Depends on:
Blocks:
 
Reported: 2022-06-29 13:28 UTC by Pavel M
Modified: 2023-06-26 05:22 UTC (History)
4 users (show)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2023-06-25 00:00:00


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Pavel M 2022-06-29 13:28:57 UTC
_Bool f(char x)
{
    _Bool b1 = x == 4;
    _Bool b2 = x & 0x3;
    return b1 && b2;
}

Invocation: gcc -O3

Actual generated code:
f:
        test    dil, 3
        setne   al
        cmp     dil, 4
        sete    dl
        and     eax, edx
        ret

Expected generated code:
f:
        xor     eax, eax
        ret

The Summary needs to be adjusted perhaps.
Comment 1 Richard Biener 2022-06-30 07:26:02 UTC
So we're seeing

  b1_8 = x_7(D) == 4;
  # RANGE [0, 3] NONZERO 3
  _3 = x_7(D) & 3;
  b2_9 = _3 != 0;
  _5 = b1_8 & b2_9;

and fail to optimize that.  It somewhat looks like "relations" (on x_7), but
I'm not sure if suitable to handle for ranger.

Adding something in match.pd might be possible - the key is of course that
both b1 and b2 are derived from the same value.  I think we already have
a pattern looking at not intersecting nonzero bits.
Comment 2 Jakub Jelinek 2022-06-30 07:45:09 UTC
I'd say reassoc would be a better place to handle it, so that we handle both
the logical op short circuit && and x == 4 && y == 25 && z != 16 && (x & 3) != 0.
Comment 3 Peter Cordes 2022-07-01 03:02:44 UTC
Ideally, bitwise & of booleans should also be handled, not just &&.
A testcase (https://godbolt.org/z/qvosv8q7c) makes it easy to check both.

//#define LOGIC_AND 
_Bool f2(char x)
{
    _Bool b1 = x == 2;
    _Bool b2 = x & 1;

    #ifdef LOGIC_AND
      return b1 && b2;
    #else
      return b1 & b2;
    #endif
}

(Clang optimized it to return false for the && version, but not bitwise.  GCC currently doesn't optimize either way.)
This was originally posted on Stack Overflow (https://stackoverflow.com/q/72802469/224132), BTW.
Comment 4 Andrew Macleod 2022-07-04 14:47:02 UTC
(In reply to Richard Biener from comment #1)
> So we're seeing
> 
>   b1_8 = x_7(D) == 4;
>   # RANGE [0, 3] NONZERO 3
>   _3 = x_7(D) & 3;
>   b2_9 = _3 != 0;
>   _5 = b1_8 & b2_9;
> 
> and fail to optimize that.  It somewhat looks like "relations" (on x_7), but
> I'm not sure if suitable to handle for ranger.
> 
> Adding something in match.pd might be possible - the key is of course that
> both b1 and b2 are derived from the same value.  I think we already have
> a pattern looking at not intersecting nonzero bits.

Its not suitable yet anyway. theres an order of indirection in that any relation is based on whether b1_8 and b2_9 are both 1 or both 0...  If there was  a branch we start to pick up information on the edges...  but without an edge using _5, we don't yet try to paw back the same way.

perhaps we could optimize &/&& and |/|| to check if one of the operands doesnt matter..

ie, if b1_8 is 1, then check if b2_9 is always 1 with the range of x_7([4,4]),  likewise if b1_8 is 0, then b2_9 doesnt matter.  so we can simplify this to _5 = b1_8

it seems like something we could do with simplify_using_ranges quite easily by utilizing gori perhaps?  easy enough to pick up a range for b2_9 based on the range of x_7 from b1_8 == [1,1]
Comment 5 Andrew Pinski 2022-07-07 20:39:51 UTC
(In reply to Jakub Jelinek from comment #2)
> I'd say reassoc would be a better place to handle it, so that we handle both
> the logical op short circuit && and x == 4 && y == 25 && z != 16 && (x & 3)
> != 0.

Right now reassoc calls maybe_fold_and_comparisons and maybe_fold_or_comparisons which are places where this could be added. Or even match-and-simplify gets called (via those two functions).

You could add it either in combine_comparisons or match.pd.

Maybe something like (psedu code as I didn't add the wi part and it can be extended and all):
(simplify
 (bit_and
  (eq @0 INTEGER_CST@1)
  (ne (bit_and @0 INTEGER_CST@2) zero_p))
 (if ((@1 & @2) == 0))
 ({true_bool_value;}))

You can change @1 and @2 to with_possible_nonzero_bits/with_possible_nonzero_bits2 and then compare the nonzero bits to see if they overlap.
Also extend it to handle ior too.
Comment 6 Andrew Pinski 2022-07-07 21:18:22 UTC
Note for:
bool f(char x)
{
    bool b1 = x == 4;
    bool b2 = x & 4;
    return b1 && b2;
}


We should just get:
bool f(char x)
{
    return x == 4;
}

bit_ior should be handled too.
Comment 7 Pavel M 2022-07-08 09:47:12 UTC
May be useful: https://devblogs.microsoft.com/cppblog/new-code-optimizer. Search for "Bit Estimator" section containing "Folding comparisons and branches".
Comment 8 Andrew Pinski 2022-07-08 19:35:30 UTC
(In reply to Pavel M from comment #7)
> May be useful: https://devblogs.microsoft.com/cppblog/new-code-optimizer.
> Search for "Bit Estimator" section containing "Folding comparisons and
> branches".

All of the infrastructure for the above referenced is already done really. This is what with_possible_nonzero_bits/get_nonzero_bits is used for inside match.pd. My original example match.pd used a constant but does not need to.

In fact GCC already handles:
bool f(int x)
{
    if (x == 4)
      if (x & 0x3)
        return true;
    return false;
}
It is the & case where GCC does not handle (&& in the original code is converted into & early on as both sides do NOT have side effects and such).