Bug 94589 - Optimize (i<=>0)>0 to i>0
Summary: Optimize (i<=>0)>0 to i>0
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization (show other bugs)
Version: 10.0
: P3 enhancement
Target Milestone: 12.0
Assignee: Jakub Jelinek
URL:
Keywords: missed-optimization
: 100322 (view as bug list)
Depends on:
Blocks:
 
Reported: 2020-04-14 06:57 UTC by Marc Glisse
Modified: 2025-01-09 12:57 UTC (History)
6 users (show)

See Also:
Host:
Target:
Build:
Known to work: 11.1.0
Known to fail:
Last reconfirmed: 2020-04-14 00:00:00


Attachments
prototype (2.30 KB, patch)
2021-04-29 11:49 UTC, Richard Biener
Details | Diff
gcc12-pr94589-wip.patch (1.92 KB, patch)
2021-04-29 17:41 UTC, Jakub Jelinek
Details | Diff
gcc12-pr94589-wip.patch (2.84 KB, patch)
2021-04-30 17:23 UTC, Jakub Jelinek
Details | Diff
gcc12-pr94589.patch (5.78 KB, patch)
2021-05-03 17:09 UTC, Jakub Jelinek
Details | Diff
Here is my idea around the patch for prototype for doing the constant prop idea (3.33 KB, patch)
2023-09-13 18:09 UTC, Andrew Pinski
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Marc Glisse 2020-04-14 06:57:31 UTC
g++-10 -std=gu++2a -O3

#include <compare>
bool k(int i){
  auto c=i<=>0;
  return c>0;
}

  <bb 2> [local count: 1073741824]:
  if (i_1(D) != 0)
    goto <bb 3>; [50.00%]
  else
    goto <bb 4>; [50.00%]

  <bb 3> [local count: 536870913]:
  _2 = i_1(D) >= 0;

  <bb 4> [local count: 1073741824]:
  # prephitmp_6 = PHI <_2(3), 0(2)>
  return prephitmp_6;

For most comparisons @ we do optimize (i<=>0)@0 to just i@0, but not for > and <=. Spaceship operator<=> is very painful to use, but I expect we will end up seeing a lot of it with C++20, and comparing its result with 0 is almost the only way to use its output, so it seems important to optimize this common case.

(there is probably a very old dup, but I couldn't find it)
Comment 1 Richard Biener 2020-04-14 10:43:16 UTC
Hmm.  I would have said phiopt but then there's the missing opportunity to
handle PHIs as COND_EXPRs and simplify them via match.pd patterns.  In this
case simplify

 i_1(D) != 0 ? i_1(D) >= 0 : 0

that could be forwprops job then.  The only pecularity is when the PHI
isn't the only thing controlled by the condition, thus the control flow
not vanishing with replacing the PHI.  In this case we'd turn the code into

  <bb 2> [local count: 1073741824]:
  if (i_1(D) != 0)
    goto <bb 3>; [50.00%]
  else
    goto <bb 4>; [50.00%]

  <bb 3> [local count: 536870913]:
  _2 = i_1(D) >= 0;

  <bb 4> [local count: 1073741824]:
  prephitmp_6 = i_1(D) > 0;
  return prephitmp_6;

and leave the rest to DCE.  Obviously if the other two compares are not
dead afterwards this isn't going to be a good idea.

And CFG pattern matching is done by phiopt - which for the actual
simplification could resort to match.pd simplifying a COND_EXPR as well.
And yeah, I think there's a dup about PHIs and "folding".
Comment 2 Victor Khimenko 2020-06-05 15:31:23 UTC
Note that gcc looks bad even on the example from Microsoft's blog post:

https://godbolt.org/z/Jc7TcN

The fact that MSVC also looks bad on example from Microsoft is not really relevant, it's MSVC, after all.

Blogpost itself is here: https://devblogs.microsoft.com/cppblog/simplify-your-code-with-rocket-science-c20s-spaceship-operator/
Comment 3 Jonathan Wakely 2021-04-29 08:55:34 UTC
*** Bug 100322 has been marked as a duplicate of this bug. ***
Comment 4 Jonathan Wakely 2021-04-29 08:57:41 UTC
PR 100322 shows that this missed-optimization causes a regression for code using std::chrono::duration types. Since C++20 their comparisons use operator<=> and so produce much worse code than the same source compiled as C++17.
Comment 5 Jonathan Wakely 2021-04-29 09:21:22 UTC
(In reply to Marc Glisse from comment #0)
> For most comparisons @ we do optimize (i<=>0)@0 to just i@0, but not for >
> and <=. Spaceship operator<=> is very painful to use, but I expect we will
> end up seeing a lot of it with C++20, and comparing its result with 0 is
> almost the only way to use its output, so it seems important to optimize
> this common case.

Maybe even common enough to transform it in the FE.


(In reply to Victor Khimenko from comment #2)
> Note that gcc looks bad even on the example from Microsoft's blog post:
> 
> https://godbolt.org/z/Jc7TcN


Right, this doesn't only affect (i<=>0)@0 but also (i<=>j)@0 which is very common, because it's what the FE synthesizes for operator@ when the type only defines operator<=>


https://godbolt.org/z/19dM8PdaM

#include <compare> 

struct X
{
    int i = 0;
    auto operator<=>(const X&) const = default;
};

bool lt(X l, X r) { return l<r; }

struct Y
{
    int i = 0;
    bool operator<(Y rhs) const { return i < rhs.i; }
};

bool lt(Y l, Y r) { return l<r; }

Defining <=> once is easier than defining all of < > <= >= but the code is bad:

lt(X, X):
        xor     eax, eax
        cmp     edi, esi
        je      .L2
        setge   al
        lea     eax, [rax-1+rax]
.L2:
        shr     al, 7
        ret
lt(Y, Y):
        cmp     edi, esi
        setl    al
        ret

It seems like it should be possible for the FE to recognize that in this trivial case the defaulted <=> is just comparing integers, and therefore the synthesized op< could be transformed from (l.i <=> r.i) < 0 to simply l.i < r.i

The FE is not required to synthesize exactly (l.i<=>r.i) @ 0 as long as the result is correct, so l.i @ r.i would be OK (and avoids the poor codegen until the missed-optimization gets done).
Comment 6 Richard Biener 2021-04-29 10:07:29 UTC
Note the original testcase is optimized with GCC 11

  <bb 2> [local count: 1073741824]:
  _2 = i_1(D) > 0;
  return _2;

but not on the GCC 10 branch.  Not sure what fixed it there.
Comment 7 Marc Glisse 2021-04-29 10:25:16 UTC
Some key steps in the optimization:
PRE turns PHI<-1,0,1> > 0 into PHI<0,0,1>
reassoc then combines the operations (it didn't in gcc-10)
forwprop+phiopt cleans up (i>0)!=0?1:0 into just i>0.

Having to wait until phiopt4 to get the simplified form is still very long, and most likely causes missed optimizations in earlier passes. But nice progress!
Comment 8 Marc Glisse 2021-04-29 10:28:44 UTC
PR96480 would be my guess.
Comment 9 Jonathan Wakely 2021-04-29 10:33:26 UTC
(In reply to Richard Biener from comment #6)
> Not sure what fixed it there.

Seems to be r11-2593
Comment 10 Jonathan Wakely 2021-04-29 10:33:40 UTC
(In reply to Marc Glisse from comment #8)
> PR96480 would be my guess.

Yes
Comment 11 rguenther@suse.de 2021-04-29 10:39:20 UTC
On Thu, 29 Apr 2021, glisse at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94589
> 
> --- Comment #7 from Marc Glisse <glisse at gcc dot gnu.org> ---
> Some key steps in the optimization:
> PRE turns PHI<-1,0,1> > 0 into PHI<0,0,1>
> reassoc then combines the operations (it didn't in gcc-10)
> forwprop+phiopt cleans up (i>0)!=0?1:0 into just i>0.
> 
> Having to wait until phiopt4 to get the simplified form is still very 
> long, and most likely causes missed optimizations in earlier passes. But 
> nice progress!

Agreed - requiring PRE (and thus -O2) is also less than optimal for
such a core feature.  But the IL we get is simply awkward ;)

ifcombine/phiopt2 see

  <bb 2> [local count: 1073741824]:
  if (i_1(D) == 0)
    goto <bb 5>; [50.00%]
  else
    goto <bb 3>; [50.00%]

  <bb 3> [local count: 536870913]:
  if (i_1(D) < 0)
    goto <bb 5>; [41.00%]
  else
    goto <bb 4>; [59.00%]

  <bb 4> [local count: 316753840]:

  <bb 5> [local count: 1073741824]:
  # c$_M_value_2 = PHI <0(2), -1(3), 1(4)>
  _4 = c$_M_value_2 > 0;

I guess that's what we should try to work with.  For PR99997
I have prototyped a forwprop patch to try constant folding
stmts with all-constant PHIs, thus in this case c$_M_value_2 > 0,
when there's only a single use of it (that basically does what
PRE later does but earlier and for a very small subset of cases).
Comment 12 Marc Glisse 2021-04-29 11:06:52 UTC
(In reply to rguenther@suse.de from comment #11)

> For PR99997
> I have prototyped a forwprop patch to try constant folding
> stmts with all-constant PHIs, thus in this case c$_M_value_2 > 0,
> when there's only a single use of it

Maybe we could handle any case where trying to fold the single use (counting x*x as a single use of x) with each possible value satisfies is_gimple_val (or whatever the condition is to be allowed in a PHI, and without introducing a use of a ssa_name before it is defined), so that things like PHI<X,0> & X would simplify. But the constant case is indeed the most important, and should allow the optimization in this PR before the vectorizer using reassoc1.
Comment 13 Richard Biener 2021-04-29 11:49:02 UTC
Created attachment 50707 [details]
prototype

For reference this is the prototype patch I mentioned.  I wasn't entirely happy and wanted to explore the ??? in the commit message.
Comment 14 Richard Biener 2021-04-29 12:55:26 UTC
#include <compare>
bool k(int a, int b){
  auto c = (a <=> b);
  return c>0;
}

Produces

  <bb 2> [local count: 1073741824]:
  if (a_1(D) == b_3(D))
    goto <bb 5>; [34.00%]
  else
    goto <bb 3>; [66.00%]

  <bb 3> [local count: 708669601]:
  if (a_1(D) < b_3(D))
    goto <bb 5>; [50.00%]
  else
    goto <bb 4>; [50.00%]

  <bb 4> [local count: 354334801]:

  <bb 5> [local count: 1073741824]:
  # prephitmp_8 = PHI <0(2), 0(3), 1(4)>
  return prephitmp_8;

from PRE (or the proposed patch) where it is not matched further.  This kind
of patters could be handled by phiopt.
Comment 15 Jakub Jelinek 2021-04-29 13:29:19 UTC
C testcase (though surely we need C++ one too because with C++ there are aggregates and inline functions involved):

int f1 (int x, int y) { int a = x == y ? 0 : x < y ? -1 : 1; return a == 0; }
int f2 (int x, int y) { int a = x == y ? 0 : x < y ? -1 : 1; return a < 0; }
int f3 (int x, int y) { int a = x == y ? 0 : x < y ? -1 : 1; return a > 0; }
int f4 (int x, int y) { int a = x == y ? 0 : x < y ? -1 : 1; return a != 0; }
int f5 (int x, int y) { int a = x == y ? 0 : x < y ? -1 : 1; return a <= 0; }
int f6 (int x, int y) { int a = x == y ? 0 : x < y ? -1 : 1; return a >= 0; }
int f7 (int x, int y) { int a = x == y ? 0 : x < y ? -1 : 1; return a == -1; }
int f8 (int x, int y) { int a = x == y ? 0 : x < y ? -1 : 1; return a != -1; }
int f9 (int x, int y) { int a = x == y ? 0 : x < y ? -1 : 1; return a == 1; }
int f10 (int x, int y) { int a = x == y ? 0 : x < y ? -1 : 1; return a != 1; }
int f11 (float x, float y) { int a = x == y ? 0 : x < y ? -1 : 1; return a == 0; }
int f12 (float x, float y) { int a = x == y ? 0 : x < y ? -1 : 1; return a < 0; }
int f13 (float x, float y) { int a = x == y ? 0 : x < y ? -1 : 1; return a > 0; }
int f14 (float x, float y) { int a = x == y ? 0 : x < y ? -1 : 1; return a != 0; }
int f15 (float x, float y) { int a = x == y ? 0 : x < y ? -1 : 1; return a <= 0; }
int f16 (float x, float y) { int a = x == y ? 0 : x < y ? -1 : 1; return a >= 0; }
int f17 (float x, float y) { int a = x == y ? 0 : x < y ? -1 : 1; return a == -1; }
int f18 (float x, float y) { int a = x == y ? 0 : x < y ? -1 : 1; return a != -1; }
int f19 (float x, float y) { int a = x == y ? 0 : x < y ? -1 : 1; return a == 1; }
int f20 (float x, float y) { int a = x == y ? 0 : x < y ? -1 : 1; return a != 1; }
Comment 16 Jakub Jelinek 2021-04-29 15:48:19 UTC
Writing a phiopt patch now.
Comment 17 Jakub Jelinek 2021-04-29 17:41:31 UTC
Created attachment 50710 [details]
gcc12-pr94589-wip.patch

WIP patch that just matches those spaceship comparisons followed by single use comparison of that with -1/0/1, but doesn't yet perform any of the optimizations.
Comment 18 Jakub Jelinek 2021-04-30 17:23:10 UTC
Created attachment 50719 [details]
gcc12-pr94589-wip.patch

Updated patch.  This already does the optimizations, though needs testcase coverage and perhaps something better for debug uses of the PHI_RESULT besides the single non-debug use.
And, it won't work together with the above mentioned forwprop patch, I'm afraid it would need to virtually undo that change but it would be hard.
Comment 19 Jakub Jelinek 2021-05-03 17:09:00 UTC
Created attachment 50741 [details]
gcc12-pr94589.patch

Full untested patch.
Comment 20 GCC Commits 2021-05-06 08:18:51 UTC
The master branch has been updated by Jakub Jelinek <jakub@gcc.gnu.org>:

https://gcc.gnu.org/g:ad96c867e173c1ebcfc201b201adac5095683a08

commit r12-559-gad96c867e173c1ebcfc201b201adac5095683a08
Author: Jakub Jelinek <jakub@redhat.com>
Date:   Thu May 6 10:15:40 2021 +0200

    phiopt: Optimize (x <=> y) cmp z [PR94589]
    
    genericize_spaceship genericizes i <=> j to approximately
    ({ int c; if (i == j) c = 0; else if (i < j) c = -1; else c = 1; c; })
    for strong ordering and
    ({ int c; if (i == j) c = 0; else if (i < j) c = -1; else if (i > j) c = 1; else c = 2; c; })
    for partial ordering.
    The C++ standard supports then == or != comparisons of that against
    strong/partial ordering enums, or </<=/==/!=/>/>= comparisons of <=> result
    against literal 0.
    
    In some cases we already optimize that but in many cases we keep performing
    all the 2 or 3 comparisons, compute the spaceship value and then compare
    that.
    
    The following patch recognizes those patterns if the <=> operands are
    integral types or floating point (the latter only for -ffast-math) and
    optimizes it to the single comparison that is needed (plus adds debug stmts
    if needed for the spaceship result).
    
    There is one thing I'd like to address in a follow-up: the pr94589-2.C
    testcase should be matching just 12 times each, but runs
    into operator>=(partial_ordering, unspecified) being defined as
    (_M_value&1)==_M_value
    rather than _M_value>=0.  When not honoring NaNs, the 2 case should be
    unreachable and so (_M_value&1)==_M_value is then equivalent to _M_value>=0,
    but is not a single use but two uses.  I'll need to pattern match that case
    specially.
    
    2021-05-06  Jakub Jelinek  <jakub@redhat.com>
    
            PR tree-optimization/94589
            * tree-ssa-phiopt.c (tree_ssa_phiopt_worker): Call
            spaceship_replacement.
            (cond_only_block_p, spaceship_replacement): New functions.
    
            * gcc.dg/pr94589-1.c: New test.
            * gcc.dg/pr94589-2.c: New test.
            * gcc.dg/pr94589-3.c: New test.
            * gcc.dg/pr94589-4.c: New test.
            * g++.dg/opt/pr94589-1.C: New test.
            * g++.dg/opt/pr94589-2.C: New test.
            * g++.dg/opt/pr94589-3.C: New test.
            * g++.dg/opt/pr94589-4.C: New test.
Comment 21 GCC Commits 2021-05-12 07:48:38 UTC
The master branch has been updated by Jakub Jelinek <jakub@gcc.gnu.org>:

https://gcc.gnu.org/g:f5f1838435400b837c8677c53a611e2dc6d56442

commit r12-733-gf5f1838435400b837c8677c53a611e2dc6d56442
Author: Jakub Jelinek <jakub@redhat.com>
Date:   Wed May 12 09:46:03 2021 +0200

    match.pd: Optimize (x & y) == x into (x & ~y) == 0 [PR94589]
    
    > Somewhere in RTL (_M_value&1)==_M_value is turned into (_M_value&-2)==0,
    > that could be worth doing already in GIMPLE.
    
    Apparently it is
      /* Simplify eq/ne (and/ior x y) x/y) for targets with a BICS instruction or
         constant folding if x/y is a constant.  */
      if ((code == EQ || code == NE)
          && (op0code == AND || op0code == IOR)
          && !side_effects_p (op1)
          && op1 != CONST0_RTX (cmp_mode))
        {
          /* Both (eq/ne (and x y) x) and (eq/ne (ior x y) y) simplify to
             (eq/ne (and (not y) x) 0).  */
    ...
          /* Both (eq/ne (and x y) y) and (eq/ne (ior x y) x) simplify to
             (eq/ne (and (not x) y) 0).  */
    Yes, doing that on GIMPLE for the case where the not argument is constant
    would simplify the phiopt follow-up (it would be single imm use then).
    
    On Thu, May 06, 2021 at 09:42:41PM +0200, Marc Glisse wrote:
    > We can probably do it in 2 steps, first something like
    >
    > (for cmp (eq ne)
    >  (simplify
    >   (cmp (bit_and:c @0 @1) @0)
    >   (cmp (@0 (bit_not! @1)) { build_zero_cst (TREE_TYPE (@0)); })))
    >
    > to get rid of the double use, and then simplify X&C==0 to X<=~C if C is a
    > mask 111...000 (I thought we already had a function to detect such masks, or
    > the 000...111, but I can't find them anymore).
    
    Ok, here is the first step then.
    
    2021-05-12  Jakub Jelinek  <jakub@redhat.com>
                Marc Glisse  <marc.glisse@inria.fr>
    
            PR tree-optimization/94589
            * match.pd ((X & Y) == X -> (X & ~Y) == 0,
            (X | Y) == Y -> (X & ~Y) == 0): New GIMPLE simplifications.
    
            * gcc.dg/tree-ssa/pr94589-1.c: New test.
Comment 22 GCC Commits 2021-05-18 08:09:49 UTC
The master branch has been updated by Jakub Jelinek <jakub@gcc.gnu.org>:

https://gcc.gnu.org/g:65061ea287a80cfb214e402cfd2373a14bfec95a

commit r12-864-g65061ea287a80cfb214e402cfd2373a14bfec95a
Author: Jakub Jelinek <jakub@redhat.com>
Date:   Tue May 18 10:08:51 2021 +0200

    phiopt: Optimize partial_ordering spaceship >= 0 -ffinite-math-only [PR94589]
    
    As mentioned earlier, spaceship_replacement didn't optimize partial_ordering
    >= 0 comparisons, because the possible values are -1, 0, 1, 2 and the
    >= comparison is implemented as (res & 1) == res to choose the 0 and 1
    cases from that.  As we optimize that only with -ffinite-math-only, the
    2 case is assumed not to happen and my earlier match.pd change optimizes
    (res & 1) == res into (res & ~1) == 0, so this patch pattern matches
    that case and handles it like res >= 0.
    
    2021-05-18  Jakub Jelinek  <jakub@redhat.com>
    
            PR tree-optimization/94589
            * tree-ssa-phiopt.c (spaceship_replacement): Pattern match
            phi result used in (res & ~1) == 0 comparison as res >= 0 as
            res == 2 would be UB with -ffinite-math-only.
    
            * g++.dg/opt/pr94589-2.C: Adjust scan-tree-dump count from 14 to 12.
Comment 23 Jakub Jelinek 2021-05-18 08:11:56 UTC
Fixed.
Comment 24 GCC Commits 2021-05-20 07:11:59 UTC
The master branch has been updated by Jakub Jelinek <jakub@gcc.gnu.org>:

https://gcc.gnu.org/g:2b536797f7e43c55072a3215735f5833f1d6d218

commit r12-935-g2b536797f7e43c55072a3215735f5833f1d6d218
Author: Jakub Jelinek <jakub@redhat.com>
Date:   Thu May 20 09:10:38 2021 +0200

    phiopt: Simplify (X & Y) == X -> (X & ~Y) == 0 even in presence of integral conversions [PR94589]
    
    On Wed, May 19, 2021 at 10:15:53AM +0200, Christophe Lyon via Gcc-patches wrote:
    > After this update, the test fails on arm and aarch64: according to the
    > logs, the optimization is still performed 14 times.
    
    Seems this is because
                  if (change
                      && !flag_syntax_only
                      && (load_extend_op (TYPE_MODE (TREE_TYPE (and0)))
                          == ZERO_EXTEND))
                    {
                      tree uns = unsigned_type_for (TREE_TYPE (and0));
                      and0 = fold_convert_loc (loc, uns, and0);
                      and1 = fold_convert_loc (loc, uns, and1);
                    }
    in fold-const.c adds on these targets extra casts that prevent the
    optimizations.
    
    2021-05-20  Jakub Jelinek  <jakub@redhat.com>
    
            PR tree-optimization/94589
            * match.pd ((X & Y) == X -> (X & ~Y) == 0): Simplify even in presence
            of integral conversions.
Comment 25 GCC Commits 2021-05-21 08:40:39 UTC
The master branch has been updated by Jakub Jelinek <jakub@gcc.gnu.org>:

https://gcc.gnu.org/g:f1c777f40aa0b6941efc7440495a8d7e0cc2a1bb

commit r12-964-gf1c777f40aa0b6941efc7440495a8d7e0cc2a1bb
Author: Jakub Jelinek <jakub@redhat.com>
Date:   Fri May 21 10:39:50 2021 +0200

    tree-optimization: Improve spaceship_replacement [PR94589]
    
    On Wed, May 19, 2021 at 01:30:31PM -0400, Jason Merrill via Gcc-patches wrote:
    > Here, when genericizing lexicographical_compare_three_way, we haven't yet
    > walked the operands, so (a == a) still sees ADDR_EXPR <a>, but this is after
    > we've changed the type of a to REFERENCE_TYPE.  When we try to fold (a == a)
    > by constexpr evaluation, the constexpr code doesn't understand trying to
    > take the address of a reference, and we end up crashing.
    >
    > Fixed by avoiding constexpr evaluation in genericize_spaceship, by using
    > fold_build2 instead of build_new_op on scalar operands.  Class operands
    > should have been expanded during parsing.
    
    Unfortunately this slightly changed the IL and spaceship_replacement no
    longer pattern matches it.
    
    Here are 3 improvements that make it match:
    
    1) as mentioned in the comment above spaceship_replacement, for
       strong_ordering, we are pattern matching something like:
       x == y ? 0 : x < y ? -1 : 1;
       and for partial_ordering
       x == y ? 0 : x < y ? -1 : x > y ? 1 : 2;
       but given the == comparison done first and the other comparisons only
       if == was false, we actually don't care if the other comparisons
       are < vs. <= (or > vs. >=), provided the operands of the comparison
       are the same; we know == is false when doing those and < vs. <= or
       > vs. >= have the same behavior for NaNs too
    2) when y is an integral constant, we should treat x < 5 equivalently
       to x <= 4 etc.
    3) the code punted if cond2_phi_edge wasn't a EDGE_TRUE_VALUE edge, but
       as the new IL shows, that isn't really needed; given 1) that
       > and >= are equivalent in the code, any of swapping the comparison
       operands, changing L[TE]_EXPR to G[TE]_EXPR or vice versa or
       swapping the EDGE_TRUE_VALUE / EDGE_FALSE_VALUE bits on the edges
       reverses one of the two comparisons
    
    2021-05-21  Jakub Jelinek  <jakub@redhat.com>
    
            PR tree-optimization/94589
            * tree-ssa-phiopt.c (spaceship_replacement): For integral rhs1 and
            rhs2, treat x <= 4 equivalently to x < 5 etc.  In cmp1 and cmp2 (if
            not the same as cmp3) treat <= the same as < and >= the same as >.
            Don't require that cond2_phi_edge is true edge, instead take
            false/true edges into account based on cmp1/cmp2 comparison kinds.
Comment 26 Andrew Pinski 2023-09-13 18:09:33 UTC
Created attachment 55893 [details]
Here is my idea around the patch for prototype for doing the constant prop idea

This is like the one in comment #13 but handles it in phiopt rather than forwprop.
Comment 27 GCC Commits 2024-11-21 08:38:47 UTC
The master branch has been updated by Jakub Jelinek <jakub@gcc.gnu.org>:

https://gcc.gnu.org/g:ca7430f145f5c7960f67ec77f585f3a2b58c7d10

commit r15-5556-gca7430f145f5c7960f67ec77f585f3a2b58c7d10
Author: Jakub Jelinek <jakub@redhat.com>
Date:   Thu Nov 21 09:38:01 2024 +0100

    phiopt: Fix a pasto in spaceship_replacement [PR117612]
    
    When working on the PR117612 fix, I've noticed a pasto in
    tree-ssa-phiopt.cc (spaceship_replacement).
    The code is
          if (absu_hwi (tree_to_shwi (arg2)) != 1)
            return false;
          if (e1->flags & EDGE_TRUE_VALUE)
            {
              if (tree_to_shwi (arg0) != 2
                  || absu_hwi (tree_to_shwi (arg1)) != 1
                  || wi::to_widest (arg1) == wi::to_widest (arg2))
                return false;
            }
          else if (tree_to_shwi (arg1) != 2
                   || absu_hwi (tree_to_shwi (arg0)) != 1
                   || wi::to_widest (arg0) == wi::to_widest (arg1))
            return false;
    where arg{0,1,2,3} are PHI args and wants to ensure that if e1 is a
    true edge, then arg0 is 2 and one of arg{1,2} is -1 and one is 1,
    otherwise arg1 is 2 and one of arg{0,2} is -1 and one is 1.
    But due to pasto in the latte case doesn't verify that arg0
    is different from arg2, it could be both -1 or both 1 and we wouldn't
    punt.  The wi::to_widest (arg0) == wi::to_widest (arg1) test
    is always false when we've made sure in the earlier conditions that
    arg1 is 2 and arg0 is -1 or 1, so never 2.
    
    2024-11-21  Jakub Jelinek  <jakub@redhat.com>
    
            PR tree-optimization/94589
            PR tree-optimization/117612
            * tree-ssa-phiopt.cc (spaceship_replacement): Fix up
            a pasto in check when arg1 is 2.
Comment 28 GCC Commits 2024-11-21 08:40:19 UTC
The master branch has been updated by Jakub Jelinek <jakub@gcc.gnu.org>:

https://gcc.gnu.org/g:05ab9447fe80e5b1450192e21f3116890d38ecc7

commit r15-5557-g05ab9447fe80e5b1450192e21f3116890d38ecc7
Author: Jakub Jelinek <jakub@redhat.com>
Date:   Thu Nov 21 09:39:06 2024 +0100

    phiopt: Improve spaceship_replacement for HONOR_NANS [PR117612]
    
    The following patch optimizes spaceship followed by comparisons of the
    spaceship value even for floating point spaceship when NaNs can appear.
    operator<=> for this emits roughly
    signed char c; if (i == j) c = 0; else if (i < j) c = -1; else if (i > j) c = 1; else c = 2;
    and I believe the
    /* The optimization may be unsafe due to NaNs.  */
    comment just isn't true.
    Sure, the i == j comparison doesn't raise exceptions on qNaNs, but if
    one of the operands is qNaN, then i == j is false and i < j or i > j
    is then executed and raises exceptions even on qNaNs.
    And we can safely optimize say
    c == -1 comparison after the above into i < j, that also raises
    exceptions like before and handles NaNs the same way as the original.
    The only unsafe transormation would be c == 0 or c != 0, turning it
    into i == j or i != j wouldn't raise exception, so I'm not doing that
    optimization (but other parts of the compiler optimize the i < j comparison
    away anyway).
    
    Anyway, to match the HONOR_NANS case, we need to verify that the
    second comparison has true edge to the phi_bb (yielding there -1 or 1),
    it can't be the false edge because when NaNs are honored, the false
    edge is for both the case where the inverted comparison is true or when
    one of the operands is NaN.  Similarly we need to ensure that the two
    non-equality comparisons are the opposite, while for -ffast-math we can in
    some cases get one comparison x >= 5.0 and the other x > 5.0 and it is fine,
    because NaN is UB, when NaNs are honored, they must be different to leave
    the unordered case with 2 value as the last one remaining.
    The patch also punts if HONOR_NANS and the phi has just 3 arguments instead
    of 4.
    When NaNs are honored, we also in some cases need to perform some comparison
    and then invert its result (so that exceptions are properly thrown and we
    get the correct result).
    
    2024-11-21  Jakub Jelinek  <jakub@redhat.com>
    
            PR tree-optimization/94589
            PR tree-optimization/117612
            * tree-ssa-phiopt.cc (spaceship_replacement): Handle
            HONOR_NANS (TREE_TYPE (lhs1)) case when possible.
    
            * gcc.dg/pr94589-5.c: New test.
            * gcc.dg/pr94589-6.c: New test.
            * g++.dg/opt/pr94589-5.C: New test.
            * g++.dg/opt/pr94589-6.C: New test.
Comment 29 GCC Commits 2025-01-09 12:57:12 UTC
The releases/gcc-14 branch has been updated by Jakub Jelinek <jakub@gcc.gnu.org>:

https://gcc.gnu.org/g:3190d6263c90f8e859b738a9b39f5e650a4a3a16

commit r14-11142-g3190d6263c90f8e859b738a9b39f5e650a4a3a16
Author: Jakub Jelinek <jakub@redhat.com>
Date:   Thu Nov 21 09:38:01 2024 +0100

    phiopt: Fix a pasto in spaceship_replacement [PR117612]
    
    When working on the PR117612 fix, I've noticed a pasto in
    tree-ssa-phiopt.cc (spaceship_replacement).
    The code is
          if (absu_hwi (tree_to_shwi (arg2)) != 1)
            return false;
          if (e1->flags & EDGE_TRUE_VALUE)
            {
              if (tree_to_shwi (arg0) != 2
                  || absu_hwi (tree_to_shwi (arg1)) != 1
                  || wi::to_widest (arg1) == wi::to_widest (arg2))
                return false;
            }
          else if (tree_to_shwi (arg1) != 2
                   || absu_hwi (tree_to_shwi (arg0)) != 1
                   || wi::to_widest (arg0) == wi::to_widest (arg1))
            return false;
    where arg{0,1,2,3} are PHI args and wants to ensure that if e1 is a
    true edge, then arg0 is 2 and one of arg{1,2} is -1 and one is 1,
    otherwise arg1 is 2 and one of arg{0,2} is -1 and one is 1.
    But due to pasto in the latte case doesn't verify that arg0
    is different from arg2, it could be both -1 or both 1 and we wouldn't
    punt.  The wi::to_widest (arg0) == wi::to_widest (arg1) test
    is always false when we've made sure in the earlier conditions that
    arg1 is 2 and arg0 is -1 or 1, so never 2.
    
    2024-11-21  Jakub Jelinek  <jakub@redhat.com>
    
            PR tree-optimization/94589
            PR tree-optimization/117612
            * tree-ssa-phiopt.cc (spaceship_replacement): Fix up
            a pasto in check when arg1 is 2.
    
    (cherry picked from commit ca7430f145f5c7960f67ec77f585f3a2b58c7d10)