Bug 96685 - Failure to optimize not+sub to add+not
Summary: Failure to optimize not+sub to add+not
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization (show other bugs)
Version: 11.0
: P3 normal
Target Milestone: 11.0
Assignee: Jakub Jelinek
URL:
Keywords: missed-optimization
: 37516 (view as bug list)
Depends on:
Blocks: 19987 94877
  Show dependency treegraph
 
Reported: 2020-08-18 17:15 UTC by Gabriel Ravier
Modified: 2023-09-03 18:55 UTC (History)
5 users (show)

See Also:
Host:
Target: x86_64-*-* i?86-*-*
Build:
Known to work:
Known to fail:
Last reconfirmed: 2020-08-25 00:00:00


Attachments
gcc11-pr96685.patch (771 bytes, patch)
2020-12-11 13:24 UTC, Jakub Jelinek
Details | Diff
gcc11-pr96685.patch (1.05 KB, patch)
2020-12-11 16:37 UTC, Jakub Jelinek
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Gabriel Ravier 2020-08-18 17:15:34 UTC
int f(int x, int y)
{
    return ~(x - y);
}

This can be optimized to `~x + y`. While this isn't necessarily faster on most platforms, it is at least equivalent and on x86 the addition can be optimized to `lea` whereas the subtraction can't. This transformation is done by LLVM, but not by GCC.
Comment 1 Richard Biener 2020-08-25 10:50:31 UTC
It's at least a missed canonicalization as ~x + y isn't transformed to ~(x - y)
either.

int f(int x, int y)
{
    return ~(x - y) + (~x + y);
}

should see CSE, with a minus folding to zero (that works already for some
reason).
Comment 2 Jakub Jelinek 2020-12-11 12:24:23 UTC
I guess ~x + y as canonical form has another advantage, that + is commutative while - is not.
Comment 3 Jakub Jelinek 2020-12-11 13:24:03 UTC
Created attachment 49742 [details]
gcc11-pr96685.patch

Untested fix.
Comment 4 Jakub Jelinek 2020-12-11 13:33:32 UTC
Though, there is some canonicalization problem GENERIC vs. GIMPLE:
unsigned
f1 (unsigned x, unsigned y)
{
  unsigned int r = (x - y);
  return ~r;
}

unsigned
f2 (unsigned x, unsigned y)
{
  unsigned int r = ~(x - y);
  return r;
}

unsigned
f3 (unsigned x)
{
  unsigned int r = (x - 23);
  return ~r;
}

unsigned
f4 (unsigned x)
{
  unsigned int r = ~(x - 23);
  return r;
}

int
f5 (int x, int y)
{
  int r = (x - y);
  return ~r;
}

int
f6 (int x, int y)
{
  int r = ~(x - y);
  return r;
}

int
f7 (int x)
{
  int r = (x - 23);
  return ~r;
}

int
f8 (int x)
{
  int r = ~(x - 23);
  return r;
}

Before the above patch, the above testcase emitted:
        subl    %esi, %edi
        movl    %edi, %eax
        notl    %eax
for f1/f2/f5/f6 and
        leal    -23(%rdi), %eax
        notl    %eax
for f3/f4/f7/f8.
With the patch it emits:
        notl    %edi
        leal    (%rdi,%rsi), %eax
for f1/f5/f6,
        subl    %edi, %esi
        leal    -1(%rsi), %eax
for f2,
        notl    %edi
        leal    23(%rdi), %eax
for f3/f7,
        movl    $22, %eax
        subl    %edi, %eax
for f4/f8.
Comment 5 Jakub Jelinek 2020-12-11 13:51:02 UTC
Ok, so for GENERIC it seems to be the associate: in fold_binary_loc that converts
~x + y created by this patch into (y - x) + 1, and we don't have an equivalent for that in GIMPLE.

So, shall I restrict this match.pd optimization to #if GIMPLE only, or shall I canonicalize not to
~x + y but to (y - x) + 1, or should we implement the associate: optimization on GIMPLE?  I guess the last one might be too much in stage3.
I guess the middle option wouldn't help for the testcase in the patch, because
we wouldn't canonicalize ~x + y to (y - x) + 1 and the first option would
not consider ~x + y or ~(x - y) equivalent to user-written (y - x) + 1.
Comment 6 Jakub Jelinek 2020-12-11 16:37:45 UTC
Created attachment 49745 [details]
gcc11-pr96685.patch

Updated patch.
Comment 7 GCC Commits 2020-12-12 13:50:57 UTC
The master branch has been updated by Jakub Jelinek <jakub@gcc.gnu.org>:

https://gcc.gnu.org/g:0bd675183d94e6bca100c3aaaf87ee9676fb3c26

commit r11-5958-g0bd675183d94e6bca100c3aaaf87ee9676fb3c26
Author: Jakub Jelinek <jakub@redhat.com>
Date:   Sat Dec 12 14:49:57 2020 +0100

    match.pd: Add ~(X - Y) -> ~X + Y simplification [PR96685]
    
    This patch adds the ~(X - Y) -> ~X + Y simplification requested
    in the PR (plus also ~(X + C) -> ~X + (-C) for constants C that can
    be safely negated.
    
    The first two simplify blocks is what has been requested in the PR
    and that makes the first testcase pass.
    Unfortunately, that change also breaks the second testcase, because
    while the same expressions appearing in the same stmt and split
    across multiple stmts has been folded (not really) before, with
    this optimization fold-const.c optimizes ~X + Y further into
    (Y - X) - 1 in fold_binary_loc associate: code, but we have nothing
    like that in GIMPLE and so end up with different expressions.
    
    The last simplify is an attempt to deal with just this case,
    had to rule out there the Y == -1U case, because then we
    reached infinite recursion as ~X + -1U was canonicalized by
    the pattern into (-1U - X) + -1U but there is a canonicalization
    -1 - A -> ~A that turns it back.  Furthermore, had to make it #if
    GIMPLE only, because it otherwise resulted in infinite recursion
    when interacting with the associate: optimization.
    The end result is that we pass all 3 testcases and thus canonizalize
    the 3 possible forms of writing the same thing.
    
    2020-12-12  Jakub Jelinek  <jakub@redhat.com>
    
            PR tree-optimization/96685
            * match.pd (~(X - Y) -> ~X + Y): New optimization.
            (~X + Y -> (Y - X) - 1): Likewise.
    
            * gcc.dg/tree-ssa/pr96685-1.c: New test.
            * gcc.dg/tree-ssa/pr96685-2.c: New test.
            * gcc.dg/tree-ssa/pr96685-3.c: New test.
Comment 8 Jakub Jelinek 2021-01-16 08:44:14 UTC
Fixed.
Comment 9 Andrew Pinski 2021-08-03 23:03:50 UTC
*** Bug 37516 has been marked as a duplicate of this bug. ***