c-c++-common/pr19807-2.c:

  /* { dg-do link } */
  extern void link_error(void);
  int i;
  int main()
  {
    int a[4];
    if ((char*)&a[1] + 4*i + 4 != (char*)&a[i+2])
      link_error();
    return 0;
  }

Fails with all optimization levels for all compilers.

c-c++-common/pr19807-3.c:

  /* { dg-do link } */
  extern void link_error(void);
  int i;
  int main()
  {
    int a[4];
    if (&a[1] + i + 1 != &a[i+2])
      link_error();
    return 0;
  }

GCC 4.7 passes with -O[01] but fails with -O2+; GCC 4.8 and 4.9 fail with -O0 but pass with -O1+ (thanks to SLSR); GCC 5.0 fails since the fix for PR63148, which is a regression.

We don't seem to systematically use something like the tree-affine.c framework to simplify address comparisons, and the fact that tree-reassoc.c doesn't associate pointer arithmetic in any way doesn't help either.
Note that due to the inconsistency with -fstrict-overflow I don't consider this an important optimization regression. But I'll see if I can improve some generic code to handle these better (with optimization enabled), though only for 5.0.
Ok, so we can (and do) forward addresses like &a[_9] into plain dereferences, so lowering these addresses early is probably not a good idea. We could lower them as a very first thing in GIMPLE reassoc, but to be most effective that would still need to apply transforms like (i + 1) * 4 -> i*4 + 4 (though that generally applies). Otherwise I can't see an easy place to hook in simplification of

  i.0_3 = i;
  _4 = i.0_3 * 4;
  _5 = (sizetype) _4;
  _6 = _5 + 4;
  _7 = &a[1] + _6;
  _9 = i.0_3 + 2;
  _10 = &a[_9];
  if (_7 != _10)

which is what we get for the first testcase after initial scalar cleanups. SLSR helps somewhat in some cases but it's run very late.

We could detect address comparisons with the same base and use the affine combination machinery to simplify them, though. From somewhere; tree-ssa-forwprop.c, probably. Simplify them to _7 - _10 CMP 0 (if the addresses have the same base object, at least that should cancel). We canonicalize (base p+ off1) p+ off2 to base p+ (off1 + off2), which should help here as well. We could also only "fold" away the base object from the chain (not using the affine machinery). match-and-simplify doesn't help here because the replacement involves an expression created by get_inner_reference (well, a manual transform would work here, of course).
GCC 5.1 has been released.
GCC 5.2 is being released, adjusting target milestone to 5.3.
Ok, on trunk we now lower addresses somewhat, but that doesn't help as we are still faced with

  i.0_3 = i;
  _4 = (sizetype) i.0_3;
  _5 = _4 + 1;
  _6 = _5 * 4;
  _7 = &a[1] + _6;
  _8 = i.0_3 + 2;
  _12 = (sizetype) _8;
  _13 = _12 * 4;
  _9 = &a + _13;
  if (_7 != _9)

where FRE/PRE cannot see any equivalences, reassoc doesn't do anything, and SLSR does

  -  _13 = _12 * 4;
  +  _13 = _6 + 4;

but that's too late, or rather DOM doesn't figure out the equivalence of &a + _6 + 4 and &a[1] + _6 (not that I see how it could).
GCC 5.3 is being released, adjusting target milestone.
GCC 5.4 is being released, adjusting target milestone.
So while we don't optimize this on the tree level, the address lowering that was introduced for gcc-6 sets things up so the RTL optimizers can detect the equivalences and remove the tests. So ISTM we can do two things here: keep this as a regression, since it'd be nice for the SSA/GIMPLE optimizers to catch this, or mark it as fixed for gcc-6 and gcc-7 since the RTL optimizers are able to clean this stuff up.
Keep it open, it still is an issue in VN that I'd like to address.
deferred.
So could we reassociate the address arithmetic in match.pd so that we fold away the pointer computation in favor of index adjustment in the ARRAY_REF? Do we have to worry about overflow in address reassociation? Using the GIMPLE from c#2:

  i.0_3 = i;
  _4 = i.0_3 * 4;
  _5 = (sizetype) _4;
  _6 = _5 + 4;
  _7 = &a[1] + _6;
  _9 = i.0_3 + 2;
  _10 = &a[_9];
  if (_7 != _10)

Transform it into:

  i.0_3 = i;
  temp = i.0_3 + 1 + 1;  /* +1 from pointer arith, +1 from array index */
  _4 = i.0_3 * 4;        /* DEAD */
  _5 = (sizetype) _4;    /* DEAD */
  _6 = _5 + 4;           /* DEAD */
  _7 = &a[temp];
  _9 = i.0_3 + 2;
  _10 = &a[_9];
  if (_7 != _10)

That gives us a fighting chance to see that temp is equivalent to _9 and that the ultimate addresses are equal.
It happens that we optimize both cases now (with optimization only). It would still be a regression vs. GCC 4.7 at -O0, but IMHO we shouldn't care about optimizing this at -O0. Thus fixed. I'll add the testcases.
Author: rguenth
Date: Wed Dec 5 14:55:59 2018
New Revision: 266827

URL: https://gcc.gnu.org/viewcvs?rev=266827&root=gcc&view=rev
Log:
2018-12-05  Richard Biener  <rguenther@suse.de>

	PR middle-end/63184
	* c-c++-common/pr19807-2.c: New testcase.
	* c-c++-common/pr19807-3.c: Likewise.

Added:
    trunk/gcc/testsuite/c-c++-common/pr19807-2.c
    trunk/gcc/testsuite/c-c++-common/pr19807-3.c
Modified:
    trunk/gcc/testsuite/ChangeLog
The new testcases FAIL on quite a number of targets (I'm seeing it on sparc-sun-solaris2.11, 32 and 64-bit, but there are also gcc-testresults postings for aarch64-unknown-linux-gnu, armv8l-unknown-linux-gnueabihf, ia64-suse-linux-gnu, moxie-unknown-elf, powerpc64-unknown-linux-gnu, powerpc64le-unknown-linux-gnu):

+FAIL: c-c++-common/pr19807-2.c -std=gnu++14 (test for excess errors)
+FAIL: c-c++-common/pr19807-2.c -std=gnu++17 (test for excess errors)
+FAIL: c-c++-common/pr19807-2.c -std=gnu++98 (test for excess errors)
+FAIL: c-c++-common/pr19807-3.c -std=gnu++14 (test for excess errors)
+FAIL: c-c++-common/pr19807-3.c -std=gnu++17 (test for excess errors)
+FAIL: c-c++-common/pr19807-3.c -std=gnu++98 (test for excess errors)
+FAIL: c-c++-common/pr19807-2.c -Wc++-compat (test for excess errors)

Excess errors:
Undefined symbol link_error, first referenced in file /var/tmp//ccvTdBwc.o

+FAIL: c-c++-common/pr19807-3.c -Wc++-compat (test for excess errors)
Oh, we only optimize it on RTL on x86, possibly with the help of combine and complex addressing modes... I'll adjust the testcase.
Now, to answer Jeff: my original idea was to change how VN handles addresses, to use tree-affine in there. But that takes some time... Given that we lower non-invariant addresses now, another idea would be to recognize

  _4 = _3 + 4;
  _5 = &a[1] + _4;

and reassociate that to combine the + 4 with the &a[1] (which is &a + 4). Now the reassoc pass associates the constant last, which makes this unreliable to detect in a match.pd pattern (the constant may be far away), but the reassoc pass itself could, when associating a chain, look for whether the SSA name we start from is (single-)used in a POINTER_PLUS_EXPR and whether the constant element in the reassoc ops[] array can be combined with a constant offset in the address operand. I'll see if that works out.
Author: rguenth
Date: Thu Dec 6 10:20:39 2018
New Revision: 266846

URL: https://gcc.gnu.org/viewcvs?rev=266846&root=gcc&view=rev
Log:
2018-12-06  Richard Biener  <rguenther@suse.de>

	PR middle-end/63184
	* c-c++-common/pr19807-2.c: Try link only on x86, add
	xfailed optimized dump scanning.
	* c-c++-common/pr19807-3.c: Likewise.

Modified:
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/testsuite/c-c++-common/pr19807-2.c
    trunk/gcc/testsuite/c-c++-common/pr19807-3.c
That works, but it doesn't perform the folding in the end. Still, combining &a + b + c + d + e + 2 into (&a + 2) + b + c + d + e is better since most targets can compute symbol+offset in a single instruction. We then end up with

  <bb 2> [local count: 1073741824]:
  i.0_1 = i;
  _2 = i.0_1 * 4;
  _3 = (sizetype) _2;
  _5 = &MEM[(void *)&a + 8B] + _3;
  _13 = _3 + 8;
  _7 = &a + _13;
  if (_5 != _7)
    goto <bb 3>; [53.47%]
  else
    goto <bb 4>; [46.53%]

where the issue is that the 2nd opportunity for &a + _3 + 8 only appears after SLSR, which transforms

  _6 = i.0_1 + 2;
  _12 = (sizetype) _6;
  _13 = _12 * 4;
  _7 = &a + _13;

to

  _6 = i.0_1 + 2;
  _12 = (sizetype) _6;
  _13 = _3 + 8;
  _7 = &a + _13;

and we have

  NEXT_PASS (pass_reassoc, false /* insert_powi_p */);
  NEXT_PASS (pass_strength_reduction);

and I'm not sure it's a good idea to swap those two... If I do, we fold things (albeit quite late).

diff --git a/gcc/tree-ssa-reassoc.c b/gcc/tree-ssa-reassoc.c
index a9f45bfd891..fb1f8014633 100644
--- a/gcc/tree-ssa-reassoc.c
+++ b/gcc/tree-ssa-reassoc.c
@@ -5988,6 +5988,31 @@ reassociate_bb (basic_block bb)
 	    }
 	}
 
+	  /* If the association chain is used in a single
+	     POINTER_PLUS_EXPR with an invariant first operand
+	     then combine a constant element with the invariant
+	     address.  */
+	  use_operand_p use_p;
+	  gimple *use_stmt;
+	  if (ops.length () > 1
+	      && rhs_code == PLUS_EXPR
+	      && TREE_CODE (ops.last ()->op) == INTEGER_CST
+	      && single_imm_use (lhs, &use_p, &use_stmt)
+	      && is_gimple_assign (use_stmt)
+	      && gimple_assign_rhs_code (use_stmt) == POINTER_PLUS_EXPR
+	      && TREE_CODE (gimple_assign_rhs1 (use_stmt)) == ADDR_EXPR)
+	    {
+	      last = ops.pop ();
+	      tree addr = gimple_assign_rhs1 (use_stmt);
+	      addr = build1 (ADDR_EXPR, TREE_TYPE (addr),
+			     fold_build2 (MEM_REF,
+					  TREE_TYPE (TREE_TYPE (addr)),
+					  addr,
+					  fold_convert (ptr_type_node,
+							last->op)));
+	      gimple_assign_set_rhs1 (use_stmt, addr);
+	    }
+
 	  tree new_lhs = lhs;
 	  /* If the operand vector is now empty, all operands were
 	     consumed by the __builtin_powi optimization.  */
Btw, we already manage to do something like that on RTL.

  typedef __UINTPTR_TYPE__ uintptr_t;
  char a[1024];
  char foo (uintptr_t b, uintptr_t c, uintptr_t d, uintptr_t e, uintptr_t f)
  {
    uintptr_t tem = b + c + d + e + f + 1;
    return *(a + tem);
  }

With the patch this changes

  -  _14 = f_10(D) + 1;
  -  _15 = e_9(D) + _14;
  -  _16 = d_8(D) + _15;
  -  _17 = c_7(D) + _16;
  -  tem_11 = b_6(D) + _17;
  -  _5 = &a + tem_11;
  +  _14 = e_9(D) + f_10(D);
  +  _15 = d_8(D) + _14;
  +  _16 = c_7(D) + _15;
  +  tem_11 = b_6(D) + _16;
  +  _5 = &MEM[(void *)&a + 1B] + tem_11;
     _13 = *_5;
     return _13;

but the assembler is the same:

  addq    %rsi, %rdi
  addq    %rdx, %rdi
  addq    %rcx, %rdi
  movzbl  a+1(%r8,%rdi), %eax
  ret

Disabling TER makes it worse without the patch but not with it:

  leaq    1(%r8,%rcx), %rax
  addq    %rax, %rdx
  addq    %rdx, %rsi
  movzbl  a(%rsi,%rdi), %eax
  ret

so I guess the incremental RTL build canonicalizes the constant outermost. At least the patch looks like a step in the right direction for the goal of removing TER... Still, most of the time the issue will be that ptr + a + b + c + 1 cannot be handled because of the intermediate conversion. That could be mitigated by moving the constant part out of an outer conversion (when permitted).
Created attachment 45179 [details]
patch

This patch fixes the testcases, but it comes at a cost without much visible benefit (trying 258098 expansions in a C/C++ bootstrap with just 4 hits, all in the same place).
Created attachment 45180 [details]
reassoc patch

The reassoc patch, not solving the issue on its own.
The GCC 7 branch is being closed, re-targeting to GCC 8.4.
GCC 8.4.0 has been released, adjusting target milestone.
GCC 8 branch is being closed.
GCC 9.4 is being released, retargeting bugs to GCC 9.5.
What is interesting is that for aarch64, GCC is able to figure out the two values are equal, but only after register allocation (note LLVM has a similar issue too). Anyways, at forwprop4 we have:

  _5 = &a[1] + _4;
  _13 = _3 + 8;
  _7 = &a + _13;
  if (_5 != _7)

I think I have a patch (I need to format it correctly though):

(for neeq (ne eq)
 (simplify
  (neeq (pointer_plus ADDR_EXPR@0 @1)
        (pointer_plus ADDR_EXPR@2 @3))
  (with { poly_int64 diff; }
   (if (ptr_difference_const (@0, @2, &diff))
    (neeq (plus { build_int_cst_type (type, diff); }
                (convert (minus @1 @3)))
          { build_zero_cst (type); })))))

I also added this one, as I saw it was not being folded either:

(simplify
 (pointer_diff (pointer_plus ADDR_EXPR@0 @1)
               (pointer_plus ADDR_EXPR@2 @3))
 (with { poly_int64 diff; }
  (if (ptr_difference_const (@0, @2, &diff))
   (plus { build_int_cst_type (type, diff); }
         (convert (minus @1 @3))))))
Created attachment 51411 [details]
patch which I am testing
Patch posted: https://gcc.gnu.org/pipermail/gcc-patches/2021-September/578859.html
The master branch has been updated by Andrew Pinski <pinskia@gcc.gnu.org>:

https://gcc.gnu.org/g:564efbf40077c25623cdd6ce2f911c56b5b08f6c

commit r12-3364-g564efbf40077c25623cdd6ce2f911c56b5b08f6c
Author: Andrew Pinski <apinski@marvell.com>
Date:   Mon Sep 6 00:52:18 2021 +0000

    Fix PR tree-optimization/63184: add simplification of (& + A) != (& + B)

    These two testcases have been failing since GCC 5, but things have
    improved such that adding a simplification to match.pd for this case
    is easier than before.

    In the end we have the following IR:
      ....
      _5 = &a[1] + _4;
      _7 = &a + _13;
      if (_5 != _7)

    So we can fold the _5 != _7 into:
      (&a[1] - &a) + _4 != _13

    The subtraction is folded into a constant by ptr_difference_const.
    In this case, the full expression gets folded into a constant and
    we are able to remove the if statement.

    OK? Bootstrapped and tested on aarch64-linux-gnu with no regressions.

    gcc/ChangeLog:

    	PR tree-optimization/63184
    	* match.pd: Add simplification of pointer_diff of two pointer_plus
    	with addr_expr in the first operand of each pointer_plus.
    	Add simplification of ne/eq of two pointer_plus with addr_expr
    	in the first operand of each pointer_plus.

    gcc/testsuite/ChangeLog:

    	PR tree-optimization/63184
    	* c-c++-common/pr19807-2.c: Enable for all targets and remove
    	the xfail.
    	* c-c++-common/pr19807-3.c: Likewise.
Fixed on the trunk and I don't think this should be backported as it has been a missed optimization for 7 years or more.