94988 – [11 Regression] FAIL: gcc.target/i386/pr64110.c scan-assembler vmovd[\\t ]

Bug 94988 - [11 Regression] FAIL: gcc.target/i386/pr64110.c scan-assembler vmovd[\\t ]

Summary: [11 Regression] FAIL: gcc.target/i386/pr64110.c scan-assembler vmovd[\\t ]

Status:	RESOLVED FIXED

Alias:	None

Product:	gcc
Classification:	Unclassified
Component:	middle-end (show other bugs)
Version:	11.0

Importance:	P3 normal
Target Milestone:	11.0
Assignee:	Richard Biener

URL:
Keywords:	missed-optimization

Depends on:
Blocks:	57359
	Show dependency tree / graph

Reported:	2020-05-07 18:29 UTC by H.J. Lu
Modified:	2024-02-16 07:49 UTC (History)
CC List:	3 users (show)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed:	2020-05-08 00:00:00

Attachments
Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description H.J. Lu 2020-05-07 18:29:21 UTC

commit 283cb9ea6293e813e48a1b769e1e0779918ea20a (r11-161)

Author: Richard Biener <rguenther@suse.de>
Date:   Mon Apr 27 14:45:54 2020 +0200

    tree-optimization/57359 - rewrite SM code
    
    This rewrites store-motion to process candidates where we can
    ensure order preserving separately and with no need to disambiguate
    against all stores.  Those candidates we cannot handle this way
    are validated to be independent on all stores (w/o TBAA) and then
    processed as "unordered" (all conditionally executed stores are so
    as well).
    
    This will necessary cause
      FAIL: gcc.dg/graphite/pr80906.c scan-tree-dump graphite "isl AST to Gimple succeeded"
    because the SM previously performed is not valid for exactly the PR57359
    reason, we still perform SM of qc for the innermost loop but that's not enough.
    
    There is still room for improvements because we still check some constraints
    for the order preserving cases that are only necessary in the current
    strict way for the unordered ones.  Leaving that for the furture.

caused:

FAIL: gcc.target/i386/pr64110.c scan-assembler vmovd[\\t ]

Comment 1 Richard Biener 2020-05-08 06:31:17 UTC

Ah, forgot to update this testcase.  This is another instance of PR57359, that is, we may not sink the store to b across the store to *b since b may point
to itself and with j == 1 we'd change

 b = b + 2;
 *b = x;

to

 *b = x;
 b = b + 2;

note there's a twist for this particular case, namely the preceeding load
of 'b' gives us knowledge about the dynamic type of 'b' which means we
could use that to assess that we _can_ exchange the stores.

But that logic is not implemented.

I'll see how to do that.

Comment 2 Richard Biener 2020-05-08 10:08:38 UTC

(In reply to Richard Biener from comment #1)
> Ah, forgot to update this testcase.  This is another instance of PR57359,
> that is, we may not sink the store to b across the store to *b since b may
> point
> to itself and with j == 1 we'd change
> 
>  b = b + 2;
>  *b = x;
> 
> to
> 
>  *b = x;
>  b = b + 2;
> 
> note there's a twist for this particular case, namely the preceeding load
> of 'b' gives us knowledge about the dynamic type of 'b' which means we
> could use that to assess that we _can_ exchange the stores.
> 
> But that logic is not implemented.
> 
> I'll see how to do that.

OK, we can't.  Consider the following which we miscompile with GCC 10
but which is fixed on trunk.  bar () is simply the inner loop of
bar in the pr64110.c testcase.  GCC 10 and earlier transform

  b++;
  *b = x;

to

  tem = b + 1;
  *b = x;
  b = tem;

which is wrong with b == &b, the *b = x store re-purposes the
storage in 'b'.

short *b;

void __attribute__((noipa))
bar (short x, int j)
{
  for (int i = 0; i < j; ++i)
    *b++ = x;
}

int
main()
{
  b = (short *)&b;
  bar (0, 1);
  if ((short)(unsigned long)b != 0)
    __builtin_abort ();
  return 0;
}

Now the only thing that can be done (as noted in PR57359) is
re-materializing _both_ stores on the exit.   Thus turn

  for (int i = 0; i < j; ++i)
    {
      tem = b;
      tem = tem + 1;
      b = tem;
      *tem = x;
    }

into

  tem = b;
  for (int i = 0; i < j; ++i)
    {
      tem = tem + 1;
      *tem = x;
    }
  b = tem;
  *tem = x;

when applying store-motion.  Note this only works when b is written to
unconditionally.  It also needs some kind of a cost model I guess...

Comment 3 GCC Commits 2020-05-11 14:53:07 UTC

The master branch has been updated by Richard Biener <rguenth@gcc.gnu.org>:

https://gcc.gnu.org/g:b6ff3ddecfa93d53867afaaa078f85fc848abbbd

commit r11-272-gb6ff3ddecfa93d53867afaaa078f85fc848abbbd
Author: Richard Biener <rguenther@suse.de>
Date:   Fri May 8 12:03:30 2020 +0200

    tree-optimization/94988 - enhance SM some more
    
    This enhances store-order preserving store motion to handle the case
    of non-invariant dependent stores in the sequence of unconditionally
    executed stores on exit by re-issueing them as part of the sequence
    of stores on the exit.  This fixes the observed regression of
    gcc.target/i386/pr64110.c which relies on store-motion of 'b'
    for a loop like
    
      for (int i = 0; i < j; ++i)
        *b++ = x;
    
    where for correctness we now no longer apply store-motion.  With
    the patch we emit the correct
    
      tem = b;
      for (int i = 0; i < j; ++i)
        {
          tem = tem + 1;
          *tem = x;
        }
      b = tem;
      *tem = x;
    
    preserving the original order of stores.  A testcase reflecting
    the miscompilation done by earlier GCC is added as well.
    
    This also fixes the reported ICE in PR95025 and adds checking code
    to catch it earlier - the issue was not-supported refs propagation
    leaving stray refs in the sequence.
    
    2020-05-11  Richard Biener  <rguenther@suse.de>
    
            PR tree-optimization/94988
            PR tree-optimization/95025
            * tree-ssa-loop-im.c (seq_entry): Make a struct, add from.
            (sm_seq_push_down): Take extra parameter denoting where we
            moved the ref to.
            (execute_sm_exit): Re-issue sm_other stores in the correct
            order.
            (sm_seq_valid_bb): When always executed, allow sm_other to
            prevail inbetween sm_ord and record their stored value.
            (hoist_memory_references): Adjust refs_not_supported propagation
            and prune sm_other from the end of the ordered sequences.
    
            * gcc.dg/torture/pr94988.c: New testcase.
            * gcc.dg/torture/pr95025.c: Likewise.
            * gcc.dg/torture/pr95045.c: Likewise.
            * g++.dg/asan/pr95025.C: New testcase.

Comment 4 Richard Biener 2020-05-11 14:57:22 UTC

Fixed.

Comment 5 Rainer Orth 2020-05-12 11:53:21 UTC

One new testcase FAILs on several targets:

+FAIL: gcc.dg/torture/pr94988.c   -O0  execution test
+FAIL: gcc.dg/torture/pr94988.c   -O1  execution test
+FAIL: gcc.dg/torture/pr94988.c   -O2  execution test
+FAIL: gcc.dg/torture/pr94988.c   -O2 -flto  execution test
+FAIL: gcc.dg/torture/pr94988.c   -O2 -flto -flto-partition=none  execution test
+FAIL: gcc.dg/torture/pr94988.c   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  execution test
+FAIL: gcc.dg/torture/pr94988.c   -O3 -g  execution test
+FAIL: gcc.dg/torture/pr94988.c   -Os  execution test

I'm seeing it on 32 and 64-bit sparc-sun-solaris2.11, and there are also reports
for moxie-unknown-elf, powerpc64-unknown-linux-gnu, s390x-ibm-linux-gnu, all
big-endian targets AFAICS.

Comment 6 GCC Commits 2020-05-12 12:12:40 UTC

The master branch has been updated by Richard Biener <rguenth@gcc.gnu.org>:

https://gcc.gnu.org/g:119a7db1e05c9741803b3ff93266b00fd535732a

commit r11-320-g119a7db1e05c9741803b3ff93266b00fd535732a
Author: Richard Biener <rguenther@suse.de>
Date:   Tue May 12 14:13:32 2020 +0200

    middle-end/94988 fix testcase for big-endian
    
    The testcase only works for little-endian, mark it so.
    
    2020-05-12  Richard Biener  <rguenther@suse.de>
    
            PR middle-end/94988
            * gcc.dg/torture/pr94988.c: Disable runtime test for
            * non-little-endian.

Comment 7 Richard Biener 2020-05-12 12:13:30 UTC

Fixed.

Comment 8 GCC Commits 2020-06-16 11:32:27 UTC

The master branch has been updated by Thomas Schwinge <tschwinge@gcc.gnu.org>:

https://gcc.gnu.org/g:2210ef7d3d68a027ec16476825049567953c7fa4

commit r11-1348-g2210ef7d3d68a027ec16476825049567953c7fa4
Author: Thomas Schwinge <thomas@codesourcery.com>
Date:   Sat Jun 6 18:23:28 2020 +0200

    Un-XFAIL 'gcc.dg/graphite/pr80906.c'
    
    The recent commit b6ff3ddecfa93d53867afaaa078f85fc848abbbd
    "tree-optimization/94988 - enhance SM some more" fixed this.
    
            gcc/testsuite/
            PR tree-optimization/94988
            * gcc.dg/graphite/pr80906.c: Un-XFAIL.

Comment 9 Alexandre Oliva 2024-02-15 18:20:55 UTC

ISTM that the test invokes undefined behavior because the assignment and the increment in the loop both modify the same storage without an intervening sequence point.  ISTM that the dynamic type of that storage is thus uncertain, and accessing it afterwards, without an intervening store that resolves its type either way, would also invoke undefined behavior.

Comment 10 Alexandre Oliva 2024-02-15 18:36:33 UTC

Reasoning that the concurrent stores invoke undefined behavior would enable us to assume that the stores don't alias, which invalidates the reasoning in comment 1.  Alas, I don't think gimple preserves enough information for us to tell that two statements used to be concurrent so as to derive optimization opportunities from them.

Comment 11 Richard Biener 2024-02-16 07:49:01 UTC

(In reply to Alexandre Oliva from comment #9)
> ISTM that the test invokes undefined behavior because the assignment and the
> increment in the loop both modify the same storage without an intervening
> sequence point.  ISTM that the dynamic type of that storage is thus
> uncertain, and accessing it afterwards, without an intervening store that
> resolves its type either way, would also invoke undefined behavior.

I think the source can be rewritten to

    *b = x;
    b++;

and still show the original issue on the GIMPLE side.