Bug 101868 - [9 Regression] Incorrect reordering in -O2 with LTO
Status: RESOLVED FIXED
Product: gcc
Classification: Unclassified
Component: tree-optimization
Version: 11.2.0
Importance: P2 normal
Target Milestone: 9.5
Assignee: Richard Biener
Keywords: lto, wrong-code
Reported: 2021-08-11 23:54 UTC by Alan Wu
Modified: 2023-04-10 05:26 UTC
CC List: 4 users

Known to work: 10.3.1, 11.2.1, 12.0, 9.4.1
Known to fail: 10.3.0, 11.2.0, 9.4.0
Last reconfirmed: 2021-08-12 00:00:00


Description Alan Wu 2021-08-11 23:54:12 UTC
GCC with LTO appears to hoist a memory read above the null check that guards it. It only seems to reproduce with LTO, so please excuse the multiple files.

Compile command: gcc -flto -O2 -fno-strict-aliasing one.c two.c three.c four.c

//--------------- one.c --------------------------------
typedef unsigned long VALUE;

__attribute__ ((cold))
void rb_check_type(VALUE, int);

static VALUE
repro(VALUE dummy, VALUE hash)
{
    if (hash == 0) {
        rb_check_type(hash, 1);
    }
    else if (*(long *)hash) {
        rb_check_type(hash, 1);
    }


    return *(long *)hash;
}

static VALUE (*that)(VALUE dummy, VALUE hash) = repro;

int
main(int argc, char **argv)
{
        argc--;
        that(0, argc);

        rb_check_type(argc, argc);

}
//------------ end of one.c ----------------------------

//------------ two.c -----------------------------------
typedef unsigned long VALUE;


__attribute__ ((noreturn)) void rexc_raise(VALUE mesg);

VALUE rb_donothing(VALUE klass);

static void
funexpected_type(VALUE x, int xt, int t)
{
    rexc_raise(rb_donothing(0));
}

__attribute__ ((cold))
void
rb_check_type(VALUE x, int t)
{
    int xt;

    if (x == 0) {
        funexpected_type(x, xt, t);
    }
}
//------------- end of two.c ---------------------------

//------------ three.c ---------------------------------
typedef unsigned long VALUE;

static void thing(void) {}
static void (*ptr)(void) = &thing;

VALUE
rb_donothing(VALUE klass)
{
        ptr();
        return 0;
}
//-------- end of three.c ------------------------------

//-------- four.c --------------------------------------
typedef unsigned long VALUE;

__attribute__((noreturn))
void
rexc_raise(VALUE mesg)
{
        __builtin_exit(42);
}
//------------- end of four.c --------------------------


The code for repro() reads from memory before doing the check for zero:
   0x00000000004011a0 <+0>:	sub    $0x18,%rsp
=> 0x00000000004011a4 <+4>:	mov    (%rsi),%rax
   0x00000000004011a7 <+7>:	test   %rsi,%rsi
   0x00000000004011aa <+10>:	je     0x401051 <repro.cold>
   0x00000000004011b0 <+16>:	test   %rax,%rax
   0x00000000004011b3 <+19>:	jne    0x401067 <repro.cold+22>
   0x00000000004011b9 <+25>:	add    $0x18,%rsp
   0x00000000004011bd <+29>:	ret

Here is the output of gcc -v. I'm using the 11.2.0 Docker Hub image.

    Using built-in specs.
    COLLECT_GCC=gcc
    COLLECT_LTO_WRAPPER=/usr/local/libexec/gcc/x86_64-linux-gnu/11.2.0/lto-wrapper
    Target: x86_64-linux-gnu
    Configured with: /usr/src/gcc/configure --build=x86_64-linux-gnu --disable-multilib --enable-languages=c,c++,fortran,go
    Thread model: posix
    Supported LTO compression algorithms: zlib
    gcc version 11.2.0 (GCC)
Comment 1 Andrew Pinski 2021-08-12 00:21:08 UTC
Works for me on the trunk:
repro:
.LFB9:
        .cfi_startproc
        subq    $24, %rsp
        testq   %rsi, %rsi
        je      .L14
        movq    (%rsi), %rax
        testq   %rax, %rax
        jne     .L15
.L10:
        addq    $24, %rsp
...
.L14:
        xorl    %edi, %edi
        call    rb_check_type.isra.0
        movq    0, %rax
        jmp     .L10
.L15:
        movq    %rsi, %rdi
        movq    %rax, 8(%rsp)
        call    rb_check_type.isra.0
        movq    8(%rsp), %rax
        jmp     .L10
Comment 2 Andrew Pinski 2021-08-12 00:36:10 UTC
(In reply to Andrew Pinski from comment #1)
> Works for me on the trunk:

I almost want to say this was fixed by PR 101373.

Before PRE we had:
  if (hash_6(D) == 0)
    goto <bb 3>; [0.00%]
  else
    goto <bb 4>; [100.00%]

  <bb 3> [count: 0]:
  rb_check_type.isra (0);
  goto <bb 6>; [0.00%]

  <bb 4> [local count: 1073741824]:
  _1 = (long int *) hash_6(D);
  _2 = *_1;
  if (_2 != 0)
    goto <bb 5>; [0.00%]
  else
    goto <bb 6>; [100.00%]

  <bb 5> [count: 0]:
  rb_check_type.isra (hash_6(D));

  <bb 6> [local count: 1073741824]:
  _3 = (long int *) hash_6(D);
  _4 = *_3;

PRE is able to figure out that _3 and _1 are the same, and *_3 and *_1 would be the same too, except that rb_check_type.isra (hash_6(D)) may not return depending on its argument, even though it is otherwise a "pure" function.
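
A source-level sketch of the hazard described above. It assumes a hypothetical check() that stands in for rb_check_type.isra and leaves via longjmp rather than exit, so the behaviour can be observed in-process. The names check, load_after_check and run_guarded are illustrative and do not come from the testcase. The final load may only be merged with the earlier one if it stays below the null test, because the call on the null path never returns:

```c
#include <setjmp.h>
#include <stddef.h>

static jmp_buf escape;

/* Hypothetical stand-in for rb_check_type(): does not return when p is null. */
static void check(const long *p)
{
    if (p == NULL)
        longjmp(escape, 1);
}

/* Same shape as repro(): the final load is reached only if check() returned. */
static long load_after_check(const long *p)
{
    if (p == NULL)
        check(p);          /* never returns on this path */
    else if (*p)
        check(p);          /* returns, because p is non-null */

    return *p;
}

/* Returns the loaded value, or -1 if check() bailed out before the load. */
long run_guarded(const long *p)
{
    if (setjmp(escape))
        return -1;
    return load_after_check(p);
}
```

Calling run_guarded on a valid pointer yields the pointed-to value. Calling it with a null pointer yields -1, because control leaves before the load. Code that performed the load first, as in the disassembly in the description, would fault instead.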
Comment 3 Martin Liška 2021-08-12 08:25:33 UTC
(In reply to Andrew Pinski from comment #2)
> (In reply to Andrew Pinski from comment #1)
> > Works for me on the trunk:
> 
> I almost want to say this was fixed by PR 101373.

Yes, I can confirm that it was fixed with r12-2254-gfedcf3c476aff753.
@Richi: Can we close it as dup?
Comment 4 Andrew Pinski 2021-08-12 15:09:12 UTC
(In reply to Martin Liška from comment #3)
> (In reply to Andrew Pinski from comment #2)
> > (In reply to Andrew Pinski from comment #1)
> > > Works for me on the trunk:
> > 
> > I almost want to say this was fixed by PR 101373.
> 
> Yes, I can confirm that it was fixed with r12-2254-gfedcf3c476aff753.
> @Richi: Can we close it as dup?

We should at least add this as a testcase. I also suspect it is a regression dating from when -fcode-hoisting was added to GCC.
Comment 5 Richard Biener 2021-08-16 07:32:15 UTC
(In reply to Martin Liška from comment #3)
> (In reply to Andrew Pinski from comment #2)
> > (In reply to Andrew Pinski from comment #1)
> > > Works for me on the trunk:
> > 
> > I almost want to say this was fixed by PR 101373.
> 
> Yes, I can confirm that it was fixed with r12-2254-gfedcf3c476aff753.
> @Richi: Can we close it as dup?

Yes, can you add the testcase?
Comment 6 Martin Liška 2021-08-16 07:57:09 UTC
> Yes, can you add the testcase?

Sure.
Comment 7 Marek Polacek 2021-08-16 21:18:30 UTC
Is there any chance that this fix could be backported to 11 or is it too risky?
Comment 8 Richard Biener 2021-08-17 06:24:07 UTC
(In reply to Marek Polacek from comment #7)
> Is there any chance that this fix could be backported to 11 or is it too
> risky?

To fix this bug it should be enough to backport the following part:

* tree-ssa-pre.c (prune_clobbered_mems): Also prune trapping
references when the BB may not return.

I'll check and do that.
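
A toy model of the rule in that ChangeLog line, for illustration only. The struct layouts, field names and the function prune_mems are hypothetical and are not GCC's data structures or the code in tree-ssa-pre.c. The sketch only shows the decision: a memory reference is dropped from a block's candidate set if it is clobbered, or if it may trap and the block may not return:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins; not GCC's real representation. */
struct expr {
    bool is_mem_ref;  /* the expression reads memory */
    bool may_trap;    /* the read could fault, e.g. via an unchecked pointer */
    bool clobbered;   /* the memory it reads may be overwritten in this block */
};

struct block {
    bool may_not_return;  /* contains a call that might exit, longjmp or throw */
};

/* Compact exprs[] in place, keeping only the expressions that are still
   safe to move above this block.  Returns the new element count. */
size_t prune_mems(const struct block *bb, struct expr *exprs, size_t n)
{
    size_t kept = 0;

    for (size_t i = 0; i < n; i++) {
        const struct expr e = exprs[i];
        bool drop = e.is_mem_ref
                    && (e.clobbered || (bb->may_not_return && e.may_trap));
        if (!drop)
            exprs[kept++] = e;
    }
    return kept;
}
```

In this model, a trapping load in a block with may_not_return set is removed from the set, while a non-trapping load or a plain arithmetic expression is kept.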
Comment 9 Richard Biener 2021-08-17 06:41:47 UTC
I also have a testcase for the testsuite.
Comment 10 GCC Commits 2021-08-17 09:21:52 UTC
The releases/gcc-11 branch has been updated by Richard Biener <rguenth@gcc.gnu.org>:

https://gcc.gnu.org/g:ee875b63b22e30a0dcb4b05f7532c2c416ba6cd0

commit r11-8875-gee875b63b22e30a0dcb4b05f7532c2c416ba6cd0
Author: Richard Biener <rguenther@suse.de>
Date:   Tue Aug 17 08:38:35 2021 +0200

    tree-optimization/101868 - avoid PRE of trapping mems across calls
    
    This backports a fix for the omission of a check of trapping mems
    when hoisting them across calls that might not return.  This was
    originally done as part of a fix to handle const functions that throw
    properly.
    
    2021-08-17  Richard Biener  <rguenther@suse.de>
    
            PR tree-optimization/101373
            PR tree-optimization/101868
            * tree-ssa-pre.c (prune_clobbered_mems): Also prune trapping
            references when the BB may not return.
    
            * gcc.dg/lto/pr101868_0.c: New testcase.
            * gcc.dg/lto/pr101868_1.c: Likewise.
            * gcc.dg/lto/pr101868_2.c: Likewise.
            * gcc.dg/lto/pr101868_3.c: Likewise.
Comment 11 GCC Commits 2021-08-17 09:24:27 UTC
The master branch has been updated by Richard Biener <rguenth@gcc.gnu.org>:

https://gcc.gnu.org/g:3ed779689631ff8f398dcde06d5efa2a3c43ef27

commit r12-2943-g3ed779689631ff8f398dcde06d5efa2a3c43ef27
Author: Richard Biener <rguenther@suse.de>
Date:   Tue Aug 17 11:23:06 2021 +0200

    tree-optimization/101868 - avoid PRE of trapping mems across calls
    
    This adds the testcase from the fix for the PR.
    
    2021-08-17  Richard Biener  <rguenther@suse.de>
    
            PR tree-optimization/101868
            * gcc.dg/lto/pr101868_0.c: New testcase.
            * gcc.dg/lto/pr101868_1.c: Likewise.
            * gcc.dg/lto/pr101868_2.c: Likewise.
            * gcc.dg/lto/pr101868_3.c: Likewise.
Comment 12 GCC Commits 2021-10-13 11:09:01 UTC
The releases/gcc-10 branch has been updated by Richard Biener <rguenth@gcc.gnu.org>:

https://gcc.gnu.org/g:95a95ec274cd0ec125ce48ab002fad4e400e345b

commit r10-10206-g95a95ec274cd0ec125ce48ab002fad4e400e345b
Author: Richard Biener <rguenther@suse.de>
Date:   Tue Aug 17 08:38:35 2021 +0200

    tree-optimization/101868 - avoid PRE of trapping mems across calls
    
    This backports a fix for the omission of a check of trapping mems
    when hoisting them across calls that might not return.  This was
    originally done as part of a fix to handle const functions that throw
    properly.
    
    2021-08-17  Richard Biener  <rguenther@suse.de>
    
            PR tree-optimization/101373
            PR tree-optimization/101868
            * tree-ssa-pre.c (prune_clobbered_mems): Also prune trapping
            references when the BB may not return.
    
            * gcc.dg/lto/pr101868_0.c: New testcase.
            * gcc.dg/lto/pr101868_1.c: Likewise.
            * gcc.dg/lto/pr101868_2.c: Likewise.
            * gcc.dg/lto/pr101868_3.c: Likewise.
    
    (cherry picked from commit ee875b63b22e30a0dcb4b05f7532c2c416ba6cd0)
Comment 13 GCC Commits 2021-11-08 14:07:41 UTC
The releases/gcc-9 branch has been updated by Richard Biener <rguenth@gcc.gnu.org>:

https://gcc.gnu.org/g:2498de689b735422ef71d93e2afe7ae3e6988bb3

commit r9-9818-g2498de689b735422ef71d93e2afe7ae3e6988bb3
Author: Richard Biener <rguenther@suse.de>
Date:   Tue Aug 17 08:38:35 2021 +0200

    tree-optimization/101868 - avoid PRE of trapping mems across calls
    
    This backports a fix for the omission of a check of trapping mems
    when hoisting them across calls that might not return.  This was
    originally done as part of a fix to handle const functions that throw
    properly.
    
    2021-08-17  Richard Biener  <rguenther@suse.de>
    
            PR tree-optimization/101373
            PR tree-optimization/101868
            * tree-ssa-pre.c (prune_clobbered_mems): Also prune trapping
            references when the BB may not return.
    
            * gcc.dg/lto/pr101868_0.c: New testcase.
            * gcc.dg/lto/pr101868_1.c: Likewise.
            * gcc.dg/lto/pr101868_2.c: Likewise.
            * gcc.dg/lto/pr101868_3.c: Likewise.
    
    (cherry picked from commit ee875b63b22e30a0dcb4b05f7532c2c416ba6cd0)
Comment 14 Richard Biener 2021-11-08 14:08:18 UTC
Fixed.
Comment 15 GCC Commits 2022-06-24 20:25:56 UTC
The master branch has been updated by Dimitar Dimitrov <dimitar@gcc.gnu.org>:

https://gcc.gnu.org/g:b1d0d3520e96802dee37e8fc1c56e19c13d598b1

commit r13-1257-gb1d0d3520e96802dee37e8fc1c56e19c13d598b1
Author: Dimitar Dimitrov <dimitar@dinux.eu>
Date:   Sun May 15 17:30:52 2022 +0300

    testsuite: Remove reliance on argc in lto/pr101868_0.c
    
    Some embedded targets do not pass any argv arguments.  When argc is
    zero, this causes spurious failures for lto/pr101868_0.c.  Fix by
    following the strategy in r0-114701-g2c49569ecea56d.  Use a volatile
    variable instead of argc to inject a runtime value into the test.
    
    I validated the following:
      - No changes in testresults for x86_64-pc-linux-gnu.
      - The spurious failures are fixed for PRU target.
      - lto/pr101868_0.c still fails on x86_64-pc-linux-gnu, if
        the PR/101868 fix (r12-2254-gfedcf3c476aff7) is reverted.
    
            PR tree-optimization/101868
    
    gcc/testsuite/ChangeLog:
    
            * gcc.dg/lto/pr101868_0.c (zero): New volatile variable.
            (main): Use it instead of argc.
    
    Signed-off-by: Dimitar Dimitrov <dimitar@dinux.eu>
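
The pattern described in this commit can be sketched as follows. This is a minimal illustration, not the contents of pr101868_0.c: only the variable name zero comes from the ChangeLog, and runtime_zero is a hypothetical helper. A volatile read yields a value the optimizer cannot assume at compile time, without depending on how many arguments the target passes to main:

```c
/* Always 0 at run time, but the compiler must perform each read and
   cannot fold the value to a constant. */
volatile int zero = 0;

/* Hypothetical helper: yields 0 as an opaque runtime value, replacing
   the "argc - 1" trick that assumes argc == 1. */
unsigned long runtime_zero(void)
{
    return (unsigned long) zero;
}
```

A test's main could pass runtime_zero() where the original reproducer passed the decremented argc.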