Bug 91386 - [9 regression] open-iscsi iscsiadm miscompiled by LTO on aarch64
Summary: [9 regression] open-iscsi iscsiadm miscompiled by LTO on aarch64
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: target (show other bugs)
Version: 9.1.1
: P3 normal
Target Milestone: 9.3
Assignee: Richard Earnshaw
URL:
Keywords: wrong-code
Depends on:
Blocks:
 
Reported: 2019-08-07 10:31 UTC by Andreas Schwab
Modified: 2019-08-19 16:14 UTC (History)
3 users (show)

See Also:
Host:
Target: aarch64-*-*
Build:
Known to work: 10.0
Known to fail: 9.1.0
Last reconfirmed: 2019-08-07 00:00:00


Attachments
Preprocessed sources with Makefile, part 1 (1000.16 KB, application/octet-stream)
2019-08-07 10:32 UTC, Andreas Schwab
Details
Preprocessed sources with Makefile, part 2 (1000.16 KB, application/octet-stream)
2019-08-07 10:33 UTC, Andreas Schwab
Details
Preprocessed sources with Makefile, part 3 (359.94 KB, application/octet-stream)
2019-08-07 10:34 UTC, Andreas Schwab
Details
candidate patch (431 bytes, patch)
2019-08-07 15:58 UTC, Richard Earnshaw
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Andreas Schwab 2019-08-07 10:31:04 UTC
This is the start of main from iscsiadm:

    8fe0:       d13003ff        sub     sp, sp, #0xc00
    8fe4:       a9007bfd        stp     x29, x30, [sp]
    8fe8:       910003fd        mov     x29, sp
    8fec:       a9025bf5        stp     x21, x22, [sp, #32]
    8ff0:       2a0003f6        mov     w22, w0
    8ff4:       f0000420        adrp    x0, 8f000 <iscsi_ifaces_get@plt+0x86030>
    8ff8:       a90153f3        stp     x19, x20, [sp, #16]
    8ffc:       f9479800        ldr     x0, [x0, #3888]
    9000:       a90363f7        stp     x23, x24, [sp, #48]
    9004:       d10983ff        sub     sp, sp, #0x260
    9008:       aa0103f7        mov     x23, x1
    900c:       f9400001        ldr     x1, [x0]
    9010:       f9072fe1        str     x1, [sp, #3672]
    9014:       d2800001        mov     x1, #0x0                        // #0
    9018:       b90397ff        str     wzr, [sp, #916]
    901c:       b9039bff        str     wzr, [sp, #920]
    9020:       f901d7ff        str     xzr, [sp, #936]
    9024:       f901dbff        str     xzr, [sp, #944]
    9028:       97fffd12        bl      8470 <iscsi_context_new@plt>
    902c:       f9017fe0        str     x0, [sp, #760]
    9030:       b4007540        cbz     x0, 9ed8 <iscsi_ifaces_get@plt+0xf08>
    9034:       f90153f9        str     x25, [sp, #672]
    9038:       b0000021        adrp    x1, e000 <iscsi_ifaces_get@plt+0x5030>
    903c:       f90157fa        str     x26, [sp, #680]
    9040:       910d0021        add     x1, x1, #0x340
    9044:       f9015bfb        str     x27, [sp, #688]
    9048:       911263e0        add     x0, sp, #0x498
    904c:       f900011c        str     x28, [x8]

The last insn is part of the prologue to save register x28, but x8 is never set.

This breaks during peephole2:

(insn/f:TI 7430 7429 7271 4 (set (mem/c:DI (plus:DI (reg/f:DI 31 sp)
                (const_int 696 [0x2b8])) [69  S8 A8])
        (reg:DI 28 x28)) 47 {*movdi_aarch64}
     (expr_list:REG_DEAD (reg:DI 28 x28)
        (expr_list:REG_CFA_OFFSET (set (mem/c:DI (plus:DI (reg/f:DI 31 sp)
                        (const_int 696 [0x2b8])) [69  S8 A8])
                (reg:DI 28 x28))
            (nil))))

is transformed into:

(insn/f:TI 7430 7429 7842 4 (set (mem/c:DI (reg:DI 8 x8) [69  S8 A8])
        (reg:DI 28 x28)) 47 {*movdi_aarch64}
     (expr_list:REG_DEAD (reg:DI 28 x28)
        (expr_list:REG_CFA_OFFSET (set (mem/c:DI (plus:DI (reg/f:DI 31 sp)
                        (const_int 696 [0x2b8])) [69  S8 A8])
                (reg:DI 28 x28))
            (nil))))
Comment 1 Andreas Schwab 2019-08-07 10:32:43 UTC
Created attachment 46681 [details]
Preprocessed sources with Makefile, part 1
Comment 2 Andreas Schwab 2019-08-07 10:33:54 UTC
Created attachment 46682 [details]
Preprocessed sources with Makefile, part 2
Comment 3 Andreas Schwab 2019-08-07 10:34:17 UTC
Created attachment 46683 [details]
Preprocessed sources with Makefile, part 3
Comment 4 Richard Earnshaw 2019-08-07 10:36:19 UTC
R8 is the register used for the address of the return value location when the result cannot be stored in registers.  Are you sure that this isn't a problem in the caller?
Comment 5 Richard Earnshaw 2019-08-07 10:38:07 UTC
(In reply to Richard Earnshaw from comment #4)
> R8 is the register used for the address of the return value location when
> the result cannot be stored in registers.  Are you sure that this isn't a
> problem in the caller?

Scratch that, can't be the case in main.
Comment 6 Richard Biener 2019-08-07 11:03:39 UTC
So I can't reproduce with a cross easily (w/o a libc I can only
do a partial link).  Nevertheless I see some

      58:       910e63e6        add     x6, sp, #0x398
...
      90:       a90008c2        stp     x2, x2, [x6]

which is probably what was intended.
Comment 7 Martin Liška 2019-08-07 11:04:48 UTC
I'm reducing the LTO files that are needed to expose the problem..
Comment 8 Richard Biener 2019-08-07 11:06:08 UTC
So if one can reproduce a way for a smaller testcase (likely only for trunk then) is to -fdump-tree-optimized-gimple and make a GIMPLE FE testcase from main()
(adding relevant typedefs from the preprocessed sources).  It may not
necessarily reproduce the issue of course...
Comment 9 Martin Liška 2019-08-07 11:25:12 UTC
I ended up with 5 files that are needed for the issue and my backtrace is then:

│0xaaaaaaab3930 <main>                           sub    sp, sp, #0xbc0
│0xaaaaaaab3934 <main+4>                         stp    x29, x30, [sp]
│0xaaaaaaab3938 <main+8>                         mov    x29, sp
│0xaaaaaaab393c <main+12>                        stp    x19, x20, [sp, #16]
│0xaaaaaaab3940 <main+16>                        stp    x23, x24, [sp, #48]
│0xaaaaaaab3944 <main+20>                        sub    sp, sp, #0x260
│0xaaaaaaab3948 <main+24>                        add    x9, sp, #0x390
│0xaaaaaaab394c <main+28>                        mov    w23, w0
│0xaaaaaaab3950 <main+32>                        mov    w0, #0xffffffff                 // #-1
│0xaaaaaaab3954 <main+36>                        mov    x24, x1
│0xaaaaaaab3958 <main+40>                        str    xzr, [sp, #928]
│0xaaaaaaab395c <main+44>                        stp    wzr, w0, [x9, #-8]
│0xaaaaaaab3960 <main+48>                        stp    wzr, w0, [x9]
│0xaaaaaaab3964 <main+52>                        str    xzr, [sp, #936]
│0xaaaaaaab3968 <main+56>                        bl     0xaaaaaaab2d20 <iscsi_context_new@plt>
│0xaaaaaaab396c <main+60>                        str    x0, [sp, #752]
│0xaaaaaaab3970 <main+64>                        cbz    x0, 0xaaaaaaab46f0 <main+3520>
│0xaaaaaaab3974 <main+68>                        str    x21, [sp, #640]
│0xaaaaaaab3978 <main+72>                        adrp   x1, 0xaaaaaaac4000 <__for_each_matched_rec+48>
│0xaaaaaaab397c <main+76>                        str    x22, [sp, #648]
│0xaaaaaaab3980 <main+80>                        add    x1, x1, #0x6c8
│0xaaaaaaab3984 <main+84>                        str    x25, [sp, #672]
│0xaaaaaaab3988 <main+88>                        add    x0, sp, #0x460
│0xaaaaaaab398c <main+92>                        str    x26, [sp, #680]
│0xaaaaaaab3990 <main+96>                        add    x8, sp, #0x3b8
│0xaaaaaaab3994 <main+100>                       str    x27, [sp, #688]
│0xaaaaaaab3998 <main+104>                       add    x3, sp, #0x3c8
>│0xaaaaaaab399c <main+108>                       str    x28, [x7]

Segfault happens at the last instruction.
Comment 10 Martin Liška 2019-08-07 11:53:59 UTC
I'm attaching all tree and rtl dumps for the problematic LTRANS unit:
https://drive.google.com/file/d/1CW4cWvpm1VVXFIP80XCf1IzYXWwTsynZ/view?usp=sharing

I can confirm what Andreas sees:

(note 8303 8031 7890 4 NOTE_INSN_PROLOGUE_END)
(insn:TI 7890 8303 7889 4 (set (mem/c:DI (plus:DI (reg/f:DI 31 sp)
                (const_int 704 [0x2c0])) [63 %sfp+-2912 S8 A64])
        (reg/f:DI 2 x2 [1833])) "../include/list.h":29:13 47 {*movdi_aarch64}
     (nil))
(insn 7889 7890 7887 4 (set (mem/c:DI (plus:DI (reg/f:DI 31 sp)
                (const_int 712 [0x2c8])) [63 %sfp+-2904 S8 A64])
        (reg/f:DI 3 x3 [1834])) "../include/list.h":29:13 47 {*movdi_aarch64}
     (nil))

gets transformed by peephole2 to:
(note 8303 8031 8556 4 NOTE_INSN_PROLOGUE_END)
(insn 8556 8303 7887 4 (parallel [
            (set (mem/c:DI (plus:DI (reg:DI 7 x7)
                        (const_int 8 [0x8])) [63 %sfp+-2912 S8 A64])
                (reg/f:DI 2 x2 [1833]))
            (set (mem/c:DI (plus:DI (reg:DI 7 x7)
                        (const_int 16 [0x10])) [63 %sfp+-2904 S8 A64])
                (reg/f:DI 3 x3 [1834]))
        ]) "../include/list.h":29:13 -1
     (nil))

I don't know what's the meaning of x7 register, but it's not set during main function execution.
Comment 11 Martin Liška 2019-08-07 12:07:50 UTC
And I can also verify that adding -fno-peephole -fno-peephole2 to CFLAGS helps to resolve the issue.
Comment 12 Wilco 2019-08-07 13:26:20 UTC
(In reply to Martin Liška from comment #10)
> I'm attaching all tree and rtl dumps for the problematic LTRANS unit:
> https://drive.google.com/file/d/1CW4cWvpm1VVXFIP80XCf1IzYXWwTsynZ/
> view?usp=sharing
> 
> I can confirm what Andreas sees:
> 
> (note 8303 8031 7890 4 NOTE_INSN_PROLOGUE_END)
> (insn:TI 7890 8303 7889 4 (set (mem/c:DI (plus:DI (reg/f:DI 31 sp)
>                 (const_int 704 [0x2c0])) [63 %sfp+-2912 S8 A64])
>         (reg/f:DI 2 x2 [1833])) "../include/list.h":29:13 47 {*movdi_aarch64}
>      (nil))
> (insn 7889 7890 7887 4 (set (mem/c:DI (plus:DI (reg/f:DI 31 sp)
>                 (const_int 712 [0x2c8])) [63 %sfp+-2904 S8 A64])
>         (reg/f:DI 3 x3 [1834])) "../include/list.h":29:13 47 {*movdi_aarch64}
>      (nil))
> 
> gets transformed by peephole2 to:
> (note 8303 8031 8556 4 NOTE_INSN_PROLOGUE_END)
> (insn 8556 8303 7887 4 (parallel [
>             (set (mem/c:DI (plus:DI (reg:DI 7 x7)
>                         (const_int 8 [0x8])) [63 %sfp+-2912 S8 A64])
>                 (reg/f:DI 2 x2 [1833]))
>             (set (mem/c:DI (plus:DI (reg:DI 7 x7)
>                         (const_int 16 [0x10])) [63 %sfp+-2904 S8 A64])
>                 (reg/f:DI 3 x3 [1834]))
>         ]) "../include/list.h":29:13 -1
>      (nil))
> 
> I don't know what's the meaning of x7 register, but it's not set during main
> function execution.

Basically the backend can combine up to 4 loads or stores and insert an addressing instruction so that all offsets are in range (aarch64_gen_adjusted_ldpstp). It appears a later optimization removes the initialization of x7 as dead and so the address is wrong.

The key question is how does one dump rtl with -flto? It doesn't work at all, making debugging this difficult...
Comment 13 Martin Liška 2019-08-07 13:29:21 UTC
> 
> The key question is how does one dump rtl with -flto? It doesn't work at
> all, making debugging this difficult...

It does, look:

marxin@marxinbox:/tmp> gcc -c main.c -flto
marxin@marxinbox:/tmp> gcc main.o -flto -o xxxxxxxx
marxin@marxinbox:/tmp> gcc main.o -flto -o xxxxxxxx -fdump-rtl-all
marxin@marxinbox:/tmp> ls -l xxxxxxx*
-rwxr-xr-x 1 marxin users 20544 Aug  7 15:28 xxxxxxxx
-rw-r--r-- 1 marxin users  1454 Aug  7 15:28 xxxxxxxx.ltrans0.233r.expand
-rw-r--r-- 1 marxin users  1041 Aug  7 15:28 xxxxxxxx.ltrans0.234r.vregs
-rw-r--r-- 1 marxin users  1042 Aug  7 15:28 xxxxxxxx.ltrans0.235r.into_cfglayout
-rw-r--r-- 1 marxin users  1997 Aug  7 15:28 xxxxxxxx.ltrans0.236r.jump
-rw-r--r-- 1 marxin users  1010 Aug  7 15:28 xxxxxxxx.ltrans0.248r.reginfo
-rw-r--r-- 1 marxin users  1069 Aug  7 15:28 xxxxxxxx.ltrans0.269r.outof_cfglayout
-rw-r--r-- 1 marxin users  1041 Aug  7 15:28 xxxxxxxx.ltrans0.270r.split1
-rw-r--r-- 1 marxin users  2389 Aug  7 15:28 xxxxxxxx.ltrans0.272r.dfinit
-rw-r--r-- 1 marxin users  2471 Aug  7 15:28 xxxxxxxx.ltrans0.273r.mode_sw
-rw-r--r-- 1 marxin users  2389 Aug  7 15:28 xxxxxxxx.ltrans0.274r.asmcons
-rw-r--r-- 1 marxin users  4419 Aug  7 15:28 xxxxxxxx.ltrans0.279r.ira
-rw-r--r-- 1 marxin users  4519 Aug  7 15:28 xxxxxxxx.ltrans0.280r.reload
-rw-r--r-- 1 marxin users  2158 Aug  7 15:28 xxxxxxxx.ltrans0.282r.postreload_jump
-rw-r--r-- 1 marxin users  2126 Aug  7 15:28 xxxxxxxx.ltrans0.285r.split2
-rw-r--r-- 1 marxin users  3635 Aug  7 15:28 xxxxxxxx.ltrans0.289r.pro_and_epilogue
-rw-r--r-- 1 marxin users  3178 Aug  7 15:28 xxxxxxxx.ltrans0.292r.jump2
-rw-r--r-- 1 marxin users  3228 Aug  7 15:28 xxxxxxxx.ltrans0.305r.stack
-rw-r--r-- 1 marxin users  3146 Aug  7 15:28 xxxxxxxx.ltrans0.306r.alignments
-rw-r--r-- 1 marxin users  3146 Aug  7 15:28 xxxxxxxx.ltrans0.308r.mach
-rw-r--r-- 1 marxin users  3146 Aug  7 15:28 xxxxxxxx.ltrans0.309r.barriers
-rw-r--r-- 1 marxin users  3257 Aug  7 15:28 xxxxxxxx.ltrans0.314r.shorten
-rw-r--r-- 1 marxin users  3257 Aug  7 15:28 xxxxxxxx.ltrans0.315r.nothrow
-rw-r--r-- 1 marxin users  3847 Aug  7 15:28 xxxxxxxx.ltrans0.316r.dwarf2
-rw-r--r-- 1 marxin users  3257 Aug  7 15:28 xxxxxxxx.ltrans0.317r.final
-rw-r--r-- 1 marxin users  1960 Aug  7 15:28 xxxxxxxx.ltrans0.318r.dfinish
Comment 14 Wilco 2019-08-07 13:35:16 UTC
(In reply to Martin Liška from comment #13)
> > 
> > The key question is how does one dump rtl with -flto? It doesn't work at
> > all, making debugging this difficult...
> 
> It does, look:
> 
> marxin@marxinbox:/tmp> gcc -c main.c -flto
> marxin@marxinbox:/tmp> gcc main.o -flto -o xxxxxxxx

Ah right, so you need an explicit -o<file>, otherwise it doesn't write the dump files in the current directory...
Comment 15 Richard Earnshaw 2019-08-07 14:34:49 UTC
From looking at the dumps it would appear that one of the STP generating peepholes might have bailed out, but that some of the changes have not been undone.

From the pass before, we have:

(insn/f:TI 8028 8027 8029 4 (set (mem/c:DI (plus:DI (reg/f:DI 31 sp)
                (const_int 672 [0x2a0])) [17  S8 A8])
        (reg:DI 25 x25)) 47 {*movdi_aarch64}
     (expr_list:REG_DEAD (reg:DI 25 x25)
        (expr_list:REG_CFA_OFFSET (set (mem/c:DI (plus:DI (reg/f:DI 31 sp)
                        (const_int 672 [0x2a0])) [17  S8 A8])
                (reg:DI 25 x25))
            (nil))))
(insn/f 8029 8028 8030 4 (set (mem/c:DI (plus:DI (reg/f:DI 31 sp)
                (const_int 680 [0x2a8])) [17  S8 A8])
        (reg:DI 26 x26)) 47 {*movdi_aarch64}
     (expr_list:REG_DEAD (reg:DI 26 x26)
        (expr_list:REG_CFA_OFFSET (set (mem/c:DI (plus:DI (reg/f:DI 31 sp)
                        (const_int 680 [0x2a8])) [17  S8 A8])
                (reg:DI 26 x26))
            (nil))))
(insn/f:TI 8030 8029 8031 4 (set (mem/c:DI (plus:DI (reg/f:DI 31 sp)
                (const_int 688 [0x2b0])) [17  S8 A8])
        (reg:DI 27 x27)) 47 {*movdi_aarch64}
     (expr_list:REG_DEAD (reg:DI 27 x27)
        (expr_list:REG_CFA_OFFSET (set (mem/c:DI (plus:DI (reg/f:DI 31 sp)
                        (const_int 688 [0x2b0])) [17  S8 A8])
                (reg:DI 27 x27))
            (nil))))
(insn/f 8031 8030 8303 4 (set (mem/c:DI (plus:DI (reg/f:DI 31 sp)
                (const_int 696 [0x2b8])) [17  S8 A8])
        (reg:DI 28 x28)) 47 {*movdi_aarch64}
     (expr_list:REG_DEAD (reg:DI 28 x28)
        (expr_list:REG_CFA_OFFSET (set (mem/c:DI (plus:DI (reg/f:DI 31 sp)
                        (const_int 696 [0x2b8])) [17  S8 A8])
                (reg:DI 28 x28))

Which looks like a perfect candidate for replacing with two STP instructions.

After peepholing, we have:

(insn/f:TI 8028 8027 8029 4 (set (mem/c:DI (plus:DI (reg/f:DI 31 sp)
                (const_int 672 [0x2a0])) [17  S8 A8])
        (reg:DI 25 x25)) 47 {*movdi_aarch64}
     (expr_list:REG_DEAD (reg:DI 25 x25)
        (expr_list:REG_CFA_OFFSET (set (mem/c:DI (plus:DI (reg/f:DI 31 sp)
                        (const_int 672 [0x2a0])) [17  S8 A8])
                (reg:DI 25 x25))
            (nil))))
(insn/f 8029 8028 8030 4 (set (mem/c:DI (plus:DI (reg/f:DI 31 sp)
                (const_int 680 [0x2a8])) [17  S8 A8])
        (reg:DI 26 x26)) 47 {*movdi_aarch64}
     (expr_list:REG_DEAD (reg:DI 26 x26)
        (expr_list:REG_CFA_OFFSET (set (mem/c:DI (plus:DI (reg/f:DI 31 sp)
                        (const_int 680 [0x2a8])) [17  S8 A8])
                (reg:DI 26 x26))
            (nil))))
(insn/f:TI 8030 8029 8031 4 (set (mem/c:DI (plus:DI (reg/f:DI 31 sp)
                (const_int 688 [0x2b0])) [17  S8 A8])
        (reg:DI 27 x27)) 47 {*movdi_aarch64}
     (expr_list:REG_DEAD (reg:DI 27 x27)
        (expr_list:REG_CFA_OFFSET (set (mem/c:DI (plus:DI (reg/f:DI 31 sp)
                        (const_int 688 [0x2b0])) [17  S8 A8])
                (reg:DI 27 x27))
            (nil))))
(insn/f 8031 8030 8303 4 (set (mem/c:DI (reg:DI 7 x7) [17  S8 A8])
        (reg:DI 28 x28)) 47 {*movdi_aarch64}
     (expr_list:REG_DEAD (reg:DI 28 x28)
        (expr_list:REG_CFA_OFFSET (set (mem/c:DI (plus:DI (reg/f:DI 31 sp)
                        (const_int 696 [0x2b8])) [17  S8 A8])
                (reg:DI 28 x28))
            (nil))))

And we can see that the last insn has been modified.  There's nothing earlier in the log to suggest that there was any substitution here.

My suspicion is this hunk:

  replace_equiv_address_nv (mem_1, plus_constant (Pmode, operands[8],
						  new_off_1), true);
  replace_equiv_address_nv (mem_2, plus_constant (Pmode, operands[8],
						  new_off_1 + msize), true);
  replace_equiv_address_nv (mem_3, plus_constant (Pmode, operands[8],
						  new_off_3), true);
  replace_equiv_address_nv (mem_4, plus_constant (Pmode, operands[8],
						  new_off_3 + msize), true);

  if (!aarch64_mem_pair_operand (mem_1, mode)
      || !aarch64_mem_pair_operand (mem_3, mode))
    return false;

Is somehow modifying the insn in-place, but then the test is failing somehow.
Comment 16 Richard Earnshaw 2019-08-07 15:44:45 UTC
I had the wrong set of insns previously. The problem cases are:

(insn/f 8031 8030 8303 4 (set (mem/c:DI (plus:DI (reg/f:DI 31 sp)
                (const_int 696 [0x2b8])) [17  S8 A8])
        (reg:DI 28 x28)) 47 {*movdi_aarch64}
     (expr_list:REG_DEAD (reg:DI 28 x28)
        (expr_list:REG_CFA_OFFSET (set (mem/c:DI (plus:DI (reg/f:DI 31 sp)
                        (const_int 696 [0x2b8])) [17  S8 A8])
                (reg:DI 28 x28))
            (nil))))
(note 8303 8031 7890 4 NOTE_INSN_PROLOGUE_END)
(insn:TI 7890 8303 7889 4 (set (mem/c:DI (plus:DI (reg/f:DI 31 sp)
                (const_int 704 [0x2c0])) [63 %sfp+-2912 S8 A64])
        (reg/f:DI 2 x2 [1833])) "../include/list.h":29:13 47 {*movdi_aarch64}
     (nil))
(insn 7889 7890 7887 4 (set (mem/c:DI (plus:DI (reg/f:DI 31 sp)
                (const_int 712 [0x2c8])) [63 %sfp+-2904 S8 A64])
        (reg/f:DI 3 x3 [1834])) "../include/list.h":29:13 47 {*movdi_aarch64}
     (nil))
(insn:TI 7887 7889 229 4 (set (mem/c:DI (plus:DI (reg/f:DI 31 sp)
                (const_int 720 [0x2d0])) [63 %sfp+-2896 S8 A64])
        (reg/f:DI 4 x4 [1838])) "log.c":422:11 47 {*movdi_aarch64}
     (nil))

Which after peephole2 is transformed to:

(insn/f 8031 8030 8303 4 (set (mem/c:DI (reg:DI 7 x7) [17  S8 A8])
        (reg:DI 28 x28)) 47 {*movdi_aarch64}
     (expr_list:REG_DEAD (reg:DI 28 x28)
        (expr_list:REG_CFA_OFFSET (set (mem/c:DI (plus:DI (reg/f:DI 31 sp)
                        (const_int 696 [0x2b8])) [17  S8 A8])
                (reg:DI 28 x28))
            (nil))))
(note 8303 8031 8556 4 NOTE_INSN_PROLOGUE_END)
(insn 8556 8303 7887 4 (parallel [
            (set (mem/c:DI (plus:DI (reg:DI 7 x7)
                        (const_int 8 [0x8])) [63 %sfp+-2912 S8 A64])
                (reg/f:DI 2 x2 [1833]))
            (set (mem/c:DI (plus:DI (reg:DI 7 x7)
                        (const_int 16 [0x10])) [63 %sfp+-2904 S8 A64])
                (reg/f:DI 3 x3 [1834]))
        ]) "../include/list.h":29:13 -1
     (nil))
(insn:TI 7887 8556 8557 4 (set (mem/c:DI (plus:DI (reg:DI 7 x7)
                (const_int 24 [0x18])) [63 %sfp+-2896 S8 A64])
        (reg/f:DI 4 x4 [1838])) "log.c":422:11 47 {*movdi_aarch64}
     (expr_list:REG_DEAD (reg/f:DI 4 x4 [1838])
        (nil)))

In this case, it looks like one of the 4-store peepholes has been partially applied, and then failed for some reason, but the modifications have not been unwound.  (there is then a second peephole that does match the second and third stores which confuses things a bit).

The reason for the non-match is that one of the insns has a frame-related note set on it, but the peephole produces three output isns and we can't handle that.

I think the fix is to make a full copy of the mems before doing the adjust; then if the change is unwound we haven't modified the original insns by mistake.
Comment 17 Richard Earnshaw 2019-08-07 15:58:55 UTC
Created attachment 46686 [details]
candidate patch

Could you try this patch please?  So far only very lightly tested.
Comment 18 Martin Liška 2019-08-08 07:46:10 UTC
(In reply to Richard Earnshaw from comment #17)
> Created attachment 46686 [details]
> candidate patch
> 
> Could you try this patch please?  So far only very lightly tested.

Sure, I'll test the problematic package build with your patch.
Comment 19 Martin Liška 2019-08-08 09:18:15 UTC
Good I can confirm the patch works for the package!
Comment 20 Richard Earnshaw 2019-08-09 16:15:30 UTC
Author: rearnsha
Date: Fri Aug  9 16:14:59 2019
New Revision: 274238

URL: https://gcc.gnu.org/viewcvs?rev=274238&root=gcc&view=rev
Log:
[aarch64] PR target/91386 Use copy_rtx to avoid modifying original insns in peep2 pattern

PR target/91386 is a situation where a peephole2 pattern substitution
is discarded late because the selected instructions contain
frame-related notes that we cannot redistribute (because the pattern
has more than one insn in the output).  Unfortunately, the original
insns were being modified during the generation, so after the undo we
are left with corrupt RTL.

We avoid this by ensuring that the modifications are always made on a
copy, so that the original insns are never changed.

	PR target/91386
	* config/aarch64/aarch64.c (aarch64_gen_adjusted_ldpstp): Use copy_rtx
	to preserve the contents of the original insns.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/aarch64/aarch64.c
Comment 21 Richard Earnshaw 2019-08-09 16:16:46 UTC
Fixed on trunk.
Comment 22 Wilco 2019-08-12 11:21:19 UTC
(In reply to Richard Earnshaw from comment #21)
> Fixed on trunk.

I ran an AArch64 bootstrap on GCC9 branch and that is fine.
Comment 23 Richard Earnshaw 2019-08-19 16:12:02 UTC
Author: rearnsha
Date: Mon Aug 19 16:11:30 2019
New Revision: 274675

URL: https://gcc.gnu.org/viewcvs?rev=274675&root=gcc&view=rev
Log:
[aarch64] PR target/91386 Use copy_rtx to avoid modifying original insns in peep2 pattern

PR target/91386 is a situation where a peephole2 pattern substitution
is discarded late because the selected instructions contain
frame-related notes that we cannot redistribute (because the pattern
has more than one insn in the output).  Unfortunately, the original
insns were being modified during the generation, so after the undo we
are left with corrupt RTL.

We avoid this by ensuring that the modifications are always made on a
copy, so that the original insns are never changed.

	Backport from mainline
	2019-09-09  Richard Earnshaw  <rearnsha@arm.com>

	PR target/91386
        * config/aarch64/aarch64.c (aarch64_gen_adjusted_ldpstp): Use copy_rtx
        to preserve the contents of the original insns.


Modified:
    branches/gcc-9-branch/gcc/ChangeLog
    branches/gcc-9-branch/gcc/config/aarch64/aarch64.c
Comment 24 Richard Earnshaw 2019-08-19 16:14:48 UTC
Fixed for gcc-9.3.