Bug 84169 - [8 Regression] wrong code with u128 multiplication result at aarch64 -O
Summary: [8 Regression] wrong code with u128 multiplication result at aarch64 -O
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: rtl-optimization (show other bugs)
Version: 8.0
: P1 normal
Target Milestone: 8.0
Assignee: Segher Boessenkool
URL:
Keywords: wrong-code
Depends on:
Blocks:
 
Reported: 2018-02-01 17:09 UTC by Zdenek Sojka
Modified: 2020-04-07 19:31 UTC (History)
2 users (show)

See Also:
Host: x86_64-pc-linux-gnu
Target: aarch64-unknown-linux-gnu
Build:
Known to work: 7.3.1
Known to fail: 8.0
Last reconfirmed: 2018-02-01 00:00:00


Attachments
reduced testcase (209 bytes, text/plain)
2018-02-01 17:09 UTC, Zdenek Sojka
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Zdenek Sojka 2018-02-01 17:09:09 UTC
Created attachment 43319 [details]
reduced testcase

Output:
$ aarch64-unknown-linux-gnu-gcc testcase.c -static -O
$ ./a.out 
qemu: uncaught target signal 6 (Aborted) - core dumped
Aborted

The most-significant 64bit are zeros, but should be ones.

$ aarch64-unknown-linux-gnu-gcc -v
Using built-in specs.
COLLECT_GCC=/repo/gcc-trunk/binary-latest-aarch64/bin/aarch64-unknown-linux-gnu-gcc
COLLECT_LTO_WRAPPER=/repo/gcc-trunk/binary-trunk-257303-checking-yes-rtl-df-extra-aarch64/bin/../libexec/gcc/aarch64-unknown-linux-gnu/8.0.1/lto-wrapper
Target: aarch64-unknown-linux-gnu
Configured with: /repo/gcc-trunk//configure --enable-languages=c,c++ --enable-valgrind-annotations --disable-nls --enable-checking=yes,rtl,df,extra --with-cloog --with-ppl --with-isl --with-sysroot=/usr/aarch64-unknown-linux-gnu --build=x86_64-pc-linux-gnu --host=x86_64-pc-linux-gnu --target=aarch64-unknown-linux-gnu --with-ld=/usr/bin/aarch64-unknown-linux-gnu-ld --with-as=/usr/bin/aarch64-unknown-linux-gnu-as --disable-libstdcxx-pch --prefix=/repo/gcc-trunk//binary-trunk-257303-checking-yes-rtl-df-extra-aarch64
Thread model: posix
gcc version 8.0.1 20180201 (experimental) (GCC)
Comment 1 ktkachov 2018-02-01 17:37:22 UTC
Confirmed
Comment 2 ktkachov 2018-02-01 18:27:30 UTC
I think combine is at fault here. The faulty assembly is:
foo:
        ldrb    w0, [sp, 16]
        neg     x0, x0, lsl 4
        csetm   x3, ne
<snip>

The csetm is supposed to set the top 64 bits of the TImode value (in x3) to ones, based on the CC content, but the CC-setting instruction has disappeared. The correct sequence (by hacking the assembly output and checking that it passes) would be:
        ldrb    w0, [sp, 16]
        negs     x0, x0, lsl 4 // NEG sets the condition code as well.
        csetm   x3, ne

There's a few complex combinations going on, but before combine we have (some RTL edited to tone down verbosity):

(insn 14 12 17 2 (set (reg:DI 87)
        (lshiftrt:DI (reg:DI 103)
            (const_int 60 [0x3c]))) "wrong.c":9 676 {*aarch64_lshr_sisd_or_int_di3}
(insn 19 17 20 2 (parallel [
            (set (reg:CC 66 cc)
                (compare:CC (reg:DI 85)
                    (reg:DI 105)))
            (set (reg:DI 90)
                (minus:DI (reg:DI 85)
                    (reg:DI 105)))
        ]) "wrong.c":9 270 {subdi3_compare1}

(insn 20 19 23 2 (set (reg:DI 91)
        (minus:DI (minus:DI (reg:DI 85)
                (reg:DI 87))
            (ltu:DI (reg:CC 66 cc)
                (const_int 0 [0])))) "wrong.c":9 327 {*subdi3_carryin}

where insn 19 sets the CC and insn 20 uses it. After combine we have:
(insn 14 12 17 2 (set (reg:DI 87)
        (lshiftrt:DI (reg:DI 103)
            (const_int 60 [0x3c]))) "wrong.c":9 676 {*aarch64_lshr_sisd_or_int_di3}
(note 19 17 20 2 NOTE_INSN_DELETED)
(insn 20 19 23 2 (set (reg:DI 91)
        (neg:DI (ne:DI (reg:CC 66 cc)
                (const_int 0 [0])))) "wrong.c":9 441 {cstoredi_neg}

with insn 20 still using the CC but the CC setting instruction being deleted.
There's a number of exciting things going on in combine so my analysis may not be entirely accurate, but I see things like:
Trying 14 -> 20:
   14: r87:DI=r103:DI 0>>0x3c
   20: r91:DI=r85:DI-r87:DI-ltu(cc:CC,0)
      REG_DEAD r87:DI
      REG_DEAD r85:DI
      REG_DEAD cc:CC
      REG_EQUAL -r87:DI-ltu(cc:CC,0)
Successfully matched this instruction:
(set (reg:DI 91)
    (neg:DI (ltu:DI (reg:CC 66 cc)
            (const_int 0 [0]))))

and

Trying 19 -> 26:
   19: cc:CC=cmp(r103:DI,0)
      REG_DEAD r103:DI
   26: {cc:CC_C=zero_extend(r90:DI)+zero_extend(r111:DI)!=zero_extend(r90:DI+r111:DI);r95:DI=r90:DI+r111:DI;}
      REG_DEAD r111:DI
      REG_DEAD r90:DI
Successfully matched this instruction:
(parallel [
        (set (reg:CC_C 66 cc)
            (ne:CC_C (plus:TI (zero_extend:TI (reg:DI 90))
                    (zero_extend:TI (reg:DI 111 [ b ])))
                (zero_extend:TI (plus:DI (reg:DI 90)
                        (reg:DI 111 [ b ])))))
        (set (reg:DI 95)
            (plus:DI (reg:DI 90)
                (reg:DI 111 [ b ])))
    ])
deferring deletion of insn with uid = 19.

So insn 19 ends up being combined into a subsequent instruction and ends up being deleted even though insn 20 depends on it's CC-setting effects
Comment 3 Segher Boessenkool 2018-02-02 15:53:48 UTC
Confirmed.  Does not happen with a compiler a few months old.
Comment 4 Segher Boessenkool 2018-02-02 19:14:17 UTC
This starts to fail with r249850 (don't know yet what the problem is).
Comment 5 Segher Boessenkool 2018-02-13 22:13:27 UTC
Author: segher
Date: Tue Feb 13 22:12:55 2018
New Revision: 257644

URL: https://gcc.gnu.org/viewcvs?rev=257644&root=gcc&view=rev
Log:
combine: Update links correctly for new I2 (PR84169)

If there is a LOG_LINK between two insns, this means those two insns
can be combined, as far as dataflow is concerned.  There never should
be a LOG_LINK between two unrelated insns.  If there is one, combine
will try to combine the insns without doing all the needed checks if
the earlier destination is used before the later insn, etc.

Unfortunately we do not update the LOG_LINKs correctly in some cases.
This patch fixes at least some of those cases.


	PR rtl-optimization/84169
	* combine.c (try_combine): New variable split_i2i3.  Set it to true if
	we generated a parallel as new i3 and we split that to new i2 and i3
	instructions.  Handle split_i2i3 similar to swap_i2i3: scan the
	LOG_LINKs of i3 to see which of those need to link to i2 now.  Link
	those to i2, not i1.  Partially rewrite this scan code.

gcc/testsuite/
	PR rtl-optimization/84169
	* gcc.c-torture/execute/pr84169.c: New.

Added:
    trunk/gcc/testsuite/gcc.c-torture/execute/pr84169.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/combine.c
    trunk/gcc/testsuite/ChangeLog
Comment 6 Jakub Jelinek 2018-02-13 22:32:56 UTC
Fixed.
Comment 7 Segher Boessenkool 2018-02-13 23:25:51 UTC
Yeah, not planning to do backports, although the core problem is ancient.
Comment 8 GCC Commits 2020-04-07 19:31:02 UTC
The master branch has been updated by Jakub Jelinek <jakub@gcc.gnu.org>:

https://gcc.gnu.org/g:c23c899aedf11069e992eed7358802b262d62f98

commit r10-7607-gc23c899aedf11069e992eed7358802b262d62f98
Author: Jakub Jelinek <jakub@redhat.com>
Date:   Tue Apr 7 21:30:12 2020 +0200

    combine: Fix split_i2i3 ICE [PR94291]
    
    The following testcase ICEs on armv7hl-linux-gnueabi.
    try_combine is called on:
    (gdb) p debug_rtx (i3)
    (insn 20 12 22 2 (set (mem/c:SI (plus:SI (reg/f:SI 102 sfp)
                    (const_int -4 [0xfffffffffffffffc])) [1 x+0 S4 A32])
            (reg:SI 125)) "pr94291.c":7:8 241 {*arm_movsi_insn}
         (expr_list:REG_DEAD (reg:SI 125)
            (nil)))
    (gdb) p debug_rtx (i2)
    (insn 12 7 20 2 (parallel [
                (set (reg:CC 100 cc)
                    (compare:CC (reg:SI 121 [ <retval> ])
                        (const_int 0 [0])))
                (set (reg:SI 125)
                    (reg:SI 121 [ <retval> ]))
            ]) "pr94291.c":7:8 248 {*movsi_compare0}
         (expr_list:REG_UNUSED (reg:CC 100 cc)
            (nil)))
    and tries to recognize cc = r121 cmp 0; [sfp-4] = r121 parallel,
    but that isn't recognized, so it splits it into two: split_i2i3
    [sfp-4] = r121 followed by cc = r121 cmp 0 which is recognized, but
    ICEs because the code below insist that the SET_DEST of newi2pat
    (or first set in PARALLEL thereof) must be a REG or SUBREG of REG,
    but it is a MEM in this case.  I don't see any condition that would
    guarantee that, perhaps for the swap_i2i3 case it was somehow guaranteed.
    
    As the code just wants to update LOG_LINKS and LOG_LINKS are only for
    registers, not for MEM or anything else, the patch just doesn't update those
    if it isn't a REG or SUBREG of REG.
    
    2020-04-07  Jakub Jelinek  <jakub@redhat.com>
    
            PR rtl-optimization/94291
            PR rtl-optimization/84169
            * combine.c (try_combine): For split_i2i3, don't assume SET_DEST
            must be a REG or SUBREG of REG; if it is not one of these, don't
            update LOG_LINKs.
    
            * gcc.dg/pr94291.c: New test.