Bug 90275 - [9 Regression] ICE: in insert_regs, at cse.c:1128 with -O2 -fno-dce -fno-tree-dce
Summary: [9 Regression] ICE: in insert_regs, at cse.c:1128 with -O2 -fno-dce -fno-tree...
Status: ASSIGNED
Alias: None
Product: gcc
Classification: Unclassified
Component: rtl-optimization (show other bugs)
Version: 10.0
: P2 normal
Target Milestone: 9.5
Assignee: Jeffrey A. Law
URL:
Keywords: ice-on-valid-code
: 92388 93125 (view as bug list)
Depends on:
Blocks:
 
Reported: 2019-04-29 11:00 UTC by Zdenek Sojka
Modified: 2021-09-28 14:50 UTC (History)
6 users (show)

See Also:
Host: x86_64-pc-linux-gnu
Target: armv7a-hardfloat-linux-gnueabi
Build:
Known to work:
Known to fail: 10.0, 8.3.1, 9.0
Last reconfirmed: 2019-12-12 00:00:00


Attachments
reduced testcase (128 bytes, text/plain)
2019-04-29 11:00 UTC, Zdenek Sojka
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Zdenek Sojka 2019-04-29 11:00:40 UTC
Created attachment 46259 [details]
reduced testcase

Compiler output:
$ armv7a-hardfloat-linux-gnueabi-gcc -O2 -fno-dce -fno-tree-dce testcase.c 
during RTL pass: cse_local
testcase.c: In function 'foo':
testcase.c:11:1: internal compiler error: in insert_regs, at cse.c:1129
   11 | }
      | ^
0x7d954d insert_regs
        /repo/gcc-trunk/gcc/cse.c:1129
0x16445a8 cse_insn
        /repo/gcc-trunk/gcc/cse.c:5980
0x164783b cse_extended_basic_block
        /repo/gcc-trunk/gcc/cse.c:6662
0x164783b cse_main
        /repo/gcc-trunk/gcc/cse.c:6841
0x164874d rest_of_handle_cse_after_global_opts
        /repo/gcc-trunk/gcc/cse.c:7820
0x164874d execute
        /repo/gcc-trunk/gcc/cse.c:7871
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See <https://gcc.gnu.org/bugs/> for instructions.


$ armv7a-hardfloat-linux-gnueabi-gcc -v
Using built-in specs.
COLLECT_GCC=/repo/gcc-trunk/binary-latest-armv7a-hardfloat/bin/armv7a-hardfloat-linux-gnueabi-gcc
COLLECT_LTO_WRAPPER=/repo/gcc-trunk/binary-trunk-270639-checking-yes-rtl-df-extra-armv7a-hardfloat/bin/../libexec/gcc/armv7a-hardfloat-linux-gnueabi/10.0.0/lto-wrapper
Target: armv7a-hardfloat-linux-gnueabi
Configured with: /repo/gcc-trunk//configure --enable-languages=c,c++ --enable-valgrind-annotations --disable-nls --enable-checking=yes,rtl,df,extra --with-cloog --with-ppl --with-isl --with-float=hard --with-fpu=vfpv4 --with-arch=armv7-a --with-sysroot=/usr/armv7a-hardfloat-linux-gnueabi --build=x86_64-pc-linux-gnu --host=x86_64-pc-linux-gnu --target=armv7a-hardfloat-linux-gnueabi --with-ld=/usr/bin/armv7a-hardfloat-linux-gnueabi-ld --with-as=/usr/bin/armv7a-hardfloat-linux-gnueabi-as --disable-libstdcxx-pch --prefix=/repo/gcc-trunk//binary-trunk-270639-checking-yes-rtl-df-extra-armv7a-hardfloat
Thread model: posix
gcc version 10.0.0 20190429 (experimental) (GCC)
Comment 1 David Binderman 2019-11-04 08:05:48 UTC
This C source code:

a, b, c;
long long d;
e() {
  char f;
  for (;;) {
    c = a = c ? 5 : 0;
    if (f) {
      b = a;
      f = d;
    }
    (d || b) < (a > e) ?: (b ? 0 : f) || (d -= f);
  }
}

when compiled by recent gcc trunk raspberry pi cross compiler
and compiler flag -O3, does something similar:

during RTL pass: cse_local
bug558.c: In function ‘e’:
bug558.c:13:1: internal compiler error: in insert_regs, at cse.c:1129
   13 | }
      | ^
0x77f215 insert_regs
	/home/dcb/gcc/trunk/gcc/cse.c:1129
0x160c923 cse_insn
	/home/dcb/gcc/trunk/gcc/cse.c:5956
0x160f164 cse_extended_basic_block
	/home/dcb/gcc/trunk/gcc/cse.c:6614
0x160f164 cse_main
	/home/dcb/gcc/trunk/gcc/cse.c:6793

$ /home/dcb/raspberrypi/results/bin/arm-linux-gnueabihf-gcc -v
Using built-in specs.
COLLECT_GCC=/home/dcb/raspberrypi/results/bin/arm-linux-gnueabihf-gcc
COLLECT_LTO_WRAPPER=/home/dcb/raspberrypi/results/libexec/gcc/arm-linux-gnueabihf/10.0.0/lto-wrapper
Target: arm-linux-gnueabihf
Configured with: /home/dcb/gcc/trunk/configure --prefix=/home/dcb/raspberrypi/results/ --target=arm-linux-gnueabihf --enable-languages=c,c++,fortran --with-arch=armv6 --with-fpu=vfp --with-float=hard --disable-multilib --enable-checking=yes
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 10.0.0 20191103 (experimental) (GCC)
Comment 2 David Binderman 2019-12-12 12:21:08 UTC
Nothing has happened on this for over a month.

Who would be best placed to look deeper into this problem ?
Comment 3 Jakub Jelinek 2019-12-12 16:30:43 UTC
Hopefully less undefined testcase that still ICEs at -O3:
int a, b, c;
long long d;
typedef __UINTPTR_TYPE__ uintptr_t;

void
foo (void)
{
  char f = c;
  for (;;)
    {
      c = a = c ? 5 : 0;
      if (f)
	{
	  b = a;
	  f = d;
	}
      if ((d || b) >= ((uintptr_t) a > (uintptr_t) &c))
	(b ? 0 : f) || (d -= f);
    }
}
Comment 4 Jakub Jelinek 2019-12-12 16:48:25 UTC
I think this is related to the *movsi_compare0 ARM define_ins which prevents obvious cleanups, so we end up with:
(insn 97 90 98 24 (parallel [
            (set (reg:CC 100 cc)
                (compare:CC (reg:SI 131 [ d_lsm.22 ])
                    (const_int 0 [0])))
            (set (reg:SI 135)
                (reg:SI 131 [ d_lsm.22 ]))
        ]) "pr90275.c":18:20 248 {*movsi_compare0}
     (expr_list:REG_DEAD (reg:SI 131 [ d_lsm.22 ])
        (nil)))
// unrelated insn that doesn't touch SI 131 or SI 135, but consumes CC register
(insn 154 98 155 24 (set (reg:SI 131 [ d_lsm.22 ])
        (reg:SI 135)) "pr90275.c":18:20 241 {*arm_movsi_insn}
     (expr_list:REG_DEAD (reg:SI 135)
        (nil)))
where CSE is unhappy about the pseudo being copied there and back.
Comment 5 Jakub Jelinek 2019-12-16 19:41:12 UTC
So, what I see that happens is that when processing that insn 97, insert_regs calls make_regs_eqv (135, 131) and as pseudo 131 is live at the end of the bb while pseudo 135 is not, 131 is selected as the canonical register for the equivalence.
5968		elt = insert (dest, sets[i].src_elt,
5969			      sets[i].dest_hash, GET_MODE (dest));
afterwards stores the table entry, but under the pseudo 135, such as lookup_for_remove (reg_135, HASH (reg_135), E_VOIDmode) is non-NULL and contains
in ->exp reg_135 and in ->first_same_value->exp reg_131, while
lookup_for_remove (reg_131, HASH (reg_131), E_VOIDmode) is NULL.
Later on we process the 131 = 135 assignment, canonicalize_insn canonicalizes that into 131 = 131 assignment (i.e. noop).
Later we invalidate_reg (reg_131) as the destination, which undoes the reg equivalency, but as lookup_for_remove (reg_131, HASH (reg_131), E_VOIDmode) used to be NULL, nothing is removed from the table.  And then insert_regs is called again, and ICEs, because
1128			  gcc_assert (REGNO_QTY_VALID_P (c_regno));
I'd think that invalidate_reg really should remove the traces of that pseudo from the tables, wonder e.g. if the remove_pseudo_from_table call in invalidate_reg couldn't be done before delete_reg_equiv and lookup_for_remove use exp_equiv_p.  It does use it already for the !REG_P case, but I believe it is never called with non-REG.
Comment 6 David Binderman 2020-01-24 19:02:18 UTC
I can confirm this is still going wrong in a raspberry pi
cross compiler dated 20200123.
Comment 7 Jeffrey A. Law 2020-01-30 21:30:05 UTC
*** Bug 93125 has been marked as a duplicate of this bug. ***
Comment 8 Jeffrey A. Law 2020-01-31 15:41:46 UTC
*** Bug 92388 has been marked as a duplicate of this bug. ***
Comment 9 Jeffrey A. Law 2020-02-01 02:47:02 UTC
FWIW, the failure seems to be related to having no-op sets in the IL.  Not sure why yet, but they're a consistent feature in every BZ where this ICE is triggering.
Comment 10 Jeffrey A. Law 2020-02-03 20:01:53 UTC
So the failure here is definitely related to the nop-moves in the IL.

In simplest terms cse_insn will invalidate the destination of the nop-set.  That sets is REG_QTY to a magic value that indicates its no longer valid.

Then we call insert_regs which is going to walk the value chain.  When that walk encounters the same reg in the value chain, but with an invalid REG_QTY we ICE.

The simplest solution here is to handle nop register moves in a manner similar to nop memory moves.  The only complication in a hunk of code that changes the source of a nop set to reference a different register from the value chain.  The idea here is to have their lifetimes abut rather than overlap.

I think we can just put the nop register handling right after that code which will resolve all these issues.
Comment 11 Jakub Jelinek 2020-03-04 09:46:24 UTC
GCC 8.4.0 has been released, adjusting target milestone.
Comment 12 Jeffrey A. Law 2020-03-12 22:14:15 UTC
This should be fixed on the trunk now.  The patch could be safely backported to the release branches if someone was interested.
Comment 13 Martin Liška 2020-03-13 09:11:15 UTC
commit r10-7149-g4aded535ea6ad7c362ab62d99af70e53c186d582
Author: Richard Sandiford <richard.sandiford@arm.com>
Date:   Thu Mar 12 16:09:27 2020 -0600

    Remove no-op register to register copies in CSE just like we remove no-op memory to memory copies.
    
            PR rtl-optimization/90275
            * cse.c (cse_insn): Delete no-op register moves too.
    
            PR rtl-optimization/90275
            * gcc.c-torture/compile/pr90275.c: New test.
Comment 14 CVS Commits 2020-03-18 22:08:10 UTC
The master branch has been updated by Jeff Law <law@gcc.gnu.org>:

https://gcc.gnu.org/g:529ea7d9596b26ba103578eeab448e9862a2d2c5

commit r10-7268-g529ea7d9596b26ba103578eeab448e9862a2d2c5
Author: Jeff Law <law@redhat.com>
Date:   Wed Mar 18 16:07:28 2020 -0600

    Complete change to resolve pr90275.
    
            PR rtl-optimization/90275
            * cse.c (cse_insn): Delete no-op register moves too.
Comment 15 CVS Commits 2020-03-23 23:59:45 UTC
The master branch has been updated by Jeff Law <law@gcc.gnu.org>:

https://gcc.gnu.org/g:75fb811dfaa29d60a897924b0d1629577b90eee7

commit r10-7346-g75fb811dfaa29d60a897924b0d1629577b90eee7
Author: Jeff Law <law@redhat.com>
Date:   Mon Mar 23 17:55:20 2020 -0600

    Verify the code used for the optimized comparison is valid for the comparison's mode.
    
            PR rtl-optimization/90275
            PR target/94238
            PR target/94144
            * simplify-rtx.c (comparison_code_valid_for_mode): New function.
            (simplify_logical_relational_operation): Use it.
    
            PR target/94144
            PR target/94238
            * gcc.c-torture/compile/pr94144.c: New test.
            * gcc.c-torture/compile/pr94238.c: New test.
Comment 16 CVS Commits 2020-03-25 20:13:45 UTC
The master branch has been updated by Jeff Law <law@gcc.gnu.org>:

https://gcc.gnu.org/g:eeb0c7c07133634eb5e98ba0348392684a763c95

commit r10-7383-geeb0c7c07133634eb5e98ba0348392684a763c95
Author: Jeff Law <law@redhat.com>
Date:   Wed Mar 25 14:12:32 2020 -0600

    Fix vector-compare-1 regressions on sh4/sh4eb caused by pattern clobbering T reg without expressing that in its RTL.
    
            PR rtl-optimization/90275
            * config/sh/sh.md (mov_neg_si_t): Clobber the T register in the
            pattern.
Comment 17 CVS Commits 2020-03-25 20:33:21 UTC
The master branch has been updated by Jeff Law <law@gcc.gnu.org>:

https://gcc.gnu.org/g:48817fbd7616f086ac7bb1dd38b862f78762c9b8

commit r10-7384-g48817fbd7616f086ac7bb1dd38b862f78762c9b8
Author: Jeff Law <law@redhat.com>
Date:   Wed Mar 25 14:33:08 2020 -0600

        Fix vector-compare-1 regressions on sh4/sh4eb caused by pattern clobbering T reg without expressing that in its RTL.
    
                PR rtl-optimization/90275
                * config/sh/sh.md (mov_neg_si_t): Clobber the T register in the
                pattern.
Comment 18 David Binderman 2020-04-02 20:43:05 UTC
For this C code:

a() {
  short *b;
  short c;
  long long *d = a;
  for (;;) {
    long long *e = a;
    (*d *= *e - c) / *b ?: (*b = 0);
  }
}

on this week's raspberry pi C cross compiler, with flag -O2, does this:

$ /home/dcb/raspberrypi/results/bin/arm-linux-gnueabihf-gcc -c -w -O2 testFile.97.c
during RTL pass: cse_local
testFile.97.c: In function ‘a’:
testFile.97.c:9:1: internal compiler error: in insert_regs, at cse.c:1128
    9 | }
      | ^
0x7e5b19 insert_regs
	/home/dcb/gcc/trunk.git/gcc/cse.c:1128
0x17ab64b cse_insn
	/home/dcb/gcc/trunk.git/gcc/cse.c:5956
0x17ade1d cse_extended_basic_block
	/home/dcb/gcc/trunk.git/gcc/cse.c:6614
0x17ade1d cse_main
	/home/dcb/gcc/trunk.git/gcc/cse.c:6793

This code works on x86, so something arm specific is happening.
Comment 19 Martin Liška 2020-04-03 07:18:28 UTC
Confirmed, it really fails with current master.
Comment 20 Jeffrey A. Law 2020-04-06 20:57:18 UTC
90275, the gift that keeps giving.   While the failure is similar, this feels slightly different.

In this case we've got:

(insn 60 54 61 4 (parallel [
            (set (reg:CC 100 cc)
                (compare:CC (reg:SI 252 [ _5 ])
                    (const_int 0 [0])))
            (set (reg:SI 256 [ _5 ])
                (reg:SI 252 [ _5 ]))
        ]) "j.c":8:15 248 {*movsi_compare0}
     (expr_list:REG_UNUSED (reg:SI 256 [ _5 ])
        (nil)))


That gets (reg 252) into the tables.  We invalidate it when we hit this insn in the same block:

(insn 65 64 66 4 (parallel [
            (set (reg:SI 252 [ _5 ])
                (mult:SI (reg:SI 252 [ _5 ])
                    (reg:SI 252 [ _5 ])))
            (set (reg:SI 253 [ _5+4 ])
                (truncate:SI (lshiftrt:DI (mult:DI (zero_extend:DI (reg:SI 252 [ _5 ]))
                            (zero_extend:DI (reg:SI 252 [ _5 ])))
                        (const_int 32 [0x20]))))
        ]) "j.c":8:9 68 {umull}
     (nil))

We then trigger the assert when handling this insn from the block:

(insn 174 173 175 4 (set (reg:SI 0 r0)
        (reg:SI 252 [ _5 ])) "j.c":8:20 241 {*arm_movsi_insn}
     (nil))

At the point where we simplify insn 60 into the form above, we don't know the destination of the second set is unused.  That's not exposed until cse2 and I'm not terribly inclined to do the DF analysis earlier and try to split that insn.

I'm not sure of the best fix here, nor is it clear why we're having so much trouble with this code.  The real guts of this code hasn't changed materially in decades.
Comment 21 Jeffrey A. Law 2020-04-06 22:52:44 UTC
So we may be able to address this by setting "do_not_record" if we have multiple sets in an insn, one of which is a reg->reg copy to a destination that is mentioned in a REG_UNUSED note.  We'd only need to set it when processing the set with the destination referenced in the REG_UNUSED note.

If the sets were in different insns, then the reg->reg copy with an unused destination would be removed as dead.

If the source of the set were anything but a register, then we wouldn't be getting into the insert_regs routine with the validation check we're tripping.

I suspect there's still a problem if we have multiple sets, one of which is a nop set.  We may want to proactively address this case too, even if we don't have a C testcase which triggers it.
Comment 22 CVS Commits 2020-04-17 21:41:08 UTC
The master branch has been updated by Jeff Law <law@gcc.gnu.org>:

https://gcc.gnu.org/g:3737ccc424c56a2cecff202dd79f88d28850eeb2

commit r10-7781-g3737ccc424c56a2cecff202dd79f88d28850eeb2
Author: Jeff Law <law@redhat.com>
Date:   Fri Apr 17 15:38:13 2020 -0600

    [committed] [PR rtl-optimization/90275] Another 90275 related cse.c fix
    
    This time instead of having a NOP copy insn that we can completely ignore and
    ultimately remove, we have a NOP set within a multi-set PARALLEL.  It triggers,
    the same failure when the source of such a set is a hard register for the same
    reasons as we've already noted in the BZ and patches-to-date.
    
    For prior cases we've been able to mark the insn as a nop set and ignore it for
    the rest of cse_insn, ultimately removing it.  That's not really an option here
    as there are other sets that we have to preserve.
    
    We might be able to fix this instance by splitting the multi-set insn, but I'm
    not keen to introduce splitting into cse.  Furthermore, the target may not be
    able to split the insn.  So I considered this is non-starter.
    
    What I finally settled on was to use the existing do_not_record machinery to
    ignore the nop set within the parallel (and only that set within the parallel).
    
    One might argue that we should always ignore a REG_UNUSED set.  But I rejected
    that idea -- we could have cse-able divmod insns where the first had a
    REG_UNUSED note for a destination, but the second did not.
    
    One might also argue that we could have a nop set without a REG_UNUSED in a
    multi-set parallel and thus we could trigger yet another insert_regs ICE at
    some point.  I tend to think this is a possibility.  If we see this happen,
    we'll have to revisit.
    
            PR rtl-optimization/90275
            * cse.c (cse_insn): Avoid recording nop sets in multi-set parallels
            when the destination has a REG_UNUSED note.
Comment 23 Jakub Jelinek 2021-05-14 09:51:37 UTC
GCC 8 branch is being closed.
Comment 24 Richard Biener 2021-06-01 08:14:01 UTC
GCC 9.4 is being released, retargeting bugs to GCC 9.5.
Comment 25 David Binderman 2021-09-28 14:50:48 UTC
This C source code:

$ more bug761.c
long a;
int b, c, e;
signed char d;
void f() {
  long long g = 105230154306549745590;
  b = (c ?: (d %= 11 * g)) + (e &= g += c);
  a = 5;
  for (; a <= 8;) {
    g = b != d ? e : g || (5 ? e = 1 : 0);
    a %= 0 < 0 / 0;
  }
}
pi@raspberrypi:~/creduce $ 

on recent gcc trunk on ARM and flag -O2, does this:

$ ../gcc/results/bin/arm-linux-gnueabihf-gcc -c -w -O2 bug761.c 
during RTL pass: cse_local
bug761.c: In function \u2018f\u2019:
bug761.c:12:1: internal compiler error: in insert_regs, at cse.c:1113
   12 | }
      | ^
0x16cd73f insert_regs(rtx_def*, table_elt*, int)
	../../trunk/gcc/cse.c:1113
0x16c9d9f cse_insn(rtx_insn*)
	../../trunk/gcc/cse.c:5926

...

$ ../gcc/results/bin/arm-linux-gnueabihf-gcc -v
Using built-in specs.
COLLECT_GCC=../gcc/results/bin/arm-linux-gnueabihf-gcc
COLLECT_LTO_WRAPPER=/home/pi/gcc/results.20210928/libexec/gcc/arm-linux-gnueabihf/12.0.0/lto-wrapper
Target: arm-linux-gnueabihf
Configured with: ../trunk/configure --prefix=/home/pi/gcc/results.20210928 --disable-bootstrap --disable-multilib --disable-werror --with-pkgversion=9cfb95f9b92326e8 --enable-checking=yes --enable-languages=c,c++,fortran --with-cpu=cortex-a72 --with-fpu=neon-fp-armv8 --with-float=hard --build=arm-linux-gnueabihf --host=arm-linux-gnueabihf --target=arm-linux-gnueabihf
Thread model: posix
Supported LTO compression algorithms: zlib
gcc version 12.0.0 20210928 (experimental) (9cfb95f9b92326e8) 

I don't have a git revision where it works ok. Sorry.