109636 – [14 Regression] ICE: in paradoxical_subreg_p, at rtl.h:3205 with -O -march=armv8.4-a+sve

Status:	RESOLVED FIXED

Alias:	None

Product:	gcc
Classification:	Unclassified
Component:	target
Version:	14.0

Importance:	P1 normal
Target Milestone:	14.0
Assignee:	Tamar Christina

URL:
Keywords:	ice-on-valid-code, needs-bisection, testsuite-fail

Duplicates (1):	113229
Depends on:
Blocks:

Reported:	2023-04-26 17:28 UTC by Zdenek Sojka
Modified:	2024-01-24 16:01 UTC
CC List:	4 users

See Also:
Host:
Target:	aarch64-unknown-linux-gnu
Build:
Known to work:
Known to fail:	14.0
Last reconfirmed:	2023-12-05 00:00:00

Attachments
reduced testcase (157 bytes, text/plain) 2023-04-26 17:28 UTC, Zdenek Sojka

Description Zdenek Sojka 2023-04-26 17:28:49 UTC

Created attachment 54927
reduced testcase

Compiler output:
$ aarch64-unknown-linux-gnu-gcc -O -mcpu=a64fx testcase.c 
during RTL pass: expand
testcase.c: In function 'foo':
testcase.c:9:3: internal compiler error: in paradoxical_subreg_p, at rtl.h:3205
    9 |   bar (__builtin_shuffle (v, __builtin_shufflevector ((V){}, w, 4, 5) / v));
      |   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
0x7e83b3 paradoxical_subreg_p(machine_mode, machine_mode)
        /repo/gcc-trunk/gcc/rtl.h:3205
0x7f1871 paradoxical_subreg_p(machine_mode, machine_mode)
        /repo/gcc-trunk/gcc/simplify-rtx.cc:7459
0x7f1871 simplify_context::simplify_subreg(machine_mode, rtx_def*, machine_mode, poly_int<2u, unsigned long>)
        /repo/gcc-trunk/gcc/simplify-rtx.cc:7533
0x1193a21 simplify_context::simplify_gen_subreg(machine_mode, rtx_def*, machine_mode, poly_int<2u, unsigned long>)
        /repo/gcc-trunk/gcc/simplify-rtx.cc:7748
0x1af64c4 simplify_gen_subreg(machine_mode, rtx_def*, machine_mode, poly_int<2u, unsigned long>)
        /repo/gcc-trunk/gcc/rtl.h:3542
0x1af64c4 gen_udivv2di3(rtx_def*, rtx_def*, rtx_def*)
        /repo/gcc-trunk/gcc/config/aarch64/aarch64-simd.md:2910
0x104e9da expand_binop_directly
        /repo/gcc-trunk/gcc/optabs.cc:1442
0x104c481 expand_binop(machine_mode, optab_tag, rtx_def*, rtx_def*, rtx_def*, int, optab_methods)
        /repo/gcc-trunk/gcc/optabs.cc:1529
0x104f063 sign_expand_binop(machine_mode, optab_tag, optab_tag, rtx_def*, rtx_def*, rtx_def*, int, optab_methods)
        /repo/gcc-trunk/gcc/optabs.cc:2317
0xd92811 expand_divmod(int, tree_code, machine_mode, rtx_def*, rtx_def*, rtx_def*, int, optab_methods)
        /repo/gcc-trunk/gcc/expmed.cc:5268
0xd9f3e9 expand_expr_real_2(separate_ops*, rtx_def*, machine_mode, expand_modifier)
        /repo/gcc-trunk/gcc/expr.cc:9863
0xda6218 expand_expr_real_1(tree_node*, rtx_def*, machine_mode, expand_modifier, rtx_def**, bool)
        /repo/gcc-trunk/gcc/expr.cc:10800
0xda0317 expand_expr_real(tree_node*, rtx_def*, machine_mode, expand_modifier, rtx_def**, bool)
        /repo/gcc-trunk/gcc/expr.cc:8999
0xda0317 expand_normal(tree_node*)
        /repo/gcc-trunk/gcc/expr.h:316
0xda0317 expand_expr_real_2(separate_ops*, rtx_def*, machine_mode, expand_modifier)
        /repo/gcc-trunk/gcc/expr.cc:10453
0xda6218 expand_expr_real_1(tree_node*, rtx_def*, machine_mode, expand_modifier, rtx_def**, bool)
        /repo/gcc-trunk/gcc/expr.cc:10800
0xc4d218 expand_normal(tree_node*)
        /repo/gcc-trunk/gcc/expr.h:316
0xc4d218 precompute_register_parameters
        /repo/gcc-trunk/gcc/calls.cc:988
0xc53ca0 expand_call(tree_node*, rtx_def*, int)
        /repo/gcc-trunk/gcc/calls.cc:3416
0xda4c51 expand_expr_real_1(tree_node*, rtx_def*, machine_mode, expand_modifier, rtx_def**, bool)
        /repo/gcc-trunk/gcc/expr.cc:11867
Please submit a full bug report, with preprocessed source (by using -freport-bug).
Please include the complete backtrace with any bug report.
See <https://gcc.gnu.org/bugs/> for instructions.

$ aarch64-unknown-linux-gnu-gcc -v
Using built-in specs.
COLLECT_GCC=/repo/gcc-trunk/binary-latest-aarch64/bin/aarch64-unknown-linux-gnu-gcc
COLLECT_LTO_WRAPPER=/mnt/main-repo/repo/gcc-trunk/binary-trunk-r14-268-20230426091040-ge02f68df385-checking-yes-rtl-df-extra-aarch64/bin/../libexec/gcc/aarch64-unknown-linux-gnu/14.0.0/lto-wrapper
Target: aarch64-unknown-linux-gnu
Configured with: /repo/gcc-trunk//configure --enable-languages=c,c++ --enable-valgrind-annotations --disable-nls --enable-checking=yes,rtl,df,extra --with-cloog --with-ppl --with-isl --with-sysroot=/usr/aarch64-unknown-linux-gnu --build=x86_64-pc-linux-gnu --host=x86_64-pc-linux-gnu --target=aarch64-unknown-linux-gnu --with-ld=/usr/bin/aarch64-unknown-linux-gnu-ld --with-as=/usr/bin/aarch64-unknown-linux-gnu-as --disable-libstdcxx-pch --prefix=/repo/gcc-trunk//binary-trunk-r14-268-20230426091040-ge02f68df385-checking-yes-rtl-df-extra-aarch64
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 14.0.0 20230426 (experimental) (GCC)

Comment 1 Andrew Pinski 2023-04-26 19:08:54 UTC

Are you sure this is not also a regression in GCC 13.1.0?
The most obvious revision which caused this is r13-6620-gf23dc726875c26f2c3 .

Comment 2 ktkachov 2023-04-26 22:26:32 UTC

(In reply to Andrew Pinski from comment #1)
> Are you sure this is not also a regression in GCC 13.1.0?
> The most obvious revision which caused this is r13-6620-gf23dc726875c26f2c3 .

I'd expect it's g:c69db3ef7f7d82a50f46038aa5457b7c8cc2d643, but I haven't looked deeper yet.

Comment 3 Andrew Pinski 2023-04-26 22:42:25 UTC

Oh, I think simplify_gen_subreg should not be used here. gen_lowpart should be used instead, especially when it comes to big endian.

Comment 4 ktkachov 2023-04-27 08:26:04 UTC

Confirmed. The operand that's blowing it up is:
(subreg:V2DI (reg/v:OI 97 [ w ]) 16)
at
rtx sve_op1 = simplify_gen_subreg (sve_mode, operands[1], <MODE>mode, 0);

simplify_gen_subreg, lowpart_subreg, copy_to_mode_reg and force_reg all ICE :(

Comment 5 ktkachov 2023-04-28 08:17:05 UTC

The multiplication case also ICEs:
void foom (V v, W w)
{
  bar (__builtin_shuffle (v, __builtin_shufflevector ((V){}, w, 4, 5) * v));
}

as mulv2di3 was implemented with a similar trick for TARGET_SVE.
I'll take this, once I figure out how to wire up the Neon modes through SVE...

Comment 6 Richard Sandiford 2023-04-28 08:25:26 UTC

Ugh.  I guess we've got no option but to force the original
subreg into a fresh register, but that's going to pessimise
cases where arithmetic is done on tuple types.

Perhaps we should just expose the SVE operation as a native
V2DI one.  Handling predicated ops would be a bit more challenging
though.

Comment 7 ktkachov 2023-04-28 14:41:56 UTC

(In reply to rsandifo@gcc.gnu.org from comment #6)
> Ugh.  I guess we've got no option but to force the original
> subreg into a fresh register, but that's going to pessimise
> cases where arithmetic is done on tuple types.
> 
> Perhaps we should just expose the SVE operation as a native
> V2DI one.  Handling predicated ops would be a bit more challenging
> though.

I did try a copy_to_mode_reg to a fresh V2DI register for non-REG_P arguments and that did progress, but (surprisingly?) still ICEd during fwprop:
during RTL pass: fwprop1
mulice.c: In function 'foom':
mulice.c:17:1: internal compiler error: in paradoxical_subreg_p, at rtl.h:3205
   17 | }
      | ^
0xe903b9 paradoxical_subreg_p(machine_mode, machine_mode)
        $SRC/gcc/rtl.h:3205
0xe903b9 simplify_context::simplify_subreg(machine_mode, rtx_def*, machine_mode, poly_int<2u, unsigned long>)
        $SRC/gcc/simplify-rtx.cc:7533
0xe1b5f7 insn_propagation::apply_to_rvalue_1(rtx_def**)
        $SRC/gcc/recog.cc:1176
0xe1b3d8 insn_propagation::apply_to_rvalue_1(rtx_def**)
        $SRC/gcc/recog.cc:1118
0xe1b7b7 insn_propagation::apply_to_rvalue_1(rtx_def**)
        $SRC/gcc/recog.cc:1254
0xe1babf insn_propagation::apply_to_pattern_1(rtx_def**)
        $SRC/gcc/recog.cc:1361
0xe1bae4 insn_propagation::apply_to_pattern(rtx_def**)
        $SRC/gcc/recog.cc:1383
0x1c22e5b try_fwprop_subst_pattern
        $SRC/gcc/fwprop.cc:454
0x1c22e5b try_fwprop_subst
        $SRC/gcc/fwprop.cc:627
0x1c239a9 forward_propagate_and_simplify
        $SRC/gcc/fwprop.cc:823
0x1c239a9 forward_propagate_into
        $SRC/gcc/fwprop.cc:886
0x1c23bc1 fwprop_insn
        $SRC/gcc/fwprop.cc:943
0x1c23d98 fwprop
        $SRC/gcc/fwprop.cc:995
0x1c240e1 execute
        $SRC/gcc/fwprop.cc:1033
Please submit a full bug report, with preprocessed source (by using -freport-bug).
Please include the complete backtrace with any bug report.
See <https://gcc.gnu.org/bugs/> for instructions.

fwprop ended up creating:
(mult:VNx2DI (subreg:VNx2DI (reg/v:V2DI 95 [ v ]) 0)
    (subreg:VNx2DI (subreg:V2DI (reg/v:OI 97 [ w ]) 16) 0))

and something blew up anyway, so it seems the RTL passes *really* don't like these kinds of subregs ;)
I'll look into expressing these ops as native V2DI patterns. I guess for the unpredicated SVE2 mul that's easy, but for the predicated forms perhaps we can have them consume a predicate register, generated at expand time, similar to the aarch64-sve.md expanders. Not super-pretty, but maybe it'll be enough.
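
One plausible reading of why the nested subreg above trips the assertion, sketched in C. It assumes that the check in `paradoxical_subreg_p` requires the two mode precisions to be orderable for every possible vector length, and that `simplify_subreg` ends up comparing the outermost mode (`VNx2DI`) against the innermost one (`OI`). The `poly_size` type and the helpers below are simplified stand-ins written for this illustration, not GCC's actual `poly_int` code. The precisions assume a length-agnostic SVE configuration.

```c
#include <stdbool.h>

/* A size of the form c0 + c1 * x, where x >= 0 is only known at run time
   (for SVE, the number of 128-bit blocks beyond the first).  */
struct poly_size { unsigned c0, c1; };

/* True if a <= b for every possible x >= 0.  */
bool
known_le (struct poly_size a, struct poly_size b)
{
  return a.c0 <= b.c0 && a.c1 <= b.c1;
}

/* True if the two sizes compare the same way for every possible x.  */
bool
ordered_p (struct poly_size a, struct poly_size b)
{
  return known_le (a, b) || known_le (b, a);
}

/* Assumed mode precisions in bits.  */
const struct poly_size V2DI_BITS   = { 128, 0 };    /* fixed 128 bits */
const struct poly_size OI_BITS     = { 256, 0 };    /* fixed 256 bits */
const struct poly_size VNx2DI_BITS = { 128, 128 };  /* 128 + 128 * x bits */
```

Under these assumptions `ordered_p (V2DI_BITS, VNx2DI_BITS)` holds, because an SVE vector is never narrower than 128 bits, so a `VNx2DI` subreg of a `V2DI` register is well defined. `ordered_p (VNx2DI_BITS, OI_BITS)` does not hold: the SVE vector is narrower than `OI` for x = 0, equal for x = 1 and wider for x = 2, so no single answer to "is this subreg paradoxical?" exists.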

Comment 8 Andrew Pinski 2024-01-10 08:01:21 UTC

*** Bug 113229 has been marked as a duplicate of this bug. ***

Comment 9 Andrew Pinski 2024-01-10 08:03:05 UTC

The following testcases now fail due to this ICE when compiled with -march=armv9-a+sve2 (something which I have been testing recently too):
gcc.dg/torture/pr70083.c
gcc.dg/pr69896.c
gcc.target/aarch64/pr70120-1.c

Comment 10 Andrew Pinski 2024-01-10 08:05:07 UTC

(In reply to Andrew Pinski from comment #9)
> The following testcases now fail due to this ICE when compiled with
> -march=armv9-a+sve2 (something which I have been testing recently too):
> gcc.dg/torture/pr70083.c
> gcc.dg/pr69896.c
> gcc.target/aarch64/pr70120-1.c

Note that gcc.dg/pr69896.c reaches the ICE via a different path (not via gen_*divv*), though:
```
/home/apinski/src/upstream-full-cross/gcc/gcc/testsuite/gcc.dg/pr69896.c:22:1: internal compiler error: in paradoxical_subreg_p, at rtl.h:3213
0x80c65b paradoxical_subreg_p(machine_mode, machine_mode)
        ../../gcc/rtl.h:3213
0x80cfc8 paradoxical_subreg_p(machine_mode, machine_mode)
        ../../gcc/poly-int.h:2179
0x80cfc8 simplify_const_vector_subreg
        ../../gcc/simplify-rtx.cc:7423
0x80cfc8 simplify_context::simplify_subreg(machine_mode, rtx_def*, machine_mode, poly_int<2u, unsigned long>)
        ../../gcc/simplify-rtx.cc:7595
0xfae1c9 insn_propagation::apply_to_rvalue_1(rtx_def**)
        ../../gcc/recog.cc:1176
0xfadcab insn_propagation::apply_to_rvalue_1(rtx_def**)
        ../../gcc/recog.cc:1117
0xfade93 insn_propagation::apply_to_rvalue_1(rtx_def**)
        ../../gcc/recog.cc:1254
0xfae63f insn_propagation::apply_to_pattern(rtx_def**)
        ../../gcc/recog.cc:1396
0x1cfdb66 try_fwprop_subst_pattern
        ../../gcc/fwprop.cc:440
0x1cfdb66 try_fwprop_subst
        ../../gcc/fwprop.cc:613
0x1cfe500 forward_propagate_and_simplify
        ../../gcc/fwprop.cc:809
0x1cfe500 forward_propagate_into
        ../../gcc/fwprop.cc:872
0x1cfe89d forward_propagate_into
        ../../gcc/fwprop.cc:821
0x1cfe89d fwprop_insn
        ../../gcc/fwprop.cc:929
0x1cfe9c1 fwprop
        ../../gcc/fwprop.cc:981
```

Comment 11 Tamar Christina 2024-01-12 17:55:32 UTC

I have a patch for the division case; I will finish the multiplication case and submit when I'm back. Sorry for the delay.

Comment 12 GCC Commits 2024-01-24 15:58:51 UTC

The master branch has been updated by Tamar Christina <tnfchris@gcc.gnu.org>:

https://gcc.gnu.org/g:dfa17fd3b1a50cab51803e8a63c5c7b7db173523

commit r14-8394-gdfa17fd3b1a50cab51803e8a63c5c7b7db173523
Author: Tamar Christina <tamar.christina@arm.com>
Date:   Wed Jan 24 15:58:34 2024 +0000

    AArch64: Fix expansion of Advanced SIMD div and mul using SVE [PR109636]
    
    As suggested in the ticket this replaces the expansion by converting the
    Advanced SIMD types to SVE types by simply printing out an SVE register for
    these instructions.
    
    This fixes the subreg issues since there are no subregs involved anymore.
    
    gcc/ChangeLog:
    
            PR target/109636
            * config/aarch64/aarch64-simd.md (<su_optab>div<mode>3,
            mulv2di3): Remove.
            * config/aarch64/iterators.md (VQDIV): Remove.
            (SVE_FULL_SDI_SIMD, SVE_FULL_HSDI_SIMD_DI,
            SVE_I_SIMD_DI): New.
            (VPRED, sve_lane_con): Add V4SI and V2DI.
            * config/aarch64/aarch64-sve.md (<optab><mode>3,
            @aarch64_pred_<optab><mode>): Support Advanced SIMD types.
            (mul<mode>3): New, split from <optab><mode>3.
            (@aarch64_pred_<optab><mode>, *post_ra_<optab><mode>3): New.
            * config/aarch64/aarch64-sve2.md (@aarch64_mul_lane_<mode>,
            *aarch64_mul_unpredicated_<mode>): Change SVE_FULL_HSDI to
            SVE_FULL_HSDI_SIMD_DI.
    
    gcc/testsuite/ChangeLog:
    
            PR target/109636
            * gcc.target/aarch64/sve/pr109636_1.c: New test.
            * gcc.target/aarch64/sve/pr109636_2.c: New test.
            * gcc.target/aarch64/sve2/pr109636_1.c: New test.

Comment 13 Tamar Christina 2024-01-24 16:01:01 UTC

Fixed, thanks for the report.