113622 – [11/12/13 Regression] ICE with vectors in named registers

Bug 113622 - [11/12/13 Regression] ICE with vectors in named registers

Summary: [11/12/13 Regression] ICE with vectors in named registers

Status:	NEW

Alias:	None

Product:	gcc
Classification:	Unclassified
Component:	tree-optimization (show other bugs)
Version:	14.0

Importance:	P2 normal
Target Milestone:	11.5
Assignee:	Not yet assigned to anyone

URL:
Keywords:	ice-on-valid-code, testsuite-fail

Duplicates (1):	113626 (view as bug list)
Depends on:
Blocks:

Reported:	2024-01-26 20:16 UTC by Xi Ruoyao
Modified:	2024-05-06 11:49 UTC (History)
CC List:	3 users (show)

See Also:	113602 113670
Host:
Target:
Build:
Known to work:	14.0
Known to fail:	4.9.1
Last reconfirmed:	2024-01-26 00:00:00

Attachments
Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description Xi Ruoyao 2024-01-26 20:16:26 UTC

Start from r14-8450 "tree-optimization/113602 - datarefs of non-addressables", various LoongArch target tests fail.  Reduced test case:

typedef double __attribute__ ((vector_size (32))) vec;
register vec a asm("f25"), b asm("f26"), c asm("f27");

void
test (void)
{
  for (int i = 0; i < 4; i++)
    c[i] = a[i] < b[i] ? 0.1 : 0.2;
}

$ ./gcc/cc1 t.c -O2 -msimd=lasx -fno-ident -o xvfcmp-f.s -nostdinc
t.c: In function ‘test’:
t.c:8:10: internal compiler error: in expand_assignment, at expr.cc:6073
    8 |     c[i] = a[i] < b[i] ? 0.1 : 0.2;
      |     ~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~
0x102029b expand_assignment(tree_node*, tree_node*, bool)
	../../gcc/gcc/expr.cc:6073
0xe7a76b expand_gimple_stmt_1
	../../gcc/gcc/cfgexpand.cc:3992
0xe7aaf9 expand_gimple_stmt
	../../gcc/gcc/cfgexpand.cc:4071
0xe8355b expand_gimple_basic_block
	../../gcc/gcc/cfgexpand.cc:6127
0xe85b0d execute
	../../gcc/gcc/cfgexpand.cc:6866

Comment 1 Xi Ruoyao 2024-01-26 20:23:35 UTC

On x86_64 it's reproducible likewise:

typedef double __attribute__ ((vector_size (32))) vec;
register vec a asm("ymm2"), b asm("ymm0"), c asm("ymm1");

void
test (void)
{
  for (int i = 0; i < 4; i++)
    c[i] = a[i] < b[i] ? 0.1 : 0.2;
}

$ gcc/cc1 t.c -mavx2

t.c:8:10: internal compiler error: in expand_assignment, at expr.cc:6073
    8 |     c[i] = a[i] < b[i] ? 0.1 : 0.2;
      |     ~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~
0x10d9759 expand_assignment(tree_node*, tree_node*, bool)
	../../gcc/gcc/expr.cc:6073
0xf2e9c0 expand_gimple_stmt_1
	../../gcc/gcc/cfgexpand.cc:3992
0xf2ed4e expand_gimple_stmt
	../../gcc/gcc/cfgexpand.cc:4071
0xf377be expand_gimple_basic_block
	../../gcc/gcc/cfgexpand.cc:6127
0xf39f72 execute
	../../gcc/gcc/cfgexpand.cc:6866

Self-confirming.

Comment 2 Andrew Pinski 2024-01-26 20:31:31 UTC

x86_64 Testcase (which invokes undefined behavior) which has been failing since at least 4.9.1 even:
```
typedef double __attribute__ ((vector_size (16))) vec;
register vec a asm("ymm2"), b asm("ymm0"), c asm("ymm1");

void
test (void)
{
  for (int i = 0; i < 4; i++)
    c[i] = a[i] < b[i] ? 0.1 : 0.2;
}
```

Comment 3 Xi Ruoyao 2024-01-26 20:36:49 UTC

(In reply to Andrew Pinski from comment #2)
> x86_64 Testcase (which invokes undefined behavior) which has been failing
> since at least 4.9.1 even:
> ```
> typedef double __attribute__ ((vector_size (16))) vec;
> register vec a asm("ymm2"), b asm("ymm0"), c asm("ymm1");
> 
> void
> test (void)
> {
>   for (int i = 0; i < 4; i++)
>     c[i] = a[i] < b[i] ? 0.1 : 0.2;
> }
> ```

Phew.  Then maybe I should refactor the LoongArch test cases to avoid using named registers...

Comment 4 Jakub Jelinek 2024-01-26 20:39:07 UTC

typedef double __attribute__ ((vector_size (16))) vec;
register vec a asm("xmm12"), b asm("xmm13"), c asm("xmm14");

void
test (void)
{
  for (int i = 0; i < 4; i++)
    c[i] = a[i] < b[i] ? 0.1 : 0.2;
}

ICEs with -O2 -mavx -ffixed-xmm{12,13,14} -std=gnu99 starting with
r0-104000-g30cd1c5d04c18770e8688d7199c20c2d528df1cd
when the vector indexing support has been added.

Comment 5 Andrew Pinski 2024-01-26 20:43:54 UTC

Here is a "valid" x86_64 testcase:

```
typedef float __attribute__ ((vector_size (64))) vec;
register vec a asm("zmm2"), b asm("zmm0"), c asm("zmm1");

void
test (void)
{
  for (int i = 0; i < 8; i++)
    c[i] = a[i] < b[i] ? 0.1 : 0.2;
}
```
Which fails with `-O2  -g0 -mavx512f` which started in GCC 4.9 when -mavx512f support was added.

Comment 6 Andrew Pinski 2024-01-26 20:59:40 UTC

The assert by the way:
```
          if (!MEM_P (to_rtx))
            {
              /* We can get constant negative offsets into arrays with broken
                 user code.  Translate this to a trap instead of ICEing.  */
              gcc_assert (TREE_CODE (offset) == INTEGER_CST);
              expand_builtin_trap ();
              to_rtx = gen_rtx_MEM (BLKmode, const0_rtx);
            }

```

Yes this is obvious is wrong when used with global registers ...

Comment 7 Andrew Pinski 2024-01-27 03:49:50 UTC

*** Bug 113626 has been marked as a duplicate of this bug. ***

Comment 8 Jakub Jelinek 2024-01-27 12:11:17 UTC

Guess for an rvalue (if even that crashes) we want to expand it to some permutation or whole vector shift which moves the indexed elements first and then extract it, for lvalue we need to insert it similarly.

Comment 9 Richard Biener 2024-01-29 07:56:42 UTC

I will have a look.

Comment 10 Richard Biener 2024-01-29 08:45:31 UTC

(In reply to Jakub Jelinek from comment #8)
> Guess for an rvalue (if even that crashes) we want to expand it to some
> permutation or whole vector shift which moves the indexed elements first and
> then extract it, for lvalue we need to insert it similarly.

If we can we should match this up with .VEC_SET / .VEC_EXTRACT, otherwise
we should go "simple" and spill.

diff --git a/gcc/gimple-isel.cc b/gcc/gimple-isel.cc
index 7e2392ecd38..e94f292dd38 100644
--- a/gcc/gimple-isel.cc
+++ b/gcc/gimple-isel.cc
@@ -104,7 +104,8 @@ gimple_expand_vec_set_extract_expr (struct function *fun,
       machine_mode outermode = TYPE_MODE (TREE_TYPE (view_op0));
       machine_mode extract_mode = TYPE_MODE (TREE_TYPE (ref));
 
-      if (auto_var_in_fn_p (view_op0, fun->decl)
+      if ((auto_var_in_fn_p (view_op0, fun->decl)
+          || DECL_HARD_REGISTER (view_op0))
          && !TREE_ADDRESSABLE (view_op0)
          && ((!is_extract && can_vec_set_var_idx_p (outermode))
              || (is_extract

ensures the former and fixes the ICE on x86_64 on trunk.  The comment#5
testcase then results in the following loop:

.L3:
        movslq  %eax, %rdx
        vmovaps %zmm2, -56(%rsp)
        vmovaps %zmm0, -120(%rsp)
        vmovss  -120(%rsp,%rdx,4), %xmm4
        vmovss  -56(%rsp,%rdx,4), %xmm3
        vcmpltss        %xmm4, %xmm3, %xmm3
        vpbroadcastd    %eax, %zmm4
        addl    $1, %eax
        vpcmpd  $0, %zmm7, %zmm4, %k1
        vblendvps       %xmm3, %xmm5, %xmm6, %xmm3
        vbroadcastss    %xmm3, %zmm1{%k1}
        cmpl    $8, %eax
        jne     .L3

this isn't optimal of course, for optimality we need vectorization.  But
we still need to avoid the ICEs since vectorization can be disabled.  That
said, I'm quite sure in code using hard registers people are not doing
such stupid things so I wonder how important it is to avoid "regressing"
the vectorization here.

Comment 11 Jakub Jelinek 2024-01-29 08:56:18 UTC

I think it is most important we don't ICE and generate correct code.  I doubt this is used too much in real-world code, otherwise it would have been reported years ago, so how efficient it will be is less important.

Comment 12 Xi Ruoyao 2024-01-29 08:58:07 UTC

(In reply to Jakub Jelinek from comment #11)
> I think it is most important we don't ICE and generate correct code.  I
> doubt this is used too much in real-world code, otherwise it would have been
> reported years ago, so how efficient it will be is less important.

Hmm, but for another test case (LoongArch):

typedef double __attribute__ ((vector_size (32))) vec;
register vec a asm("f25"), b asm("f26"), c asm("f27");

void
test (void)
{
  for (int i = 0; i < 4; i++)
    c[i] = __builtin_isless (a[i], b[i]) ? 0.1 : 0.2;
}

I'll have to write a loop (because __builtin_isless does not work on vectors).  Or is there a vector built-in I'm missing?

Comment 13 Jakub Jelinek 2024-01-29 09:07:50 UTC

(In reply to Xi Ruoyao from comment #12)
> (In reply to Jakub Jelinek from comment #11)
> > I think it is most important we don't ICE and generate correct code.  I
> > doubt this is used too much in real-world code, otherwise it would have been
> > reported years ago, so how efficient it will be is less important.
> 
> Hmm, but for another test case (LoongArch):
> 
> typedef double __attribute__ ((vector_size (32))) vec;
> register vec a asm("f25"), b asm("f26"), c asm("f27");
> 
> void
> test (void)
> {
>   for (int i = 0; i < 4; i++)
>     c[i] = __builtin_isless (a[i], b[i]) ? 0.1 : 0.2;
> }
> 
> I'll have to write a loop (because __builtin_isless does not work on
> vectors).  Or is there a vector built-in I'm missing?

Why are you doing that?
Normally tests would do
vec
test (vec a, vec b)
{
  vec c = {};
  for (int i = 0; i < 4; i++)
    c[i] = __builtin_isless (a[i], b[i]) ? 0.1 : 0.2;
  return c;
}
or something similar.

Comment 14 Xi Ruoyao 2024-01-29 09:09:52 UTC

(In reply to Jakub Jelinek from comment #13)
> (In reply to Xi Ruoyao from comment #12)
> > (In reply to Jakub Jelinek from comment #11)
> > > I think it is most important we don't ICE and generate correct code.  I
> > > doubt this is used too much in real-world code, otherwise it would have been
> > > reported years ago, so how efficient it will be is less important.
> > 
> > Hmm, but for another test case (LoongArch):
> > 
> > typedef double __attribute__ ((vector_size (32))) vec;
> > register vec a asm("f25"), b asm("f26"), c asm("f27");
> > 
> > void
> > test (void)
> > {
> >   for (int i = 0; i < 4; i++)
> >     c[i] = __builtin_isless (a[i], b[i]) ? 0.1 : 0.2;
> > }
> > 
> > I'll have to write a loop (because __builtin_isless does not work on
> > vectors).  Or is there a vector built-in I'm missing?
> 
> Why are you doing that?
> Normally tests would do
> vec
> test (vec a, vec b)
> {
>   vec c = {};
>   for (int i = 0; i < 4; i++)
>     c[i] = __builtin_isless (a[i], b[i]) ? 0.1 : 0.2;
>   return c;
> }
> or something similar.

Because we are lacking a calling convention passing vectors in vector registers (it will be added in the future but not before GCC 14 release), thus I cannot test if the register operands are showing up in a correct order in the generated asm.

Comment 15 Richard Biener 2024-01-29 09:27:35 UTC

(In reply to Jakub Jelinek from comment #11)
> I think it is most important we don't ICE and generate correct code.  I
> doubt this is used too much in real-world code, otherwise it would have been
> reported years ago, so how efficient it will be is less important.

We do spill on the read side already.  On the write side the ICE is because
of r0-71337-g1e188d1e130034.  Note we're spilling parts of bitpos to offset:

  /* Otherwise, split it up.  */
  if (offset)
    {
      /* Avoid returning a negative bitpos as this may wreak havoc later.  */
      if (!bit_offset.to_shwi (pbitpos) || maybe_lt (*pbitpos, 0))
        {
          *pbitpos = num_trailing_bits (bit_offset.force_shwi ());
          poly_offset_int bytes = bits_to_bytes_round_down (bit_offset);
          offset = size_binop (PLUS_EXPR, offset,
                               build_int_cst (sizetype, bytes.force_shwi ()));
        }

      *poffset = offset;

but it can also be large positive when the bit amount doesn't fit a HWI.

The flow of 'to' expansion is a bit awkward, but the following properly
spills in case of variable offset and non-MEM_P:

diff --git a/gcc/expr.cc b/gcc/expr.cc
index ee822c11dce..f54d0b1474e 100644
--- a/gcc/expr.cc
+++ b/gcc/expr.cc
@@ -6061,6 +6061,7 @@ expand_assignment (tree to, tree from, bool nontemporal)
            to_rtx = adjust_address (to_rtx, BLKmode, 0);
        }
  
+      rtx stemp = NULL_RTX, old_to_rtx = NULL_RTX;
       if (offset != 0)
        {
          machine_mode address_mode;
@@ -6070,9 +6071,24 @@ expand_assignment (tree to, tree from, bool nontemporal)
            {
              /* We can get constant negative offsets into arrays with broken
                 user code.  Translate this to a trap instead of ICEing.  */
-             gcc_assert (TREE_CODE (offset) == INTEGER_CST);
-             expand_builtin_trap ();
-             to_rtx = gen_rtx_MEM (BLKmode, const0_rtx);
+             if (TREE_CODE (offset) == INTEGER_CST)
+               {
+                 expand_builtin_trap ();
+                 to_rtx = gen_rtx_MEM (BLKmode, const0_rtx);
+               }
+             /* Else spill for variable offset to the destination.  */
+             else
+               {
+                 gcc_assert (!TREE_CODE (from) == CALL_EXPR
+                             && COMPLETE_TYPE_P (TREE_TYPE (from))
+                             && (TREE_CODE (TYPE_SIZE (TREE_TYPE (from)))
+                                 != INTEGER_CST));
+                 stemp = assign_stack_temp (GET_MODE (to_rtx),
+                                            GET_MODE_SIZE (GET_MODE (to_rtx)));
+                 emit_move_insn (stemp, to_rtx);
+                 old_to_rtx = to_rtx;
+                 to_rtx = stemp;
+               }
            }
 
          offset_rtx = expand_expr (offset, NULL_RTX, VOIDmode, EXPAND_SUM);
@@ -6305,6 +6321,9 @@ expand_assignment (tree to, tree from, bool nontemporal)
                                  bitregion_start, bitregion_end,
                                  mode1, from, get_alias_set (to),
                                  nontemporal, reversep);
+         /* Move the temporary storage back to the non-MEM_P.  */
+         if (stemp)
+           emit_move_insn (old_to_rtx, stemp);
        }
 
       if (result)

Comment 16 Richard Biener 2024-01-29 09:38:42 UTC

typedef double __attribute__ ((vector_size (16))) vec;

void
test (void)
{
  register vec a asm("xmm1"), b asm("xmm2"), c asm("xmm3");
  for (int i = 0; i < 2; i++)
    c[i] = a[i] < b[i] ? 0.1 : 0.2;
}

also ICEs with -O0 -msse.

Comment 17 GCC Commits 2024-01-29 13:31:08 UTC

The master branch has been updated by Richard Biener <rguenth@gcc.gnu.org>:

https://gcc.gnu.org/g:96bc048d78f804bac0fa7b2ca3b6dd3a04c68217

commit r14-8497-g96bc048d78f804bac0fa7b2ca3b6dd3a04c68217
Author: Richard Biener <rguenther@suse.de>
Date:   Mon Jan 29 09:47:31 2024 +0100

    middle-end/113622 - allow .VEC_SET and .VEC_EXTRACT for global hard regs
    
    The following expands .VEC_SET and .VEC_EXTRACT instruction selection
    to global hard registers, not only automatic variables (possibly)
    promoted to registers.  This can avoid some ICEs later and create
    better code.
    
            PR middle-end/113622
            * gimple-isel.cc (gimple_expand_vec_set_extract_expr):
            Also allow DECL_HARD_REGISTER variables.
    
            * gcc.target/i386/pr113622-1.c: New testcase.

Comment 18 GCC Commits 2024-01-29 13:31:15 UTC

The master branch has been updated by Richard Biener <rguenth@gcc.gnu.org>:

https://gcc.gnu.org/g:0f7945417f913c85bd556904c0c4e7bf77793488

commit r14-8498-g0f7945417f913c85bd556904c0c4e7bf77793488
Author: Richard Biener <rguenther@suse.de>
Date:   Mon Jan 29 10:24:39 2024 +0100

    middle-end/113622 - handle store with variable index to register
    
    The following implements storing to a non-MEM_P with a variable
    offset.  We usually avoid this by forcing expansion to memory but
    this doesn't work for hard register variables.  The solution is
    to spill and operate on the stack.
    
            PR middle-end/113622
            * expr.cc (expand_assignment): Spill hard registers if
            we index them with a variable offset.
    
            * gcc.target/i386/pr113622-2.c: New testcase.
            * gcc.target/i386/pr113622-3.c: Likewise.

Comment 19 Richard Biener 2024-01-29 13:32:02 UTC

Should be fixed on trunk, not sure to what extent backporting is suitable.

Comment 20 GCC Commits 2024-01-30 08:34:07 UTC

The master branch has been updated by Jakub Jelinek <jakub@gcc.gnu.org>:

https://gcc.gnu.org/g:47b81161c98cf2ff5495d4aa6386cc3c87f9d27b

commit r14-8515-g47b81161c98cf2ff5495d4aa6386cc3c87f9d27b
Author: Jakub Jelinek <jakub@redhat.com>
Date:   Tue Jan 30 09:31:22 2024 +0100

    testsuite: Fix up pr113622-{2,3}.c for i686-linux [PR113622]
    
    The 2 new tests FAIL for me on i686-linux:
    .../gcc/testsuite/gcc.target/i386/pr113622-2.c:5:14: error: data type of 'a' isn't suitable for a register
    .../gcc/testsuite/gcc.target/i386/pr113622-2.c:5:29: error: data type of 'b' isn't suitable for a register
    .../gcc/testsuite/gcc.target/i386/pr113622-2.c:5:44: error: data type of 'c' isn't suitable for a register
    The problem is that the tests use vectors of double, something added
    only in SSE2, while the testcases ask for just -msse which only provides
    vectors of floats.
    
    So, either it should be using floats instead of doubles, or we need
    to add -msse2 to dg-options.
    
    I've done the latter.
    
    2024-01-30  Jakub Jelinek  <jakub@redhat.com>
    
            PR middle-end/113622
            * gcc.target/i386/pr113622-2.c: Use -msse2 instead of -msse in
            dg-options.
            * gcc.target/i386/pr113622-3.c: Likewise.

Comment 21 Xi Ruoyao 2024-01-30 13:22:32 UTC

This still blows up on LoongArch even after r14-8498:

typedef float __attribute__ ((vector_size (16))) vec;
typedef int __attribute__ ((vector_size (16))) ivec;
register vec a asm("f25"), b asm("f26");
register ivec c asm("f27");

void
test (void)
{
  for (int i = 0; i < 4; i++)
    c[i] = a[i] < b[i] ? -1 : 1;
}

$ gcc/cc1 -msimd=lsx t.c -O2 -fno-vect-cost-model -nostdinc
t.c: In function ‘test’:
t.c:7:1: internal compiler error: in expand_expr_addr_expr_1, at expr.cc:9139
    7 | test (void)
      | ^~~~
0x102d7cb expand_expr_addr_expr_1
	../../gcc/gcc/expr.cc:9139
0x102e13e expand_expr_addr_expr
	../../gcc/gcc/expr.cc:9252
0x103df07 expand_expr_real_1(tree_node*, rtx_def*, machine_mode, expand_modifier, rtx_def**, bool)
	../../gcc/gcc/expr.cc:12585
0x102e824 expand_expr_real(tree_node*, rtx_def*, machine_mode, expand_modifier, rtx_def**, bool)
	../../gcc/gcc/expr.cc:9440
0xe45096 expand_expr(tree_node*, rtx_def*, machine_mode, expand_modifier)
	../../gcc/gcc/expr.h:316
0x102fc56 expand_expr_real_2(separate_ops*, rtx_def*, machine_mode, expand_modifier)
	../../gcc/gcc/expr.cc:9762
0x10361c9 expand_expr_real_gassign(gassign*, rtx_def*, machine_mode, expand_modifier, rtx_def**, bool)
	../../gcc/gcc/expr.cc:11096
0xe798de expand_gimple_stmt_1
	../../gcc/gcc/cfgexpand.cc:4010
0xe79b81 expand_gimple_stmt
	../../gcc/gcc/cfgexpand.cc:4071
0xe825e3 expand_gimple_basic_block
	../../gcc/gcc/cfgexpand.cc:6127
0xe84b95 execute
	../../gcc/gcc/cfgexpand.cc:6866

Interestingly -fno-vect-cost-model is needed to trigger the ICE.

Comment 22 Xi Ruoyao 2024-01-30 13:27:31 UTC

On x86_64:

$ cat t.c
typedef float __attribute__ ((vector_size (16))) vec;
typedef int __attribute__ ((vector_size (16))) ivec;
register vec a asm("xmm0"), b asm("xmm1");
register ivec c asm("xmm2");

void
test (void)
{
  for (int i = 0; i < 4; i++)
    c[i] = a[i] < b[i] ? -1 : 1;
}
$ gcc/cc1 -msse2 t.c -O2 -fno-vect-cost-model -nostdinc -ffixed-xmm{0,1,2}
t.c: In function 'test':
t.c:7:1: internal compiler error: in expand_expr_addr_expr_1, at expr.cc:9139
    7 | test (void)
      | ^~~~
0x10e6d6e expand_expr_addr_expr_1
	../../gcc/gcc/expr.cc:9139
0x10e76e2 expand_expr_addr_expr
	../../gcc/gcc/expr.cc:9252
0x10f73a7 expand_expr_real_1(tree_node*, rtx_def*, machine_mode, expand_modifier, rtx_def**, bool)
	../../gcc/gcc/expr.cc:12585
0x10e7dc8 expand_expr_real(tree_node*, rtx_def*, machine_mode, expand_modifier, rtx_def**, bool)
	../../gcc/gcc/expr.cc:9440
0xef7346 expand_expr(tree_node*, rtx_def*, machine_mode, expand_modifier)
	../../gcc/gcc/expr.h:316
0x10e91fa expand_expr_real_2(separate_ops*, rtx_def*, machine_mode, expand_modifier)
	../../gcc/gcc/expr.cc:9762
0x10ef77d expand_expr_real_gassign(gassign*, rtx_def*, machine_mode, expand_modifier, rtx_def**, bool)
	../../gcc/gcc/expr.cc:11096
0xf2db31 expand_gimple_stmt_1
	../../gcc/gcc/cfgexpand.cc:4010
0xf2ddd4 expand_gimple_stmt
	../../gcc/gcc/cfgexpand.cc:4071
0xf36844 expand_gimple_basic_block
	../../gcc/gcc/cfgexpand.cc:6127
0xf38ff8 execute
	../../gcc/gcc/cfgexpand.cc:6866

Should I open a new ticket or add back 14 Regression to the subject?

Comment 23 rguenther@suse.de 2024-01-30 14:49:21 UTC

On Tue, 30 Jan 2024, xry111 at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113622
> 
> --- Comment #22 from Xi Ruoyao <xry111 at gcc dot gnu.org> ---
> On x86_64:
> 
> $ cat t.c
> typedef float __attribute__ ((vector_size (16))) vec;
> typedef int __attribute__ ((vector_size (16))) ivec;
> register vec a asm("xmm0"), b asm("xmm1");
> register ivec c asm("xmm2");
> 
> void
> test (void)
> {
>   for (int i = 0; i < 4; i++)
>     c[i] = a[i] < b[i] ? -1 : 1;
> }
> $ gcc/cc1 -msse2 t.c -O2 -fno-vect-cost-model -nostdinc -ffixed-xmm{0,1,2}
> t.c: In function 'test':
> t.c:7:1: internal compiler error: in expand_expr_addr_expr_1, at expr.cc:9139
>     7 | test (void)
>       | ^~~~
> 0x10e6d6e expand_expr_addr_expr_1
>         ../../gcc/gcc/expr.cc:9139
> 0x10e76e2 expand_expr_addr_expr
>         ../../gcc/gcc/expr.cc:9252
> 0x10f73a7 expand_expr_real_1(tree_node*, rtx_def*, machine_mode,
> expand_modifier, rtx_def**, bool)
>         ../../gcc/gcc/expr.cc:12585
> 0x10e7dc8 expand_expr_real(tree_node*, rtx_def*, machine_mode, expand_modifier,
> rtx_def**, bool)
>         ../../gcc/gcc/expr.cc:9440
> 0xef7346 expand_expr(tree_node*, rtx_def*, machine_mode, expand_modifier)
>         ../../gcc/gcc/expr.h:316
> 0x10e91fa expand_expr_real_2(separate_ops*, rtx_def*, machine_mode,
> expand_modifier)
>         ../../gcc/gcc/expr.cc:9762
> 0x10ef77d expand_expr_real_gassign(gassign*, rtx_def*, machine_mode,
> expand_modifier, rtx_def**, bool)
>         ../../gcc/gcc/expr.cc:11096
> 0xf2db31 expand_gimple_stmt_1
>         ../../gcc/gcc/cfgexpand.cc:4010
> 0xf2ddd4 expand_gimple_stmt
>         ../../gcc/gcc/cfgexpand.cc:4071
> 0xf36844 expand_gimple_basic_block
>         ../../gcc/gcc/cfgexpand.cc:6127
> 0xf38ff8 execute
>         ../../gcc/gcc/cfgexpand.cc:6866
> 
> Should I open a new ticket or add back 14 Regression to the subject?

Please open a new ticked - this seems to be another vectorizer issue.

We end up with the invalid

_28 = (sizetype) &a;

Comment 24 Xi Ruoyao 2024-02-25 13:08:32 UTC

It looks I can rewrite the LoongArch test case (still broken though ICE is stopped) using check-function-bodies.  Will try later...

Comment 25 GCC Commits 2024-03-21 13:54:20 UTC

The releases/gcc-13 branch has been updated by Richard Biener <rguenth@gcc.gnu.org>:

https://gcc.gnu.org/g:d4c0800aab864bb95260e12342d18695c6ebbec8

commit r13-8483-gd4c0800aab864bb95260e12342d18695c6ebbec8
Author: Richard Biener <rguenther@suse.de>
Date:   Mon Jan 29 09:47:31 2024 +0100

    middle-end/113622 - allow .VEC_SET and .VEC_EXTRACT for global hard regs
    
    The following expands .VEC_SET and .VEC_EXTRACT instruction selection
    to global hard registers, not only automatic variables (possibly)
    promoted to registers.  This can avoid some ICEs later and create
    better code.
    
            PR middle-end/113622
            * gimple-isel.cc (gimple_expand_vec_set_extract_expr):
            Also allow DECL_HARD_REGISTER variables.
    
            * gcc.target/i386/pr113622-1.c: New testcase.
    
    (cherry picked from commit 96bc048d78f804bac0fa7b2ca3b6dd3a04c68217)

Comment 26 Richard Biener 2024-03-21 13:57:18 UTC

I've enabled vec_set for hard-regs on the branch and plugged the vectorizer hole for GCC 13.  I'm not sure to what extent we need the expansion change.