Bug 98190 - [11 Regression] GCC11 miscompiles code using _Bool when inlining: bfxil instruction misused since r11-165
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: middle-end
Version: 11.0
Importance: P1 normal
Target Milestone: 11.0
Assignee: Not yet assigned to anyone
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2020-12-08 00:22 UTC by Victor Stinner
Modified: 2020-12-11 11:50 UTC
CC: 3 users

See Also:
Host:
Target: aarch64
Build:
Known to work:
Known to fail:
Last reconfirmed: 2020-12-08 00:00:00


Attachments
bug_bool.c reproducer (428 bytes, text/plain)
2020-12-08 00:22 UTC, Victor Stinner
gcc11-pr98190.patch (1.22 KB, patch)
2020-12-10 08:22 UTC, Jakub Jelinek

Description Victor Stinner 2020-12-08 00:22:52 UTC
Created attachment 49704 [details]
bug_bool.c reproducer

GCC 11 miscompiles attached bug_bool.c on AArch64 with -O1:

* Using -O0, memory_richcompare() returns 1 as expected
* Using -O1 or higher, memory_richcompare() returns 0 => BUG

"bfxil	w4, w6, #0, #8" instruction is used to extend a single byte registry (w6) into a larger registry (w4). It works for w4 when it's equal to 0. But it doesn't with "bfxil	w5, w6, #0, #8" when w5 is not equal to 0. Sorry, I don't know AArch64, I cannot help too much.

$ gcc bug_bool.c -o bug_bool -O0 && ./bug_bool
equal = 1

$ gcc bug_bool.c -o bug_bool -O1 && ./bug_bool
equal = 0

* gcc (GCC) 11.0.0 20201204 (Red Hat 11.0.0-0) (Fedora: gcc-11.0.0-0.7.fc34.aarch64)
* Fedora Rawhide (Linux 5.10.0-0.rc6.90.fc34.aarch64)

If _Bool is replaced with "char", the code works as expected; GCC optimizes _Bool differently from char.


=== GCC -O1 ===

Dump of assembler code for function cmp_base2:
   // w4 = 0
   // w5 = 0xbb0aebc39d5f8094
   0x00000000004005e8 <+0>:	cmp	w2, #0x0
   0x00000000004005ec <+4>:	b.le	0x400628 <cmp_base2+64>
   0x00000000004005f0 <+8>:	mov	x3, #0x0                   	// #0

   0x00000000004005f4 <+12>:	ldrb	w6, [x0, x3]   // w6=1
   0x00000000004005f8 <+16>:	bfxil	w4, w6, #0, #8 // w4=1

   0x00000000004005fc <+20>:	ldrb	w6, [x1, x3]   // w6=1
   0x0000000000400600 <+24>:	bfxil	w5, w6, #0, #8 // w5=0x9d5f8001

   0x0000000000400604 <+28>:	cmp	w4, w5         // 1 is not equal to 0x9d5f8001 !

   0x0000000000400608 <+32>:	b.ne	0x400620 <cmp_base2+56>  // b.any
   0x000000000040060c <+36>:	add	x3, x3, #0x1
   0x0000000000400610 <+40>:	cmp	w2, w3
   0x0000000000400614 <+44>:	b.gt	0x4005f4 <cmp_base2+12>
   0x0000000000400618 <+48>:	mov	w0, #0x1                   	// #1
   0x000000000040061c <+52>:	b	0x400624 <cmp_base2+60>
   0x0000000000400620 <+56>:	mov	w0, #0x0                   	// #0
   0x0000000000400624 <+60>:	ret
   0x0000000000400628 <+64>:	mov	w0, #0x1                   	// #1
   0x000000000040062c <+68>:	b	0x400624 <cmp_base2+60>

Dump of assembler code for function memory_richcompare:
   0x0000000000400630 <+0>:	stp	x29, x30, [sp, #-32]!
   0x0000000000400634 <+4>:	mov	x29, sp
   0x0000000000400638 <+8>:	mov	w0, #0x1                   	// #1
   0x000000000040063c <+12>:	str	w0, [sp, #24]
   0x0000000000400640 <+16>:	ldr	w2, [sp, #24]
   0x0000000000400644 <+20>:	adrp	x0, 0x400000
   0x0000000000400648 <+24>:	add	x0, x0, #0x770
   0x000000000040064c <+28>:	mov	x1, x0
   0x0000000000400650 <+32>:	bl	0x4005e8 <cmp_base2>
   0x0000000000400654 <+36>:	ldp	x29, x30, [sp], #32
   0x0000000000400658 <+40>:	ret


=== GCC -O0 ===

(gdb) disassemble cmp_base2 
Dump of assembler code for function cmp_base2:
   0x00000000004005e8 <+0>:	sub	sp, sp, #0x30
   0x00000000004005ec <+4>:	str	x0, [sp, #24]
   0x00000000004005f0 <+8>:	str	x1, [sp, #16]
   0x00000000004005f4 <+12>:	str	w2, [sp, #12]
   0x00000000004005f8 <+16>:	str	wzr, [sp, #44]
   0x00000000004005fc <+20>:	b	0x400668 <cmp_base2+128>
   0x0000000000400600 <+24>:	ldr	x0, [sp, #24]
   0x0000000000400604 <+28>:	ldrb	w0, [x0]
   0x0000000000400608 <+32>:	strb	w0, [sp, #39]
   0x000000000040060c <+36>:	ldr	x0, [sp, #16]
   0x0000000000400610 <+40>:	ldrb	w0, [x0]
   0x0000000000400614 <+44>:	strb	w0, [sp, #38]
   0x0000000000400618 <+48>:	ldrb	w1, [sp, #39]
   0x000000000040061c <+52>:	ldrb	w0, [sp, #38]
   0x0000000000400620 <+56>:	cmp	w1, w0
   0x0000000000400624 <+60>:	cset	w0, eq  // eq = none
   0x0000000000400628 <+64>:	and	w0, w0, #0xff
   0x000000000040062c <+68>:	str	w0, [sp, #40]
   0x0000000000400630 <+72>:	ldr	w0, [sp, #40]
   0x0000000000400634 <+76>:	cmp	w0, #0x0
   0x0000000000400638 <+80>:	b.gt	0x400644 <cmp_base2+92>
   0x000000000040063c <+84>:	ldr	w0, [sp, #40]
   0x0000000000400640 <+88>:	b	0x40067c <cmp_base2+148>
   0x0000000000400644 <+92>:	ldr	x0, [sp, #24]
   0x0000000000400648 <+96>:	add	x0, x0, #0x1
   0x000000000040064c <+100>:	str	x0, [sp, #24]
   0x0000000000400650 <+104>:	ldr	x0, [sp, #16]
   0x0000000000400654 <+108>:	add	x0, x0, #0x1
   0x0000000000400658 <+112>:	str	x0, [sp, #16]
   0x000000000040065c <+116>:	ldr	w0, [sp, #44]
   0x0000000000400660 <+120>:	add	w0, w0, #0x1
   0x0000000000400664 <+124>:	str	w0, [sp, #44]
   0x0000000000400668 <+128>:	ldr	w1, [sp, #44]
   0x000000000040066c <+132>:	ldr	w0, [sp, #12]
   0x0000000000400670 <+136>:	cmp	w1, w0
   0x0000000000400674 <+140>:	b.lt	0x400600 <cmp_base2+24>  // b.tstop
   0x0000000000400678 <+144>:	mov	w0, #0x1                   	// #1
   0x000000000040067c <+148>:	add	sp, sp, #0x30
   0x0000000000400680 <+152>:	ret

Dump of assembler code for function memory_richcompare:
   0x0000000000400684 <+0>:	stp	x29, x30, [sp, #-32]!
   0x0000000000400688 <+4>:	mov	x29, sp
   0x000000000040068c <+8>:	adrp	x0, 0x400000
   0x0000000000400690 <+12>:	add	x0, x0, #0x7c0
   0x0000000000400694 <+16>:	str	x0, [sp, #24]
   0x0000000000400698 <+20>:	mov	w0, #0x1                   	// #1
   0x000000000040069c <+24>:	str	w0, [sp, #16]
   0x00000000004006a0 <+28>:	ldr	w0, [sp, #16]
   0x00000000004006a4 <+32>:	mov	w2, w0
   0x00000000004006a8 <+36>:	ldr	x1, [sp, #24]
   0x00000000004006ac <+40>:	ldr	x0, [sp, #24]
   0x00000000004006b0 <+44>:	bl	0x4005e8 <cmp_base2>
   0x00000000004006b4 <+48>:	ldp	x29, x30, [sp], #32
   0x00000000004006b8 <+52>:	ret


Note: bug discovered in Python on Fedora Rawhide when running test_buffer: https://bugs.python.org/issue42587
Comment 1 Andrew Pinski 2020-12-08 00:29:01 UTC
I think this is undefined behavior.
Comment 2 Andrew Pinski 2020-12-08 00:31:25 UTC
This is a dup of bug 88662.

*** This bug has been marked as a duplicate of bug 88662 ***
Comment 3 Victor Stinner 2020-12-08 00:33:25 UTC
Well, either the w4 and w5 registers should be initialized properly in full, or the comparison should be done only on the least significant 8 bits:

(gdb) p ($w5 & 0xff) == ($w4 & 0xff)
$7 = 1

These bits are equal as expected ;-)
Comment 4 Andrew Pinski 2020-12-08 00:37:28 UTC
(In reply to Victor Stinner from comment #3)
> Well, either all 64 bits of w4 and w5 registries should be initialized
> properly, or the comparison should be done only on the least significant 8
> bits:
> 
> (gdb) p ($w5 & 0xff) == ($w4 & 0xff)
> $7 = 1
> 
> These bits are equal as expected ;-)

Again this is undefined behavior.
Comment 5 Jakub Jelinek 2020-12-08 13:08:46 UTC
I don't see UB here; UB would be copying a value other than 0 or 1 into the _Bool variable, but that is not happening here.

Consider even:
static int __attribute__((noipa))
foo (const char *p, const char *q, const int len)
{
  for (int i = 0; i < len; p++, q++, i++)
    {
      int equal;
      _Bool x, y;
      __builtin_memcpy ((char *) &x, p, sizeof x);
      __builtin_memcpy ((char *) &y, q, sizeof y);
      equal = (x == y);
      if (equal <= 0)
	return equal;
    }
  return 1;
}

int
main ()
{
  const _Bool buf[4] = { 1, 0, 0, 0 };
  register long x4 asm ("x4") = 0xdeadbeefULL;
  register long x5 asm ("x5") = 0xdeadbeefULL;
  asm volatile (""::"r" (x4), "r" (x5));
  if (foo ((char *) &buf[0], (char *) &buf[0], 1) != 1)
    __builtin_abort ();
  return 0;
}

Copying through char * from _Bool to _Bool really must work; that is what happens e.g. in structure assignments if the structure has _Bool fields.

The reason this is miscompiled is that the aarch64 backend decides that the x and y variables should be promoted from QImode to SImode, and the expansion of the memcpy folded into an assignment stores through a MEM_REF with QImode (i.e. only the low part of the promoted DECL_RTL), but nothing sign- or zero-extends the result.
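In plain C terms, the invariant being broken can be sketched as follows (names are mine, not from any GCC source): a _Bool promoted from QImode to SImode must satisfy reg == zero_extend(low byte) after every store, because later compares use the full 32-bit register.

```c
#include <stdint.h>

/* Correct store to a promoted _Bool: the whole SImode register is
   replaced with the zero-extended byte.  */
uint32_t promoted_store_ok(uint32_t reg, uint8_t byte)
{
    (void)reg;              /* old contents are fully replaced */
    return (uint32_t)byte;  /* zero-extended */
}

/* What the buggy expansion emitted: a QImode bit-insert into the low
   part of the promoted register, leaving the upper 24 bits stale.  */
uint32_t promoted_store_buggy(uint32_t reg, uint8_t byte)
{
    return (reg & ~UINT32_C(0xff)) | byte;
}
```

Two correct stores of the same byte always compare equal; two buggy stores disagree whenever the stale upper bits of the destination registers differ, which is precisely what the bfxil/cmp sequence in the reproducer does.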
Comment 6 Jakub Jelinek 2020-12-08 13:55:28 UTC
Started with r11-165.
Perhaps decls that are set through a MEM_REF shouldn't have the DECL_GIMPLE_REG_P bit set when their promoted mode is wider, even if the MEM_REF sets them in full size?
Comment 7 Richard Biener 2020-12-08 14:52:15 UTC
(In reply to Jakub Jelinek from comment #6)
> Started with r11-165.
> Perhaps decls set through MEM_REF even if it sets them in full size, but
> their promoted mode is wider, shouldn't have the DECL_GIMPLE_REG_P bit set?

Huh, no, that's for sure not a "fix".  I presume

  _14 = MEM[(char * {ref-all})p_10(D) + ivtmp.8_4 * 1];
  MEM[(char * {ref-all})&x] = _14;
  _16 = MEM[(char * {ref-all})q_11(D) + ivtmp.8_4 * 1];
  MEM[(char * {ref-all})&y] = _16;
  x.0_1 = x;
  y.1_2 = y;
  if (x.0_1 != y.1_2)

is problematical, expanded as

(insn 21 20 0 (set (zero_extract:SI (reg/v:SI 99 [ y ])
            (const_int 8 [0x8])
            (const_int 0 [0]))
        (reg:SI 107)) "t.c":9:7 -1
     (nil))

;; if (x.0_1 != y.1_2)

(insn 22 21 23 (set (reg:CC 66 cc)
        (compare:CC (reg/v:SI 98 [ x ])
            (reg/v:SI 99 [ y ]))) "t.c":11:10 -1
     (nil))

where the (set (zero_extract:SI ...) ..) leaves the upper parts of the
register unspecified (old content) but the compare now uses the
full register.

IMHO the compare using the full register w/o first zero/sign-extending
it has the bug.  Who guarantees the inits of such promoted regs are
properly extended?

I'd say a fix in expansion of the non-MEM_P store would be OK-ish
(the generated code is also suboptimal I guess).

Note the bug would show up for VECTOR_TYPEs with integer modes in case
we expand those as promoted reg even before the change in question.

Semantic differences between GIMPLE and RTL are bad :/

How do we expand

 x = _14;

?  I guess one could carefully craft a testcase for this case.
Comment 8 Jakub Jelinek 2020-12-08 19:07:29 UTC
So, do we need to special case MEM_REF stores that store all bits (i.e. bitpos 0 bitsize equal to mode bitsize) into non-MEM variables which are promoted?
Something like:
--- gcc/expr.c.jj	2020-12-02 11:40:47.000000000 +0100
+++ gcc/expr.c	2020-12-08 19:57:05.147004740 +0100
@@ -5451,6 +5451,14 @@ expand_assignment (tree to, tree from, b
 					       mode1, to_rtx, to, from,
 					       reversep))
 	    result = NULL;
+	  else if (TREE_CODE (to) == MEM_REF
+		   && !REF_REVERSE_STORAGE_ORDER (to)
+		   && mem_ref_refers_to_non_mem_p (to)
+		   && SUBREG_P (to_rtx)
+		   && SUBREG_PROMOTED_VAR_P (to_rtx)
+		   && known_eq (bitpos, 0)
+		   && known_eq (bitsize, GET_MODE_BITSIZE (GET_MODE (to_rtx))))
+	    result = store_expr (from, to_rtx, 0, nontemporal, false);
 	  else
 	    result = store_field (to_rtx, bitsize, bitpos,
 				  bitregion_start, bitregion_end,

Not sure if it additionally doesn't have to check same mode, or if e.g. should
wrap from into a VCE if it has significantly different type.
It will not handle reverse storage order though.
Comment 9 Jakub Jelinek 2020-12-08 19:16:07 UTC
Perhaps some of those checks on the other side are redundant and could be turned e.g. into a gcc_checking_assert or gcc_assert; I bet that if the MEM_REF doesn't overwrite all bits, but only some subset of them, then the destination couldn't be a non-mem decl and thus couldn't be promoted.
Comment 10 rsandifo@gcc.gnu.org 2020-12-09 09:04:08 UTC
(In reply to Jakub Jelinek from comment #9)
> Perhaps some of those checks on the other side are redundant and could be
> turned e.g. into gcc_checking_assert of gcc_assert, I bet if the MEM_REF
> doesn't overwrite all bits, but only some subset of them, then the
> destination couldn't be a nonmem decl and thus couldn't be promoted.
Yeah, asserting sounds good.  Checking the other conditions makes it
look like we could still fall through to the else for some promoted
subregs.

If we can't assert, I guess the rule is that we need to extend
whenever we're storing to the MSB of the inner register.  We can
do that either by extending the source value and the range to
the outer register, or by assigning to the inner register and
then extending it separately.
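The two fix strategies described above can be sketched in C (a model under my own naming, not GCC code): either extend the source value and store the whole outer register, or perform the inner-register insert and then re-extend as a separate step.

```c
#include <stdint.h>

/* Option A: zero-extend the source value first, then assign the whole
   outer (promoted) register.  */
uint32_t fix_extend_source(uint8_t byte)
{
    return (uint32_t)byte;
}

/* Option B: do the bare bit-insert into the inner register, then
   re-extend the whole register afterwards.  */
uint32_t fix_extend_after(uint32_t reg, uint8_t byte)
{
    reg = (reg & ~UINT32_C(0xff)) | byte;  /* the bare insert      */
    return (uint32_t)(uint8_t)reg;         /* separate zero-extend */
}
```

Both options re-establish the promoted-register invariant regardless of the stale upper bits.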
Comment 11 rguenther@suse.de 2020-12-09 09:16:51 UTC
On Wed, 9 Dec 2020, rsandifo at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98190
> 
> --- Comment #10 from rsandifo at gcc dot gnu.org <rsandifo at gcc dot gnu.org> ---
> (In reply to Jakub Jelinek from comment #9)
> > Perhaps some of those checks on the other side are redundant and could be
> > turned e.g. into gcc_checking_assert of gcc_assert, I bet if the MEM_REF
> > doesn't overwrite all bits, but only some subset of them, then the
> > destination couldn't be a nonmem decl and thus couldn't be promoted.
> Yeah, asserting sounds good.  Checking the other conditions makes it
> look like we could still fall through to the else for some promoted
> subregs.
> 
> If we can't assert, I guess the rule is that we need to extend
> whenever we're storing to the MSB of the inner register.  We can
> do that either by extending the source value and the range to
> the outer register, or by assigning to the inner register and
> then extending it separately.

So I guess if you do a GIMPLE FE testcase with a __BIT_INSERT
to the MSB of a promoted var that should end up doing the extension
as well?  Like (probably doesn't parse, needs a 1-bit precision '1')

signed char __GIMPLE () foo()
{
  signed char _1;
  signed char _2;

__BB(2):
  _2 = __BIT_INSERT (_1(D), 1, 7);
  return _2;
}

int main()
{
  if (foo() > 0)
    abort ();
}

?
Comment 12 Jakub Jelinek 2020-12-09 09:41:18 UTC
(In reply to rsandifo@gcc.gnu.org from comment #10)
> If we can't assert, I guess the rule is that we need to extend
> whenever we're storing to the MSB of the inner register.  We can
> do that either by extending the source value and the range to
> the outer register, or by assigning to the inner register and
> then extending it separately.

So perhaps:
--- gcc/expr.c.jj	2020-12-09 00:00:08.622548080 +0100
+++ gcc/expr.c	2020-12-09 10:36:12.198801940 +0100
@@ -5451,6 +5451,33 @@ expand_assignment (tree to, tree from, b
 					       mode1, to_rtx, to, from,
 					       reversep))
 	    result = NULL;
+          else if (SUBREG_P (to_rtx)
+		   && SUBREG_PROMOTED_VAR_P (to_rtx))
+	    {
+	      /* If to_rtx is a promoted subreg, this must be a store to the
+		 whole variable, otherwise to_rtx would need to be MEM.
+		 We need to zero or sign extend the value afterwards.  */
+	      gcc_assert (known_eq (bitpos, 0)
+			  && known_eq (bitsize,
+				       GET_MODE_BITSIZE (GET_MODE (to_rtx))));
+	      if (TREE_CODE (to) == MEM_REF && !REF_REVERSE_STORAGE_ORDER (to))
+		result = store_expr (from, to_rtx, 0, nontemporal, false);
+	      else
+		{
+		  result = store_field (to_rtx, bitsize, bitpos,
+					bitregion_start, bitregion_end,
+					mode1, from, get_alias_set (to),
+					nontemporal, reversep);
+		  rtx to_rtx1
+		    = lowpart_subreg (subreg_unpromoted_mode (to_rtx),
+				      SUBREG_REG (to_rtx),
+				      subreg_promoted_mode (to_rtx));
+		  to_rtx1 = convert_to_mode (subreg_promoted_mode (to_rtx),
+					     to_rtx1,
+					     SUBREG_PROMOTED_SIGN (to_rtx));
+		  emit_move_insn (SUBREG_REG (to_rtx), to_rtx1);
+		}
+	    }
 	  else
 	    result = store_field (to_rtx, bitsize, bitpos,
 				  bitregion_start, bitregion_end,

then?  As in, if store_expr can handle it, use that, otherwise perform the extension at the end.

As for BIT_INSERT_EXPR, I'm not sure if SSA_NAMEs can get promoted SUBREGs or not, but in any case it wouldn't be this code path, it would be store_expr which handles the promoted SUBREGs already, because destination would not be a MEM_REF with non-mem decl or reversed order, nor handled_component_p, nor ARRAY_TYPE destination.
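A rough C model of what the lowpart_subreg + convert_to_mode pair in the patch accomplishes (the function name and the sign_p flag are mine, for illustration): take the low unpromoted part of the inner register and extend it back to the promoted mode, sign- or zero-extending according to SUBREG_PROMOTED_SIGN.

```c
#include <stdint.h>

/* Model of the post-store fixup for a QImode var promoted to SImode:
   extract the low byte (lowpart_subreg) and extend it to the promoted
   mode (convert_to_mode), choosing the extension by signedness.  */
uint32_t extend_promoted(uint32_t inner, int sign_p)
{
    uint8_t low = (uint8_t)inner;                   /* low QImode part */
    return sign_p ? (uint32_t)(int32_t)(int8_t)low  /* sign extend */
                  : (uint32_t)low;                  /* zero extend */
}
```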
Comment 13 rguenther@suse.de 2020-12-09 15:03:20 UTC
On Wed, 9 Dec 2020, jakub at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98190
> 
> --- Comment #12 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
> (In reply to rsandifo@gcc.gnu.org from comment #10)
> > If we can't assert, I guess the rule is that we need to extend
> > whenever we're storing to the MSB of the inner register.  We can
> > do that either by extending the source value and the range to
> > the outer register, or by assigning to the inner register and
> > then extending it separately.
> 
> So perhaps:
> --- gcc/expr.c.jj       2020-12-09 00:00:08.622548080 +0100
> +++ gcc/expr.c  2020-12-09 10:36:12.198801940 +0100
> @@ -5451,6 +5451,33 @@ expand_assignment (tree to, tree from, b
>                                                mode1, to_rtx, to, from,
>                                                reversep))
>             result = NULL;
> +          else if (SUBREG_P (to_rtx)
> +                  && SUBREG_PROMOTED_VAR_P (to_rtx))
> +           {
> +             /* If to_rtx is a promoted subreg, this must be a store to the
> +                whole variable, otherwise to_rtx would need to be MEM.

Yes, that's true - all !DECL_NOT_GIMPLE_REG_P may not have partial defs.

> +                We need to zero or sign extend the value afterwards.  */
> +             gcc_assert (known_eq (bitpos, 0)
> +                         && known_eq (bitsize,
> +                                      GET_MODE_BITSIZE (GET_MODE (to_rtx))));
> +             if (TREE_CODE (to) == MEM_REF && !REF_REVERSE_STORAGE_ORDER (to))
> +               result = store_expr (from, to_rtx, 0, nontemporal, false);
> +             else
> +               {
> +                 result = store_field (to_rtx, bitsize, bitpos,
> +                                       bitregion_start, bitregion_end,
> +                                       mode1, from, get_alias_set (to),
> +                                       nontemporal, reversep);
> +                 rtx to_rtx1
> +                   = lowpart_subreg (subreg_unpromoted_mode (to_rtx),
> +                                     SUBREG_REG (to_rtx),
> +                                     subreg_promoted_mode (to_rtx));
> +                 to_rtx1 = convert_to_mode (subreg_promoted_mode (to_rtx),
> +                                            to_rtx1,
> +                                            SUBREG_PROMOTED_SIGN (to_rtx));
> +                 emit_move_insn (SUBREG_REG (to_rtx), to_rtx1);
> +               }
> +           }
>           else
>             result = store_field (to_rtx, bitsize, bitpos,
>                                   bitregion_start, bitregion_end,
> 
> then?  As in, if store_expr can handle it, use that, otherwise perform the
> extension at the end.
> 
> As for BIT_INSERT_EXPR, I'm not sure if SSA_NAMEs can get promoted SUBREGs or
> not, but in any case it wouldn't be this code path, it would be store_expr
> which handles the promoted SUBREGs already, because destination would not be a
> MEM_REF with non-mem decl or reversed order, nor handled_component_p, nor
> ARRAY_TYPE destination.

True.
Comment 14 rsandifo@gcc.gnu.org 2020-12-09 16:30:06 UTC
(In reply to Jakub Jelinek from comment #12)
> (In reply to rsandifo@gcc.gnu.org from comment #10)
> > If we can't assert, I guess the rule is that we need to extend
> > whenever we're storing to the MSB of the inner register.  We can
> > do that either by extending the source value and the range to
> > the outer register, or by assigning to the inner register and
> > then extending it separately.
> 
> So perhaps:
> --- gcc/expr.c.jj	2020-12-09 00:00:08.622548080 +0100
> +++ gcc/expr.c	2020-12-09 10:36:12.198801940 +0100
> @@ -5451,6 +5451,33 @@ expand_assignment (tree to, tree from, b
>  					       mode1, to_rtx, to, from,
>  					       reversep))
>  	    result = NULL;
> +          else if (SUBREG_P (to_rtx)
> +		   && SUBREG_PROMOTED_VAR_P (to_rtx))
> +	    {
> +	      /* If to_rtx is a promoted subreg, this must be a store to the
> +		 whole variable, otherwise to_rtx would need to be MEM.
> +		 We need to zero or sign extend the value afterwards.  */
> +	      gcc_assert (known_eq (bitpos, 0)
> +			  && known_eq (bitsize,
> +				       GET_MODE_BITSIZE (GET_MODE (to_rtx))));
> +	      if (TREE_CODE (to) == MEM_REF && !REF_REVERSE_STORAGE_ORDER (to))
> +		result = store_expr (from, to_rtx, 0, nontemporal, false);
> +	      else
> +		{
> +		  result = store_field (to_rtx, bitsize, bitpos,
> +					bitregion_start, bitregion_end,
> +					mode1, from, get_alias_set (to),
> +					nontemporal, reversep);
> +		  rtx to_rtx1
> +		    = lowpart_subreg (subreg_unpromoted_mode (to_rtx),
> +				      SUBREG_REG (to_rtx),
> +				      subreg_promoted_mode (to_rtx));
> +		  to_rtx1 = convert_to_mode (subreg_promoted_mode (to_rtx),
> +					     to_rtx1,
> +					     SUBREG_PROMOTED_SIGN (to_rtx));
> +		  emit_move_insn (SUBREG_REG (to_rtx), to_rtx1);
> +		}
> +	    }
>  	  else
>  	    result = store_field (to_rtx, bitsize, bitpos,
>  				  bitregion_start, bitregion_end,
> 
> then?  As in, if store_expr can handle it, use that, otherwise perform the
> extension at the end.
LGTM FWIW, although I shouldn't be the one to review.  I'm not
sure off-hand whether it's OK to store an unpromoted value to
a promoted subreg lhs, with the promoted subreg being temporarily
invalid.  If that's a problem, it might be safer to pass to_rtx1
to store_field instead.
Comment 15 Jakub Jelinek 2020-12-10 08:22:46 UTC
Created attachment 49727 [details]
gcc11-pr98190.patch

So, I have bootstrapped/regtested this patch last night on x86_64, i686, aarch64, armv7hl and powerpc64le linux (s390x still pending).
Unfortunately, on aarch64 it regresses:
gcc.c-torture/execute/pr93213.c
and on powerpc64le that test plus:
g++.dg/warn/Wstrict-aliasing-bogus-char-1.C
gcc.dg/pr87273.c
gcc.dg/torture/pr91656-1.c
gcc.dg/tree-ssa/pr92085-2.c
gcc.dg/tree-ssa/pr94703.c

It seems the assumption that for a promoted SUBREG to_rtx the store always covers all the bits is incorrect; e.g. in pr93213.c the memcpy is copying just half of the bits. So, shall we check for bitpos 0 and bitsize covering all to_rtx bits for the store_expr case, and otherwise check, depending on endianness, whether the most significant bit of to_rtx is overwritten and extend in that case?
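One way to read the proposed condition (my sketch, not GCC code): the post-store extension is only needed when the store reaches the most significant bit of the inner, unpromoted mode. A partial low-part store, like pr93213.c's half-width memcpy on a little-endian target, leaves the previous extension of the old value valid.

```c
#include <stdbool.h>

/* Does a store of [bitpos, bitpos + bitsize) into an inner register of
   inner_bits bits touch its most significant bit?  Only then is a
   sign/zero extension of the promoted register required afterwards.  */
bool needs_extension(unsigned bitpos, unsigned bitsize,
                     unsigned inner_bits, bool big_endian)
{
    if (big_endian)
        return bitpos == 0;                /* MSB sits at offset 0   */
    return bitpos + bitsize == inner_bits; /* store reaches the MSB  */
}
```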
Comment 16 rguenther@suse.de 2020-12-10 11:39:38 UTC
On Thu, 10 Dec 2020, jakub at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98190
> 
> --- Comment #15 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
> Created attachment 49727 [details]
>   --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=49727&action=edit
> gcc11-pr98190.patch
> 
> So, I have bootstrapped/regtested this patch last night on x86_64, i686,
> aarch64, armv7hl, powerpc64le (and s390x still pending) linux.
> Unfortunately, on aarch64 it regresses:
> gcc.c-torture/execute/pr93213.c
> and on powerpc64le that test plus:
> g++.dg/warn/Wstrict-aliasing-bogus-char-1.C
> gcc.dg/pr87273.c
> gcc.dg/torture/pr91656-1.c
> gcc.dg/tree-ssa/pr92085-2.c
> gcc.dg/tree-ssa/pr94703.c
> 
> Seems the assumption that for promoted SUBREG to_rtx the store is always to all
> the bits is incorrect, e.g. on pr93213.c  the memcpy is copying just half of
> the bits.  So, shall we check the bitpos 0 bitsize all to_rtx bits for the
> store_rtx case and otherwise check depending on endianity if the most
> significant bit of to_rtx is overwritten and extend in that case?

in foo() you mean?  For

  __builtin_memmove (&u16_1, &u128_1, 1);

?  So that's a parameter destination - does it at least have
DECL_NOT_GIMPLE_REG_P correctly set?  Did expansion really do something
different when we had it TREE_ADDRESSABLE?
Comment 17 CVS Commits 2020-12-11 10:12:44 UTC
The master branch has been updated by Jakub Jelinek <jakub@gcc.gnu.org>:

https://gcc.gnu.org/g:3e60ddeb8220ed388819bb3f14e8caa9309fd3c2

commit r11-5927-g3e60ddeb8220ed388819bb3f14e8caa9309fd3c2
Author: Jakub Jelinek <jakub@redhat.com>
Date:   Fri Dec 11 11:10:17 2020 +0100

    expansion: Sign or zero extend on MEM_REF stores into SUBREG with SUBREG_PROMOTED_VAR_P [PR98190]
    
    Some targets decide to promote certain scalar variables to wider mode,
    so their DECL_RTL is a SUBREG with SUBREG_PROMOTED_VAR_P.
    When storing to such vars, store_expr takes care of sign or zero extending,
    but if we store e.g. through MEM_REF into them, no sign or zero extension
    happens and that leads to wrong-code e.g. on the following testcase on
    aarch64-linux.
    
    The following patch uses store_expr if we overwrite all the bits and it is
    not reversed storage order, i.e. something that store_expr handles normally,
    and otherwise (if the most significant bit is (or for pdp11 might be, but
    pdp11 doesn't promote) being modified), the code extends manually.
    
    2020-12-11  Jakub Jelinek  <jakub@redhat.com>
    
            PR middle-end/98190
            * expr.c (expand_assignment): If to_rtx is a promoted SUBREG,
            ensure sign or zero extension either through use of store_expr
            or by extending manually.
    
            * gcc.dg/pr98190.c: New test.
Comment 18 Jakub Jelinek 2020-12-11 11:50:09 UTC
Fixed.