Bug 88739 - [7 Regression] Big-endian union bug
Summary: [7 Regression] Big-endian union bug
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization (show other bugs)
Version: 7.3.0
: P2 normal
Target Milestone: 7.5
Assignee: Richard Biener
URL:
Keywords: wrong-code
Depends on:
Blocks:
 
Reported: 2019-01-07 12:06 UTC by John Dong
Modified: 2019-03-26 13:20 UTC (History)
11 users (show)

See Also:
Host:
Target: aarch64be, powerpc, s390x
Build:
Known to work: 7.4.1, 8.2.1, 9.0
Known to fail: 7.3.0, 7.4.0, 8.2.0
Last reconfirmed: 2019-01-07 00:00:00


Attachments
demo code to recur error (493 bytes, text/x-csrc)
2019-01-07 12:06 UTC, John Dong
Details
exhaustive testcase (2.60 KB, text/plain)
2019-01-08 12:27 UTC, Richard Biener
Details
script to generate testcase (327 bytes, text/plain)
2019-01-08 12:28 UTC, Richard Biener
Details
temp patch for Bug 88739 (429 bytes, patch)
2019-01-08 12:29 UTC, John Dong
Details | Diff
patch (3.33 KB, patch)
2019-01-08 12:37 UTC, Richard Biener
Details | Diff
patch that changes get_ref_base_and_extent for bare SSA_NAMEs (441 bytes, patch)
2019-01-09 18:58 UTC, rsandifo@gcc.gnu.org
Details | Diff
workaround (651 bytes, patch)
2019-01-24 14:36 UTC, Richard Biener
Details | Diff
adjusted patch (1.02 KB, patch)
2019-01-25 10:03 UTC, Richard Biener
Details | Diff
fix the union bug on 7.3.0 (520 bytes, patch)
2019-03-26 09:34 UTC, John Dong
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description John Dong 2019-01-07 12:06:40 UTC
Created attachment 45364 [details]
demo code to recur error

#include<stdio.h>
typedef  unsigned int U32;
typedef  unsigned short  U16;
typedef  unsigned char U8;

typedef struct MEM_HEAD_4DW
{
    /* first word */
    U32 b11               : 1;
    U32 b12               : 3;
    U32 b13               : 3;
    U32 b14               : 1;
    U32 b15               : 16;
    U32 b16               : 8;

    /* second word */
    U32 b21               : 16;
    U32 b22               : 1;
    U32 b23               : 4;
    U32 b24               : 11;

    /* third word */
    U32 b31               : 32;

    /* fourth word */
    U32 b30AddrL          : 30;
    U32 b2AddrType        : 2;
}MEM_REQ_HEAD_4DW;

typedef union HEAD_DW4_UNION
{
    MEM_REQ_HEAD_4DW strMemHead;
    U32 aulValue[4];
    U16 ausValue[8];
            U8  aucValue[16];
}HEAD_REQ_DW4_UNION;



U32 Test_func(U32 ulAddr)
{
    HEAD_REQ_DW4_UNION unData;

    unData.strMemHead.b30AddrL        = ulAddr >> 2;
    unData.strMemHead.b2AddrType      = 0;
    printf("unData.strMemHead.b30AddrL=0x%x\r\n",unData.strMemHead.b30AddrL);
    printf("unData.strMemHead.b2AddrType=0x%x\r\n",unData.strMemHead.b2AddrType);
    printf("unData.aulValue[3]=0x%x\r\n",unData.aulValue[3]);       // 0000 0000 0000 0001 0000 0010 0010 0100
    printf("unData.ausValue[6]=0x%x\r\n",unData.ausValue[6]);       // why get 0x0 instead of 0x1 ?
    printf("unData.ausValue[7]=0x%x\r\n",unData.ausValue[7]);
            
    return 0;
}

int main()

{
  Test_func(0x10224);
  return 0;
}

Hi,

For ARM64 big-endian, when compiled above code with -O0, the output of ausValue[6] is 0x1 which is correct, but when I try to compile with -O1 or -O2, the output of ausValue[6] is 0x0 which is wrong. Will appreciate if you can help me look into this issue? is this gcc bug?
Comment 1 Wilco 2019-01-07 16:17:24 UTC
Confirmed - this is a generic mid-end bug that happens in fre1 when it replaces the bitfield store and load with the wrong bitfield extract:

Test_func (U32 ulAddr)
{
  union HEAD_REQ_DW4_UNION unData;
  unsigned int _1;
  <unnamed-unsigned:30> _2;
  int _4;
  unsigned int _7;
  int _9;
  short unsigned int _10;
  int _11;
  short unsigned int _23;

  <bb 2> :
  _1 = ulAddr_12(D) >> 2;
  _2 = (<unnamed-unsigned:30>) _1;
  unData.strMemHead.b30AddrL = _2;
  unData.strMemHead.b2AddrType = 0;
  _4 = (int) _2;
  printf ("unData.strMemHead.b30AddrL=0x%x\r\n", _4);
  printf ("unData.strMemHead.b2AddrType=0x%x\r\n", 0);
  _7 = unData.aulValue[3];
  printf ("unData.aulValue[3]=0x%x\r\n", _7);
  _23 = BIT_FIELD_REF <_2, 16, 0>;            // WRONG: should be _2, 14, 0
  _9 = (int) _23;
  printf ("unData.ausValue[6]=0x%x\r\n", _9);
  _10 = unData.ausValue[7];
  _11 = (int) _10;
  printf ("unData.ausValue[7]=0x%x\r\n", _11);
  unData ={v} {CLOBBER};
  return 0;

}
Comment 2 Richard Earnshaw 2019-01-07 16:25:57 UTC
>   _23 = BIT_FIELD_REF <_2, 16, 0>;            // WRONG: should be _2, 14, 0

_2 is declared as a 30-bit integer, so perhaps the statement is right, but expand needs to understand that the shift extract of the top 16 bits comes from a different location in big-endian.
Comment 3 Wilco 2019-01-07 16:27:55 UTC
(In reply to Richard Earnshaw from comment #2)
> >   _23 = BIT_FIELD_REF <_2, 16, 0>;            // WRONG: should be _2, 14, 0
> 
> _2 is declared as a 30-bit integer, so perhaps the statement is right, but
> expand needs to understand that the shift extract of the top 16 bits comes
> from a different location in big-endian.

So the question becomes what format is this in?

  <unnamed-unsigned:30> _2;

Is it big-endian memory format (so value is in top 30 bits) or simply a 30-bit value in a virtual register?
Comment 4 Richard Earnshaw 2019-01-07 17:32:05 UTC
manual inspection of the output from gcc-5.4.0 suggests this version produces correct code.
Comment 5 Richard Earnshaw 2019-01-08 10:11:55 UTC
Also on Arm and probably other big-endian machines as well.
Comment 6 Richard Biener 2019-01-08 10:29:02 UTC
(In reply to Wilco from comment #3)
> (In reply to Richard Earnshaw from comment #2)
> > >   _23 = BIT_FIELD_REF <_2, 16, 0>;            // WRONG: should be _2, 14, 0
> > 
> > _2 is declared as a 30-bit integer, so perhaps the statement is right, but
> > expand needs to understand that the shift extract of the top 16 bits comes
> > from a different location in big-endian.
> 
> So the question becomes what format is this in?
> 
>   <unnamed-unsigned:30> _2;
> 
> Is it big-endian memory format (so value is in top 30 bits) or simply a
> 30-bit value in a virtual register?

The middle-end (GIMPLE) thinks this is a 30-bit value in a virtual register.
And BIT_FIELD_REF <..., 16, 0> reads the first (counting from LSB) 16 bits.

That is, as far as I understand "endianess" is irrelevant for registers
but matters for memory.

We expand

  _1 = ulAddr_3(D) >> 2;
  _2 = (<unnamed-unsigned:30>) _1;
  _6 = BIT_FIELD_REF <_2, 16, 0>;

to (_6 is unsigned short)

(insn 6 5 7 (set (reg:SI 95)
        (lshiftrt:SI (reg/v:SI 94 [ ulAddr ])
            (const_int 2 [0x2]))) "t.c":42:48 -1
     (nil))

(insn 7 6 8 (set (reg:SI 96)
        (and:SI (reg:SI 95)
            (const_int 1073741823 [0x3fffffff]))) "t.c":42:48 -1
     (nil))

(insn 8 7 9 (set (subreg:DI (reg:HI 97) 0)
        (zero_extract:DI (subreg:DI (reg:SI 96) 0)
            (const_int 16 [0x10])
            (const_int 16 [0x10]))) "t.c":44:8 -1
     (nil))

now I suppose for subregs (and its offset) endianess starts to matter.
Now my head of course starts to hurt when we build a paradoxical DImode
subreg of a SImode reg in big endian.

But going back what possibly goes wrong is when FRE does

Value numbering stmt = unData.strMemHead.b30AddrL = _2;
No store match
Value numbering store unData.strMemHead.b30AddrL to _2
..
Value numbering stmt = _3 = unData.ausValue[6];
Inserting name _9 for expression BIT_FIELD_REF <_2, 16, 0>
Setting value number of _3 to _9 (changed)

it analyzes unData.strMemHead.b30AddrL to be a reference at
a bit-offset with some bit-size, matching that up with the
same data from unData.ausValue[6] and translating that to
a BIT_FIELD_REF:

      base2 = get_ref_base_and_extent (gimple_assign_lhs (def_stmt),
                                       &offset2, &size2, &maxsize2,
                                       &reverse);
      if (!reverse
          && known_size_p (maxsize2)
          && known_eq (maxsize2, size2)
          && operand_equal_p (base, base2, 0)
          && known_subrange_p (offset, maxsize, offset2, size2)
          /* ???  We can't handle bitfield precision extracts without
             either using an alternate type for the BIT_FIELD_REF and
             then doing a conversion or possibly adjusting the offset
             according to endianness.  */
          && (! INTEGRAL_TYPE_P (vr->type)
              || known_eq (ref->size, TYPE_PRECISION (vr->type)))
          && multiple_p (ref->size, BITS_PER_UNIT))
        {
          gimple_match_op op (gimple_match_cond::UNCOND,
                              BIT_FIELD_REF, vr->type,
                              vn_valueize (gimple_assign_rhs1 (def_stmt)),
                              bitsize_int (ref->size),
                              bitsize_int (offset - offset2));

here def_stmt is unData.strMemHead.b30AddrL = _2 while offset / ref
are from the load.  There's already a comment about endianess but
it's oddly applied to ref->size vs. vr->type precision equality.
I think we need adjustments whenever ref->size (from the load) is
not equal to size2 (from the store)?  That is, arbitrary sub-parts,
while contiguous in memory, might not be contiguous in the register?

*head hurts*

for the testcase

(gdb) p ref->size
$1 = {<poly_int_pod<2u, long>> = {coeffs = {16, 0}}, <No data fields>}
(gdb) p size2
$2 = {<poly_int_pod<2u, long>> = {coeffs = {30, 0}}, <No data fields>}

and offset == offset2 == 0.

Oh, and there's of course a plethora of variants, not to mention
FLOAT_WORDS_BIG_ENDIAN and REG_WORDS_BIG_ENDIAN.

Note the code above was exactly added to elide this kind of memory
operation...

Since the folding happens only since GCC 7 this is a regression.

Andrew somewhere mentioned that BIT_INSERT_EXPR expansion is also
wrong for BE (it's currently only used for vector element stuff
so that's a latent issue).
Comment 7 Richard Biener 2019-01-08 10:36:25 UTC
So FRE tries to match up a store with (possibly bit-precision) type T with a
load of (possibly bit-precision) type U.  The load and store are analyzed
by get_ref_base_and_extent to a common base object hand happen at
bit-position offset/offset2 with bitsize size/size2.  Since they are
memory ops all the *_BIG_ENDIAN "matter" (since they are typed even
FLOAT_WORDS_BIG_ENDIAN might matter).  In the end we are BIT_FIELD_REFing
the stored _register_ though...
Comment 8 Richard Biener 2019-01-08 11:55:35 UTC
So I think part of a fix would be the following.  Not sure if
REG_WORDS_BIG_ENDIAN or FLOAT_WORDS_BIG_ENDIAN come into play.
With the fix we no longer simplify this for aarch64 since
BITS_BIG_ENDIAN is 0 even in BE mode.

Index: gcc/tree-ssa-sccvn.c
===================================================================
--- gcc/tree-ssa-sccvn.c        (revision 267553)
+++ gcc/tree-ssa-sccvn.c        (working copy)
@@ -2229,13 +2229,18 @@ vn_reference_lookup_3 (ao_ref *ref, tree
             according to endianness.  */
          && (! INTEGRAL_TYPE_P (vr->type)
              || known_eq (ref->size, TYPE_PRECISION (vr->type)))
-         && multiple_p (ref->size, BITS_PER_UNIT))
+         && multiple_p (ref->size, BITS_PER_UNIT)
+         && BYTES_BIG_ENDIAN == WORDS_BIG_ENDIAN
+         && BITS_BIG_ENDIAN == BYTES_BIG_ENDIAN)
        {
          gimple_match_op op (gimple_match_cond::UNCOND,
                              BIT_FIELD_REF, vr->type,
                              vn_valueize (gimple_assign_rhs1 (def_stmt)),
                              bitsize_int (ref->size),
-                             bitsize_int (offset - offset2));
+                             bitsize_int
+                               (BYTES_BIG_ENDIAN
+                                ? size2 - (offset - offset2) - ref->size
+                                : offset - offset2));
          tree val = vn_nary_build_or_lookup (&op);
          if (val
              && (TREE_CODE (val) != SSA_NAME


In case both sizes/offsets
are mutliples of BITS_PER_UNIT BITS_BIG_ENDIAN shouldn't matter
though (likewise if size2 < WORD_SIZE).  So a less conservative check
might be

          && (known_le (size2, BITS_PER_WORD)
              || BYTES_BIG_ENDIAN == WORDS_BIG_ENDIAN)
          && (BITS_BIG_ENDIAN == BYTES_BIG_ENDIAN
              || (multiple_p (offset2, BITS_PER_UNIT)
                  && multiple_p (offset, BITS_PER_UNIT)
                  && (!BYTES_BIG_ENDIAN
                      || multiple_p (size2, BITS_PER_UNIT)))))

but I guess we lack exhaustive testing (and a list of targets that cover
all combinations of endianesses).

That is, not sure if BIT_FIELD_REF <_2, 16, 14> would be correct as suggested
for aarch64_be because of that BITS_BIG_ENDIAN setting.
Comment 9 Richard Biener 2019-01-08 12:09:30 UTC
So somehow build up exhaustive testing via

struct S # n # AT # m { \
    T pad : m; \
    T val : n; \
};
T __attribute__((noinline)) load8at0 () { return u.s8at0.val; }
...
union U {
    T val;
    struct S8at0 s8at0;
...
} u;
int main()
{
  volatile T val = 0x0102030405060708;
  u.val = val;
  if (u.s8at0.val != load8at0 ())
    __builtin_abort ();
...
  return 0;
}

with T possibly unsigned __int128_t, for n in {8, 16, 32, 64}
and m in [0, 128 - n].
Comment 10 Richard Biener 2019-01-08 12:27:02 UTC
Created attachment 45377 [details]
exhaustive testcase

testcase, compile with -DT="unsigned long".
Comment 11 Richard Biener 2019-01-08 12:28:13 UTC
Created attachment 45378 [details]
script to generate testcase

Script to generate a testcase.  The attached was created by

./t.sh 64 > t.c

expanding to __int128_t would be possible with passing 128.

trivial offset zero loads are not exercised.
Comment 12 John Dong 2019-01-08 12:29:17 UTC
Created attachment 45379 [details]
temp patch for Bug 88739

I tried to fix this bug with attached patch, but we not sure if BIT_FIELD_REF ((unsigned:30)var, 16, 0) would appear in another situation, unless we can make sure it will trigger the wrong input when expanding. please advise.
Comment 13 John Dong 2019-01-08 12:31:08 UTC
(In reply to John Dong from comment #12)
> Created attachment 45379 [details]
> temp patch for Bug 88739
> 
> I tried to fix this bug with attached patch, but we not sure if
> BIT_FIELD_REF ((unsigned:30)var, 16, 0) would appear in another situation,
> unless we can make sure it will trigger the wrong input when expanding.
> please advise.

sorry, unless we can make sure it will not trigger the wrong input when expanding.
Comment 14 Richard Biener 2019-01-08 12:37:06 UTC
Created attachment 45380 [details]
patch

Can folks please test this on aarch64_be (and maybe arm_be as well to cover
the multi-word case aka WORDS_BIG_ENDIAN).

I've commented the BITS_BIG_ENDIAN parts but I think it will be necessary
for aarch64_be/arm_be so you should see the new testcase FAIL.

It should pass on powerpc BE and s390x.
Comment 15 Richard Biener 2019-01-08 12:41:00 UTC
Oh, and with making U.val a same-sized float type one might be able to
test FLOAT_WORDS_BIG_ENDIAN effects (only mmix, pdp11, tilegx and visum
affected).

Not sure about REG_WORDS_BIG_ENDIAN at all (c6x is the only target defining this).
Comment 16 Wilco 2019-01-08 12:58:07 UTC
(In reply to Richard Biener from comment #8)
> So I think part of a fix would be the following.  Not sure if
> REG_WORDS_BIG_ENDIAN or FLOAT_WORDS_BIG_ENDIAN come into play.
> With the fix we no longer simplify this for aarch64 since
> BITS_BIG_ENDIAN is 0 even in BE mode.

A conservative fix to only do this for little endian would be fine given one shouldn't really do this kind of low-level hacking frequently. I guess FLOAT_WORDS_BIG_ENDIAN will matter too since you could store a double and read back part of it as a bitfield (or store a __int128 bitfield and read a double back). No idea why it would use BITS_BIG_ENDIAN

> That is, not sure if BIT_FIELD_REF <_2, 16, 14> would be correct as suggested
> for aarch64_be because of that BITS_BIG_ENDIAN setting.

Possibly - but then <_2, 16, 0> doing a right shift by 16 would be an incorrect expansion for big-endian. So something is wrong there...

I think we need to simplify the many BIG_ENDIAN macros so it is feasible to get big-endian to work reliably on all targets. There seem to be far too many options which affect too many unrelated things. Big-endian is fundamentally about memory byte ordering, so allowing to different byte/bit orderings in registers just makes things overly complex without any benefit.
Comment 17 rguenther@suse.de 2019-01-08 13:07:14 UTC
On Tue, 8 Jan 2019, wilco at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88739
> 
> --- Comment #16 from Wilco <wilco at gcc dot gnu.org> ---
> I think we need to simplify the many BIG_ENDIAN macros so it is feasible to get
> big-endian to work reliably on all targets. There seem to be far too many
> options which affect too many unrelated things. Big-endian is fundamentally
> about memory byte ordering, so allowing to different byte/bit orderings in
> registers just makes things overly complex without any benefit.

It's unfortunately not the compiler writers choice but the CPU designers.
Comment 18 Segher Boessenkool 2019-01-08 18:22:51 UTC
Well, it is always possible to generate code with the opposite endianness to
what the hardware "wants".  It just won't be very fast code.

BITS_BIG_ENDIAN is just a convenience to the target code writer.  The other
four do matter, and are quite obvious really (and all four are necessary).
Comment 19 Wilco 2019-01-08 19:08:11 UTC
(In reply to Segher Boessenkool from comment #18)
> Well, it is always possible to generate code with the opposite endianness to
> what the hardware "wants".  It just won't be very fast code.
> 
> BITS_BIG_ENDIAN is just a convenience to the target code writer.  The other
> four do matter, and are quite obvious really (and all four are necessary).

What I'm suggesting is that if you have a non-standard endian layout (eg. FLOAT_WORDS_BIG_ENDIAN, WORDS_BIG_ENDIAN and REG_WORDS_BIG_ENDIAN are not the same as BYTES_BIG_ENDIAN), optimizations which depend the particular layout should be disabled. There are far too few occurrences of these macros to believe that all possible combinations will work correctly on all optimizations that rely on it.
Comment 20 Eric Botcazou 2019-01-08 20:59:21 UTC
> BITS_BIG_ENDIAN is just a convenience to the target code writer.  The other
> four do matter, and are quite obvious really (and all four are necessary).

Yes, I agree that BITS_BIG_ENDIAN shouldn't matter here, it's pure RTL stuff.
Comment 21 Wilco 2019-01-08 21:24:40 UTC
(In reply to Eric Botcazou from comment #20)
> > BITS_BIG_ENDIAN is just a convenience to the target code writer.  The other
> > four do matter, and are quite obvious really (and all four are necessary).
> 
> Yes, I agree that BITS_BIG_ENDIAN shouldn't matter here, it's pure RTL stuff.

Is it really pure RTL, therefore not used in tree? So the above patch using BITS_BIG_ENDIAN for tree stuff would be incorrect to use it?
Comment 22 Eric Botcazou 2019-01-08 22:26:31 UTC
> Is it really pure RTL, therefore not used in tree? So the above patch using
> BITS_BIG_ENDIAN for tree stuff would be incorrect to use it?

I wouldn't say incorrect, just inappropriate and unnecessary.  And, yes, it isn't used at the tree level and should stay so IMO.  BYTES_BIG_ENDIAN alone already implicitly enforces a numbering on bits.
Comment 23 John Dong 2019-01-09 06:54:42 UTC
diff -urp a/gcc/expr.c b/gcc/expr.c
--- a/gcc/expr.c	2019-01-09 03:19:03.750205982 +0800
+++ b/gcc/expr.c	2019-01-09 03:38:23.414174738 +0800
@@ -10760,6 +10760,16 @@ expand_expr_real_1 (tree exp, rtx target
 		&& GET_MODE_CLASS (ext_mode) == MODE_INT)
 	      reversep = TYPE_REVERSE_STORAGE_ORDER (type);
 
+		int modePrecision = GET_MODE_PRECISION (TYPE_MODE (TREE_TYPE (tem)));
+		int typePrecision = TYPE_PRECISION (TREE_TYPE (tem));
+		int shiftSize = modePrecision - typePrecision;
+		rtx regTarget = gen_reg_rtx (GET_MODE (op0));
+
+		if (shiftSize && REG_P (op0))
+		  op0 = expand_shift (LSHIFT_EXPR, GET_MODE (op0), op0,
+				      shiftSize, regTarget,
+				      TYPE_UNSIGNED (TREE_TYPE (tem)));
+
 	    op0 = extract_bit_field (op0, bitsize, bitpos, unsignedp,
 				     (modifier == EXPAND_STACK_PARM
 				      ? NULL_RTX : target),

Tried to fix the bug when expand.
Comment 24 rguenther@suse.de 2019-01-09 08:46:47 UTC
On Wed, 9 Jan 2019, dongjianqiang2 at huawei dot com wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88739
> 
> --- Comment #23 from John Dong <dongjianqiang2 at huawei dot com> ---
> diff -urp a/gcc/expr.c b/gcc/expr.c
> --- a/gcc/expr.c        2019-01-09 03:19:03.750205982 +0800
> +++ b/gcc/expr.c        2019-01-09 03:38:23.414174738 +0800
> @@ -10760,6 +10760,16 @@ expand_expr_real_1 (tree exp, rtx target
>                 && GET_MODE_CLASS (ext_mode) == MODE_INT)
>               reversep = TYPE_REVERSE_STORAGE_ORDER (type);
> 
> +               int modePrecision = GET_MODE_PRECISION (TYPE_MODE (TREE_TYPE
> (tem)));
> +               int typePrecision = TYPE_PRECISION (TREE_TYPE (tem));
> +               int shiftSize = modePrecision - typePrecision;
> +               rtx regTarget = gen_reg_rtx (GET_MODE (op0));
> +
> +               if (shiftSize && REG_P (op0))
> +                 op0 = expand_shift (LSHIFT_EXPR, GET_MODE (op0), op0,
> +                                     shiftSize, regTarget,
> +                                     TYPE_UNSIGNED (TREE_TYPE (tem)));
> +
>             op0 = extract_bit_field (op0, bitsize, bitpos, unsignedp,
>                                      (modifier == EXPAND_STACK_PARM
>                                       ? NULL_RTX : target),
> 
> Tried to fix the bug when expand.

The bug is clearly in value-numbering, not RTL expansion
Comment 25 Wilco 2019-01-09 12:27:03 UTC
(In reply to rguenther@suse.de from comment #17)
> On Tue, 8 Jan 2019, wilco at gcc dot gnu.org wrote:
> 
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88739
> > 
> > --- Comment #16 from Wilco <wilco at gcc dot gnu.org> ---
> > I think we need to simplify the many BIG_ENDIAN macros so it is feasible to get
> > big-endian to work reliably on all targets. There seem to be far too many
> > options which affect too many unrelated things. Big-endian is fundamentally
> > about memory byte ordering, so allowing to different byte/bit orderings in
> > registers just makes things overly complex without any benefit.
> 
> It's unfortunately not the compiler writers choice but the CPU designers.

It's more a bad ABI choice. The initial Arm ABI had 4-byte aligned little-endian long long and big-endian doubles! ARM2 only supported little-endian so it didn't matter at the time. However it doesn't allow unaligned accesses, tightly packed bitfields and runtime endian swapping as required by the embedded space, or hardware floating point. No surprise it was replaced by the Arm EABI.
Comment 26 Richard Biener 2019-01-09 12:30:04 UTC
(In reply to Wilco from comment #25)
> (In reply to rguenther@suse.de from comment #17)
> > On Tue, 8 Jan 2019, wilco at gcc dot gnu.org wrote:
> > 
> > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88739
> > > 
> > > --- Comment #16 from Wilco <wilco at gcc dot gnu.org> ---
> > > I think we need to simplify the many BIG_ENDIAN macros so it is feasible to get
> > > big-endian to work reliably on all targets. There seem to be far too many
> > > options which affect too many unrelated things. Big-endian is fundamentally
> > > about memory byte ordering, so allowing to different byte/bit orderings in
> > > registers just makes things overly complex without any benefit.
> > 
> > It's unfortunately not the compiler writers choice but the CPU designers.
> 
> It's more a bad ABI choice. The initial Arm ABI had 4-byte aligned
> little-endian long long and big-endian doubles! ARM2 only supported
> little-endian so it didn't matter at the time. However it doesn't allow
> unaligned accesses, tightly packed bitfields and runtime endian swapping as
> required by the embedded space, or hardware floating point. No surprise it
> was replaced by the Arm EABI.

Whatever ;)

Did anybody test the patch?  Testing on x86_64 will be quite pointless...
Comment 27 Wilco 2019-01-09 12:32:20 UTC
(In reply to Eric Botcazou from comment #22)
> > Is it really pure RTL, therefore not used in tree? So the above patch using
> > BITS_BIG_ENDIAN for tree stuff would be incorrect to use it?
> 
> I wouldn't say incorrect, just inappropriate and unnecessary.  And, yes, it
> isn't used at the tree level and should stay so IMO.  BYTES_BIG_ENDIAN alone
> already implicitly enforces a numbering on bits.

I mean incorrect as in the optimization would still trigger and give incorrect results if BITS_BIG_ENDIAN == BYTES_BIG_ENDIAN (given that BITS_BIG_ENDIAN has no bearing on the bitfield offsets used on tree level).
Comment 28 Richard Biener 2019-01-09 12:52:18 UTC
(In reply to Wilco from comment #27)
> (In reply to Eric Botcazou from comment #22)
> > > Is it really pure RTL, therefore not used in tree? So the above patch using
> > > BITS_BIG_ENDIAN for tree stuff would be incorrect to use it?
> > 
> > I wouldn't say incorrect, just inappropriate and unnecessary.  And, yes, it
> > isn't used at the tree level and should stay so IMO.  BYTES_BIG_ENDIAN alone
> > already implicitly enforces a numbering on bits.
> 
> I mean incorrect as in the optimization would still trigger and give
> incorrect results if BITS_BIG_ENDIAN == BYTES_BIG_ENDIAN (given that
> BITS_BIG_ENDIAN has no bearing on the bitfield offsets used on tree level).

Given that it matters for

  /* If I2 is setting a pseudo to a constant and I3 is setting some
     sub-part of it to another constant, merge them by making a new
     constant.  */
  if (i1 == 0
...
      if (GET_CODE (dest) == ZERO_EXTRACT)
        {
...
              if (BITS_BIG_ENDIAN)
                offset = GET_MODE_PRECISION (dest_mode) - width - offset;

and VN tries to do sth similar I wonder if it does matter after all...

That said, the docs also refer to 'bit-field instructions' but do not
elaborate further -- I guess zero_extract is such but I'd have guessed
BIT_FIELD_REF (on trees) is as well.  But yes, RTL expansion adjusts
things based on BITS_BIG_ENDIAN so it looks like GENERIC doesn't care
(or assumes BITS_BIG_ENDIAN == BYTES_BIG_ENDIAN).
Comment 29 Wilco 2019-01-09 12:56:45 UTC
(In reply to Richard Biener from comment #26)

> Did anybody test the patch?  Testing on x86_64 will be quite pointless...

Well that generates _18 = BIT_FIELD_REF <_2, 16, 14>; and becomes:

ubfx	x1, x20, 2, 16

This extracts bits 2-17 of the 30-bit value instead of bits 14-29. The issue is that we're using a bitfield reference on a value that is claimed not to be a bitfield in comment 6. So I can't see how using BIT_FIELD_REF could ever work correctly.
Comment 30 Eric Botcazou 2019-01-09 13:01:09 UTC
> That said, the docs also refer to 'bit-field instructions' but do not
> elaborate further -- I guess zero_extract is such but I'd have guessed
> BIT_FIELD_REF (on trees) is as well.  But yes, RTL expansion adjusts
> things based on BITS_BIG_ENDIAN so it looks like GENERIC doesn't care
> (or assumes BITS_BIG_ENDIAN == BYTES_BIG_ENDIAN).

Yes, BYTES_BIG_ENDIAN is implicitly propagated to bits at the tree level.
I don't think that we want to support BITS_BIG_ENDIAN != BYTES_BIG_ENDIAN at the tree level, that would be a nightmare.
Comment 31 rguenther@suse.de 2019-01-09 13:19:59 UTC
On Wed, 9 Jan 2019, wilco at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88739
> 
> --- Comment #29 from Wilco <wilco at gcc dot gnu.org> ---
> (In reply to Richard Biener from comment #26)
> 
> > Did anybody test the patch?  Testing on x86_64 will be quite pointless...
> 
> Well that generates _18 = BIT_FIELD_REF <_2, 16, 14>; and becomes:
> 
> ubfx    x1, x20, 2, 16
> 
> This extracts bits 2-17 of the 30-bit value instead of bits 14-29. The issue is
> that we're using a bitfield reference on a value that is claimed not to be a
> bitfield in comment 6. So I can't see how using BIT_FIELD_REF could ever work
> correctly.

So that's because TYPE_PRECISION != GET_MODE_PRECISION and the
BIT_FIELD_REF expansion counting from GET_MODE_PRECISION I suppose.

Thus there is a RTL expansion side of the bug after all?

The "fixed" RTL is

(insn 6 5 7 (set (reg:SI 95)
        (lshiftrt:SI (reg/v:SI 94 [ ulAddr ])
            (const_int 2 [0x2]))) "t.c":42:48 -1
     (nil))

(insn 7 6 8 (set (reg:SI 96)
        (and:SI (reg:SI 95)
            (const_int 1073741823 [0x3fffffff]))) "t.c":42:48 -1
     (nil))

(insn 8 7 9 (set (subreg:DI (reg:HI 97) 0)
        (zero_extract:DI (subreg:DI (reg:SI 96) 0)
            (const_int 16 [0x10])
            (const_int 2 [0x2]))) "t.c":44:8 -1
     (nil))

so the 30bit value is in reg:SI 96 (the :30 cast causes the
and with 0x3fffffff) but then the zero_extract we generate
is bogus.

So maybe the :30 cast should have been a shift for BYTES_BIG_ENDIAN?

We might be able to work around this by optimization on GIMPLE,
combining

  _1 = ulAddr_3(D) >> 2;
  _2 = (<unnamed-unsigned:30>) _1;
  _6 = BIT_FIELD_REF <_2, 16, 14>;

as far as eliminating at least the non-mode precision type...

Of course that would just work around the underlying RTL expansion
bug?

Note we can end up with things like

 _2 = (<unnamed-unsigned:35) _1;
 _3 = (<unnamed-unsigned:35) _4;
 _5 = _2 + 3;

as well so shifting at the conversion might not be the correct
answer (but instead BIT_FIELD_REF expansion needs to be fixed).

Alternatively we could declare it invalid GIMPLE and require
BIT_FIELD_REF positions to be always relative to the mode
(but then I'd rather disallow BIT_FIELD_REF on non-mode
precision entities...).

Sth like the following might fix the RTL expansion issue
which then generates

Test_func:
        ubfx    x0, x0, 2, 16
        cmp     w0, 1
        bne     .L6
        mov     w0, 0

and just

(insn 6 5 7 (set (reg:SI 95)
        (lshiftrt:SI (reg/v:SI 94 [ ulAddr ])
            (const_int 2 [0x2]))) "t.c":42:48 -1
     (nil))

(insn 7 6 8 (set (reg:SI 96)
        (and:SI (reg:SI 95)
            (const_int 1073741823 [0x3fffffff]))) "t.c":42:48 -1
     (nil))

(insn 8 7 9 (set (reg:SI 97)
        (zero_extend:SI (subreg:HI (reg:SI 96) 2))) "t.c":44:8 -1
     (nil))

Index: gcc/expr.c
===================================================================
--- gcc/expr.c  (revision 267553)
+++ gcc/expr.c  (working copy)
@@ -10562,6 +10562,15 @@ expand_expr_real_1 (tree exp, rtx target
           infinitely recurse.  */
        gcc_assert (tem != exp);
 
+       /* When extracting from non-mode bitsize entities adjust the
+          bit position for BYTES_BIG_ENDIAN.  */
+       if (INTEGRAL_TYPE_P (TREE_TYPE (tem))
+           && (TYPE_PRECISION (TREE_TYPE (tem))
+               < GET_MODE_BITSIZE (as_a <scalar_int_mode> (TYPE_MODE 
(TREE_TYPE (tem)))))
+           && BYTES_BIG_ENDIAN)
+         bitpos += (GET_MODE_BITSIZE (as_a <scalar_int_mode> (TYPE_MODE 
(TREE_TYPE (tem))))
+                    - TYPE_PRECISION (TREE_TYPE (tem)));
+
        /* If TEM's type is a union of variable size, pass TARGET to the 
inner
           computation, since it will need a temporary and TARGET is known
           to have to do.  This occurs in unchecked conversion in Ada.  */
Comment 32 Richard Biener 2019-01-09 13:23:51 UTC
(In reply to rguenther@suse.de from comment #31)
> On Wed, 9 Jan 2019, wilco at gcc dot gnu.org wrote:
> 
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88739
> > 
> > --- Comment #29 from Wilco <wilco at gcc dot gnu.org> ---
> > (In reply to Richard Biener from comment #26)
> > 
> > > Did anybody test the patch?  Testing on x86_64 will be quite pointless...
> > 
> > Well that generates _18 = BIT_FIELD_REF <_2, 16, 14>; and becomes:
> > 
> > ubfx    x1, x20, 2, 16
> > 
> > This extracts bits 2-17 of the 30-bit value instead of bits 14-29. The issue is
> > that we're using a bitfield reference on a value that is claimed not to be a
> > bitfield in comment 6. So I can't see how using BIT_FIELD_REF could ever work
> > correctly.
> 
> So that's because TYPE_PRECISION != GET_MODE_PRECISION and the
> BIT_FIELD_REF expansion counting from GET_MODE_PRECISION I suppose.
> 
> Thus there is a RTL expansion side of the bug after all?
> 
> The "fixed" RTL is
> 
> (insn 6 5 7 (set (reg:SI 95)
>         (lshiftrt:SI (reg/v:SI 94 [ ulAddr ])
>             (const_int 2 [0x2]))) "t.c":42:48 -1
>      (nil))
> 
> (insn 7 6 8 (set (reg:SI 96)
>         (and:SI (reg:SI 95)
>             (const_int 1073741823 [0x3fffffff]))) "t.c":42:48 -1
>      (nil))
> 
> (insn 8 7 9 (set (subreg:DI (reg:HI 97) 0)
>         (zero_extract:DI (subreg:DI (reg:SI 96) 0)
>             (const_int 16 [0x10])
>             (const_int 2 [0x2]))) "t.c":44:8 -1
>      (nil))
> 
> so the 30bit value is in reg:SI 96 (the :30 cast causes the
> and with 0x3fffffff) but then the zero_extract we generate
> is bogus.
> 
> So maybe the :30 cast should have been a shift for BYTES_BIG_ENDIAN?
> 
> We might be able to work around this by optimization on GIMPLE,
> combining
> 
>   _1 = ulAddr_3(D) >> 2;
>   _2 = (<unnamed-unsigned:30>) _1;
>   _6 = BIT_FIELD_REF <_2, 16, 14>;
> 
> as far as eliminating at least the non-mode precision type...
> 
> Of course that would just work around the underlying RTL expansion
> bug?
> 
> Note we can end up with things like
> 
>  _2 = (<unnamed-unsigned:35) _1;
>  _3 = (<unnamed-unsigned:35) _4;
>  _5 = _2 + 3;
> 
> as well so shifting at the conversion might not be the correct
> answer (but instead BIT_FIELD_REF expansion needs to be fixed).
> 
> Alternatively we could declare it invalid GIMPLE and require
> BIT_FIELD_REF positions to be always relative to the mode
> (but then I'd rather disallow BIT_FIELD_REF on non-mode
> precision entities...).
> 
> Sth like the following might fix the RTL expansion issue
> which then generates
> 
> Test_func:
>         ubfx    x0, x0, 2, 16
>         cmp     w0, 1
>         bne     .L6
>         mov     w0, 0
> 
> and just
> 
> (insn 6 5 7 (set (reg:SI 95)
>         (lshiftrt:SI (reg/v:SI 94 [ ulAddr ])
>             (const_int 2 [0x2]))) "t.c":42:48 -1
>      (nil))
> 
> (insn 7 6 8 (set (reg:SI 96)
>         (and:SI (reg:SI 95)
>             (const_int 1073741823 [0x3fffffff]))) "t.c":42:48 -1
>      (nil))
> 
> (insn 8 7 9 (set (reg:SI 97)
>         (zero_extend:SI (subreg:HI (reg:SI 96) 2))) "t.c":44:8 -1
>      (nil))
> 
> Index: gcc/expr.c
> ===================================================================
> --- gcc/expr.c  (revision 267553)
> +++ gcc/expr.c  (working copy)
> @@ -10562,6 +10562,15 @@ expand_expr_real_1 (tree exp, rtx target
>            infinitely recurse.  */
>         gcc_assert (tem != exp);
>  
> +       /* When extracting from non-mode bitsize entities adjust the
> +          bit position for BYTES_BIG_ENDIAN.  */
> +       if (INTEGRAL_TYPE_P (TREE_TYPE (tem))
> +           && (TYPE_PRECISION (TREE_TYPE (tem))
> +               < GET_MODE_BITSIZE (as_a <scalar_int_mode> (TYPE_MODE 
> (TREE_TYPE (tem)))))
> +           && BYTES_BIG_ENDIAN)
> +         bitpos += (GET_MODE_BITSIZE (as_a <scalar_int_mode> (TYPE_MODE 
> (TREE_TYPE (tem))))
> +                    - TYPE_PRECISION (TREE_TYPE (tem)));
> +
>         /* If TEM's type is a union of variable size, pass TARGET to the 
> inner
>            computation, since it will need a temporary and TARGET is known
>            to have to do.  This occurs in unchecked conversion in Ada.  */

Btw, this needs to be amended for WORDS_BIG_ENDIAN of course.  I guess
we might even run into the case that such BIT_FIELD_REF references
a non-contiguous set of bits... (that's also true for BITS_BIG_ENDIAN !=
BYTES_BIG_ENDIAN I guess).
Comment 33 Wilco 2019-01-09 14:39:15 UTC
(In reply to Richard Biener from comment #32)

> > 
> > Index: gcc/expr.c
> > ===================================================================
> > --- gcc/expr.c  (revision 267553)
> > +++ gcc/expr.c  (working copy)
> > @@ -10562,6 +10562,15 @@ expand_expr_real_1 (tree exp, rtx target
> >            infinitely recurse.  */
> >         gcc_assert (tem != exp);
> >  
> > +       /* When extracting from non-mode bitsize entities adjust the
> > +          bit position for BYTES_BIG_ENDIAN.  */
> > +       if (INTEGRAL_TYPE_P (TREE_TYPE (tem))
> > +           && (TYPE_PRECISION (TREE_TYPE (tem))
> > +               < GET_MODE_BITSIZE (as_a <scalar_int_mode> (TYPE_MODE 
> > (TREE_TYPE (tem)))))
> > +           && BYTES_BIG_ENDIAN)
> > +         bitpos += (GET_MODE_BITSIZE (as_a <scalar_int_mode> (TYPE_MODE 
> > (TREE_TYPE (tem))))
> > +                    - TYPE_PRECISION (TREE_TYPE (tem)));
> > +
> >         /* If TEM's type is a union of variable size, pass TARGET to the 
> > inner
> >            computation, since it will need a temporary and TARGET is known
> >            to have to do.  This occurs in unchecked conversion in Ada.  */
> 
> Btw, this needs to be amended for WORDS_BIG_ENDIAN of course.  I guess
> we might even run into the case that such BIT_FIELD_REF references
> a non-contiguous set of bits... (that's also true for BITS_BIG_ENDIAN !=
> BYTES_BIG_ENDIAN I guess).

Was that meant to be instead or in addition to the tree-ssa-sccvn.c patch? With both I get:

	lsr	w20, w1, 2
        ...
	and	w1, w20, 65535

With only the expr.c patch it starts to look as expected:

	lsr	w20, w1, 2
        ...
	lsr	w1, w20, 14

And with the latter case the new torture test now passes on big-endian!
Comment 34 Wilco 2019-01-09 17:12:10 UTC
With just the expr.c patch the gcc regression tests all pass on big-endian AArch64. Interestingly this includes the new torture test, ie. it does not trigger the union bug.
Comment 35 rsandifo@gcc.gnu.org 2019-01-09 17:33:10 UTC
Yeah, the expr.c patch makes the original testcase work, but we still fail for:

#include<stdio.h>
typedef  unsigned int U32;
typedef  unsigned short  U16;
typedef  unsigned char U8;

typedef struct MEM_HEAD_4DW
{
    /* first word */
    U32 b11               : 1;
    U32 b12               : 3;
    U32 b13               : 3;
    U32 b14               : 1;
    U32 b15               : 16;
    U32 b16               : 8;

    /* second word */
    U32 b21               : 16;
    U32 b22               : 1;
    U32 b23               : 4;
    U32 b24               : 11;

    /* third word */
    U32 b31               : 32;

    /* fourth word */
    U32 b30AddrL          : 30;
    U32 b2AddrType        : 2;
}MEM_REQ_HEAD_4DW;

typedef union HEAD_DW4_UNION
{
    MEM_REQ_HEAD_4DW strMemHead;
    U32 aulValue[4];
    U16 ausValue[8];
            U8  aucValue[16];
}HEAD_REQ_DW4_UNION;



U32 Test_func(U32 ulAddr)
{
    HEAD_REQ_DW4_UNION unData;

    unData.strMemHead.b30AddrL        = ulAddr >> 2;
    unData.strMemHead.b2AddrType      = 0;
    printf("unData.ausValue[6]=0x%x\r\n",unData.ausValue[6]);       // why get 0x0 instead of 0x1 ?
            
    return 0;
}

int main()

{
  Test_func(0x10224);
  return 0;
}

Like Wilco says, the torture test seems to pass with an unpatched compiler (but seems like a good thing to have anyway).
Comment 36 Eric Botcazou 2019-01-09 17:46:25 UTC
> Like Wilco says, the torture test seems to pass with an unpatched compiler
> (but seems like a good thing to have anyway).

Likewise on SPARC at any optimization level.
Comment 37 Wilco 2019-01-09 18:06:35 UTC
(In reply to rsandifo@gcc.gnu.org from comment #35)
> Yeah, the expr.c patch makes the original testcase work, but we still fail
> for:

That's the folding in ccp1 after inlining, which will require a similar fix. There are likely more places that need to be fixed to handle the 'short' bit types.
Comment 38 rsandifo@gcc.gnu.org 2019-01-09 18:58:32 UTC
Created attachment 45392 [details]
patch that changes get_ref_base_and_extent for bare SSA_NAMEs

(In reply to Wilco from comment #37)
> (In reply to rsandifo@gcc.gnu.org from comment #35)
> > Yeah, the expr.c patch makes the original testcase work, but we still fail
> > for:
> 
> That's the folding in ccp1 after inlining, which will require a similar fix.
> There are likely more places that need to be fixed to handle the 'short' bit
> types.

Yeah, seems like a can of worms.

The expr.c approach treats a reference to an N-bit integer in an
M>N-bit mode is relative to M rather than N (i.e. it's relative
to the addressable storage.)  So maybe the point this goes wrong
is when we ask for get_ref_base_and_extent on a bare 30-bit SSA_NAME
(no component accesses) and get back an offset of 0.  If everything's
relative to the addressable storage then maybe it should be 2 for
big-endian?

The attached patch does that and seems to pass all three tests
in the PR so far.  I'll give a spin overnight just in case
it's at least vaguely sensible.
Comment 39 rguenther@suse.de 2019-01-10 08:22:00 UTC
On Wed, 9 Jan 2019, wilco at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88739
> 
> --- Comment #34 from Wilco <wilco at gcc dot gnu.org> ---
> With just the expr.c patch the gcc regression tests all pass on big-endian
> AArch64. Interestingly this includes the new torture test, ie. it does not
> trigger the union bug.

Does it still pass if you apply the tree-ssa-sccvn.c patch or does
that break things again?

I'm somewhat confused that only the expr.c fix is enough...
Comment 40 rguenther@suse.de 2019-01-10 08:41:47 UTC
On Wed, 9 Jan 2019, rsandifo at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88739
> 
> rsandifo at gcc dot gnu.org <rsandifo at gcc dot gnu.org> changed:
> 
>            What    |Removed                     |Added
> ----------------------------------------------------------------------------
>                  CC|                            |rsandifo at gcc dot gnu.org
> 
> --- Comment #35 from rsandifo at gcc dot gnu.org <rsandifo at gcc dot gnu.org> ---
> Yeah, the expr.c patch makes the original testcase work, but we still fail for:
> 
> #include<stdio.h>
> typedef  unsigned int U32;
> typedef  unsigned short  U16;
> typedef  unsigned char U8;
> 
> typedef struct MEM_HEAD_4DW
> {
>     /* first word */
>     U32 b11               : 1;
>     U32 b12               : 3;
>     U32 b13               : 3;
>     U32 b14               : 1;
>     U32 b15               : 16;
>     U32 b16               : 8;
> 
>     /* second word */
>     U32 b21               : 16;
>     U32 b22               : 1;
>     U32 b23               : 4;
>     U32 b24               : 11;
> 
>     /* third word */
>     U32 b31               : 32;
> 
>     /* fourth word */
>     U32 b30AddrL          : 30;
>     U32 b2AddrType        : 2;
> }MEM_REQ_HEAD_4DW;
> 
> typedef union HEAD_DW4_UNION
> {
>     MEM_REQ_HEAD_4DW strMemHead;
>     U32 aulValue[4];
>     U16 ausValue[8];
>             U8  aucValue[16];
> }HEAD_REQ_DW4_UNION;
> 
> 
> 
> U32 Test_func(U32 ulAddr)
> {
>     HEAD_REQ_DW4_UNION unData;
> 
>     unData.strMemHead.b30AddrL        = ulAddr >> 2;
>     unData.strMemHead.b2AddrType      = 0;
>     printf("unData.ausValue[6]=0x%x\r\n",unData.ausValue[6]);       // why get
> 0x0 instead of 0x1 ?
> 
>     return 0;
> }
> 
> int main()
> 
> {
>   Test_func(0x10224);
>   return 0;
> }
> 
> Like Wilco says, the torture test seems to pass with an unpatched compiler (but
> seems like a good thing to have anyway).

Yeah, the torture test fails to have the "large" store being a bitfield 
one.  I'll see if I can reasonably extend the testcase to covert this 
case.  Or rather add variants of the testcase.

Richard.
Comment 41 rguenther@suse.de 2019-01-10 08:48:37 UTC
On Wed, 9 Jan 2019, rsandifo at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88739
> 
> --- Comment #38 from rsandifo at gcc dot gnu.org <rsandifo at gcc dot gnu.org> ---
> Created attachment 45392 [details]
>   --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=45392&action=edit
> patch that changes get_ref_base_and_extent for bare SSA_NAMEs
> 
> (In reply to Wilco from comment #37)
> > (In reply to rsandifo@gcc.gnu.org from comment #35)
> > > Yeah, the expr.c patch makes the original testcase work, but we still fail
> > > for:
> > 
> > That's the folding in ccp1 after inlining, which will require a similar fix.
> > There are likely more places that need to be fixed to handle the 'short' bit
> > types.
> 
> Yeah, seems like a can of worms.
> 
> The expr.c approach treats a reference to an N-bit integer in an
> M>N-bit mode is relative to M rather than N (i.e. it's relative
> to the addressable storage.)  So maybe the point this goes wrong
> is when we ask for get_ref_base_and_extent on a bare 30-bit SSA_NAME
> (no component accesses) and get back an offset of 0.  If everything's
> relative to the addressable storage then maybe it should be 2 for
> big-endian?
> 
> The attached patch does that and seems to pass all three tests
> in the PR so far.  I'll give a spin overnight just in case
> it's at least vaguely sensible.

I considered this.  I guess we need to document this somewhere
though.  Incidentially the GIMPLE verifier already does

          if (!AGGREGATE_TYPE_P (TREE_TYPE (op))
              && maybe_gt (size + bitpos,
                           tree_to_poly_uint64 (TYPE_SIZE (TREE_TYPE 
(op)))))
            {
              error ("position plus size exceeds size of referenced object 
in "
                     "BIT_FIELD_REF");
              return true;
            }


so it uses TYPE_SIZE and not TYPE_PREICISON to verify the bounds of
the BIT_FIELD_REF access.

That said we should probably have exhaustive testing on this.
Maybe simply try to add GIMPLE testcases exercising the
BIT_FIELD_REF of bit-precision entities case.

I also wonder whether for the GIMPLE checking we want to verify
that for bit-precision OP the extracted range is within what
is valid (which depends on endianess then?).
Comment 42 Richard Biener 2019-01-14 12:21:07 UTC
(In reply to rguenther@suse.de from comment #41)
> On Wed, 9 Jan 2019, rsandifo at gcc dot gnu.org wrote:
> 
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88739
> > 
> > --- Comment #38 from rsandifo at gcc dot gnu.org <rsandifo at gcc dot gnu.org> ---
> > Created attachment 45392 [details]
> >   --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=45392&action=edit
> > patch that changes get_ref_base_and_extent for bare SSA_NAMEs
> > 
> > (In reply to Wilco from comment #37)
> > > (In reply to rsandifo@gcc.gnu.org from comment #35)
> > > > Yeah, the expr.c patch makes the original testcase work, but we still fail
> > > > for:
> > > 
> > > That's the folding in ccp1 after inlining, which will require a similar fix.
> > > There are likely more places that need to be fixed to handle the 'short' bit
> > > types.
> > 
> > Yeah, seems like a can of worms.
> > 
> > The expr.c approach treats a reference to an N-bit integer in an
> > M>N-bit mode is relative to M rather than N (i.e. it's relative
> > to the addressable storage.)  So maybe the point this goes wrong
> > is when we ask for get_ref_base_and_extent on a bare 30-bit SSA_NAME
> > (no component accesses) and get back an offset of 0.  If everything's
> > relative to the addressable storage then maybe it should be 2 for
> > big-endian?

Btw, get_inner_reference should be changed the same way, likewise
eventually get_addr_base_and_unit_offset.

> > The attached patch does that and seems to pass all three tests
> > in the PR so far.  I'll give a spin overnight just in case
> > it's at least vaguely sensible.
> 
> I considered this.  I guess we need to document this somewhere
> though.  Incidentially the GIMPLE verifier already does
> 
>           if (!AGGREGATE_TYPE_P (TREE_TYPE (op))
>               && maybe_gt (size + bitpos,
>                            tree_to_poly_uint64 (TYPE_SIZE (TREE_TYPE 
> (op)))))
>             {
>               error ("position plus size exceeds size of referenced object 
> in "
>                      "BIT_FIELD_REF");
>               return true;
>             }
> 
> 
> so it uses TYPE_SIZE and not TYPE_PREICISON to verify the bounds of
> the BIT_FIELD_REF access.
> 
> That said we should probably have exhaustive testing on this.
> Maybe simply try to add GIMPLE testcases exercising the
> BIT_FIELD_REF of bit-precision entities case.
> 
> I also wonder whether for the GIMPLE checking we want to verify
> that for bit-precision OP the extracted range is within what
> is valid (which depends on endianess then?).
Comment 43 Richard Biener 2019-01-24 14:22:16 UTC
So to get back to this - my thinking was that for a reference REF I can do

 base = get_inner_reference (ref, &bitsize, &bitpos, &offset, &mode, &unsignedp, &reversep, &volatilep);

and get the semantically same REF building

 REF' = BIT_FIELD_REF<*(&base + offset), bitsize, bitpos>

(plus setting REF_REVERSE_STORAGE_ORDER and TREE_THIS_VOLATILE on REF').

This appearantly breaks down (similarly for get_ref_base_and_extent) for
bigendian and DECL_BIT_FIELD outer COMPONENT_REFs.  And maybe for more?

Is my expectation that the above "works" flawed?  So "bit position"
and "position of the first referenced bit" are two separate things?

get_inner_reference will basically simply funnel through arguments so
any fix to get_inner_reference/get_ref_base_and_extent will be broken
since we'll apply it "recursively" when iterating the above
BIT_FIELD_REF building.  This means it is RTL expansion that is wrong
which means my comment#31 fix is correct?  Because the issue is not
the bit position but how we interpret the underlying object?
Comment 44 Richard Biener 2019-01-24 14:36:48 UTC
Created attachment 45523 [details]
workaround

So I am testing the following workaround, at least "most suitable" for branches.
It avoids generating affected BIT_FIELD_REFs (with bases with integral types
that have a precision not matching the size of the mode).

Can arm folks please test this patch and aid me (provide) a dg-torture
testcase that currently fails?

It builds OK on x86_64-linux, I'll test it with the endianess test commented
out to see if there's any testsuite fallout.

Thanks.
Comment 45 Eric Botcazou 2019-01-24 16:06:19 UTC
> So to get back to this - my thinking was that for a reference REF I can do
> 
>  base = get_inner_reference (ref, &bitsize, &bitpos, &offset, &mode,
> &unsignedp, &reversep, &volatilep);
> 
> and get the semantically same REF building
> 
>  REF' = BIT_FIELD_REF<*(&base + offset), bitsize, bitpos>
> 
> (plus setting REF_REVERSE_STORAGE_ORDER and TREE_THIS_VOLATILE on REF').
> 
> This appearantly breaks down (similarly for get_ref_base_and_extent) for
> bigendian and DECL_BIT_FIELD outer COMPONENT_REFs.  And maybe for more?
> 
> Is my expectation that the above "works" flawed?  So "bit position"
> and "position of the first referenced bit" are two separate things?

No, they are the same, but I think that BIT_FIELD_REF is not supposed to be itself applied to a bit-field, as in the case at hand, since it's precisely meant to designate a bit-field.  In other words, the above base cannot be a bit-field.

So the safest route is probably to forbid such an abomination, i.e. to make sure that the first argument of BIT_FIELD_REF is a bona-fide base type.
Comment 46 Richard Biener 2019-01-25 07:51:22 UTC
(In reply to Richard Biener from comment #44)
> Created attachment 45523 [details]
> workaround
> 
> So I am testing the following workaround, at least "most suitable" for
> branches.
> It avoids generating affected BIT_FIELD_REFs (with bases with integral types
> that have a precision not matching the size of the mode).
> 
> Can arm folks please test this patch and aid me (provide) a dg-torture
> testcase that currently fails?
> 
> It builds OK on x86_64-linux, I'll test it with the endianess test commented
> out to see if there's any testsuite fallout.
> 
> Thanks.

No regressions on x86_64-linux when also enabling there.
Comment 47 Richard Biener 2019-01-25 08:06:48 UTC
(In reply to Eric Botcazou from comment #45)
> > So to get back to this - my thinking was that for a reference REF I can do
> > 
> >  base = get_inner_reference (ref, &bitsize, &bitpos, &offset, &mode,
> > &unsignedp, &reversep, &volatilep);
> > 
> > and get the semantically same REF building
> > 
> >  REF' = BIT_FIELD_REF<*(&base + offset), bitsize, bitpos>
> > 
> > (plus setting REF_REVERSE_STORAGE_ORDER and TREE_THIS_VOLATILE on REF').
> > 
> > This appearantly breaks down (similarly for get_ref_base_and_extent) for
> > bigendian and DECL_BIT_FIELD outer COMPONENT_REFs.  And maybe for more?
> > 
> > Is my expectation that the above "works" flawed?  So "bit position"
> > and "position of the first referenced bit" are two separate things?
> 
> No, they are the same, but I think that BIT_FIELD_REF is not supposed to be
> itself applied to a bit-field, as in the case at hand, since it's precisely
> meant to designate a bit-field.  In other words, the above base cannot be a
> bit-field.
> 
> So the safest route is probably to forbid such an abomination, i.e. to make
> sure that the first argument of BIT_FIELD_REF is a bona-fide base type.

OK, we can certainly try to enforce this.  Just to make sure - this
refers to TREE_TYPE (TREE_OPERAND (bfr, 0)), not the base of the
component-ref chain eventually rooted here?

On GIMPLE, if we want to extract some bits from a register that happens to
be a non-"base" type this means we have to use shifting and masking.  I
suppose the same issue hits BIT_INSERT_EXPR.  This makes it somewhat
awkward (aka non-canonical) on the GIMPLE side :/  The most natural thing
there would then be to disallow such types in the IL.  Btw, are
PSImode / int : 24 "bona-fide base types"?  Thus, is the criteria that
precision matches TYPE_SIZE? GET_MODE_PRECISION? GET_MODE_BITSIZE?
I know we try to avoid introducing bit-precision ops during GIMPLE
optimization but FEs happily leak those into the IL (as in this testcase).

Just to remind where we're coming from - we have

unData.strMemHead.b30AddrL        = ulAddr >> 2;

where this is a 30bit bitfield store (at bit offset % BITS_PER_UNIT == 0)
from a unsigned :30 register.  Via the union we now try to load
8 bits from memory contained within that 30 bits and try to extract those
from the unsigned :30 register instead (with the goal to elide the
memory object).  In which circumnstances can we view the RHS of the above
store as "proper" for applying the BIT_FIELD_REF as translated to the
LHS memory to the RHS register?  Is it enough to constrain the RHS
type to TYPE_PRECISION == TYPE_SIZE?  RTL expansion now handles various
cases of handled-component expansion where the base ends up not being
memory but various other kinds of entities - do they run into the same
issue when, for BIT_FIELD_REFs, register bases are not equal to memory
bases? [Just looking at FLOAT_WORD_ORDER and friends and considering the
punning we're happily introducing via the above transform]
Comment 48 Eric Botcazou 2019-01-25 08:57:36 UTC
> OK, we can certainly try to enforce this.  Just to make sure - this
> refers to TREE_TYPE (TREE_OPERAND (bfr, 0)), not the base of the
> component-ref chain eventually rooted here?

Yes, formally it's the former, i.e. the type of operand #0 of BIT_FIELD_REF.

> On GIMPLE, if we want to extract some bits from a register that happens to
> be a non-"base" type this means we have to use shifting and masking.  I
> suppose the same issue hits BIT_INSERT_EXPR.  This makes it somewhat
> awkward (aka non-canonical) on the GIMPLE side :/

Yes, I think that the handling of BIT_INSERT_EXPR should be symmetrical.

> The most natural thing there would then be to disallow such types in the IL.

Yes, as type of operand #0 of BIT_FIELD_REF/BIT_INSERT_EXPR.

> Btw, are PSImode / int : 24 "bona-fide base types"?  Thus, is the criteria 
> that precision matches TYPE_SIZE? GET_MODE_PRECISION? GET_MODE_BITSIZE?
> I know we try to avoid introducing bit-precision ops during GIMPLE
> optimization but FEs happily leak those into the IL (as in this testcase).

The predicate type_has_mode_precision_p is the one used by the RTL expander to identify bit-field types.  Moreover it's also already used in a few GIMPLE optimization passes (fwprop, vrp, vectorize) so it seems to be fit the bill.

So a base type would be (!INTEGRAL_TYPE_P || type_has_mode_precision_p).
Comment 49 Eric Botcazou 2019-01-25 09:15:21 UTC
> Just to remind where we're coming from - we have
> 
> unData.strMemHead.b30AddrL        = ulAddr >> 2;
> 
> where this is a 30bit bitfield store (at bit offset % BITS_PER_UNIT == 0)
> from a unsigned :30 register.  Via the union we now try to load
> 8 bits from memory contained within that 30 bits and try to extract those
> from the unsigned :30 register instead (with the goal to elide the
> memory object).  In which circumnstances can we view the RHS of the above
> store as "proper" for applying the BIT_FIELD_REF as translated to the
> LHS memory to the RHS register?

I think that type_has_mode_precision_p would work and presumably not pessimize too much even on little-endian platforms.
Comment 50 Richard Biener 2019-01-25 10:03:17 UTC
Created attachment 45532 [details]
adjusted patch

This adjusts the workaround to use type_has_mode_precision_p on all targets.
It also includes IL verification (definitely not to be backported).

As of pessimizing cases it's merely cases like the testcase where we might
fail to elide a memory object that is initialized only via
!type_has_mode_precision_p stores but read in byte-size chunks.
The existing VN trick for unknown reason limits itself to
multiples of BITS_PER_UNIT sized accesses in addition to the
check that somewhat resembles type_has_mode_precision_p but applied
to the load we want to elide given other constraints on the BIT_FIELD_REF IL.

We may want to elide the multiple_p (ref->size, BITS_PER_UNIT) check for GCC 10.
Comment 51 rsandifo@gcc.gnu.org 2019-01-25 11:42:56 UTC
FWIW, the (In reply to Richard Biener from comment #44)
> Created attachment 45523 [details]
> workaround
> 
> So I am testing the following workaround, at least "most suitable" for
> branches.
> It avoids generating affected BIT_FIELD_REFs (with bases with integral types
> that have a precision not matching the size of the mode).
> 
> Can arm folks please test this patch and aid me (provide) a dg-torture
> testcase that currently fails?

It passed testing on aarch64-linux-gnu (LE) and aarch64_be-elf.
Comment 52 Richard Biener 2019-01-28 08:16:14 UTC
Author: rguenth
Date: Mon Jan 28 08:15:42 2019
New Revision: 268332

URL: https://gcc.gnu.org/viewcvs?rev=268332&root=gcc&view=rev
Log:
2019-01-28  Richard Biener  <rguenther@suse.de>

	PR tree-optimization/88739
	* tree-cfg.c (verify_types_in_gimple_reference): Verify
	BIT_FIELD_REFs only are applied to mode-precision operands
	when they are integral.
	(verify_gimple_assign_ternary): Likewise for BIT_INSERT_EXPR.
	* tree-ssa-sccvn.c (vn_reference_lookup_3): Avoid generating
	BIT_FIELD_REFs of non-mode-precision integral operands.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/tree-cfg.c
    trunk/gcc/tree-ssa-sccvn.c
Comment 53 Richard Biener 2019-01-28 08:17:13 UTC
Fixed on trunk sofar, still waiting for somebody to produce a testcase for the testsuite (I can't run-test on BE).
Comment 54 John Dong 2019-01-29 12:54:54 UTC
(In reply to Richard Biener from comment #53)
> Fixed on trunk sofar, still waiting for somebody to produce a testcase for
> the testsuite (I can't run-test on BE).

hi, any plan to fix on gcc-7-branch?
Comment 55 Richard Biener 2019-01-29 13:02:44 UTC
(In reply to John Dong from comment #54)
> (In reply to Richard Biener from comment #53)
> > Fixed on trunk sofar, still waiting for somebody to produce a testcase for
> > the testsuite (I can't run-test on BE).
> 
> hi, any plan to fix on gcc-7-branch?

Yes, maybe you can help with a testcase that can be put into gcc/testsuite/gcc.dg/torture/ and that currently fails on the branches but passes on trunk?  I didn't want to backport w/o a testcase.
Comment 56 Jakub Jelinek 2019-02-08 19:02:09 UTC
Author: jakub
Date: Fri Feb  8 19:01:37 2019
New Revision: 268706

URL: https://gcc.gnu.org/viewcvs?rev=268706&root=gcc&view=rev
Log:
	PR tree-optimization/88739
	* gcc.c-torture/execute/pr88739.c: New test.

Added:
    trunk/gcc/testsuite/gcc.c-torture/execute/pr88739.c
Modified:
    trunk/gcc/testsuite/ChangeLog
Comment 57 Richard Biener 2019-02-13 10:03:18 UTC
Author: rguenth
Date: Wed Feb 13 10:02:47 2019
New Revision: 268838

URL: https://gcc.gnu.org/viewcvs?rev=268838&root=gcc&view=rev
Log:
2019-02-13  Richard Biener  <rguenther@suse.de>

	Backport from mainline
	2019-02-12  Richard Biener  <rguenther@suse.de>

	PR tree-optimization/89253
	* tree-ssa-loop-split.c (tree_ssa_split_loops): Check we can
	duplicate the loop.

	* gfortran.dg/pr89253.f: New testcase.

	2019-02-08  Richard Biener  <rguenther@suse.de>

	PR middle-end/89223
	* tree-data-ref.c (initialize_matrix_A): Fail if constant
	doesn't fit in HWI.
	(analyze_subscript_affine_affine): Handle failure from
	initialize_matrix_A.

	* gcc.dg/torture/pr89223.c: New testcase.

	2019-01-28  Richard Biener  <rguenther@suse.de>

	PR tree-optimization/88739
	* tree-ssa-sccvn.c (vn_reference_lookup_3): Avoid generating
	BIT_FIELD_REFs of non-mode-precision integral operands.

	* gcc.c-torture/execute/pr88739.c: New test.

Added:
    branches/gcc-8-branch/gcc/testsuite/gcc.c-torture/execute/pr88739.c
    branches/gcc-8-branch/gcc/testsuite/gcc.dg/torture/pr89223.c
    branches/gcc-8-branch/gcc/testsuite/gfortran.dg/pr89253.f
Modified:
    branches/gcc-8-branch/gcc/ChangeLog
    branches/gcc-8-branch/gcc/testsuite/ChangeLog
    branches/gcc-8-branch/gcc/tree-data-ref.c
    branches/gcc-8-branch/gcc/tree-ssa-loop-split.c
    branches/gcc-8-branch/gcc/tree-ssa-sccvn.c
Comment 58 John Dong 2019-03-26 09:34:20 UTC
Created attachment 46022 [details]
fix the union bug on 7.3.0
Comment 59 John Dong 2019-03-26 09:36:43 UTC
(In reply to John Dong from comment #58)
> Created attachment 46022 [details]
> fix the union bug on 7.3.0

hi, I tried to fix the bug when expanding. is it OK?
Comment 60 rguenther@suse.de 2019-03-26 09:49:42 UTC
On Tue, 26 Mar 2019, dongjianqiang2 at huawei dot com wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88739
> 
> --- Comment #59 from John Dong <dongjianqiang2 at huawei dot com> ---
> (In reply to John Dong from comment #58)
> > Created attachment 46022 [details]
> > fix the union bug on 7.3.0
> 
> hi, I tried to fix the bug when expanding. is it OK?

No, the fix is going to be backported soon.
Comment 61 Richard Biener 2019-03-26 13:18:56 UTC
Author: rguenth
Date: Tue Mar 26 13:18:23 2019
New Revision: 269942

URL: https://gcc.gnu.org/viewcvs?rev=269942&root=gcc&view=rev
Log:
2019-02-26  Richard Biener  <rguenther@suse.de>

	Backport from mainline
	2019-02-12  Richard Biener  <rguenther@suse.de>

	PR tree-optimization/89253
	* tree-ssa-loop-split.c (tree_ssa_split_loops): Check we can
	duplicate the loop.

	* gfortran.dg/pr89253.f: New testcase.

	2019-02-08  Richard Biener  <rguenther@suse.de>

	PR middle-end/89223
	* tree-data-ref.c (initialize_matrix_A): Fail if constant
	doesn't fit in HWI.
	(analyze_subscript_affine_affine): Handle failure from
	initialize_matrix_A.

	* gcc.dg/torture/pr89223.c: New testcase.

	2019-01-28  Richard Biener  <rguenther@suse.de>

	PR tree-optimization/88739
	* tree-ssa-sccvn.c (vn_reference_lookup_3): Avoid generating
	BIT_FIELD_REFs of non-mode-precision integral operands.

	* gcc.c-torture/execute/pr88739.c: New test.

Added:
    branches/gcc-7-branch/gcc/testsuite/gcc.c-torture/execute/pr88739.c
    branches/gcc-7-branch/gcc/testsuite/gcc.dg/torture/pr89223.c
    branches/gcc-7-branch/gcc/testsuite/gfortran.dg/pr89253.f
Modified:
    branches/gcc-7-branch/gcc/ChangeLog
    branches/gcc-7-branch/gcc/testsuite/ChangeLog
    branches/gcc-7-branch/gcc/tree-data-ref.c
    branches/gcc-7-branch/gcc/tree-ssa-loop-split.c
    branches/gcc-7-branch/gcc/tree-ssa-sccvn.c
Comment 62 Richard Biener 2019-03-26 13:20:28 UTC
Fixed.