Bug 57748 - [4.8 Regression] ICE when expanding assignment to unaligned zero-sized array
Summary: [4.8 Regression] ICE when expanding assignment to unaligned zero-sized array
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: middle-end
Version: 4.8.0
Importance: P2 normal
Target Milestone: 4.8.5
Assignee: Not yet assigned to anyone
URL: http://gcc.gnu.org/ml/gcc-patches/201...
Keywords: ice-on-valid-code, wrong-code
Depends on:
Blocks: 59134
 
Reported: 2013-06-28 03:56 UTC by Khem Raj
Modified: 2015-01-27 17:10 UTC (History)
7 users

See Also:
Host: x86_64-linux
Target: x86_64-linux,arm-none-linux-gnueabi
Build: x86_64-linux
Known to work: 4.7.0, 4.7.1, 4.7.2, 4.9.0
Known to fail: 4.8.0, 4.8.1, 4.8.2, 4.8.3
Last reconfirmed: 2013-07-05 00:00:00


Attachments
testcase (66.80 KB, application/x-bzip2)
2013-06-28 03:56 UTC, Khem Raj
Details
Simple x86_64 testcase (301 bytes, text/plain)
2013-07-31 14:08 UTC, Martin Jambor
Details
Untested fix (842 bytes, patch)
2013-08-01 14:40 UTC, Martin Jambor
Details | Diff
Another attempt at a fix (1.50 KB, patch)
2013-08-30 18:20 UTC, Martin Jambor
Details | Diff
Yet another untested fix (842 bytes, patch)
2013-09-02 23:35 UTC, Bernd Edlinger
Details | Diff
Next try for a fix (2.08 KB, patch)
2013-09-10 07:55 UTC, Bernd Edlinger
Details | Diff
Testcase for both the assignment and read issues (397 bytes, text/plain)
2013-09-18 14:44 UTC, Martin Jambor
Details

Description Khem Raj 2013-06-28 03:56:54 UTC
Created attachment 30405 [details]
testcase

The attached testcase causes an ICE on GCC 4.8+ and happens on trunk too. It works fine on 4.7.

./gcc/cc1 ~/callchain.i -mfpu=neon -Os -mfloat-abi=softfp


util/callchain.c: In function 'append_chain':
util/callchain.c:374:18: internal compiler error: in expand_assignment, at expr.c:4676
Please submit a full bug report,
with preprocessed source if appropriate.
See <http://gcc.gnu.org/bugs.html> for instructions.
Comment 1 Khem Raj 2013-06-28 03:58:43 UTC
Here is how the compiler is configured:

configure --build=x86_64-linux --host=x86_64-linux --target=arm-none-linux-gnueabi --prefix=/builds1/angstrom/build/tmp-angstrom_next-eglibc/sysroots/x86_64-linux/usr --exec_prefix=/builds1/angstrom/build/tmp-angstrom_next-eglibc/sysroots/x86_64-linux/usr --bindir=/builds1/angstrom/build/tmp-angstrom_next-eglibc/sysroots/x86_64-linux/usr/bin/ppce500v2-angstrom-linux-gnuspe.gcc-cross-initial --sbindir=/builds1/angstrom/build/tmp-angstrom_next-eglibc/sysroots/x86_64-linux/usr/bin/ppce500v2-angstrom-linux-gnuspe.gcc-cross-initial --libexecdir=/builds1/angstrom/build/tmp-angstrom_next-eglibc/sysroots/x86_64-linux/usr/libexec/ppce500v2-angstrom-linux-gnuspe.gcc-cross-initial --datadir=/builds1/angstrom/build/tmp-angstrom_next-eglibc/sysroots/x86_64-linux/usr/share --sysconfdir=/builds1/angstrom/build/tmp-angstrom_next-eglibc/sysroots/x86_64-linux/etc --sharedstatedir=/builds1/angstrom/build/tmp-angstrom_next-eglibc/sysroots/x86_64-linux/com --localstatedir=/builds1/angstrom/build/tmp-angstrom_next-eglibc/sysroots/x86_64-linux/var --libdir=/builds1/angstrom/build/tmp-angstrom_next-eglibc/sysroots/x86_64-linux/usr/lib/ppce500v2-angstrom-linux-gnuspe.gcc-cross-initial --includedir=/builds1/angstrom/build/tmp-angstrom_next-eglibc/sysroots/x86_64-linux/usr/include --oldincludedir=/builds1/angstrom/build/tmp-angstrom_next-eglibc/sysroots/x86_64-linux/usr/include --infodir=/builds1/angstrom/build/tmp-angstrom_next-eglibc/sysroots/x86_64-linux/usr/share/info --mandir=/builds1/angstrom/build/tmp-angstrom_next-eglibc/sysroots/x86_64-linux/usr/share/man --disable-silent-rules --disable-dependency-tracking --with-libtool-sysroot=/builds1/angstrom/build/tmp-angstrom_next-eglibc/sysroots/x86_64-linux --with-newlib --without-headers --disable-shared --disable-threads --disable-multilib --disable-__cxa_atexit --enable-languages=c --program-prefix=powerpc-angstrom-linux-gnuspe- --with-sysroot=/builds1/angstrom/build/tmp-angstrom_next-eglibc/sysroots/p2020ds --with-build-sysroot=/builds1/angstrom/build/tmp-angstrom_next-eglibc/work/ppce500v2-angstrom-linux-gnuspe/gcc-cross-initial/4.8.1-r0/gcc-4.8.1/build.x86_64-linux.powerpc-angstrom-linux-gnuspe/tmpsysroot --disable-libmudflap --disable-libgomp --disable-libssp --disable-libquadmath --with-system-zlib --disable-lto --disable-plugin --enable-decimal-float=no --disable-nls
Comment 2 Khem Raj 2013-06-28 04:00:18 UTC
I have traced it to commit r187101:

    2012-05-03  Martin Jambor  <mjambor@suse.cz>

        * builtins.c (get_object_alignment_1): Return whether we can determine
        the alignment or conservatively assume byte alignment.  Return the
        alignment by reference.  Use get_pointer_alignment_1 for dereference
        alignment.
        (get_pointer_alignment_1): Return whether we can determine the
        alignment or conservatively assume byte alignment.  Return the
        alignment by reference.  Use get_ptr_info_alignment to get SSA name
        alignment.
        (get_object_alignment): Update call to get_object_alignment_1.
        (get_object_or_type_alignment): Likewise, fall back to type alignment
        only when it returned false.
        (get_pointer_alignment): Update call to get_pointer_alignment_1.
        * fold-const.c (get_pointer_modulus_and_residue): Update call to
        get_object_alignment_1.
        * ipa-prop.c (ipa_modify_call_arguments): Update call to
        get_pointer_alignment_1.
        * tree-sra.c (build_ref_for_offset): Likewise, fall back to the type
        of MEM_REF or TARGET_MEM_REF only when it returns false.
        * tree-ssa-ccp.c (get_value_from_alignment): Update call to
        get_object_alignment_1.
        (ccp_finalize): Use set_ptr_info_alignment.
        * tree.h (get_object_alignment_1): Update declaration.
        (get_pointer_alignment_1): Likewise.
        * gimple-pretty-print.c (dump_gimple_phi): Use get_ptr_info_alignment.
        (dump_gimple_stmt): Likewise.
        * tree-flow.h (ptr_info_def): Updated comments of fields align and
        misalign.
        (get_ptr_info_alignment): Declared.
        (mark_ptr_info_alignment_unknown): Likewise.
        (set_ptr_info_alignment): Likewise.
        (adjust_ptr_info_misalignment): Likewise.
        * tree-ssa-address.c (copy_ref_info): Use new access functions to get
        and set alignment of SSA names.
        * tree-ssa-loop-ivopts.c (rewrite_use_nonlinear_expr): Call
        mark_ptr_info_alignment_unknown.
        * tree-ssanames.c (get_ptr_info_alignment): New function.
        (mark_ptr_info_alignment_unknown): Likewise.
        (set_ptr_info_alignment): Likewise.
        (adjust_ptr_info_misalignment): Likewise.
        (get_ptr_info): Call mark_ptr_info_alignment_unknown.
        * tree-vect-data-refs.c (vect_create_addr_base_for_vector_ref):
        Likewise.
        (bump_vector_ptr): Likewise.
        * tree-vect-stmts.c (create_array_ref): Use set_ptr_info_alignment.
        (vectorizable_store): Likewise.
        (vectorizable_load): Likewise.
Comment 3 Martin Jambor 2013-06-28 14:52:18 UTC
Unfortunately it is quite likely I won't be able to look at this in
the next three (or even four) weeks.  Can you please have a look at
what exactly triggers the ICE?  In my checkouts of trunk and the 4.8
branch, there is nothing likely to cause such a problem on or around
line 4676 in expr.c.  Thanks.
Comment 4 philb 2013-07-03 16:34:01 UTC
I was able to get Khem's testcase to provoke a crash at:

4761		      gcc_assert (TREE_CODE (offset) == INTEGER_CST);

Apparently OFFSET is:

 <plus_expr 0x7ffff6380d48
    type <integer_type 0x7ffff6c60000 sizetype public unsigned SI
        size <integer_cst 0x7ffff6c5c080 constant 32>
        unit size <integer_cst 0x7ffff6c5c0a0 constant 4>
        align 32 symtab 0 alias set -1 canonical type 0x7ffff6c60000 precision 32 min <integer_cst 0x7ffff6c5c0c0 0> max <integer_cst 0x7ffff6c4b000 4294967295>>
   
    arg 0 <mult_expr 0x7ffff6380d20 type <integer_type 0x7ffff6c60000 sizetype>
       
        arg 0 <nop_expr 0x7ffff6381b80 type <integer_type 0x7ffff6c60000 sizetype>
           
            arg 0 <ssa_name 0x7ffff6374900 type <integer_type 0x7ffff6c605e8 int>
                var <var_decl 0x7ffff637a428 j>def_stmt j_22 = PHI <0(4), j_31(7)>

                version 22>>
        arg 1 <integer_cst 0x7ffff6c5c600 constant 16>>
    arg 1 <integer_cst 0x7ffff6c5c120 type <integer_type 0x7ffff6c60000 sizetype> constant 8>>
Comment 5 Martin Jambor 2013-07-26 16:47:48 UTC
So I have only just gotten to this, and I definitely won't be able to
finish it today or even this week, but here is what I have figured out so far.

We ICE when expanding statement

  MEM[(struct resolved_chain *)_19].ips[j_22].ip = _21;

Before my patch, that statement was not there; instead we had

  filtered_7->ips[j_22].ip = D.11868_21;

which copyprop2 rewrites into the above form with my patch.  The
difference is subtle: _19 is a void pointer.  I have not looked at why
this transformation takes place; however, I dumped the MEM_REF tree
and stared at it for a while and I see nothing wrong with it.

As the code progresses through expand_assignment, offset as filled in
by get_inner_reference is the same; however, get_object_alignment (tem)
used to return 64 and now returns only 32, which pushes us down the
wrong path, which does not handle this case.  So now I guess I should
figure out why get_object_alignment thinks the alignment is so
small...
Comment 6 Bernd Edlinger 2013-07-29 20:49:14 UTC
(In reply to Martin Jambor from comment #5)
> expand_assignment, offset as filled in get_inner_reference is the same,
> however get_object_alignment (tem) used to return 64, and now only returns
> 32 which then pushes us the wrong path which does not handle this case.  So
> now I guess I should figure out why get_object_alignment thinks the
> alignment is so small...

hhmm..

set_ptr_info_alignment is always called with align=4,
and by the way, the crash goes away if I change this line
(but I cannot tell if the code is correct):

--- builtins.c.jj       2013-07-06 11:34:17.000000000 +0200
+++ builtins.c  2013-07-29 21:50:56.000000000 +0200
@@ -503,7 +503,7 @@ get_pointer_alignment_1 (tree exp, unsig
          *bitposp = ptr_misalign * BITS_PER_UNIT;
          *alignp = ptr_align * BITS_PER_UNIT;
          /* We cannot really tell whether this result is an approximation.  */
-         return true;
+         return false;
        }
       else
        {
Comment 7 Martin Jambor 2013-07-30 14:53:31 UTC
(In reply to Bernd Edlinger from comment #6)
> hhmm..
> 
> set_ptr_info_alignment is always called with align=4,
> and by the way, the crash goes away if I change this line
> (but I cannot tell if the code is correct):

No, the comment is a bit misleading but this change is incorrect and
would lead to misaligned access traps.

The returned alignment is OK because MALLOC_ABI_ALIGNMENT is 32 and
the pointer the problematic MEM_REF is based on is a result of calloc.
So it is the old value that is incorrect.  If MALLOC_ABI_ALIGNMENT is
incorrectly low for this target, increasing it to the correct value
will make the ICE go away.

In any event, it is clear that the code in expand_assignment cannot
cope with unaligned tem and non-NULL offset.  So currently I'm
considering the following patch, although I am not really sure it is
enough (it does fix the ICE, though).  If you can run the testcase on
the platform, would you run it with this patch applied, please?


diff --git a/gcc/expr.c b/gcc/expr.c
index 2c5f21a..267174b 100644
--- a/gcc/expr.c
+++ b/gcc/expr.c
@@ -4707,6 +4707,7 @@ expand_assignment (tree to, tree from, bool nontemporal)
       mode = TYPE_MODE (TREE_TYPE (tem));
       if (TREE_CODE (tem) == MEM_REF
          && mode != BLKmode
+         && !offset
          && ((align = get_object_alignment (tem))
              < GET_MODE_ALIGNMENT (mode))
          && ((icode = optab_handler (movmisalign_optab, mode))
Comment 8 Bernd Edlinger 2013-07-30 22:52:53 UTC
(In reply to Martin Jambor from comment #7)
> In any event, it is clear that
> the code in expand_assignment cannot cope with unaligned tem and non-NULL
> offset.  So currently I'm considering the following patch, although I am not
> really sure it is enough (it does fix the ICE, though).  If you can run the
> testcase on the platform, would you run it with this patch applied, please?

No, unfortunately I can only look at the assembler listing.

But wait a moment...

If the object is assumed to be unaligned here, this patch will
likely just compute the unaligned address, add the offset,
and store the result there without any special precautions,
while the code in the if statement seems to put the expression
in a register and move that register to the final destination.

Well, I believe these unaligned arrays are generally broken.

Consider this example:

struct test
{
  char x;
  long long y[10];
} __attribute__((packed));

long long foo(struct test *x, long long y, long long z)
{
   long long a = x->y[y];
   x->y[y] = z;
   return a;
}

gets compiled to:

foo:
        @ Function supports interworking.
        @ args = 8, pretend = 0, frame = 0
        @ frame_needed = 0, uses_anonymous_args = 0
        @ link register save eliminated.
        stmfd   sp!, {r4, r5}
        add     r5, sp, #8
        ldmia   r5, {r4-r5}
        add     r2, r0, r2, asl #3
        add     r1, r2, #1
        ldmia   r1, {r0-r1}
        str     r4, [r2, #1]
        str     r5, [r2, #5]
        ldmfd   sp!, {r4, r5}
        bx      lr

Won't these ldmia instructions fault on unaligned
addresses, even on a Cortex-A9?

Furthermore, the str instructions to odd addresses are always there, regardless of the
-mno-unaligned-access setting.
Comment 9 Martin Jambor 2013-07-31 14:08:19 UTC
Created attachment 30577 [details]
Simple x86_64 testcase

Simple x86_64 testcase triggering the ICE.
Comment 10 Martin Jambor 2013-07-31 16:39:23 UTC
The problem is that the type of the record that contains the scalar
data we are accessing has non-BLK mode even though we are not
accessing a part of it.  This is because it has a zero-sized trailing
array:

typedef struct S { V a; V b[0]; } P __attribute__((aligned (1)));

in the x86_64-linux example from comment #9, and 

struct resolved_chain {
 u64 nr;
 struct resolved_ip ips[0];
};

in the original testcase.

Because it has non-BLK mode, it passes the condition in
expand_assignment that is there to handle accesses to small,
scalar-sized structures (or even arrays) which should be written to
memory through movmisalign_optab, e.g. to access an element of a
vector.

However, in these testcases, the structure we are writing to is bigger
than a scalar.  I wonder why structures with trailing zero-sized
arrays have non-BLK mode.  I think that is the root of the problem,
actually.
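
For illustration, here is a minimal sketch of the problematic shape (this is
not the attached testcase itself; the vector type and the variable index are
assumptions based on comments 9 and 10):

typedef long long V __attribute__ ((vector_size (2 * sizeof (long long))));

/* Non-BLK mode (the mode of 'a') despite the zero-sized trailing array.  */
typedef struct S { V a; V b[0]; } P __attribute__ ((aligned (1)));

void
foo (P *p, int i, V v)
{
  /* Under-aligned base plus a non-constant offset: the combination the
     misalign path in expand_assignment does not handle.  */
  p->b[i] = v;
}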
Comment 11 Martin Jambor 2013-07-31 16:42:18 UTC
(In reply to Bernd Edlinger from comment #8)
> (In reply to Martin Jambor from comment #7)
> > In any event, it is clear that
> > the code in expand_assignment cannot cope with unaligned tem and non-NULL
> > offset.  So currently I'm considering the following patch, although I am not
> > really sure it is enough (it does fix the ICE, though).  If you can run the
> > testcase on the platform, would you run it with this patch applied, please?
> 
> No, unfortunately I can only look at the assembler listing.
> 
> But wait a moment...
> 
> If the object is assumed to be unaligned here this patch will
> likely just compute the unaligned address, add the offset,
> and store the result there without any special precautions.
> While the code in the if statement seems to store the expression
> on a register and move that register to the final destination.
> 
> Well, I believe this unaligned arrays are generally broken.
> 
> consider this example:

With or without the patch?  If without the patch and you are
reasonably confident the output is indeed wrong, please open a new PR
(and CC me, I'm interested) because this particular ICE is certainly
caused by trailing zero sized arrays.  I have tried reproducing your
problem with x86_64 MMX vectors but couldn't.  I do not have access to
an ARM machine to verify myself.  Thanks.
Comment 12 Bernd Edlinger 2013-08-01 02:10:35 UTC
(In reply to Martin Jambor from comment #11)
>> Well, I believe this
>> unaligned arrays are generally broken.
>> 
>> consider this example:
> With or
> without the patch?  If without the patch and you are reasonably confident
> the output is indeed wrong, please open a new PR (and CC me, I'm interested)
> because this particular ICE is certainly caused by trailing zero sized
> arrays.  I have tried reproducing your problem with x86_64 MMX vectors but
> couldn't.  I do not have access to an ARM machine to verify myself.  Thanks.

My example just produces wrong code.  I tried everything: trunk with
or without the patch does not matter, and it does not hit the ICE at
all.  I had written the example to go into the if-statement here, and
was somewhat surprised that it produces wrong code instead.

Your example aborts without the patch, and correctly produces
16 strb instructions with the patch.

That means that store_field can handle an unaligned address
in to_rtx in *some* cases.  Is this if-statement a workaround for
something that should have been fixed in store_field instead?


I'm chasing problems with unaligned structures that exist in the ARM GCC
but not on Intel.  All of that probably started in 2011 with GCC 4.6, and
meanwhile I'm getting concerned, because new processors arrive all the time
and we must soon fix this or think of alternatives to GCC :-(

I started with the discovery that volatile accesses to packed
structure members are completely broken:
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56341

However this seems to be a similar but completely different bug.
I will file it today.


PS: If you want to build a cross compiler for ARM, I can
help you out with eCos sys-include headers; if those are present
in the install tree, the cross compiler can be built on x86_64 too.
Comment 13 Martin Jambor 2013-08-01 14:40:29 UTC
Created attachment 30583 [details]
Untested fix

This is how I'd like to fix the problem if the patch passes bootstrap
and testing (on x86_64-linux, any additional testing welcome).

I believe that the problem is that while compute_record_mode in
stor-layout.c makes sure it assigns BLK mode to structs with flexible
arrays, it has no such provisions for zero length arrays
(http://gcc.gnu.org/onlinedocs/gcc-4.8.1/gcc/Zero-Length.html).  I
think that in order to avoid problems and surprises like this one, we
should assign both variable array possibilities the same mode.
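
To make the two "variable array" forms concrete (illustrative declarations
only; the struct and field names are made up):

struct with_fam { unsigned long long nr; int data[];  };  /* C99 flexible array member:
                                                              compute_record_mode forces BLKmode */
struct with_zla { unsigned long long nr; int data[0]; };  /* GNU zero-length array: no such
                                                              provision, so before this fix the
                                                              struct could keep a scalar mode */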
Comment 14 Bernd Edlinger 2013-08-02 07:11:32 UTC
Martin,

Your patch is of course OK, but MALLOC_ABI_ALIGNMENT is probably wrong too.
At least on targets with a NEON unit it should be raised to 64 bits.

If malloc really returned 4-byte-aligned pointers and you passed
such a pointer to an external function, that function might assume
naturally 8-byte-aligned pointers and fault at runtime, right?

I've just re-read the relevant ARM ABI document AAPCS:
http://infocenter.arm.com/help/topic/com.arm.doc.ihi0042e/IHI0042E_aapcs.pdf

I have not found specific alignment requirements for malloc,
but it specifies alignments of the basic types of up to 8 bytes.
Therefore I assume that the default value of MALLOC_ABI_ALIGNMENT is
generally too low for the ARM architecture.

The usual Doug Lea malloc implementation has, by design, a lowest possible
alignment of 8 bytes.

What I mean is, maybe the default for MALLOC_ABI_ALIGNMENT
should be changed to BIGGEST_ALIGNMENT.  What do you think?
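
As an illustration of the concern (a hypothetical sketch, not code from any
testcase in this PR):

#include <stdlib.h>

/* Hypothetical external function; on ARM it may rely on the natural
   8-byte alignment of long long.  */
extern void bar (long long *p);

void
foo (void)
{
  /* If MALLOC_ABI_ALIGNMENT told GCC that malloc guarantees only 4-byte
     alignment, the caller and callee could disagree about the alignment
     of p, and an access in bar could fault at runtime.  */
  long long *p = malloc (16 * sizeof *p);
  bar (p);
}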
Comment 15 Martin Jambor 2013-08-02 12:31:28 UTC
I have submitted the patch to the mailing list for review:

http://gcc.gnu.org/ml/gcc-patches/2013-08/msg00082.html


(In reply to Bernd Edlinger from comment #14)
> What I mean is, maybe the default for MALLOC_ABI_ALIGNMENT
> should be changed to BIGGEST_ALIGNMENT.  What do you think?

If ARM can trap accessing an 8 byte variable of a type that can be
defined without any fancy attributes, then unambiguously yes, malloc
has to return memory aligned to at least 8 bytes and
MALLOC_ABI_ALIGNMENT needs to be bumped to at least 64.

I am not sure whether BIGGEST_ALIGNMENT is the right value, e.g. on
x86_64 the value of MALLOC_ABI_ALIGNMENT is 64 whereas
BIGGEST_ALIGNMENT is 128.  Of course, on ARM the latter is also 64, so
at the moment it would not make a real difference.

Anyway, the policy of GCC seems to be that the default of
MALLOC_ABI_ALIGNMENT is ultra-safe and targets should override it.  So
I would suggest, again :-), that you open a separate bug and CC ARM
maintainers that should take care of it.
Comment 16 Bernd Edlinger 2013-08-02 22:49:26 UTC
(In reply to Martin Jambor from comment #15)
> Anyway, the policy of GCC
> seems to be that the default of MALLOC_ABI_ALIGNMENT is ultra-safe and
> targets should override it.  So I would suggest, again :-), that you open a
> separate bug and CC ARM maintainers that should take care of it.

Done. Bug#58065
Comment 17 David Abdurachmanov 2013-08-03 22:36:43 UTC
Just for the record.

I hit the same bug with GCC 4.8.1 on a Cortex-A9 with NEON and hard floats, while compiling the scipy 0.8.0 RPM. There was no issue on 4.8.0 or on pre-4.9.0 without NEON as the FPU.

scipy/spatial/ckdtree.c: In function '__pyx_f_5scipy_7spatial_7ckdtree_7cKDTree___query':
scipy/spatial/ckdtree.c:4194:66: internal compiler error: in expand_assignment, at expr.c:4761
         (__pyx_v_inf2->side_distances[__pyx_v_inode->split_dim]) = __pyx_f_5scipy_7spatial_7ckdtree_dabs((__pyx_v_inode->split - (__pyx_v_x[__pyx_v_inode->split_dim])));
                                                                  ^
Please submit a full bug report,
with preprocessed source if appropriate.
See <http://gcc.gnu.org/bugs.html> for instructions.
error: Command "gcc -pthread -fno-strict-aliasing -g -O2 -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/build/davidlt/481-ext/a/fc18_armv7hl_gcc481/external/py2-numpy/1.6.1/lib/python2.7/site-packages/numpy/core/include -I/build/davidlt/481-ext/a/fc18_armv7hl_gcc481/external/python/2.7.3/include/python2.7 -c scipy/spatial/ckdtree.c -o build/temp.linux-armv7l-2.7/scipy/spatial/ckdtree.o" failed with exit status 1

### GCC configuration ###

Target: armv7hl-redhat-linux-gnueabi
Configured with: ../configure --prefix=/build/davidlt/481/b/tmp/BUILDROOT/4464b4bee054158584ab26d304d9b1a8/opt/cmssw/fc18_armv7hl_gcc481/external/gcc/4.8.1 --disable-multilib --disable-nls --with-zlib=no --enable-languages=c,c++,fortran --enable-gold=yes --enable-ld=default --enable-lto --with-gmp=/build/davidlt/481/b/tmp/BUILDROOT/4464b4bee054158584ab26d304d9b1a8/opt/cmssw/fc18_armv7hl_gcc481/external/gcc/4.8.1 --with-mpfr=/build/davidlt/481/b/tmp/BUILDROOT/4464b4bee054158584ab26d304d9b1a8/opt/cmssw/fc18_armv7hl_gcc481/external/gcc/4.8.1 --with-mpc=/build/davidlt/481/b/tmp/BUILDROOT/4464b4bee054158584ab26d304d9b1a8/opt/cmssw/fc18_armv7hl_gcc481/external/gcc/4.8.1 --with-isl=/build/davidlt/481/b/tmp/BUILDROOT/4464b4bee054158584ab26d304d9b1a8/opt/cmssw/fc18_armv7hl_gcc481/external/gcc/4.8.1 --with-cloog=/build/davidlt/481/b/tmp/BUILDROOT/4464b4bee054158584ab26d304d9b1a8/opt/cmssw/fc18_armv7hl_gcc481/external/gcc/4.8.1 --enable-checking=release --build=armv7hl-redhat-linux-gnueabi --host=armv7hl-redhat-linux-gnueabi --enable-libstdcxx-time=rt --enable-bootstrap --enable-threads=posix --enable-__cxa_atexit --disable-libunwind-exceptions --enable-gnu-unique-object --with-linker-hash-style=gnu --enable-plugin --enable-initfini-array --enable-linker-build-id --disable-build-with-cxx --disable-build-poststage1-with-cxx --with-cpu=cortex-a9 --with-tune=cortex-a9 --with-arch=armv7-a --with-float=hard --with-fpu=neon --with-abi=aapcs-linux --disable-sjlj-exceptions --enable-shared CC='gcc -fPIC' CXX='c++ -fPIC' CPP=cpp CXXCPP='c++ -E'
Thread model: posix
Comment 18 David Abdurachmanov 2013-08-06 20:30:31 UTC
Tested the patch on top of the final 4.8.1 for Cortex-A9 with NEON FPU. GCC no longer ICEs while compiling scipy.
Comment 19 Richard Biener 2013-08-28 08:20:02 UTC
Barking up wrong trees.  Hacky fix looks like:

Index: gcc/expr.c
===================================================================
--- gcc/expr.c  (revision 202043)
+++ gcc/expr.c  (working copy)
@@ -4753,6 +4753,9 @@ expand_assignment (tree to, tree from, b
        {
          enum machine_mode address_mode;
          rtx offset_rtx;
+         rtx saved_to_rtx = to_rtx;
+         if (misalignp)
+           to_rtx = mem;
 
          if (!MEM_P (to_rtx))
            {
@@ -4785,6 +4788,11 @@ expand_assignment (tree to, tree from, b
          to_rtx = offset_address (to_rtx, offset_rtx,
                                   highest_pow2_factor_for_target (to,
                                                                   offset));
+         if (misalignp)
+           {
+             mem = to_rtx;
+             to_rtx = saved_to_rtx;
+           }
        }
 
       /* No action is needed if the target is not a memory and the field


The volatile bitfield case needs to be audited as well:

      /* If the bitfield is volatile, we want to access it in the
         field's mode, not the computed mode.
         If a MEM has VOIDmode (external with incomplete type),
         use BLKmode for it instead.  */
      if (MEM_P (to_rtx))
        {
          if (volatilep && flag_strict_volatile_bitfields > 0)
            to_rtx = adjust_address (to_rtx, mode1, 0);
          else if (GET_MODE (to_rtx) == VOIDmode)
            to_rtx = adjust_address (to_rtx, BLKmode, 0);
        }

checks the wrong RTX if we took the movmisalign path.  Or rather,
-fstrict-volatile-bitfields doesn't seem to work properly for
misaligned accesses?
Comment 20 Bernd Edlinger 2013-08-28 09:00:32 UTC
(In reply to Richard Biener from comment #19)
> volatile bitfield case to be audited as well:
> 
>       /* If the bitfield is volatile, we want to access it in the
>          field's mode, not the computed mode.
>          If a MEM has VOIDmode (external with incomplete type),
>          use BLKmode for it instead.  */
>       if (MEM_P (to_rtx))
>         {
>           if (volatilep && flag_strict_volatile_bitfields > 0)
>             to_rtx = adjust_address (to_rtx, mode1, 0);
>           else if (GET_MODE (to_rtx) == VOIDmode)
>             to_rtx = adjust_address (to_rtx, BLKmode, 0);
>         }
> 
> checks the wrong RTX if it got the movmisalign path.  Or rather,
> -fstrict-volatile bitfields doesn't seem to work properly for
> misaligned accesses?

Definitely.  -fstrict-volatile-bitfields does not work at all,
especially not for misaligned accesses or packed structures.
That's what Sandra Loosemore's patch is trying to fix,
as I have repeatedly said.  Please have a look at it if you can.

If flag_strict_volatile_bitfields > 0, mode1 is different here than
without it.
Comment 21 Bernd Edlinger 2013-08-29 21:29:02 UTC
Richard, I have one question:
does this code at expr.c line 4717 look right?
I mean does the code in the if block use the offset at all?

          misalignp = true;
          to_rtx = gen_reg_rtx (mode);
          mem = expand_expr (tem, NULL_RTX, VOIDmode, EXPAND_WRITE);

          /* If the misaligned store doesn't overwrite all bits, perform
             rmw cycle on MEM.  */
          if (bitsize != GET_MODE_BITSIZE (mode))
            {
              create_input_operand (&ops[0], to_rtx, mode);
              create_fixed_operand (&ops[1], mem);
              /* The movmisalign<mode> pattern cannot fail, else the assignment
                 would silently be omitted.  */
              expand_insn (icode, 2, ops);

              mem = copy_rtx (mem);
            }

What must be done to execute that block?
Create a struct with an array of structs of bitfields or something
that happens to be unaligned and has BLKmode?
Comment 22 Martin Jambor 2013-08-30 18:20:53 UTC
Created attachment 30732 [details]
Another attempt at a fix

I simply moved the decision whether to take the misalignp path or not a
bit further down in the function, below the address adjustments done for
non-NULL offset, strict volatile bit fields etc., and ran the
testsuite, expecting some fallout.  But there was none; the patch even
survives a bootstrap on x86_64-linux.  I'm hesitant to call it the
fix and would like to have a second look at it after the weekend, but if
someone wants to test it meanwhile, such input would be highly welcome.
Comment 23 Bernd Edlinger 2013-08-31 18:12:46 UTC
Martin, one of the errors with strict volatile bitfields
involved a structure like this:

struct S0 {
   signed a : 7;
   unsigned b : 28;
} __attribute__((packed));

Here member b uses SImode but the data is spread over 5 bytes.

This structure access does not go through the misalign code path,
but if it did, then I think we could get into trouble.

The code in the misalign code path assumes that bitpos == 0;
otherwise the condition if (bitsize != GET_MODE_BITSIZE (mode))
will not do the right thing.

And the misalign code path will break
-fstrict-volatile-bitfields if the access is volatile.

What I mean is: if we could find an example where this code path
is executed for a bit field, it would break strict volatile bitfields
once again, even with Sandra Loosemore's patch.

Probably there is a reason why this cannot happen, but I do not see
it.  At least there should be some assertions here.
Comment 24 Bernd Edlinger 2013-09-02 23:35:16 UTC
Created attachment 30743 [details]
Yet another untested fix

Ok, this is somewhat similar to Martin's very first attempt
to fix this.

After staring at the code for quite a long time,
I believe the misalign code path is meant for structures
or unions that can be accessed in a mode != BLKmode, which
is, for instance, the mode of the largest member of a union.

But if that mode has a movmisalign op, it should be used.
However, that is only an optimization; store_field will always
be able to store the value byte by byte if necessary.

If offset != 0, we have a non-constant (or maybe negative)
array index here.
If volatile, we have a volatile access.
If bitpos + bitsize > GET_MODE_BITSIZE, we probably
have an array with a constant index.
If any of these happens, better not take the misalign code path.

The second hunk is necessary because the if block
sets bitpos = 0 but leaves bitregion_start and bitregion_end untouched,
creating an inconsistent bit region, which may or may not trigger an assert later.
Therefore, check that there is no bit region; if there is one,
store_field can handle it.

How do you like this patch?
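
For readers without the attachment, here is a rough sketch of the guard just
described, using the variable names from the code quoted elsewhere in this PR
(this only illustrates the conditions; it is not the attached patch itself):

      /* Take the movmisalign path only when it is safe: no variable offset,
         no volatile access, and the accessed bits lie within the mode.  */
      mode = TYPE_MODE (TREE_TYPE (tem));
      if (TREE_CODE (tem) == MEM_REF
          && mode != BLKmode
          && !offset
          && !volatilep
          && bitpos + bitsize <= GET_MODE_BITSIZE (mode)
          && ((align = get_object_alignment (tem))
              < GET_MODE_ALIGNMENT (mode))
          && ((icode = optab_handler (movmisalign_optab, mode))
              != CODE_FOR_nothing))
        {
          /* ... misalign read-modify-write path as before ... */
        }
      else
        {
          /* Plain expansion; store_field copes byte by byte if necessary.  */
          to_rtx = expand_expr (tem, NULL_RTX, VOIDmode, EXPAND_WRITE);
        }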
Comment 25 Richard Biener 2013-09-03 10:12:17 UTC
I think that

      tem = get_inner_reference (to, &bitsize, &bitpos, &offset, &mode1,
                                 &unsignedp, &volatilep, true);

      if (TREE_CODE (to) == COMPONENT_REF
          && DECL_BIT_FIELD_TYPE (TREE_OPERAND (to, 1)))
        get_bit_range (&bitregion_start, &bitregion_end, to, &bitpos, &offset);

      /* If we are going to use store_bit_field and extract_bit_field,
         make sure to_rtx will be safe for multiple use.  */
      mode = TYPE_MODE (TREE_TYPE (tem));
      if (TREE_CODE (tem) == MEM_REF
          && mode != BLKmode
          && ((align = get_object_alignment (tem))
              < GET_MODE_ALIGNMENT (mode))
          && ((icode = optab_handler (movmisalign_optab, mode))
              != CODE_FOR_nothing))

is overly pessimistic.  What matters is the mode of the access, not that
of the access base object!

Which means the pre-handling of

  /* Handle misaligned stores.  */
  mode = TYPE_MODE (TREE_TYPE (to));
  if ((TREE_CODE (to) == MEM_REF
       || TREE_CODE (to) == TARGET_MEM_REF)
      && mode != BLKmode
      && !mem_ref_refers_to_non_mem_p (to)

should be made to trigger for all 'to', not just bare MEM_REF/TARGET_MEM_REF.
Then the other movmisalign path can be completely removed.
Comment 26 Bernd Edlinger 2013-09-03 11:41:22 UTC
(In reply to Richard Biener from comment #25)
> I think that
> 
>       tem = get_inner_reference (to, &bitsize, &bitpos, &offset, &mode1,
>                                  &unsignedp, &volatilep, true);
> 
>       if (TREE_CODE (to) == COMPONENT_REF
>           && DECL_BIT_FIELD_TYPE (TREE_OPERAND (to, 1)))
>         get_bit_range (&bitregion_start, &bitregion_end, to, &bitpos,
> &offset);
> 
>       /* If we are going to use store_bit_field and extract_bit_field,
>          make sure to_rtx will be safe for multiple use.  */
>       mode = TYPE_MODE (TREE_TYPE (tem));
>       if (TREE_CODE (tem) == MEM_REF
>           && mode != BLKmode
>           && ((align = get_object_alignment (tem))
>               < GET_MODE_ALIGNMENT (mode))
>           && ((icode = optab_handler (movmisalign_optab, mode))
>               != CODE_FOR_nothing))
> 
> is overly pessimistic.  What matters is the mode of the access, not that
> of the access base object!
> 
> Which means the pre-handling of
> 
>   /* Handle misaligned stores.  */
>   mode = TYPE_MODE (TREE_TYPE (to));
>   if ((TREE_CODE (to) == MEM_REF
>        || TREE_CODE (to) == TARGET_MEM_REF)
>       && mode != BLKmode
>       && !mem_ref_refers_to_non_mem_p (to)
> 
> should be made to trigger for all 'to', not just bare MEM_REF/TARGET_MEM_REF.
> Then the other movmisalign path can be completely removed.

struct s
{
  int x:19;
} __attribute__((packed));
int foo(struct s *s)
{
  s->x=1;
}

In this case of a packed field, TREE_CODE (to) will be COMPONENT_REF
and TYPE_MODE (TREE_TYPE (to)) will be SImode.
If this went through the pre-handling, it would not work for packed bit-fields.

This handling is just for: int *x;

*x = 1;

or: struct s *x, *y;

*x = *y; //if struct s is not BLKmode.
Comment 27 Bernd Edlinger 2013-09-04 09:18:46 UTC
(In reply to Martin Jambor from comment #22)
> Created attachment 30732 [details]
> Another attempt at a fix
> 
> I simply moved the decision whether to go the misalignp path or not a
> bit down in the function, below the address adjustments done for
> non-NULL offset, strict volatile bit fields etc. and ran the
> testsuite, expecting some fallout.  But there was none the patch even
> survives a bootstrap on x86_64-linux.  I'm hesitant to call it the
> fix, I'd like to have a second look at it after the weekend but if
> someone wants to test meanwhile, such input would be highly welcome.

Martin, this patch is wrong;
consider this test example:

/* PR middle-end/57748 */
/* { dg-do run } */

#include <stdlib.h>

extern void abort (void);

typedef long long V
  __attribute__ ((vector_size (2 * sizeof (long long)), may_alias));

typedef double V1
  __attribute__ ((vector_size (1 * sizeof (double)), may_alias));

typedef struct S { V a; V1 b[0]; } P __attribute__((aligned (1)));

struct __attribute__((packed)) T { char c; P s; };

void __attribute__((noinline, noclone))
check (struct T *t)
{
  if (t->s.b[0][0] != 3)
    abort ();
}

int __attribute__((noinline, noclone))
get_i (void)
{
  return 0;
}

void __attribute__((noinline, noclone))
foo (P *p)
{
  V1 a = { 3 };
  int i = get_i();
  p->b[i] = a;
}

int
main ()
{
  struct T *t = (struct T *) malloc (128);

  foo (&t->s);
  check (t);

  return 0;
}

This example is designed to go through the
if (bitsize != GET_MODE_BITSIZE (mode)) path,
because the struct mode is derived from V but the bitsize
is from V1, only half of V.

The resulting code from this patch accesses out of bounds:

foo:
.LFB9:
        .cfi_startproc
        pushq   %rbx
        .cfi_def_cfa_offset 16
        .cfi_offset 3, -16
        movq    %rdi, %rbx
        subq    $16, %rsp
        .cfi_def_cfa_offset 32
        call    get_i
        cltq
        movq    .LC1(%rip), %xmm1
        addq    $2, %rax
        movdqu  (%rbx,%rax,8), %xmm0
        psrldq  $8, %xmm0
        punpcklqdq      %xmm0, %xmm1
        movdqu  %xmm1, (%rbx,%rax,8)
        addq    $16, %rsp
        .cfi_def_cfa_offset 16
        popq    %rbx
        .cfi_def_cfa_offset 8
        ret
        .cfi_endproc

while my latest patch (which has meanwhile been bootstrapped without regressions)
generates this:

foo:
.LFB9:
        .cfi_startproc
        pushq   %rbx
        .cfi_def_cfa_offset 16
        .cfi_offset 3, -16
        movq    %rdi, %rbx
        call    get_i
        movsd   .LC0(%rip), %xmm0
        cltq
        movsd   %xmm0, 16(%rbx,%rax,8)
        popq    %rbx
        .cfi_def_cfa_offset 8
        ret
        .cfi_endproc
Comment 28 Bernd Edlinger 2013-09-04 09:33:23 UTC
(In reply to Richard Biener from comment #19)
> Barking up wrong trees.  Hacky fix looks like:
> 
> Index: gcc/expr.c
> ===================================================================
> --- gcc/expr.c  (revision 202043)
> +++ gcc/expr.c  (working copy)
> @@ -4753,6 +4753,9 @@ expand_assignment (tree to, tree from, b
>         {
>           enum machine_mode address_mode;
>           rtx offset_rtx;
> +         rtx saved_to_rtx = to_rtx;
> +         if (misalignp)
> +           to_rtx = mem;
>  
>           if (!MEM_P (to_rtx))
>             {
> @@ -4785,6 +4788,11 @@ expand_assignment (tree to, tree from, b
>           to_rtx = offset_address (to_rtx, offset_rtx,
>                                    highest_pow2_factor_for_target (to,
>                                                                    offset));
> +         if (misalignp)
> +           {
> +             mem = to_rtx;
> +             to_rtx = saved_to_rtx;
> +           }
>         }
>  
>        /* No action is needed if the target is not a memory and the field
> 
> 

This patch generates wrong code too:

foo:
.LFB9:
        .cfi_startproc
        pushq   %rbx
        .cfi_def_cfa_offset 16
        .cfi_offset 3, -16
        movq    %rdi, %rbx
        subq    $16, %rsp
        .cfi_def_cfa_offset 32
        call    get_i
        movdqu  (%rbx), %xmm0
        cltq
        movq    .LC1(%rip), %xmm1
        psrldq  $8, %xmm0
        punpcklqdq      %xmm0, %xmm1
        movdqu  %xmm1, 16(%rbx,%rax,8)
        addq    $16, %rsp
        .cfi_def_cfa_offset 16
        popq    %rbx
        .cfi_def_cfa_offset 8
        ret
        .cfi_endproc

It loads *s into xmm0, modifies the low part, and
writes it back at s->b[0] and s->b[1].
Comment 29 Richard Biener 2013-09-04 09:59:50 UTC
(In reply to Bernd Edlinger from comment #28)
> (In reply to Richard Biener from comment #19)
> > Barking up wrong trees.  Hacky fix looks like:
> > 
> > Index: gcc/expr.c
> > ===================================================================
> > --- gcc/expr.c  (revision 202043)
> > +++ gcc/expr.c  (working copy)
> > @@ -4753,6 +4753,9 @@ expand_assignment (tree to, tree from, b
> >         {
> >           enum machine_mode address_mode;
> >           rtx offset_rtx;
> > +         rtx saved_to_rtx = to_rtx;
> > +         if (misalignp)
> > +           to_rtx = mem;
> >  
> >           if (!MEM_P (to_rtx))
> >             {
> > @@ -4785,6 +4788,11 @@ expand_assignment (tree to, tree from, b
> >           to_rtx = offset_address (to_rtx, offset_rtx,
> >                                    highest_pow2_factor_for_target (to,
> >                                                                    offset));
> > +         if (misalignp)
> > +           {
> > +             mem = to_rtx;
> > +             to_rtx = saved_to_rtx;
> > +           }
> >         }
> >  
> >        /* No action is needed if the target is not a memory and the field
> > 
> > 
> 
> this patch generates wrong code too:
> 
> foo:
> .LFB9:
>         .cfi_startproc
>         pushq   %rbx
>         .cfi_def_cfa_offset 16
>         .cfi_offset 3, -16
>         movq    %rdi, %rbx
>         subq    $16, %rsp
>         .cfi_def_cfa_offset 32
>         call    get_i
>         movdqu  (%rbx), %xmm0
>         cltq
>         movq    .LC1(%rip), %xmm1
>         psrldq  $8, %xmm0
>         punpcklqdq      %xmm0, %xmm1
>         movdqu  %xmm1, 16(%rbx,%rax,8)
>         addq    $16, %rsp
>         .cfi_def_cfa_offset 16
>         popq    %rbx
>         .cfi_def_cfa_offset 8
>         ret
>         .cfi_endproc
> 
> loads *s into xmm0, modifies low part,
> writes back at s->b[0] and s->b[1].

This bug is latent because we use the mode of the base object to query
for movmisalign_optab (and use it).  It highlights the issue I raised
in my last comment.

So, new, completely untested patch:

Index: gcc/expr.c
===================================================================
--- gcc/expr.c  (revision 202240)
+++ gcc/expr.c  (working copy)
@@ -4646,10 +4646,12 @@ expand_assignment (tree to, tree from, b
 
   /* Handle misaligned stores.  */
   mode = TYPE_MODE (TREE_TYPE (to));
-  if ((TREE_CODE (to) == MEM_REF
-       || TREE_CODE (to) == TARGET_MEM_REF)
-      && mode != BLKmode
-      && !mem_ref_refers_to_non_mem_p (to)
+  if (mode != BLKmode
+      && (DECL_P (to)
+         || handled_component_p (to)
+         || ((TREE_CODE (to) == MEM_REF
+              || TREE_CODE (to) == TARGET_MEM_REF)
+             && !mem_ref_refers_to_non_mem_p (to)))
       && ((align = get_object_alignment (to))
          < GET_MODE_ALIGNMENT (mode))
       && (((icode = optab_handler (movmisalign_optab, mode))
@@ -4696,7 +4698,6 @@ expand_assignment (tree to, tree from, b
       int unsignedp;
       int volatilep = 0;
       tree tem;
-      bool misalignp;
       rtx mem = NULL_RTX;
 
       push_temp_slots ();
@@ -4707,40 +4708,7 @@ expand_assignment (tree to, tree from, b
          && DECL_BIT_FIELD_TYPE (TREE_OPERAND (to, 1)))
        get_bit_range (&bitregion_start, &bitregion_end, to, &bitpos, &offset);
 
-      /* If we are going to use store_bit_field and extract_bit_field,
-        make sure to_rtx will be safe for multiple use.  */
-      mode = TYPE_MODE (TREE_TYPE (tem));
-      if (TREE_CODE (tem) == MEM_REF
-         && mode != BLKmode
-         && ((align = get_object_alignment (tem))
-             < GET_MODE_ALIGNMENT (mode))
-         && ((icode = optab_handler (movmisalign_optab, mode))
-             != CODE_FOR_nothing))
-       {
-         struct expand_operand ops[2];
-
-         misalignp = true;
-         to_rtx = gen_reg_rtx (mode);
-         mem = expand_expr (tem, NULL_RTX, VOIDmode, EXPAND_WRITE);
-
-         /* If the misaligned store doesn't overwrite all bits, perform
-            rmw cycle on MEM.  */
-         if (bitsize != GET_MODE_BITSIZE (mode))
-           {
-             create_input_operand (&ops[0], to_rtx, mode);
-             create_fixed_operand (&ops[1], mem);
-             /* The movmisalign<mode> pattern cannot fail, else the assignment
-                would silently be omitted.  */
-             expand_insn (icode, 2, ops);
-
-             mem = copy_rtx (mem);
-           }
-       }
-      else
-       {
-         misalignp = false;
-         to_rtx = expand_expr (tem, NULL_RTX, VOIDmode, EXPAND_WRITE);
-       }
+      to_rtx = expand_expr (tem, NULL_RTX, VOIDmode, EXPAND_WRITE);
 
       /* If the bitfield is volatile, we want to access it in the
         field's mode, not the computed mode.
@@ -4879,17 +4847,6 @@ expand_assignment (tree to, tree from, b
                                  get_alias_set (to), nontemporal);
        }
 
-      if (misalignp)
-       {
-         struct expand_operand ops[2];
-
-         create_fixed_operand (&ops[0], mem);
-         create_input_operand (&ops[1], to_rtx, mode);
-         /* The movmisalign<mode> pattern cannot fail, else the assignment
-            would silently be omitted.  */
-         expand_insn (icode, 2, ops);
-       }
-
       if (result)
        preserve_temp_slots (result);
       pop_temp_slots ();
Comment 30 Richard Biener 2013-09-04 10:04:25 UTC
At least the wrong-code issue is latent on the 4.7 branch.
Comment 31 Bernd Edlinger 2013-09-04 10:17:34 UTC
(In reply to Richard Biener from comment #29)
> (In reply to Bernd Edlinger from comment #28)
> > (In reply to Richard Biener from comment #19)
> > > Barking up wrong trees.  Hacky fix looks like:
> > > 
> > > Index: gcc/expr.c
> > > ===================================================================
> > > --- gcc/expr.c  (revision 202043)
> > > +++ gcc/expr.c  (working copy)
> > > @@ -4753,6 +4753,9 @@ expand_assignment (tree to, tree from, b
> > >         {
> > >           enum machine_mode address_mode;
> > >           rtx offset_rtx;
> > > +         rtx saved_to_rtx = to_rtx;
> > > +         if (misalignp)
> > > +           to_rtx = mem;
> > >  
> > >           if (!MEM_P (to_rtx))
> > >             {
> > > @@ -4785,6 +4788,11 @@ expand_assignment (tree to, tree from, b
> > >           to_rtx = offset_address (to_rtx, offset_rtx,
> > >                                    highest_pow2_factor_for_target (to,
> > >                                                                    offset));
> > > +         if (misalignp)
> > > +           {
> > > +             mem = to_rtx;
> > > +             to_rtx = saved_to_rtx;
> > > +           }
> > >         }
> > >  
> > >        /* No action is needed if the target is not a memory and the field
> > > 
> > > 
> > 
> > this patch generates wrong code too:
> > 
> > foo:
> > .LFB9:
> >         .cfi_startproc
> >         pushq   %rbx
> >         .cfi_def_cfa_offset 16
> >         .cfi_offset 3, -16
> >         movq    %rdi, %rbx
> >         subq    $16, %rsp
> >         .cfi_def_cfa_offset 32
> >         call    get_i
> >         movdqu  (%rbx), %xmm0
> >         cltq
> >         movq    .LC1(%rip), %xmm1
> >         psrldq  $8, %xmm0
> >         punpcklqdq      %xmm0, %xmm1
> >         movdqu  %xmm1, 16(%rbx,%rax,8)
> >         addq    $16, %rsp
> >         .cfi_def_cfa_offset 16
> >         popq    %rbx
> >         .cfi_def_cfa_offset 8
> >         ret
> >         .cfi_endproc
> > 
> > loads *s into xmm0, modifies low part,
> > writes back at s->b[0] and s->b[1].
> 
> This bug is latent because we use the mode of the base object to query
> for movmisalign_optab (and use it).  It highlights the issue I raised
> in my last comment.
> 
> So, new, completely untested patch:
> 
> Index: gcc/expr.c
> ===================================================================
> --- gcc/expr.c  (revision 202240)
> +++ gcc/expr.c  (working copy)
> @@ -4646,10 +4646,12 @@ expand_assignment (tree to, tree from, b
>  
>    /* Handle misaligned stores.  */
>    mode = TYPE_MODE (TREE_TYPE (to));
> -  if ((TREE_CODE (to) == MEM_REF
> -       || TREE_CODE (to) == TARGET_MEM_REF)
> -      && mode != BLKmode
> -      && !mem_ref_refers_to_non_mem_p (to)
> +  if (mode != BLKmode
> +      && (DECL_P (to)
> +         || handled_component_p (to)
> +         || ((TREE_CODE (to) == MEM_REF
> +              || TREE_CODE (to) == TARGET_MEM_REF)
> +             && !mem_ref_refers_to_non_mem_p (to)))
>        && ((align = get_object_alignment (to))
>           < GET_MODE_ALIGNMENT (mode))
>        && (((icode = optab_handler (movmisalign_optab, mode))
> @@ -4696,7 +4698,6 @@ expand_assignment (tree to, tree from, b
>        int unsignedp;
>        int volatilep = 0;
>        tree tem;
> -      bool misalignp;
>        rtx mem = NULL_RTX;
>  
>        push_temp_slots ();
> @@ -4707,40 +4708,7 @@ expand_assignment (tree to, tree from, b
>           && DECL_BIT_FIELD_TYPE (TREE_OPERAND (to, 1)))
>         get_bit_range (&bitregion_start, &bitregion_end, to, &bitpos,
> &offset);
>  
> -      /* If we are going to use store_bit_field and extract_bit_field,
> -        make sure to_rtx will be safe for multiple use.  */
> -      mode = TYPE_MODE (TREE_TYPE (tem));
> -      if (TREE_CODE (tem) == MEM_REF
> -         && mode != BLKmode
> -         && ((align = get_object_alignment (tem))
> -             < GET_MODE_ALIGNMENT (mode))
> -         && ((icode = optab_handler (movmisalign_optab, mode))
> -             != CODE_FOR_nothing))
> -       {
> -         struct expand_operand ops[2];
> -
> -         misalignp = true;
> -         to_rtx = gen_reg_rtx (mode);
> -         mem = expand_expr (tem, NULL_RTX, VOIDmode, EXPAND_WRITE);
> -
> -         /* If the misaligned store doesn't overwrite all bits, perform
> -            rmw cycle on MEM.  */
> -         if (bitsize != GET_MODE_BITSIZE (mode))
> -           {
> -             create_input_operand (&ops[0], to_rtx, mode);
> -             create_fixed_operand (&ops[1], mem);
> -             /* The movmisalign<mode> pattern cannot fail, else the
> assignment
> -                would silently be omitted.  */
> -             expand_insn (icode, 2, ops);
> -
> -             mem = copy_rtx (mem);
> -           }
> -       }
> -      else
> -       {
> -         misalignp = false;
> -         to_rtx = expand_expr (tem, NULL_RTX, VOIDmode, EXPAND_WRITE);
> -       }
> +      to_rtx = expand_expr (tem, NULL_RTX, VOIDmode, EXPAND_WRITE);
>  
>        /* If the bitfield is volatile, we want to access it in the
>          field's mode, not the computed mode.
> @@ -4879,17 +4847,6 @@ expand_assignment (tree to, tree from, b
>                                   get_alias_set (to), nontemporal);
>         }
>  
> -      if (misalignp)
> -       {
> -         struct expand_operand ops[2];
> -
> -         create_fixed_operand (&ops[0], mem);
> -         create_input_operand (&ops[1], to_rtx, mode);
> -         /* The movmisalign<mode> pattern cannot fail, else the assignment
> -            would silently be omitted.  */
> -         expand_insn (icode, 2, ops);
> -       }
> -
>        if (result)
>         preserve_temp_slots (result);
>        pop_temp_slots ();

Richard, this will break the bitregion_start/end handling.

IMHO the misalign path is just a very old performance improvement.
It is totally under the control of the target hooks
whether it is executed.
It is only meant for structures that fit into registers,
like:

struct
{
  double x;
};

=> mode = double

or

union
{
  double x;
  float y[2];
};

=> mode = int64


When it is dangerous to go there, we do not have to.

I am beginning to think that my patch is right.
Comment 32 Bernd Edlinger 2013-09-06 07:05:46 UTC
Ok, now I dared to propose my patch on the gcc-patches list:

http://gcc.gnu.org/ml/gcc-patches/2013-09/msg00363.html

As I can see now, both tests fail even on gcc-4.6 (Ubuntu 12.04).
Comment 33 Richard Biener 2013-09-06 08:12:03 UTC
My point is that

> -      mode = TYPE_MODE (TREE_TYPE (tem));
> -      if (TREE_CODE (tem) == MEM_REF
> -         && mode != BLKmode
> -         && ((align = get_object_alignment (tem))
> -             < GET_MODE_ALIGNMENT (mode))
> -         && ((icode = optab_handler (movmisalign_optab, mode))

is wrong because it asks whether we can do a V2DFmode (the mode of the
whole struct!) unaligned store, to which the backend says yes!, and
it builds a V2DFmode store for us to use.

But the _access_ is V1DFmode!  The access is not to 'tem' but to 'to'!
Using the mode of 'tem' for movmisalign is just wrong.
Comment 34 Bernd Edlinger 2013-09-09 06:41:26 UTC
Hmm, this was looking like a working example (if it is valid C at all),
but after some thought, I now see it exposes a data store race:
#include <stdio.h>
#include <string.h>

typedef long long V
  __attribute__ ((vector_size (2 * sizeof (long long)), may_alias));
union x
{
   long long a;
   float b;
} __attribute__((aligned(1))) ;

struct s
{
  union x xx[0];
  V x;

} __attribute__((packed));

void __attribute__((noinline, noclone))
foo(struct s * x)
{
  x->xx[0].a = -1;
  x->xx[0].b = 3.14;
  x->x[1] = 0x123456789ABCDEF;
}

int
main()
{
  struct s ss;
  memset(&ss, 0, sizeof(ss));
  foo (&ss);
  printf("%f %llX\n", ss.xx[0].b, ss.xx[0].a);
  printf("%llX %llX\n", ss.x[0], ss.x[1]);
}

The resulting code is:

foo:
.LFB23:
        .cfi_startproc
        movdqu  (%rdi), %xmm0
        movabsq $-4294967296, %rdx
        movq    .LC1(%rip), %xmm1
        psrldq  $8, %xmm0
        punpcklqdq      %xmm0, %xmm1
        movdqu  %xmm1, (%rdi)
        movdqu  (%rdi), %xmm2
        movdqa  %xmm2, -24(%rsp)
        movq    -24(%rsp), %rax
        andq    %rdx, %rax
        orq     $1078523331, %rax
        movq    %rax, -24(%rsp)
        movdqa  -24(%rsp), %xmm3
        movdqu  %xmm3, (%rdi)
        movdqu  (%rdi), %xmm0
        movhps  .LC2(%rip), %xmm0
        movdqu  %xmm0, (%rdi)
        ret

This shows that all read/write accesses are done 16 bytes at a time, which creates
a forbidden data store race.
Looks like I shot my own patch down now :-)
Comment 35 Bernd Edlinger 2013-09-09 13:01:54 UTC
Well, this bug seems to have a symmetrical twin on the read side.

In the above example, if I add this:

  if (x->xx[0].b != 3.14F || x->xx[1].a != 0x123456789ABCDEF)
    abort ();

this gets compiled to:

        movdqu  (%rdi), %xmm0
        ucomiss .LC0(%rip), %xmm0
        jp      .L2
        jne     .L2
        movdqu  (%rdi), %xmm0
        movabsq $81985529216486895, %rdx
        movhlps %xmm0, %xmm1
        movq    %xmm1, %rax
        cmpq    %rdx, %rax
        jne     .L2

regardless of whether x is volatile, and even with -fstrict-volatile-bitfields (!)
Comment 36 Bernd Edlinger 2013-09-10 07:55:53 UTC
Created attachment 30782 [details]
Next try for a fix

OK, I removed the misalign code path completely
and found a one-line fix for the read side too.

So here is my next try at fixing this issue.
It seems to bootstrap on x86_64 and all test cases pass.

Additional testing would be welcome.

Any comments?
Comment 37 Martin Jambor 2013-09-18 14:44:19 UTC
Created attachment 30854 [details]
Testcase for both the assignment and read issues

For the record, this is a slightly extended version of the original x86_64
testcase which also exposes the bug in the read expansion.

So far I have not been able to come up with a grave problem with
passing EXPAND_MEMORY to the expansion of tem (and looking at the history
of this code, the passed modifier has changed quite a lot, for reasons
that are not immediately apparent).  In order to use movmisalign_optab
instructions when we can, we should probably only do so if tem is a
structure with a zero-sized array.
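
A possible shape for such a zero-sized-trailing-array check, sketched here
with standard tree accessors (this helper is hypothetical and not part of any
posted patch):

static bool
has_trailing_zero_sized_array_p (tree type)
{
  tree last = NULL_TREE;

  if (TREE_CODE (type) != RECORD_TYPE)
    return false;
  /* Find the last field of the record.  */
  for (tree f = TYPE_FIELDS (type); f; f = DECL_CHAIN (f))
    if (TREE_CODE (f) == FIELD_DECL)
      last = f;
  /* A GNU zero-length array has an ARRAY_TYPE whose TYPE_SIZE is zero;
     a C99 flexible array member has no TYPE_SIZE at all.  */
  return (last != NULL_TREE
          && TREE_CODE (TREE_TYPE (last)) == ARRAY_TYPE
          && TYPE_SIZE (TREE_TYPE (last)) != NULL_TREE
          && integer_zerop (TYPE_SIZE (TREE_TYPE (last))));
}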
Comment 38 Bernd Edlinger 2013-09-18 16:31:57 UTC
Richard,

I've split up the patch as requested:

Part 1 was posted here, but not yet approved:
http://gcc.gnu.org/ml/gcc-patches/2013-09/msg01292.html

Just for the record, part 1 has been bootstrapped and regression tested.

For part 2:
For simplicity I'd propose to simply pass EXPAND_MEMORY here.
These misaligned uses are hopefully quite rare.
 
I will replace my test cases 3+4 with Martin's test case,
maybe without the get_i() because otherwise the test will die
from the ICE before the check runs.

Will do this after Part 1 if you agree...
Comment 39 Bernd Edlinger 2013-09-19 09:03:08 UTC
(In reply to Martin Jambor from comment #37)
> In order to use movmisalign_optab
> instructions when we can, we should probably only do it if tem is a
> structure with a zero sized array.

I also see possible problems with volatile accesses here.
If we do not really need the whole structure, reading everything
would be wrong.
Comment 40 rguenther@suse.de 2013-09-20 07:38:13 UTC
On Wed, 18 Sep 2013, bernd.edlinger at hotmail dot de wrote:

> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57748
> 
> --- Comment #38 from Bernd Edlinger <bernd.edlinger at hotmail dot de> ---
> Richard,
> 
> I've split up the patch as requested:
> 
> Part 1 was posted here, but not yet approved:
> http://gcc.gnu.org/ml/gcc-patches/2013-09/msg01292.html
> 
> Just for the record, part 1 has been bootstrapped and regression tested.
> 
> For part 2:
> For simplicity I'd propose to use simply pass EXPAND_MEMORY here.
> These misaligned uses are quite rare hopefully.
> 
> I will replace my test cases 3+4 with Martin's test case,
> maybe without the get_i() because otherwise the test will die
> from the ICE before the check runs.
> 
> Will do this after Part 1 if you agree...

The issue is that we added the movmisalign-on-component-ref path
for _correctness_ reasons.  Now, if all misaligned accesses end up
being expanded using BLKmode moves then it probably works OK (but
then they are going to be horribly inefficient and wouldn't have
triggered the movmisalign path anyway ...).

But I'm willing to make the experiment - we've got plenty of (well ...)
time to watch out for fallout from this change (and it does simplify
code, something I always like ;))
Comment 41 Bernd Edlinger 2013-09-20 13:50:33 UTC
(In reply to rguenther@suse.de from comment #40)
> The issue is that we added the movmisalign-on-component-ref path
> for _correctness_ reasons.  Now, if all misaligned accesses end up
> being expanded using BLKmode moves then it probably works ok (but
> then they are going to be horribly inefficient and wouldn't have
> triggered the movmisalign path anyway ...).
> 
> But I'm willing to make the experiment - we've got plenty of (well ...)
> time to watch out for fallout of this change (and it does simplify
> code, something I always like ;))

Ok, then I'll commit part 1 later in the evening.

From my experiments I've learned that when expanding via
mov_optab instead of movmisalign_optab, the backend
looks at the MEM_ALIGN() info of both sides and, based on that
information, expands either a movdqa or a movdqu.  And
in the example that you mentioned, the MEM_ALIGN apparently
must have been wrong.
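
As a rough illustration of that point (my own example, not from this thread):
on x86_64 the same 16-byte struct copy can be expanded as an aligned or an
unaligned vector move depending on the alignment the compiler has recorded
for the operands, so a wrong MEM_ALIGN can turn into an aligned instruction
on an address that is not actually aligned.

/* Illustration only.  At -O2 on x86_64, copy_aligned may use movdqa
   while copy_unaligned must use movdqu (or a movmisalign expansion),
   because the packed struct gives its vector member no alignment
   guarantee.  */
typedef int v4si __attribute__ ((vector_size (16)));

struct aligned_box   { v4si v; };
struct unaligned_box { char c; v4si v; } __attribute__ ((packed));

void
copy_aligned (struct aligned_box *d, struct aligned_box *s)
{
  d->v = s->v;
}

void
copy_unaligned (struct unaligned_box *d, struct unaligned_box *s)
{
  d->v = s->v;
}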
Comment 42 Bernd Edlinger 2013-09-20 14:10:17 UTC
Author: edlinger
Date: Fri Sep 20 14:10:14 2013
New Revision: 202778

URL: http://gcc.gnu.org/viewcvs?rev=202778&root=gcc&view=rev
Log:
2013-09-20  Bernd Edlinger  <bernd.edlinger@hotmail.de>

        PR middle-end/57748
        * expr.c (expand_assignment): Remove misalignp code path.

testsuite/

        PR middle-end/57748
        * gcc.dg/torture/pr57748-1.c: New test.
        * gcc.dg/torture/pr57748-2.c: New test.


Added:
    trunk/gcc/testsuite/gcc.dg/torture/pr57748-1.c
    trunk/gcc/testsuite/gcc.dg/torture/pr57748-2.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/expr.c
    trunk/gcc/testsuite/ChangeLog
Comment 43 Jakub Jelinek 2013-10-16 09:49:10 UTC
GCC 4.8.2 has been released.
Comment 44 Bernd Edlinger 2013-11-14 15:19:45 UTC
Hi Richard,

this 59143 issue is very similar to what Sandra encountered with her patch.

but it is not volatile in that example, and I cannot reproduce it on ARM.
I think it is possible, though, that the other problem still shows up
if I apply my part 2 patch together with Sandra's bit-field patch.

I think the mode of the struct is QImode, from expanding the inner
reference; it does not have room for the array and is not overwritten.
Maybe Sandra's updated patch contains the key to resolving this:

+  else if (MEM_P (str_rtx)
+          && MEM_VOLATILE_P (str_rtx) // REMOVE THIS.
+          && flag_strict_volatile_bitfields > 0) // AND THIS.
+    /* This is a case where -fstrict-volatile-bitfields doesn't apply
+       because we can't do a single access in the declared mode of the field.
+       Since the incoming STR_RTX has already been adjusted to that mode,
+       fall back to word mode for subsequent logic.  */
+    str_rtx = adjust_address (str_rtx, word_mode, 0);

Can you approve my part 2?
Then we can see whether the recursion still happens there.

Bernd.
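
A hypothetical example of the case that hunk is about (names invented here,
not taken from Sandra's patch or its testcases): a volatile bit-field that
cannot be accessed in a single operation of its declared mode, so with
-fstrict-volatile-bitfields the expander has to fall back to some other
access mode.

/* Invented example: 'addr' is declared with a 32-bit type, but under the
   packed layout it straddles a 32-bit boundary, so no single SImode access
   can cover it.  Compile with -fstrict-volatile-bitfields.  */
struct regs
{
  volatile unsigned int ctrl : 7;
  volatile unsigned int addr : 29;
} __attribute__ ((packed));

unsigned int
read_addr (struct regs *r)
{
  return r->addr;
}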
Comment 45 Paulo J. Matos 2013-11-21 14:26:51 UTC
Can we backport Bernd's patch of the 20th of September to 4.8? 

I just hit this ICE in 4.8, and I guess we should still try to fix it there since 4.8 is still maintained.
Comment 46 Bernd Edlinger 2013-11-21 14:38:23 UTC
Note:
a) the patch should be OK for 4.8 but not for 4.7.
b) the read-side does not ICE but will likely generate invalid code.
=> a patch for the read-side is still pending for review:
  http://gcc.gnu.org/ml/gcc-patches/2013-11/msg02291.html
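
A read-side counterpart of the store sketch shown earlier (again invented
for illustration, not the testcase from the patch under review): here the
expansion of the load does not ICE, but can silently assume an alignment
the object does not actually have.

/* Invented read-side counterpart of the earlier store sketch.  */
typedef long long V
  __attribute__ ((vector_size (2 * sizeof (long long)), may_alias));

struct S
{
  char c;
  V a[0];
} __attribute__ ((packed));

V
load (struct S *p)
{
  return p->a[0];   /* misaligned read through the zero-sized array */
}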
Comment 47 Paulo J. Matos 2013-11-21 14:40:22 UTC
Would like to add that I backported the patch locally and all the testsuite is passing until now. The ICE I initially got is not gone as well. So I can confirm that as far as I know, the patch is indeed fine in 4.8.
Comment 48 Paulo J. Matos 2013-11-22 11:26:49 UTC
(In reply to Paulo J. Matos from comment #47)
> Would like to add that I backported the patch locally and all the testsuite
> is passing until now. The ICE I initially got is not gone as well. So I can
> confirm that as far as I know, the patch is indeed fine in 4.8.

s/not/now/.

Obviously I meant that the patch applies cleanly and works for me in 4.8.
Comment 49 Paulo J. Matos 2013-11-22 11:38:46 UTC
I noticed that enabling misaligned moves has created a few test failures on my port, namely execute.exp=20051113-1.c. It was generating one too many moves to dereference the structure in function Sum.

Applying patch posted by Bernd:
http://gcc.gnu.org/ml/gcc-patches/2013-11/msg02291.html

fixes the problems I was seeing.
The patch does not apply cleanly to 4.8 (3 out of 14 hunks fail) but they are easily fixable.

Again, a request for this to be approved in master and later in 4.8 (since it leads to wrong code generation).
Comment 50 Bernd Edlinger 2013-11-28 01:53:55 UTC
(In reply to Paulo J. Matos from comment #49)
> I noticed that enabling misaligned moves has created a few test failures on
> my port, namely execute.exp=20051113-1.c. It was generating one too many
> moves to dereference the structure in function Sum.

Just out of curiosity:
on which target did you reproduce this problem in 20051113-1.c?
Comment 51 Paulo J. Matos 2013-11-28 09:50:05 UTC
This was in a private VLIW SIMD port.
Comment 52 Bernd Edlinger 2014-01-08 17:25:40 UTC
Author: edlinger
Date: Wed Jan  8 17:25:38 2014
New Revision: 206437

URL: http://gcc.gnu.org/viewcvs?rev=206437&root=gcc&view=rev
Log:
2014-01-08  Bernd Edlinger  <bernd.edlinger@hotmail.de>

        PR middle-end/57748
        * expr.h (expand_expr_real, expand_expr_real_1): Add new parameter
        inner_reference_p.
        (expand_expr, expand_normal): Adjust.
        * expr.c (expand_expr_real, expand_expr_real_1): Add new parameter
        inner_reference_p. Use inner_reference_p to expand inner references.
        (store_expr): Adjust.
        * cfgexpand.c (expand_call_stmt): Adjust.

testsuite:
2014-01-08  Bernd Edlinger  <bernd.edlinger@hotmail.de>

        PR middle-end/57748
        * gcc.dg/torture/pr57748-3.c: New test.
        * gcc.dg/torture/pr57748-4.c: New test.

Added:
    trunk/gcc/testsuite/gcc.dg/torture/pr57748-3.c
    trunk/gcc/testsuite/gcc.dg/torture/pr57748-4.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/cfgexpand.c
    trunk/gcc/expr.c
    trunk/gcc/expr.h
    trunk/gcc/testsuite/ChangeLog
Comment 53 Marek Polacek 2014-02-05 17:54:37 UTC
So fixed on the trunk?
Comment 54 Bernd Edlinger 2014-02-05 18:00:35 UTC
(In reply to Marek Polacek from comment #53)
> So fixed on the trunk?

yes, fixed on trunk.
Comment 55 Richard Biener 2014-05-22 09:04:03 UTC
GCC 4.8.3 is being released, adjusting target milestone.
Comment 56 Jakub Jelinek 2014-12-19 13:32:48 UTC
GCC 4.8.4 has been released.
Comment 57 Ramana Radhakrishnan 2015-01-16 15:48:10 UTC
(In reply to Jakub Jelinek from comment #56)
> GCC 4.8.4 has been released.

If it's too late to backport this to 4.8, we might as well close this off, targeting it for 4.9.

Ramana
Comment 58 Bernd Edlinger 2015-01-16 16:08:33 UTC
IIRC it was first fixed in 4.9.0.
The known-to-fail list above includes 4.9.0, but that is wrong.

Do you think this is still worth fixing in 4.8.5?
Comment 59 Ramana Radhakrishnan 2015-01-16 16:36:03 UTC
(In reply to Bernd Edlinger from comment #58)
> IIRC it was first fixed in 4.9.0.
> The known to fail above includes 4.9.0 but that is wrong.
Fixed - 

> 
> Do you think this is still worth to be fixed in 4.8.5 ?


I think so, given it's an ICE, but I'm not sure about the release cycles, or indeed familiar enough with the patch, to decide whether it's worth backporting. There are far too many such dangling bugs on the ARM port, and I'm just trying to clean some of them up.
Comment 60 Mikael Pettersson 2015-01-16 19:45:08 UTC
FWIW I've been including backports of r202778 and r206437 (which I think are the two relevant fixes) in my 4.8-based GCCs on x86_64, sparc64, powerpc64, m68k, armv5, and armv7 for a year now, without regressions.
Comment 61 Bernd Edlinger 2015-01-22 08:08:17 UTC
(In reply to Mikael Pettersson from comment #60)
> FWIW I've been including backports of r202778 and r206437 (which I think are
> the two relevant fixes) in my 4.8-based GCCs on x86_64, sparc64, powerpc64,
> m68k, armv5, and armv7 for a year now, without regressions.


Well Mikael,


if you have already used/tested that patch on 4.8 for such a long time,
I would very much appreciate it if you posted it on the gcc-patches mailing list.


Thanks
Bernd.
Comment 62 Mikael Pettersson 2015-01-22 08:27:53 UTC
(In reply to Bernd Edlinger from comment #61) 
> if you already have used/tested that patch on 4.8 for such a long time,
> I would appreciate it very much if you post it on the gcc-patches mailing
> list.

I'll post it this weekend, but I'll need someone else to commit it if it's approved.
Comment 63 Mikael Pettersson 2015-01-25 14:27:36 UTC
The backport request has been posted:

https://gcc.gnu.org/ml/gcc-patches/2015-01/msg02192.html
Comment 64 Bernd Edlinger 2015-01-26 23:40:06 UTC
(In reply to Mikael Pettersson from comment #63)
> The backport request has been posted:
> 
> https://gcc.gnu.org/ml/gcc-patches/2015-01/msg02192.html

OK, fine.

If you want I can commit that for you now.
Comment 65 Mikael Pettersson 2015-01-27 08:45:13 UTC
(In reply to Bernd Edlinger from comment #64)
> (In reply to Mikael Pettersson from comment #63)
> > The backport request has been posted:
> > 
> > https://gcc.gnu.org/ml/gcc-patches/2015-01/msg02192.html
> 
> OK, fine.
> 
> If you want I can commit that for you now.

Thanks, yes please do.
Comment 66 Bernd Edlinger 2015-01-27 17:07:57 UTC
Author: edlinger
Date: Tue Jan 27 17:07:24 2015
New Revision: 220179

URL: https://gcc.gnu.org/viewcvs?rev=220179&root=gcc&view=rev
Log:
Backport from mainline
fix for PR middle-end/57748

Modified:
    branches/gcc-4_8-branch/gcc/ChangeLog
    branches/gcc-4_8-branch/gcc/cfgexpand.c
    branches/gcc-4_8-branch/gcc/expr.c
    branches/gcc-4_8-branch/gcc/expr.h
    branches/gcc-4_8-branch/gcc/testsuite/ChangeLog
Comment 67 Bernd Edlinger 2015-01-27 17:10:05 UTC
done.