Bug 66917

Summary: [4.9 regression] ARM: NEON: memcpy compiles to vst1 with incorrect alignment
Product: gcc Reporter: Mark H Weaver <mhw>
Component: tree-optimizationAssignee: Richard Biener <rguenth>
Status: RESOLVED FIXED    
Severity: major CC: mikpelinux, ramana, rguenth
Priority: P3 Keywords: wrong-code
Version: 4.9.3   
Target Milestone: 5.3   
Host: arm-unknown-linux-gnueabihf Target: arm-unknown-linux-gnueabihf
Build: arm-unknown-linux-gnueabihf Known to work: 5.3.0, 6.0
Known to fail: 4.8.5, 4.9.3, 5.1.0, 5.2.0 Last reconfirmed: 2015-07-29 00:00:00
Attachments: Minimal example code that is miscompiled
patch

Description Mark H Weaver 2015-07-17 20:06:43 UTC
Created attachment 36009 [details]
Minimal example code that is miscompiled

The attached C source code is mis-compiled by GCC 4.9.3 with -O3 and -mfpu=neon.  Calls to memcpy between a uint8_t* parameter and a local variable are compiled into vld1.64 and vst1.64 instructions with an alignment field that specifies that the memory address (the uint8_t* parameter) is aligned on a 64-bit boundary, although there is no basis for assuming such an alignment.

Here's the C code:

  #include <stdint.h>
  #include <string.h>

  void
  test_neon_load_store_alignment (const uint8_t *ap,
                                  const uint8_t *bp,
                                  uint8_t *outp)
  {
    union {
      uint64_t u[2];
      uint8_t c[16];
    } a, b;
  
    memcpy (a.c, ap, 16);
    memcpy (b.c, bp, 16);
    a.u[0] ^= b.u[0];
    a.u[1] ^= b.u[1];
    memcpy (outp, a.c, 16);
  }

When natively compiled with -S -O3 using GCC 4.9.3 configured with "--build=arm-unknown-linux-gnueabihf --with-arch=armv7-a --with-float=hard --with-mode=thumb --with-fpu=neon", here's the output:

	.syntax unified
	.arch armv7-a
	.eabi_attribute 27, 3
	.eabi_attribute 28, 1
	.fpu neon
	.eabi_attribute 20, 1
	.eabi_attribute 21, 1
	.eabi_attribute 23, 3
	.eabi_attribute 24, 1
	.eabi_attribute 25, 1
	.eabi_attribute 26, 2
	.eabi_attribute 30, 2
	.eabi_attribute 34, 1
	.eabi_attribute 18, 4
	.thumb
	.file	"foo.c"
	.text
	.align	2
	.global	test_neon_load_store_alignment
	.thumb
	.thumb_func
	.type	test_neon_load_store_alignment, %function
test_neon_load_store_alignment:
	@ args = 0, pretend = 0, frame = 0
	@ frame_needed = 0, uses_anonymous_args = 0
	@ link register save eliminated.
	vld1.64	{d16-d17}, [r0:64]
	vld1.64	{d18-d19}, [r1:64]
	veor	q8, q8, q9
	vst1.64	{d16-d17}, [r2:64]
	bx	lr
	.size	test_neon_load_store_alignment, .-test_neon_load_store_alignment
	.ident	"GCC: (GNU) 4.9.3"
	.section	.note.GNU-stack,"",%progbits

Here, r0 contains ap, r1 contains bp, and r2 contains outp.  The associated operands of the vld1.64 and vst1.64 instructions are [r<n>:64], which specify that the memory address is known to be aligned on a 64-bit boundary, although that is not necessarily the case.

See section A8.6.307 (page A8-602) of the ARM v7 Architecture Reference Manual, which spells out quite clearly that for this instruction:

  if (address MOD alignment) != 0 then GenerateAlignmentException();

and that in this case, alignment == 8.

This may be related to bug #57271.

If needed, I can provide instructions for how to reproduce the exact same GCC binary I'm using, and all other software that it depends on, using GNU Guix <http://gnu.org/s/guix>.

FYI, the attached minimal test case was distilled from code in OpenSSL 1.0.2d that is miscompiled by GCC 4.9.3 in our armhf port targetting NEON, and leads to a Bus Error on the Novena laptop.
Comment 1 Mikael Pettersson 2015-07-18 12:21:43 UTC
I can reproduce the alignment faults on armv7l-linux-gnueabi with -O3 -mfpu=neon and gcc-4.8.5/4.9.3/5.2/6.0, but not with gcc-4.7.4.
Comment 2 Mikael Pettersson 2015-07-19 18:28:22 UTC
This test case changed behaviour twice in the 4.7->4.8 development cycle.  First r185807 broke it by replacing code for unaligned memory accesses with code requiring more alignment than present in the source:

--- pr66917.s-r185806   2015-07-19 17:16:23.536116155 +0200
+++ pr66917.s-r185807   2015-07-19 17:13:23.016388416 +0200
@@ -17,42 +17,13 @@
        .global test_neon_load_store_alignment
        .type   test_neon_load_store_alignment, %function
 test_neon_load_store_alignment:
-       @ args = 0, pretend = 0, frame = 32
+       @ args = 0, pretend = 0, frame = 0
        @ frame_needed = 0, uses_anonymous_args = 0
        @ link register save eliminated.
-       mov     r3, r0
-       stmfd   sp!, {r4, r5, r6, r7}
-       mov     r7, r1
-       ldr     r0, [r0, #0]    @ unaligned
-       mov     r6, r2
-       sub     sp, sp, #32
-       ldr     r1, [r3, #4]    @ unaligned
-       mov     r5, sp
-       ldr     r2, [r3, #8]    @ unaligned
-       add     r4, sp, #16
-       ldr     r3, [r3, #12]   @ unaligned
-       mov     ip, sp
-       stmia   r5!, {r0, r1, r2, r3}
-       ldr     r0, [r7, #0]    @ unaligned
-       ldr     r1, [r7, #4]    @ unaligned
-       ldr     r2, [r7, #8]    @ unaligned
-       ldr     r3, [r7, #12]   @ unaligned
-       fldd    d16, [sp, #0]   @ int
-       fldd    d19, [sp, #8]   @ int
-       stmia   r4!, {r0, r1, r2, r3}
-       fldd    d18, [sp, #16]  @ int
-       veor    d17, d16, d18
-       fldd    d18, [sp, #24]  @ int
-       fstd    d17, [sp, #0]   @ int
-       veor    d16, d19, d18
-       fstd    d16, [sp, #8]   @ int
-       ldmia   ip!, {r0, r1, r2, r3}
-       str     r0, [r6, #0]    @ unaligned
-       str     r1, [r6, #4]    @ unaligned
-       str     r2, [r6, #8]    @ unaligned
-       str     r3, [r6, #12]   @ unaligned
-       add     sp, sp, #32
-       ldmfd   sp!, {r4, r5, r6, r7}
+       vldmia  r0, {d18-d19}
+       vldmia  r1, {d16-d17}
+       veor    q8, q9, q8
+       vstmia  r2, {d16-d17}
        bx      lr
        .size   test_neon_load_store_alignment, .-test_neon_load_store_alignment
        .section        .text.startup,"ax",%progbits

On Linux, this code SIGBUSes because the kernel can't fix up the first misaligned access:

[292105.326391] Alignment trap: not handling instruction ecd02b04 at [<00008e84>]
[292105.396370] Unhandled fault: alignment exception (0x001) at 0x0008b109

Then r191399 changed it again by replacing the vldm/vstm instructions with vld1.64/vst1.64 instructions:

--- pr66917.s-r191398   2015-07-19 19:12:15.815583139 +0200
+++ pr66917.s-r191399   2015-07-19 19:08:36.416037498 +0200
@@ -20,10 +20,10 @@
        @ args = 0, pretend = 0, frame = 0
        @ frame_needed = 0, uses_anonymous_args = 0
        @ link register save eliminated.
-       vldmia  r0, {d18-d19}
-       vldmia  r1, {d16-d17}
+       vld1.64 {d18-d19}, [r0:64]
+       vld1.64 {d16-d17}, [r1:64]
        veor    q8, q9, q8
-       vstmia  r2, {d16-d17}
+       vst1.64 {d16-d17}, [r2:64]
        bx      lr
        .size   test_neon_load_store_alignment, .-test_neon_load_store_alignment
        .section        .text.startup,"ax",%progbits

These instructions still fault, but the kernel recognizes them and fixes up the alignment faults (if suitably configured).

If I compile the test case for x86_64 w/ -O3 -mavx, the compiler generates vmovdqu instructions which permit unaligned addresses.  So I suspect a target bug.
Comment 3 Ramana Radhakrishnan 2015-07-29 08:21:35 UTC
 for STRICT_ALIGNMENT targets properly and 
(In reply to Mikael Pettersson from comment #2)
> This test case changed behaviour twice in the 4.7->4.8 development cycle. 
> First r185807 broke it by replacing code for unaligned memory accesses with
> code requiring more alignment than present in the source:
> 
> --- pr66917.s-r185806   2015-07-19 17:16:23.536116155 +0200
> +++ pr66917.s-r185807   2015-07-19 17:13:23.016388416 +0200
> @@ -17,42 +17,13 @@
>         .global test_neon_load_store_alignment
>         .type   test_neon_load_store_alignment, %function
>  test_neon_load_store_alignment:
> -       @ args = 0, pretend = 0, frame = 32
> +       @ args = 0, pretend = 0, frame = 0
>         @ frame_needed = 0, uses_anonymous_args = 0
>         @ link register save eliminated.
> -       mov     r3, r0
> -       stmfd   sp!, {r4, r5, r6, r7}
> -       mov     r7, r1
> -       ldr     r0, [r0, #0]    @ unaligned
> -       mov     r6, r2
> -       sub     sp, sp, #32
> -       ldr     r1, [r3, #4]    @ unaligned
> -       mov     r5, sp
> -       ldr     r2, [r3, #8]    @ unaligned
> -       add     r4, sp, #16
> -       ldr     r3, [r3, #12]   @ unaligned
> -       mov     ip, sp
> -       stmia   r5!, {r0, r1, r2, r3}
> -       ldr     r0, [r7, #0]    @ unaligned
> -       ldr     r1, [r7, #4]    @ unaligned
> -       ldr     r2, [r7, #8]    @ unaligned
> -       ldr     r3, [r7, #12]   @ unaligned
> -       fldd    d16, [sp, #0]   @ int
> -       fldd    d19, [sp, #8]   @ int
> -       stmia   r4!, {r0, r1, r2, r3}
> -       fldd    d18, [sp, #16]  @ int
> -       veor    d17, d16, d18
> -       fldd    d18, [sp, #24]  @ int
> -       fstd    d17, [sp, #0]   @ int
> -       veor    d16, d19, d18
> -       fstd    d16, [sp, #8]   @ int
> -       ldmia   ip!, {r0, r1, r2, r3}
> -       str     r0, [r6, #0]    @ unaligned
> -       str     r1, [r6, #4]    @ unaligned
> -       str     r2, [r6, #8]    @ unaligned
> -       str     r3, [r6, #12]   @ unaligned
> -       add     sp, sp, #32
> -       ldmfd   sp!, {r4, r5, r6, r7}
> +       vldmia  r0, {d18-d19}
> +       vldmia  r1, {d16-d17}
> +       veor    q8, q9, q8
> +       vstmia  r2, {d16-d17}
>         bx      lr
>         .size   test_neon_load_store_alignment,
> .-test_neon_load_store_alignment
>         .section        .text.startup,"ax",%progbits
> 
> On Linux, this code SIGBUSes because the kernel can't fix up the first
> misaligned access:
> 
> [292105.326391] Alignment trap: not handling instruction ecd02b04 at
> [<00008e84>]
> [292105.396370] Unhandled fault: alignment exception (0x001) at 0x0008b109
> 
> Then r191399 changed it again by replacing the vldm/vstm instructions with
> vld1.64/vst1.64 instructions:
> 
> --- pr66917.s-r191398   2015-07-19 19:12:15.815583139 +0200
> +++ pr66917.s-r191399   2015-07-19 19:08:36.416037498 +0200
> @@ -20,10 +20,10 @@
>         @ args = 0, pretend = 0, frame = 0
>         @ frame_needed = 0, uses_anonymous_args = 0
>         @ link register save eliminated.
> -       vldmia  r0, {d18-d19}
> -       vldmia  r1, {d16-d17}
> +       vld1.64 {d18-d19}, [r0:64]
> +       vld1.64 {d16-d17}, [r1:64]
>         veor    q8, q9, q8
> -       vstmia  r2, {d16-d17}
> +       vst1.64 {d16-d17}, [r2:64]
>         bx      lr
>         .size   test_neon_load_store_alignment,
> .-test_neon_load_store_alignment
>         .section        .text.startup,"ax",%progbits
> 
> These instructions still fault, but the kernel recognizes them and fixes up
> the alignment faults (if suitably configured).

> 
> If I compile the test case for x86_64 w/ -O3 -mavx, the compiler generates
> vmovdqu instructions which permit unaligned addresses.  So I suspect a
> target bug.


No, sorry that doesn't suggest a target bug to me at all. The ARM backend is a STRICT_ALIGNMENT target, this implies the optimizers cannot introduce unaligned accesses unless sent through the movmisalign interface AFAIUI. 

In this testcase somehow at expand time the compiler thinks that the pointer is aligned to 8 bytes and can put out a normal mov insn rather than use the misaligned interface.

From the expand dump:

(insn 8 5 9 2 (set (reg:V2DI 117)
        (mem:V2DI (reg/v/f:SI 113 [ ap ]) [0 MEM[(char * {ref-all})ap_2(D)]+0 S16 A64])) /tmp/misalign.c:17 -1
     (nil))
(insn 9 8 10 2 (set (reg:V2DI 118)
        (mem:V2DI (reg/v/f:SI 114 [ bp ]) [0 MEM[(char * {ref-all})bp_3(D)]+0 S16 A64])) /tmp/misalign.c:17 -1
     (nil))

Look at the A64.. That's a smoking gun to me and needs further investigation.
Comment 4 Ramana Radhakrishnan 2015-07-29 08:22:58 UTC
Wrong code.
Comment 5 ktkachov 2015-07-29 14:03:49 UTC
The slp2 dump at tree level says about the ap pointer:
        base_address: ap_2(D)
        offset from base address: 0
        constant offset from base address: 0
        step: 0
        aligned to: 64
        base_object: MEM[(char * {ref-all})ap_2(D)]

That looks dodgy?
Comment 6 Richard Biener 2015-07-29 14:37:02 UTC
Probably because you access a.u/b.u which is uint64_t and thus the union
is laid out as having 8 byte alignment?

How do the original GENERIC trees produced look like?
Comment 7 ktkachov 2015-07-29 14:40:56 UTC
(In reply to Richard Biener from comment #6)
> Probably because you access a.u/b.u which is uint64_t and thus the union
> is laid out as having 8 byte alignment?
> 
> How do the original GENERIC trees produced look like?

The 003t.original shows:
  typedef union 
  {
    uint64_t u[2];
    uint8_t c[16];
  } unionunion 
  {
    uint64_t u[2];
    uint8_t c[16];
  };
  union 
  {
    uint64_t u[2];
    uint8_t c[16];
  } a;
  union 
  {
    uint64_t u[2];
    uint8_t c[16];
  } b;

    union 
  {
    uint64_t u[2];
    uint8_t c[16];
  } a;
    union 
  {
    uint64_t u[2];
    uint8_t c[16];
  } b;
  memcpy ((void * restrict) &a.c, (const void * restrict) ap, 16);
  memcpy ((void * restrict) &b.c, (const void * restrict) bp, 16);
  a.u[0] = a.u[0] ^ b.u[0];
  a.u[1] = a.u[1] ^ b.u[1];
  memcpy ((void * restrict) outp, (const void * restrict) &a.c, 16);
Comment 8 ktkachov 2015-07-29 14:47:21 UTC
The 007t.lower one looks like:
  try
    {
      MEM[(char * {ref-all})&a] = MEM[(char * {ref-all})ap];
      MEM[(char * {ref-all})&b] = MEM[(char * {ref-all})bp];
      D.5587 = a.u[0];
      D.5588 = b.u[0];
      D.5589 = D.5587 ^ D.5588;
      a.u[0] = D.5589;
      D.5590 = a.u[1];
      D.5591 = b.u[1];
      D.5592 = D.5590 ^ D.5591;
      a.u[1] = D.5592;
      MEM[(char * {ref-all})outp] = MEM[(char * {ref-all})&a];
    }
  finally
    {
      a = {CLOBBER};
      b = {CLOBBER};
    }
  return;
Comment 9 Richard Biener 2015-07-30 07:20:24 UTC
I see the following, 'a' having an alignment of 8 bytes:

 <array_ref 0x7ffff68201c0
    type <integer_type 0x7ffff67f5d20 uint64_t unsigned DI
        size <integer_cst 0x7ffff68ccf18 constant 64>
        unit size <integer_cst 0x7ffff68ccf30 constant 8>
        align 64 symtab 0 alias set -1 canonical type 0x7ffff68d19d8 precision 64 min <integer_cst 0x7ffff68de210 0> max <integer_cst 0x7ffff68d7500 18446744073709551615> context <translation_unit_decl 0x7ffff7ff10f0 D.5218>
        pointer_to_this <pointer_type 0x7ffff68215e8>>
   
    arg 0 <component_ref 0x7ffff681f570
        type <array_type 0x7ffff6821540 type <integer_type 0x7ffff67f5d20 uint64_t>
            TI
            size <integer_cst 0x7ffff68de228 constant 128>
            unit size <integer_cst 0x7ffff68de240 constant 16>
            align 64 symtab 0 alias set -1 canonical type 0x7ffff6821690 domain <integer_type 0x7ffff6821498>>
       
        arg 0 <var_decl 0x7ffff68d6900 a type <union_type 0x7ffff68213f0>
            addressable used BLK file t.c line 12 col 5 size <integer_cst 0x7ffff68de228 128> unit size <integer_cst 0x7ffff68de240 16>
            align 64 context <function_decl 0x7ffff67f6e00 test_neon_load_store_alignment> chain <var_decl 0x7ffff68d6990 b>>
        arg 1 <field_decl 0x7ffff69b3558 u type <array_type 0x7ffff6821540>
            TI file t.c line 10 col 16 size <integer_cst 0x7ffff68de228 128> unit size <integer_cst 0x7ffff68de240 16>
            align 64 offset_align 64
            offset <integer_cst 0x7ffff68ccee8 constant 0>
            bit offset <integer_cst 0x7ffff68ccf48 constant 0> context <union_type 0x7ffff68213f0> chain <field_decl 0x7ffff69b35f0 c>>>
    arg 1 <integer_cst 0x7ffff68de2e8 type <integer_type 0x7ffff68d1690 int> constant 1>>

so the smoking dump isn't the reads but the write:

(insn 11 10 0 (set (mem:V2DI (reg/v/f:SI 115 [ outp ]) [0 MEM[(char * {ref-all})outp_7(D)]+0 S16 A64])
        (reg:V2DI 116 [ vect__4.13 ])) t.c:18 -1
     (nil))

which for some reason also got 8 byte alignment.  I suppose this is
SRA at work as -fno-tree-sra fixes this.

Thus confirmed as SRA problem.
Comment 10 Richard Biener 2015-07-30 07:43:23 UTC
Actually SRA is fine.  vect_compute_data_ref_alignment is, too, leaving
DR_MISALIGNMENT at -1.  The target then tells us via

      bool is_packed = false;
      tree type = (TREE_TYPE (DR_REF (dr)));

      if (!known_alignment_for_access_p (dr))
        is_packed = not_size_aligned (DR_REF (dr));

     if ((TYPE_USER_ALIGN (type) && !is_packed)
         || targetm.vectorize.
              support_vector_misalignment (mode, type,
                                           DR_MISALIGNMENT (dr), is_packed))
       return dr_unaligned_supported;

rm_builtin_support_vector_misalignment (mode=V2DImode, type=0x7ffff68280a8, 
    misalignment=-1, is_packed=true)
    at /space/rguenther/tramp3d/trunk/gcc/config/arm/arm.c:27218
27218     if (TARGET_NEON && !BYTES_BIG_ENDIAN && unaligned_access)
27217   {
27218     if (TARGET_NEON && !BYTES_BIG_ENDIAN && unaligned_access)
27219       {
27220         HOST_WIDE_INT align = TYPE_ALIGN_UNIT (type);
27221
27222         if (is_packed)
(gdb) p unaligned_access
No symbol "unaligned_access" in current context.
(gdb) n
27220         HOST_WIDE_INT align = TYPE_ALIGN_UNIT (type);
(gdb) 
27222         if (is_packed)
(gdb) 
27223           return align == 1;

oops.  That's obviously buggy.  Not sure what the ARM hook expects in 'type',
but appearantly it's not what the vectorizer passes in here.

Docs say

This hook should return true if the target supports misaligned vector\n\
store/load of a specific factor denoted in the @var{misalignment}\n\
parameter.  The vector store/load should be of machine mode @var{mode} and\n\
the elements in the vectors should be of type @var{type}.  @var{is_packed}\n\
parameter is true if the memory access is defined in a packed struct.

but the arm hook seems to say "yes, I can vectorize with V2DImode
for a vector with element type unsigned long long __attribute__((aligned(1)))"
which it obviously cannot.

IMHO the hook should compare required mode alignment with type alignment.

This seems to be the arm hook trying to be too clever.

-> target bug.
Comment 11 Richard Biener 2015-07-30 07:48:08 UTC
Suggested fix:

Index: gcc/config/arm/arm.c
===================================================================
--- gcc/config/arm/arm.c        (revision 226348)
+++ gcc/config/arm/arm.c        (working copy)
@@ -27217,7 +27217,8 @@ arm_builtin_support_vector_misalignment
 {
   if (TARGET_NEON && !BYTES_BIG_ENDIAN && unaligned_access)
     {
-      HOST_WIDE_INT align = TYPE_ALIGN_UNIT (type);
+      HOST_WIDE_INT align
+       = GET_MODE_ALIGNMENT (TYPE_MODE (type)) / BITS_PER_UNIT;
 
       if (is_packed)
         return align == 1;

which produces

        push    {r4}
        add     r3, r0, #8
        add     r4, r1, #8
        vld1.64 {d17}, [r1]
        vld1.64 {d16}, [r4]
        ldr     r4, [sp], #4
        vld1.64 {d19}, [r3]
        veor    d16, d16, d19
        vld1.64 {d18}, [r0]
        veor    d17, d17, d18
        vst1.64 {d17}, [r2]!
        vst1.64 {d16}, [r2]
        bx      lr
Comment 12 Richard Earnshaw 2015-07-30 14:11:16 UTC
> but the arm hook seems to say "yes, I can vectorize with V2DImode
> for a vector with element type unsigned long long __attribute__((aligned(1)))"
> which it obviously cannot.

It can vectorize for this case, but the loads must not be marked as aligned loads.  That is, the mid-end must use the misaligned code hooks.

So I don't think this is a target bug.
Comment 13 Richard Biener 2015-07-30 14:39:06 UTC
Mine.
Comment 14 rguenther@suse.de 2015-07-30 14:39:11 UTC
On Thu, 30 Jul 2015, rearnsha at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66917
> 
> Richard Earnshaw <rearnsha at gcc dot gnu.org> changed:
> 
>            What    |Removed                     |Added
> ----------------------------------------------------------------------------
>           Component|target                      |tree-optimization
> 
> --- Comment #12 from Richard Earnshaw <rearnsha at gcc dot gnu.org> ---
> > but the arm hook seems to say "yes, I can vectorize with V2DImode
> > for a vector with element type unsigned long long __attribute__((aligned(1)))"
> > which it obviously cannot.
> 
> It can vectorize for this case, but the loads must not be marked as aligned
> loads.  That is, the mid-end must use the misaligned code hooks.
> 
> So I don't think this is a target bug.

Hmm, indeed.  It's a vectorizer bug that always communicates at least
element alignment (thus doesn't properly code-gen for the packed_p
case)
Comment 15 Richard Biener 2015-07-30 14:41:34 UTC
For both loads and stores we do

              else if (DR_MISALIGNMENT (first_dr) == -1)
                {
                  TREE_TYPE (data_ref)
                    = build_aligned_type (TREE_TYPE (data_ref),
                                          TYPE_ALIGN (elem_type));
                  align = TYPE_ALIGN_UNIT (elem_type);
                  misalign = 0;

but TYPE_ALIGN (elem_type) doesn't always hold.
Comment 16 Richard Biener 2015-07-31 09:20:36 UTC
Created attachment 36099 [details]
patch

I am testing the attached (on x86_64-linux), it fixes the testcase with a cross to arm.  Full testing on arm appreciated.
Comment 17 ktkachov 2015-07-31 09:23:51 UTC
(In reply to Richard Biener from comment #16)
> Created attachment 36099 [details]
> patch
> 
> I am testing the attached (on x86_64-linux), it fixes the testcase with a
> cross to arm.  Full testing on arm appreciated.

I'll give it a test run
Comment 18 ktkachov 2015-07-31 13:21:34 UTC
(In reply to ktkachov from comment #17)
> (In reply to Richard Biener from comment #16)
> > Created attachment 36099 [details]
> > patch
> > 
> > I am testing the attached (on x86_64-linux), it fixes the testcase with a
> > cross to arm.  Full testing on arm appreciated.
> 
> I'll give it a test run

A bootstrap and make check on arm-none-linux-gnueabihf looks fine
Comment 19 Richard Biener 2015-08-03 07:14:06 UTC
Fixed on trunk sofar.
Comment 20 Richard Biener 2015-08-03 07:14:08 UTC
Author: rguenth
Date: Mon Aug  3 07:13:36 2015
New Revision: 226487

URL: https://gcc.gnu.org/viewcvs?rev=226487&root=gcc&view=rev
Log:
2015-08-03  Richard Biener  <rguenther@suse.de>

	PR tree-optimization/66917
	* tree-vectorizer.h (struct dataref_aux): Add base_element_aligned
	field.
	(DR_VECT_AUX): New macro.
	(set_dr_misalignment): Adjust.
	(dr_misalignment): Likewise.
	* tree-vect-data-refs.c (vect_compute_data_ref_alignment):
	Compute whether the base is at least element aligned.
	* tree-vect-stmts.c (ensure_base_align): Adjust.
	(vectorizable_store): If the base is not element aligned
	preserve alignment of the original access if misalignment is unknown.
	(vectorizable_load): Likewise.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/tree-vect-data-refs.c
    trunk/gcc/tree-vect-stmts.c
    trunk/gcc/tree-vectorizer.h
Comment 21 ctice 2015-09-24 20:57:01 UTC
Author: ctice
Date: Thu Sep 24 20:56:29 2015
New Revision: 228099

URL: https://gcc.gnu.org/viewcvs?rev=228099&root=gcc&view=rev
Log:
Backport r226487 from GCC FSF mainline:

r226487 | rguenth | 2015-08-03 00:13:36 -0700 (Mon, 03 Aug 2015) | 15 lines

2015-08-03  Richard Biener  <rguenther@suse.de>

	PR tree-optimization/66917
	* tree-vectorizer.h (struct dataref_aux): Add base_element_aligned
	field.
	(DR_VECT_AUX): New macro.
	(set_dr_misalignment): Adjust.
	(dr_misalignment): Likewise.
	* tree-vect-data-refs.c (vect_compute_data_ref_alignment):
	Compute whether the base is at least element aligned.
	* tree-vect-stmts.c (ensure_base_align): Adjust.
	(vectorizable_store): If the base is not element aligned
	preserve alignment of the original access if misalignment is unknown.
	(vectorizable_load): Likewise.



Modified:
    branches/google/gcc-4_9-mobile/gcc/tree-vect-data-refs.c
    branches/google/gcc-4_9-mobile/gcc/tree-vect-stmts.c
    branches/google/gcc-4_9-mobile/gcc/tree-vectorizer.h
Comment 22 Richard Biener 2015-09-28 10:46:27 UTC
Author: rguenth
Date: Mon Sep 28 10:45:55 2015
New Revision: 228199

URL: https://gcc.gnu.org/viewcvs?rev=228199&root=gcc&view=rev
Log:
2015-09-28  Richard Biener  <rguenther@suse.de>

	Backport from mainline
	2015-08-03  Richard Biener  <rguenther@suse.de>

	PR tree-optimization/66917
	* tree-vectorizer.h (struct dataref_aux): Add base_element_aligned
	field.
	(DR_VECT_AUX): New macro.
	(set_dr_misalignment): Adjust.
	(dr_misalignment): Likewise.
	* tree-vect-data-refs.c (vect_compute_data_ref_alignment):
	Compute whether the base is at least element aligned.
	* tree-vect-stmts.c (ensure_base_align): Adjust.
	(vectorizable_store): If the base is not element aligned
	preserve alignment of the original access if misalignment is unknown.
	(vectorizable_load): Likewise.

	2015-09-16  Richard Biener  <rguenther@suse.de>

	PR middle-end/67442
	* fold-const.c (extract_muldiv_1): Properly extend multiplication
	result before builting a tree via wide_int_to_tree.

	* gcc.dg/torture/pr67442.c: New testcase.

Added:
    branches/gcc-5-branch/gcc/testsuite/gcc.dg/torture/pr67442.c
Modified:
    branches/gcc-5-branch/gcc/ChangeLog
    branches/gcc-5-branch/gcc/fold-const.c
    branches/gcc-5-branch/gcc/testsuite/ChangeLog
    branches/gcc-5-branch/gcc/tree-vect-data-refs.c
    branches/gcc-5-branch/gcc/tree-vect-stmts.c
    branches/gcc-5-branch/gcc/tree-vectorizer.h
Comment 23 Richard Earnshaw 2016-06-28 13:01:03 UTC
Any plans to backport to 4.9, or should this be closed?
Comment 24 Richard Biener 2016-08-03 10:57:25 UTC
Fixed in GCC 5.3+