Created attachment 36009 [details] Minimal example code that is miscompiled The attached C source code is mis-compiled by GCC 4.9.3 with -O3 and -mfpu=neon. Calls to memcpy between a uint8_t* parameter and a local variable are compiled into vld1.64 and vst1.64 instructions with an alignment field that specifies that the memory address (the uint8_t* parameter) is aligned on a 64-bit boundary, although there is no basis for assuming such an alignment. Here's the C code: #include <stdint.h> #include <string.h> void test_neon_load_store_alignment (const uint8_t *ap, const uint8_t *bp, uint8_t *outp) { union { uint64_t u[2]; uint8_t c[16]; } a, b; memcpy (a.c, ap, 16); memcpy (b.c, bp, 16); a.u[0] ^= b.u[0]; a.u[1] ^= b.u[1]; memcpy (outp, a.c, 16); } When natively compiled with -S -O3 using GCC 4.9.3 configured with "--build=arm-unknown-linux-gnueabihf --with-arch=armv7-a --with-float=hard --with-mode=thumb --with-fpu=neon", here's the output: .syntax unified .arch armv7-a .eabi_attribute 27, 3 .eabi_attribute 28, 1 .fpu neon .eabi_attribute 20, 1 .eabi_attribute 21, 1 .eabi_attribute 23, 3 .eabi_attribute 24, 1 .eabi_attribute 25, 1 .eabi_attribute 26, 2 .eabi_attribute 30, 2 .eabi_attribute 34, 1 .eabi_attribute 18, 4 .thumb .file "foo.c" .text .align 2 .global test_neon_load_store_alignment .thumb .thumb_func .type test_neon_load_store_alignment, %function test_neon_load_store_alignment: @ args = 0, pretend = 0, frame = 0 @ frame_needed = 0, uses_anonymous_args = 0 @ link register save eliminated. vld1.64 {d16-d17}, [r0:64] vld1.64 {d18-d19}, [r1:64] veor q8, q8, q9 vst1.64 {d16-d17}, [r2:64] bx lr .size test_neon_load_store_alignment, .-test_neon_load_store_alignment .ident "GCC: (GNU) 4.9.3" .section .note.GNU-stack,"",%progbits Here, r0 contains ap, r1 contains bp, and r2 contains outp. The associated operands of the vld1.64 and vst1.64 instructions are [r<n>:64], which specify that the memory address is known to be aligned on a 64-bit boundary, although that is not necessarily the case. See section A8.6.307 (page A8-602) of the ARM v7 Architecture Reference Manual, which spells out quite clearly that for this instruction: if (address MOD alignment) != 0 then GenerateAlignmentException(); and that in this case, alignment == 8. This may be related to bug #57271. If needed, I can provide instructions for how to reproduce the exact same GCC binary I'm using, and all other software that it depends on, using GNU Guix <http://gnu.org/s/guix>. FYI, the attached minimal test case was distilled from code in OpenSSL 1.0.2d that is miscompiled by GCC 4.9.3 in our armhf port targetting NEON, and leads to a Bus Error on the Novena laptop.
I can reproduce the alignment faults on armv7l-linux-gnueabi with -O3 -mfpu=neon and gcc-4.8.5/4.9.3/5.2/6.0, but not with gcc-4.7.4.
This test case changed behaviour twice in the 4.7->4.8 development cycle. First r185807 broke it by replacing code for unaligned memory accesses with code requiring more alignment than present in the source: --- pr66917.s-r185806 2015-07-19 17:16:23.536116155 +0200 +++ pr66917.s-r185807 2015-07-19 17:13:23.016388416 +0200 @@ -17,42 +17,13 @@ .global test_neon_load_store_alignment .type test_neon_load_store_alignment, %function test_neon_load_store_alignment: - @ args = 0, pretend = 0, frame = 32 + @ args = 0, pretend = 0, frame = 0 @ frame_needed = 0, uses_anonymous_args = 0 @ link register save eliminated. - mov r3, r0 - stmfd sp!, {r4, r5, r6, r7} - mov r7, r1 - ldr r0, [r0, #0] @ unaligned - mov r6, r2 - sub sp, sp, #32 - ldr r1, [r3, #4] @ unaligned - mov r5, sp - ldr r2, [r3, #8] @ unaligned - add r4, sp, #16 - ldr r3, [r3, #12] @ unaligned - mov ip, sp - stmia r5!, {r0, r1, r2, r3} - ldr r0, [r7, #0] @ unaligned - ldr r1, [r7, #4] @ unaligned - ldr r2, [r7, #8] @ unaligned - ldr r3, [r7, #12] @ unaligned - fldd d16, [sp, #0] @ int - fldd d19, [sp, #8] @ int - stmia r4!, {r0, r1, r2, r3} - fldd d18, [sp, #16] @ int - veor d17, d16, d18 - fldd d18, [sp, #24] @ int - fstd d17, [sp, #0] @ int - veor d16, d19, d18 - fstd d16, [sp, #8] @ int - ldmia ip!, {r0, r1, r2, r3} - str r0, [r6, #0] @ unaligned - str r1, [r6, #4] @ unaligned - str r2, [r6, #8] @ unaligned - str r3, [r6, #12] @ unaligned - add sp, sp, #32 - ldmfd sp!, {r4, r5, r6, r7} + vldmia r0, {d18-d19} + vldmia r1, {d16-d17} + veor q8, q9, q8 + vstmia r2, {d16-d17} bx lr .size test_neon_load_store_alignment, .-test_neon_load_store_alignment .section .text.startup,"ax",%progbits On Linux, this code SIGBUSes because the kernel can't fix up the first misaligned access: [292105.326391] Alignment trap: not handling instruction ecd02b04 at [<00008e84>] [292105.396370] Unhandled fault: alignment exception (0x001) at 0x0008b109 Then r191399 changed it again by replacing the vldm/vstm instructions with vld1.64/vst1.64 instructions: --- pr66917.s-r191398 2015-07-19 19:12:15.815583139 +0200 +++ pr66917.s-r191399 2015-07-19 19:08:36.416037498 +0200 @@ -20,10 +20,10 @@ @ args = 0, pretend = 0, frame = 0 @ frame_needed = 0, uses_anonymous_args = 0 @ link register save eliminated. - vldmia r0, {d18-d19} - vldmia r1, {d16-d17} + vld1.64 {d18-d19}, [r0:64] + vld1.64 {d16-d17}, [r1:64] veor q8, q9, q8 - vstmia r2, {d16-d17} + vst1.64 {d16-d17}, [r2:64] bx lr .size test_neon_load_store_alignment, .-test_neon_load_store_alignment .section .text.startup,"ax",%progbits These instructions still fault, but the kernel recognizes them and fixes up the alignment faults (if suitably configured). If I compile the test case for x86_64 w/ -O3 -mavx, the compiler generates vmovdqu instructions which permit unaligned addresses. So I suspect a target bug.
for STRICT_ALIGNMENT targets properly and (In reply to Mikael Pettersson from comment #2) > This test case changed behaviour twice in the 4.7->4.8 development cycle. > First r185807 broke it by replacing code for unaligned memory accesses with > code requiring more alignment than present in the source: > > --- pr66917.s-r185806 2015-07-19 17:16:23.536116155 +0200 > +++ pr66917.s-r185807 2015-07-19 17:13:23.016388416 +0200 > @@ -17,42 +17,13 @@ > .global test_neon_load_store_alignment > .type test_neon_load_store_alignment, %function > test_neon_load_store_alignment: > - @ args = 0, pretend = 0, frame = 32 > + @ args = 0, pretend = 0, frame = 0 > @ frame_needed = 0, uses_anonymous_args = 0 > @ link register save eliminated. > - mov r3, r0 > - stmfd sp!, {r4, r5, r6, r7} > - mov r7, r1 > - ldr r0, [r0, #0] @ unaligned > - mov r6, r2 > - sub sp, sp, #32 > - ldr r1, [r3, #4] @ unaligned > - mov r5, sp > - ldr r2, [r3, #8] @ unaligned > - add r4, sp, #16 > - ldr r3, [r3, #12] @ unaligned > - mov ip, sp > - stmia r5!, {r0, r1, r2, r3} > - ldr r0, [r7, #0] @ unaligned > - ldr r1, [r7, #4] @ unaligned > - ldr r2, [r7, #8] @ unaligned > - ldr r3, [r7, #12] @ unaligned > - fldd d16, [sp, #0] @ int > - fldd d19, [sp, #8] @ int > - stmia r4!, {r0, r1, r2, r3} > - fldd d18, [sp, #16] @ int > - veor d17, d16, d18 > - fldd d18, [sp, #24] @ int > - fstd d17, [sp, #0] @ int > - veor d16, d19, d18 > - fstd d16, [sp, #8] @ int > - ldmia ip!, {r0, r1, r2, r3} > - str r0, [r6, #0] @ unaligned > - str r1, [r6, #4] @ unaligned > - str r2, [r6, #8] @ unaligned > - str r3, [r6, #12] @ unaligned > - add sp, sp, #32 > - ldmfd sp!, {r4, r5, r6, r7} > + vldmia r0, {d18-d19} > + vldmia r1, {d16-d17} > + veor q8, q9, q8 > + vstmia r2, {d16-d17} > bx lr > .size test_neon_load_store_alignment, > .-test_neon_load_store_alignment > .section .text.startup,"ax",%progbits > > On Linux, this code SIGBUSes because the kernel can't fix up the first > misaligned access: > > [292105.326391] Alignment trap: not handling instruction ecd02b04 at > [<00008e84>] > [292105.396370] Unhandled fault: alignment exception (0x001) at 0x0008b109 > > Then r191399 changed it again by replacing the vldm/vstm instructions with > vld1.64/vst1.64 instructions: > > --- pr66917.s-r191398 2015-07-19 19:12:15.815583139 +0200 > +++ pr66917.s-r191399 2015-07-19 19:08:36.416037498 +0200 > @@ -20,10 +20,10 @@ > @ args = 0, pretend = 0, frame = 0 > @ frame_needed = 0, uses_anonymous_args = 0 > @ link register save eliminated. > - vldmia r0, {d18-d19} > - vldmia r1, {d16-d17} > + vld1.64 {d18-d19}, [r0:64] > + vld1.64 {d16-d17}, [r1:64] > veor q8, q9, q8 > - vstmia r2, {d16-d17} > + vst1.64 {d16-d17}, [r2:64] > bx lr > .size test_neon_load_store_alignment, > .-test_neon_load_store_alignment > .section .text.startup,"ax",%progbits > > These instructions still fault, but the kernel recognizes them and fixes up > the alignment faults (if suitably configured). > > If I compile the test case for x86_64 w/ -O3 -mavx, the compiler generates > vmovdqu instructions which permit unaligned addresses. So I suspect a > target bug. No, sorry that doesn't suggest a target bug to me at all. The ARM backend is a STRICT_ALIGNMENT target, this implies the optimizers cannot introduce unaligned accesses unless sent through the movmisalign interface AFAIUI. In this testcase somehow at expand time the compiler thinks that the pointer is aligned to 8 bytes and can put out a normal mov insn rather than use the misaligned interface. From the expand dump: (insn 8 5 9 2 (set (reg:V2DI 117) (mem:V2DI (reg/v/f:SI 113 [ ap ]) [0 MEM[(char * {ref-all})ap_2(D)]+0 S16 A64])) /tmp/misalign.c:17 -1 (nil)) (insn 9 8 10 2 (set (reg:V2DI 118) (mem:V2DI (reg/v/f:SI 114 [ bp ]) [0 MEM[(char * {ref-all})bp_3(D)]+0 S16 A64])) /tmp/misalign.c:17 -1 (nil)) Look at the A64.. That's a smoking gun to me and needs further investigation.
Wrong code.
The slp2 dump at tree level says about the ap pointer: base_address: ap_2(D) offset from base address: 0 constant offset from base address: 0 step: 0 aligned to: 64 base_object: MEM[(char * {ref-all})ap_2(D)] That looks dodgy?
Probably because you access a.u/b.u which is uint64_t and thus the union is laid out as having 8 byte alignment? How do the original GENERIC trees produced look like?
(In reply to Richard Biener from comment #6) > Probably because you access a.u/b.u which is uint64_t and thus the union > is laid out as having 8 byte alignment? > > How do the original GENERIC trees produced look like? The 003t.original shows: typedef union { uint64_t u[2]; uint8_t c[16]; } unionunion { uint64_t u[2]; uint8_t c[16]; }; union { uint64_t u[2]; uint8_t c[16]; } a; union { uint64_t u[2]; uint8_t c[16]; } b; union { uint64_t u[2]; uint8_t c[16]; } a; union { uint64_t u[2]; uint8_t c[16]; } b; memcpy ((void * restrict) &a.c, (const void * restrict) ap, 16); memcpy ((void * restrict) &b.c, (const void * restrict) bp, 16); a.u[0] = a.u[0] ^ b.u[0]; a.u[1] = a.u[1] ^ b.u[1]; memcpy ((void * restrict) outp, (const void * restrict) &a.c, 16);
The 007t.lower one looks like: try { MEM[(char * {ref-all})&a] = MEM[(char * {ref-all})ap]; MEM[(char * {ref-all})&b] = MEM[(char * {ref-all})bp]; D.5587 = a.u[0]; D.5588 = b.u[0]; D.5589 = D.5587 ^ D.5588; a.u[0] = D.5589; D.5590 = a.u[1]; D.5591 = b.u[1]; D.5592 = D.5590 ^ D.5591; a.u[1] = D.5592; MEM[(char * {ref-all})outp] = MEM[(char * {ref-all})&a]; } finally { a = {CLOBBER}; b = {CLOBBER}; } return;
I see the following, 'a' having an alignment of 8 bytes: <array_ref 0x7ffff68201c0 type <integer_type 0x7ffff67f5d20 uint64_t unsigned DI size <integer_cst 0x7ffff68ccf18 constant 64> unit size <integer_cst 0x7ffff68ccf30 constant 8> align 64 symtab 0 alias set -1 canonical type 0x7ffff68d19d8 precision 64 min <integer_cst 0x7ffff68de210 0> max <integer_cst 0x7ffff68d7500 18446744073709551615> context <translation_unit_decl 0x7ffff7ff10f0 D.5218> pointer_to_this <pointer_type 0x7ffff68215e8>> arg 0 <component_ref 0x7ffff681f570 type <array_type 0x7ffff6821540 type <integer_type 0x7ffff67f5d20 uint64_t> TI size <integer_cst 0x7ffff68de228 constant 128> unit size <integer_cst 0x7ffff68de240 constant 16> align 64 symtab 0 alias set -1 canonical type 0x7ffff6821690 domain <integer_type 0x7ffff6821498>> arg 0 <var_decl 0x7ffff68d6900 a type <union_type 0x7ffff68213f0> addressable used BLK file t.c line 12 col 5 size <integer_cst 0x7ffff68de228 128> unit size <integer_cst 0x7ffff68de240 16> align 64 context <function_decl 0x7ffff67f6e00 test_neon_load_store_alignment> chain <var_decl 0x7ffff68d6990 b>> arg 1 <field_decl 0x7ffff69b3558 u type <array_type 0x7ffff6821540> TI file t.c line 10 col 16 size <integer_cst 0x7ffff68de228 128> unit size <integer_cst 0x7ffff68de240 16> align 64 offset_align 64 offset <integer_cst 0x7ffff68ccee8 constant 0> bit offset <integer_cst 0x7ffff68ccf48 constant 0> context <union_type 0x7ffff68213f0> chain <field_decl 0x7ffff69b35f0 c>>> arg 1 <integer_cst 0x7ffff68de2e8 type <integer_type 0x7ffff68d1690 int> constant 1>> so the smoking dump isn't the reads but the write: (insn 11 10 0 (set (mem:V2DI (reg/v/f:SI 115 [ outp ]) [0 MEM[(char * {ref-all})outp_7(D)]+0 S16 A64]) (reg:V2DI 116 [ vect__4.13 ])) t.c:18 -1 (nil)) which for some reason also got 8 byte alignment. I suppose this is SRA at work as -fno-tree-sra fixes this. Thus confirmed as SRA problem.
Actually SRA is fine. vect_compute_data_ref_alignment is, too, leaving DR_MISALIGNMENT at -1. The target then tells us via bool is_packed = false; tree type = (TREE_TYPE (DR_REF (dr))); if (!known_alignment_for_access_p (dr)) is_packed = not_size_aligned (DR_REF (dr)); if ((TYPE_USER_ALIGN (type) && !is_packed) || targetm.vectorize. support_vector_misalignment (mode, type, DR_MISALIGNMENT (dr), is_packed)) return dr_unaligned_supported; rm_builtin_support_vector_misalignment (mode=V2DImode, type=0x7ffff68280a8, misalignment=-1, is_packed=true) at /space/rguenther/tramp3d/trunk/gcc/config/arm/arm.c:27218 27218 if (TARGET_NEON && !BYTES_BIG_ENDIAN && unaligned_access) 27217 { 27218 if (TARGET_NEON && !BYTES_BIG_ENDIAN && unaligned_access) 27219 { 27220 HOST_WIDE_INT align = TYPE_ALIGN_UNIT (type); 27221 27222 if (is_packed) (gdb) p unaligned_access No symbol "unaligned_access" in current context. (gdb) n 27220 HOST_WIDE_INT align = TYPE_ALIGN_UNIT (type); (gdb) 27222 if (is_packed) (gdb) 27223 return align == 1; oops. That's obviously buggy. Not sure what the ARM hook expects in 'type', but appearantly it's not what the vectorizer passes in here. Docs say This hook should return true if the target supports misaligned vector\n\ store/load of a specific factor denoted in the @var{misalignment}\n\ parameter. The vector store/load should be of machine mode @var{mode} and\n\ the elements in the vectors should be of type @var{type}. @var{is_packed}\n\ parameter is true if the memory access is defined in a packed struct. but the arm hook seems to say "yes, I can vectorize with V2DImode for a vector with element type unsigned long long __attribute__((aligned(1)))" which it obviously cannot. IMHO the hook should compare required mode alignment with type alignment. This seems to be the arm hook trying to be too clever. -> target bug.
Suggested fix: Index: gcc/config/arm/arm.c =================================================================== --- gcc/config/arm/arm.c (revision 226348) +++ gcc/config/arm/arm.c (working copy) @@ -27217,7 +27217,8 @@ arm_builtin_support_vector_misalignment { if (TARGET_NEON && !BYTES_BIG_ENDIAN && unaligned_access) { - HOST_WIDE_INT align = TYPE_ALIGN_UNIT (type); + HOST_WIDE_INT align + = GET_MODE_ALIGNMENT (TYPE_MODE (type)) / BITS_PER_UNIT; if (is_packed) return align == 1; which produces push {r4} add r3, r0, #8 add r4, r1, #8 vld1.64 {d17}, [r1] vld1.64 {d16}, [r4] ldr r4, [sp], #4 vld1.64 {d19}, [r3] veor d16, d16, d19 vld1.64 {d18}, [r0] veor d17, d17, d18 vst1.64 {d17}, [r2]! vst1.64 {d16}, [r2] bx lr
> but the arm hook seems to say "yes, I can vectorize with V2DImode > for a vector with element type unsigned long long __attribute__((aligned(1)))" > which it obviously cannot. It can vectorize for this case, but the loads must not be marked as aligned loads. That is, the mid-end must use the misaligned code hooks. So I don't think this is a target bug.
Mine.
On Thu, 30 Jul 2015, rearnsha at gcc dot gnu.org wrote: > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66917 > > Richard Earnshaw <rearnsha at gcc dot gnu.org> changed: > > What |Removed |Added > ---------------------------------------------------------------------------- > Component|target |tree-optimization > > --- Comment #12 from Richard Earnshaw <rearnsha at gcc dot gnu.org> --- > > but the arm hook seems to say "yes, I can vectorize with V2DImode > > for a vector with element type unsigned long long __attribute__((aligned(1)))" > > which it obviously cannot. > > It can vectorize for this case, but the loads must not be marked as aligned > loads. That is, the mid-end must use the misaligned code hooks. > > So I don't think this is a target bug. Hmm, indeed. It's a vectorizer bug that always communicates at least element alignment (thus doesn't properly code-gen for the packed_p case)
For both loads and stores we do else if (DR_MISALIGNMENT (first_dr) == -1) { TREE_TYPE (data_ref) = build_aligned_type (TREE_TYPE (data_ref), TYPE_ALIGN (elem_type)); align = TYPE_ALIGN_UNIT (elem_type); misalign = 0; but TYPE_ALIGN (elem_type) doesn't always hold.
Created attachment 36099 [details] patch I am testing the attached (on x86_64-linux), it fixes the testcase with a cross to arm. Full testing on arm appreciated.
(In reply to Richard Biener from comment #16) > Created attachment 36099 [details] > patch > > I am testing the attached (on x86_64-linux), it fixes the testcase with a > cross to arm. Full testing on arm appreciated. I'll give it a test run
(In reply to ktkachov from comment #17) > (In reply to Richard Biener from comment #16) > > Created attachment 36099 [details] > > patch > > > > I am testing the attached (on x86_64-linux), it fixes the testcase with a > > cross to arm. Full testing on arm appreciated. > > I'll give it a test run A bootstrap and make check on arm-none-linux-gnueabihf looks fine
Fixed on trunk sofar.
Author: rguenth Date: Mon Aug 3 07:13:36 2015 New Revision: 226487 URL: https://gcc.gnu.org/viewcvs?rev=226487&root=gcc&view=rev Log: 2015-08-03 Richard Biener <rguenther@suse.de> PR tree-optimization/66917 * tree-vectorizer.h (struct dataref_aux): Add base_element_aligned field. (DR_VECT_AUX): New macro. (set_dr_misalignment): Adjust. (dr_misalignment): Likewise. * tree-vect-data-refs.c (vect_compute_data_ref_alignment): Compute whether the base is at least element aligned. * tree-vect-stmts.c (ensure_base_align): Adjust. (vectorizable_store): If the base is not element aligned preserve alignment of the original access if misalignment is unknown. (vectorizable_load): Likewise. Modified: trunk/gcc/ChangeLog trunk/gcc/tree-vect-data-refs.c trunk/gcc/tree-vect-stmts.c trunk/gcc/tree-vectorizer.h
Author: ctice Date: Thu Sep 24 20:56:29 2015 New Revision: 228099 URL: https://gcc.gnu.org/viewcvs?rev=228099&root=gcc&view=rev Log: Backport r226487 from GCC FSF mainline: r226487 | rguenth | 2015-08-03 00:13:36 -0700 (Mon, 03 Aug 2015) | 15 lines 2015-08-03 Richard Biener <rguenther@suse.de> PR tree-optimization/66917 * tree-vectorizer.h (struct dataref_aux): Add base_element_aligned field. (DR_VECT_AUX): New macro. (set_dr_misalignment): Adjust. (dr_misalignment): Likewise. * tree-vect-data-refs.c (vect_compute_data_ref_alignment): Compute whether the base is at least element aligned. * tree-vect-stmts.c (ensure_base_align): Adjust. (vectorizable_store): If the base is not element aligned preserve alignment of the original access if misalignment is unknown. (vectorizable_load): Likewise. Modified: branches/google/gcc-4_9-mobile/gcc/tree-vect-data-refs.c branches/google/gcc-4_9-mobile/gcc/tree-vect-stmts.c branches/google/gcc-4_9-mobile/gcc/tree-vectorizer.h
Author: rguenth Date: Mon Sep 28 10:45:55 2015 New Revision: 228199 URL: https://gcc.gnu.org/viewcvs?rev=228199&root=gcc&view=rev Log: 2015-09-28 Richard Biener <rguenther@suse.de> Backport from mainline 2015-08-03 Richard Biener <rguenther@suse.de> PR tree-optimization/66917 * tree-vectorizer.h (struct dataref_aux): Add base_element_aligned field. (DR_VECT_AUX): New macro. (set_dr_misalignment): Adjust. (dr_misalignment): Likewise. * tree-vect-data-refs.c (vect_compute_data_ref_alignment): Compute whether the base is at least element aligned. * tree-vect-stmts.c (ensure_base_align): Adjust. (vectorizable_store): If the base is not element aligned preserve alignment of the original access if misalignment is unknown. (vectorizable_load): Likewise. 2015-09-16 Richard Biener <rguenther@suse.de> PR middle-end/67442 * fold-const.c (extract_muldiv_1): Properly extend multiplication result before builting a tree via wide_int_to_tree. * gcc.dg/torture/pr67442.c: New testcase. Added: branches/gcc-5-branch/gcc/testsuite/gcc.dg/torture/pr67442.c Modified: branches/gcc-5-branch/gcc/ChangeLog branches/gcc-5-branch/gcc/fold-const.c branches/gcc-5-branch/gcc/testsuite/ChangeLog branches/gcc-5-branch/gcc/tree-vect-data-refs.c branches/gcc-5-branch/gcc/tree-vect-stmts.c branches/gcc-5-branch/gcc/tree-vectorizer.h
Any plans to backport to 4.9, or should this be closed?
Fixed in GCC 5.3+