Created attachment 30109
Testcase and output
Please find enclosed the input Vector4.ii and the output Vector4.s, compiled with "./xgcc -fpic -mfloat-abi=softfp -mthumb -Os -march=armv7-a -mfpu=neon -S Vector4.ii". Because function initVector4() returns an instance of Vector4, which is 16 bytes in size, GCC passes the address of an internal memory buffer as the first argument to hold the return value. This is shown at Vector4.s line#54, "add r0,sp,#8", and the buffer is filled at line#33, "vst1.64 {d16-d17}, [r0:128]". The 128-bit alignment hint is there because class Vector4 is declared to be 16-byte aligned. The problem is that r0 (sp+8) is not 16-byte aligned when sp is 16-byte aligned, which results in a crash at vst1.64 [:128]. It seems that GCC doesn't honor the alignment of the internal memory buffer. If Vector4 is declared to be 32-byte aligned, GCC generates extra code to ensure r0 is properly aligned. I assume GCC should do the same for alignments as low as 16 bytes too.
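The address arithmetic behind this report can be sketched as below. This is only an illustration under stated assumptions: is_aligned, return_buffer and align_up are hypothetical helper names, not taken from Vector4.ii or from GCC, and align_up shows the generic round-up idiom rather than the code GCC actually emits in the 32-byte case.

```cpp
#include <cassert>
#include <cstdint>

// True when addr is a multiple of align. align must be a power of two.
inline bool is_aligned(std::uintptr_t addr, std::uintptr_t align) {
    assert(align != 0 && (align & (align - 1)) == 0);
    return (addr & (align - 1)) == 0;
}

// Models "add r0,sp,#8" from Vector4.s: the return buffer starts 8 bytes above sp.
// (Hypothetical helper, for illustration only.)
inline std::uintptr_t return_buffer(std::uintptr_t sp) {
    return sp + 8;
}

// Generic round-up to the next multiple of align (a power of two).
// Assumption: this is the usual idiom, not necessarily what GCC generates.
inline std::uintptr_t align_up(std::uintptr_t addr, std::uintptr_t align) {
    assert(align != 0 && (align & (align - 1)) == 0);
    return (addr + align - 1) & ~(align - 1);
}
```

With sp = 0x1000 (16-byte aligned), return_buffer(sp) is 0x1008, which passes the 8-byte check but fails the 16-byte check that the [:128] qualifier asks for. With sp = 0x1008 the same buffer happens to be 16-byte aligned, so the crash depends on the incoming stack pointer.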
What does the ABI say about incoming stack alignment? What target did you configure for?
I don't know if the ABI dictates it, but from observation the stack is aligned to 8 bytes, for the largest primitive type "double" (or long long). I configured it on Ubuntu 12.04 64-bit with the following:
~/m/gcc/gcc-4.8/configure \
  --prefix=/tmp/gcc/prefix --target=arm-linux-androideabi --host=x86_64-linux-gnu --build=x86_64-linux-gnu \
  --with-gnu-as --with-gnu-ld --enable-languages=c,c++ \
  --with-gmp=/tmp/gcc/temp-install \
  --with-mpfr=/tmp/gcc/temp-install \
  --with-mpc=/tmp/gcc/temp-install \
  --with-cloog=/tmp/gcc/temp-install \
  --with-isl=/tmp/gcc/temp-install \
  --with-ppl=/tmp/gcc/temp-install \
  --disable-ppl-version-check --disable-cloog-version-check --disable-isl-version-check \
  --enable-cloog-backend=isl \
  --with-host-libstdcxx="-static-libgcc -Wl,-Bstatic,-lstdc++,-Bdynamic -lm" \
  --disable-libssp \
  --enable-threads \
  --disable-libmudflap --disable-libstdc__-v3 --disable-sjlj-exceptions --disable-shared \
  --disable-tls --disable-libitm --disable-nls --disable-bootstrap --disable-libquadmath --disable-libsanitizer \
  --with-float=soft --with-fpu=vfp --with-arch=armv5te \
  --enable-target-optspace --enable-initfini-array \
  --with-sysroot=/tmp/gcc/prefix/sysroot \
  --enable-plugins --enable-libgomp \
  --enable-gold=default
(In reply to java4ada from comment #2)
> I don't know if ABI dictates it but from observation the stack is aligned to
> 8-byte for the largest primitive type "double" (or long long).
I think this is wrong; the alignment requirement for arm-eabi is 16 bytes, IIRC (so that NEON can be supported).
The ARM EABI only requires 8-byte alignment, as does Neon.
NEON instructions like vst/vld with [:128] and [:256] need 16-byte and 32-byte alignment, respectively. Does that mean that under the ARM EABI both should be replaced with [:64]? (Probably only at the cost of 1-2 cycles anyway.) What other ARM ABI can we configure to get a higher alignment than 8 bytes?
I've run into exactly the same issue on an armv7-a (cortex-A8) target with the trunk compiler and the pr55073.C testcase:
$ ~/install/gcc-trunk-arm/armv7l-tizen/bin/armv7l-tizen-linux-gnueabi-gcc ~/workspace/downloads/gcc/gcc/testsuite/gcc.target/arm/pr55073.C -O2 -mcpu=cortex-a8 -lm -o ./pr55073.exe
$ ~/install/gcc-5.0-arm/armv7l-tizen/bin/armv7l-tizen-linux-gnueabi-gcc -v
Using built-in specs.
COLLECT_GCC=/home/max/install/gcc-trunk-arm/armv7l-tizen/bin/armv7l-tizen-linux-gnueabi-gcc
COLLECT_LTO_WRAPPER=/home/max/install/gcc-trunk-arm/armv7l-tizen/libexec/gcc/armv7l-tizen-linux-gnueabi/6.0.0/lto-wrapper
Target: armv7l-tizen-linux-gnueabi
Configured with: /home/max/build/v6/sources/gcc_1/configure --build=x86_64-pc-linux-gnu --host=x86_64-pc-linux-gnu --target=armv7l-tizen-linux-gnueabi --prefix=/home/max/install/gcc-trunk-arm/armv7l-tizen --with-sysroot=/home/max/install/gcc-trunk-arm/armv7l-tizen/armv7l-tizen-linux-gnueabi/sys-root --disable-libmudflap --enable-libssp --disable-nls --disable-libstdcxx-pch --disable-multilib --disable-gnu-unique-object --enable-linker-build-id --with-mode=arm --with-fpu=neon-vfpv4 --with-arch=armv7-a --with-tune=cortex-a15.cortex-a7 --with-float=softfp --enable-libgomp --enable-poison-system-directories --enable-long-long --enable-threads --enable-languages=c,c++,fortran --enable-shared --enable-lto --enable-symvers=gnu --enable-__cxa_atexit --with-gnu-as --with-gnu-ld --with-host-libstdcxx='-static-libgcc -Wl,-Bstatic,-lstdc++,-Bdynamic -lm' --
Thread model: posix
gcc version 6.0.0 20150505 (experimental)
Looking at the gdb output:
Breakpoint 1, 0x00008608 in test_func() ()
(gdb) disas
Dump of assembler code for function _Z9test_funcv:
   0x000085b4 <+0>:  movw r3, #34480 ; 0x86b0
   0x000085b8 <+4>:  movt r3, #0
   0x000085bc <+8>:  add r2, r3, #16
   0x000085c0 <+12>: vld1.8 {d20-d21}, [r3 :128]
   0x000085c4 <+16>: vld1.8 {d16-d17}, [r2 :128]
   0x000085c8 <+20>: vorr d17, d20, d20
   0x000085cc <+24>: vorr d18, d16, d16
   0x000085d0 <+28>: vzip.8 d16, d20
   0x000085d4 <+32>: vzip.8 d17, d18
   0x000085d8 <+36>: vorr d20, d16, d16
   0x000085dc <+40>: vorr d18, d17, d17
   0x000085e0 <+44>: vorr d22, d16, d16
   0x000085e4 <+48>: vzip.8 d20, d18
   0x000085e8 <+52>: vzip.8 d17, d22
   0x000085ec <+56>: vorr d18, d20, d20
   0x000085f0 <+60>: vmov.i16 q12, #2 ; 0x0002
   0x000085f4 <+64>: vmovl.s8 q10, d17
   0x000085f8 <+68>: vmovl.s8 q9, d18
   0x000085fc <+72>: vsub.i16 q10, q10, q12
   0x00008600 <+76>: vsub.i16 q8, q9, q12
   0x00008604 <+80>: vadd.i16 q8, q10, q8
=> 0x00008608 <+84>: vst1.64 {d16-d17}, [r0 :128]
   0x0000860c <+88>: bx lr
End of assembler dump.
r0 0xbe3ffc58 3191864408
(gdb)
vst1.64 wants r0 to be 128-bit aligned, but it is only 64-bit aligned here. The documentation from the ARM info center (http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0344c/Cihejdic.html) says this for cortex-a8: "If an alignment qualifier is specified, a check is made for strict alignment based on the qualifier, independent of the A-bit setting." So a bus error seems to be expected here. On cortex-a15.cortex-a7 the same code executed without errors. So I wonder: is it a testcase problem, and should the test be disabled for CA8? Or is it a compiler bug?
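The claim about r0 can be checked from the register value in the gdb dump. The sketch below is an illustration only: natural_alignment and satisfies_qualifier are hypothetical helper names, and the check just models "address is a multiple of the qualifier size", not the exact hardware behaviour.

```cpp
#include <cassert>
#include <cstdint>

// Largest power-of-two alignment (in bytes) that addr satisfies. addr must be nonzero.
inline std::uint32_t natural_alignment(std::uint32_t addr) {
    assert(addr != 0);
    return addr & (0u - addr);  // isolates the lowest set bit
}

// True when addr meets a NEON alignment qualifier given in bits, e.g. 64 or 128.
// (Hypothetical helper: models only the "multiple of bits/8 bytes" requirement.)
inline bool satisfies_qualifier(std::uint32_t addr, std::uint32_t bits) {
    return addr % (bits / 8) == 0;
}
```

For r0 = 0xbe3ffc58 the low byte is 0x58, so the lowest set bit is 8: the address is 8-byte (64-bit) aligned but not 16-byte (128-bit) aligned, which matches the fault at the vst1.64 with [r0 :128].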
> So, I wonder, if it's a testcase problem and
> it should be disabled for CA8?
Perhaps add a CA8 check to check_effective_target_arm_vect_no_misalign in gcc/testsuite/lib/target-supports.exp?
Add ARM maintainers.
I've encountered a similar problem, possibly related. See bug #66917 for a minimal C example program exhibiting the problem.
FYI, there's now a suggested fix for bug #66917. It would be interesting to know whether it fixes the problem reported here.
(In reply to Mark H Weaver from comment #11)
> FYI, there's now a suggested fix for bug #66917. It would be interesting to
> know whether it fixes the problem reported here.
The suggested fix was applied and that bug has now been closed. Assuming that this one has been fixed, too.