Bug 57271 - ARM: gcc generates insufficient alignment for memory passed as extra argument for function return large composite type
Summary: ARM: gcc generates insufficient alignment for memory passed as extra argument...
Status: WAITING
Alias: None
Product: gcc
Classification: Unclassified
Component: target (show other bugs)
Version: 6.0
: P3 major
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2013-05-14 10:52 UTC by java4ada
Modified: 2015-07-30 13:58 UTC (History)
7 users (show)

See Also:
Host: x86_64-pc-linux-gnu
Target: arm-linux-gnueabi
Build:
Known to work:
Known to fail: 6.0
Last reconfirmed: 2013-05-14 00:00:00


Attachments
Testcase and output (1.15 KB, application/zip)
2013-05-14 10:52 UTC, java4ada
Details

Note You need to log in before you can comment on or make changes to this bug.
Description java4ada 2013-05-14 10:52:53 UTC
Created attachment 30109 [details]
Testcase and output

Please find enclosed input Vector4.ii and Vector4.s compiled with "./xgcc -fpic  -mfloat-abi=softfp -mthumb -Os -march=armv7-a -mfpu=neon -S Vector4.ii".

Because function initVector4() returns instance of Vector4 16-byte in size, GCC passes internal memory buffer as the first argument to hold the return value.  This is shown in Vector4.s line#54 "add r0,sp,#8", and the buffer is filled at line#33 "vst1.64 {d16-d17}, [r0:128]".  The 128-bit alignment hint is due to the fact that class Vector4 is declared to be 16-byte aligned.  Problem is, r0 may not be aligned to 16-byte if sp is 16-byte aligned, which results in crash at vst1.64 [:128].  It seems that GCC doesn't honor the alignment of internal memory buffer.

If Vector4 is declared to be 32-byte align, GCC generates extra code to ensure r0 is properly aligned.  I assume GCC should do it as low as 16-byte too.
Comment 1 Richard Biener 2013-05-14 10:55:23 UTC
What does the ABI say about incoming stack alignment?  What target did you
configure for?
Comment 2 java4ada 2013-05-14 11:03:35 UTC
I don't know if ABI dictates it but from observation the stack is aligned to 8-byte for the largest primitive type "double" (or long long). 

I configure it on Ubuntu 12.04 64-bit with the following:

 ~/m/gcc/gcc-4.8/configure \
    --prefix=/tmp/gcc/prefix --target=arm-linux-androideabi --host=x86_64-linux-gnu --build=x86_64-linux-gnu \
    --with-gnu-as --with-gnu-ld --enable-languages=c,c++ \
    --with-gmp=/tmp/gcc/temp-install \
    --with-mpfr=/tmp/gcc/temp-install \
    --with-mpc=/tmp/gcc/temp-install \
    --with-cloog=/tmp/gcc/temp-install \
    --with-isl=/tmp/gcc/temp-install \
    --with-ppl=/tmp/gcc/temp-install \
    --disable-ppl-version-check --disable-cloog-version-check --disable-isl-version-check \
    --enable-cloog-backend=isl \
    --with-host-libstdcxx="-static-libgcc -Wl,-Bstatic,-lstdc++,-Bdynamic -lm" \
    --disable-libssp \
    --enable-threads \
    --disable-libmudflap --disable-libstdc__-v3 --disable-sjlj-exceptions --disable-shared \
    --disable-tls --disable-libitm --disable-nls --disable-bootstrap --disable-libquadmath --disable-libsanitizer \
    --with-float=soft --with-fpu=vfp --with-arch=armv5te \
    --enable-target-optspace --enable-initfini-array \
    --with-sysroot=/tmp/gcc/prefix/sysroot \
    --enable-plugins --enable-libgomp \
    --enable-gold=default
Comment 3 Andrew Pinski 2013-05-14 18:27:22 UTC
(In reply to java4ada from comment #2)
> I don't know if ABI dictates it but from observation the stack is aligned to
> 8-byte for the largest primitive type "double" (or long long). 

I think this is wrong the alignment requirement for arm-eabi is 16 byte IIRC (so neon can be supported).
Comment 4 Richard Earnshaw 2013-05-14 21:36:49 UTC
The ARM EABI only requires 8-byte alignment, as does Neon.
Comment 5 java4ada 2013-05-14 22:36:30 UTC
NEON instructions like vst/vld [:128] and [:256] need 16-byte and 32-byte alignment, respectively.  Does it mean under ARM EABI both should be replaced with [:64] ? (Probably only at the cost of 1-2 cycle anyway)

What other ARM ABI we can configure to get higher alignment than 8-byte?
Comment 6 Jackie Rosen 2014-02-16 10:00:31 UTC Comment hidden (spam)
Comment 7 Maxim Ostapenko 2015-05-05 09:46:24 UTC
I've ran to the exactly the same issue on armv7-a (cortex-A8) target with trunk compiler and pr55073.C testcase:

$ ~/install/gcc-trunk-arm/armv7l-tizen/bin/armv7l-tizen-linux-gnueabi-gcc  ~/workspace/downloads/gcc/gcc/testsuite/gcc.target/arm/pr55073.C -O2 -mcpu=cortex-a8 -lm -o ./pr55073.exe

$ ~/install/gcc-5.0-arm/armv7l-tizen/bin/armv7l-tizen-linux-gnueabi-gcc -v
Using built-in specs.
COLLECT_GCC=/home/max/install/gcc-trunk-arm/armv7l-tizen/bin/armv7l-tizen-linux-gnueabi-gcc
COLLECT_LTO_WRAPPER=/home/max/install/gcc-trunk-arm/armv7l-tizen/libexec/gcc/armv7l-tizen-linux-gnueabi/6.0.0/lto-wrapper
Target: armv7l-tizen-linux-gnueabi
Configured with: /home/max/build/v6/sources/gcc_1/configure 
--build=x86_64-pc-linux-gnu
--host=x86_64-pc-linux-gnu
--target=armv7l-tizen-linux-gnueabi
--prefix=/home/max/install/gcc-trunk-arm/armv7l-tizen
--with-sysroot=/home/max/install/gcc-trunk-arm/armv7l-tizen/armv7l-tizen-linux-gnueabi/sys-root
--disable-libmudflap
--enable-libssp
--disable-nls
--disable-libstdcxx-pch
--disable-multilib
--disable-gnu-unique-object
--enable-linker-build-id
--with-mode=arm
--with-fpu=neon-vfpv4
--with-arch=armv7-a
--with-tune=cortex-a15.cortex-a7
--with-float=softfp
--enable-libgomp
--enable-poison-system-directories
--enable-long-long
--enable-threads
--enable-languages=c,c++,fortran
--enable-shared
--enable-lto
--enable-symvers=gnu
--enable-__cxa_atexit
--with-gnu-as
--with-gnu-ld
--with-host-libstdcxx='-static-libgcc -Wl,-Bstatic,-lstdc++,-Bdynamic -lm' --
Thread model: posix
gcc version 6.0.0 20150505 (experimental)

Looking to gdb output:

Breakpoint 1, 0x00008608 in test_func() ()
(gdb) disas
Dump of assembler code for function _Z9test_funcv:
   0x000085b4 <+0>:	movw	r3, #34480	; 0x86b0
   0x000085b8 <+4>:	movt	r3, #0
   0x000085bc <+8>:	add	r2, r3, #16
   0x000085c0 <+12>:	vld1.8	{d20-d21}, [r3 :128]
   0x000085c4 <+16>:	vld1.8	{d16-d17}, [r2 :128]
   0x000085c8 <+20>:	vorr	d17, d20, d20
   0x000085cc <+24>:	vorr	d18, d16, d16
   0x000085d0 <+28>:	vzip.8	d16, d20
   0x000085d4 <+32>:	vzip.8	d17, d18
   0x000085d8 <+36>:	vorr	d20, d16, d16
   0x000085dc <+40>:	vorr	d18, d17, d17
   0x000085e0 <+44>:	vorr	d22, d16, d16
   0x000085e4 <+48>:	vzip.8	d20, d18
   0x000085e8 <+52>:	vzip.8	d17, d22
   0x000085ec <+56>:	vorr	d18, d20, d20
   0x000085f0 <+60>:	vmov.i16	q12, #2	; 0x0002
   0x000085f4 <+64>:	vmovl.s8	q10, d17
   0x000085f8 <+68>:	vmovl.s8	q9, d18
   0x000085fc <+72>:	vsub.i16	q10, q10, q12
   0x00008600 <+76>:	vsub.i16	q8, q9, q12
   0x00008604 <+80>:	vadd.i16	q8, q10, q8
=> 0x00008608 <+84>:	vst1.64	{d16-d17}, [r0 :128]
   0x0000860c <+88>:	bx	lr
End of assembler dump.
r0             0xbe3ffc58	3191864408
(gdb) 

vst1.64 wants r0 to be 128-bit aligned, but it's only 64-bit aligned here.

Documentation from ARM info center (http://infocenter.arm.com/help/index.jsp? topic=/com.arm.doc.ddi0344c/Cihejdic.html) points here for cortex-a8:

"If an alignment qualifier is specified, a check is made for strict alignment based on the qualifier, independent of the A-bit setting." So, bus error seems to be expected here.

On cortex-a15.cortex-a7 the same code executed without errors. So, I wonder, if it's a testcase problem and it should be disabled for CA8? Or maybe it is a compiler bug?
Comment 8 Yuri Gribov 2015-05-05 10:50:23 UTC
> So, I wonder, if it's a testcase problem and
> it should be disabled for CA8?

Perhaps add CA8 check in check_effective_target_arm_vect_no_misalign in gcc/testsuite/lib/target-supports.exp?
Comment 9 Maxim Ostapenko 2015-05-13 10:58:22 UTC
Add ARM maintainers.
Comment 10 Mark H Weaver 2015-07-17 22:24:05 UTC
I've encountered a similar problem, possibly related.  See bug #66917 for a minimal C example program exhibiting the problem.
Comment 11 Mark H Weaver 2015-07-30 13:58:09 UTC
FYI, there's now a suggested fix for bug #66917.  It would be interesting to know whether it fixes the problem reported here.