57271 – ARM: gcc generates insufficient alignment for memory passed as extra argument for function return large composite type

Bug 57271 - ARM: gcc generates insufficient alignment for memory passed as extra argument for function return large composite type

Summary: ARM: gcc generates insufficient alignment for memory passed as extra argument...

Status:	RESOLVED FIXED

Alias:	None

Product:	gcc
Classification:	Unclassified
Component:	target (show other bugs)
Version:	6.0

Importance:	P3 major
Target Milestone:	---
Assignee:	Not yet assigned to anyone

URL:
Keywords:

Depends on:
Blocks:

Reported:	2013-05-14 10:52 UTC by java4ada
Modified:	2017-08-03 14:44 UTC (History)
CC List:	1 user (show)

See Also:
Host:	x86_64-pc-linux-gnu
Target:	arm-linux-gnueabi
Build:
Known to work:
Known to fail:	6.0
Last reconfirmed:	2013-05-14 00:00:00

Attachments
Testcase and output (1.15 KB, application/zip) 2013-05-14 10:52 UTC, java4ada	Details
View All Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description java4ada 2013-05-14 10:52:53 UTC

Created attachment 30109 [details]
Testcase and output

Please find enclosed input Vector4.ii and Vector4.s compiled with "./xgcc -fpic  -mfloat-abi=softfp -mthumb -Os -march=armv7-a -mfpu=neon -S Vector4.ii".

Because function initVector4() returns instance of Vector4 16-byte in size, GCC passes internal memory buffer as the first argument to hold the return value.  This is shown in Vector4.s line#54 "add r0,sp,#8", and the buffer is filled at line#33 "vst1.64 {d16-d17}, [r0:128]".  The 128-bit alignment hint is due to the fact that class Vector4 is declared to be 16-byte aligned.  Problem is, r0 may not be aligned to 16-byte if sp is 16-byte aligned, which results in crash at vst1.64 [:128].  It seems that GCC doesn't honor the alignment of internal memory buffer.

If Vector4 is declared to be 32-byte align, GCC generates extra code to ensure r0 is properly aligned.  I assume GCC should do it as low as 16-byte too.

Comment 1 Richard Biener 2013-05-14 10:55:23 UTC

What does the ABI say about incoming stack alignment?  What target did you
configure for?

Comment 2 java4ada 2013-05-14 11:03:35 UTC

I don't know if ABI dictates it but from observation the stack is aligned to 8-byte for the largest primitive type "double" (or long long). 

I configure it on Ubuntu 12.04 64-bit with the following:

 ~/m/gcc/gcc-4.8/configure \
    --prefix=/tmp/gcc/prefix --target=arm-linux-androideabi --host=x86_64-linux-gnu --build=x86_64-linux-gnu \
    --with-gnu-as --with-gnu-ld --enable-languages=c,c++ \
    --with-gmp=/tmp/gcc/temp-install \
    --with-mpfr=/tmp/gcc/temp-install \
    --with-mpc=/tmp/gcc/temp-install \
    --with-cloog=/tmp/gcc/temp-install \
    --with-isl=/tmp/gcc/temp-install \
    --with-ppl=/tmp/gcc/temp-install \
    --disable-ppl-version-check --disable-cloog-version-check --disable-isl-version-check \
    --enable-cloog-backend=isl \
    --with-host-libstdcxx="-static-libgcc -Wl,-Bstatic,-lstdc++,-Bdynamic -lm" \
    --disable-libssp \
    --enable-threads \
    --disable-libmudflap --disable-libstdc__-v3 --disable-sjlj-exceptions --disable-shared \
    --disable-tls --disable-libitm --disable-nls --disable-bootstrap --disable-libquadmath --disable-libsanitizer \
    --with-float=soft --with-fpu=vfp --with-arch=armv5te \
    --enable-target-optspace --enable-initfini-array \
    --with-sysroot=/tmp/gcc/prefix/sysroot \
    --enable-plugins --enable-libgomp \
    --enable-gold=default

Comment 3 Andrew Pinski 2013-05-14 18:27:22 UTC

(In reply to java4ada from comment #2)
> I don't know if ABI dictates it but from observation the stack is aligned to
> 8-byte for the largest primitive type "double" (or long long). 

I think this is wrong the alignment requirement for arm-eabi is 16 byte IIRC (so neon can be supported).

Comment 4 Richard Earnshaw 2013-05-14 21:36:49 UTC

The ARM EABI only requires 8-byte alignment, as does Neon.

Comment 5 java4ada 2013-05-14 22:36:30 UTC

NEON instructions like vst/vld [:128] and [:256] need 16-byte and 32-byte alignment, respectively.  Does it mean under ARM EABI both should be replaced with [:64] ? (Probably only at the cost of 1-2 cycle anyway)

What other ARM ABI we can configure to get higher alignment than 8-byte?

Comment 6 Jackie Rosen 2014-02-16 10:00:31 UTC Comment hidden (spam)

*** Bug 260998 has been marked as a duplicate of this bug. ***
Seen from the domain http://volichat.com
Marked for reference. Resolved as fixed @bugzilla.

Comment 7 Maxim Ostapenko 2015-05-05 09:46:24 UTC

I've ran to the exactly the same issue on armv7-a (cortex-A8) target with trunk compiler and pr55073.C testcase:

$ ~/install/gcc-trunk-arm/armv7l-tizen/bin/armv7l-tizen-linux-gnueabi-gcc  ~/workspace/downloads/gcc/gcc/testsuite/gcc.target/arm/pr55073.C -O2 -mcpu=cortex-a8 -lm -o ./pr55073.exe

$ ~/install/gcc-5.0-arm/armv7l-tizen/bin/armv7l-tizen-linux-gnueabi-gcc -v
Using built-in specs.
COLLECT_GCC=/home/max/install/gcc-trunk-arm/armv7l-tizen/bin/armv7l-tizen-linux-gnueabi-gcc
COLLECT_LTO_WRAPPER=/home/max/install/gcc-trunk-arm/armv7l-tizen/libexec/gcc/armv7l-tizen-linux-gnueabi/6.0.0/lto-wrapper
Target: armv7l-tizen-linux-gnueabi
Configured with: /home/max/build/v6/sources/gcc_1/configure 
--build=x86_64-pc-linux-gnu
--host=x86_64-pc-linux-gnu
--target=armv7l-tizen-linux-gnueabi
--prefix=/home/max/install/gcc-trunk-arm/armv7l-tizen
--with-sysroot=/home/max/install/gcc-trunk-arm/armv7l-tizen/armv7l-tizen-linux-gnueabi/sys-root
--disable-libmudflap
--enable-libssp
--disable-nls
--disable-libstdcxx-pch
--disable-multilib
--disable-gnu-unique-object
--enable-linker-build-id
--with-mode=arm
--with-fpu=neon-vfpv4
--with-arch=armv7-a
--with-tune=cortex-a15.cortex-a7
--with-float=softfp
--enable-libgomp
--enable-poison-system-directories
--enable-long-long
--enable-threads
--enable-languages=c,c++,fortran
--enable-shared
--enable-lto
--enable-symvers=gnu
--enable-__cxa_atexit
--with-gnu-as
--with-gnu-ld
--with-host-libstdcxx='-static-libgcc -Wl,-Bstatic,-lstdc++,-Bdynamic -lm' --
Thread model: posix
gcc version 6.0.0 20150505 (experimental)

Looking to gdb output:

Breakpoint 1, 0x00008608 in test_func() ()
(gdb) disas
Dump of assembler code for function _Z9test_funcv:
   0x000085b4 <+0>:	movw	r3, #34480	; 0x86b0
   0x000085b8 <+4>:	movt	r3, #0
   0x000085bc <+8>:	add	r2, r3, #16
   0x000085c0 <+12>:	vld1.8	{d20-d21}, [r3 :128]
   0x000085c4 <+16>:	vld1.8	{d16-d17}, [r2 :128]
   0x000085c8 <+20>:	vorr	d17, d20, d20
   0x000085cc <+24>:	vorr	d18, d16, d16
   0x000085d0 <+28>:	vzip.8	d16, d20
   0x000085d4 <+32>:	vzip.8	d17, d18
   0x000085d8 <+36>:	vorr	d20, d16, d16
   0x000085dc <+40>:	vorr	d18, d17, d17
   0x000085e0 <+44>:	vorr	d22, d16, d16
   0x000085e4 <+48>:	vzip.8	d20, d18
   0x000085e8 <+52>:	vzip.8	d17, d22
   0x000085ec <+56>:	vorr	d18, d20, d20
   0x000085f0 <+60>:	vmov.i16	q12, #2	; 0x0002
   0x000085f4 <+64>:	vmovl.s8	q10, d17
   0x000085f8 <+68>:	vmovl.s8	q9, d18
   0x000085fc <+72>:	vsub.i16	q10, q10, q12
   0x00008600 <+76>:	vsub.i16	q8, q9, q12
   0x00008604 <+80>:	vadd.i16	q8, q10, q8
=> 0x00008608 <+84>:	vst1.64	{d16-d17}, [r0 :128]
   0x0000860c <+88>:	bx	lr
End of assembler dump.
r0             0xbe3ffc58	3191864408
(gdb) 

vst1.64 wants r0 to be 128-bit aligned, but it's only 64-bit aligned here.

Documentation from ARM info center (http://infocenter.arm.com/help/index.jsp? topic=/com.arm.doc.ddi0344c/Cihejdic.html) points here for cortex-a8:

"If an alignment qualifier is specified, a check is made for strict alignment based on the qualifier, independent of the A-bit setting." So, bus error seems to be expected here.

On cortex-a15.cortex-a7 the same code executed without errors. So, I wonder, if it's a testcase problem and it should be disabled for CA8? Or maybe it is a compiler bug?

Comment 8 Yuri Gribov 2015-05-05 10:50:23 UTC

> So, I wonder, if it's a testcase problem and
> it should be disabled for CA8?

Perhaps add CA8 check in check_effective_target_arm_vect_no_misalign in gcc/testsuite/lib/target-supports.exp?

Comment 9 Maxim Ostapenko 2015-05-13 10:58:22 UTC

Add ARM maintainers.

Comment 10 Mark H Weaver 2015-07-17 22:24:05 UTC

I've encountered a similar problem, possibly related.  See bug #66917 for a minimal C example program exhibiting the problem.

Comment 11 Mark H Weaver 2015-07-30 13:58:09 UTC

FYI, there's now a suggested fix for bug #66917.  It would be interesting to know whether it fixes the problem reported here.

Comment 12 Eric Gallager 2017-08-03 14:44:29 UTC

(In reply to Mark H Weaver from comment #11)
> FYI, there's now a suggested fix for bug #66917.  It would be interesting to
> know whether it fixes the problem reported here.

The suggested fix was applied and that bug has now been closed. Assuming that this one has been fixed, too.