Bug 57271

Summary: ARM: gcc generates insufficient alignment for memory passed as extra argument for function return large composite type
Product: gcc
Component: c++
Version: 4.8.1
Status: WAITING ---
Severity: major
Priority: P3
Target Milestone: ---
Reporter: java4ada
Assignee: Not yet assigned to anyone <unassigned>
CC: jackie.rosen
Host:
Target: arm
Build:
Known to work:
Known to fail:
Last reconfirmed: 2013-05-14 00:00:00
Attachments: Testcase and output

Description java4ada 2013-05-14 10:52:53 UTC
Created attachment 30109
Testcase and output

Please find enclosed input Vector4.ii and Vector4.s compiled with "./xgcc -fpic  -mfloat-abi=softfp -mthumb -Os -march=armv7-a -mfpu=neon -S Vector4.ii".

Because the function initVector4() returns an instance of Vector4, which is 16 bytes in size, GCC passes an internal memory buffer as the first argument to hold the return value.  This is shown in Vector4.s at line #54, "add r0,sp,#8", and the buffer is filled at line #33, "vst1.64 {d16-d17}, [r0:128]".  The 128-bit alignment hint comes from class Vector4 being declared 16-byte aligned.  The problem is that r0 (sp + 8) is not 16-byte aligned when sp is 16-byte aligned, which results in a crash at vst1.64 [:128].  It seems that GCC doesn't honor the declared alignment for the internal memory buffer.

If Vector4 is declared to be 32-byte aligned, GCC generates extra code to ensure r0 is properly aligned.  I assume GCC should do the same for 16-byte alignment.
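
The address arithmetic behind this report can be sketched as follows. This is only an illustrative model, not the code from the attachment: the helper names (is_aligned, align_up, return_buffer) and the example stack addresses are assumptions made up for the sketch. It checks that sp + 8 misses 16-byte alignment whenever sp itself is 16-byte aligned, and shows the kind of round-up that would be needed to realign the buffer.

```cpp
#include <cstdint>

// Hypothetical helpers modelling the addresses involved; 'align' must be a
// power of two.
constexpr bool is_aligned(std::uintptr_t addr, std::uintptr_t align) {
    return (addr & (align - 1)) == 0;
}

// Round 'addr' up to the next multiple of 'align'.
constexpr std::uintptr_t align_up(std::uintptr_t addr, std::uintptr_t align) {
    return (addr + align - 1) & ~(align - 1);
}

// Address of the return buffer as computed by "add r0,sp,#8".
constexpr std::uintptr_t return_buffer(std::uintptr_t sp) {
    return sp + 8;
}

// Example stack pointers (assumed values, chosen only for illustration).
static_assert(is_aligned(0x1000, 16), "example sp is 16-byte aligned");
static_assert(!is_aligned(return_buffer(0x1000), 16),
              "sp + 8 is only 8-byte aligned, so [r0:128] would fault");
static_assert(is_aligned(return_buffer(0x1008), 16),
              "the buffer is 16-byte aligned only when sp is 8 mod 16");
static_assert(align_up(return_buffer(0x1000), 16) == 0x1010,
              "rounding up yields a 16-byte aligned buffer address");
```

Rounding up costs extra stack space, since the frame has to reserve enough slack for the worst-case adjustment; presumably that is what the extra code in the 32-byte case accounts for.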
Comment 1 Richard Biener 2013-05-14 10:55:23 UTC
What does the ABI say about incoming stack alignment?  What target did you
configure for?
Comment 2 java4ada 2013-05-14 11:03:35 UTC
I don't know if ABI dictates it but from observation the stack is aligned to 8-byte for the largest primitive type "double" (or long long). 

I configured it on Ubuntu 12.04 64-bit with the following:

 ~/m/gcc/gcc-4.8/configure \
    --prefix=/tmp/gcc/prefix --target=arm-linux-androideabi --host=x86_64-linux-gnu --build=x86_64-linux-gnu \
    --with-gnu-as --with-gnu-ld --enable-languages=c,c++ \
    --with-gmp=/tmp/gcc/temp-install \
    --with-mpfr=/tmp/gcc/temp-install \
    --with-mpc=/tmp/gcc/temp-install \
    --with-cloog=/tmp/gcc/temp-install \
    --with-isl=/tmp/gcc/temp-install \
    --with-ppl=/tmp/gcc/temp-install \
    --disable-ppl-version-check --disable-cloog-version-check --disable-isl-version-check \
    --enable-cloog-backend=isl \
    --with-host-libstdcxx="-static-libgcc -Wl,-Bstatic,-lstdc++,-Bdynamic -lm" \
    --disable-libssp \
    --enable-threads \
    --disable-libmudflap --disable-libstdc__-v3 --disable-sjlj-exceptions --disable-shared \
    --disable-tls --disable-libitm --disable-nls --disable-bootstrap --disable-libquadmath --disable-libsanitizer \
    --with-float=soft --with-fpu=vfp --with-arch=armv5te \
    --enable-target-optspace --enable-initfini-array \
    --with-sysroot=/tmp/gcc/prefix/sysroot \
    --enable-plugins --enable-libgomp \
    --enable-gold=default
Comment 3 Andrew Pinski 2013-05-14 18:27:22 UTC
(In reply to java4ada from comment #2)
> I don't know if ABI dictates it but from observation the stack is aligned to
> 8-byte for the largest primitive type "double" (or long long). 

I think this is wrong; the alignment requirement for arm-eabi is 16 bytes IIRC (so that NEON can be supported).
Comment 4 Richard Earnshaw 2013-05-14 21:36:49 UTC
The ARM EABI only requires 8-byte alignment, as does Neon.
Comment 5 java4ada 2013-05-14 22:36:30 UTC
NEON instructions like vst/vld with [:128] and [:256] need 16-byte and 32-byte alignment, respectively.  Does this mean that under the ARM EABI both should be replaced with [:64]?  (Probably at a cost of only 1-2 cycles anyway.)

Which other ARM ABI can we configure to get a higher alignment than 8 bytes?
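
To make the question concrete, here is a small sketch relating the alignment hints to the 8-byte stack alignment stated in comment 4. The names hint_bytes, kStackAlignBytes and needs_realignment are hypothetical and exist only for this illustration; this is a model of the reasoning, not of what GCC implements.

```cpp
// Stack alignment guaranteed by the ARM EABI, per comment 4 (in bytes).
constexpr unsigned kStackAlignBytes = 8;

// Convert a NEON alignment hint given in bits ([:64], [:128], [:256])
// to the byte alignment it requires of the address.
constexpr unsigned hint_bytes(unsigned hint_bits) {
    return hint_bits / 8;
}

// A stack slot that is only guaranteed kStackAlignBytes alignment can be
// accessed with a given hint only if the hint asks for no more than that.
constexpr bool needs_realignment(unsigned hint_bits) {
    return hint_bytes(hint_bits) > kStackAlignBytes;
}

static_assert(!needs_realignment(64), "[:64] is covered by the 8-byte stack");
static_assert(needs_realignment(128), "[:128] needs 16 bytes, more than 8");
static_assert(needs_realignment(256), "[:256] needs 32 bytes, more than 8");
```

Under this model, a buffer placed on the stack without extra realignment is safe only with a [:64] hint (or no hint at all).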
Comment 6 Jackie Rosen 2014-02-16 10:00:31 UTC
*** Bug 260998 has been marked as a duplicate of this bug. ***