Bug 43872 - VLAs are not aligned correctly on ARM
Summary: VLAs are not aligned correctly on ARM
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: target (show other bugs)
Version: 4.6.0
: P3 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2010-04-23 19:12 UTC by Mikael Pettersson
Modified: 2011-09-19 06:17 UTC (History)
3 users (show)

See Also:
Host:
Target: armv5tel-unknown-linux-gnueabi
Build:
Known to work:
Known to fail: 4.4.3, 4.5.0, 4.6.0
Last reconfirmed: 2010-04-23 21:31:24


Attachments
test case (177 bytes, text/plain)
2010-04-23 19:14 UTC, Mikael Pettersson
Details
gcc-4.6.0 -O2 -march=armv5te -S output (578 bytes, text/plain)
2010-04-23 19:15 UTC, Mikael Pettersson
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Mikael Pettersson 2010-04-23 19:12:33 UTC
This test case is derived from gcc.c-torture/execute/920929-1.c. It creates a VLA of doubles and fills it with zeros:

> cat bad-vla-align.c
unsigned long mask = sizeof(double) - 1;

unsigned int __attribute__((noinline)) f(int n)
{
    double v[n];

    while (n > 0)
        v[--n] = 0;

    return (unsigned long)v & mask;
}

int main(void)
{
    if (f(100) != 0)
        __builtin_abort();
    return 0;
}

With -march=armv5te -O2 gcc uses STRD instructions to write 8 bytes at a time. STRD requires an 8-byte aligned address, but gcc fails to align the VLA to 8 bytes, resulting in misaligned accesses at runtime. Depending on hardware and kernel configuration, this may result in abnormal termination or slow but correct execution. On my Marvell Kirkwood machine, it results in EXTREMELY slow execution due to the high overhead of kernel fixups for alignment traps.

The reason for the misalignment can be seen in the assembly code:

> cat bad-vla-align.s
        .arch armv5te
...
f:
        @ args = 0, pretend = 0, frame = 0
        @ frame_needed = 1, uses_anonymous_args = 0
        @ link register save eliminated.
        str     fp, [sp, #-4]!

sp is 8-byte aligned on entry but not after the prologue.

        mov     r1, r0, asl #3
        add     r3, r1, #8

Both these offsets are multiples of 8.

        add     fp, sp, #0
        cmp     r0, #0
        sub     sp, sp, r3

Now sp == &v[0] is not 8-byte aligned.

        mov     ip, sp
        ble     .L2
        add     r1, sp, r1

r1 == &v[n] is not 8-byte aligned.

        mov     r2, #0
        mov     r3, #0
.L3:
        subs    r0, r0, #1
        strd    r2, [r1, #-8]!

r1 == &v[--n] is not 8-byte aligned so strd fails.
...

I can reproduce this failure with 4.6.0 (r158675) and the 4.5.0 and 4.4.3 releases. 4.3.4 and 4.2.4 appear to work, but I don't know if that is by design or by accident (different register allocation resulting in different frame layouts and prologues).

A version of the test case tried to perform the alignment check inside the f function, but gcc optimized it away completely, apparently "knowing" that the array address was a multiple of 8. Still another version tried to pass the array address to a separate checking function, but that changed f's prologue enough to mask the error.
Comment 1 Mikael Pettersson 2010-04-23 19:14:07 UTC
Created attachment 20476 [details]
test case
Comment 2 Mikael Pettersson 2010-04-23 19:15:09 UTC
Created attachment 20477 [details]
gcc-4.6.0 -O2 -march=armv5te -S output
Comment 3 Richard Earnshaw 2010-04-23 21:31:24 UTC
Confirmed on trunk.
Comment 4 Chung-Lin Tang 2011-03-17 14:02:15 UTC
Author: cltang
Date: Thu Mar 17 14:02:12 2011
New Revision: 171096

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=171096
Log:
2011-03-17  Chung-Lin Tang  <cltang@codesourcery.com>

	PR target/43872
	* config/arm/arm.c (arm_get_frame_offsets): Adjust early
	return condition with !cfun->calls_alloca.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/arm/arm.c
Comment 5 Chung-Lin Tang 2011-04-23 08:36:28 UTC
This should be fixed now.
Comment 6 richard.earnshaw 2011-04-23 08:37:21 UTC
I shall be out of the office until Tuesday 3rd May and will not have email access during most of that time, so I will read your message when I return.  For urgent managerial issues please contact Roger Teague.  For internal engineering issues please contact Matthew Gretton-Dann.
Comment 7 jye2 2011-09-19 06:17:57 UTC
Author: jye2
Date: Mon Sep 19 06:17:45 2011
New Revision: 178953

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=178953
Log:
2011-09-19  chengbin  <bin.cheng@arm.com>

	Backport r174035 from mainline
	2011-05-22  Tom de Vries  <tom@codesourcery.com>

	PR middle-end/48689
	* fold-const.c (fold_checksum_tree): Guard TREE_CHAIN use with
	CODE_CONTAINS_STRUCT (TS_COMMON).

	Backport r172297 from mainline
	2011-04-11  Chung-Lin Tang  <cltang@codesourcery.com>
		Richard Earnshaw  <rearnsha@arm.com>

	PR target/48250
	* config/arm/arm.c (arm_legitimize_reload_address): Update cases
	to use sign-magnitude offsets. Reject unsupported unaligned
	cases. Add detailed description in comments.
	* config/arm/arm.md (reload_outdf): Disable for ARM mode; change
	condition from TARGET_32BIT to TARGET_ARM.

	Backport r171978 from mainline
	2011-04-05  Tom de Vries  <tom@codesourcery.com>

	PR target/43920
	* config/arm/arm.h (BRANCH_COST): Set to 1 for Thumb-2 when optimizing
	for size.

	Backport r171632 from mainline
	2011-03-28  Richard Sandiford  <richard.sandiford@linaro.org>

	* builtins.c (expand_builtin_memset_args): Use gen_int_mode
	instead of GEN_INT.

	Backport r171379 from mainline
	2011-03-23  Chung-Lin Tang  <cltang@codesourcery.com>

	PR target/46934
	* config/arm/arm.md (casesi): Use the gen_int_mode() function
	to subtract lower bound instead of GEN_INT().

	Backport r171251 from mainline 
	2011-03-21  Daniel Jacobowitz  <dan@codesourcery.com>

	* config/arm/unwind-arm.c (__gnu_unwind_pr_common): Correct test
	for barrier handlers.

	Backport r171096 from mainline
	2011-03-17  Chung-Lin Tang  <cltang@codesourcery.com>

	PR target/43872
	* config/arm/arm.c (arm_get_frame_offsets): Adjust early
	return condition with !cfun->calls_alloca.


Modified:
    branches/ARM/embedded-4_6-branch/gcc/ChangeLog.arm
    branches/ARM/embedded-4_6-branch/gcc/builtins.c
    branches/ARM/embedded-4_6-branch/gcc/config/arm/arm.c
    branches/ARM/embedded-4_6-branch/gcc/config/arm/arm.h
    branches/ARM/embedded-4_6-branch/gcc/config/arm/arm.md
    branches/ARM/embedded-4_6-branch/gcc/config/arm/unwind-arm.c
    branches/ARM/embedded-4_6-branch/gcc/fold-const.c
    branches/ARM/embedded-4_6-branch/gcc/testsuite/gcc.target/arm/pr40887.c
    branches/ARM/embedded-4_6-branch/gcc/testsuite/gcc.target/arm/pr42575.c
    branches/ARM/embedded-4_6-branch/gcc/testsuite/gcc.target/arm/pr43698.c
    branches/ARM/embedded-4_6-branch/gcc/testsuite/gcc.target/arm/pr44788.c
    branches/ARM/embedded-4_6-branch/gcc/testsuite/gcc.target/arm/sync-1.c