This test case is derived from gcc.c-torture/execute/920929-1.c. It creates a VLA of doubles and fills it with zeros: > cat bad-vla-align.c unsigned long mask = sizeof(double) - 1; unsigned int __attribute__((noinline)) f(int n) { double v[n]; while (n > 0) v[--n] = 0; return (unsigned long)v & mask; } int main(void) { if (f(100) != 0) __builtin_abort(); return 0; } With -march=armv5te -O2 gcc uses STRD instructions to write 8 bytes at a time. STRD requires an 8-byte aligned address, but gcc fails to align the VLA to 8 bytes, resulting in misaligned accesses at runtime. Depending on hardware and kernel configuration, this may result in abnormal termination or slow but correct execution. On my Marvell Kirkwood machine, it results in EXTREMELY slow execution due to the high overhead of kernel fixups for alignment traps. The reason for the misalignment can be seen in the assembly code: > cat bad-vla-align.s .arch armv5te ... f: @ args = 0, pretend = 0, frame = 0 @ frame_needed = 1, uses_anonymous_args = 0 @ link register save eliminated. str fp, [sp, #-4]! sp is 8-byte aligned on entry but not after the prologue. mov r1, r0, asl #3 add r3, r1, #8 Both these offsets are multiples of 8. add fp, sp, #0 cmp r0, #0 sub sp, sp, r3 Now sp == &v[0] is not 8-byte aligned. mov ip, sp ble .L2 add r1, sp, r1 r1 == &v[n] is not 8-byte aligned. mov r2, #0 mov r3, #0 .L3: subs r0, r0, #1 strd r2, [r1, #-8]! r1 == &v[--n] is not 8-byte aligned so strd fails. ... I can reproduce this failure with 4.6.0 (r158675) and the 4.5.0 and 4.4.3 releases. 4.3.4 and 4.2.4 appear to work, but I don't know if that is by design or by accident (different register allocation resulting in different frame layouts and prologues). A version of the test case tried to perform the alignment check inside the f function, but gcc optimized it away completely, apparently "knowing" that the array address was a multiple of 8. Still another version tried to pass the array address to a separate checking function, but that changed f's prologue enough to mask the error.
Created attachment 20476 [details] test case
Created attachment 20477 [details] gcc-4.6.0 -O2 -march=armv5te -S output
Confirmed on trunk.
Author: cltang Date: Thu Mar 17 14:02:12 2011 New Revision: 171096 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=171096 Log: 2011-03-17 Chung-Lin Tang <cltang@codesourcery.com> PR target/43872 * config/arm/arm.c (arm_get_frame_offsets): Adjust early return condition with !cfun->calls_alloca. Modified: trunk/gcc/ChangeLog trunk/gcc/config/arm/arm.c
This should be fixed now.
I shall be out of the office until Tuesday 3rd May and will not have email access during most of that time, so I will read your message when I return. For urgent managerial issues please contact Roger Teague. For internal engineering issues please contact Matthew Gretton-Dann.
Author: jye2 Date: Mon Sep 19 06:17:45 2011 New Revision: 178953 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=178953 Log: 2011-09-19 chengbin <bin.cheng@arm.com> Backport r174035 from mainline 2011-05-22 Tom de Vries <tom@codesourcery.com> PR middle-end/48689 * fold-const.c (fold_checksum_tree): Guard TREE_CHAIN use with CODE_CONTAINS_STRUCT (TS_COMMON). Backport r172297 from mainline 2011-04-11 Chung-Lin Tang <cltang@codesourcery.com> Richard Earnshaw <rearnsha@arm.com> PR target/48250 * config/arm/arm.c (arm_legitimize_reload_address): Update cases to use sign-magnitude offsets. Reject unsupported unaligned cases. Add detailed description in comments. * config/arm/arm.md (reload_outdf): Disable for ARM mode; change condition from TARGET_32BIT to TARGET_ARM. Backport r171978 from mainline 2011-04-05 Tom de Vries <tom@codesourcery.com> PR target/43920 * config/arm/arm.h (BRANCH_COST): Set to 1 for Thumb-2 when optimizing for size. Backport r171632 from mainline 2011-03-28 Richard Sandiford <richard.sandiford@linaro.org> * builtins.c (expand_builtin_memset_args): Use gen_int_mode instead of GEN_INT. Backport r171379 from mainline 2011-03-23 Chung-Lin Tang <cltang@codesourcery.com> PR target/46934 * config/arm/arm.md (casesi): Use the gen_int_mode() function to subtract lower bound instead of GEN_INT(). Backport r171251 from mainline 2011-03-21 Daniel Jacobowitz <dan@codesourcery.com> * config/arm/unwind-arm.c (__gnu_unwind_pr_common): Correct test for barrier handlers. Backport r171096 from mainline 2011-03-17 Chung-Lin Tang <cltang@codesourcery.com> PR target/43872 * config/arm/arm.c (arm_get_frame_offsets): Adjust early return condition with !cfun->calls_alloca. Modified: branches/ARM/embedded-4_6-branch/gcc/ChangeLog.arm branches/ARM/embedded-4_6-branch/gcc/builtins.c branches/ARM/embedded-4_6-branch/gcc/config/arm/arm.c branches/ARM/embedded-4_6-branch/gcc/config/arm/arm.h branches/ARM/embedded-4_6-branch/gcc/config/arm/arm.md branches/ARM/embedded-4_6-branch/gcc/config/arm/unwind-arm.c branches/ARM/embedded-4_6-branch/gcc/fold-const.c branches/ARM/embedded-4_6-branch/gcc/testsuite/gcc.target/arm/pr40887.c branches/ARM/embedded-4_6-branch/gcc/testsuite/gcc.target/arm/pr42575.c branches/ARM/embedded-4_6-branch/gcc/testsuite/gcc.target/arm/pr43698.c branches/ARM/embedded-4_6-branch/gcc/testsuite/gcc.target/arm/pr44788.c branches/ARM/embedded-4_6-branch/gcc/testsuite/gcc.target/arm/sync-1.c