Given the following simple test.c:

    typedef unsigned long int uint64_t;
    typedef __Uint8x8_t uint8x8_t;
    typedef struct uint8x8x4_t
    {
      uint8x8_t val[4];
    } uint8x8x4_t;

    __inline uint8x8_t bar (uint64_t __a)
    {
      return (uint8x8_t) __a;
    }

    uint8x8x4_t foo(uint8x8x4_t v1, uint8x8x4_t v2)
    {
      return (uint8x8x4_t){{bar(0), bar(0), bar(0), bar(0)}};
    }

On aarch64, compile it with "./cc1-aarch64 -std=c99 -Wall -O3 -ftree-vectorizer-verbose=3":

    foo:
            movi    v0.2s, 0
            sub     sp, sp, #128    <== useless stack adjustment
            add     sp, sp, 128     <== useless stack adjustment
            ...

The two stack adjustments are useless. A quick investigation shows the cause: we first decide to put the return value on the stack and later optimize it into registers. All the stores to the stack are then deleted by dse1, but the required stack space kept in x_rtl->x_frame_offset is not updated accordingly.

Although I ran into this issue on AArch64, I strongly suspect it is a generic issue whenever the type of the return value is very complex. Has anyone run into this issue on other architectures such as MIPS or PPC?
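For trying the same pattern on targets other than aarch64, a target-independent variant of the test case might look like the sketch below. It assumes GCC's generic vector extension (`vector_size`) as a stand-in for `__Uint8x8_t`; the names `u8x8` and `u8x8x4` are made up for illustration, and whether a given target and GCC version actually emits the redundant stack adjustment for it has not been verified here.

```c
#include <stdint.h>

/* Hypothetical stand-in for __Uint8x8_t: eight uint8_t lanes (8 bytes),
   built with the GCC generic vector extension.  */
typedef uint8_t u8x8 __attribute__ ((vector_size (8)));

/* Hypothetical stand-in for uint8x8x4_t: an aggregate of four vectors,
   i.e. a "complex" return type.  */
typedef struct u8x8x4
{
  u8x8 val[4];
} u8x8x4;

/* Reinterpret a 64-bit integer as an 8-byte vector (same-size cast,
   mirroring bar in the original test case).  */
static inline u8x8
bar (uint64_t a)
{
  return (u8x8) a;
}

/* Returns an all-zero aggregate; the arguments are deliberately unused,
   as in the original test case.  */
u8x8x4
foo (u8x8x4 v1, u8x8x4 v2)
{
  (void) v1;
  (void) v2;
  return (u8x8x4) {{ bar (0), bar (0), bar (0), bar (0) }};
}
```

Compiling this with -O3 -S and looking for a matched sub/add pair on the stack pointer in foo, with no stack accesses in between, would be one way to check for the same symptom.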
> On Apr 27, 2015, at 9:10 PM, jiwang at gcc dot gnu.org <gcc-bugzilla@gcc.gnu.org> wrote:
>
> Has anyone run into this issue on other architecture like MIPS, PPC?

Yes on both.
Confirmed then.
Technically this blocks 47562, as it is another intrinsics-related issue.
This aarch64 testcase was fixed in GCC 6. There are already many more bug reports about this issue; see PR 101926. So closing as fixed for GCC 6.