This testcase crashes when compiled with gcc 4.4 -m32 -msse -O2, but not when compiled with 4.3 or 4.5. It doesn't crash when compiled as -m64. /* compile me with -O2 -msse -m32 */ #include <string.h> typedef float v4sf __attribute__ ((vector_size (16))); int main(void) { float m4[4]; v4sf m3; float m2[4] = {4, 3, 2, 1}; v4sf one2 = {5, 15, 25, 35}; memcpy(&m3, &m2, sizeof(m2)); m3 = m3 * one2; memcpy(&m4, &m3, sizeof(m4)); if (m4[0] != 20) return 1; return 0; } $ gcc-4.4 bug.c -m32 -O2 -msse $ ./a.out Segmentation fault GDB shows it crashes on the SSE instructions, probably wrong alignment: Dump of assembler code for function main: 0x080483a0 <+0>: push %ebp 0x080483a1 <+1>: mov %esp,%ebp 0x080483a3 <+3>: sub $0x14,%esp 0x080483a6 <+6>: movaps 0x80484d0,%xmm0 0x080483ad <+13>: movl $0x40800000,0x4(%esp) 0x080483b5 <+21>: movl $0x40400000,0x8(%esp) 0x080483bd <+29>: flds 0x80484e0 0x080483c3 <+35>: movl $0x40000000,0xc(%esp) 0x080483cb <+43>: movl $0x3f800000,0x10(%esp) => 0x080483d3 <+51>: mulps 0x4(%esp),%xmm0 0x080483d8 <+56>: movss %xmm0,(%esp) 0x080483dd <+61>: flds (%esp) 0x080483e0 <+64>: fucomip %st(1),%st 0x080483e2 <+66>: fstp %st(0) 0x080483e4 <+68>: setne %al 0x080483e7 <+71>: setp %dl 0x080483ea <+74>: or %edx,%eax 0x080483ec <+76>: movzbl %al,%eax 0x080483ef <+79>: leave 0x080483f0 <+80>: ret End of assembler dump. gcc -v output: Using built-in specs. Target: x86_64-linux-gnu Configured with: ../src/configure -v --with-pkgversion='Debian 4.4.4-14' --with-bugurl=file:///usr/share/doc/gcc-4.4/README.Bugs --enable-languages=c,c++,fortran,objc,obj-c++ --prefix=/usr --program-suffix=-4.4 --enable-shared --enable-multiarch --enable-linker-build-id --with-system-zlib --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --with-gxx-include-dir=/usr/include/c++/4.4 --libdir=/usr/lib --enable-nls --enable-clocale=gnu --enable-libstdcxx-debug --enable-objc-gc --with-arch-32=i586 --with-tune=generic --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu Thread model: posix gcc version 4.4.5 20100909 (prerelease) (Debian 4.4.4-14) My libc version is this (should it matter): GNU C Library (Debian EGLIBC 2.11.2-5) stable release version 2.11.2, by Roland McGrath et al. Copyright (C) 2009 Free Software Foundation, Inc. This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. Compiled by GNU CC version 4.4.5 20100824 (prerelease). Compiled on a Linux 2.6.32 system on 2010-09-03. Available extensions: crypt add-on version 2.1 by Michael Glad and others GNU Libidn by Simon Josefsson Native POSIX Threads Library by Ulrich Drepper et al BIND-8.2.3-T5B For bug reporting instructions, please see: <http://www.debian.org/Bugs/>.
No longer crashes in 4.5/4.6 since r147980, aka SRA rewrite. The only difference in *.optimized is though that before r147980 we have: m3.3_3 = VIEW_CONVERT_EXPR<v4sf>(m2); m3.5_4 = m3.3_3 * { 5.0e+0, 1.5e+1, 2.5e+1, 3.5e+1 }; m4$0_5 = VIEW_CONVERT_EXPR<float[4]>(m3.5_4)[0]; if (m4$0_5 != 2.0e+1) and after it: m3.3_7 = VIEW_CONVERT_EXPR<v4sf>(m2); m3.5_8 = m3.3_7 * { 5.0e+0, 1.5e+1, 2.5e+1, 3.5e+1 }; m4 = VIEW_CONVERT_EXPR<float[4]>(m3.5_8); m4$0_22 = m4[0]; if (m4$0_22 != 2.0e+1) where m2 (and in the second case m4 as well) are float [4] arrays, m3 is v4sf and m4$0 is float. This doesn't look like a fix for the case that the VCE<v4sf>(m2) load during expansion assumes m2 is 128-bits aligned when it is not.
Actually typedef float V __attribute__ ((vector_size (16))); V g; int main () { float d[4] = { 4, 3, 2, 1 }; V e; __builtin_memcpy (&e, &d, sizeof (d)); V f = { 5, 15, 25, 35 }; e = e * f; g = e; return 0; } segfaults even with 4.5/4.6 at -O2 -m32 -msse2.
DECL_ALIGN of d is set to 128 (but appearantly it isn't ensured it'll end up that way). DECL_ALIGN is adjusted here: Old value = 32 New value = 128 expand_one_stack_var_at (decl=0x7ffff5ae90a0, offset=-16) at /space/rguenther/src/svn/trunk/gcc/cfgexpand.c:739 739 DECL_USER_ALIGN (decl) = 0; so on trunk get_object_alignment of the MEM_REF will return 128 and thus we do not run into unaligned move expansion here: if (mode != BLKmode && (unsigned) align < GET_MODE_ALIGNMENT (mode) /* If the target does not have special handling for unaligned loads of mode then it can use regular moves for them. */ && ((icode = optab_handler (movmisalign_optab, mode)) != CODE_FOR_nothing)) manually setting alignment back to 32 in gdb results in ok asm. movlps (%esp), %xmm0 movhps 8(%esp), %xmm0 mulps .LC4, %xmm0 instead of mulps (%esp), %xmm0 Appearantly stack alignment code doesn't work.
Created attachment 21809 [details] patch to fix "half STRICT_ALIGNMENT" targets memcpy folding Might need this patch to fix as well. i?86 / x86_64 isn't really !STRICT_ALIGNMENT.
Re: #c4, shouldn't there be srcvar = NULL_TREE; somewhere for the STRICT_ALIGNMENT non-aligned case?
Missing some else indeed.
For the ix86/x86_64 alignment issue, I believe the problem here is that max_align = MAX (crtl->max_used_stack_slot_alignment, PREFERRED_STACK_BOUNDARY); is fine for !SUPPORTS_STACK_ALIGNMENT targets, but for ix86/x86_64 if max_used_stack_slot_alignment is really small, we might end up with deciding to use a smaller alignment. Perhaps for SUPPORTS_STACK_ALIGNMENT we should use here instead max_align = MAX (crtl->max_used_stack_slot_alignment, INCOMING_STACK_BOUNDARY); and if the align we compute is bigger than crtl->max_used_stack_slot_alignment ensure we will keep using that alignment (e.g. by bumping also crtl->stack_align_needed/estimated). If INCOMING_STACK_BOUNDARY is 128 bits aligned, I think using 128 bit alignment shouldn't cost us anything extra. The problem with using INCOMING_STACK_BOUNDARY is that it is on ix86/x86-64 computed only too late (in expand_stack_alignment by targetm.calls.update_stack_boundary (); ). The comment above it says it is computed again, but I can't actually find another call.
This also failed: --- typedef float V __attribute__ ((vector_size (16))); V g; float d[4] = { 4, 3, 2, 1 }; int main () { V e; __builtin_memcpy (&e, &d, sizeof (d)); V f = { 5, 15, 25, 35 }; e = e * f; g = e; return 0; } --- Program received signal SIGSEGV, Segmentation fault. 0x0804837e in main () at foo.c:11 11 e = e * f; Missing separate debuginfos, use: debuginfo-install glibc-2.12.1-2.0.f13.i686 (gdb) disass Dump of assembler code for function main: 0x08048374 <+0>: push %ebp 0x08048375 <+1>: mov %esp,%ebp 0x08048377 <+3>: movaps 0x8048470,%xmm0 => 0x0804837e <+10>: mulps 0x8049644,%xmm0 0x08048385 <+17>: movaps %xmm0,0x8049670 0x0804838c <+24>: mov $0x0,%eax 0x08048391 <+29>: pop %ebp 0x08048392 <+30>: ret End of assembler dump. (gdb) q There is no stack involved. Somehow we failed to align array of float properly.
If __builtin_memcpy generates instructions which require bigger alignment than alignments of source or destination, it should increase the alignment of source or destination.
When __builtin_memcpy increases the alignment of source or destination, it should update needed stack alignment if source or destination is on stack.
This code: if (TREE_CODE (srcvar) == ADDR_EXPR && var_decl_component_p (TREE_OPERAND (srcvar, 0)) && tree_int_cst_equal (TYPE_SIZE_UNIT (srctype), len) && (!STRICT_ALIGNMENT || !destvar || src_align >= TYPE_ALIGN (desttype))) srcvar = fold_build2 (MEM_REF, destvar ? desttype : srctype, srcvar, off0); does float d[4]; __m128 *p = (__m128 *) &d; and treats p as properly aligned. I don't see how it can ever work with SSE. It has nothing to do with stack alignment.
(In reply to comment #4) > Created an attachment (id=21809) [edit] > patch to fix "half STRICT_ALIGNMENT" targets memcpy folding > > Might need this patch to fix as well. i?86 / x86_64 isn't really > !STRICT_ALIGNMENT. > We need a HARD_ALIGNMENT which depends on type for x86.
(In reply to comment #12) > (In reply to comment #4) > > Created an attachment (id=21809) [edit] > > patch to fix "half STRICT_ALIGNMENT" targets memcpy folding > > > > Might need this patch to fix as well. i?86 / x86_64 isn't really > > !STRICT_ALIGNMENT. > > > > We need a HARD_ALIGNMENT which depends on type for x86. With that patch the assignment generated from memcpy doesn't need more that int alignment, but still cfgexpand.c sets DECL_ALIGN of the decl to 128 so expand uses aligned instructions. cfgexpand.c should not increase alignment and not set 'needs stack alignment' then, based on your comment #10. So this _is_ about stack alignment (but maybe not exclusively).
The reason why cfgexpand does increase the alignment is that it believes that the base slot will be at least PREFERRED_STACK_BOUNDARY bytes aligned, which is true on all targets but i?86/x86-64, which apparently sometimes chooses even smaller alignment for the stack base. So, we can either use there MAX (..., STACK_BOUNDARY); for STACK_ALIGNMENT_SUPPORTED instead, which might penalize some code though, or use INCOMING_STACK_BOUNDARY there (after making sure we compute it before) and bump needed alignment to whatever we pick there up. During expansion expanders of course make use of the DECL_ALIGN info cfgexpand provides, after all that's why we do that.
Created attachment 21810 [details] A patch This patch adds HARD_ALIGNMENT_MODE_P and works for me.
(In reply to comment #13) > With that patch the assignment generated from memcpy doesn't need more > that int alignment, but still cfgexpand.c sets DECL_ALIGN of the > decl to 128 so expand uses aligned instructions. > > cfgexpand.c should not increase alignment and not set 'needs stack > alignment' then, based on your comment #10. So this _is_ about > stack alignment (but maybe not exclusively). > When we do float d[4]; __m128 *p = (__m128 *) &d; all bets are off.
That's true. But many expanders can make use of DECL_ALIGN information, e.g. to choose faster code. If cfgexpand keeps doing what it does now, namely bumping DECL_ALIGN of variables up to PREFERRED_STACK_BOUNDARY even when in the end the stack block doesn't end up being aligned that way, then it lies to the expander and that will hit us again and again. On x86-64/i686, I don't think we want to prevent memcpy folding as your patch does, at least not for CPUs where movu* is fast.
(In reply to comment #16) > (In reply to comment #13) > > > With that patch the assignment generated from memcpy doesn't need more > > that int alignment, but still cfgexpand.c sets DECL_ALIGN of the > > decl to 128 so expand uses aligned instructions. > > > > cfgexpand.c should not increase alignment and not set 'needs stack > > alignment' then, based on your comment #10. So this _is_ about > > stack alignment (but maybe not exclusively). > > > > When we do > > float d[4]; > __m128 *p = (__m128 *) &d; > > > all bets are off. ?
(In reply to comment #17) > That's true. But many expanders can make use of DECL_ALIGN information, e.g. > to choose faster code. If cfgexpand keeps doing what it does now, namely > bumping DECL_ALIGN of variables up to PREFERRED_STACK_BOUNDARY even when in the > end the stack block doesn't end up being aligned that way, then it lies to the > expander The problem isn't limited to stack. > and that will hit us again and again. On x86-64/i686, I don't think we want to > prevent memcpy folding as your patch does, at least not for CPUs where movu* is > fast. That is true. Whatever we do, we can't lie about alignment, on stack or not. Once we fix that, the rest shouldn't be too hard to fix.
The patch in comment #4 makes memcpy folding not lie about alignment. cfgexpand still lies about alignment though.
(In reply to comment #20) > The patch in comment #4 makes memcpy folding not lie about alignment. X86 only cares about alignment for vector modes. Can we combine 2 patches into one? > cfgexpand still lies about alignment though. > Let's open a new bug and fix it separately.
Subject: Bug 45678 Author: rguenth Date: Fri Sep 17 09:00:23 2010 New Revision: 164356 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=164356 Log: 2010-09-17 Richard Guenther <rguenther@suse.de> PR middle-end/45678 * builtins.c (fold_builtin_memory_op): Always properly adjust alignment of memory accesses. Modified: trunk/gcc/ChangeLog trunk/gcc/builtins.c
Subject: Bug 45678 Author: rguenth Date: Fri Sep 17 13:57:04 2010 New Revision: 164369 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=164369 Log: 2010-09-17 Richard Guenther <rguenther@suse.de> PR middle-end/45678 * gcc.dg/torture/pr45678-1.c: New testcase. Added: trunk/gcc/testsuite/gcc.dg/torture/pr45678-1.c Modified: trunk/gcc/testsuite/ChangeLog
Created attachment 21821 [details] A patch The problem is we failed to update stack alignment when we increase alignment of local variable. This patch works for me.
A patch is posted at http://gcc.gnu.org/ml/gcc-patches/2010-09/msg01425.html
Subject: Bug 45678 Author: hjl Date: Fri Sep 17 17:49:30 2010 New Revision: 164375 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=164375 Log: Update stack alignment when increasing local variable alignment. gcc/ 2010-09-17 H.J. Lu <hongjiu.lu@intel.com> PR middle-end/45678 * cfgexpand.c (update_stack_alignment): New. (get_decl_align_unit): Use it. (expand_one_stack_var_at): Call update_stack_alignment. gcc/testsuite/ 2010-09-17 H.J. Lu <hongjiu.lu@intel.com> PR middle-end/45678 * gcc.dg/torture/pr45678-2.c: New. Added: trunk/gcc/testsuite/gcc.dg/torture/pr45678-2.c Modified: trunk/gcc/ChangeLog trunk/gcc/cfgexpand.c trunk/gcc/testsuite/ChangeLog
Are 4.4 and 4.5 going to be fixed?
Subject: Bug 45678 Author: jakub Date: Mon Sep 20 20:37:10 2010 New Revision: 164454 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=164454 Log: PR middle-end/45678 * cfgexpand.c (expand_one_stack_var_at): Use crtl->max_used_stack_slot_alignment as max_align, instead of maximum of that and PREFERRED_STACK_BOUNDARY. Don't call update_stack_alignment. Modified: trunk/gcc/ChangeLog trunk/gcc/cfgexpand.c
Subject: Bug 45678 Author: jakub Date: Tue Sep 21 14:18:34 2010 New Revision: 164480 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=164480 Log: PR middle-end/45678 * expr.c (expand_expr_real_1) <case VIEW_CONVERT_EXPR>: If op0 isn't sufficiently aligned and there is movmisalignM insn for mode, use it to load op0 into a temporary register. Backport from mainline 2010-09-20 Jakub Jelinek <jakub@redhat.com> PR middle-end/45678 * cfgexpand.c (expand_one_stack_var_at): Limit alignment to crtl->max_used_stack_slot_alignment. Backport from mainline 2010-09-17 Richard Guenther <rguenther@suse.de> H.J. Lu <hongjiu.lu@intel.com> PR middle-end/45678 * gcc.dg/torture/pr45678-1.c: New. * gcc.dg/torture/pr45678-2.c: Likewise. Added: branches/gcc-4_5-branch/gcc/testsuite/gcc.dg/torture/pr45678-1.c branches/gcc-4_5-branch/gcc/testsuite/gcc.dg/torture/pr45678-2.c Modified: branches/gcc-4_5-branch/gcc/ChangeLog branches/gcc-4_5-branch/gcc/cfgexpand.c branches/gcc-4_5-branch/gcc/expr.c branches/gcc-4_5-branch/gcc/testsuite/ChangeLog
Subject: Bug 45678 Author: jakub Date: Tue Sep 21 16:30:21 2010 New Revision: 164486 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=164486 Log: PR middle-end/45678 * expr.c (expand_expr_real_1) <case VIEW_CONVERT_EXPR>: If op0 isn't sufficiently aligned and there is movmisalignM insn for mode, use it to load op0 into a temporary register. Backport from mainline 2010-09-20 Jakub Jelinek <jakub@redhat.com> PR middle-end/45678 * cfgexpand.c (expand_one_stack_var_at): Limit alignment to crtl->max_used_stack_slot_alignment. Backport from mainline 2010-09-17 Richard Guenther <rguenther@suse.de> H.J. Lu <hongjiu.lu@intel.com> PR middle-end/45678 * gcc.dg/torture/pr45678-1.c: New. * gcc.dg/torture/pr45678-2.c: Likewise. Added: branches/gcc-4_4-branch/gcc/testsuite/gcc.dg/torture/pr45678-1.c branches/gcc-4_4-branch/gcc/testsuite/gcc.dg/torture/pr45678-2.c Modified: branches/gcc-4_4-branch/gcc/ChangeLog branches/gcc-4_4-branch/gcc/cfgexpand.c branches/gcc-4_4-branch/gcc/expr.c branches/gcc-4_4-branch/gcc/testsuite/ChangeLog
Fixed.
Author: rguenth Date: Tue Jan 24 09:17:01 2012 New Revision: 183470 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=183470 Log: 2012-01-24 Richard Guenther <rguenther@suse.de> Forward-port to trunk 2010-09-21 Jakub Jelinek <jakub@redhat.com> PR middle-end/45678 * expr.c (expand_expr_real_1) <case VIEW_CONVERT_EXPR>: If op0 isn't sufficiently aligned and there is movmisalignM insn for mode, use it to load op0 into a temporary register. Modified: trunk/gcc/ChangeLog trunk/gcc/expr.c
Author: rguenth Date: Tue Jan 24 10:23:14 2012 New Revision: 183471 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=183471 Log: 2012-01-24 Richard Guenther <rguenther@suse.de> Forward-port to branch 2010-09-21 Jakub Jelinek <jakub@redhat.com> PR middle-end/45678 * expr.c (expand_expr_real_1) <case VIEW_CONVERT_EXPR>: If op0 isn't sufficiently aligned and there is movmisalignM insn for mode, use it to load op0 into a temporary register. Modified: branches/gcc-4_6-branch/gcc/ChangeLog branches/gcc-4_6-branch/gcc/expr.c