This bug is transient and sensitive to code/structure rearrangement and to how things get inlined. In the included testcases it shows up as unaligned stack loads/stores, but in the current app I also have values being smashed on the stack without any segfault.

It shows up with g++ 4.1.0 and 4.2-20060225 on x86 (cygwin) and x86-64 (linux), and in fact with every 4.2.x I have tried.

With this script...

#!/usr/bin/perl
while(<>) {
	chomp;
	next if !/movaps/;
	next if !/esp/;
	next if !/(0x\w+)/;
	next if substr($1, -1, 1) eq '0';
	print "$_\n";
}

... and g++ 4.1.0 on cygwin...

/usr/local/gcc-4.1.0/bin/g++ -march=k8 -mfpmath=sse -msse3 -O2 -fomit-frame-pointer bogus1.ii -c -o tt1.o && objdump.exe -d --no-show-raw-insn tt1.o |./check_alignment.pl
    1664: movaps %xmm0,0x7c8(%esp)
    2054: movaps %xmm0,0x318(%esp)
    28cd: movaps %xmm0,0x1f8(%esp)
    4579: movaps %xmm0,0x338(%esp)
    513d: movaps %xmm0,0x328(%esp)

/usr/local/gcc-4.1.0/bin/g++ -march=k8 -mfpmath=sse -msse3 -fomit-frame-pointer -Os bogus2.ii -c -o tt2.o && objdump.exe -d --no-show-raw-insn tt2.o |./check_alignment.pl
     274: movaps %xmm5,0x74(%esp)
     281: movaps %xmm1,0x64(%esp)
     2ac: movaps %xmm4,0x84(%esp)
     2b8: movaps %xmm4,0x84(%esp)
     2cf: movaps %xmm5,0x54(%esp)
     2d8: movaps %xmm5,0x54(%esp)
     2e9: movaps %xmm0,0x44(%esp)
     2f1: movaps %xmm0,0x44(%esp)
     3a3: movaps %xmm3,0x34(%esp)
     3a8: movaps %xmm1,0x24(%esp)
     426: movaps 0x24(%esp),%xmm7
     475: movaps 0x34(%esp),%xmm4
     4cf: movaps 0x64(%esp),%xmm0
     851: movaps %xmm0,0x18(%esp)
     859: movaps 0x18(%esp),%xmm2
     865: movaps %xmm0,0x28(%esp)
     879: movaps 0x18(%esp),%xmm0
     903: movaps 0x18(%esp),%xmm0
[snipped 300 more]

Excuse the large testcases, but I have no idea how else to reproduce it, and it only happens in that rather large unit.
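For anyone who wants to run the same check without Perl, the filter can be sketched in C++ roughly as follows. This is only a port of the script's logic, not something taken from the testcases: the function names is_suspect_movaps and self_check are made up here. Like the script, it presumably relies on %esp itself being 16-byte aligned at that point, so that any displacement whose last hex digit is not 0 indicates a misaligned movaps operand.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical name. Returns true when a disassembly line mentions movaps and
// esp, and the first "0x..." token on the line does not end in the digit '0'.
// This mirrors the Perl filter, including its match of the first /(0x\w+)/.
bool is_suspect_movaps(const std::string& line) {
    if (line.find("movaps") == std::string::npos) return false;
    if (line.find("esp") == std::string::npos) return false;

    for (std::string::size_type pos = line.find("0x");
         pos != std::string::npos;
         pos = line.find("0x", pos + 1)) {
        // Consume the \w+ part: letters, digits and underscore.
        std::string::size_type end = pos + 2;
        while (end < line.size() &&
               (std::isalnum(static_cast<unsigned char>(line[end])) ||
                line[end] == '_')) {
            ++end;
        }
        if (end > pos + 2) {
            // Last hex digit != 0 means the displacement is not a multiple of 16.
            return line[end - 1] != '0';
        }
    }
    return false;  // no hex displacement on this line
}

// Small sanity check using a line from the report plus an aligned variant.
void self_check() {
    assert(is_suspect_movaps("1664: movaps %xmm0,0x7c8(%esp)"));
    assert(!is_suspect_movaps("1664: movaps %xmm0,0x7c0(%esp)"));
}
```

Feeding each line of the objdump output through is_suspect_movaps and printing the ones that return true should give the same listing as check_alignment.pl.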
Created attachment 11024 [details] testcase #1
Created attachment 11025 [details] testcase #2
_mm_store_ss((float*)(((float*) &rays[0]) + 0), (pvx));
I don't think rays[0] is a POD so this might turn out to be a bug in your code.
vec_t is a non-POD type because it has a user-defined copy assignment operator, thus ray_t can't be a POD either.
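To illustrate the point, here is a minimal sketch using C++11 type traits (which postdate the compilers in this report, so this is an assumption about tooling, not something the testcase does). vec_sketch, ray_sketch, plain_vec and is_pod_like are hypothetical stand-ins; the real vec_t and ray_t are in the attachments and may differ in their members.

```cpp
#include <type_traits>

// Hypothetical stand-in for vec_t: the user-defined copy assignment
// operator is what makes it non-trivial, hence non-POD.
struct vec_sketch {
    float x, y, z;
    vec_sketch& operator=(const vec_sketch& rhs) {
        x = rhs.x; y = rhs.y; z = rhs.z;
        return *this;
    }
};

// Hypothetical stand-in for ray_t: it contains non-POD members,
// so it cannot be a POD either.
struct ray_sketch {
    vec_sketch origin;
    vec_sketch direction;
};

// Same data layout, but no user-defined special members.
struct plain_vec {
    float x, y, z;
};

// POD in the C++11 sense: trivial and standard-layout.
template <class T>
struct is_pod_like
    : std::integral_constant<bool, std::is_trivial<T>::value &&
                                   std::is_standard_layout<T>::value> {};

static_assert(!is_pod_like<vec_sketch>::value, "user-defined operator= => not POD");
static_assert(!is_pod_like<ray_sketch>::value, "non-POD member => not POD");
static_assert(is_pod_like<plain_vec>::value, "plain aggregate is POD");
```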
You're right, but that's a _mm_store_ss/movss, which only asks for 4-byte alignment (which is satisfied), not a movaps with its 16-byte constraint. The latter is what is causing problems.
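The difference between the two constraints can be shown with a small portable sketch. It does not use the intrinsics themselves; is_aligned, alignment_report and probe_float_slot are hypothetical helpers, and the buffer is a made-up example, not the rays array from the testcase. A float slot inside a 16-byte aligned buffer is always 4-byte aligned, but only every fourth slot sits on the 16-byte boundary that a movaps memory operand needs (it faults otherwise).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// True when p sits on an `alignment`-byte boundary.
// `alignment` must be a power of two.
inline bool is_aligned(const void* p, std::size_t alignment) {
    return (reinterpret_cast<std::uintptr_t>(p) & (alignment - 1)) == 0;
}

// What a given address is good enough for.
struct alignment_report {
    bool ok_for_4;   // enough for the scalar float store discussed above
    bool ok_for_16;  // required for a movaps memory operand
};

// Hypothetical probe: checks slot `index` of a 16-byte aligned float buffer.
// Assumes sizeof(float) == 4, as on x86 and x86-64.
inline alignment_report probe_float_slot(std::size_t index) {
    alignas(16) float buf[8] = {};
    assert(index < 8);
    const float* p = &buf[index];
    return { is_aligned(p, 4), is_aligned(p, 16) };
}
```

With this, probe_float_slot(1) is fine for a 4-byte store but not for a 16-byte one, while probe_float_slot(0) and probe_float_slot(4) satisfy both.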
For clarification, I should say that rt::mono::ray_t, which uses vec_t etc., isn't a source of trouble; it's part of the single-ray path, where mostly scalar ops are used. There's a symmetrical set of structures in rt::packet which deal with bundles of rays (i.e. 2x2) and use packed vectors; that's what that unit is massaging. Some functions have a bunch of live 16-byte aligned data on the stack, and depending on how they get (force_)inlined, g++ goes nuts and forgets about those constraints.
GCC <= 4.2.x is not supported anymore (BTW: a lot of alignment fixes went into gcc-4.4.x, so there is a good chance the bug is fixed there).