Summary: | Building simple test application with -march=pentium3 -Os gives SIGSEGV (unaligned sse instruction) | ||
---|---|---|---|
Product: | gcc | Reporter: | Roger Larsson <roger.larsson> |
Component: | target | Assignee: | Jason Merrill <jason> |
Status: | RESOLVED FIXED | ||
Severity: | major | CC: | agner, belyshev, gcc-bugs, kpfleming, mueller, thiago, toolchain, ubizjak |
Priority: | P1 | Keywords: | wrong-code |
Version: | 3.3.4 | ||
Target Milestone: | --- | ||
Host: | Target: | ||
Build: | Known to work: | ||
Known to fail: | 3.2.3 3.3.6 3.4.6 4.0.3 4.1.0 4.2.0 | Last reconfirmed: | 2006-09-07 19:45:38 |
Bug Depends on: | 33721 | ||
Bug Blocks: | 28069, 28621 | ||
Attachments: | c++ source from kde/kdebase/kicker/applets/clock/; Assembly output; Assembly output; Source code that gives SIGSEGV; Assembly output | ||
Description
Roger Larsson
2004-01-14 18:28:33 UTC
Created attachment 5487 [details]
c++ source from kde/kdebase/kicker/applets/clock/
crash at line 536
I think the problem is caused by an unaligned stack. I know there is another bug for that; if this is that bug, can you just mark it as a dup? I think it is suspended.

Created attachment 5489 [details]
Assembly output
Crashes at this line when compiled with -march=pentium3 [-mcpu=pentium3] -O1
(There is only one movaps...)
0x41a8867b AnalogClock::paintEvent movaps %xmm0,0xfffffc78(%ebp)
where
ebp 0xbfffea60 0xbfffea60
This will generate a non 16 byte aligned access!!!
=> Crash
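
To make the arithmetic behind this crash concrete, here is a minimal C sketch (not part of the original report) that recomputes the effective address of the faulting movaps from the register dump above. The helper names effective_address, is_aligned_16 and check_reported_crash are made up for illustration; only the two hex values come from the report.

```c
#include <assert.h>
#include <stdint.h>

/* Effective address of disp(%ebp) on 32-bit x86: base plus displacement,
   wrapping modulo 2^32 (0xfffffc78 is -0x388 as a signed value). */
static uint32_t effective_address(uint32_t base, uint32_t disp32)
{
    return base + disp32;
}

/* movaps faults unless its memory operand is 16-byte aligned. */
static int is_aligned_16(uint32_t addr)
{
    return (addr & 15u) == 0;
}

/* Values taken from the report: ebp = 0xbfffea60, operand 0xfffffc78(%ebp). */
static void check_reported_crash(void)
{
    uint32_t addr = effective_address(0xbfffea60u, 0xfffffc78u);
    assert(addr == 0xbfffe6d8u);   /* 0xbfffea60 - 0x388 */
    assert(!is_aligned_16(addr));  /* low nibble is 8, so movaps traps */
}
```

With those values the store targets 0xbfffe6d8, which is 8 bytes past a 16-byte boundary, even though %ebp itself happens to be 16-byte aligned here. That would be consistent with the compiler having assumed a 16-byte aligned stack at the call site (leaving %ebp at 8 modulo 16 once the return address and the saved %ebp are pushed), but that reading is an inference and is not stated in the report.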
Code is not from a lib, it is generated by gcc.
=> I do not think this is a DUP of bug 10395

I wrote a little script to search for the problems.

    #!/bin/bash
    # Filename: objdump_find_problematic
    grep_sse () {
        grep $1 "\(\(mov\(ap\|up\|ntp\)\|shufp\|unpck\(hp\|lp\)\|\(add\|mul\|div\|and\|andn\|or\|xor\|max\|min\|cmp\.\.\|sqrt\|rsqrt\|rcp\)p\)s\|\(cvt\(ps2pi\|ss2pi\)\)\)"
    }
    if objdump -d $1 | grep_sse -q ; then
        echo $1
        objdump -d $1 | grep_sse
    fi

and ran my complete installation of kdecvs (-Os -march=pentium3) through it.

    # find /opt/kdecvs/ -type f -perm +111 | xargs -n 1 bin/objdump_find_problematic | tee problematic.sse

Among those found was kfiresaver3d; starting it crashes at an unaligned movaps (actual code from glibc-2.3).

Complete list, some are OK with aligned offset:

    # more problematic.sse | grep "0x\|opt"
    /opt/kdecvs/bin/kdm_greet
      805e5ac: 0f 29 85 98 fc ff ff movaps %xmm0,0xfffffc98(%ebp)
    /opt/kdecvs/bin/keuphoria.kss
      804fe6e: 0f 29 45 e0 movaps %xmm0,0xffffffe0(%ebp)
      80500fd: 0f 29 45 d0 movaps %xmm0,0xffffffd0(%ebp)
    /opt/kdecvs/bin/kfiresaver3d
      80502ac: 0f 29 85 7c ff ff ff movaps %xmm0,0xffffff7c(%ebp)
      8050529: 0f 29 85 64 ff ff ff movaps %xmm0,0xffffff64(%ebp)
    /opt/kdecvs/lib/libartsflow.so.1.0.0
    /opt/kdecvs/lib/kde3/kdeprint_cups.so
      4042d: 0f 29 45 8c movaps %xmm0,0xffffff8c(%ebp)
    /opt/kdecvs/lib/kde3/kcm_clock.so
      e294: 0f 29 85 20 fe ff ff movaps %xmm0,0xfffffe20(%ebp)
    /opt/kdecvs/lib/kde3/libkiviopart.so
      f8a94: 0f 29 45 a8 movaps %xmm0,0xffffffa8(%ebp)
      f8d47: 0f 29 45 80 movaps %xmm0,0xffffff80(%ebp)
      f90ad: 0f 29 85 5c ff ff ff movaps %xmm0,0xffffff5c(%ebp)
      fa0ac: 0f 29 45 a8 movaps %xmm0,0xffffffa8(%ebp)
      fa305: 0f 29 45 a4 movaps %xmm0,0xffffffa4(%ebp)
    /opt/kdecvs/lib/kde3/libkpovmodelerpart.so.0.0.0
    /opt/kdecvs/lib/libkdefx.so.4.2.0
    /opt/kdecvs/lib/libnoatunarts.so
      576d8: 0f 10 51 10 movups 0x10(%ecx),%xmm2
      576dc: 0f c6 d2 00 shufps $0x0,%xmm2,%xmm2
      576e0: 0f 10 61 14 movups 0x14(%ecx),%xmm4
      576e4: 0f 10 69 24 movups 0x24(%ecx),%xmm5
      57720: 0f c6 c8 b1 shufps $0xb1,%xmm0,%xmm1
      57737: 0f c6 c5 24 shufps $0x24,%xmm5,%xmm0
      5773b: 0f c6 e8 81 shufps $0x81,%xmm0,%xmm5
      57743: 0f c6 db 39 shufps $0x39,%xmm3,%xmm3
      57747: 0f c6 f6 39 shufps $0x39,%xmm6,%xmm6
      57752: 0f 11 69 24 movups %xmm5,0x24(%ecx)
      60ea4: 0f c6 c9 00 shufps $0x0,%xmm1,%xmm1
      60ee6: 0f c6 c2 00 shufps $0x0,%xmm2,%xmm0
      60ef5: 0f c6 c0 02 shufps $0x2,%xmm0,%xmm0

This is a BIG problem!
(kivio and kfiresaver3d have been verified to crash)

Your report seems to indicate that the problematic asm is produced all over the place. Can you either provide preprocessed source, or (better, and if the problem is widespread, should not be too difficult) a small, self-contained testcase? Thanks.

I am currently trying to create a testcase. The instructions get generated, but I cannot get the stack unaligned so that it crashes...

    -O1 -march=pentium3
gives the instructions (movaps);
    -O1 -march=pentium3 -mno-sse
is needed to avoid them.

This is the shortest code yet (matrix.cpp) I found that generates the offending instruction (movaps) - but I have not yet succeeded in getting the stack unaligned... [Do you need that too?]

    class RTime
    {
    public:
        int minute() {}
    };

    void rotate(float x)
    {
    }

    int main()
    {
        RTime _time;

        // hour
        float h_angle = _time.minute();
        rotate(-h_angle);

        // minute
        float m_angle = _time.minute();
        rotate(-m_angle);
    }

I am pretty sure that Qt should be simple to
Compile with:
    gcc -I/usr/src/kde/qt-copy/mkspecs/linux-g++ -I../../include -o matrix.o -march=pentium3 -O1 matrix.cpp

Subject: Re: Building KDE3.2 clock applet with -march=pentium3 -O1 gives SIGSEGV

>
> ------- Additional Comments From roger dot larsson at norran dot net 2004-01-19 08:11 -------
> This is the shortest code yet (matrix.cpp) I found that generates the
> offending instruction (movaps) - but I have not yet succeeded in getting
> the stack unaligned... [Do you need that too?]
Roger,
I am not able to reproduce it from your testcase, but sending the assembly file produced with g++ -O1 -march=pentium3 -dp -S will probably give me enough information to fix it.
Thanks!
Honza

>
> class RTime
> {
> public:
>     int minute() {}
> };
>
> void rotate(float x)
> {
> }
>
> int main()
> {
>     RTime _time;
>
>     // hour
>     float h_angle = _time.minute();
>     rotate(-h_angle);
>
>     // minute
>     float m_angle = _time.minute();
>     rotate(-m_angle);
> }
>
> I am pretty sure that Qt should be simple to
> Compile with:
> gcc -I/usr/src/kde/qt-copy/mkspecs/linux-g++ -I../../include -o matrix.o
> -march=pentium3 -O1 matrix.cpp
>
> --
>
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=13685

Created attachment 5520 [details]
Assembly output
Output from
g++ -O1 -march=pentium3 -dp -S matrix.cpp
Created attachment 5521 [details]
Source code that gives SIGSEGV
The key is that you have to compile with -Os to get an unaligned stack.
(My surrounding system was compiled with -Os, so there it is enough
to compile the routines that use movaps with -O1.)
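
One way to observe the effect described here is to print how far the current frame sits from a 16-byte boundary. The sketch below is a probe written under assumptions, not something taken from the report: it relies on the GCC extension __builtin_frame_address, and the helper names misalignment_16, probe_frame_misalignment and report_frame_misalignment are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: offset of an address from the previous
   16-byte boundary (0 means 16-byte aligned). */
static unsigned misalignment_16(uintptr_t addr)
{
    return (unsigned)(addr & 15u);
}

/* Hypothetical probe: offset of this function's frame address from a
   16-byte boundary. __builtin_frame_address is a GCC extension. */
static unsigned probe_frame_misalignment(void)
{
    uintptr_t frame = (uintptr_t)__builtin_frame_address(0);
    unsigned off = misalignment_16(frame);
    assert(off < 16u);
    return off;
}

/* Hypothetical reporting wrapper; call it from the code path under suspicion. */
static void report_frame_misalignment(const char *where)
{
    printf("%s: frame address is %u bytes past a 16-byte boundary\n",
           where, probe_frame_misalignment());
}
```

Calling report_frame_misalignment from a routine built with -O1 inside a program otherwise built with -Os would, under these assumptions, show whether the incoming stack kept its 16-byte alignment. The value printed depends on the compiler, flags and platform, so no particular output is claimed here.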
Created attachment 5522 [details]
Assembly output
Assembly output from
g++ -Os -march=pentium3 matrix.cpp -dp -S
I guess no more WAITING is needed -> NEW

Got a query about this bug... It is still valid for gcc 3.3.4.
My computer has been upgraded to an athlon-xp, so I tested both
    g++ -Os -march=pentium3 matrix.cpp -o matrix
and
    g++ -Os -march=athlon-xp matrix.cpp -o matrix
They both give a Segmentation fault when running
    ./matrix
Other optimization levels work.
Updated summary etc.

*** Bug 15617 has been marked as a duplicate of this bug. ***

Another testcase, use "-Os -msse", fails with all versions since 3.2:

    typedef float __m128 __attribute__ ((vector_size (16)));
    typedef int __m64 __attribute__ ((vector_size (8)));

    int puts (const char *s);
    void foo (__m128 *, __m64 *, int);

    int main (void)
    {
      foo (0, 0, 0);
      return 0;
    }

    void foo (__m128 *dst, __m64 *src, int n)
    {
      __m128 xmm0 = { 0 };
      while (n > 64)
        {
          puts ("");
          xmm0 = __builtin_ia32_cvtpi2ps (xmm0, *src);
          *dst = xmm0;
          n --;
        }
    }

Raising severity because this bug makes "-Os" almost useless on modern x86.

Works OK with gcc-4.2 and -Os -msse -fomit-frame-pointer.

(In reply to comment #16)
> raising severity because this bug makes "-Os" almost useless on modern x86.
>
With "gcc version 4.0.2 20050901 (prerelease) (SUSE Linux)" my testcase works, but Serge Belyshev's does not.

You can work around this bug with -mpreferred-stack-boundary=4

Subject: Bug 13685
Author: jason
Date: Fri Sep 8 00:28:30 2006
New Revision: 116775
URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=116775
Log:
    PR target/13685
    * config/i386/i386.c (override_options): Use 128-bit stack boundary if -msse.
Added:
    trunk/gcc/testsuite/gcc.target/i386/sse-20.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/i386/i386.c

*** Bug 27537 has been marked as a duplicate of this bug. ***

Subject: Bug 13685
Author: hjl
Date: Mon Sep 11 21:34:06 2006
New Revision: 116860
URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=116860
Log:
gcc/
2006-09-11 H.J. Lu <hongjiu.lu@intel.com>
    PR target/13685
    PR target/27537
    PR target/28621
    * config/i386/i386.c (override_options): Always default to 16 byte stack boundary.
gcc/testsuite/
2006-09-11 H.J. Lu <hongjiu.lu@intel.com>
    PR target/13685
    * gcc.target/i386/pr13685.c: New test.
Added:
    trunk/gcc/testsuite/gcc.target/i386/pr13685.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/i386/i386.c
    trunk/gcc/testsuite/ChangeLog

Subject: Bug 13685
Author: hjl
Date: Tue Sep 12 02:54:42 2006
New Revision: 116870
URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=116870
Log:
gcc/
2006-09-11 H.J. Lu <hongjiu.lu@intel.com>
    PR target/13685
    PR target/27537
    PR target/28621
    * config/i386/i386.c (override_options): Always default to 16 byte stack boundary.
gcc/testsuite/
2006-09-11 H.J. Lu <hongjiu.lu@intel.com>
    PR target/13685
    * gcc.target/i386/pr13685.c: New test.
Added:
    branches/gcc-4_1-branch/gcc/testsuite/gcc.target/i386/pr13685.c
Modified:
    branches/gcc-4_1-branch/gcc/ChangeLog
    branches/gcc-4_1-branch/gcc/config/i386/i386.c
    branches/gcc-4_1-branch/gcc/testsuite/ChangeLog

This has been fixed for a while.

Is forcing the alignment to an even larger value really the fix? Is there no way to do such things on the fly? After all, if someone turns around and tries to do a custom alignment on the stack that is larger than 16 bytes, that will fail (but I guess this issue will be handled at PR28069?)

Thank you for fixing this, but you need to tell the world which solution you have chosen. Please see the discussion at the duplicate bug http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27537 for arguments for and against each possible solution. You need to specify whether the chosen solution is to enforce 16 byte stack alignment regardless of the -Os option, or to make no assumption about stack alignment when generating XMM code. This is an ABI issue that has to be standardized and made public.
The makers of the Intel compiler are waiting for a resolution to this issue so that they can make their compiler compatible with GCC. For the same reason, assembly programmers need to know whether stack alignment is required or not.

May I point out that the alternative solution (to align the stack _in the function which needs it_) doesn't crash if called by code generated by old or new gcc, and also gives smaller, faster and less stack-consuming code for all people who do not do any SSE stuff?
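
The alternative raised in the last comment, aligning inside the function that needs it rather than relying on every caller, can be illustrated at source level. The following is only an analogy sketched in C with hypothetical helper names (align_up_16, align_ptr_16, scratch_is_aligned); it is not how either patch in this report works, and a compiler would make the equivalent adjustment to the stack pointer in the function prologue instead.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Round an address up to the next multiple of 16 (no-op if already aligned). */
static uintptr_t align_up_16(uintptr_t addr)
{
    return (addr + 15u) & ~(uintptr_t)15u;
}

/* Pointer flavour of the same rounding. */
static void *align_ptr_16(void *p)
{
    return (void *)align_up_16((uintptr_t)p);
}

/* Hypothetical example: carve a 16-byte aligned, 16-byte scratch area out
   of a local buffer without assuming anything about the caller's stack. */
static int scratch_is_aligned(void)
{
    unsigned char raw[16 + 15];               /* payload plus 15 bytes of slack */
    unsigned char *aligned = align_ptr_16(raw);

    /* Rounding up moves the pointer by at most 15 bytes, so the
       16-byte payload still fits inside raw. */
    assert(aligned >= raw && aligned + 16 <= raw + sizeof raw);
    memset(aligned, 0, 16);

    return ((uintptr_t)aligned & 15u) == 0;
}
```

The cost in this sketch is up to 15 bytes of slack and a couple of arithmetic instructions, paid only where aligned storage is needed. That matches the trade-off argued in the comment, although whether it holds for real generated code is not something this example demonstrates.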