Bug 13685 - Building simple test application with -march=pentium3 -Os gives SIGSEGV (unaligned sse instruction)
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: target
Version: 3.3.4
Importance: P1 major
Target Milestone: ---
Assignee: Jason Merrill
URL:
Keywords: wrong-code
Duplicates: 15617 27537
Depends on: 33721
Blocks: 28069 28621
Reported: 2004-01-14 18:28 UTC by Roger Larsson
Modified: 2007-10-10 04:12 UTC (History)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail: 3.2.3 3.3.6 3.4.6 4.0.3 4.1.0 4.2.0
Last reconfirmed: 2006-09-07 19:45:38


Attachments
c++ source from kde/kdebase/kicker/applets/clock/ (9.56 KB, text/x-c++src)
2004-01-14 18:33 UTC, Roger Larsson
Assembly output (92.19 KB, text/plain)
2004-01-14 19:51 UTC, Roger Larsson
Assembly output (600 bytes, text/plain)
2004-01-19 11:43 UTC, Roger Larsson
Source code that gives SIGSEGV (160 bytes, text/plain)
2004-01-19 12:44 UTC, Roger Larsson
Assembly output (687 bytes, text/plain)
2004-01-19 12:47 UTC, Roger Larsson

Description Roger Larsson 2004-01-14 18:28:33 UTC
See (lots of confusion...) 
 http://bugs.kde.org/show_bug.cgi?id=70655 
 
I will attach source and assembly.
Comment 1 Roger Larsson 2004-01-14 18:33:57 UTC
Created attachment 5487
c++ source from kde/kdebase/kicker/applets/clock/

crash at line 536
Comment 2 Andrew Pinski 2004-01-14 18:34:16 UTC
I think the problem is that the stack is unaligned. I know there is another bug
for that; if this is the same bug, can you just mark it as a dup? I think that one is suspended.
Comment 3 Roger Larsson 2004-01-14 19:51:15 UTC
Created attachment 5489
Assembly output

Crashes at this line when compiled with -march=pentium3 [-mcpu=pentium3] -O1
(There is only one movaps...)

	0x41a8867b AnalogClock::paintEvent   movaps %xmm0,0xfffffc78(%ebp)

where

ebp	       0xbfffea60	0xbfffea60

This generates an access that is not 16-byte aligned!
=> Crash
Comment 4 Roger Larsson 2004-01-14 23:50:41 UTC
Code is not from a lib, it is generated by gcc. 
=> I do not think this is a DUP of bug 10395 
 
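I wrote a little script to search for the problem. 
#!/bin/bash 
# Filename: objdump_find_problematic 
# Lists SSE packed-single instructions found in the disassembly of "$1". 
 
grep_sse () 
{ 
	# "$@" forwards optional grep flags such as -q. 
	# cmp[a-z]* covers mnemonics such as cmpeqps and cmpltps. 
	grep "$@" "\(\(mov\(ap\|up\|ntp\)\|shufp\|unpck\(hp\|lp\)\|\(add\|mul\|div\|and\|andn\|or\|xor\|max\|min\|cmp[a-z]*\|sqrt\|rsqrt\|rcp\)p\)s\|\(cvt\(ps2pi\|ss2pi\)\)\)" 
} 
 
if objdump -d "$1" | grep_sse -q ; then 
   echo "$1" 
   objdump -d "$1" | grep_sse 
fi 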
 
and ran my complete installation of kdecvs (-Os -march=pentium3) through it. 
 
# find /opt/kdecvs/ -type f -perm +111 | xargs -n 1 bin/objdump_find_problematic | tee problematic.sse 
 
Among those found were 
	kfiresaver3d: starting it crashes at an unaligned movaps 
	 (actual code from glibc-2.3) 
 
Complete list, some are OK with aligned offset: 
 
# more problematic.sse | grep "0x\|opt" 
/opt/kdecvs/bin/kdm_greet 
 805e5ac:       0f 29 85 98 fc ff ff    movaps %xmm0,0xfffffc98(%ebp) 
/opt/kdecvs/bin/keuphoria.kss 
 804fe6e:       0f 29 45 e0             movaps %xmm0,0xffffffe0(%ebp) 
 80500fd:       0f 29 45 d0             movaps %xmm0,0xffffffd0(%ebp) 
/opt/kdecvs/bin/kfiresaver3d 
 80502ac:       0f 29 85 7c ff ff ff    movaps %xmm0,0xffffff7c(%ebp) 
 8050529:       0f 29 85 64 ff ff ff    movaps %xmm0,0xffffff64(%ebp) 
/opt/kdecvs/lib/libartsflow.so.1.0.0 
/opt/kdecvs/lib/kde3/kdeprint_cups.so 
   4042d:       0f 29 45 8c             movaps %xmm0,0xffffff8c(%ebp) 
/opt/kdecvs/lib/kde3/kcm_clock.so 
    e294:       0f 29 85 20 fe ff ff    movaps %xmm0,0xfffffe20(%ebp) 
/opt/kdecvs/lib/kde3/libkiviopart.so 
   f8a94:       0f 29 45 a8             movaps %xmm0,0xffffffa8(%ebp) 
   f8d47:       0f 29 45 80             movaps %xmm0,0xffffff80(%ebp) 
   f90ad:       0f 29 85 5c ff ff ff    movaps %xmm0,0xffffff5c(%ebp) 
   fa0ac:       0f 29 45 a8             movaps %xmm0,0xffffffa8(%ebp) 
   fa305:       0f 29 45 a4             movaps %xmm0,0xffffffa4(%ebp) 
/opt/kdecvs/lib/kde3/libkpovmodelerpart.so.0.0.0 
/opt/kdecvs/lib/libkdefx.so.4.2.0 
/opt/kdecvs/lib/libnoatunarts.so 
   576d8:       0f 10 51 10             movups 0x10(%ecx),%xmm2 
   576dc:       0f c6 d2 00             shufps $0x0,%xmm2,%xmm2 
   576e0:       0f 10 61 14             movups 0x14(%ecx),%xmm4 
   576e4:       0f 10 69 24             movups 0x24(%ecx),%xmm5 
   57720:       0f c6 c8 b1             shufps $0xb1,%xmm0,%xmm1 
   57737:       0f c6 c5 24             shufps $0x24,%xmm5,%xmm0 
   5773b:       0f c6 e8 81             shufps $0x81,%xmm0,%xmm5 
   57743:       0f c6 db 39             shufps $0x39,%xmm3,%xmm3 
   57747:       0f c6 f6 39             shufps $0x39,%xmm6,%xmm6 
   57752:       0f 11 69 24             movups %xmm5,0x24(%ecx) 
   60ea4:       0f c6 c9 00             shufps $0x0,%xmm1,%xmm1 
   60ee6:       0f c6 c2 00             shufps $0x0,%xmm2,%xmm0 
   60ef5:       0f c6 c0 02             shufps $0x2,%xmm0,%xmm0 
 
This is a BIG problem! (kivio, and kfiresaver3d have been verified to crash) 
Comment 5 Dara Hazeghi 2004-01-18 18:07:24 UTC
Your report seems to indicate that the problematic asm is produced all over the
place. Can you either provide preprocessed source, or (better, and if the
problem is widespread, should not be too difficult) a small, self-contained
testcase? Thanks.
Comment 6 Roger Larsson 2004-01-19 06:38:20 UTC
I am currently trying to create a testcase. The instructions get generated, 
but I cannot get the stack unaligned so that it crashes... 
 
-O1 -march=pentium3 
 
Gives the instructions (movaps) 
 
-O1 -march=pentium3 -mno-sse 
 
Is needed to avoid them. 
 
Comment 7 Roger Larsson 2004-01-19 08:11:32 UTC
This is the shortest code yet (matrix.cpp) I found that generates the 
offending instruction (movaps) - but I have not yet succeeded in getting 
the stack unaligned... [Do you need that too?] 
 
class RTime 
{ 
public: 
    int minute() {} 
}; 
 
void rotate(float x) 
{ 
} 
 
int main() 
{ 
    RTime _time; 
 
    // hour 
    float h_angle = _time.minute(); 
    rotate(-h_angle); 
 
    // minute 
    float m_angle = _time.minute(); 
    rotate(-m_angle); 
} 
 
I am pretty sure that Qt should be simple to  
Compile with: 
gcc -I/usr/src/kde/qt-copy/mkspecs/linux-g++ -I../../include -o matrix.o 
-march=pentium3 -O1 matrix.cpp 
Comment 8 Jan Hubicka 2004-01-19 10:42:08 UTC
Subject: Re:  Building KDE3.2 clock applet with -march=pentium3 -O1 gives SIGSEGV

> 
> ------- Additional Comments From roger dot larsson at norran dot net  2004-01-19 08:11 -------
> This is the shortest code yet (matrix.cpp) I found that generates the 
> offending instruction (movaps) - but I have not yet succeeded in getting 
> the stack unaligned... [Do you need that too?] 

Roger,
I am not able to reproduce it from your testcase, but sending the
assembly file produced with g++ -O1 -march=pentium3 -dp -S will probably
give me enough information to fix it.

Thanks!
Honza
Comment 9 Roger Larsson 2004-01-19 11:43:09 UTC
Created attachment 5520
Assembly output

Output from

g++ -O1 -march=pentium3 -dp -S matrix.cpp
Comment 10 Roger Larsson 2004-01-19 12:44:54 UTC
Created attachment 5521
Source code that gives SIGSEGV

The key is that you have to compile with -Os to get an unaligned stack.
(My surrounding system was compiled with -Os, so there it is enough
 to compile the routines that contain movaps with -O1.)
Comment 11 Roger Larsson 2004-01-19 12:47:14 UTC
Created attachment 5522
Assembly output

Assembly output from
 g++ -Os -march=pentium3 matrix.cpp -dp -S
Comment 12 Roger Larsson 2004-01-19 12:48:24 UTC
I guess no more WAITING is needed -> NEW 
Comment 13 Roger Larsson 2005-01-07 09:49:41 UTC
Got a query about this bug...   
   
It is still valid for gcc 3.3.4.
My computer has been upgraded to an Athlon XP, so I tested both

g++ -Os -march=pentium3 matrix.cpp -o matrix
 and
g++ -Os -march=athlon-xp matrix.cpp -o matrix

Both give a segmentation fault when running
./matrix

Other optimization levels work.
   
Updated summary etc.  
   
Comment 14 Serge Belyshev 2006-02-21 12:38:14 UTC
*** Bug 15617 has been marked as a duplicate of this bug. ***
Comment 15 Serge Belyshev 2006-02-21 12:45:28 UTC
Another testcase, use "-Os -msse", fails with all versions since 3.2:


typedef float __m128 __attribute__ ((vector_size (16)));
typedef int __m64 __attribute__ ((vector_size (8)));

int puts (const char *s);
void foo (__m128 *, __m64 *, int);

int main (void)
{
  foo (0, 0, 0);
  return 0;
}

void foo (__m128 *dst, __m64 *src, int n)
{
  __m128 xmm0 = { 0 };
  while (n > 64)
    {
      puts ("");
      xmm0 = __builtin_ia32_cvtpi2ps (xmm0, *src);
      *dst = xmm0;
      n --;
    }
}
Comment 16 Serge Belyshev 2006-02-21 12:51:41 UTC
raising severity because this bug makes "-Os" almost useless on modern x86.
Comment 17 Uroš Bizjak 2006-02-22 10:15:28 UTC
Works OK with gcc-4.2 and -Os -msse -fomit-frame-pointer.
Comment 18 Roger Larsson 2006-02-22 11:31:36 UTC
(In reply to comment #16)
> raising severity because this bug makes "-Os" almost useless on modern x86.
> 
With "gcc version 4.0.2 20050901 (prerelease) (SUSE Linux)"
my testcase works, but Serge Belyshev's does not.
Comment 19 Jason Merrill 2006-09-07 20:24:25 UTC
You can work around this bug with -mpreferred-stack-boundary=4
Comment 20 Jason Merrill 2006-09-08 00:28:38 UTC
Subject: Bug 13685

Author: jason
Date: Fri Sep  8 00:28:30 2006
New Revision: 116775

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=116775
Log:
        PR target/13685
        * config/i386/i386.c (override_options): Use 128-bit
        stack boundary if -msse.

Added:
    trunk/gcc/testsuite/gcc.target/i386/sse-20.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/i386/i386.c

Comment 21 H.J. Lu 2006-09-08 00:45:54 UTC
*** Bug 27537 has been marked as a duplicate of this bug. ***
Comment 22 hjl@gcc.gnu.org 2006-09-11 21:34:17 UTC
Subject: Bug 13685

Author: hjl
Date: Mon Sep 11 21:34:06 2006
New Revision: 116860

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=116860
Log:
gcc/

2006-09-11  H.J. Lu  <hongjiu.lu@intel.com>

	PR target/13685
	PR target/27537
	PR target/28621
	* config/i386/i386.c (override_options): Always default to 16
	byte stack boundary.

gcc/testsuite/

2006-09-11  H.J. Lu  <hongjiu.lu@intel.com>

	PR target/13685
	* gcc.target/i386/pr13685.c: New test.

Added:
    trunk/gcc/testsuite/gcc.target/i386/pr13685.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/i386/i386.c
    trunk/gcc/testsuite/ChangeLog

Comment 23 hjl@gcc.gnu.org 2006-09-12 02:54:58 UTC
Subject: Bug 13685

Author: hjl
Date: Tue Sep 12 02:54:42 2006
New Revision: 116870

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=116870
Log:
gcc/

2006-09-11  H.J. Lu  <hongjiu.lu@intel.com>

	PR target/13685
	PR target/27537
	PR target/28621
	* config/i386/i386.c (override_options): Always default to 16
	byte stack boundary.

gcc/testsuite/

2006-09-11  H.J. Lu  <hongjiu.lu@intel.com>

	PR target/13685
	* gcc.target/i386/pr13685.c: New test.

Added:
    branches/gcc-4_1-branch/gcc/testsuite/gcc.target/i386/pr13685.c
Modified:
    branches/gcc-4_1-branch/gcc/ChangeLog
    branches/gcc-4_1-branch/gcc/config/i386/i386.c
    branches/gcc-4_1-branch/gcc/testsuite/ChangeLog

Comment 24 Jason Merrill 2006-09-22 22:42:37 UTC
This has been fixed for a while.
Comment 25 Mike Frysinger 2006-09-22 22:54:39 UTC
Is forcing the alignment to an even larger value really the fix? Is there no way to do such things on the fly? After all, if someone turns around and tries a custom stack alignment larger than 16 bytes, that will fail (but I guess that issue will be handled in PR28069?)
Comment 26 Agner Fog 2006-09-23 08:23:20 UTC
Thank you for fixing this, but you need to tell the world which solution you have chosen. Please see the discussion at the duplicate bug http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27537 for arguments for and against each possible solution.

You need to specify whether the chosen solution is to enforce 16 byte stack alignment regardless of -Os option or the solution is to make no assumption about stack alignment when making XMM code. This is an ABI issue that has to be standardized and made public. The makers of the Intel compiler are waiting for a resolution to this issue so that they can make their compiler compatible with GCC. For the same reason, assembly programmers need to know whether stack alignment is required or not.
Comment 27 Denis Vlasenko 2007-07-23 00:06:58 UTC
May I point out that the alternative solution (aligning the stack _in the function which needs it_) doesn't crash when called by code generated by old or new gcc, and also gives smaller, faster and less stack-consuming code for everyone who does not do any SSE stuff?