Bug 41156 - [4.4/4.5/4.6 Regression] zlib segfault in inflate_table() compiled w/ -O -msse2 ftree-vectorize
Summary: [4.4/4.5/4.6 Regression] zlib segfault in inflate_table() compiled w/ -O -mss...
Status: RESOLVED WONTFIX
Alias: None
Product: gcc
Classification: Unclassified
Component: target (show other bugs)
Version: 4.4.2
: P3 normal
Target Milestone: 4.4.5
Assignee: Not yet assigned to anyone
URL:
Keywords:
Depends on: 40838 32893
Blocks:
  Show dependency treegraph
 
Reported: 2009-08-24 15:56 UTC by H.J. Lu
Modified: 2010-09-30 11:13 UTC (History)
13 users (show)

See Also:
Host: i686-linux-gnu
Target: i686-linux-gnu
Build: i686-linux-gnu
Known to work:
Known to fail:
Last reconfirmed:


Attachments
sse & 32bit -> -mstackrealign (example only!) (723 bytes, patch)
2009-09-19 16:12 UTC, Dzianis Kahanovich
Details | Diff
sse & 32bit -> -mstackrealign (example 2) (1012 bytes, patch)
2009-09-25 11:03 UTC, Dzianis Kahanovich
Details | Diff
1) sse-stackrealign-misalignsse-4.4.patch (1.16 KB, patch)
2009-10-09 14:08 UTC, Dzianis Kahanovich
Details | Diff
2) sse-stackrealign-sse4a-4.4.patch (1.08 KB, patch)
2009-10-09 14:09 UTC, Dzianis Kahanovich
Details | Diff
1) sse-stackrealign-misalignsse-4.5.patch (1.17 KB, patch)
2009-10-09 14:10 UTC, Dzianis Kahanovich
Details | Diff
2) sse-stackrealign-sse4a-4.5.patch (1.07 KB, patch)
2009-10-09 14:10 UTC, Dzianis Kahanovich
Details | Diff
(2) for 4.4, fixed (1.08 KB, patch)
2009-10-09 14:39 UTC, Dzianis Kahanovich
Details | Diff
(2) for 4.5, fixed (1.07 KB, patch)
2009-10-09 14:41 UTC, Dzianis Kahanovich
Details | Diff
tests proposal from PR 40838 (from H.J. Lu) (772 bytes, patch)
2009-10-10 13:51 UTC, Dzianis Kahanovich
Details | Diff
(2) for 4.4 (873 bytes, patch)
2009-11-27 12:49 UTC, Dzianis Kahanovich
Details | Diff
(2) for 4.5 (863 bytes, patch)
2009-11-27 12:51 UTC, Dzianis Kahanovich
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description H.J. Lu 2009-08-24 15:56:14 UTC
+++ This bug was initially created as a clone of Bug #32893 +++

Gcc 4.4/4.5 can align stack properly. But it needs to assume 4 byte
incoming stack alignment instead of 16byte.
Comment 1 Dzianis Kahanovich 2009-08-25 10:20:23 UTC
Fix: I got bug with -msse only, not -msse2.
Comment 2 Jakub Jelinek 2009-08-31 08:10:46 UTC
Why is this considered a regression?
Comment 3 H.J. Lu 2009-08-31 13:51:58 UTC
(In reply to comment #2)
> Why is this considered a regression?
> 

It is a regression for PR 32893, which was fixed by disabling
vectorizer on local variables requiring > 4byte alignment since
we couldn't realign the stack at the time to support 4byte
incoming stack. With automatic stack alignment, this restricting
was removed. But since we assume incoming stack is 16byte, we
don't realign the stack. When incoming stack is 4 byte aligned,
we get segfault.
Comment 4 Mark Mitchell 2009-08-31 21:47:51 UTC
HJ, this doesn't make sense.  

Either we can assume 16-byte stack alignment, or we can't.  Which is it?
Comment 5 H.J. Lu 2009-08-31 22:25:50 UTC
(In reply to comment #4)
> HJ, this doesn't make sense.  
> 
> Either we can assume 16-byte stack alignment, or we can't.  Which is it?
> 

On ia32, except for MacOS, we don't enforce 16-byte stack alignment.
We get 4-byte incoming stack alignment via object files:

1. Compiled with older gcc.
2. Compiled with gcc 3.x and earlier 4.x using -Os.
3. Compiled with gcc using -mpreferred-stack-boundary=2
4. From assembly code with 4 byte stack alignment.
5. Compiled with other compilers, which generate 4 byte stack
alignment.

It usually won't cause segfault until vectorizer is enabled.
Comment 6 Jakub Jelinek 2009-09-01 09:28:19 UTC
IMHO either standard options compiled code shouldn't be called from -mpreferred-stack-boundary=2 code, or it needs to be compiled with -mincoming-stack-boundary=2.  But it should be user's responsibility.  Ensuring by default outgoing calls are 16 byte aligned, but not assuming it is just a very stupid thing to do and unnecessarily penalizes normal users.  It is certainly not true that most code is compiled with -mpreferred-stack-boundary=2, only kernel and a handful of packages is by default, and kernel has its own ABI (and doesn't use FPU nor SSE*).
Comment 7 H.J. Lu 2009-09-01 13:20:03 UTC
Realign the incoming stack for vectorizer has very limited impact
on performance. Here are the differences of -m32 -O3 -msse2
-mfpmath=sse -ffast-math -funroll-loops before and after my patch:

400.perlbench                    -0.384615%
401.bzip2                        0%
403.gcc                          -0.362319%
429.mcf                          -0.813008%
445.gobmk                        0.921659%
456.hmmer                        0.549451%
458.sjeng                        -0.438596%
462.libquantum                   0%
464.h264ref                      0%
471.omnetpp                      -0.478469%
473.astar                        -0.645161%
483.xalancbmk                    -0.727273%
SPECint(R)_base2006              -0.411523%
410.bwaves                       -0.406504%
416.gamess                       0%
433.milc                         -1.36986%
434.zeusmp                       -0.44843%
435.gromacs                      0%
436.cactusADM                    0%
437.leslie3d                     -0.888889%
444.namd                         1.20482%
447.dealII                       -0.350877%
450.soplex                       -0.31746%
453.povray                       0.458716%
454.calculix                     0%
459.GemsFDTD                     0%
465.tonto                        0%
470.lbm                          0%
481.wrf                          0.480769%
482.sphinx3                      0.940439%
SPECfp(R)_base2006               0%

It won't change generated code if vectorizer isn't
enabled. Its benifits outweigh its drawbacks.
Comment 8 Dzianis Kahanovich 2009-09-08 12:08:49 UTC
I don't trying to rebuild all with 4-byte incoming stack alignment while, but think to unify this step with full ABI change. For me it may solve migration to -mregparm=3 ("ix86_regparm = REGPARM_MAX;" for x86_32) and others in crosscompile way (for example, to "i686-eabi-linux-gnu" target). For people it may solve "ABI standard" orthodox problem too.

Most undiscovered place FOR ME (while I busy in other place): adding support to "eabi*" (1+*) keyword, eabi/eabi1 may be REGPARM_MAX + 4-byte incoming stack alignment (or even more safe solution) + any others safe (if exists) changes. eabi2 - hardcoded -msseregparm (I unsure in softfloat/no-sse compatibility, then there are must be optional), etc.
Comment 9 Dzianis Kahanovich 2009-09-19 16:12:40 UTC
Created attachment 18608 [details]
sse & 32bit -> -mstackrealign (example only!)

Previous my ideas too heavy. :)
IMHO native solution for this problem is "-mstackrealign". It solving problems with known to me packages (including zlib). I trying to make STACK_REALIGN_DEFAULT related from TARGET_SSE && !TARGET_64BIT (see patch). But got "internal compiler error" on gcc self-compiling with "-march=pentium4". Without sse (=without -mstackrealign) self-compiling works.

Why -mstackrealign may be bad and why gcc dont' self-compiling so?

Error:
===
/var/tmp/portage/sys-devel/gcc-4.4.1/work/build/./gcc/xgcc -B/var/tmp/portage/sys-devel/gcc-4.4.1/work/build/./gcc/ -B/usr/i686-pc-linux-gnu/bin/ -B/usr/i686-pc-linux-gnu/lib/ -isystem /usr/i686-pc-linux-gnu/include -isystem /usr/i686-pc-linux-gnu/sys-include -g -mtune=pentium4 -march=pentium4 -pipe -w -O2 -O2  -g -mtune=pentium4 -march=pentium4 -pipe -w -O2 -DIN_GCC   -W -Wall -Wwrite-strings -Wstrict-prototypes -Wmissing-prototypes -Wcast-qual -Wold-style-definition  -isystem ./include  -fPIC -g -DHAVE_GTHR_DEFAULT -DIN_LIBGCC2 -D__GCC_FLOAT_NOT_NEEDED   -I. -I. -I../.././gcc -I/var/tmp/portage/sys-devel/gcc-4.4.1/work/gcc-4.4.1/libgcc -I/var/tmp/portage/sys-devel/gcc-4.4.1/work/gcc-4.4.1/libgcc/. -I/var/tmp/portage/sys-devel/gcc-4.4.1/work/gcc-4.4.1/libgcc/../gcc -I/var/tmp/portage/sys-devel/gcc-4.4.1/work/gcc-4.4.1/libgcc/../include -I/var/tmp/portage/sys-devel/gcc-4.4.1/work/gcc-4.4.1/libgcc/config/libbid -DENABLE_DECIMAL_BID_FORMAT -DHAVE_CC_TLS -DUSE_TLS -o unwind-c.o -MT unwind-c.o -MD -MP -MF unwind-c.dep -fexceptions -c /var/tmp/portage/sys-devel/gcc-4.4.1/work/gcc-4.4.1/libgcc/../gcc/unwind-c.c -fvisibility=hidden -DHIDE_EXPORTS
In file included from /var/tmp/portage/sys-devel/gcc-4.4.1/work/gcc-4.4.1/libgcc/../gcc/unwind-dw2.c:1555:
/var/tmp/portage/sys-devel/gcc-4.4.1/work/gcc-4.4.1/libgcc/../gcc/unwind.inc: In function '_Unwind_ForcedUnwind':
/var/tmp/portage/sys-devel/gcc-4.4.1/work/gcc-4.4.1/libgcc/../gcc/unwind.inc:212: internal compiler error: in ix86_expand_epilogue, at config/i386/i386.c:8570
===
Comment 10 H.J. Lu 2009-09-19 21:42:42 UTC
(In reply to comment #9)
> Created an attachment (id=18608) [edit]
> sse & 32bit -> -mstackrealign (example only!)
> 
> Previous my ideas too heavy. :)
> IMHO native solution for this problem is "-mstackrealign". It solving problems
> with known to me packages (including zlib). I trying to make
> STACK_REALIGN_DEFAULT related from TARGET_SSE && !TARGET_64BIT (see patch). But
> got "internal compiler error" on gcc self-compiling with "-march=pentium4".
> Without sse (=without -mstackrealign) self-compiling works.
> 
> Why -mstackrealign may be bad and why gcc dont' self-compiling so?
> 

Stack alignment in unwind library is unsupported.
Comment 11 Dzianis Kahanovich 2009-09-25 11:03:11 UTC
Created attachment 18650 [details]
sse & 32bit -> -mstackrealign (example 2)

Second attempt (while against 4.4.1, sorry). Working with any -march.

For gcc libc -mstackrealign disabled. Tell me:

1) how to isolate cflags for unwind-dw2.c or unwind library or stage1 (exclude preset STAGE1_LIBCFLAGS, etc before make)?

2) are misaligning may be happened in internal libs and when? (to "-mno-sse" or question 1)

3) are 3DNOW! must be realigned too? and what other sse-independent sets/targets must be realigned?
Comment 12 Dzianis Kahanovich 2009-10-09 14:07:30 UTC
I found new AMD CPUs not required to stack aligning for SSE. IMHO there are "misalignsse" cpu feature, found near SSE4a (fixme). Then, requirement for stack realigning may be checked from "misalignsse" (precise) or "sse4a" (simple).

I don't want (while) to exeperiments with make scripts for isolating "-mno-stackrealign" CFLAGS for unwinding library and only suggest two variants of common solution:
1) Checking "misalignsse" for -march=native, -mno-stackrealign (if given) may produce broken code with SSE, gcc libs not verifyed against it.
2) Use TARGET_SSE4A as "cname" to misalignsse to detect realigning requirement. Also trying to disable SSE if realigning forced to disable (-mno-stackrealign given). SSE for gcc libs will be disabled.

This solutions may be mixed, but I see no sense. Or SSE4A is used or no. If no - I not found way to move "misalignsse" flag to configuration section. Just may be sense to remove SSE disabling in (2) - just remove i386.c changes.

IMHO last todo in this case: unwinding library (CFLAGS or -mstackrealign support).

PS Both are untested. But sse-stackrealign-1.patch have not problems for me.

Both are
Comment 13 Dzianis Kahanovich 2009-10-09 14:08:52 UTC
Created attachment 18761 [details]
1) sse-stackrealign-misalignsse-4.4.patch
Comment 14 Dzianis Kahanovich 2009-10-09 14:09:32 UTC
Created attachment 18762 [details]
2) sse-stackrealign-sse4a-4.4.patch
Comment 15 Dzianis Kahanovich 2009-10-09 14:10:16 UTC
Created attachment 18763 [details]
1) sse-stackrealign-misalignsse-4.5.patch
Comment 16 Dzianis Kahanovich 2009-10-09 14:10:58 UTC
Created attachment 18764 [details]
2) sse-stackrealign-sse4a-4.5.patch
Comment 17 Dzianis Kahanovich 2009-10-09 14:39:30 UTC
Created attachment 18765 [details]
(2) for 4.4, fixed
Comment 18 Dzianis Kahanovich 2009-10-09 14:41:01 UTC
Created attachment 18766 [details]
(2) for 4.5, fixed

Originals may produce illegal warnings without SSE.
Comment 19 H.J. Lu 2009-10-09 14:43:55 UTC
Without a testcase, people may not review the patch.
Comment 20 Dzianis Kahanovich 2009-10-10 13:48:14 UTC
(In reply to comment #19)
> Without a testcase, people may not review the patch.
> 

May be just include your tests from Bug 40838? (even without testing)
From http://gcc.gnu.org/bugzilla/attachment.cgi?id=18656
Your suggestions?

I understand so there are same results with your gcc-4.4-pr40838-*.patch, but realigning whole code and exclude "misaligned" AMD CPUs. And warrantied to realign in any place.

Your gcc-4.4-pr40838-3.patch lost realigning in unknown place - runtime segfaults in Seamonkey. *-4.patch I don't test becouse alredy use simple "-mstackrealign" solution - first there are dummy testing - surfing with Seamonkey until sometimes segfault, second - I got new desktop with Athlon (but was alredy satisfyed by "-mstackrealign" on old Celeron). Now I prefer to use safe "-mstackrealign" solution for Intel CPUs on 32bit servers and to be sure in safe SSE code. And see no visual defferences in perfomance whith global "-mstackrealign".
Comment 21 Dzianis Kahanovich 2009-10-10 13:51:12 UTC
Created attachment 18773 [details]
tests proposal from PR 40838 (from H.J. Lu)

Are there are good?
Comment 22 H.J. Lu 2009-10-10 14:46:03 UTC
(In reply to comment #20)
> (In reply to comment #19)
> > Without a testcase, people may not review the patch.
> > 
> 
> May be just include your tests from Bug 40838? (even without testing)
> From http://gcc.gnu.org/bugzilla/attachment.cgi?id=18656
> Your suggestions?
> 
> I understand so there are same results with your gcc-4.4-pr40838-*.patch, but
> realigning whole code and exclude "misaligned" AMD CPUs. And warrantied to
> realign in any place.
> 
> Your gcc-4.4-pr40838-3.patch lost realigning in unknown place - runtime
> segfaults in Seamonkey. *-4.patch I don't test becouse alredy use simple

We can't check Seamonkey into gcc/testsuite. We need something
much smaller. 

> "-mstackrealign" solution - first there are dummy testing - surfing with
> Seamonkey until sometimes segfault, second - I got new desktop with Athlon (but
> was alredy satisfyed by "-mstackrealign" on old Celeron). Now I prefer to use
> safe "-mstackrealign" solution for Intel CPUs on 32bit servers and to be sure
> in safe SSE code. And see no visual defferences in perfomance whith global
> "-mstackrealign".
> 

If you think adding -mstackrealign to your build is an acceptable solution,
we can close this bug. Otherwise, please test my latest gcc 4.4 patch for
PR 40838.  Thanks.
Comment 23 Dzianis Kahanovich 2009-10-12 11:40:57 UTC
(In reply to comment #22)

> We can't check Seamonkey into gcc/testsuite. We need something
> much smaller. 

I understand this. But even if I will use your testsuite addons (I unsure in it, Gentoo "USE=test emerge sys-deve/gcc" on unpatched gcc & Celeron don't breaks), unknown bug from at least old *-3.patch will be present. This is like communicative problem, but problem...

> If you think adding -mstackrealign to your build is an acceptable solution,
> we can close this bug. Otherwise, please test my latest gcc 4.4 patch for
> PR 40838.  Thanks.

Your patch, by idea, more accurate - if now realign stack in ALL requred places. But also it unmotivated realign with AMDs too. I alredy ask: "Why -mstackrealign may be bad?". This is dark place. In manpage "-mstackrealign" described against SSE aligning problem. This is like wellknown solution, but unused wide. Why? What side-effects of "-mstackrealign"? Perfomance? Compatibility? Just ABI standard? Related from this "-mstackrealign" may be hardcoded for SSE in simple way - just "SSE -> -mstackrealign" or overcoded patch to disable -mstackrealign for AMDs. Or unuse whole. I will not fast testing your patch on old Celeron (after few days) to close this theme too, but for final solution IMHO information about -mstackrealign [history] required.
Comment 24 H.J. Lu 2009-10-12 13:59:49 UTC
(In reply to comment #23)
> > If you think adding -mstackrealign to your build is an acceptable solution,
> > we can close this bug. Otherwise, please test my latest gcc 4.4 patch for
> > PR 40838.  Thanks.
> 
> Your patch, by idea, more accurate - if now realign stack in ALL requred
> places. But also it unmotivated realign with AMDs too. I alredy ask: "Why
> -mstackrealign may be bad?". This is dark place. In manpage "-mstackrealign"
> described against SSE aligning problem. This is like wellknown solution, but
> unused wide. Why? What side-effects of "-mstackrealign"? Perfomance?

Please see PR 40838.
Comment 25 Ryan Hill 2009-10-12 22:48:15 UTC
would you please just test the patch on PR 40838 and tell HJ if it works or not?
Comment 26 Dzianis Kahanovich 2009-10-13 12:16:14 UTC
(In reply to comment #25)
> would you please just test the patch on PR 40838 and tell HJ if it works or
> not?
> 

In progress. There are only one Celeron PC leased without distcc to build and runtime test big packages.
Comment 27 Dzianis Kahanovich 2009-10-13 13:26:13 UTC
(In reply to comment #24)

> > unused wide. Why? What side-effects of "-mstackrealign"? Perfomance?
> 
> Please see PR 40838.

As seen on... a...
I found only ABI standard reasons. FIXME!
But SSE usage still optional like -mstackrealign option. I see no political differences: "-msse" or "-march=pentium4" or "-mstackrealign". Default "i686" builds will be same unaligned.
Comment 28 Dzianis Kahanovich 2009-11-27 12:45:58 UTC
Yes, I read PR 40838. But last (IMHO) in this thread:

Disabling SSE in whole GCC libs may cause various build problems in SSE-related -march (mostly with -ffast-math, but IMHO more). So, to use this simple solution, SSE disabling code better to remove. If still paranoid about GCC libs SSE compatibility - better to build GCC with "-march=i686". Next stripped patches:
Comment 29 Dzianis Kahanovich 2009-11-27 12:49:42 UTC
Created attachment 19162 [details]
(2) for 4.4
Comment 30 Dzianis Kahanovich 2009-11-27 12:51:34 UTC
Created attachment 19163 [details]
(2) for 4.5
Comment 31 Mark Mitchell 2010-02-17 16:49:37 UTC
I still have no idea what this PR is about.  Someone needs to make a clear statement of what they believe the ABI to be.  There are some simple questions:

* Can we expect that the stack is 16-byte aligned, or not?

* If we cannot (because of legacy requirements), then is there any reason to try to align the stack automatically at entry to every function, or at every call site?  I can see that this might be useful as an option, but it does not seem like a sensible default to me.
Comment 32 H.J. Lu 2010-02-17 17:09:15 UTC
(In reply to comment #31)
> I still have no idea what this PR is about.  Someone needs to make a clear
> statement of what they believe the ABI to be.  There are some simple questions:
> 
> * Can we expect that the stack is 16-byte aligned, or not?

Not if we want to be binary compatible with object files compiled by
compilers, including gcc 1.x/2.x/3.x, which generate 4byte stack alignment.

> 
> * If we cannot (because of legacy requirements), then is there any reason to
> try to align the stack automatically at entry to every function, or at every
> call site?  I can see that this might be useful as an option, but it does not
> seem like a sensible default to me.
> 

See PR 40838.  "-mstackrealign" now aligns the stack only if SSE vector
instructions are used in the function. The performance impact of
"-mstackrealign" is minimum:

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40838#c65
Comment 33 joseph@codesourcery.com 2010-02-17 17:24:30 UTC
Subject: Re:  [4.4/4.5 Regression] zlib segfault in
 inflate_table() compiled w/ -O -msse2 ftree-vectorize

I believe the ABI *is* that the stack must be 16-byte aligned at function 
boundaries, although this is not documented other than as a statement of 
what GCC does 
<http://groups.google.com/group/ia32-abi/browse_thread/thread/4f9b3e5069943bf1>.  
This does not of course stop smaller alignment being used for particular 
code by private arrangement as long as 16-byte alignment is used when 
calling back into user code (for example, it's fine for glibc to build 
code not using callbacks with smaller alignment, or for the Linux kernel 
to use smaller alignment internally as long as it ensures the full 
alignment for signal frames in userspace processes).

Comment 34 Jakub Jelinek 2010-02-17 19:31:20 UTC
Yeah, I completely agree with that.
Comment 35 Alexander Bezrukov 2010-06-10 19:08:00 UTC
Hello,

I just upgraded to gcc-4.4.3 (from Gentoo distribution) and recompiled the whole system (on x86). Then I had to discover the (as it turned to be, infamous) mozilla-firefox + zlib bug. I reported it to the distribution here:
http://bugs.gentoo.org/show_bug.cgi?id=323431

As I learned later, everybody accounts for incoming misaligned stack to be the cause of the crashes. But what I see:

inflate_table:
.LFB45:
        .file 1 "inftrees.c"
        .loc 1 39 0
.LVL0:
        pushl   %ebp     ; stack misaligned to -4
.LCFI0:
        .loc 1 108 0
        pxor    %xmm0, %xmm0
        .loc 1 39 0
        movl    %esp, %ebp ; $ebp % 16 = 4
.LCFI1:
        pushl   %edi
.LCFI2:
        pushl   %esi
.LCFI3:
        pushl   %ebx
.LCFI4:
        call    .L101
.L101:
        popl    %ebx
        addl    $_GLOBAL_OFFSET_TABLE_+[.-.L101], %ebx
        subl    $188, %esp
.LCFI5:
        .loc 1 108 0
        movdqa  %xmm0, -56(%ebp) ; -56-4=60, 60%16=4
; <snip>

Even if the stack were 16 bytes aligned on the entry, the instruction in the last line would attempt an unaligned access and thus would fault. Please correct me, if I miscounted.

My concern is: even with ABI which guarantees 16-byte aligned incoming stacks, gcc generates code which would segfault.
Comment 36 Ed Catmur 2010-06-10 21:28:52 UTC
Alexander, you're omitting to consider that call pushes EIP + 4 onto the stack.  Thus on entry the stack is already misaligned by -4, so gcc is correct.
Comment 37 Alexander Bezrukov 2010-06-11 01:43:43 UTC
Thank you, Ed. I missed that. I wrongly (obviously wrongly, because this would negatively affect performance) thought that ABI is such that stack is aligned after the call, not before.
Comment 38 Richard Biener 2010-09-30 11:13:44 UTC
The conclusion seems to be WONTFIX (or NOT-A-BUG).  Getting it off the
serious regressions radar.