This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: PATCH: PR target/40838: gcc shouldn't assume that the stack is aligned



On Mon, 24 Aug 2009, H.J. Lu wrote:

> On Fri, Aug 7, 2009 at 3:30 PM, H.J. Lu<hjl.tools@gmail.com> wrote:
> > On Fri, Aug 7, 2009 at 5:53 AM, H.J. Lu<hjl.tools@gmail.com> wrote:
> >> On Fri, Aug 7, 2009 at 12:13 AM, Jakub Jelinek<jakub@redhat.com> wrote:
> >>> On Fri, Aug 07, 2009 at 02:54:46AM +0200, Mikulas Patocka wrote:
> >>>> > > In 32bit, the incoming stack may not be 16 byte aligned. ?This patch
> >>>> > > assumes the incoming stack is 4 byte aligned and realigns stack if any
> >>>> > > SSE variable is put on stack. Any comments?
> >>>> >
> >>>> > IMHO this is wrong, I could live with a non-default option for those who
> >>>> > don't care about performance and think a SCO document from 1996 has any
> >>>> > relevance to Linux these days. ?In reality a Linux ABI for years assumes
> >>>> > 16 byte stack alignment for 32-bit code.
> >>>>
> >>>> Tell me which Linux distribution did you run with 16-byte stack alignment
> >>>> checking (as proposed in bug 40838) and what was the result?
> >>>>
> >>>> For me, the result was that 75% of binaries in /bin in Debian Lenny do not
> >>>> align the stack on 16-byte boundary.
> >>>
> >>> Besides the obstack glibc bug which has been fixed since then you haven't
> >>> reported anything particular. ?It is true that parts of i?86 glibc is
> >>> compiled with -mpreferered-stack-boundary=2, but only parts that don't call
> >>> callbacks. ?Async signals AFAIK will align the stack properly.
> >>>
> >>> I simply don't trust your 75% claim, lots of stuff would break if things
> >>> weren't aligned properly.
> >>>
> >>
> >> From gcc 3.4:
> >>
> >> ?/* Validate -mpreferred-stack-boundary= value, or provide default.
> >> ? ? The default of 128 bits is for Pentium III's SSE __m128, but we
> >> ? ? don't want additional code to keep the stack aligned when
> >> ? ? optimizing for code size. ?*/
> >> ?ix86_preferred_stack_boundary = (optimize_size
> >> ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? TARGET_64BIT ? 128 : 32
> >> ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? : 128);
> >>
> >> If you compile code with -Os, you will get 4 byte stack alignment.
> >> Just step back, we changed stack alignment from 4 byte to 16byte
> >> for SSE since we couldn't realign stack at the time. Now we can
> >> realign the stack very efficiently. I think we should do it for SSE
> >> to support the existing Linux binaries which have 4 byte stack
> >> alignment. If it helps, I can compare -m32 -O3 -msse2 -mfp-math=sse
> >> results with SPEC CPU 2006, before and after my patch.
> >>
> >
> > Here are the differences of -m32 -O3 -msse2 -mfpmath=sse -ffast-math
> > -funroll-loops
> > before and after my patch:
> >
> > 400.perlbench ? ? ? ? ? ? ? ? ? ?-0.384615%
> > 401.bzip2 ? ? ? ? ? ? ? ? ? ? ? ?0%
> > 403.gcc ? ? ? ? ? ? ? ? ? ? ? ? ?-0.362319%
> > 429.mcf ? ? ? ? ? ? ? ? ? ? ? ? ?-0.813008%
> > 445.gobmk ? ? ? ? ? ? ? ? ? ? ? ?0.921659%
> > 456.hmmer ? ? ? ? ? ? ? ? ? ? ? ?0.549451%
> > 458.sjeng ? ? ? ? ? ? ? ? ? ? ? ?-0.438596%
> > 462.libquantum ? ? ? ? ? ? ? ? ? 0%
> > 464.h264ref ? ? ? ? ? ? ? ? ? ? ?0%
> > 471.omnetpp ? ? ? ? ? ? ? ? ? ? ?-0.478469%
> > 473.astar ? ? ? ? ? ? ? ? ? ? ? ?-0.645161%
> > 483.xalancbmk ? ? ? ? ? ? ? ? ? ?-0.727273%
> > SPECint(R)_base2006 ? ? ? ? ? ? ? ? ? ? ?-0.411523%
> > 410.bwaves ? ? ? ? ? ? ? ? ? ? ? -0.406504%
> > 416.gamess ? ? ? ? ? ? ? ? ? ? ? 0%
> > 433.milc ? ? ? ? ? ? ? ? ? ? ? ? -1.36986%
> > 434.zeusmp ? ? ? ? ? ? ? ? ? ? ? -0.44843%
> > 435.gromacs ? ? ? ? ? ? ? ? ? ? ?0%
> > 436.cactusADM ? ? ? ? ? ? ? ? ? ?0%
> > 437.leslie3d ? ? ? ? ? ? ? ? ? ? -0.888889%
> > 444.namd ? ? ? ? ? ? ? ? ? ? ? ? 1.20482%
> > 447.dealII ? ? ? ? ? ? ? ? ? ? ? -0.350877%
> > 450.soplex ? ? ? ? ? ? ? ? ? ? ? -0.31746%
> > 453.povray ? ? ? ? ? ? ? ? ? ? ? 0.458716%
> > 454.calculix ? ? ? ? ? ? ? ? ? ? 0%
> > 459.GemsFDTD ? ? ? ? ? ? ? ? ? ? 0%
> > 465.tonto ? ? ? ? ? ? ? ? ? ? ? ?0%
> > 470.lbm ? ? ? ? ? ? ? ? ? ? ? ? ?0%
> > 481.wrf ? ? ? ? ? ? ? ? ? ? ? ? ?0.480769%
> > 482.sphinx3 ? ? ? ? ? ? ? ? ? ? ?0.940439%
> > SPECfp(R)_base2006 ? ? ? ? ? ? ? ? ? ? ? 0%
> >
> > I think we should align stack if SSE variables are put on stack.
> >
> 
> Darwin ia32 psABI specifies 16byte stack alignment and enforces it
> with
> 
> #define PREFERRED_STACK_BOUNDARY                        \
>   MAX (STACK_BOUNDARY, ix86_preferred_stack_boundary)
> 
> On other ia32 targets, 4byte outgoing stack alignment is
> correct and allowed. My patch assumes 4 byte incoming
> stack alignment only when SSE variables are put on stack.
> Automatic stack alignment implementation is quite efficient.
> Its performance impact is very limited as show in SPEC CPU
> 2006 results. It also fixed a regression:
> 
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41156
> 
> OK for trunk?
> 
> Thanks.
> 
> -- 
> H.J.

I tried this patch for 4.4.1 (that one posted in bugzilla 40838) with 
seamonkey 1.1.17 and it relly misses alignment in some parts of it. I used 
CFLAGS and CXXFLAGS "-O3 -fomit-frame-pointer -frename-registers 
-march=barcelona"

I found the functions that miscompile, the trivial way to find them is to 
compile seamonkey, do find . -name "*.o"|xargs objdump -d|less, search for 
"movdqa.*esp" or "movaps.*esp" (or even "esp.*xmm" and "xmm.*esp", but it 
gives more false positives) and check the function that it has aligment 
code. There are a few functions that don't. Please look at these 
functions, the idea of your patch is good, it just needs to be fixed.

BTW. gentoo documentation lists -O3 as problematic flag that leads to 
instability with gcc 4 (they don't way why, but the most likely reason is 
this alignment issue), it would be good if we could fix it and allow 
people to use SSE.

Mikulas


Some of the miscompiled functions are:

pow5mult:
a00:       55                      push   %ebp
a01:       57                      push   %edi
a02:       56                      push   %esi
a03:       53                      push   %ebx
a04:       89 d3                   mov    %edx,%ebx
a06:       83 ec 6c                sub    $0x6c,%esp
...
a7d:       0f 29 44 24 10          movaps %xmm0,0x10(%esp)
a82:       e8 79 f8 ff ff          call   300 <Balloc>
a87:       85 c0                   test   %eax,%eax
a89:       89 44 24 4c             mov    %eax,0x4c(%esp)
a8d:       66 0f 6f 44 24 10       movdqa 0x10(%esp),%xmm0

PR_strtod:
10a0:       55                      push   %ebp
10a1:       57                      push   %edi
10a2:       56                      push   %esi
10a3:       53                      push   %ebx
10a4:       81 ec fc 00 00 00       sub    $0xfc,%esp
...
172c:       66 0f 6f 44 24 20       movdqa 0x20(%esp),%xmm0

PR_select:
3da0:       55                      push   %ebp
3da1:       57                      push   %edi
3da2:       56                      push   %esi
3da3:       89 ce                   mov    %ecx,%esi
3da5:       53                      push   %ebx
3da6:       89 d3                   mov    %edx,%ebx
3da8:       81 ec ec 01 00 00       sub    $0x1ec,%esp
...
3dcb:       0f 29 84 24 50 01 00    movaps %xmm0,0x150(%esp)
3dd2:       00
3dd3:       0f 29 84 24 60 01 00    movaps %xmm0,0x160(%esp)
3dda:       00
3ddb:       0f 29 84 24 70 01 00    movaps %xmm0,0x170(%esp)
3de2:       00
3de3:       0f 29 84 24 80 01 00    movaps %xmm0,0x180(%esp)
3dea:       00
3deb:       0f 29 84 24 90 01 00    movaps %xmm0,0x190(%esp)
3df2:       00

_ZL18wait_for_retrievalP13_GtkClipboardP17retrieval_context:
560:       55                      push   %ebp
561:       57                      push   %edi
562:       56                      push   %esi
563:       89 d6                   mov    %edx,%esi
565:       53                      push   %ebx
566:       81 ec bc 01 00 00       sub    $0x1bc,%esp
...
5b0:       0f 29 44 24 20          movaps %xmm0,0x20(%esp)
5b5:       0f 29 44 24 30          movaps %xmm0,0x30(%esp)
5ba:       0f 29 44 24 40          movaps %xmm0,0x40(%esp)
5bf:       0f 29 44 24 50          movaps %xmm0,0x50(%esp)
5c4:       0f 29 44 24 60          movaps %xmm0,0x60(%esp)
5c9:       0f 29 44 24 70          movaps %xmm0,0x70(%esp)
5ce:       0f 29 84 24 80 00 00    movaps %xmm0,0x80(%esp)
5d5:       00
5d6:       0f 29 84 24 90 00 00    movaps %xmm0,0x90(%esp)
5dd:       00

_ZN13XRemoteClient7GetLockEmPi:
680:       55                      push   %ebp
681:       57                      push   %edi
682:       56                      push   %esi
683:       53                      push   %ebx
684:       89 c3                   mov    %eax,%ebx
686:       81 ec 0c 02 00 00       sub    $0x20c,%esp
...
6ee:       0f 29 44 24 30          movaps %xmm0,0x30(%esp)

Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]