Bug 32893 - zlib segfault in inflate_table() compiled w/ -O -msse2 ftree-vectorize
Summary: zlib segfault in inflate_table() compiled w/ -O -msse2 ftree-vectorize
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: target (show other bugs)
Version: 4.2.0
: P3 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords:
Depends on: 40838 33721 35653
Blocks: 41156
  Show dependency treegraph
 
Reported: 2007-07-25 20:22 UTC by Ryan Hill
Modified: 2009-08-26 05:23 UTC (History)
10 users (show)

See Also:
Host: i686-linux-gnu
Target: i686-linux-gnu
Build: i686-linux-gnu
Known to work:
Known to fail:
Last reconfirmed:


Attachments
Preprocessed inffast.c. Compile with -O2 -msse2 -ftree-vectorize (10.46 KB, application/x-tar)
2007-09-20 06:51 UTC, Uroš Bizjak
Details
zlib testcase(zlib's inftrees.c) (4.33 KB, text/plain)
2007-09-20 10:43 UTC, Katamine Kentaro
Details
inftrees.c (576 bytes, text/plain)
2007-09-23 05:59 UTC, Ryan Hill
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Ryan Hill 2007-07-25 20:22:40 UTC
(In reply to comment #17)
> This looks like an unrelated problem - the vectorizer does not perform loop
> peeling here so it's not an issue of natural alignment. Lets open a separate PR
> for this one, unless there's already one open. In the meantime, would you
> please try this patch?:
> 
> Index: tree-vectorizer.c
> ===================================================================
> *** tree-vectorizer.c   (revision 126902)
> --- tree-vectorizer.c   (working copy)
> *************** vect_can_force_dr_alignment_p (tree decl
> *** 1527,1533 ****
>          PREFERRED_STACK_BOUNDARY is honored by all translation units.
>          However, until someone implements forced stack alignment, SSE
>          isn't really usable without this.  */
> !     return (alignment <= PREFERRED_STACK_BOUNDARY);
>   }
> 
> 
> --- 1527,1533 ----
>          PREFERRED_STACK_BOUNDARY is honored by all translation units.
>          However, until someone implements forced stack alignment, SSE
>          isn't really usable without this.  */
> !     return (alignment <= STACK_BOUNDARY);
>   }
> 

Hey Dorit.  With this patch zlib appears to compile successfully.  The loop is vectorized with an "alignment of access forced using peeling" note and linked apps no longer segfault.

I also tested using Andrew's patch from bug #16660 and always returning true in vect_can_force_dr_alignment_p but it does not fix this error.

Let me know if I can provide any other info that would be useful to you.
Comment 1 dorit 2007-07-25 20:43:12 UTC
thanks a lot for checking both patches!

> With this patch zlib appears to compile successfully.  The loop is
> vectorized with an "alignment of access forced using peeling" note and linked
> apps no longer segfault.

I'd like to try to verify if the problem is indeed related to the STACK_BOUNDARY, or whether this has to do with some weird interplay with the compilation of some other function, possibly after inlining (i.e. something like what we had in PR27770). I'm not sure how to suggest to check that...

> I also tested using Andrew's patch from bug #16660 and always returning true in
> vect_can_force_dr_alignment_p but it does not fix this error.

Andrew, makes sense to you?

> Let me know if I can provide any other info that would be useful to you.

thanks, I'll think about it...
Comment 2 Andrew Pinski 2007-07-25 20:45:01 UTC
> Andrew, makes sense to you?
I think my patch only checks PREFERRED_STACK_BOUNDARY and not STACK_BOUNDARY which is why it does not work but I have not looked into it at all.
Comment 3 dorit 2007-07-28 21:03:51 UTC
(In reply to comment #2)
> > Andrew, makes sense to you?
> I think my patch only checks PREFERRED_STACK_BOUNDARY and not STACK_BOUNDARY
> which is why it does not work but I have not looked into it at all.

I see references in the patch to both PREFERRED_STACK_BOUNDARY and STACK_BOUNDARY. Could you please check which of these needs to be fixed? (cause I think your fix is the more desirable one). (just for the record, the link to the patch in question is here:  http://gcc.gnu.org/bugzilla/show_bug.cgi?id=25413#c21)
Comment 4 dorit 2007-08-01 11:36:10 UTC
Also just for the record - the testcase for this PR is here: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=25413#c14
Comment 5 dorit 2007-08-01 11:57:18 UTC
Ryan, I wonder what happens if you force alignment in the source code, like so:

unsigned short count[MAXBITS+1] __attribute__ ((__aligned__(16))) ;    

In this case the vectorizer does not change the alignment of the array. I wonder if the compiler honors the alignment attribute when the user asks for it, rather than the vectorizer. 
Comment 6 Katamine Kentaro 2007-08-14 17:46:26 UTC
It looks like 
zlib compiled w/ -O -msse -ftree-vectorize (built with fedora's rpm package gcc-4.1.2-17) 
has same problem.

In my environment, rpm-4.4.2.1-7.fc8 and seamonkey-1.1.3-6.fc8 segfault like below:

Program received signal SIGSEGV, Segmentation fault.
0x003a869d in inflate_table (type=CODES, lens=0x913b5c8, codes=19,
    table=0x913b5c4, bits=0x913b5ac, work=0x913b848) at inftrees.c:108
108             count[len] = 0;
Comment 7 dorit 2007-09-19 14:28:51 UTC
(In reply to comment #6)
> It looks like 
> zlib compiled w/ -O -msse -ftree-vectorize (built with fedora's rpm package
> gcc-4.1.2-17) 
> has same problem.
> In my environment, rpm-4.4.2.1-7.fc8 and seamonkey-1.1.3-6.fc8 segfault like
> below:
> Program received signal SIGSEGV, Segmentation fault.
> 0x003a869d in inflate_table (type=CODES, lens=0x913b5c8, codes=19,
>     table=0x913b5c4, bits=0x913b5ac, work=0x913b848) at inftrees.c:108
> 108             count[len] = 0;

could you please provide a complete (reduced...) testcase that could be used to reproduce this? 
In the meantime, other things that may help:
- could you please try to add "__attribute__ ((__aligned__(16)))" to the definition of count, as suggested in comment 5?
- could you please show the relevant generated assembly up to the offending insn?  (with and without the attribute aligned)? could you also check (with gdb) what is the address accessed and what is the address of the stack pointer?
Comment 8 Uroš Bizjak 2007-09-20 06:49:41 UTC
I can't confirm the failure (make check from zlib works OK for me), but there is some suspicious code in inflate_fast(). When compiling (to be attached) preprocessed testcase (using -O2 -msse2 -ftree-vectorize on i686), I got:

 8052711:	31 d2                	xor    %edx,%edx
 8052713:	83 c2 01             	add    $0x1,%edx
 8052716:	f3 0f 6f 41 10       	movdqu 0x10(%ecx),%xmm0
 805271b:	f3 0f 6f 51 20       	movdqu 0x20(%ecx),%xmm2
 8052720:	f3 0f 6f 09          	movdqu (%ecx),%xmm1
 8052724:	66 0f 7f 40 11       	movdqa %xmm0,0x11(%eax)
 8052729:	66 0f 7f 48 01       	movdqa %xmm1,0x1(%eax)
 805272e:	66 0f 7f 50 21       	movdqa %xmm2,0x21(%eax)
 8052733:	83 c1 30             	add    $0x30,%ecx
 8052736:	83 c0 30             	add    $0x30,%eax

These lines correspond to line 235 of inffast.c:

                    }
                    while (len > 2) {
                        PUP(out) = PUP(from);
                        PUP(out) = PUP(from);
                        PUP(out) = PUP(from);
                        len -= 3;
                    }

These lines are preprocessed into:

                    while (len > 2) {
                        *++(out) = *++(from);
                        *++(out) = *++(from);
                        *++(out) = *++(from);
                        len -= 3;
                    }

Comment 9 Uroš Bizjak 2007-09-20 06:51:40 UTC
Created attachment 14226 [details]
Preprocessed inffast.c. Compile with -O2 -msse2 -ftree-vectorize
Comment 10 Katamine Kentaro 2007-09-20 10:43:20 UTC
Created attachment 14228 [details]
zlib testcase(zlib's inftrees.c)
Comment 11 Uroš Bizjak 2007-09-20 11:58:06 UTC
I hope that following testcase can be of some use to somebody...

--cut here--
void __attribute__((noinline))
inflate_fast (unsigned char *from, unsigned char *out, unsigned int len)
{
  while (len > 2)
    {
      *++(out) = *++(from);
      *++(out) = *++(from);
      *++(out) = *++(from);
      len -= 3;
    }

  return;
}

int main()
{
  unsigned int len = 13;

  unsigned char x[13] = "Hello there! ...Hello there! ...Hello there! ...";
  unsigned char y[13] = " ";

  inflate_fast (x, y, len);

  printf ("%s\n", y);
  return 0;
}
--cut here--

This testcase produces movdqa with 1byte offset, but I'm not able to trigger segfault with it, although it produces asm code as in comment #18.
Comment 12 Uroš Bizjak 2007-09-22 10:26:55 UTC
(In reply to comment #10)
> Created an attachment (id=14228) [edit]
> zlib testcase(zlib's inftrees.c)

Sorry, I can't reproduce the segfault with current mainline. Please provide self-contained test that segfaults, or exact instructions how to produce segfault from public accessible sources.

example.c (part of zlib testsuite) runs OK with -O2 -msse2 -ftree-vectorize on i686 and x86_64.
Comment 13 Katamine Kentaro 2007-09-22 11:28:28 UTC
(In reply to comment #12)
Hmm..., but in my environment, some applications always segfaults at zlib which was built with -O -msse -ftree-vectorize.

I'm sorry, but now I recognize that I don't have enough ability to do bug-report.
Please ignore me.
Sorry for mess that I've caused.
Comment 14 Ryan Hill 2007-09-23 05:59:50 UTC
Created attachment 14246 [details]
inftrees.c

this is the testcase from bug #25413.  with -O2 -msse2 -ftree-vectorize, i get this in gcc-4.2.0:

inftrees.o:     file format elf32-i386

Disassembly of section .text:

00000000 <inflate_table>:
   0:   55                      push   %ebp
   1:   89 e5                   mov    %esp,%ebp
   3:   53                      push   %ebx
   4:   83 ec 24                sub    $0x24,%esp
   7:   8b 5d 0c                mov    0xc(%ebp),%ebx
   a:   8b 4d 10                mov    0x10(%ebp),%ecx
   d:   66 0f ef c0             pxor   %xmm0,%xmm0
  11:   66 0f 7f 45 d8          movdqa %xmm0,-0x28(%ebp)
  16:   66 0f 7f 45 e8          movdqa %xmm0,-0x18(%ebp)
  1b:   85 c9                   test   %ecx,%ecx
  1d:   74 16                   je     35 <inflate_table+0x35>
  1f:   ba 00 00 00 00          mov    $0x0,%edx
  24:   0f b7 04 53             movzwl (%ebx,%edx,2),%eax
  28:   66 83 44 45 d8 01       addw   $0x1,-0x28(%ebp,%eax,2)
  2e:   83 c2 01                add    $0x1,%edx
  31:   39 ca                   cmp    %ecx,%edx
  33:   75 ef                   jne    24 <inflate_table+0x24>
  35:   b8 00 00 00 00          mov    $0x0,%eax
  3a:   8d 55 d8                lea    -0x28(%ebp),%edx
  3d:   66 83 7c 42 1e 00       cmpw   $0x0,0x1e(%edx,%eax,2)
  43:   75 08                   jne    4d <inflate_table+0x4d>
  45:   83 e8 01                sub    $0x1,%eax
  48:   83 f8 f1                cmp    $0xfffffff1,%eax
  4b:   75 f0                   jne    3d <inflate_table+0x3d>
  4d:   83 c4 24                add    $0x24,%esp
  50:   5b                      pop    %ebx
  51:   5d                      pop    %ebp
  52:   c3                      ret

forcing alignment as in comment #5 results in:

inftrees-align.o:     file format elf32-i386

Disassembly of section .text:

00000000 <inflate_table>:
   0:   55                      push   %ebp
   1:   89 e5                   mov    %esp,%ebp
   3:   53                      push   %ebx
   4:   83 ec 24                sub    $0x24,%esp
   7:   8b 5d 0c                mov    0xc(%ebp),%ebx
   a:   8b 4d 10                mov    0x10(%ebp),%ecx
   d:   b8 01 00 00 00          mov    $0x1,%eax
  12:   8d 55 d8                lea    -0x28(%ebp),%edx
  15:   66 c7 44 42 fe 00 00    movw   $0x0,-0x2(%edx,%eax,2)
  1c:   83 c0 01                add    $0x1,%eax
  1f:   83 f8 11                cmp    $0x11,%eax
  22:   75 f1                   jne    15 <inflate_table+0x15>
  24:   85 c9                   test   %ecx,%ecx
  26:   74 16                   je     3e <inflate_table+0x3e>
  28:   ba 00 00 00 00          mov    $0x0,%edx
  2d:   0f b7 04 53             movzwl (%ebx,%edx,2),%eax
  31:   66 83 44 45 d8 01       addw   $0x1,-0x28(%ebp,%eax,2)
  37:   83 c2 01                add    $0x1,%edx
  3a:   39 ca                   cmp    %ecx,%edx
  3c:   75 ef                   jne    2d <inflate_table+0x2d>
  3e:   b8 00 00 00 00          mov    $0x0,%eax
  43:   8d 55 d8                lea    -0x28(%ebp),%edx
  46:   66 83 7c 42 1e 00       cmpw   $0x0,0x1e(%edx,%eax,2)
  4c:   75 08                   jne    56 <inflate_table+0x56>
  4e:   83 e8 01                sub    $0x1,%eax
  51:   83 f8 f1                cmp    $0xfffffff1,%eax
  54:   75 f0                   jne    46 <inflate_table+0x46>
  56:   83 c4 24                add    $0x24,%esp
  59:   5b                      pop    %ebx
  5a:   5d                      pop    %ebp
  5b:   c3                      ret

there's a gdb log for the segfault in firefox @ http://gcc.gnu.org/bugzilla/attachment.cgi?id=13966

i'll try to find something a little smaller than mozilla that can demonstrate this problem since i still suck at testcases.

FWIW, i've been running GCC-4.2 svn with the patch at http://gcc.gnu.org/bugzilla/show_bug.cgi?id=25413#c17 for a couple months now and have built a sizable chunk of our package repository with -ftree-vectorize enabled several times over and have yet to run into any trouble whatsoever.
Comment 15 Ryan Hill 2007-09-23 06:23:25 UTC
i should also mention that the zlib testsuite doesn't trigger this bug for some reason.  many other applications that link with zlib are also unaffected.  some that are include firefox/mozilla/thunderbird/seamonkey/xulrunner, rpm (notably rpm2cpio), openoffice...  some have reported that opening the GTK file selector dialog triggers it but i haven't encountered that and suspect it was bug #25413.
Comment 16 dorit 2007-10-03 18:52:59 UTC
Ryan, thanks a lot for the info. FYI, I started a discussion about this here: http://gcc.gnu.org/ml/gcc-patches/2007-10/msg00202.html

Comment 17 dorit 2007-10-30 05:25:23 UTC
Subject: Bug 32893

Author: dorit
Date: Tue Oct 30 05:25:10 2007
New Revision: 129764

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=129764
Log:
        PR tree-optimization/32893
        * tree-vectorize.c (vect_can_force_dr_alignment_p): Check
        STACK_BOUNDARY instead of PREFERRED_STACK_BOUNDARY.


Added:
    trunk/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-6-global.c
    trunk/gcc/testsuite/gcc.dg/vect/no-section-anchors-vect-31.c
    trunk/gcc/testsuite/gcc.dg/vect/no-section-anchors-vect-34.c
    trunk/gcc/testsuite/gcc.dg/vect/no-section-anchors-vect-36.c
    trunk/gcc/testsuite/gcc.dg/vect/no-section-anchors-vect-64.c
    trunk/gcc/testsuite/gcc.dg/vect/no-section-anchors-vect-65.c
    trunk/gcc/testsuite/gcc.dg/vect/no-section-anchors-vect-66.c
    trunk/gcc/testsuite/gcc.dg/vect/no-section-anchors-vect-68.c
    trunk/gcc/testsuite/gcc.dg/vect/vect-77-alignchecks.c
    trunk/gcc/testsuite/gcc.dg/vect/vect-77-global.c
    trunk/gcc/testsuite/gcc.dg/vect/vect-78-alignchecks.c
    trunk/gcc/testsuite/gcc.dg/vect/vect-78-global.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/testsuite/gcc.dg/vect/costmodel/i386/costmodel-vect-31.c
    trunk/gcc/testsuite/gcc.dg/vect/costmodel/x86_64/costmodel-vect-31.c
    trunk/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-6.c
    trunk/gcc/testsuite/gcc.dg/vect/slp-25.c
    trunk/gcc/testsuite/gcc.dg/vect/vect-13.c
    trunk/gcc/testsuite/gcc.dg/vect/vect-17.c
    trunk/gcc/testsuite/gcc.dg/vect/vect-18.c
    trunk/gcc/testsuite/gcc.dg/vect/vect-19.c
    trunk/gcc/testsuite/gcc.dg/vect/vect-2.c
    trunk/gcc/testsuite/gcc.dg/vect/vect-20.c
    trunk/gcc/testsuite/gcc.dg/vect/vect-21.c
    trunk/gcc/testsuite/gcc.dg/vect/vect-22.c
    trunk/gcc/testsuite/gcc.dg/vect/vect-27.c
    trunk/gcc/testsuite/gcc.dg/vect/vect-29.c
    trunk/gcc/testsuite/gcc.dg/vect/vect-3.c
    trunk/gcc/testsuite/gcc.dg/vect/vect-31.c
    trunk/gcc/testsuite/gcc.dg/vect/vect-34.c
    trunk/gcc/testsuite/gcc.dg/vect/vect-36.c
    trunk/gcc/testsuite/gcc.dg/vect/vect-4.c
    trunk/gcc/testsuite/gcc.dg/vect/vect-5.c
    trunk/gcc/testsuite/gcc.dg/vect/vect-6.c
    trunk/gcc/testsuite/gcc.dg/vect/vect-64.c
    trunk/gcc/testsuite/gcc.dg/vect/vect-65.c
    trunk/gcc/testsuite/gcc.dg/vect/vect-66.c
    trunk/gcc/testsuite/gcc.dg/vect/vect-68.c
    trunk/gcc/testsuite/gcc.dg/vect/vect-7.c
    trunk/gcc/testsuite/gcc.dg/vect/vect-72.c
    trunk/gcc/testsuite/gcc.dg/vect/vect-73.c
    trunk/gcc/testsuite/gcc.dg/vect/vect-76.c
    trunk/gcc/testsuite/gcc.dg/vect/vect-77.c
    trunk/gcc/testsuite/gcc.dg/vect/vect-78.c
    trunk/gcc/testsuite/gcc.dg/vect/vect-86.c
    trunk/gcc/testsuite/gcc.dg/vect/vect-all.c
    trunk/gcc/testsuite/gcc.dg/vect/vect.exp
    trunk/gcc/testsuite/gcc.dg/vect/wrapv-vect-7.c
    trunk/gcc/testsuite/lib/target-supports.exp
    trunk/gcc/tree-vectorizer.c

Comment 18 Richard Biener 2008-01-25 23:44:29 UTC
Fixed?
Comment 19 dorit 2008-01-28 13:20:12 UTC
> Fixed?

In a way, yes. The problem is avoided by generating too conservative code. AFAIU, a better solution may be expected in 4.4 from the stack alignment branch. In any case this segfault PR can be closed, and instead a missed optimization PR could be opened.
Comment 20 H.J. Lu 2008-02-07 00:04:35 UTC
FYI, stack alignment branch will look like

  if (TREE_STATIC (decl))
    return (alignment <= MAX_OFILE_ALIGNMENT);
  else if (MAX_VECTORIZE_STACK_ALIGNMENT)
    {    
      gcc_assert (!cfun->stack_realign_processed);
      if (alignment <= MAX_VECTORIZE_STACK_ALIGNMENT)
        {
          if (cfun->stack_alignment_needed < alignment)
            cfun->stack_alignment_needed = alignment;
          return true;
        }
      else 
        return false;
    }    
  else 
    return (alignment <= STACK_BOUNDARY); 

MAX_VECTORIZE_STACK_ALIGNMENT is defined as BIGGEST_ALIGNMENT. Do we need
stack alignment larger than BIGGEST_ALIGNMENT for vectorizer?
Comment 21 H.J. Lu 2008-02-07 22:15:33 UTC
The real problem is i386 gcc expects 16 byte stack boundary while the
psABI specifies 4 byte. When you link a callee, which expects incoming
stack at 16 byte boundary, with a caller, which only guarantees 4 byte
stack boundary, you will run into problem. The stack alignment branch can
automatically align the incoming stack. But we have to generate 16byte
stack boundary when calling a function by default since callees compiled by
older gcc expects 16 byte stack boundary.  Since incoming stack is aligned
at 16byte by default, the stack alignment branch still assumes incoming
stack aligned at 16 byte by default. You can override it with -mstackrealign
or force_align_arg_pointer attribute, which will assume incoming stack is
aligned at 4 byte.
Comment 22 H.J. Lu 2009-08-06 16:46:46 UTC
*** Bug 40985 has been marked as a duplicate of this bug. ***
Comment 23 Dzianis Kahanovich 2009-08-10 17:48:14 UTC
(In reply to comment #17)

> URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=129764
> Log:
>         PR tree-optimization/32893
>         * tree-vectorize.c (vect_can_force_dr_alignment_p): Check
>         STACK_BOUNDARY instead of PREFERRED_STACK_BOUNDARY.

In 4.4 STACK_BOUNDARY changed to MAX_STACK_ALIGNMENT in this place.
MAX_STACK_ALIGNMENT is STACK_BOUNDARY "by default", but in i386.h:
#define MAX_STACK_ALIGNMENT MAX_OFILE_ALIGNMENT

I not check code dependences more, but starting from 4.4.0 this error alive (Bug 40985). Are there are regression or something else?

PS May be reopen this bug with other version or Bug 40985 as regression report?