(In reply to comment #17) > This looks like an unrelated problem - the vectorizer does not perform loop > peeling here so it's not an issue of natural alignment. Lets open a separate PR > for this one, unless there's already one open. In the meantime, would you > please try this patch?: > > Index: tree-vectorizer.c > =================================================================== > *** tree-vectorizer.c (revision 126902) > --- tree-vectorizer.c (working copy) > *************** vect_can_force_dr_alignment_p (tree decl > *** 1527,1533 **** > PREFERRED_STACK_BOUNDARY is honored by all translation units. > However, until someone implements forced stack alignment, SSE > isn't really usable without this. */ > ! return (alignment <= PREFERRED_STACK_BOUNDARY); > } > > > --- 1527,1533 ---- > PREFERRED_STACK_BOUNDARY is honored by all translation units. > However, until someone implements forced stack alignment, SSE > isn't really usable without this. */ > ! return (alignment <= STACK_BOUNDARY); > } > Hey Dorit. With this patch zlib appears to compile successfully. The loop is vectorized with an "alignment of access forced using peeling" note and linked apps no longer segfault. I also tested using Andrew's patch from bug #16660 and always returning true in vect_can_force_dr_alignment_p but it does not fix this error. Let me know if I can provide any other info that would be useful to you.
thanks a lot for checking both patches! > With this patch zlib appears to compile successfully. The loop is > vectorized with an "alignment of access forced using peeling" note and linked > apps no longer segfault. I'd like to try to verify if the problem is indeed related to the STACK_BOUNDARY, or whether this has to do with some weird interplay with the compilation of some other function, possibly after inlining (i.e. something like what we had in PR27770). I'm not sure how to suggest to check that... > I also tested using Andrew's patch from bug #16660 and always returning true in > vect_can_force_dr_alignment_p but it does not fix this error. Andrew, makes sense to you? > Let me know if I can provide any other info that would be useful to you. thanks, I'll think about it...
> Andrew, makes sense to you? I think my patch only checks PREFERRED_STACK_BOUNDARY and not STACK_BOUNDARY which is why it does not work but I have not looked into it at all.
(In reply to comment #2) > > Andrew, makes sense to you? > I think my patch only checks PREFERRED_STACK_BOUNDARY and not STACK_BOUNDARY > which is why it does not work but I have not looked into it at all. I see references in the patch to both PREFERRED_STACK_BOUNDARY and STACK_BOUNDARY. Could you please check which of these needs to be fixed? (cause I think your fix is the more desirable one). (just for the record, the link to the patch in question is here: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=25413#c21)
Also just for the record - the testcase for this PR is here: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=25413#c14
Ryan, I wonder what happens if you force alignment in the source code, like so: unsigned short count[MAXBITS+1] __attribute__ ((__aligned__(16))) ; In this case the vectorizer does not change the alignment of the array. I wonder if the compiler honors the alignment attribute when the user asks for it, rather than the vectorizer.
It looks like zlib compiled w/ -O -msse -ftree-vectorize (built with fedora's rpm package gcc-4.1.2-17) has same problem. In my environment, rpm-4.4.2.1-7.fc8 and seamonkey-1.1.3-6.fc8 segfault like below: Program received signal SIGSEGV, Segmentation fault. 0x003a869d in inflate_table (type=CODES, lens=0x913b5c8, codes=19, table=0x913b5c4, bits=0x913b5ac, work=0x913b848) at inftrees.c:108 108 count[len] = 0;
(In reply to comment #6) > It looks like > zlib compiled w/ -O -msse -ftree-vectorize (built with fedora's rpm package > gcc-4.1.2-17) > has same problem. > In my environment, rpm-4.4.2.1-7.fc8 and seamonkey-1.1.3-6.fc8 segfault like > below: > Program received signal SIGSEGV, Segmentation fault. > 0x003a869d in inflate_table (type=CODES, lens=0x913b5c8, codes=19, > table=0x913b5c4, bits=0x913b5ac, work=0x913b848) at inftrees.c:108 > 108 count[len] = 0; could you please provide a complete (reduced...) testcase that could be used to reproduce this? In the meantime, other things that may help: - could you please try to add "__attribute__ ((__aligned__(16)))" to the definition of count, as suggested in comment 5? - could you please show the relevant generated assembly up to the offending insn? (with and without the attribute aligned)? could you also check (with gdb) what is the address accessed and what is the address of the stack pointer?
I can't confirm the failure (make check from zlib works OK for me), but there is some suspicious code in inflate_fast(). When compiling (to be attached) preprocessed testcase (using -O2 -msse2 -ftree-vectorize on i686), I got: 8052711: 31 d2 xor %edx,%edx 8052713: 83 c2 01 add $0x1,%edx 8052716: f3 0f 6f 41 10 movdqu 0x10(%ecx),%xmm0 805271b: f3 0f 6f 51 20 movdqu 0x20(%ecx),%xmm2 8052720: f3 0f 6f 09 movdqu (%ecx),%xmm1 8052724: 66 0f 7f 40 11 movdqa %xmm0,0x11(%eax) 8052729: 66 0f 7f 48 01 movdqa %xmm1,0x1(%eax) 805272e: 66 0f 7f 50 21 movdqa %xmm2,0x21(%eax) 8052733: 83 c1 30 add $0x30,%ecx 8052736: 83 c0 30 add $0x30,%eax These lines correspond to line 235 of inffast.c: } while (len > 2) { PUP(out) = PUP(from); PUP(out) = PUP(from); PUP(out) = PUP(from); len -= 3; } These lines are preprocessed into: while (len > 2) { *++(out) = *++(from); *++(out) = *++(from); *++(out) = *++(from); len -= 3; }
Created attachment 14226 [details] Preprocessed inffast.c. Compile with -O2 -msse2 -ftree-vectorize
Created attachment 14228 [details] zlib testcase(zlib's inftrees.c)
I hope that following testcase can be of some use to somebody... --cut here-- void __attribute__((noinline)) inflate_fast (unsigned char *from, unsigned char *out, unsigned int len) { while (len > 2) { *++(out) = *++(from); *++(out) = *++(from); *++(out) = *++(from); len -= 3; } return; } int main() { unsigned int len = 13; unsigned char x[13] = "Hello there! ...Hello there! ...Hello there! ..."; unsigned char y[13] = " "; inflate_fast (x, y, len); printf ("%s\n", y); return 0; } --cut here-- This testcase produces movdqa with 1byte offset, but I'm not able to trigger segfault with it, although it produces asm code as in comment #18.
(In reply to comment #10) > Created an attachment (id=14228) [edit] > zlib testcase(zlib's inftrees.c) Sorry, I can't reproduce the segfault with current mainline. Please provide self-contained test that segfaults, or exact instructions how to produce segfault from public accessible sources. example.c (part of zlib testsuite) runs OK with -O2 -msse2 -ftree-vectorize on i686 and x86_64.
(In reply to comment #12) Hmm..., but in my environment, some applications always segfaults at zlib which was built with -O -msse -ftree-vectorize. I'm sorry, but now I recognize that I don't have enough ability to do bug-report. Please ignore me. Sorry for mess that I've caused.
Created attachment 14246 [details] inftrees.c this is the testcase from bug #25413. with -O2 -msse2 -ftree-vectorize, i get this in gcc-4.2.0: inftrees.o: file format elf32-i386 Disassembly of section .text: 00000000 <inflate_table>: 0: 55 push %ebp 1: 89 e5 mov %esp,%ebp 3: 53 push %ebx 4: 83 ec 24 sub $0x24,%esp 7: 8b 5d 0c mov 0xc(%ebp),%ebx a: 8b 4d 10 mov 0x10(%ebp),%ecx d: 66 0f ef c0 pxor %xmm0,%xmm0 11: 66 0f 7f 45 d8 movdqa %xmm0,-0x28(%ebp) 16: 66 0f 7f 45 e8 movdqa %xmm0,-0x18(%ebp) 1b: 85 c9 test %ecx,%ecx 1d: 74 16 je 35 <inflate_table+0x35> 1f: ba 00 00 00 00 mov $0x0,%edx 24: 0f b7 04 53 movzwl (%ebx,%edx,2),%eax 28: 66 83 44 45 d8 01 addw $0x1,-0x28(%ebp,%eax,2) 2e: 83 c2 01 add $0x1,%edx 31: 39 ca cmp %ecx,%edx 33: 75 ef jne 24 <inflate_table+0x24> 35: b8 00 00 00 00 mov $0x0,%eax 3a: 8d 55 d8 lea -0x28(%ebp),%edx 3d: 66 83 7c 42 1e 00 cmpw $0x0,0x1e(%edx,%eax,2) 43: 75 08 jne 4d <inflate_table+0x4d> 45: 83 e8 01 sub $0x1,%eax 48: 83 f8 f1 cmp $0xfffffff1,%eax 4b: 75 f0 jne 3d <inflate_table+0x3d> 4d: 83 c4 24 add $0x24,%esp 50: 5b pop %ebx 51: 5d pop %ebp 52: c3 ret forcing alignment as in comment #5 results in: inftrees-align.o: file format elf32-i386 Disassembly of section .text: 00000000 <inflate_table>: 0: 55 push %ebp 1: 89 e5 mov %esp,%ebp 3: 53 push %ebx 4: 83 ec 24 sub $0x24,%esp 7: 8b 5d 0c mov 0xc(%ebp),%ebx a: 8b 4d 10 mov 0x10(%ebp),%ecx d: b8 01 00 00 00 mov $0x1,%eax 12: 8d 55 d8 lea -0x28(%ebp),%edx 15: 66 c7 44 42 fe 00 00 movw $0x0,-0x2(%edx,%eax,2) 1c: 83 c0 01 add $0x1,%eax 1f: 83 f8 11 cmp $0x11,%eax 22: 75 f1 jne 15 <inflate_table+0x15> 24: 85 c9 test %ecx,%ecx 26: 74 16 je 3e <inflate_table+0x3e> 28: ba 00 00 00 00 mov $0x0,%edx 2d: 0f b7 04 53 movzwl (%ebx,%edx,2),%eax 31: 66 83 44 45 d8 01 addw $0x1,-0x28(%ebp,%eax,2) 37: 83 c2 01 add $0x1,%edx 3a: 39 ca cmp %ecx,%edx 3c: 75 ef jne 2d <inflate_table+0x2d> 3e: b8 00 00 00 00 mov $0x0,%eax 43: 8d 55 d8 lea -0x28(%ebp),%edx 46: 66 83 7c 42 1e 00 cmpw $0x0,0x1e(%edx,%eax,2) 4c: 75 08 jne 56 <inflate_table+0x56> 4e: 83 e8 01 sub $0x1,%eax 51: 83 f8 f1 cmp $0xfffffff1,%eax 54: 75 f0 jne 46 <inflate_table+0x46> 56: 83 c4 24 add $0x24,%esp 59: 5b pop %ebx 5a: 5d pop %ebp 5b: c3 ret there's a gdb log for the segfault in firefox @ http://gcc.gnu.org/bugzilla/attachment.cgi?id=13966 i'll try to find something a little smaller than mozilla that can demonstrate this problem since i still suck at testcases. FWIW, i've been running GCC-4.2 svn with the patch at http://gcc.gnu.org/bugzilla/show_bug.cgi?id=25413#c17 for a couple months now and have built a sizable chunk of our package repository with -ftree-vectorize enabled several times over and have yet to run into any trouble whatsoever.
i should also mention that the zlib testsuite doesn't trigger this bug for some reason. many other applications that link with zlib are also unaffected. some that are include firefox/mozilla/thunderbird/seamonkey/xulrunner, rpm (notably rpm2cpio), openoffice... some have reported that opening the GTK file selector dialog triggers it but i haven't encountered that and suspect it was bug #25413.
Ryan, thanks a lot for the info. FYI, I started a discussion about this here: http://gcc.gnu.org/ml/gcc-patches/2007-10/msg00202.html
Subject: Bug 32893 Author: dorit Date: Tue Oct 30 05:25:10 2007 New Revision: 129764 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=129764 Log: PR tree-optimization/32893 * tree-vectorize.c (vect_can_force_dr_alignment_p): Check STACK_BOUNDARY instead of PREFERRED_STACK_BOUNDARY. Added: trunk/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-6-global.c trunk/gcc/testsuite/gcc.dg/vect/no-section-anchors-vect-31.c trunk/gcc/testsuite/gcc.dg/vect/no-section-anchors-vect-34.c trunk/gcc/testsuite/gcc.dg/vect/no-section-anchors-vect-36.c trunk/gcc/testsuite/gcc.dg/vect/no-section-anchors-vect-64.c trunk/gcc/testsuite/gcc.dg/vect/no-section-anchors-vect-65.c trunk/gcc/testsuite/gcc.dg/vect/no-section-anchors-vect-66.c trunk/gcc/testsuite/gcc.dg/vect/no-section-anchors-vect-68.c trunk/gcc/testsuite/gcc.dg/vect/vect-77-alignchecks.c trunk/gcc/testsuite/gcc.dg/vect/vect-77-global.c trunk/gcc/testsuite/gcc.dg/vect/vect-78-alignchecks.c trunk/gcc/testsuite/gcc.dg/vect/vect-78-global.c Modified: trunk/gcc/ChangeLog trunk/gcc/testsuite/ChangeLog trunk/gcc/testsuite/gcc.dg/vect/costmodel/i386/costmodel-vect-31.c trunk/gcc/testsuite/gcc.dg/vect/costmodel/x86_64/costmodel-vect-31.c trunk/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-6.c trunk/gcc/testsuite/gcc.dg/vect/slp-25.c trunk/gcc/testsuite/gcc.dg/vect/vect-13.c trunk/gcc/testsuite/gcc.dg/vect/vect-17.c trunk/gcc/testsuite/gcc.dg/vect/vect-18.c trunk/gcc/testsuite/gcc.dg/vect/vect-19.c trunk/gcc/testsuite/gcc.dg/vect/vect-2.c trunk/gcc/testsuite/gcc.dg/vect/vect-20.c trunk/gcc/testsuite/gcc.dg/vect/vect-21.c trunk/gcc/testsuite/gcc.dg/vect/vect-22.c trunk/gcc/testsuite/gcc.dg/vect/vect-27.c trunk/gcc/testsuite/gcc.dg/vect/vect-29.c trunk/gcc/testsuite/gcc.dg/vect/vect-3.c trunk/gcc/testsuite/gcc.dg/vect/vect-31.c trunk/gcc/testsuite/gcc.dg/vect/vect-34.c trunk/gcc/testsuite/gcc.dg/vect/vect-36.c trunk/gcc/testsuite/gcc.dg/vect/vect-4.c trunk/gcc/testsuite/gcc.dg/vect/vect-5.c trunk/gcc/testsuite/gcc.dg/vect/vect-6.c trunk/gcc/testsuite/gcc.dg/vect/vect-64.c trunk/gcc/testsuite/gcc.dg/vect/vect-65.c trunk/gcc/testsuite/gcc.dg/vect/vect-66.c trunk/gcc/testsuite/gcc.dg/vect/vect-68.c trunk/gcc/testsuite/gcc.dg/vect/vect-7.c trunk/gcc/testsuite/gcc.dg/vect/vect-72.c trunk/gcc/testsuite/gcc.dg/vect/vect-73.c trunk/gcc/testsuite/gcc.dg/vect/vect-76.c trunk/gcc/testsuite/gcc.dg/vect/vect-77.c trunk/gcc/testsuite/gcc.dg/vect/vect-78.c trunk/gcc/testsuite/gcc.dg/vect/vect-86.c trunk/gcc/testsuite/gcc.dg/vect/vect-all.c trunk/gcc/testsuite/gcc.dg/vect/vect.exp trunk/gcc/testsuite/gcc.dg/vect/wrapv-vect-7.c trunk/gcc/testsuite/lib/target-supports.exp trunk/gcc/tree-vectorizer.c
Fixed?
> Fixed? In a way, yes. The problem is avoided by generating too conservative code. AFAIU, a better solution may be expected in 4.4 from the stack alignment branch. In any case this segfault PR can be closed, and instead a missed optimization PR could be opened.
FYI, stack alignment branch will look like if (TREE_STATIC (decl)) return (alignment <= MAX_OFILE_ALIGNMENT); else if (MAX_VECTORIZE_STACK_ALIGNMENT) { gcc_assert (!cfun->stack_realign_processed); if (alignment <= MAX_VECTORIZE_STACK_ALIGNMENT) { if (cfun->stack_alignment_needed < alignment) cfun->stack_alignment_needed = alignment; return true; } else return false; } else return (alignment <= STACK_BOUNDARY); MAX_VECTORIZE_STACK_ALIGNMENT is defined as BIGGEST_ALIGNMENT. Do we need stack alignment larger than BIGGEST_ALIGNMENT for vectorizer?
The real problem is i386 gcc expects 16 byte stack boundary while the psABI specifies 4 byte. When you link a callee, which expects incoming stack at 16 byte boundary, with a caller, which only guarantees 4 byte stack boundary, you will run into problem. The stack alignment branch can automatically align the incoming stack. But we have to generate 16byte stack boundary when calling a function by default since callees compiled by older gcc expects 16 byte stack boundary. Since incoming stack is aligned at 16byte by default, the stack alignment branch still assumes incoming stack aligned at 16 byte by default. You can override it with -mstackrealign or force_align_arg_pointer attribute, which will assume incoming stack is aligned at 4 byte.
*** Bug 40985 has been marked as a duplicate of this bug. ***
(In reply to comment #17) > URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=129764 > Log: > PR tree-optimization/32893 > * tree-vectorize.c (vect_can_force_dr_alignment_p): Check > STACK_BOUNDARY instead of PREFERRED_STACK_BOUNDARY. In 4.4 STACK_BOUNDARY changed to MAX_STACK_ALIGNMENT in this place. MAX_STACK_ALIGNMENT is STACK_BOUNDARY "by default", but in i386.h: #define MAX_STACK_ALIGNMENT MAX_OFILE_ALIGNMENT I not check code dependences more, but starting from 4.4.0 this error alive (Bug 40985). Are there are regression or something else? PS May be reopen this bug with other version or Bug 40985 as regression report?