The "ZynAddSubFX 2.1.1" C++ program happens to work with gcc 3.3.4, but crashes strangely with gcc 3.4.2 in a specific section of the src/Synth/OscilGen.C code.
Created attachment 7342 [details] 'gcc -v' output
Created attachment 7343 [details] OscilGen.ii erisha@necro ~/bin/ZYNSUCKS-3.4.2/src/Synth $ gcc -O0 -ggdb -Wall -DOS_LINUX -DALSAMIDIIN -DFFTW_VERSION_3 -DASM_F2I_YES `fltk-config --cflags` -DJACKAUDIOOUT `pkg-config --cflags jack` -c -o OscilGen.o OscilGen.C --save-temps
Created attachment 7344 [details] OscilGen.S gcc -O0 -ggdb -Wall -DOS_LINUX -DALSAMIDIIN -DFFTW_VERSION_3 -DASM_F2I_YES `fltk-config --cflags` -DJACKAUDIOOUT `pkg-config --cflags jack` -c -o OscilGen.S OscilGen.C -S
Created attachment 7345 [details] OscilGen.S_listing objdump -S OscilGen.o > OscilGen.S_listing
Created attachment 7346 [details] GDB session
for (i=0;i<OSCIL_SIZE;i++) oscilFFTfreqs[i]=0.0; 25ad: a1 00 00 00 00 mov 0x0,%eax 25b2: 31 ff xor %edi,%edi 25b4: 39 c7 cmp %eax,%edi 25b6: 89 45 80 mov %eax,0xffffff80(%ebp) 25b9: 7d 1c jge 25d7 <_ZN8OscilGen7prepareEv+0x157> 25bb: 8b 55 e0 mov 0xffffffe0(%ebp),%edx 25be: 8b 92 88 05 00 00 mov 0x588(%edx),%edx 25c4: 89 55 84 mov %edx,0xffffff84(%ebp) 25c7: 8b 45 84 mov 0xffffff84(%ebp),%eax 25ca: c7 04 b8 00 00 00 00 movl $0x0,(%eax,%edi,4) 25d1: 47 inc %edi 25d2: 3b 7d 80 cmp 0xffffff80(%ebp),%edi 25d5: 7c f0 jl 25c7 <_ZN8OscilGen7prepareEv+0x147> if (Pcurrentbasefunc==0) {//the sine case 25d7: 8b 55 e0 mov 0xffffffe0(%ebp),%edx for (i=0;i<MAX_AD_HARMONICS;i++){ oscilFFTfreqs[i+1]=-hmag[i]*sin(hphase[i]*(i+1))/2.0; oscilFFTfreqs[OSCIL_SIZE-i-1]=hmag[i]*cos(hphase[i]*(i+1))/2.0; }; } else { for (j=0;j<MAX_AD_HARMONICS;j++){ 25da: 31 f6 xor %esi,%esi 25dc: 80 ba 29 01 00 00 00 cmpb $0x0,0x129(%edx) 25e3: 0f 85 1f 02 00 00 jne 2808 <_ZN8OscilGen7prepareEv+0x388> 25e9: 8b 82 88 05 00 00 mov 0x588(%edx),%eax 25ef: 31 ff xor %edi,%edi 25f1: f3 0f 10 05 00 00 00 movss 0x0,%xmm0 25f8: 00 25f9: 0f 29 45 88 movaps %xmm0,0xffffff88(%ebp) 25fd: 89 45 84 mov %eax,0xffffff84(%ebp) 2600: 89 85 7c ff ff ff mov %eax,0xffffff7c(%ebp) 2606: 8d 77 01 lea 0x1(%edi),%esi 2609: 8b 55 e0 mov 0xffffffe0(%ebp),%edx 260c: 0f 57 c0 xorps %xmm0,%xmm0 260f: f3 0f 2a c6 cvtsi2ss %esi,%xmm0 The problem seems that movaps is writing to memory which is not 16-byte aligned.
Also on mainline: movaps %xmm0, -104(%ebp)
Note the options needed to reproduce this is -O2 -march=athlon-xp.
Confirmed reduced testcase: extern double sin (double __x) throw (); float hmag[128],hphase[128]; void prepare(float *oscilFFTfreqs) { int i; for (i=0;i<128;i++) *oscilFFTfreqs=-hmag[i]*sin(hphase[i])/2.0; }
And this is a regression. Only -O2 -msse is needed to reproduce the problem.
Related to bug 17962.
I have a very similar problem in my program. It segfaults in this instruction: 0x0815935d <vector+203>: movapd %xmm0,(%eax) where I have eax 0xbfffd874 -1073751948 and (gdb) p ((unsigned int)0xbfffd874) $1 = 3221215348 (gdb) p ((unsigned int)0xbfffd874) % 16 $2 = 4 So indeed movapd isn't writing to 16-byte aligned storage. W.
possibly related to http://www.cygwin.com/ml/cygwin/2004-04/msg01103.html
Uros, Jan, can you have a look at this bug? It's about an unaligned XMM operation to the stack. There is a small testcase and it is a pretty serious regression.
Testcase from comment #9 and '-O2 -msse -fomit-frame-pointer' procduces code that seems OK: prepare: pushl %ebp pushl %edi movl $hmag+512, %edi pushl %esi movl $hphase, %esi pushl %ebx movl $hmag, %ebx subl $76, %esp movss .LC0, %xmm0 movl 96(%esp), %ebp movaps %xmm0, 32(%esp) .p2align 4,,15 .L2: flds (%ebx) addl $4, %ebx fchs flds (%esi) addl $4, %esi fstpl (%esp) fstpl 16(%esp) call sin fldl 16(%esp) cmpl %ebx, %edi fmulp %st, %st(1) fmull .LC2 fstps (%ebp) jne .L2 addl $76, %esp popl %ebx popl %esi popl %edi popl %ebp ret BTW: when alter_reg() for reg 81 is called from reload() (at line ~836, reload1.c) it gets right offset (-32). [This offset is calculated at line ~1969 in reload1.c]. Something changes this correct offset from (-32) to (-56) if frame pointer is not omitted.
So, Eric, it looks like that adding -fomit-frame-pointer to your Makefile is a good workaround for the time being without losing any advanced optimization. You were looking for one, IIRC.
-fomit-frame-pointer has no effect. thanks anyways G. -E
i tested -mno-sse and it is a functional good workaround. also seems to override '-march=athlon-xp' which in this case is a good thing. -E
The problem here is triggered in reload() function around line 950, this part (#ifdef'd part was added by me:): for (i = FIRST_PSEUDO_REGISTER; i < max_regno; i++) if (reg_renumber[i] < 0 && reg_equiv_memory_loc[i]) { rtx x = eliminate_regs (reg_equiv_memory_loc[i], 0, NULL_RTX); #ifdef DEBUG debug_rtx(reg_equiv_memory_loc[i]); debug_rtx(x); #endif if (strict_memory_address_p (GET_MODE (regno_reg_rtx[i]), XEXP (x, 0))) reg_equiv_mem[i] = x, reg_equiv_address[i] = 0; ... For the testcase from comment #9 (converted to plain c), 'gcc -O2 -msse' will produce relevant debug information: IN: (mem:V4SF (plus:SI (reg/f:SI 20 frame) (const_int -32 [0xffffffe0])) [6 S16 A8]) OUT: (mem:V4SF (plus:SI (reg/f:SI 6 bp) (const_int -56 [0xffffffc8])) [6 S16 A8]) So, the problem is inside eliminate_regs() function, that unaligns otherwise aligned address. This unaligned address is passed down and somewhere around line 1214, following code will be triggered: for (i = FIRST_PSEUDO_REGISTER; i < max_regno; i++) { rtx addr = 0; if (reg_equiv_mem[i]) addr = XEXP (reg_equiv_mem[i], 0); ... and this addr is used as new (unaligned) address on stack. To further analyze this issue: following RTX is passed to eliminate_regs(): (mem:V4SF (plus:SI (reg/f:SI 20 frame) (const_int -32 [0xffffffe0])) [6 S16 A8]) After getting through MEM: case, function recurses to PLUS: case, where following RTX is processed: plus:SI (reg/f:SI 20 frame) (const_int -32 [0xffffffe0])) and this code is triggered: ... else return gen_rtx_PLUS (Pmode, ep->to_rtx, plus_constant (XEXP (x, 1), ep->previous_offset)); where ep->previous_offset (when substituting frame pointer with ebp) equals (-24). And the resulting sum is then -56. I'm a little lost here, what previous_offset field represents, perhaps someone with more knowledge could find, if magic number (-24) is OK [it is not!]. BTW: the testcase from comment #9 when -fomit-frame-pointer is added to compilation flags produces correctly aligned address, because ep->previous_offset, when substituting frame pointer with esp, equals to 64. -32 + 64 = 32 in this case. Regarding comment #17: Perhaps original testcase still uses ebp, even with '-fomit-frame-pointer'. Uros.
These PRs are all duplicates: target/12902: Invalid assembly generated when using SSE / xmmintrin.h target/14776: -mfpmath=sse causes movapd from non-16-byte aligned address fortran/17930: -mfpmath=sse creates illegal code (movapd with misaligned argument) and perhaps: target/10395: [x86] vector types are incorrectly aligned causing crash in multi-threaded apps or when using in main [this bug has 10 dupes!] Uros.
Subject: Re: [3.4/4.0 Regression] unaligned xmm movaps on the stack with -O2 -msse because of the frame pointer uros at kss-loka dot si wrote: > target/12902: Invalid assembly generated when using SSE / xmmintrin.h > target/14776: -mfpmath=sse causes movapd from non-16-byte aligned > address fortran/17930: -mfpmath=sse creates illegal code (movapd with > misaligned argument) > > and perhaps: > target/10395: [x86] vector types are incorrectly aligned causing > crash in multi-threaded apps or when using in main > [this bug has 10 dupes!] Uros, thanks for the analysys. I have just given you edit access to Bugzilla, would you please close the dupes as such yourself, but making sure that you bring over any attacched comments/patches/testcases that you think might matter? Thanks, Giovanni Bajo
Postponed until GCC 3.4.4.
Reconfirmed with today's mainline. This bug has about 10 dups...
Could this patch help with unaligned stack: http://gcc.gnu.org/ml/gcc-patches/2003-01/msg00019.html
I have been completely unable to reproduce an unaligned stack, with either 3.4 or 4.0. Can those gcc folk that claim to have reproduced it in the past still do so? If so, what are you trying? I can say that the proximate cause of the existance of the SSE instruction in the first place is a test in negsf2 using TARGET_SSE that ought to be using TARGET_SSE_MATH. I can say that the reason reload fails to rematerialize the vector constant (and spills it on the stack) is due to the movss split firing before reload and removing the obviousness of the vector constant. For 3.4 I can make two minor tweaks that will prevent the manifestation of this bug, but leave -mfpmath=sse less than correct. For 4.0 I will attempt to fix that as well.
Richard, I have added a small testcase in f77 to PR 14776. I'll try to come up with something that can be used for mainline as well, but for now this is at least some kind of testcase for you to work with. Regards Wolfgang
I have something else (although of a quality that I don't trust myself) in C in PR 12902. W.
Subject: Bug 17990 CVSROOT: /cvs/gcc Module name: gcc Branch: gcc-3_4-branch Changes by: rth@gcc.gnu.org 2004-12-14 01:57:57 Modified files: gcc : ChangeLog gcc/config/i386: i386.md Log message: PR target/17990 * config/i386/i386.md (negsf2): Fix condition for using sse. (negdf2, abssf2, absdf2): Likewise. (negsf2_if, abssf2_if): Don't disable if sse enabled. (movv4sf_internal splitter): Postpone til after reload. (movv2di_internal splitter): Likewise. Patches: http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/ChangeLog.diff?cvsroot=gcc&only_with_tag=gcc-3_4-branch&r1=2.2326.2.731&r2=2.2326.2.732 http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/config/i386/i386.md.diff?cvsroot=gcc&only_with_tag=gcc-3_4-branch&r1=1.502.2.10&r2=1.502.2.11
(In reply to comment #25) > I have been completely unable to reproduce an unaligned stack, with either 3.4 > or 4.0. Can those gcc folk that claim to have reproduced it in the past still > do so? If so, what are you trying? Testcase 1 (without threads): --cut here-- float hmag[128],hphase[128]; void prepare(float *oscilFFTfreqs) { int i; for (i=0;i<128;i++) *oscilFFTfreqs=-hmag[i]*sin(hphase[i])/2.0; } --cut here-- Compile it with 'gcc -O2 msse2': ... prepare: pushl %ebp movl %esp, %ebp pushl %esi pushl %ebx xorl %ebx, %ebx subl $64, %esp movss .LC0, %xmm0 movl 8(%ebp), %esi movaps %xmm0, -40(%ebp) <- here, load from unaligned stack .p2align 4,,15 .L2: ... Testcase 2: (with threads): --cut here-- /* *** start ***/ #include <pthread.h> #include <stdio.h> #include <assert.h> #include <xmmintrin.h> #include <mmintrin.h> #ifdef __ICC #include <emmintrin.h> #endif void * f(void *p) { int x = (p == NULL) ? 0 : * (int *) p; __m128i s; printf("&x = %p &s= %p\n", &x, &s); return NULL; } int main(int argc, char ** argv) { pthread_t th; f(& argc); assert(pthread_create(& th, NULL, f, &argc)==0); assert(pthread_join(th, NULL)==0); return 0; } /* ***end ***/ gcc -O2 -msse2 -lpthread: ./a.out &x = 0xbffff9cc &s= 0xbffff9b0 &x = 0x4085fac8 &s= 0x4085faac <- here, s is not aligned correctly It is a question whether ifor pthreads a stack alignment code should be present in the function (that would be more robust), or it is a libc's job to provide correctly alligned stack. I think that summary of this bugreport is not correct now.
I was too fast, testcase 1 from comment #29 is OK, it doesn't segfault anymore.
Eric, can you doublecheck that the bug is gone for you too, with and without - mfpmath=sse, with a today snapshot of 3.4 or 4.0?
In response to comment #30: it is libpthread's responsibility to align the stack of subthreads properly. It doesn't do that, however, but I believe that we have another PR for that (this is something that we can't really do anything about, however). If Uros' first testcase now works, then I think we have fixed the part that we could fix, so I'm closing the PR. Eric, if there are more problems, please open another PR and have a link in it to this PR. Thanks Wolfgang
Subject: Re: [3.4/4.0 Regression] sse used for negate without -mfpmath=sse > > ------- Additional Comments From giovannibajo at libero dot it 2004-12-14 12:38 ------- > Eric, can you doublecheck that the bug is gone for you too, with and without - > mfpmath=sse, with a today snapshot of 3.4 or 4.0? Actually -mfpmath is about the floating point operations that might change results depending on unit (ie whether you use extended precision or not). Negation is safe operation in this context so I would not say that this is a bug. Perhaps documentation needs to be clarified? Honza > > -- > > > http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17990 > > ------- You are receiving this mail because: ------- > You are on the CC list for the bug, or are watching someone who is.
Subject: Bug 17990 CVSROOT: /cvs/gcc Module name: gcc Changes by: rth@gcc.gnu.org 2004-12-14 22:45:29 Modified files: gcc : ChangeLog gcc/config/i386: i386-protos.h i386.c i386.h i386.md predicates.md Log message: PR target/17990 * config/i386/i386.c (x86_use_bt): New. (ix86_expand_unary_operator): Use MEM_P. (ix86_expand_fp_absneg_operator): New. * config/i386/i386.h (x86_use_bt): Declare. (TARGET_USE_BT): New. * config/i386/i386-protos.h: Update. * config/i386/i386.md (negsf2): Use ix86_expand_fp_absneg_operator. (negdf2, negxf2, abssf2, absdf2, absxf2): Likewise. (negsf2_memory, negsf2_ifs, negsf2_if, negdf2_memory, negdf2_ifs, negdf2_ifs_rex64, negdf2_if, negdf2_if_rex64, negxf2_if, abssf2_memory, abssf2_ifs, abssf2_if, absdf2_memory, absdf2_ifs, absdf2_ifs_rex64, absdf2_if, absxf2_if): Remove. (absnegsf2_mixed, absnegsf2_sse, absnegsf2_i387, absnegdf2_mixed, absnegdf2_sse, absnegdf2_i387, absnegxf2_i387): New. Merge all neg and abs splitters. Handle DFmode in general regs in 64-bit mode. (negextendsfdf2, absextendsfdf2): Disable for non-mixed sse math. (btsq, btrq, btcq): New. Add peepholes as well. (movv4sf_internal splitter): Postpone til after reload. (movv2di_internal splitter): Likewise. * config/i386/predicates.md (const_0_to_63_operand): New. (absneg_operator): New. Patches: http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/ChangeLog.diff?cvsroot=gcc&r1=2.6829&r2=2.6830 http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/config/i386/i386-protos.h.diff?cvsroot=gcc&r1=1.121&r2=1.122 http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/config/i386/i386.c.diff?cvsroot=gcc&r1=1.752&r2=1.753 http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/config/i386/i386.h.diff?cvsroot=gcc&r1=1.406&r2=1.407 http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/config/i386/i386.md.diff?cvsroot=gcc&r1=1.573&r2=1.574 http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/config/i386/predicates.md.diff?cvsroot=gcc&r1=1.8&r2=1.9
Just FYI: first testcase from comment #29 shows the problem with missing i386 addressing mode. First leal is not needed as array can be addressed directly with flds. A PR 18643 is opened for this bug. .L2: leal 0(,%edx,4), %eax flds hphase(%eax) fsin flds hmag(%eax) fchs leal 1(%edx), %eax cmpl $128, %eax movl %eax, %edx fmulp %st, %st(1) fmul %st(1), %st fstps (%ecx) jne .L2