hi all, (first of all, sorry for this unprofessional bug report) compiling my application with gcc-4.3 with -O2 -ftree-vectorize, it segfaults. i haven't been able to write a stripped-down test case, but here are the information, that i gathered: the constructor of the main data structure of my application contains: 0xb5eb02bd <Environment+205>: movdqa %xmm0,0x1990(%edi) 0xb5eb02c5 <Environment+213>: movdqa %xmm0,0x19a0(%edi) 0xb5eb02cd <Environment+221>: movdqa %xmm0,0x19b0(%edi) 0xb5eb02d5 <Environment+229>: movdqa %xmm0,0x19c0(%edi) 0xb5eb02dd <Environment+237>: movdqa %xmm0,0x19d0(%edi) 0xb5eb02e5 <Environment+245>: movdqa %xmm0,0x19e0(%edi) 0xb5eb02ed <Environment+253>: movdqa %xmm0,0x19f0(%edi) 0xb5eb02f5 <Environment+261>: movdqa %xmm0,0x1a00(%edi) where %edi contains the this pointer to the class. the problem is, that the address, that Environment+205 tries to load seems not to be guarrantied to be aligned to a 16 byte boundary. in my debugging session, it pointed to 0x8369f58, 0x8369f58+0x1990 is not aligned as required by the movdqa instruction, though ... i am using gcc-4.3: Using built-in specs. Target: i486-linux-gnu Configured with: ../src/configure linux gnu Thread model: posix gcc version 4.3.1 20080401 (prerelease) (Debian 4.3.0-3) the command line options are: -g -O3 -march=core2 -mfpmath=sse -msse -ftemplate-depth-4096 -Wnon-virtual-dtor -fPIC unfortunately i haven't been able to construct a smaller test-case ... gcc-4.2 works fine for me ... the preprocessed source file is attached
Created attachment 15533 [details] preprocessed source file preprocessed source file
Hello, I've tried to complile the attached kernel_build.ii on my SUSE SLES 10 x86_64 machine, but got a bunch of compile errors like this: In file included from /usr/lib/gcc/i486-linux-gnu/4.3.1/include/xmmintrin.h:40, from source/dsp/simd_sse.hpp:24, from source/dsp/arithmetic_wrapper.hpp:32, from source/kernel/audio_backend/audio_backend.hpp:35, from source/kernel/scheduler.hpp:33, from source/kernel/class_system/classloader.hpp:33, from source/kernel/class_system/class.hpp:33, from source/kernel/object_system/gobj.hpp:31, from source/nova_includes.hpp:31, from source/nova.hpp:30, from /home/tim/workspace/nova/source/kernel/audio_backend/audio_backend.cpp:27, from /home/tim/workspace/nova/release/kernel/audio_backend/build.cpp:2, from release/kernel/kernel_build.cpp:2: /usr/lib/gcc/i486-linux-gnu/4.3.1/include/mmintrin.h: In function вЂint __vector__ _mm_add_si64(int __vector__, int __vector__)’: /usr/lib/gcc/i486-linux-gnu/4.3.1/include/mmintrin.h:312: error: cannot convert вЂlong long int’ to вЂlong long int __vector__’ for argument вЂ1’ to вЂlong long int __vector__ __builtin_ia32_paddq(long long int __vector__, long long int __vector__)’ Anyway. I see that recently were fixed some problems related to stack alignment like PR35496. Can you try to compile your example with latest gcc 4.4 compiler from mainline?
have you tried to compile with -march=core2 -mfpmath=sse -msse? i guess, that is required to compile the preprocessed source file correctly ... i will try gcc-4.4, when i find the time to compile it ...
> have you tried to compile with -march=core2 -mfpmath=sse -msse? Yes, I've compiled it as following: % g++ -g -O3 -march=core2 -mfpmath=sse -msse -ftemplate-depth-4096 -Wnon-virtual-dtor -fPIC kernel_build.ii
odd, it compiled fine for me: tim@laptop:~/workspace/nova$ g++-4.3 -g -O3 -march=core2 -mfpmath=sse -msse -ftemplate-depth-4096 -Wnon-virtual-dtor -fPIC kernel_build.ii -c tim@laptop:~/workspace/nova$ tim@laptop:~/workspace/nova$ g++-4.3 -v Using built-in specs. Target: i486-linux-gnu Configured with: ../src/configure linux gnu Thread model: posix gcc version 4.3.1 20080401 (prerelease) (Debian 4.3.0-3)
(In reply to comment #4) > > have you tried to compile with -march=core2 -mfpmath=sse -msse? > Yes, I've compiled it as following: > % g++ -g -O3 -march=core2 -mfpmath=sse -msse -ftemplate-depth-4096 > -Wnon-virtual-dtor -fPIC kernel_build.ii -m32 ?
> -m32? Better, but a bunch of 45 errors like below remained. % g++ -g -O3 -march=core2 -mfpmath=sse -msse -ftemplate-depth-4096 -Wnon-virtual-dtor -fPIC kernel_build.ii > log 2>&1 /usr/lib/gcc/i486-linux-gnu/4.3.1/include/mmintrin.h: In function вЂint __vector__ _mm_add_si64(int __vector__, int __vector__)’: /usr/lib/gcc/i486-linux-gnu/4.3.1/include/mmintrin.h:312: error: cannot convert вЂlong long int’ to вЂlong long int __vector__’ for argument вЂ1’ to вЂlong long int __vector__ __builtin_ia32_paddq(long long int __vector__, long long int __vector__)’
(In reply to comment #6) > (In reply to comment #4) > > > have you tried to compile with -march=core2 -mfpmath=sse -msse? > > Yes, I've compiled it as following: > > % g++ -g -O3 -march=core2 -mfpmath=sse -msse -ftemplate-depth-4096 > > -Wnon-virtual-dtor -fPIC kernel_build.ii > -m32 ? -m32 doesn't work. You have to use 4.3.0 release branch. Recent mainline change of ia32 intrinsic conflict with 4.3.0 header files. I'm using 4.3.0. Compilation passes but I still got link errors like: /tmp/ccfJXXcV.o:(.rodata._ZTVN9portaudio20MemFunCallbackStreamIN4nova16PortAudioBackendEEE[vtable for portaudio::MemFunCallbackStream<nova::PortAudioBackend>]+0x10): undefined reference to `portaudio::Stream::close()' /tmp/ccfJXXcV.o:(.rodata._ZTIN9portaudio20MemFunCallbackStreamIN4nova16PortAudioBackendEEE[typeinfo for portaudio::MemFunCallbackStream
(In reply to comment #8) > -m32 doesn't work. You have to use 4.3.0 release branch. Recent mainline change Correction: -m32 is a must, but doesn't fix all. Options I'm using: g++ -g -O3 -march=core2 -mfpmath=sse -msse -ftemplate-depth-4096 -Wnon-virtual-dtor -m32
(In reply to comment #7) > > -m32? > > Better, but a bunch of 45 errors like below remained. > > % g++ -g -O3 -march=core2 -mfpmath=sse -msse -ftemplate-depth-4096 > -Wnon-virtual-dtor -fPIC kernel_build.ii > log 2>&1 > > /usr/lib/gcc/i486-linux-gnu/4.3.1/include/mmintrin.h: In function > вЂint __vector__ _mm_add_si64(int __vector__, int > __vector__)’: > /usr/lib/gcc/i486-linux-gnu/4.3.1/include/mmintrin.h:312: error: cannot convert > вЂlong long int’ to вЂlong long int > __vector__’ for argument вЂ1’ to > вЂlong long int __vector__ __builtin_ia32_paddq(long long int > __vector__, long long int __vector__)’ Just remove these functions (64bit MMX arithmetic and MMX shift builtins) from the file. You are compiling with 4.4 that has changed these intrinsic functions (and relevant builtins). The mismatch is created since this preprocessed source includes old (4.3.1) headers. Anyway, I can't check if 4.4 generates correct code since the test isn't executable. For a runtime problems, executable test is needed, and since the failure happens in Environment function, only this function should be present in the executable.
Tim, Since it doesn't link, I can only check the .s file. There are a couple of constructor called Environment, which one is the problemetic function? grep Environment kernel_build.s|grep glob ... .globl _ZN4nova11EnvironmentD1Ev .globl _ZN4nova11EnvironmentD2Ev .globl _ZN4nova11EnvironmentC1Ev
> Since it doesn't link, I can only check the .s file. There are a couple of > constructor called Environment, which one is the problemetic function? sorry for not providing a stripped down test case ... the specific constructor is the default constructor: _ZN4nova11EnvironmentC1Ev in my version of the assembler file, the specific line is: movdqa %xmm0, 6544(%esi) i hope, this is of any help ... thanks, tim
It is helpful. Root cause is that memory allocated by new is only aligned to 8 bytes under i386. In your case, object Environment is allocated by new and its constructor tried to use movdqa to initialize its members. Following small case shows the problem: /* Compile with option -m32 -msse2 Current behavior: runtime segment fault */ #include <stdio.h> #include <emmintrin.h> struct A { public: __m128i m; void init() { m = _mm_setzero_si128(); } }; int main() { A * a = new A; printf("Address of A: %p\n", a); a->init(); delete a; return 0; }
HJ, AVX will have the similar problem on x86_64, whose new only returns object aligned at 16 bytes. Dynamically allocated __m256 won't be guaranteed at 32 bytes boundary.
See http://gcc.gnu.org/ml/gcc/2006-10/msg00166.html and a couple others about a way to have an aligned operator new.
malloc has the same issue really. And from what I heard last time, glibc does not want to change malloc to return the alignment.
Comment #c16 doesn't make sense. Of course malloc(3) can't be changed to return alignment, that would break all programs out there, violate many standards, etc. There are posix_memalign and memalign, which work just fine.
hm, if that code is broken, what about the following one on x86_64 (__sync_bool_compare_and_swap_16 requires an alignment of 16 byte)? struct __attribute__((aligned(16))) foo_t { foo_t(int a = 0, int b = 0): a_(a), b_(b) {} void CAS (foo_t const & rhs) { typedef int TItype __attribute__ ((mode (TI))); union cu { long c[2]; TItype l; }; cu old; old.c[0] = (long)a_; old.c[1] = (long)b_; cu nw; nw.c[0] = (long)rhs.a_; nw.c[1] = (long)rhs.b_; __sync_bool_compare_and_swap_16(reinterpret_cast<volatile foo_t*>(this), old.l, nw.l); } int a_, b_; }; int main() { foo_t * f = new foo_t(); f->CAS(foo_t(1,2)); } thanks, tim
(In reply to comment #17) > Comment #c16 doesn't make sense. Of course malloc(3) can't be changed to > return > alignment, that would break all programs out there, violate many standards, > etc. Right now malloc violates the C standard with respect of alignment. I am not saying we should add an alignment argument to malloc but if the standard there is an alignment on the returned value, likewise for operator new. -- Pinski
It is a bug in testcase. But g++ doesn't diagnose it properly. I opened PR 36159 for it.
Created attachment 17293 [details] precompiled source The same issue with GCC 4.3.3 (i686-pc-linux-gnu). C source: http://svn.savannah.gnu.org/viewvc/*checkout*/trunk/kqemu.c?root=qemu&revision=6338&content-type=text%2Fplain cc1 -O3 -march=pentium4 kqemu.c Offending asm code: kqemu_cpu_exec: ... movl 8(%ebp), %ebx ... .L96: movdqa (%ebx), %xmm0 // segmentation fault
(In reply to comment #21) Sorry, it seems it's because of malloc(), not a GCC bug...