The following test program, when compiled with

g++ -O2 -march=pentium4 -msse

will segfault, at least on the machines I tried it on:
* Pentium III (Katmai)
* Intel(R) Pentium(R) 4 CPU 3.00GHz
* Intel(R) Xeon(TM) CPU 2.40GHz

When allocating fewer vectors, it sometimes works. Oh, and it also fails with Intel's own compiler...

#include <xmmintrin.h>
#include <stdio.h>

class Vector4
{
public:
    Vector4() { vec = _mm_setzero_ps(); }
    __m128 vec;
};

int main(int argc, char **argv)
{
    Vector4 *foo = new Vector4[200*200];
    delete[] foo;
}
Not a gcc bug, as I can reproduce it with the following code, which is equivalent to the code you gave, since new just calls malloc:

#include <xmmintrin.h>
#include <stdio.h>
#include <stdlib.h>

void __attribute__((noinline)) temp()
{
    __m128 *foo = (__m128 *)malloc (200*200*sizeof(__m128));
    for (int i = 0; i < 200*200; i++)
        foo[i] = _mm_setzero_ps();
    free (foo);
}

int main(int argc, char **argv)
{
    temp();
}
So, where is the bug then? glibc? linux kernel?
Nowhere, as neither malloc nor new defines the alignment of the memory it allocates, so you should use posix_memalign (or memalign) instead.
Ok. However, "new" is C++ and not malloc, and it should know what it is about to allocate, I think. I still think this is a gcc bug, as:
* __m128 is an intrinsic, and so gcc should know how it must be aligned
* new, a C++ keyword, should obey those things
Yes, but the C++ standard says the alignment of memory allocated by new is undefined, so this is still not a bug.
Ok, fair enough. Do you have any idea how to safely create an instance of my Vector4 class? Or do I have to "fix" malloc?
You should be using posix_memalign (or memalign) if you need a specific alignment.
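For readers who want to see what that advice looks like in practice, here is a minimal sketch. It assumes a POSIX system, and `Vec4`, `alloc_vec4_array` and `is_aligned` are hypothetical names introduced only for this illustration; `Vec4` stands in for a class holding an `__m128`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdlib.h>   // posix_memalign, free (POSIX)

// Hypothetical stand-in for a type holding an __m128:
// 16 bytes in size, 16-byte alignment requirement.
struct alignas(16) Vec4 { float v[4]; };

// Allocate raw storage for n Vec4 objects on a 16-byte boundary.
// Returns a null pointer on failure; release the result with free().
Vec4 *alloc_vec4_array(std::size_t n) {
    void *p = nullptr;
    if (posix_memalign(&p, alignof(Vec4), n * sizeof(Vec4)) != 0)
        return nullptr;
    return static_cast<Vec4 *>(p);
}

// True if p sits on an `alignment`-byte boundary.
bool is_aligned(const void *p, std::size_t alignment) {
    return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}
```

A caller would use `alloc_vec4_array(200 * 200)`, check the result for null, and hand it back to `free()` when done. Note that this only provides storage; no constructors run.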
Does that mean I cannot create Vector4 instances like this?

Vector4 *foo = new Vector4[200*200];

(I have played around with __malloc_hook and __realloc_hook to make them use posix_memalign, but no success, as malloc doesn't get called when I expect it to...) Or is it just me being stupid and missing the obvious?
How about this case then? Vector4 does not get allocated with new now. Why does this case not work? IMHO, gcc should see: oh, I have something of type __m128, I better align __m128 to 16 when I create an instance of Foo...

#include <xmmintrin.h>

class Vector4
{
public:
    Vector4() { vec = _mm_setzero_ps(); }
    __m128 vec;
};

class Foo
{
public:
    float a;
    Vector4 foo;

    Foo(const Vector4 &v) { foo = v; }
};

int main(int argc, char **argv)
{
    Foo *foo = new Foo(Vector4());
}
Subject: Re: Strange bug / incorrect code generation with SSE

On Jun 3, 2004, at 13:07, ma1flfs at bath dot ac dot uk wrote:
> ------- Additional Comments From ma1flfs at bath dot ac dot uk 2004-06-03 17:07 -------
> How about this case then?
> class Foo
> {
> public:
>     float a;
>     Vector4 foo;
>
>     Foo(const Vector4 &v)
>     {
>         foo = v;
>     }
> };

Because in the struct (class) Foo, the field foo is at offset 16; really it has the same problem as before, though, as Foo does not start at a 16-byte-aligned memory location (address & 0xF == 0). Go read a book about alignment and vector registers.

Thanks,
Andrew Pinski
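The offset argument can be checked at compile time. The sketch below uses plain structs as hypothetical stand-ins (`Vec4` for Vector4, with `alignas(16)` in place of the `__m128` member), and `member_is_aligned` is a helper invented for this illustration. It shows that the member lands on a 16-byte boundary only if the enclosing object does.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-ins for the classes discussed above.
struct alignas(16) Vec4 { float v[4]; };
struct Foo { float a; Vec4 foo; };

// The compiler pads after `a` so that `foo` starts at offset 16,
// and the whole object inherits the 16-byte alignment requirement.
static_assert(alignof(Foo) == 16, "Foo inherits the member's alignment");
static_assert(offsetof(Foo, foo) == 16, "12 bytes of padding follow `a`");
static_assert(sizeof(Foo) == 32, "size is a multiple of the alignment");

// Given the address at which a Foo is placed, report whether its
// `foo` member would be 16-byte aligned.
bool member_is_aligned(std::uintptr_t base) {
    return (base + offsetof(Foo, foo)) % alignof(Vec4) == 0;
}
```

With an allocator that returns 8-byte-aligned storage, a base address such as 8 puts the member at 24, which is misaligned; a base of 32 puts it at 48, which is fine.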
Andrew, though you may be right, gcc's behavior doesn't make much sense. The compiler knows the alignment requirements of such objects, and if it does not satisfy them in any way, then we can't say we support them, since there is no way (except for using stack variables) in which one can write code that actually works. There must be some way, at least when using operator new, to allocate this space with the alignment requirements of such types.

I'd like either Jan or one of the people with knowledge of the interface between C++ and libstdc++'s implementation of operator new to comment on this.

In short, for latecomers: this program segfaults:
----------------
#include <xmmintrin.h>

int main()
{
    __m128 *foo = new __m128;
    *foo = _mm_setzero_ps();
}
----------------
Andrew claims that it is because the pointer returned by malloc doesn't satisfy the alignment criteria for this data type.

W.
Oh, I should have said that indeed on my system, the memory location is 8-byte aligned, not 16-byte aligned. If I do make it 16-byte aligned, for example with a hack like this

char *p = new char[10000];
__m128 *foo = (__m128*)(((int)(p+16))/16*16);

(for which I have no idea whether it is in accordance with aliasing rules), then the program correctly succeeds.

Nevertheless, I believe that gcc should have some way of transmitting alignment requirements when it allocates memory.

W.
Aren't the arguments of new defined by the ABI? If that is true, then I still think this is an invalid bug.
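A note on the hack above: casting the pointer to `int` truncates addresses on 64-bit targets. A more portable sketch of the same rounding trick uses `std::uintptr_t`; the helper name `align_up` is an assumption made for this example, and the caller is expected to over-allocate by at least `alignment - 1` bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Round p up to the next multiple of `alignment`, which must be a
// power of two. Returns p unchanged if it is already aligned.
inline void *align_up(void *p, std::size_t alignment) {
    std::uintptr_t addr = reinterpret_cast<std::uintptr_t>(p);
    std::uintptr_t mask = static_cast<std::uintptr_t>(alignment) - 1;
    return reinterpret_cast<void *>((addr + mask) & ~mask);
}
```

The original pointer still has to be kept around, since that is the one that must eventually be passed to `delete[]` or `free`.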
Yes, the bug is in malloc. From C99 7.20.3, talking about malloc/realloc: The pointer returned if the allocation succeeds is suitably aligned so that it may be assigned to a pointer to any type of object and then used to access such an object or an array of such objects in the space allocated (until the space is explicitly deallocated). The rule for operator new is identical. In this case, the pointer returned by malloc, and thence by operator new, is not suitably aligned for a vector, so the program segfaults.
Then someone definitely has to go bug the glibc people. W.
Disclaimer: I don't know that much about gcc, but I have some ideas anyway. As far as I can see there are two issues:

Issue 1)

__m128 *foo = new __m128;

I don't think there is any excuse for this, it's just plain wrong that it doesn't work. __m128 is an intrinsic gcc is supposed to support. One thing about this intrinsic is that it's 128 bits long; the other, equally important, is that it is aligned to a 16-byte boundary. If using malloc() to allocate that *foo, I would kind of expect failure, as malloc returns void* and thus does not know anything about what it allocates. However, new is a C++ keyword, i.e. part of C++. It should know what the hell it is allocating. In this case we ask it to allocate an object of type __m128. It DOES NOT allocate an object of type __m128. It just allocates something that is long enough.

Issue 2)

class { float a; __m128 b; } Foo;

This is not that hard either, as sizeof(Foo) = 32. It already works if you allocate Foo on the stack. But the same problem as above remains.

If you don't want to "fix" new, how about a C++ extension, so that you can tell new how to allocate objects...? Or failing that, emit a warning, telling the confused user about this IMHO weird behavior.

Cheers,
Florian
Florian, malloc is supposed to know what the strictest alignment requirement is, and DTRT whatever (after all, we use it for 'new int' which, on many systems, must also be aligned). This is not a gcc bug, but a system library bug. It is not sensible for GCC to go down the path of checking that each and every syslib call conforms to the appropriate std.
As for a workaround: you can of course overload operator new, and do a trick like

struct Align16 {} align16;
// overload operator new to take an argument of type Align16
__m128 *p = new(align16) __m128;

W.
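One way the tag trick might be filled in is sketched below. Only `Align16` and `align16` come from the comment; the use of `posix_memalign`, the `Vec4` stand-in for `__m128`, and the helpers `make_vec4` / `destroy_vec4` are assumptions made for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>
#include <stdlib.h>   // posix_memalign, free (POSIX)

struct Align16 {};             // tag type, as in the comment above
const Align16 align16 = {};

// Placement form of operator new selected by the tag argument.
void *operator new(std::size_t size, Align16) {
    void *p = nullptr;
    if (posix_memalign(&p, 16, size) != 0) throw std::bad_alloc();
    return p;
}

// Matching placement delete. The compiler calls it only if a
// constructor throws; it can also be called explicitly.
void operator delete(void *p, Align16) noexcept { free(p); }

// Hypothetical stand-in for __m128.
struct alignas(16) Vec4 { float v[4]; };

Vec4 *make_vec4() { return new (align16) Vec4(); }

// A plain delete-expression would not use the tagged operator delete,
// so destroy and release by hand.
void destroy_vec4(Vec4 *p) {
    if (p) {
        p->~Vec4();
        operator delete(p, align16);
    }
}
```

The asymmetry in `destroy_vec4` is the main cost of this approach: there is no placement form of the delete-expression, so the release path has to be spelled out.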
I have submitted a glibc bug report for this. http://sources.redhat.com/bugzilla/show_bug.cgi?id=206
The glibc people have decided that they will not change malloc to provide the alignment requirements one would need for the case in this PR (see http://sources.redhat.com/bugzilla/show_bug.cgi?id=206):
---------------------------
MALLOC_ALIGNMENT in glibc is really not going to change, it is a quality of implementation, sure, by making it bigger the quality implementation would drop a lot for most of the programs out there. SSE types are certainly out of the scope of the C standard.

GCC allows to create objects with arbitrary alignment, not just __m128, but you can use say: __attribute__((aligned (256))). With the same argumentation, you could request that all malloc memory is 256 bytes aligned (or 4K or whatever you choose).

If you want to make C++ new working on these types, the compiler will simply need to do its part and call some (non-standard) new operator with additional alignment argument, which would in turn call posix_memalign, perhaps guarded with some compiler option which will otherwise result in a compile time error if new is used on types with too big alignment requirement.
--------------------------------
In other words, the ball is back in our field. Anyone got ideas as to what to do? How do other compilers do this? (I just verified that icc 8.0 also segfaults on the small testcase in comment #11, but maybe some compiler is more clever there...)

W.
If the glibc folks are unwilling to support a feature of the CPU, so be it. The compiler can only work around that by pessimizing all allocations. I understand glibc's reluctance to support arbitrary alignments, but alignments imposed by the CPU are different. Those are system requirements, and glibc is a system library, which should support them. BTW, there are many things in glibc that are outside the C standard.
I think their argument is that the compiler knows about the alignment of a type and should make use of this. Now, operator new is in libstdc++ and presumably calls malloc directly, so that's a little more complicated (operator new only gets passed a size, no alignment, so we can't play games with __alignof__), but there may still be ways. My feeling is that glibc should fix this, not the compiler. For the moment we seem to be at an impasse, though :-( W.
Here is a more elegant workaround:

#include <xmmintrin.h>
#include <stdio.h>
#include <stdlib.h>
#include <new>

class Vector4
{
public:
    Vector4() { vec = _mm_setzero_ps(); }
    void *operator new (size_t);
    void *operator new[] (size_t size) { return operator new (size); }
    __m128 vec;
};

// Allocate Vector4 storage with the alignment the type requires.
void *Vector4::operator new (size_t size)
{
    void *p;
    int r = posix_memalign (&p, __alignof (Vector4), size);
    if (r) throw std::bad_alloc ();
    return p;
}

int main(int argc, char **argv)
{
    Vector4 *foo = new Vector4[200*200];
    delete[] foo;
}
Subject: Re: No way to teach operator new anything about alignment requirements

And again, without the extra whitespace (sorry):

#include <xmmintrin.h>
#include <stdio.h>
#include <stdlib.h>
#include <new>

class Vector4
{
public:
    Vector4() { vec = _mm_setzero_ps(); }
    void *operator new (size_t);
    void *operator new[] (size_t size) { return operator new (size); }
    __m128 vec;
};

// Allocate Vector4 storage with the alignment the type requires.
void *Vector4::operator new (size_t size)
{
    void *p;
    int r = posix_memalign (&p, __alignof (Vector4), size);
    if (r) throw std::bad_alloc ();
    return p;
}

int main(int argc, char **argv)
{
    Vector4 *foo = new Vector4[200*200];
    delete[] foo;
}
Subject: Re: No way to teach operator new anything about alignment requirements

It would be possible for the compiler to implement this workaround transparently, as an extension; i.e. if the new'd type has alignment greater than MALLOC_ALIGN, use something like

operator new (size_t size, enum align_tag, size_t align);

i.e. translate

new Vector4[200];

into

new (__gnu_cxx::aligned, __alignof (Vector4)) Vector4[200];

But this would interact badly with overriding the default operator new.

I think that if glibc is unwilling to change the default alignment for malloc, it doesn't make any sense for libstdc++ to change it in operator new. The above workaround stands a good chance of breaking things if implemented transparently.

So I think that the way to fix this problem is for users to implement the workaround, possibly relying on the above operator new signature, to be provided by libstdc++. Vector4::operator new would then be simplified to

void *operator new (size_t size)
{
    return operator new (size, __gnu_cxx::aligned, __alignof (Vector4));
}

Thoughts?

Jason
Well, the workaround is a good idea in general, however, this program still segfaults...

#define _XOPEN_SOURCE 600
#include <xmmintrin.h>
#include <stdio.h>
#include <stdlib.h>

class Vector4
{
public:
    __m128 vec;
    Vector4() { vec = _mm_setzero_ps(); }
    void *operator new (size_t s)
    {
        void *p;
        posix_memalign(&p, 16, s);
        return p;
    }
    void *operator new[] (size_t s) { return operator new (s); }
};

class Foo
{
public:
    Vector4 b;
    Foo() {}
};

int main(int argc, char **argv)
{
    Foo *f = new Foo[200*200];
    delete[] f;
}
Subject: Re: No way to teach operator new anything about alignment requirements

On 8 Jun 2004 18:45:04 -0000, "ma1flfs at bath dot ac dot uk" <gcc-bugzilla@gcc.gnu.org> wrote:
> Well, the workaround is a good idea in general, however, this program still
> segfaults...

Because you also need to override operator new for Foo, and any other type which requires alignment greater than that provided by malloc.

Jason
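One way to reduce the repetition this implies is a small base class that supplies the aligned operators to every type deriving from it. This is only a sketch of that idea, not something proposed in the thread: `AlignedNew` is a hypothetical name, `Vec4` is a stand-in for Vector4, and POSIX `posix_memalign` is assumed. It does not help for types you cannot modify.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>
#include <stdlib.h>   // posix_memalign, free (POSIX)

// Hypothetical mixin: classes deriving from it get class-specific
// operator new/delete that allocate on an Alignment-byte boundary.
template <std::size_t Alignment>
struct AlignedNew {
    static void *operator new(std::size_t size) { return allocate(size); }
    static void *operator new[](std::size_t size) { return allocate(size); }
    static void operator delete(void *p) noexcept { free(p); }
    static void operator delete[](void *p) noexcept { free(p); }

private:
    static void *allocate(std::size_t size) {
        void *p = nullptr;
        if (posix_memalign(&p, Alignment, size) != 0) throw std::bad_alloc();
        return p;
    }
};

// Hypothetical stand-in for Vector4.
struct alignas(16) Vec4 { float v[4]; };

// Foo merely contains a Vec4, so it must opt in itself: operator new
// is looked up in Foo and its bases, not in the types of its members.
struct Foo : AlignedNew<16> {
    Vec4 b;
};
```

With this in place, `new Foo[200*200]` and `delete[]` go through the aligned operators without Foo spelling them out again.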
If we are to provide a workaround, I think that the signature you propose, i.e.

operator new (size_t size, enum align_tag, size_t align);

may not be a good idea, the second and third arguments being integers. This is just way too common and asking for trouble. Why not

struct AlignTag {
    AlignTag (int alignment);
    //...
};
operator new (size_t, AlignTag)

and convert

new Vector4[200];

into

new (__gnu_cxx::AlignTag(__alignof (Vector4))) Vector4[200];

Having something like

template <int> struct Alignment {};
template <int N> operator new (size_t, Alignment<N>);

may also be a neat possibility if we use SFINAE to provide

template <int N>
typename SFINAE<(N<=STD_ALIGNMENT),void*>::type operator new (size_t, Alignment<N>)

and the opposite case as two overloads, and let the compiler pick which one it wants to call. This way we can switch at compile time which allocation function shall be called. But I think I'm getting carried away... ;-)

W.
Subject: Re: No way to teach operator new anything about alignment requirements

On Tuesday 08 June 2004 19:04, jason at redhat dot com wrote:
> Because you also need to override operator new for Foo, and any other type
> which requires alignment greater than that provided by malloc.

That's my point. Doing this would suck very much... And is sometimes not possible. Think template classes, STL or Qt are good examples.
Subject: Re: No way to teach operator new anything about alignment requirements

On 8 Jun 2004 19:25:27 -0000, "bangerth at dealii dot org" <gcc-bugzilla@gcc.gnu.org> wrote:
> If we are to provide a workaround, I think that the signature you
> propose, i.e.
>   operator new (size_t size, enum align_tag, size_t align);
> may not be a good idea, the second and third arguments being integers. This
> is just way too common and asking for trouble.

I don't think it would actually be a problem, since enums are distinct types. But I understand your concern, since they do promote to integers, and I'm not opposed to using a struct instead.

> Having something like
>   template <int> struct Alignment {};
>   template <int N> operator new (size_t, Alignment<N>);

Or

template <typename T>
struct Alignment {
    static const size_t alignment = __alignof (T);
};

template <typename T>
inline void *
operator new (size_t size, Alignment<T> align)
{
    return align_new (size, align.alignment);
}

...

new (Alignment<Vector4>) Vector4[200];

Yes, that seems like a cleaner syntax for explicit placement new.

> may also be a neat possibility if we use SFINAE to provide
>   template <int N>
>   typename SFINAE<(N<=STD_ALIGNMENT),void*>::type operator new (size_t, Alignment<N>)
> and the opposite case as two overloads, and let the compiler pick which
> one it wants to call. This way we can switch at compile time which allocation
> function shall be called. But I think I'm getting carried away... ;-)

I'm not familiar with SFINAE.

Jason
Subject: Re: No way to teach operator new anything about alignment requirements

On 8 Jun 2004 19:27:26 -0000, "ma1flfs at bath dot ac dot uk" <gcc-bugzilla@gcc.gnu.org> wrote:
> On Tuesday 08 June 2004 19:04, jason at redhat dot com wrote:
>> Because you also need to override operator new for Foo, and any other type
>> which requires alignment greater than that provided by malloc.
>
> That's my point. Doing this would suck very much... And is sometimes not
> possible. Think template classes, STL or Qt are good examples.

Feel free to propose an alternative solution.

Jason
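A compilable rendering of the type-parameterised tag might look like the sketch below. `Alignment<T>` and the name `align_new` follow the comment, but the body of `align_new` (a `posix_memalign` wrapper), the placement delete overloads and the `Vec4` stand-in are assumptions added for this illustration. Note that the tag has to be passed as an object, `Alignment<Vec4>()`, rather than as a bare type.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>
#include <stdlib.h>   // posix_memalign, free (POSIX)

// Hypothetical helper named in the comment above; here it simply
// wraps posix_memalign.
inline void *align_new(std::size_t size, std::size_t alignment) {
    // posix_memalign needs a power-of-two multiple of sizeof(void *).
    if (alignment < sizeof(void *)) alignment = sizeof(void *);
    void *p = nullptr;
    if (posix_memalign(&p, alignment, size) != 0) throw std::bad_alloc();
    return p;
}

template <typename T>
struct Alignment {
    static const std::size_t alignment = alignof(T);
};

template <typename T>
inline void *operator new(std::size_t size, Alignment<T>) {
    return align_new(size, Alignment<T>::alignment);
}

template <typename T>
inline void *operator new[](std::size_t size, Alignment<T>) {
    return align_new(size, Alignment<T>::alignment);
}

// Matching placement deletes: used by the compiler if a constructor
// throws, and callable by hand to release the storage.
template <typename T>
inline void operator delete(void *p, Alignment<T>) noexcept { free(p); }

template <typename T>
inline void operator delete[](void *p, Alignment<T>) noexcept { free(p); }

// Hypothetical stand-in for Vector4 (trivially destructible).
struct alignas(16) Vec4 { float v[4]; };
```

Usage would be `Vec4 *a = new (Alignment<Vec4>()) Vec4[200];`, released with `operator delete[](a, Alignment<Vec4>());`. For element types with non-trivial destructors the destructors would have to be run by hand first, and the array form may carry extra bookkeeping, so this sketch sticks to a trivially destructible type.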
Subject: Re: No way to teach operator new anything about alignment requirements

On Tuesday 08 June 2004 20:06, jason at redhat dot com wrote:
> > That's my point. Doing this would suck very much... And is sometimes not
> > possible. Think template classes, STL or Qt are good examples.
>
> Feel free to propose an alternative solution.

Ok, sorry for that, I am just incredibly annoyed by this "feature"... I think I will patch my glibc sometime later, and see if it works, and how severe the penalties really are.

Wolfgang Bangerth wrote:
> In other words, the ball is back in our field. Anyone got ideas as to what
> to do? How do other compilers do this?

I got a friend to compile the program from comment #11 on windoze with visual studio .net. It segfaults as well.
Yeah, I was about to ask what other OSes do. From http://216.239.59.104/search?q=cache:itStYuxiBbQJ:www.ddj.com/dotnetbook/chapters/chapter15.pdf+malloc+alignment+SSE&hl=en&ie=UTF-8 and a bunch of other docs google found me, it seems MSFT is returning 8-byte-aligned objects from malloc like glibc, and there are special interfaces for bigger alignment: _aligned_malloc, _aligned_realloc, and _aligned_free (like posix_memalign in POSIX). On Solaris, for 32-bit programs things are also 8-byte aligned only, even though there is an instruction which requires 64-byte alignment.
The SFINAE ('substitution failure is not an error') thing would have been something along these lines:
---------------------
#include <new>

template <bool C, typename T> struct SFINAE;
template <typename T> struct SFINAE<true,T> { typedef T type; };

template <typename T> struct Alignment {
    static const size_t alignment = __alignof (T);
};

// allocate types with alignment less than or equal to 8 bytes
template <typename T>
typename SFINAE<(Alignment<T>::alignment<=8), void *>::type
operator new (size_t sz, Alignment<T>) {
    return malloc (sz);
}

// allocate types with alignment of more than 8 bytes
template <typename T>
typename SFINAE<(Alignment<T>::alignment>8), void *>::type
operator new (size_t sz, Alignment<T>) {
    return posix_memalign (sz);   // sketch only: the real posix_memalign takes (&ptr, alignment, size)
}
------------------------
Since the SFINAE structure is only defined for a true first argument, substitution of the return type of the two operators only succeeds if the condition is true. Thus,

new(Alignment<double>) double;

calls the first version, while

new(Alignment<__m128>) __m128;

would call the second version. However, this switch could of course also be placed in the code of a (single) function, and be statically optimized away by the compiler, as it knows the alignment at compile time.

The fact that the above code snippet doesn't compile (even if all the functions we call were declared) is a separate matter for which I have just filed a separate PR.

W.
I should note that malloc on Mac OS X always returns 16-byte-aligned memory (for altivec, even though some CPUs it runs on do not have altivec).
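The single-function variant mentioned above could look roughly like this. It is a sketch under stated assumptions: `allocate_for` and `malloc_alignment` are hypothetical names, the 8-byte threshold is taken from the snippet above rather than queried from the C library, and `Vec4` stands in for `__m128`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>
#include <stdlib.h>   // malloc, posix_memalign, free

// Assumed malloc guarantee, taken from the snippet above; a real
// implementation would have to match the C library in use.
const std::size_t malloc_alignment = 8;

// One function instead of two overloads: the condition is a
// compile-time constant, so the optimizer can drop the dead branch.
template <typename T>
void *allocate_for(std::size_t size) {
    void *p = nullptr;
    if (alignof(T) <= malloc_alignment) {
        p = malloc(size);                 // malloc's alignment suffices
    } else if (posix_memalign(&p, alignof(T), size) != 0) {
        p = nullptr;                      // report failure below
    }
    if (p == nullptr) throw std::bad_alloc();
    return p;
}

// Hypothetical stand-in for __m128.
struct alignas(16) Vec4 { float v[4]; };
```

Both branches hand back storage that can be released with `free()`, which keeps the deallocation side uniform.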
*** Bug 19432 has been marked as a duplicate of this bug. ***
Additionally, you can petition ISO/C++ to provide a more elegant solution for you. VxWorks also does 16-byte alignment on ppc (for altivec) as I recall.
Gah: just spent several hours trying to figure out why my malloced __v4sf weren't 16 byte aligned before I stumbled on this thread. Would be nice if the info gcc "Using vector instructions through built-in functions" section contained a big warning about the issue.
*** Bug 38063 has been marked as a duplicate of this bug. ***
Yes, at least the manual should be updated to reflect this non-obvious behavior.

Possible fixes for the programmer:
1) Overload operators new and new[] for a class wrapping the vector datatypes. It works as long as you allocate memory through explicit new; however, it fails for buffers allocated by STL's default allocator.
2) Overload the allocators for a wrapper class and select a specific allocator for STL arrays etc. of that class. Haven't tried that yet.
3) Replace the global ::new and ::new[] by something calling memalign instead of malloc.
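As an illustration of option 2, here is a minimal sketch of a standard-library-compatible allocator built on `posix_memalign`. `AlignedAllocator` and `Vec4` are hypothetical names for this example, and a POSIX system and C++11 or later are assumed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>
#include <stdlib.h>   // posix_memalign, free (POSIX)
#include <vector>

// Minimal allocator handing out storage on an Alignment-byte boundary.
// Alignment must be a power of two and a multiple of sizeof(void *).
template <typename T, std::size_t Alignment = 16>
struct AlignedAllocator {
    using value_type = T;

    AlignedAllocator() noexcept = default;
    template <typename U>
    AlignedAllocator(const AlignedAllocator<U, Alignment> &) noexcept {}

    // Needed explicitly because of the non-type template parameter.
    template <typename U>
    struct rebind { using other = AlignedAllocator<U, Alignment>; };

    T *allocate(std::size_t n) {
        if (n > static_cast<std::size_t>(-1) / sizeof(T))
            throw std::bad_alloc();       // n * sizeof(T) would overflow
        void *p = nullptr;
        if (posix_memalign(&p, Alignment, n * sizeof(T)) != 0)
            throw std::bad_alloc();
        return static_cast<T *>(p);
    }

    void deallocate(T *p, std::size_t) noexcept { free(p); }
};

// All instances are interchangeable: they carry no state.
template <typename T, typename U, std::size_t A>
bool operator==(const AlignedAllocator<T, A> &,
                const AlignedAllocator<U, A> &) noexcept { return true; }
template <typename T, typename U, std::size_t A>
bool operator!=(const AlignedAllocator<T, A> &,
                const AlignedAllocator<U, A> &) noexcept { return false; }

// Hypothetical stand-in for a class wrapping __m128.
struct alignas(16) Vec4 { float v[4]; };
```

A container then opts in through its second template argument, for example `std::vector<Vec4, AlignedAllocator<Vec4> >`. The drawback noted in the thread still applies: every container declaration has to name the allocator.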
You may want to take a look at PR 36159.
*** Bug 38455 has been marked as a duplicate of this bug. ***
*** Bug 69748 has been marked as a duplicate of this bug. ***
Just a quick note on this one: C++17 fixed this issue altogether, with C++20 requiring allocators to handle the alignment as well. So changing this to moved, really. Also, GCC 11 defaults to C++17.
C++17 added support for dynamic allocation of over-aligned types, and requires std::allocator to use it. User-defined allocators are not required to support over-aligned types. Before C++17 it was implementation-defined whether std::allocator supports them. When using GCC they are supported when -faligned-new is used (which is the default for C++17 and can be enabled for earlier modes if needed).
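A small check of that behaviour, written so that it also compiles in earlier language modes: the `Wide` type and both helper functions are hypothetical names for this sketch, and the feature-test macro `__cpp_aligned_new` is used to detect whether aligned new is active.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>

// Hypothetical over-aligned type: stricter than what malloc
// typically guarantees.
struct alignas(64) Wide { float v[16]; };

// True when the compiler implements C++17 aligned new
// (with GCC: -std=c++17 or later, or -faligned-new in earlier modes).
inline bool aligned_new_supported() {
#ifdef __cpp_aligned_new
    return true;
#else
    return false;
#endif
}

// Allocate an array with a plain new-expression and report whether
// the storage landed on an alignof(Wide) boundary.
inline bool plain_new_is_aligned(std::size_t n) {
    Wide *w = new Wide[n];
    bool ok = reinterpret_cast<std::uintptr_t>(w) % alignof(Wide) == 0;
    delete[] w;
    return ok;
}
```

When `aligned_new_supported()` returns true, the plain `new Wide[n]` is expected to be routed through the alignment-aware allocation function, so no class-specific operators or tags are needed. Without it, the result depends on whatever alignment the default operator new happens to return.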