alignment issues for sse

Brian Budge brian.budge@gmail.com
Thu Feb 17 16:35:00 GMT 2005


Hi guys -

I use posix_memalign to allocate memory on the heap... this works
great.  My problem now is that I'm declaring variables on the stack
and they're not being aligned.

So I wrote a little class which will soon (hopefully) be thread-local
(using __thread), and which uses posix_memalign to allocate a
stack of aligned addresses that I can use.
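Roughly, the idea is a per-thread scratch stack carved out of one posix_memalign block. Below is a minimal C sketch of that idea, not the class described above. It assumes a fixed 64 KiB capacity, 16-byte alignment and strict LIFO release. The names aligned_stack, astack_push and astack_pop are hypothetical, and `__thread` is a GCC extension.

```c
#define _POSIX_C_SOURCE 200112L
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical scratch stack: one aligned block, bump pointer on top. */
#define ASTACK_BYTES 65536
#define ASTACK_ALIGN 16

typedef struct {
    unsigned char *base;   /* 16-byte aligned block from posix_memalign */
    size_t top;            /* offset of the next free byte */
} aligned_stack;

/* GCC __thread: each thread gets its own (initially empty) stack. */
static __thread aligned_stack tls_stack;

/* Round n up to a multiple of ASTACK_ALIGN so every push stays aligned. */
static size_t astack_round(size_t n) {
    return (n + ASTACK_ALIGN - 1) & ~(size_t)(ASTACK_ALIGN - 1);
}

/* Lazily allocate the backing block; returns 0 on success. */
static int astack_init(aligned_stack *s) {
    void *p = NULL;
    if (s->base != NULL) return 0;
    if (posix_memalign(&p, ASTACK_ALIGN, ASTACK_BYTES) != 0) return -1;
    s->base = p;
    s->top = 0;
    return 0;
}

/* Reserve n bytes; returns NULL if the block is exhausted. */
static void *astack_push(aligned_stack *s, size_t n) {
    size_t rounded = astack_round(n);
    void *p;
    if (astack_init(s) != 0 || rounded > ASTACK_BYTES - s->top) return NULL;
    p = s->base + s->top;
    s->top += rounded;
    return p;
}

/* Release the most recent n-byte reservation (strict LIFO order). */
static void astack_pop(aligned_stack *s, size_t n) {
    s->top -= astack_round(n);
}
```

A function would call astack_push(&tls_stack, bytes) on entry and astack_pop(&tls_stack, bytes) on exit. That costs one add and one compare per call, instead of a malloc/free pair.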

I am doing this because I'm under the impression that doing a malloc
and free in each call would be rather costly.

If (when!) I get it to work, I'll post it. 

Corey, you talk about advancing the pointer until it is aligned... is
there a trick like that for the stack?
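The same trick does work for a local buffer: declare it a little larger than needed, then round its address up to the next 16-byte boundary. Here is a minimal sketch. It assumes float is 4 bytes with 4-byte alignment, so at most 3 extra floats are needed. The helpers align_up and sum4_on_stack are hypothetical. GCC also accepts `__attribute__((aligned(16)))` on a local, but that may only help if the incoming stack pointer is itself kept 16-byte aligned by every caller. The rounding below does not depend on that.

```c
#include <stdint.h>
#include <string.h>

/* Round p up to the next multiple of align (a power of two). */
static void *align_up(void *p, uintptr_t align) {
    uintptr_t u = (uintptr_t)p;
    return (void *)((u + align - 1) & ~(align - 1));
}

/* Copy four floats into a 16-byte aligned stack window and sum them. */
static float sum4_on_stack(const float in[4]) {
    /* Assumes 4-byte floats: aligning up skips at most 3 elements,
       so 4 + 3 floats always contain an aligned 4-float window. */
    float raw[4 + 3];
    float *v = align_up(raw, 16);
    memcpy(v, in, 4 * sizeof(float));
    /* v could now be handed to aligned SSE loads such as _mm_load_ps. */
    return v[0] + v[1] + v[2] + v[3];
}
```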

  Brian


On Thu, 17 Feb 2005 12:10:47 +0200 (EET), Kimmo Fredriksson
<kfredrik@cs.joensuu.fi> wrote:
> On Wed, 16 Feb 2005, corey taylor wrote:
> 
> > I see the definition, but I've never seen any documentation on them.
> > Can you point to some useful documentation?
> 
> See Intel C++ Compiler User's Guide (copy-pasted):
> 
>   Use the _mm_malloc and _mm_free intrinsics to allocate and free aligned
>   blocks of memory. These intrinsics are based on malloc and free, which
>   are in the libirc.a library. You need to include malloc.h. The syntax for
>   these intrinsics is as follows:
> 
>   void* _mm_malloc (int size, int align)
> 
>   void _mm_free (void *p)
> 
>   The _mm_malloc routine takes an extra parameter, which is the alignment
>   constraint. This constraint must be a power of two. The pointer that is
>   returned from _mm_malloc is guaranteed to be aligned on the specified
>   boundary.
> 
>   Note
> 
>   Memory that is allocated using _mm_malloc must be freed using _mm_free .
>   Calling free on memory allocated with _mm_malloc or calling _mm_free on
>   memory allocated with malloc will cause unpredictable behavior.
> 
> From gcc's version of xmmintrin.h:
> 
> /* Implemented from the specification included in the Intel C++ Compiler
>     User Guide and Reference, version 8.0.  */
> 
> So gcc implements these for icc compatibility.
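For comparison, here is a small usage sketch with GCC's versions. It assumes an x86 target with SSE. On recent GCC releases, xmmintrin.h pulls in _mm_malloc/_mm_free through mm_malloc.h, with the signature `void *_mm_malloc(size_t size, size_t align)`. Whether a particular 2005-era GCC already shipped them is worth checking. The helpers alloc_floats and scale are hypothetical, and scale assumes n is a multiple of 4.

```c
#include <stddef.h>
#include <stdint.h>
#include <xmmintrin.h>  /* SSE intrinsics; also declares _mm_malloc/_mm_free on GCC */

/* Allocate n floats on a 16-byte boundary; release with _mm_free only. */
static float *alloc_floats(size_t n) {
    return _mm_malloc(n * sizeof(float), 16);
}

/* Multiply n floats by k in place, four at a time (n % 4 == 0 assumed). */
static void scale(float *dst, size_t n, float k) {
    __m128 kv = _mm_set1_ps(k);
    for (size_t i = 0; i < n; i += 4) {
        /* _mm_load_ps/_mm_store_ps fault if dst + i is not 16-byte aligned. */
        __m128 x = _mm_load_ps(dst + i);
        _mm_store_ps(dst + i, _mm_mul_ps(x, kv));
    }
}
```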
> 
> K
> 
> >
> > Currently, we develop for many platforms, so portability is better in
> > most instances although we do use MMX and some SSE for speed where
> > available.
> >
> > corey
> >
> >
> > On Thu, 17 Feb 2005 01:51:44 +0200 (EET), Kimmo Fredriksson
> > <kfredrik@cs.joensuu.fi> wrote:
> >> Hi,
> >>
> >> [Disclaimer: I haven't really been following this discussion...]
> >>
> >> On Wed, 16 Feb 2005, corey taylor wrote:
> >>
> >>>  However, after looking into the current public project I'm on, I
> >>> realize that it doesn't use SSE for the allocation.  It simply
> >>> advances to an aligned location and manually forces the alignment,
> >>> hides the actual allocation pointer, and returns the aligned pointer.
> >>
> >> Why not use:
> >>
> >> void * _mm_malloc (size_t size, size_t alignment)
> >> void _mm_free (void * ptr)
> >>
> >> ?
> >>
> >> Defined in xmmintrin.h (I think).
> >>
> >>> On Wed, 16 Feb 2005 17:58:15 +0100, Brian Budge <brian.budge@gmail.com> wrote:
> >>
> >>>> On Wed, 16 Feb 2005 10:46:54 -0600, corey taylor <corey.taylor@gmail.com> wrote:
> >>>>> Implementations I've used and worked on always do aligned allocations
> >>>>> manually.  Typically the hidden and real sizes of the allocation are
> >>>>> stored in the allocation itself, and the returned pointer is advanced
> >>>>> by a few bytes.  The downside is that you must be strict about also
> >>>>> using the matching aligned free routine.
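The manual approach can be sketched in a few lines. Over-allocate with plain malloc, round the pointer up, and hide the original malloc pointer just below the aligned block so the matching free can recover it. This variant stores only the pointer, not the sizes mentioned above. aligned_malloc and aligned_free are hypothetical names, and align is assumed to be a power of two no smaller than sizeof(void *).

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Over-allocate, round up, and stash the raw pointer in the word
   immediately before the aligned address. */
static void *aligned_malloc(size_t size, size_t align) {
    void *raw = malloc(size + align - 1 + sizeof(void *));
    uintptr_t start, aligned;
    if (raw == NULL) return NULL;
    /* Leave room for the hidden pointer, then round up to align. */
    start = (uintptr_t)raw + sizeof(void *);
    aligned = (start + align - 1) & ~(uintptr_t)(align - 1);
    ((void **)aligned)[-1] = raw;
    return (void *)aligned;
}

/* Must be used for (and only for) pointers from aligned_malloc. */
static void aligned_free(void *p) {
    if (p != NULL) free(((void **)p)[-1]);
}
```

Passing an aligned_malloc pointer to plain free (or the reverse) hands free an address malloc never returned. That is exactly the strictness problem described above, and it is the same rule the Intel guide states for _mm_malloc/_mm_free.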
> >>
> >> See above.
> >>
> >>>>> On Wed, 16 Feb 2005 10:09:27 -0600, Eljay Love-Jensen <eljay@adobe.com> wrote:
> >>
> >>>>>>> But surely thousands of people are writing sse code... how do they make
> >>>>>>> it work?
> >>>>>>
> >>>>>> I presume by taking measures to assure the SSE structs are properly
> >>>>>> aligned.
> >>>>>>
> >>>>>>> Do I need to switch to the intel compiler/linker?
> >>>>>>
> >>>>>> I do not know.
> >>
> >> I do not know either, but that was my solution...
> >>
> >> But: my sse code used to work just fine with gcc. Then something happened,
> >> and I just get seg faults. Don't remember exactly anymore, but I think at
> >> the time it actually worked with gcc, I was using some early gcc 3.4
> >> snapshot, since it was the only one that worked. No version before, no
> >> version after (that I have tried, excluding e.g. 4.0)... And of course
> >> there is also the possibility that something else changed, I do/did
> >> something wrong, etc. Anyways, currently I use icc for sse code, and use
> >> _mm_malloc/_mm_free for dynamic allocation; statics are automagically
> >> 16-byte aligned.
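With gcc, the usual way to get the same guarantee for statics, and for structs of SSE data, is to ask for it explicitly. This relies on the GCC `aligned` attribute, which Clang also accepts. The names coeffs, vec4, origin and is_aligned16 are hypothetical, chosen for illustration.

```c
#include <stdint.h>

/* Request 16-byte alignment for a static array explicitly. */
static float coeffs[8] __attribute__((aligned(16))) = {1, 2, 3, 4, 5, 6, 7, 8};

/* On a type, the attribute applies to every object of that type. */
typedef struct {
    float v[4];
} __attribute__((aligned(16))) vec4;

static vec4 origin;  /* statics and globals of type vec4 are 16-byte aligned */

static int is_aligned16(const void *p) {
    return ((uintptr_t)p & 15u) == 0;
}
```

This covers static and global storage. Heap blocks still need posix_memalign or _mm_malloc, since plain malloc only guarantees the platform's default alignment.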
> >>
> >> For other things, I still use mostly gcc.
> >>
> >> K
> >>
> >>
> >
>
