To be 100% compatible with the Intel C++ Compiler and MS VC++, xmmintrin.h should provide the functions void * _mm_malloc(size_t size, size_t alignment) and void _mm_free(void * ptr). However, these are missing. Suggestion: add the attached piece of code to xmmintrin.h (see the attachment that follows).
Created attachment 6754: Contains the code that should be added to xmmintrin.h
Is there a portable way to implement _mm_malloc?
Sure. Allocate ALIGN bytes more than needed, clear the first ALIGN bytes, mark the first byte with 1, and round up to proper alignment. On freeing you can then scan back to the mark byte.
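The marker scheme described above could be sketched roughly as follows. This is not the code from any attachment in this report: the names marker_malloc and marker_free are hypothetical, and the sketch assumes the alignment is a nonzero power of two.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: over-allocate by `align` bytes, zero that prefix,
   put a nonzero marker in its first byte, and return the next aligned
   address after the marker. */
static void *marker_malloc(size_t size, size_t align)
{
    unsigned char *raw;

    /* Assumption of this sketch: align is a nonzero power of two. */
    if (align == 0 || (align & (align - 1)) != 0)
        return NULL;
    if (size > SIZE_MAX - align)
        return NULL;

    raw = malloc(size + align);
    if (raw == NULL)
        return NULL;

    memset(raw, 0, align);   /* clear the padding area */
    raw[0] = 1;              /* mark the start of the real allocation */

    /* Round (raw + 1) up to a multiple of align; the result lies in
       [raw + 1, raw + align], so the marker byte is always below it. */
    return (void *)(((uintptr_t)raw + align) & ~(uintptr_t)(align - 1));
}

/* Hypothetical helper: scan back from (ptr - 1) over the zero padding
   until the marker byte is found, then free from there. */
static void marker_free(void *ptr)
{
    unsigned char *p = ptr;

    if (p == NULL)
        return;
    do {
        --p;
    } while (*p == 0);
    free(p);
}
```

Freeing has to scan back over up to ALIGN - 1 padding bytes, so the cost of this approach grows with the alignment.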
Isn't posix_memalign() portable enough?
No, posix_memalign is not portable enough, as Windows does not provide it.
Created attachment 6760: Implementation following Falk's idea + a small test case. Well, I implemented Falk's idea with the marker. I tested the implementation with a small test case, and so far it's working. IMHO the marker is not a good solution for big alignments (too slow and takes a lot of memory), but since _mm_malloc() is generally used for alignments like 16, it's not a real issue here. For Windows: MS' malloc.h provides _aligned_malloc() and _aligned_free(). Thus we can either go for the marker idea, or use conditional preprocessor paths that select either _aligned_malloc/_aligned_free or posix_memalign/free.
Created attachment 6761: Corrected an evil bug in _mm_free if called with ptr = 0.
Created attachment 6762: OK, fixed more bugs. Fixed corner case: alignment = 0. Fixed _mm_free starting at ptr and not (ptr - 1). Extended the test cases. Now it should be OK (I certainly hope so). Tested the whole thing also on Visual C++ 7.1 in Debug mode to be sure.
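For illustration, the conditional-path alternative might look something like the sketch below. The wrapper names portable_aligned_alloc and portable_aligned_free are hypothetical, and using _WIN32 to select the Windows path is an assumption of this sketch; the underlying calls (_aligned_malloc, _aligned_free, posix_memalign, free) are the ones named above.

```c
/* Ask for the POSIX.1-2001 declarations on non-Windows systems. */
#if !defined(_WIN32) && !defined(_POSIX_C_SOURCE)
#define _POSIX_C_SOURCE 200112L
#endif

#include <stdlib.h>
#if defined(_WIN32)
#include <malloc.h>   /* _aligned_malloc, _aligned_free */
#endif

/* Hypothetical wrapper: returns NULL on failure.
   Assumes align is a nonzero power of two. */
static void *portable_aligned_alloc(size_t size, size_t align)
{
#if defined(_WIN32)
    return _aligned_malloc(size, align);
#else
    void *p = NULL;

    /* posix_memalign requires a power-of-two multiple of sizeof(void *). */
    if (align < sizeof(void *))
        align = sizeof(void *);
    if (posix_memalign(&p, align, size) != 0)
        return NULL;
    return p;
#endif
}

/* Hypothetical wrapper: must be paired with portable_aligned_alloc. */
static void portable_aligned_free(void *ptr)
{
#if defined(_WIN32)
    _aligned_free(ptr);
#else
    free(ptr);
#endif
}
```

The point of pairing the two wrappers is that memory from _aligned_malloc() must be released with _aligned_free(), while memory from posix_memalign() is released with plain free().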
Hi, A version of _aligned_malloc and friends was submitted to the mingw project and declared public domain. I have been using it for some time with no problems. It works on i686 linux and mingw. You may want to look at it for a portable implementation that could be used as mm_malloc. It would also be useful for an overloaded (aligned) new/delete implementation. See the latest file attachments at this link: https://sourceforge.net/tracker/index.php?func=detail&aid=668224&group_id=2435&atid=102435 Danny
I think we should have 2 versions. One uses posix_memalign and the other one doesn't. A target can pick the best one.
> I think we should have 2 versions. One uses posix_memalign and the other one
> doesn't. A target can pick the best one.
Agreed. As for the "other one", I like the _aligned_malloc implementation posted by Danny, even though it uses more memory (32 to 64 bits more) than the marker idea.
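For comparison with the marker idea, here is a sketch of the general technique of storing the original malloc() pointer just below the aligned block, which costs one extra pointer (32 or 64 bits) per allocation. Whether Danny's submission works exactly like this is an assumption; the names header_malloc and header_free are hypothetical, and the alignment is assumed to be a nonzero power of two.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: over-allocate, align, and remember the pointer
   returned by malloc() in the slot just before the aligned address. */
static void *header_malloc(size_t size, size_t align)
{
    void *raw;
    uintptr_t aligned;

    /* Assumption of this sketch: align is a nonzero power of two. */
    if (align == 0 || (align & (align - 1)) != 0)
        return NULL;
    /* Keep the aligned address suitably aligned for the pointer slot. */
    if (align < sizeof(void *))
        align = sizeof(void *);
    if (size > SIZE_MAX - align - sizeof(void *))
        return NULL;

    raw = malloc(size + align + sizeof(void *));
    if (raw == NULL)
        return NULL;

    /* Leave room for one pointer, then round up to a multiple of align. */
    aligned = ((uintptr_t)raw + sizeof(void *) + align - 1)
              & ~(uintptr_t)(align - 1);

    /* Store the original pointer just before the aligned block. */
    ((void **)aligned)[-1] = raw;
    return (void *)aligned;
}

/* Hypothetical helper: read the stored pointer back and free it. */
static void header_free(void *ptr)
{
    if (ptr != NULL)
        free(((void **)ptr)[-1]);
}
```

Unlike the marker scheme, freeing is a single load regardless of the alignment, at the price of the extra pointer slot.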
Created attachment 6787: An incomplete patch. This patch provides the posix_memalign version of <mm_malloc.h>. Someone can fill in the generic version.
Created attachment 6794: An updated patch. I fixed a few minor problems in my previous patch.
Created attachment 6808: Generic i386 _mm_malloc
Created attachment 6812: A complete patch for _mm_malloc/_mm_free. Please give it a try. Can someone please come up with a testcase? Thanks.
An updated patch with testcase is posted at http://gcc.gnu.org/ml/gcc-patches/2004-08/msg00053.html
Subject: Bug 16570

CVSROOT: /cvs/gcc
Module name: gcc
Changes by: hjl@gcc.gnu.org 2004-08-03 19:52:52

Modified files:
    gcc: ChangeLog config.gcc
    gcc/config/i386: xmmintrin.h
    gcc/testsuite: ChangeLog
Added files:
    gcc/config/i386: gmm_malloc.h pmm_malloc.h t-gmm_malloc t-pmm_malloc
    gcc/testsuite/gcc.dg: i386-sse-9.c

Log message:
gcc/
2004-08-03 H.J. Lu <hongjiu.lu@intel.com>
    PR target/16570
    * config.gcc (i[34567]86-*-* | x86_64-*-*): Add i386/t-gmm_malloc to tmake_file.
    (i[34567]86-*-linux*aout* | i[34567]86-*-linux*libc1): Likewise.
    (i[34567]86-*-linux* | x86_64-*-linux*): Add i386/t-pmm_malloc to tmake_file.
    * config/i386/t-gmm_malloc: New file.
    * config/i386/t-pmm_malloc: Likewise.
    * config/i386/xmmintrin.h: Include <mm_malloc.h>.
2004-08-03 H.J. Lu <hongjiu.lu@intel.com>
           Tanguy Fautré <tfautre@pandora.be>
    * config/i386/pmm_malloc.h: New file.
2004-08-03 Danny Smith <dannysmith@users.sourceforge.net>
    * config/i386/gmm_malloc.h: New file.
gcc/testsuite/
2004-08-03 H.J. Lu <hongjiu.lu@intel.com>
    PR target/16570
    * gcc.dg/i386-sse-9.c: New test.

Patches:
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/ChangeLog.diff?cvsroot=gcc&r1=2.4780&r2=2.4781
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/config.gcc.diff?cvsroot=gcc&r1=1.475&r2=1.476
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/config/i386/gmm_malloc.h.diff?cvsroot=gcc&r1=NONE&r2=1.1
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/config/i386/pmm_malloc.h.diff?cvsroot=gcc&r1=NONE&r2=1.1
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/config/i386/t-gmm_malloc.diff?cvsroot=gcc&r1=NONE&r2=1.1
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/config/i386/t-pmm_malloc.diff?cvsroot=gcc&r1=NONE&r2=1.1
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/config/i386/xmmintrin.h.diff?cvsroot=gcc&r1=1.29&r2=1.30
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/testsuite/ChangeLog.diff?cvsroot=gcc&r1=1.4088&r2=1.4089
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/testsuite/gcc.dg/i386-sse-9.c.diff?cvsroot=gcc&r1=NONE&r2=1.1
Fixed.