Summary: | Alignment issues with real(16) on i686 | ||
---|---|---|---|
Product: | gcc | Reporter: | Jakub Jelinek <jakub> |
Component: | fortran | Assignee: | Not yet assigned to anyone <unassigned> |
Status: | NEW --- | ||
Severity: | normal | CC: | guillaume, Joost.VandeVondele, jvdelisle2, thatcadguy |
Priority: | P3 | Keywords: | wrong-code |
Version: | 4.8.0 | ||
Target Milestone: | --- | ||
Host: | Target: | ||
Build: | Known to work: | ||
Known to fail: | Last reconfirmed: | 2013-01-09 00:00:00 |
Description
Jakub Jelinek
2013-01-09 11:47:15 UTC
Or inline the re-alignment code (and allocate one more element). That will allow for the most optimization opportunities. posix_memalign unfortunately has a weird enough interface that we can't easily extract alignment for the pointer. (In reply to comment #1) > posix_memalign unfortunately has a weird enough interface that we can't > easily extract alignment for the pointer. Is it just because it outputs through a pointer? Are aligned_alloc, memalign and others easier then? (In reply to comment #2) > (In reply to comment #1) > > posix_memalign unfortunately has a weird enough interface that we can't > > easily extract alignment for the pointer. > > Is it just because it outputs through a pointer? Are aligned_alloc, memalign > and others easier then? Yes. (In reply to comment #3) > (In reply to comment #2) > > (In reply to comment #1) > > > posix_memalign unfortunately has a weird enough interface that we can't > > > easily extract alignment for the pointer. > > > > Is it just because it outputs through a pointer? Are aligned_alloc, memalign > > and others easier then? > > Yes. We can of course apply the same trick as we do for sincos - transform it to ({ void *tem; int ret; tem = __builtin_gcc_posix_memalign (&res, alignment, size); *memptr = tem; ret; }) and then expand it to RTL as posix_memalign using another memory temporary ... The issue with such return is that we cannot see all possible uses of the return-by-reference value which is easy to do if it is the actual return value. Possibly an alloc_align attribute would be useful as well. > As __float128 is outside of the scope of C/C++, this is not a bug on the C library side.
It might be outside the scope of C/C++ but doesn't C99 allow some implementation defined types?
On Wed, 9 Jan 2013, pinskia at gcc dot gnu.org wrote: > http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55916 > > --- Comment #5 from Andrew Pinski <pinskia at gcc dot gnu.org> 2013-01-09 16:09:25 UTC --- > > As __float128 is outside of the scope of C/C++, this is not a bug on the C library side. > > It might be outside the scope of C/C++ but doesn't C99 allow some > implementation defined types? C99 has no concept of types with extended alignment requirements. I've commented before on how __int128 needs to be thought of as a "sui generis extended type" not "extended integer type" because while it has some of the properties of an integer type, it's wider than intmax_t which forms part of the C library ABI so isn't that easy to change to make 128 bits. I suppose in fact such a sui generis extended type also isn't strictly an object type (just acts like one) in order to follow the "suitably aligned so that it may be assigned to a pointer to any type of object and then used to access such an object or an array of such objects in the space allocated" in C99. For C11 the wording is different - "suitably aligned so that it may be assigned to a pointer to any type of object with a fundamental alignment requirement and then used to access such an object or an array of such objects in the space allocated". Fundamental alignments are defined in 6.2.8#2 "A fundamental alignment is represented by an alignment less than or equal to the greatest alignment supported by the implementation in all contexts, which is equal to _Alignof (max_align_t).". Extended alignments are defined in the following paragraph (but over-aligned types aren't actually possible within the C11 syntax because the syntax for structure and union elements omits to include the possibility of _Alignas specifications). The meaning of "supported by the implementation in all contexts" isn't exactly clear. One might guess that it includes at least static objects, stack objects and malloc. Nothing actually seems to say what alignments are required to be supported by the implementation in all contexts. 6.2.8#4 says "Valid alignments include only those values returned by an _Alignof expression for fundamental types, plus an additional implementation-defined set of values, which may be empty.", but (a) "Valid" may not mean "supported in all contexts" and (b) as noted recently on comp.std.c, the C standard has no definition of "fundamental type", although it does define "basic type". The DFP specification, at least, defines DFP types in a way that would make them basic types (by text that would be inserted in the C standard to describe DFP), which certainly suggests that malloc should return memory suitably aligned for them. I imagine that when the IEEE 754-2008 bindings reach the point of defining _Float128, the same will apply there. I'm sure 32-bit powerpc glibc users will be glad if x86 people work out how to increase malloc alignment to 16-byte without breaking existing emacs binaries, since it's a longstanding problem <http://sourceware.org/bugzilla/show_bug.cgi?id=6527> that malloc there returns memory insufficiently aligned for long double. *** Bug 60088 has been marked as a duplicate of this bug. *** These issues affect probably all versions of gfortran, but I have verified this occurs with 4.6.4, 4.7.3, 4.8.1, and trunk revision 207606. It does seem unlikely that GNU libc malloc() will ever be modified to return sufficiently aligned pointers for vector types (or for long double on 32-bit ppc). I have a patch under testing against trunk that modifies libgfortran internal xmalloc() and xcalloc() calls, as well as the intrinsic malloc() calls, to use GNU libc specific memalign() call. I will attach it as soon as I verify it works correctly. If using memalign() is not acceptable, say so. The reason for using memalign() instead of posix_memalign() is that my (limited!) testing indicates that using memalign() has the least overhead. posix_memalign() has relatively larger overhead, possibly due to its call mechanism. aligned_alloc() is not a real possibility for now, because it was only added to glibc-2.16 (summer 2012). Computing clusters often have older libc versions, and requiring glibc-2.16 and newer would stop binaries compiled on newer gfortran versions from working. (And binaries are often compiled on newer compilers.) GCC generated code seems to assume malloc() returns a pointer aligned to at least __BIGGEST_ALIGNMENT__. Therefore marking the allocation functions with the malloc attribute (or perhaps with malloc and alloc_size) should be sufficient for GNU Fortran. (I shall also do some experiments with __builtin_assume_aligned() to see if there is any impact to the generated code, but I don't see any reason there should be. The compilation units are separate; AFAICT GCC must determine the pointer alignment from the pointer type only. That also matches current behaviour, and is the reason this bug exists.) Obviously, memalign() is obsolete, but I see the performance outweighing that here. After all, removing memalign() in a future glibc would be seen as petty, considering how tenderly emacs users have been treated; tolerating bugs for years, in order to not break existing emacs state dumps. My hardware selection is very limited, but if someone wishes to test the possibilities on other hardware, I'd be happy to clean up my benchmarking code. > My hardware selection is very limited, but if someone wishes to test the
> possibilities on other hardware, I'd be happy to clean up my benchmarking code.
I can do the testing on darwin (intel d10 and d13, ppc d9).
(In reply to Dominique d'Humieres from comment #9) > I can do the testing on darwin (intel d10 and d13, ppc d9). That would be very much appreciated. I've put a tarball and some background info on my web page at http://www.helsinki.fi/~joorava/memory/ Simply put, on my machine, 64-bit code has very little slowdown if memalign() (or even posix_memalign()) is used instead of malloc(). Compiling 32-bit code, both memalign() and posix_memalign() incur a 2x to 4x more CPU time, with posix_memalign() having a more or less fixed overhead compared to memalign(). As I mention on the web page, I only wrote this to have a reason to select a specific function. To me, the results indicate memalign() is the second-best choice (best being "fixing" malloc()). If you find any errors, or have results I may add to the above page (whether or not they contradict mine), I'd be happy to include your notes there. Just in case there are users who encounter this bug report, here is the single-file workaround. Save the following as nosegfault.c, and follow the instructions in the comments. It can be compiled into a shared library used e.g. via LD_PRELOAD, or into an object file and linked directly to a C or GNU Fortran program, to fix the issue. (It is trivial, so I consider it to be either uncopyrightable, or if it is copyrightable, in the public domain. It should still be smart enough to work on all architectures, as long as GCC's __BIGGEST_ALIGNMENT__ is reliable.) _______________________________________________________________________ #include <stdlib.h> #include <string.h> #include <malloc.h> #include <errno.h> /* Compile to a dynamic library: * gcc $FLAGS -fPIC -shared nosegfault.c \ * -Wl,-soname,libnosefault.so -o libnosegfault.so * Install the dynamic library at e.g. /usr/local/lib/, * sudo install -m 0644 libnosegfault.so /usr/local/lib/ * and you can use it via e.g. * env LD_PRELOAD=/usr/local/lib/libnosegfault.so COMMAND [ ARGS .. ] * * Compile this into an object file: * gcc $FLAGS -c nosegfault.c * compile your application, adding * nosegfault.o * to the build or final linking command; for example, * gfortran $FLAGS .... nosegfault.o -o program * * $FLAGS may be empty, or contain your preferred optimization flags. * If you are compiling 32-bit code, remember to include -m32 in it. * * If you compile static binaries, use a copy of the libc.a file * (mentioned in the error if you try to compile with the object file * statically), with malloc, calloc, __libc_malloc, and __libc_calloc * replaced by the objects in this file, using e.g. 'ar'. */ #if __BIGGEST_ALIGNMENT__ <= 8 # error "You don't need this, or your compiler options are wrong." #elif __BIGGEST_ALIGNMENT__ <= 16 # define ALIGNMENT 16 #elif __BIGGEST_ALIGNMENT__ <= 32 # define ALIGNMENT 32 #elif __BIGGEST_ALIGNMENT__ <= 64 # define ALIGNMENT 64 #else # error "Your architecture has insane alignment requirements." #endif void *malloc(size_t size) { return memalign(ALIGNMENT, size); } void *__libc_malloc(size_t size) { return memalign(ALIGNMENT, size); } void *calloc(size_t count, size_t size) { if (!count || !size) { return NULL; } else { const size_t n = size * count; if ((size_t)(n / size) == count) { void *p = memalign(ALIGNMENT, n); if (!p) return NULL; memset(p, 0, n); return p; } errno = ENOMEM; return NULL; } } void *__libc_calloc(size_t count, size_t size) { if (!count || !size) { return NULL; } else { const size_t n = size * count; if ((size_t)(n / size) == count) { void *p = memalign(ALIGNMENT, n); if (!p) return NULL; memset(p, 0, n); return p; } errno = ENOMEM; return NULL; } } /* The End. */ (In reply to Jouko Orava from comment #8) > These issues affect probably all versions of gfortran, > I have a patch under testing against trunk that modifies > libgfortran internal xmalloc() and xcalloc() calls, as well > as the intrinsic malloc() calls, to use GNU libc specific > memalign() call. I will attach it as soon as I verify it > works correctly. I'm wondering if you had a working patch at some point. There are other PRs that can only be solved if some alignment better than malloc is used in gfortran. PR64247 and PR68101 that talk about non-reproducible results and slow performance with avx/vector instructions respectively. (In reply to Joost VandeVondele from comment #12) > I'm wondering if you had a working patch at some point. Not a comprehensive one, no. I recall modifying libgfortran/runtime/memory.c was not sufficient. I ended up going pretty deep into how GCC generates the code in an effort to find out how to replace *all* calls to malloc(), free(), calloc(), and realloc() with fortran_malloc(), fortran_free(), fortran_calloc(), and fortran_realloc(), respectively, that GCC generates for Fortran code, with libgfortran initially providing wrappers around the respective C library calls. That way it would be possible to automagically choose whether the C library standard allocator was sufficient at the GCC compile time (based on maximum alignment versus largest non-vector type supported by the architecture), or if a higher-alignment allocator (memalign(), posix_memalign()) would be used instead. Or, better yet, even replace the C library allocator with a custom one, one that better caters to Fortran use cases. At some point I realized there was about zero percent chance such invasive changes would be accepted upstream, so I gave up. The workaround I showed in #11, https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55916#c11 suffices for me, after all; it basically does the replacement by interposing those functions on the C library side instead. It can be used either at compile time (by linking the C code into the final binary), or at run time for already-compiled binaries (by using LD_PRELOAD). |