Bug 55916 - Alignment issues with real(16) on i686
Summary: Alignment issues with real(16) on i686
Status: NEW
Alias: None
Product: gcc
Classification: Unclassified
Component: fortran (show other bugs)
Version: 4.8.0
: P3 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords: wrong-code
: 60088 (view as bug list)
Depends on:
Blocks:
 
Reported: 2013-01-09 11:47 UTC by Jakub Jelinek
Modified: 2015-12-06 10:09 UTC (History)
6 users (show)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2013-01-09 00:00:00


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Jakub Jelinek 2013-01-09 11:47:15 UTC
The following testcase with -m32 -msse4 -g -O3 -fno-inline may segfault (depending on whether malloc returns just 8-byte or 16-byte aligned pointer).

  real(16), allocatable ::  aaaa(:)
  real(16) :: x(10)
  integer :: i,n=10

  allocate(aaaa(n))
  do i = 1,n
    aaaa(i) = 8.0
  enddo
  call foo (aaaa)
  deallocate(aaaa)
contains
  subroutine foo (aaaa)
    real(16), allocatable :: aaaa(:)
    aaaa(1) = aaaa(1) + 1
  end subroutine
end

The problem is that __float128 aka real(16) has alignment 128 bits, but on i686-linux malloc only guarantees 64 bit alignment (2 * sizeof (void *) byte alignment).  As __float128 is outside of the scope of C/C++, this is not a bug on the C library side.
Guess for real(16) allocatables or any allocatables that require alignment bigger than that of standard types Fortran should use posix_memalign (or perhaps some libgfortran wrapper) that will be passed not just the size to allocate, but also required alignment.
Comment 1 Richard Biener 2013-01-09 11:55:09 UTC
Or inline the re-alignment code (and allocate one more element).  That will
allow for the most optimization opportunities.

posix_memalign unfortunately has a weird enough interface that we can't
easily extract alignment for the pointer.
Comment 2 Marc Glisse 2013-01-09 12:25:43 UTC
(In reply to comment #1)
> posix_memalign unfortunately has a weird enough interface that we can't
> easily extract alignment for the pointer.

Is it just because it outputs through a pointer? Are aligned_alloc, memalign  and others easier then?
Comment 3 Richard Biener 2013-01-09 13:30:27 UTC
(In reply to comment #2)
> (In reply to comment #1)
> > posix_memalign unfortunately has a weird enough interface that we can't
> > easily extract alignment for the pointer.
> 
> Is it just because it outputs through a pointer? Are aligned_alloc, memalign 
> and others easier then?

Yes.
Comment 4 Richard Biener 2013-01-09 13:38:12 UTC
(In reply to comment #3)
> (In reply to comment #2)
> > (In reply to comment #1)
> > > posix_memalign unfortunately has a weird enough interface that we can't
> > > easily extract alignment for the pointer.
> > 
> > Is it just because it outputs through a pointer? Are aligned_alloc, memalign 
> > and others easier then?
> 
> Yes.

We can of course apply the same trick as we do for sincos - transform it to

({ void *tem; int ret;
   tem = __builtin_gcc_posix_memalign (&res, alignment, size);
   *memptr = tem;
   ret; })

and then expand it to RTL as posix_memalign using another memory temporary ...

The issue with such return is that we cannot see all possible uses of the
return-by-reference value which is easy to do if it is the actual return
value.

Possibly an alloc_align attribute would be useful as well.
Comment 5 Andrew Pinski 2013-01-09 16:09:25 UTC
> As __float128 is outside of the scope of C/C++, this is not a bug on the C library side.

It might be outside the scope of C/C++ but doesn't C99 allow some implementation defined types?
Comment 6 joseph@codesourcery.com 2013-01-09 20:57:13 UTC
On Wed, 9 Jan 2013, pinskia at gcc dot gnu.org wrote:

> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55916
> 
> --- Comment #5 from Andrew Pinski <pinskia at gcc dot gnu.org> 2013-01-09 16:09:25 UTC ---
> > As __float128 is outside of the scope of C/C++, this is not a bug on the C library side.
> 
> It might be outside the scope of C/C++ but doesn't C99 allow some
> implementation defined types?

C99 has no concept of types with extended alignment requirements.  I've 
commented before on how __int128 needs to be thought of as a "sui generis 
extended type" not "extended integer type" because while it has some of 
the properties of an integer type, it's wider than intmax_t which forms 
part of the C library ABI so isn't that easy to change to make 128 bits.  
I suppose in fact such a sui generis extended type also isn't strictly an 
object type (just acts like one) in order to follow the "suitably aligned 
so that it may be assigned to a pointer to any type of object and then 
used to access such an object or an array of such objects in the space 
allocated" in C99.

For C11 the wording is different - "suitably aligned so that it may be 
assigned to a pointer to any type of object with a fundamental alignment 
requirement and then used to access such an object or an array of such 
objects in the space allocated".  Fundamental alignments are defined in 
6.2.8#2 "A fundamental alignment is represented by an alignment less than 
or equal to the greatest alignment supported by the implementation in all 
contexts, which is equal to _Alignof (max_align_t).".  Extended alignments 
are defined in the following paragraph (but over-aligned types aren't 
actually possible within the C11 syntax because the syntax for structure 
and union elements omits to include the possibility of _Alignas 
specifications).

The meaning of "supported by the implementation in all contexts" isn't 
exactly clear.  One might guess that it includes at least static objects, 
stack objects and malloc.

Nothing actually seems to say what alignments are required to be supported 
by the implementation in all contexts.  6.2.8#4 says "Valid alignments 
include only those values returned by an _Alignof expression for 
fundamental types, plus an additional implementation-defined set of 
values, which may be empty.", but (a) "Valid" may not mean "supported in 
all contexts" and (b) as noted recently on comp.std.c, the C standard has 
no definition of "fundamental type", although it does define "basic type".  
The DFP specification, at least, defines DFP types in a way that would 
make them basic types (by text that would be inserted in the C standard to 
describe DFP), which certainly suggests that malloc should return memory 
suitably aligned for them.  I imagine that when the IEEE 754-2008 bindings 
reach the point of defining _Float128, the same will apply there.

I'm sure 32-bit powerpc glibc users will be glad if x86 people work out 
how to increase malloc alignment to 16-byte without breaking existing 
emacs binaries, since it's a longstanding problem 
<http://sourceware.org/bugzilla/show_bug.cgi?id=6527> that malloc there 
returns memory insufficiently aligned for long double.
Comment 7 Uroš Bizjak 2014-02-07 08:03:38 UTC
*** Bug 60088 has been marked as a duplicate of this bug. ***
Comment 8 Jouko Orava 2014-02-08 15:04:22 UTC
These issues affect probably all versions of gfortran,
but I have verified this occurs with 4.6.4, 4.7.3, 4.8.1,
and trunk revision 207606.

It does seem unlikely that GNU libc malloc() will ever be
modified to return sufficiently aligned pointers for vector
types (or for long double on 32-bit ppc).

I have a patch under testing against trunk that modifies
libgfortran internal xmalloc() and xcalloc() calls, as well
as the intrinsic malloc() calls, to use GNU libc specific
memalign() call. I will attach it as soon as I verify it
works correctly.

If using memalign() is not acceptable, say so.

The reason for using memalign() instead of posix_memalign()
is that my (limited!) testing indicates that using memalign()
has the least overhead. posix_memalign() has relatively larger
overhead, possibly due to its call mechanism.

aligned_alloc() is not a real possibility for now, because
it was only added to glibc-2.16 (summer 2012). Computing clusters
often have older libc versions, and requiring glibc-2.16 and newer
would stop binaries compiled on newer gfortran versions from working.
(And binaries are often compiled on newer compilers.)

GCC generated code seems to assume malloc() returns a pointer aligned
to at least __BIGGEST_ALIGNMENT__. Therefore marking the allocation
functions with the malloc attribute (or perhaps with malloc and alloc_size)
should be sufficient for GNU Fortran.

(I shall also do some experiments with __builtin_assume_aligned() to see
 if there is any impact to the generated code, but I don't see any
 reason there should be. The compilation units are separate; AFAICT
 GCC must determine the pointer alignment from the pointer type only.
 That also matches current behaviour, and is the reason this bug exists.)

Obviously, memalign() is obsolete, but I see the performance
outweighing that here. After all, removing memalign()
in a future glibc would be seen as petty, considering how tenderly
emacs users have been treated; tolerating bugs for years, in order to
not break existing emacs state dumps.

My hardware selection is very limited, but if someone wishes to test the
possibilities on other hardware, I'd be happy to clean up my benchmarking code.
Comment 9 Dominique d'Humieres 2014-02-08 15:11:18 UTC
> My hardware selection is very limited, but if someone wishes to test the
> possibilities on other hardware, I'd be happy to clean up my benchmarking code.

I can do the testing on darwin (intel d10 and d13, ppc d9).
Comment 10 Jouko Orava 2014-02-08 18:25:36 UTC
(In reply to Dominique d'Humieres from comment #9)
> I can do the testing on darwin (intel d10 and d13, ppc d9).

That would be very much appreciated.

I've put a tarball and some background info on my web page at
http://www.helsinki.fi/~joorava/memory/

Simply put, on my machine, 64-bit code has very little slowdown
if memalign() (or even posix_memalign()) is used instead of malloc().
Compiling 32-bit code, both memalign() and posix_memalign() incur
a 2x to 4x more CPU time, with posix_memalign() having a more or less
fixed overhead compared to memalign().

As I mention on the web page, I only wrote this to have a reason
to select a specific function. To me, the results indicate memalign()
is the second-best choice (best being "fixing" malloc()).

If you find any errors, or have results I may add to the above page
(whether or not they contradict mine), I'd be happy to include your
notes there.
Comment 11 Jouko Orava 2014-02-08 23:51:43 UTC
Just in case there are users who encounter this bug report,
here is the single-file workaround. Save the following as
nosegfault.c, and follow the instructions in the comments.

It can be compiled into a shared library used e.g. via LD_PRELOAD,
or into an object file and linked directly to a C or GNU Fortran
program, to fix the issue.

(It is trivial, so I consider it to be either uncopyrightable,
 or if it is copyrightable, in the public domain.
 It should still be smart enough to work on all architectures,
 as long as GCC's __BIGGEST_ALIGNMENT__ is reliable.)
_______________________________________________________________________

#include <stdlib.h>
#include <string.h>
#include <malloc.h>
#include <errno.h>

/* Compile to a dynamic library:
 *   gcc $FLAGS -fPIC -shared nosegfault.c \
 *       -Wl,-soname,libnosefault.so -o libnosegfault.so
 * Install the dynamic library at e.g. /usr/local/lib/,
 *   sudo install -m 0644 libnosegfault.so /usr/local/lib/
 * and you can use it via e.g.
 *   env LD_PRELOAD=/usr/local/lib/libnosegfault.so COMMAND [ ARGS .. ]
 *
 * Compile this into an object file:
 *   gcc $FLAGS -c nosegfault.c
 * compile your application, adding
 *   nosegfault.o
 * to the build or final linking command; for example,
 *   gfortran $FLAGS .... nosegfault.o -o program
 *
 * $FLAGS may be empty, or contain your preferred optimization flags.
 * If you are compiling 32-bit code, remember to include -m32 in it.
 *
 * If you compile static binaries, use a copy of the libc.a file
 * (mentioned in the error if you try to compile with the object file
 *  statically), with malloc, calloc, __libc_malloc, and __libc_calloc
 *  replaced by the objects in this file, using e.g. 'ar'.
*/

#if   __BIGGEST_ALIGNMENT__ <= 8
# error "You don't need this, or your compiler options are wrong."
#elif __BIGGEST_ALIGNMENT__ <= 16
# define ALIGNMENT 16
#elif __BIGGEST_ALIGNMENT__ <= 32
# define ALIGNMENT 32
#elif __BIGGEST_ALIGNMENT__ <= 64
# define ALIGNMENT 64
#else
# error "Your architecture has insane alignment requirements."
#endif

void *malloc(size_t size)
{
    return memalign(ALIGNMENT, size);
}

void *__libc_malloc(size_t size)
{
    return memalign(ALIGNMENT, size);
}

void *calloc(size_t count, size_t size)
{
    if (!count || !size) {
        return NULL;
    } else {
        const size_t n = size * count;
        if ((size_t)(n / size) == count) {
            void *p = memalign(ALIGNMENT, n);
            if (!p)
                return NULL;
            memset(p, 0, n);
            return p;
        }
        errno = ENOMEM;
        return NULL;
    }
}

void *__libc_calloc(size_t count, size_t size)
{
    if (!count || !size) {
        return NULL;
    } else {
        const size_t n = size * count;
        if ((size_t)(n / size) == count) {
            void *p = memalign(ALIGNMENT, n);
            if (!p)
                return NULL;
            memset(p, 0, n);
            return p;
        }
        errno = ENOMEM;
        return NULL;
    }
}

/* The End. */
Comment 12 Joost VandeVondele 2015-12-05 14:36:28 UTC
(In reply to Jouko Orava from comment #8)
> These issues affect probably all versions of gfortran,
> I have a patch under testing against trunk that modifies
> libgfortran internal xmalloc() and xcalloc() calls, as well
> as the intrinsic malloc() calls, to use GNU libc specific
> memalign() call. I will attach it as soon as I verify it
> works correctly.

I'm wondering if you had a working patch at some point. There are other PRs that can only be solved if some alignment better than malloc is used in gfortran. PR64247 and PR68101 that talk about non-reproducible results and slow performance with avx/vector instructions respectively.
Comment 13 Jouko Orava 2015-12-06 10:09:10 UTC
(In reply to Joost VandeVondele from comment #12)
> I'm wondering if you had a working patch at some point.

Not a comprehensive one, no. I recall modifying libgfortran/runtime/memory.c was not sufficient.

I ended up going pretty deep into how GCC generates the code in an effort to find out how to replace *all* calls to malloc(), free(), calloc(), and realloc() with fortran_malloc(), fortran_free(), fortran_calloc(), and fortran_realloc(), respectively, that GCC generates for Fortran code, with libgfortran initially providing wrappers around the respective C library calls.

That way it would be possible to automagically choose whether the C library standard allocator was sufficient at the GCC compile time (based on maximum alignment versus largest non-vector type supported by the architecture), or if a higher-alignment allocator (memalign(), posix_memalign()) would be used instead. Or, better yet, even replace the C library allocator with a custom one, one that better caters to Fortran use cases.

At some point I realized there was about zero percent chance such invasive changes would be accepted upstream, so I gave up.

The workaround I showed in #11,
    https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55916#c11
suffices for me, after all; it basically does the replacement by interposing those functions on the C library side instead. It can be used either at compile time (by linking the C code into the final binary), or at run time for already-compiled binaries (by using LD_PRELOAD).