Bug 65122 - std::vector doesn't honor element alignment
Summary: std::vector doesn't honor element alignment
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: libstdc++ (show other bugs)
Version: unknown
: P3 normal
Target Milestone: 7.0
Assignee: Not yet assigned to anyone
URL:
Keywords:
Depends on: 77742
Blocks:
  Show dependency treegraph
 
Reported: 2015-02-19 15:26 UTC by Benoit Jacob
Modified: 2016-10-14 12:04 UTC (History)
5 users (show)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2016-09-26 00:00:00


Attachments
testcase (148 bytes, text/x-c++src)
2015-02-19 15:26 UTC, Benoit Jacob
Details
Make new_allocator support types with new-extended alignment (473 bytes, patch)
2016-09-26 09:34 UTC, Jonathan Wakely
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Benoit Jacob 2015-02-19 15:26:02 UTC
Created attachment 34807 [details]
testcase

Even in C++11 mode, even with the C++11 alignas keyword, element alignment isn't honored by std::vector.

$ g++ vectoralign.cpp -o a --std=c++11 && ./a
ctor 0x2422010
ctor 0x2422050

Tested with a freshly built GCC 4.9.2 on Ubuntu x86-64.

Attaching testcase.

Note: this is morphing bug 53900.
Comment 1 Benoit Jacob 2015-02-19 16:22:26 UTC
Homologous libc++ bug report: http://llvm.org/bugs/show_bug.cgi?id=22634
Comment 2 Marc Glisse 2015-02-19 19:08:06 UTC
IMHO the only sensible solution is in this direction:
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2012/n3396.htm

I hope Clark is still working on this...
Comment 3 Benoit Jacob 2015-02-19 20:22:09 UTC
I'd be interested in an explanation of why the default STL allocator can't just honor the alignment of the value_type ? (The document linked in comment 2 seemed to assume that that goes without saying?)
Comment 4 Marc Glisse 2015-02-19 20:59:25 UTC
(In reply to Benoit Jacob from comment #3)
> I'd be interested in an explanation of why the default STL allocator can't
> just honor the alignment of the value_type ?

[allocator.members] "It is implementation-defined whether over-aligned types are supported" so we don't really have to. "the storage is obtained by calling ::operator new(std::size_t)" so we can't use posix_memalign (providing a separate allocator that does could be a good idea though). We could, when the type is over-aligned, do the usual trick of requesting too much memory so we have enough margin to find a suitably aligned region inside, and write a marker before so we remember where the real allocation started. That might be what was tried in PR55727 (I didn't check), the maintainers might be ok with this if someone posts a patch.

But calling new T would still be broken. It seems better in the long term to add the aligned versions of operator new and make both new and std::allocator use those (in the standard). Note that there would be ABI issues switching from the trick in the previous paragraph to this.
Comment 5 Benoit Jacob 2015-02-19 21:05:30 UTC
So while the standard says that over-aligned types dont have to be supported, it also says in 3.11/9 in http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2014/n4296.pdf that:

> If a request for a specific extended alignment in a specific context is not supported by an implementation,
> the program is ill-formed. Additionally, a request for runtime allocation of dynamic storage for which the
> requested alignment cannot be honored shall be treated as an allocation failure

In my naive understanding, that sounds like if over-aligned allocation is not supported then it must be an allocation failure (i.e. not fail silently to honor alignment).

That's relevant because failing all over-allocated allocations is probably not something that a compiler could do in the real world (that would break a lot of existing software) and so this 3.11/9 clause might then be de-facto forcing compilers to support over-allocated allocation.

What do you think? How else would you interprete 3.11/9?
Comment 6 Benoit Jacob 2015-02-19 21:11:33 UTC
(In reply to Marc Glisse from comment #4)
> "the storage is obtained by
> calling ::operator new(std::size_t)" so we can't use posix_memalign

Ouch. That's very unfortunate. I see. I would still be interested in how you understand 3.11/9 and how to reconcile it with that without breaking a lot of software.
Comment 7 Marc Glisse 2015-02-19 21:29:18 UTC
(In reply to Benoit Jacob from comment #6)
> I would still be interested in how you understand 3.11/9

I consider it a defect in the standard, so it needs fixing, not understanding...
Comment 8 Benoit Jacob 2015-02-19 21:32:25 UTC
If there is a defect in the standard, isn't it in the part that forces the compiler to not use the useful type information that it has, that is, the above-quoted "the storage is obtained by calling ::operator new(std::size_t)" ?
Comment 9 Benoit Jacob 2015-02-19 21:33:05 UTC
s/compiler/standard library
Comment 10 Richard Biener 2015-02-20 07:50:42 UTC
(In reply to Benoit Jacob from comment #6)
> (In reply to Marc Glisse from comment #4)
> > "the storage is obtained by
> > calling ::operator new(std::size_t)" so we can't use posix_memalign
> 
> Ouch. That's very unfortunate. I see. I would still be interested in how you
> understand 3.11/9 and how to reconcile it with that without breaking a lot
> of software.

But ::operator new(std::size_t) could always return memory aligned for the
most over-aligned type?  Thus our default new implementation could use
posix_memalign (..., MIN (size, BIGGEST_ALIGNMENT), size)?

If the user provides its own ::new then he is on its own, of course (and
doing that and using posix_memalign in it would be a workaround for this
issue?!).
Comment 11 Benoit Jacob 2015-02-20 13:03:40 UTC
(In reply to Richard Biener from comment #10)
> But ::operator new(std::size_t) could always return memory aligned for the
> most over-aligned type?  Thus our default new implementation could use
> posix_memalign (..., MIN (size, BIGGEST_ALIGNMENT), size)?

The problem is there are lots of use cases for really high alignment: AVX brings uses of 32-byte alignment, and cache lines are typically >= 64 byte aligned. So alignas has to support at least 64 or 128 byte alignment to support cache friendly code, and even without that, it would have to support at least 32 byte alignment for AVX vectorized code.

Making all allocations that large would lead to substantial allocator slop. For example, jemalloc has a quantum of 8 or 16 byte depending on whether the arch is 32 or 64 bit, so increasing it to 32, 64 or 128 byte would be a big difference.

> 
> If the user provides its own ::new then he is on its own, of course

I agree that's how I would like things to be. Unfortunately, the spec quote in comment 4, "the storage is obtained by calling ::operator new(std::size_t)", goes in the opposite direction, by requiring allocators to use the type-blind ::new , thereby losing useful type information such as the type's alignment. That's why I think that this is the spec's weak spot that might be questioned. but you're the expert, not me :)

> (and
> doing that and using posix_memalign in it would be a workaround for this
> issue?!).

That could be a good compromise for many applications, that either don't care about minimizing memory usage, or don't do many tiny allocations. Unfortunately, my context here is a library (Eigen), so we can't make this decision for all our users.
Comment 12 Richard Biener 2015-02-20 13:18:07 UTC
(In reply to Benoit Jacob from comment #11)
> (In reply to Richard Biener from comment #10)
> > But ::operator new(std::size_t) could always return memory aligned for the
> > most over-aligned type?  Thus our default new implementation could use
> > posix_memalign (..., MIN (size, BIGGEST_ALIGNMENT), size)?
> 
> The problem is there are lots of use cases for really high alignment: AVX
> brings uses of 32-byte alignment, and cache lines are typically >= 64 byte
> aligned. So alignas has to support at least 64 or 128 byte alignment to
> support cache friendly code, and even without that, it would have to support
> at least 32 byte alignment for AVX vectorized code.
> 
> Making all allocations that large would lead to substantial allocator slop.
> For example, jemalloc has a quantum of 8 or 16 byte depending on whether the
> arch is 32 or 64 bit, so increasing it to 32, 64 or 128 byte would be a big
> difference.

Though does it really matter in practice?  Tiny allocations would not suffer
because of the MIN (align, size), so the worst-case is max-align + 1
allocations.  Btw, you could as well try MIN (align, size & -size),
thus assume that the allocation size is N * alignof ().

> > 
> > If the user provides its own ::new then he is on its own, of course
> 
> I agree that's how I would like things to be. Unfortunately, the spec quote
> in comment 4, "the storage is obtained by calling ::operator
> new(std::size_t)", goes in the opposite direction, by requiring allocators
> to use the type-blind ::new , thereby losing useful type information such as
> the type's alignment. That's why I think that this is the spec's weak spot
> that might be questioned. but you're the expert, not me :)
> 
> > (and
> > doing that and using posix_memalign in it would be a workaround for this
> > issue?!).
> 
> That could be a good compromise for many applications, that either don't
> care about minimizing memory usage, or don't do many tiny allocations.
> Unfortunately, my context here is a library (Eigen), so we can't make this
> decision for all our users.
Comment 13 Benoit Jacob 2015-02-20 14:11:58 UTC
(In reply to Richard Biener from comment #12)
> (In reply to Benoit Jacob from comment #11)
> > (In reply to Richard Biener from comment #10)
> > > But ::operator new(std::size_t) could always return memory aligned for the
> > > most over-aligned type?  Thus our default new implementation could use
> > > posix_memalign (..., MIN (size, BIGGEST_ALIGNMENT), size)?
> > 
> > The problem is there are lots of use cases for really high alignment: AVX
> > brings uses of 32-byte alignment, and cache lines are typically >= 64 byte
> > aligned. So alignas has to support at least 64 or 128 byte alignment to
> > support cache friendly code, and even without that, it would have to support
> > at least 32 byte alignment for AVX vectorized code.
> > 
> > Making all allocations that large would lead to substantial allocator slop.
> > For example, jemalloc has a quantum of 8 or 16 byte depending on whether the
> > arch is 32 or 64 bit, so increasing it to 32, 64 or 128 byte would be a big
> > difference.
> 
> Though does it really matter in practice?  Tiny allocations would not suffer
> because of the MIN (align, size), so the worst-case is max-align + 1
> allocations.  Btw, you could as well try MIN (align, size & -size),
> thus assume that the allocation size is N * alignof ().

Oh, I was missing the MIN (size, BIGGEST_ALIGNMENT) part of your proposal. So you're right, that nicely takes care of tiny allocations. Correct also on MIN (align, size & -size)   (though I have to trust your bit wizardry here) so that neatly exploits the low bits of the size parameter to infer the alignof --- sounds very good to me!
Comment 14 Richard Biener 2015-02-20 14:20:03 UTC
(In reply to Benoit Jacob from comment #13)
> (In reply to Richard Biener from comment #12)
> > (In reply to Benoit Jacob from comment #11)
> > > (In reply to Richard Biener from comment #10)
> > > > But ::operator new(std::size_t) could always return memory aligned for the
> > > > most over-aligned type?  Thus our default new implementation could use
> > > > posix_memalign (..., MIN (size, BIGGEST_ALIGNMENT), size)?
> > > 
> > > The problem is there are lots of use cases for really high alignment: AVX
> > > brings uses of 32-byte alignment, and cache lines are typically >= 64 byte
> > > aligned. So alignas has to support at least 64 or 128 byte alignment to
> > > support cache friendly code, and even without that, it would have to support
> > > at least 32 byte alignment for AVX vectorized code.
> > > 
> > > Making all allocations that large would lead to substantial allocator slop.
> > > For example, jemalloc has a quantum of 8 or 16 byte depending on whether the
> > > arch is 32 or 64 bit, so increasing it to 32, 64 or 128 byte would be a big
> > > difference.
> > 
> > Though does it really matter in practice?  Tiny allocations would not suffer
> > because of the MIN (align, size), so the worst-case is max-align + 1
> > allocations.  Btw, you could as well try MIN (align, size & -size),
> > thus assume that the allocation size is N * alignof ().
> 
> Oh, I was missing the MIN (size, BIGGEST_ALIGNMENT) part of your proposal.
> So you're right, that nicely takes care of tiny allocations. Correct also on
> MIN (align, size & -size)   (though I have to trust your bit wizardry here)
> so that neatly exploits the low bits of the size parameter to infer the
> alignof --- sounds very good to me!

That trick of course requires you to not "optimize" your allocations manually
to omit tail padding in aggregates (or manually compute allocation sizes).
Or use

struct { avx512vector x; char c; } __attribute__((packed,aligned(64)));

which has alignment 64 and size 65... (of course arrays of such objects
are ill-defined)
Comment 15 Richard Biener 2015-02-20 14:22:37 UTC
Oh, and I wonder if there is an aligned realloc (though C++ doesn't define
sth like realloc and thus std::vector can't optimize the copy when
reallocating?!).  But for a C program using posix_memalign or any other such
facility means realloc is out of the question as well.
Comment 16 H.J. Lu 2016-01-12 16:55:24 UTC
I think this is a dup of PR 36159.
Comment 17 Andrew Pinski 2016-09-24 23:44:48 UTC
I think this is fixed for GCC 7 with -std=c++17 support.
Comment 18 Benoit Jacob 2016-09-25 01:42:12 UTC
(In reply to Andrew Pinski from comment #17)
> I think this is fixed for GCC 7 with -std=c++17 support.

Thanks for the update, that's great news!
Comment 19 Marc Glisse 2016-09-25 08:36:35 UTC
(In reply to Andrew Pinski from comment #17)
> I think this is fixed for GCC 7 with -std=c++17 support.

No, it isn't. new T[10] will give suitably aligned memory, but not std::allocator<T>. Only the core part of alignment support was added in C++17, we are expecting a corresponding library part in C++20. On the other hand, providing it as an extension in C++17-mode (gnu++17?) might be ok (requires a discussion).
Comment 20 Jonathan Wakely 2016-09-26 08:56:10 UTC
In C++17 std::allocator no longer says "It is implementation-defined whether over-aligned types are supported" and is no longer required to call operator new(size_t), it is supposed to use the over-aligned form as appropriate.

We should do that when __cpp_aligned_new is defined.
Comment 21 Jonathan Wakely 2016-09-26 09:34:12 UTC
Created attachment 39686 [details]
Make new_allocator support types with new-extended alignment

This works, but currently produces a warning due to PR 77742.
Comment 22 Marc Glisse 2016-09-26 09:39:39 UTC
(In reply to Jonathan Wakely from comment #20)
> In C++17 std::allocator no longer says "It is implementation-defined whether
> over-aligned types are supported" and is no longer required to call operator
> new(size_t), it is supposed to use the over-aligned form as appropriate.

Ah, I had missed those last few lines of p0035r4 (they were not in r2). That's great news! I remember discussions and polls about the fact that the proposal wasn't handling the library part, is there still anything missing (postponed?) or did the last couple revisions add everything?
Comment 23 Jonathan Wakely 2016-09-26 09:53:09 UTC
I think it's supposed to be cover everything, but I'm not sure.
Comment 24 Jonathan Wakely 2016-10-14 12:03:54 UTC
Fixed for GCC 7
Comment 25 Jonathan Wakely 2016-10-14 12:04:20 UTC
Author: redi
Date: Fri Oct 14 12:03:47 2016
New Revision: 241158

URL: https://gcc.gnu.org/viewcvs?rev=241158&root=gcc&view=rev
Log:
PR65122 extended alignment support in allocators

	PR libstdc++/65122
	* include/ext/malloc_allocator.h (malloc_allocator::allocate): Use
	aligned_alloc for types with extended alignment if available,
	otherwise throw bad_alloc if malloc doesn't return a suitable value.
	* include/ext/bitmap_allocator.h (bitmap_allocator::allocate)
	(bitmap_allocator::deallocate): Use aligned new/delete for types with
	extended alignment.
	* include/ext/mt_allocator.h (__mt_alloc::allocate)
	(__mt_alloc::deallocate): Likewise.
	* include/ext/new_allocator.h (new_allocator::allocate)
	(new_allocator::deallocate): Likewise.
	* include/ext/pool_allocator.h (__pool_alloc::allocate)
	(__pool_alloc::deallocate): Likewise.
	* testsuite/20_util/allocator/overaligned.cc: New test.
	* testsuite/ext/bitmap_allocator/overaligned.cc: New test.
	* testsuite/ext/malloc_allocator/overaligned.cc: New test.
	* testsuite/ext/mt_allocator/overaligned.cc: New test.
	* testsuite/ext/new_allocator/overaligned.cc: New test.
	* testsuite/ext/pool_allocator/overaligned.cc: New test.

Added:
    trunk/libstdc++-v3/testsuite/20_util/allocator/overaligned.cc
    trunk/libstdc++-v3/testsuite/ext/bitmap_allocator/overaligned.cc
    trunk/libstdc++-v3/testsuite/ext/malloc_allocator/overaligned.cc
    trunk/libstdc++-v3/testsuite/ext/mt_allocator/overaligned.cc
    trunk/libstdc++-v3/testsuite/ext/new_allocator/overaligned.cc
    trunk/libstdc++-v3/testsuite/ext/pool_allocator/overaligned.cc
Modified:
    trunk/libstdc++-v3/ChangeLog
    trunk/libstdc++-v3/include/ext/bitmap_allocator.h
    trunk/libstdc++-v3/include/ext/malloc_allocator.h
    trunk/libstdc++-v3/include/ext/mt_allocator.h
    trunk/libstdc++-v3/include/ext/new_allocator.h
    trunk/libstdc++-v3/include/ext/pool_allocator.h