[committed 1/2] libstdc++: Reduce monotonic_buffer_resource overallocation [PR 96942]

Jonathan Wakely jwakely@redhat.com
Thu Sep 10 14:42:47 GMT 2020


The primary reason for this change is to reduce the size of buffers
allocated by std::pmr::monotonic_buffer_resource. Previously, a new
buffer would always add the size of the linked list node (11 bytes) and
then round up to the next power of two. This results in a huge increase
if the expected size of the next buffer is already a power of two. For
example, if the resource is constructed with a desired initial size of
4096 the first buffer it allocates will be std::bit_ceil(4096+11) which
is 8192.  If the user has carefully selected the initial size to match
their expected memory requirements then allocating double that amount
wastes a lot of memory.

After this patch the allocated size will be rounded up to a 64-byte
boundary, instead of to a power of two. This means for an initial size
of 4096 only 4160 bytes get allocated.

Previously only the base-2 logarithm of the size was stored, which fits
in a single 8-bit integer. Now that the size isn't always a
power of two we need to use more bits to store it. As the size is always
a multiple of 64 the low six bits are not needed, and so we can use the
same approach that the pool resources already use of storing the base-2
logarithm of the alignment in the low bits that are not used for the
size. To avoid code duplication, a new aligned_size<N> helper class is
introduced by this patch, which is then used by both the pool resources'
big_block type and the monotonic_buffer_resource::_Chunk type.

Originally the big_block type used two bit-fields to store the size and
alignment in the space of a single size_t member. The aligned_size type
uses a single size_t member and uses masks and bitwise operations to
manipulate the size and alignment values. This results in better code
than the old version, because the bit-fields weren't optimally ordered
for little endian architectures, so the alignment was actually stored in
the high bits, not the unused low bits, requiring additional shifts to
calculate the values. Using bitwise operations directly avoids needing
to reorder the bit-fields depending on the endianness.

While adapting the _Chunk and big_block types to use aligned_size<N> I
also added checks for size overflows (technically, unsigned wraparound).
The memory resources now ensure that when they require an allocation
that is too large to represent in size_t they will request SIZE_MAX
bytes from the upstream resource, rather than requesting a small value
that results from wraparound. The testsuite is enhanced to verify this.

libstdc++-v3/ChangeLog:

	PR libstdc++/96942
	* include/std/memory_resource (monotonic_buffer_resource::do_allocate):
	Use __builtin_expect when checking if a new buffer needs to be
	allocated from the upstream resource, and for checks for edge
	cases like zero sized buffers and allocations.
	* src/c++17/memory_resource.cc (aligned_size): New class template.
	(aligned_ceil): New helper function to round up to a given
	alignment.
	(monotonic_buffer_resource::chunk): Replace _M_size and _M_align
	with an aligned_size member. Remove _M_canary member. Change _M_next
	to pointer instead of unaligned buffer.
	(monotonic_buffer_resource::chunk::allocate): Round up to multiple
	of 64 instead of to power of two. Check for size overflow. Remove
	redundant check for minimum required alignment.
	(monotonic_buffer_resource::chunk::release): Adjust for changes
	to data members.
	(monotonic_buffer_resource::_M_new_buffer): Use aligned_ceil.
	(big_block): Replace _M_size and _M_align with aligned_size
	member.
	(big_block::big_block): Check for size overflow.
	(big_block::size, big_block::align): Adjust to use aligned_size.
	(big_block::alloc_size): Use aligned_ceil.
	(munge_options): Use aligned_ceil.
	(__pool_resource::allocate): Use big_block::align for alignment.
	* testsuite/20_util/monotonic_buffer_resource/allocate.cc: Check
	upstream resource gets expected values for impossible sizes.
	* testsuite/20_util/unsynchronized_pool_resource/allocate.cc:
	Likewise. Adjust checks for expected alignment in existing test.

Tested powerpc64le-linux. Committed to trunk.

-------------- next part --------------
commit 1e718ec51a223d65a09757b999202390871b778d
Author: Jonathan Wakely <jwakely@redhat.com>
Date:   Thu Sep 10 15:39:15 2020

    libstdc++: Reduce monotonic_buffer_resource overallocation [PR 96942]
    
    The primary reason for this change is to reduce the size of buffers
    allocated by std::pmr::monotonic_buffer_resource. Previously, a new
    buffer would always add the size of the linked list node (11 bytes) and
    then round up to the next power of two. This results in a huge increase
    if the expected size of the next buffer is already a power of two. For
    example, if the resource is constructed with a desired initial size of
    4096 the first buffer it allocates will be std::bit_ceil(4096+11) which
    is 8192.  If the user has carefully selected the initial size to match
    their expected memory requirements then allocating double that amount
    wastes a lot of memory.
    
    After this patch the allocated size will be rounded up to a 64-byte
    boundary, instead of to a power of two. This means for an initial size
    of 4096 only 4160 bytes get allocated.
    
    Previously only the base-2 logarithm of the size was stored, which fits
    in a single 8-bit integer. Now that the size isn't always a
    power of two we need to use more bits to store it. As the size is always
    a multiple of 64 the low six bits are not needed, and so we can use the
    same approach that the pool resources already use of storing the base-2
    logarithm of the alignment in the low bits that are not used for the
    size. To avoid code duplication, a new aligned_size<N> helper class is
    introduced by this patch, which is then used by both the pool resources'
    big_block type and the monotonic_buffer_resource::_Chunk type.
    
    Originally the big_block type used two bit-fields to store the size and
    alignment in the space of a single size_t member. The aligned_size type
    uses a single size_t member and uses masks and bitwise operations to
    manipulate the size and alignment values. This results in better code
    than the old version, because the bit-fields weren't optimally ordered
    for little endian architectures, so the alignment was actually stored in
    the high bits, not the unused low bits, requiring additional shifts to
    calculate the values. Using bitwise operations directly avoids needing
    to reorder the bit-fields depending on the endianness.
    
    While adapting the _Chunk and big_block types to use aligned_size<N> I
    also added checks for size overflows (technically, unsigned wraparound).
    The memory resources now ensure that when they require an allocation
    that is too large to represent in size_t they will request SIZE_MAX
    bytes from the upstream resource, rather than requesting a small value
    that results from wraparound. The testsuite is enhanced to verify this.
    
    libstdc++-v3/ChangeLog:
    
            PR libstdc++/96942
            * include/std/memory_resource (monotonic_buffer_resource::do_allocate):
            Use __builtin_expect when checking if a new buffer needs to be
            allocated from the upstream resource, and for checks for edge
            cases like zero sized buffers and allocations.
            * src/c++17/memory_resource.cc (aligned_size): New class template.
            (aligned_ceil): New helper function to round up to a given
            alignment.
            (monotonic_buffer_resource::chunk): Replace _M_size and _M_align
            with an aligned_size member. Remove _M_canary member. Change _M_next
            to pointer instead of unaligned buffer.
            (monotonic_buffer_resource::chunk::allocate): Round up to multiple
            of 64 instead of to power of two. Check for size overflow. Remove
            redundant check for minimum required alignment.
            (monotonic_buffer_resource::chunk::release): Adjust for changes
            to data members.
            (monotonic_buffer_resource::_M_new_buffer): Use aligned_ceil.
            (big_block): Replace _M_size and _M_align with aligned_size
            member.
            (big_block::big_block): Check for size overflow.
            (big_block::size, big_block::align): Adjust to use aligned_size.
            (big_block::alloc_size): Use aligned_ceil.
            (munge_options): Use aligned_ceil.
            (__pool_resource::allocate): Use big_block::align for alignment.
            * testsuite/20_util/monotonic_buffer_resource/allocate.cc: Check
            upstream resource gets expected values for impossible sizes.
            * testsuite/20_util/unsynchronized_pool_resource/allocate.cc:
            Likewise. Adjust checks for expected alignment in existing test.

diff --git a/libstdc++-v3/include/std/memory_resource b/libstdc++-v3/include/std/memory_resource
index 2b8735b8c39..3db22978294 100644
--- a/libstdc++-v3/include/std/memory_resource
+++ b/libstdc++-v3/include/std/memory_resource
@@ -636,11 +636,11 @@ namespace pmr
     void*
     do_allocate(size_t __bytes, size_t __alignment) override
     {
-      if (__bytes == 0)
+      if (__builtin_expect(__bytes == 0, false))
 	__bytes = 1; // Ensures we don't return the same pointer twice.
 
       void* __p = std::align(__alignment, __bytes, _M_current_buf, _M_avail);
-      if (!__p)
+      if (__builtin_expect(__p == nullptr, false))
 	{
 	  _M_new_buffer(__bytes, __alignment);
 	  __p = _M_current_buf;
@@ -671,7 +671,7 @@ namespace pmr
     static size_t
     _S_next_bufsize(size_t __buffer_size) noexcept
     {
-      if (__buffer_size == 0)
+      if (__builtin_expect(__buffer_size == 0, false))
 	__buffer_size = 1;
       return __buffer_size * _S_growth_factor;
     }
diff --git a/libstdc++-v3/src/c++17/memory_resource.cc b/libstdc++-v3/src/c++17/memory_resource.cc
index 95352b23537..392d72ef32d 100644
--- a/libstdc++-v3/src/c++17/memory_resource.cc
+++ b/libstdc++-v3/src/c++17/memory_resource.cc
@@ -175,6 +175,47 @@ namespace pmr
   // versions will not use this symbol.
   monotonic_buffer_resource::~monotonic_buffer_resource() { release(); }
 
+  namespace {
+
+  // aligned_size<N> stores the size and alignment of a memory allocation.
+  // The size must be a multiple of N, leaving the low log2(N) bits free
+  // to store the base-2 logarithm of the alignment.
+  // For example, allocate(1024, 32) is stored as 1024 + log2(32) = 1029.
+  template<unsigned N>
+  struct aligned_size
+  {
+    // N must be a power of two
+    static_assert( std::__popcount(N) == 1 );
+
+    static constexpr size_t _S_align_mask = N - 1;
+    static constexpr size_t _S_size_mask = ~_S_align_mask;
+
+    constexpr
+    aligned_size(size_t sz, size_t align) noexcept
+    : value(sz | (std::__bit_width(align) - 1u))
+    {
+      __glibcxx_assert(size() == sz); // sz must be a multiple of N
+    }
+
+    constexpr size_t
+    size() const noexcept
+    { return value & _S_size_mask; }
+
+    constexpr size_t
+    alignment() const noexcept
+    { return size_t(1) << (value & _S_align_mask); }
+
+    size_t value; // size | log2(alignment)
+  };
+
+  // Round n up to a multiple of alignment, which must be a power of two.
+  constexpr size_t aligned_ceil(size_t n, size_t alignment)
+  {
+    return (n + alignment - 1) & ~(alignment - 1);
+  }
+
+  } // namespace
+
   // Memory allocated by the upstream resource is managed in a linked list
   // of _Chunk objects. A _Chunk object recording the size and alignment of
   // the allocated block and a pointer to the previous chunk is placed
@@ -189,23 +230,26 @@ namespace pmr
     allocate(memory_resource* __r, size_t __size, size_t __align,
 	     _Chunk*& __head)
     {
-      __size = std::__bit_ceil(__size + sizeof(_Chunk));
+      const size_t __orig_size = __size;
 
-      if constexpr (alignof(_Chunk) > 1)
+      // Add space for the _Chunk object and round up to 64 bytes.
+      __size = aligned_ceil(__size + sizeof(_Chunk), 64);
+
+      // Check for unsigned wraparound
+      if (__size < __orig_size) [[unlikely]]
 	{
-	  // PR libstdc++/90046
-	  // For targets like epiphany-elf where alignof(_Chunk) != 1
-	  // ensure that the last sizeof(_Chunk) bytes in the buffer
-	  // are suitably-aligned for a _Chunk.
-	  // This should be unnecessary, because the caller already
-	  // passes in max(__align, alignof(max_align_t)).
-	  if (__align < alignof(_Chunk))
-	    __align = alignof(_Chunk);
+	  // monotonic_buffer_resource::do_allocate is not allowed to throw.
+	  // If the required size is too large for size_t then ask the
+	  // upstream resource for an impossibly large size and alignment.
+	  __size = -1;
+	  __align = ~(size_t(-1) >> 1);
 	}
 
       void* __p = __r->allocate(__size, __align);
 
       // Add a chunk defined by (__p, __size, __align) to linked list __head.
+      // We know the end of the buffer is suitably-aligned for a _Chunk
+      // because the caller ensured __align is at least alignof(max_align_t).
       void* const __back = (char*)__p + __size - sizeof(_Chunk);
       __head = ::new(__back) _Chunk(__size, __align, __head);
       return { __p, __size - sizeof(_Chunk) };
@@ -220,16 +264,9 @@ namespace pmr
       while (__next)
 	{
 	  _Chunk* __ch = __next;
-	  __builtin_memcpy(&__next, __ch->_M_next, sizeof(_Chunk*));
-
-	  __glibcxx_assert(__ch->_M_canary != 0);
-	  __glibcxx_assert(__ch->_M_canary == (__ch->_M_size|__ch->_M_align));
-
-	  if (__ch->_M_canary != (__ch->_M_size | __ch->_M_align))
-	    return; // buffer overflow detected!
-
-	  size_t __size = (size_t)1 << __ch->_M_size;
-	  size_t __align = (size_t)1 << __ch->_M_align;
+	  __next = __ch->_M_next;
+	  size_t __size = __ch->_M_size.size();
+	  size_t __align = __ch->_M_size.alignment();
 	  void* __start = (char*)(__ch + 1) - __size;
 	  __r->deallocate(__start, __size, __align);
 	}
@@ -237,24 +274,18 @@ namespace pmr
 
   private:
     _Chunk(size_t __size, size_t __align, _Chunk* __next) noexcept
-    : _M_size(std::__bit_width(__size) - 1),
-      _M_align(std::__bit_width(__align) - 1)
-    {
-      __builtin_memcpy(_M_next, &__next, sizeof(__next));
-      _M_canary = _M_size | _M_align;
-    }
+    : _M_size(__size, __align), _M_next(__next)
+    { }
 
-    unsigned char _M_canary;
-    unsigned char _M_size;
-    unsigned char _M_align;
-    unsigned char _M_next[sizeof(_Chunk*)];
+    aligned_size<64> _M_size;
+    _Chunk* _M_next;
   };
 
   void
   monotonic_buffer_resource::_M_new_buffer(size_t bytes, size_t alignment)
   {
     const size_t n = std::max(bytes, _M_next_bufsiz);
-    const size_t m = std::max(alignment, alignof(std::max_align_t));
+    const size_t m = aligned_ceil(alignment, alignof(std::max_align_t));
     auto [p, size] = _Chunk::allocate(_M_upstream, n, m, _M_head);
     _M_current_buf = p;
     _M_avail = size;
@@ -550,49 +581,43 @@ namespace pmr
   // An oversized allocation that doesn't fit in a pool.
   struct big_block
   {
-    // Alignment must be a power-of-two so we only need to use enough bits
-    // to store the power, not the actual value:
-    static constexpr unsigned _S_alignbits
-      = std::__bit_width((unsigned)numeric_limits<size_t>::digits - 1);
-    // Use the remaining bits to store the size:
-    static constexpr unsigned _S_sizebits
-      = numeric_limits<size_t>::digits - _S_alignbits;
-    // The maximum value that can be stored in _S_size
-    static constexpr size_t all_ones = size_t(-1) >> _S_alignbits;
-    // The minimum size of a big block (smaller sizes will be rounded up).
-    static constexpr size_t min = 1u << _S_alignbits;
+    // The minimum size of a big block.
+    // All big_block allocations will be a multiple of this value.
+    // Use bit_ceil to get a power of two even for e.g. 20-bit size_t.
+    static constexpr size_t min = __bit_ceil(numeric_limits<size_t>::digits);
 
+    constexpr
     big_block(size_t bytes, size_t alignment)
-    : _M_size(alloc_size(bytes) >> _S_alignbits),
-      _M_align_exp(std::__bit_width(alignment) - 1u)
-    { }
+    : _M_size(alloc_size(bytes), alignment)
+    {
+      // Check for unsigned wraparound
+      if (size() < bytes) [[unlikely]]
+	{
+	  // (sync|unsync)_pool_resource::do_allocate is not allowed to throw.
+	  // If the required size is too large for size_t then ask the
+	  // upstream resource for an impossibly large size and alignment.
+	  _M_size.value = -1;
+	}
+    }
 
     void* pointer = nullptr;
-    size_t _M_size : numeric_limits<size_t>::digits - _S_alignbits;
-    size_t _M_align_exp : _S_alignbits;
+    aligned_size<min> _M_size;
 
     size_t size() const noexcept
     {
-      // If all bits are set in _M_size it means the maximum possible size:
-      if (__builtin_expect(_M_size == (size_t(-1) >> _S_alignbits), false))
-	return (size_t)-1;
-      else
-	return _M_size << _S_alignbits;
+      if (_M_size.value == size_t(-1)) [[unlikely]]
+	return size_t(-1);
+      return _M_size.size();
     }
 
-    size_t align() const noexcept { return size_t(1) << _M_align_exp; }
+    size_t align() const noexcept
+    { return _M_size.alignment(); }
 
     // Calculate size to be allocated instead of requested number of bytes.
     // The requested value will be rounded up to a multiple of big_block::min,
-    // so the low _S_alignbits bits are all zero and don't need to be stored.
+    // so the low bits are all zero and can be used to hold the alignment.
     static constexpr size_t alloc_size(size_t bytes) noexcept
-    {
-      const size_t s = bytes + min - 1u;
-      if (__builtin_expect(s < bytes, false))
-	return size_t(-1); // addition wrapped past zero, return max value
-      else
-	return s & ~(min - 1u);
-    }
+    { return aligned_ceil(bytes, min); }
 
     friend bool operator<(void* p, const big_block& b) noexcept
     { return less<void*>{}(p, b.pointer); }
@@ -895,9 +920,8 @@ namespace pmr
       {
 	// Round to preferred granularity
 	static_assert(std::__has_single_bit(pool_sizes[0]));
-	constexpr size_t mask = pool_sizes[0] - 1;
-	opts.largest_required_pool_block += mask;
-	opts.largest_required_pool_block &= ~mask;
+	opts.largest_required_pool_block
+	  = aligned_ceil(opts.largest_required_pool_block, pool_sizes[0]);
       }
 
     if (opts.largest_required_pool_block < big_block::min)
@@ -964,7 +988,9 @@ namespace pmr
     auto& b = _M_unpooled.emplace_back(bytes, alignment);
     __try {
       // N.B. need to allocate b.size(), which might be larger than bytes.
-      void* p = resource()->allocate(b.size(), alignment);
+      // Also use b.align() instead of alignment parameter, which will be
+      // an impossibly large value if (bytes+bookkeeping) > SIZE_MAX.
+      void* p = resource()->allocate(b.size(), b.align());
       b.pointer = p;
       if (_M_unpooled.size() > 1)
 	{
diff --git a/libstdc++-v3/testsuite/20_util/monotonic_buffer_resource/allocate.cc b/libstdc++-v3/testsuite/20_util/monotonic_buffer_resource/allocate.cc
index b4954014828..637bdf1b30e 100644
--- a/libstdc++-v3/testsuite/20_util/monotonic_buffer_resource/allocate.cc
+++ b/libstdc++-v3/testsuite/20_util/monotonic_buffer_resource/allocate.cc
@@ -210,6 +210,51 @@ test06()
   }
 }
 
+void
+test07()
+{
+  // Custom exception thrown on expected allocation failure.
+  struct very_bad_alloc : std::bad_alloc { };
+
+  struct careful_resource : __gnu_test::memory_resource
+  {
+    void* do_allocate(std::size_t bytes, std::size_t alignment)
+    {
+      // pmr::monotonic_buffer_resource::do_allocate is not allowed to
+      // throw an exception when asked for an allocation it can't satisfy.
+      // The libstdc++ implementation will ask upstream to allocate
+      // bytes=SIZE_MAX and alignment=bit_floor(SIZE_MAX) instead of throwing.
+      // Verify that we got those values:
+      if (bytes != std::numeric_limits<std::size_t>::max())
+	VERIFY( !"upstream allocation should request maximum number of bytes" );
+      if (alignment != (1 + std::numeric_limits<std::size_t>::max() / 2))
+	VERIFY( !"upstream allocation should request maximum alignment" );
+
+      // A successful failure:
+      throw very_bad_alloc();
+    }
+  };
+
+  careful_resource cr;
+  std::pmr::monotonic_buffer_resource mbr(&cr);
+  try
+  {
+    // Try to allocate a ridiculous size:
+    void* p = mbr.allocate(std::size_t(-2), 1);
+    // Should not reach here!
+    VERIFY( !"attempt to allocate SIZE_MAX-1 should not have succeeded" );
+    throw p;
+  }
+  catch (const very_bad_alloc&)
+  {
+    // Should catch this exception from careful_resource::do_allocate
+  }
+  catch (const std::bad_alloc&)
+  {
+    VERIFY( !"monotonic_buffer_resource::do_allocate is not allowed to throw" );
+  }
+}
+
 int
 main()
 {
@@ -219,4 +264,5 @@ main()
   test04();
   test05();
   test06();
+  test07();
 }
diff --git a/libstdc++-v3/testsuite/20_util/unsynchronized_pool_resource/allocate.cc b/libstdc++-v3/testsuite/20_util/unsynchronized_pool_resource/allocate.cc
index 5bf20cf262c..2775e1260ea 100644
--- a/libstdc++-v3/testsuite/20_util/unsynchronized_pool_resource/allocate.cc
+++ b/libstdc++-v3/testsuite/20_util/unsynchronized_pool_resource/allocate.cc
@@ -189,7 +189,16 @@ test06()
       if (bytes < expected_size)
 	throw bad_size();
       else if (align != expected_alignment)
-	throw bad_alignment();
+      {
+	if (bytes == std::numeric_limits<std::size_t>::max()
+	    && align == (1 + std::numeric_limits<std::size_t>::max() / 2))
+	{
+	  // Pool resources request bytes=SIZE_MAX && align=bit_floor(SIZE_MAX)
+	  // when they are unable to meet an allocation request.
+	}
+	else
+	  throw bad_alignment();
+      }
       // Else just throw, don't really try to allocate:
       throw std::bad_alloc();
     }
@@ -239,6 +248,58 @@ test06()
   }
 }
 
+void
+test07()
+{
+  // Custom exception thrown on expected allocation failure.
+  struct very_bad_alloc : std::bad_alloc { };
+
+  struct careful_resource : __gnu_test::memory_resource
+  {
+    void* do_allocate(std::size_t bytes, std::size_t alignment)
+    {
+      // Need to allow normal allocations for the pool resource's internal
+      // data structures:
+      if (alignment < 1024)
+	return __gnu_test::memory_resource::do_allocate(bytes, alignment);
+
+      // pmr::unsynchronized_pool_resource::do_allocate is not allowed to
+      // throw an exception when asked for an allocation it can't satisfy.
+      // The libstdc++ implementation will ask upstream to allocate
+      // bytes=SIZE_MAX and alignment=bit_floor(SIZE_MAX) instead of throwing.
+      // Verify that we got those values:
+      if (bytes != std::numeric_limits<size_t>::max())
+	VERIFY( !"upstream allocation should request SIZE_MAX bytes" );
+      if (alignment != (1 + std::numeric_limits<size_t>::max() / 2))
+	VERIFY( !"upstream allocation should request SIZE_MAX/2 alignment" );
+
+      // A successful failure:
+      throw very_bad_alloc();
+    }
+  };
+
+  careful_resource cr;
+  std::pmr::unsynchronized_pool_resource upr(&cr);
+  try
+  {
+    // Try to allocate a ridiculous size (and use a large extended alignment
+    // so that careful_resource::do_allocate can distinguish this allocation
+    // from any required for the pool resource's internal data structures):
+    void* p = upr.allocate(std::size_t(-2), 1024);
+    // Should not reach here!
+    VERIFY( !"attempt to allocate SIZE_MAX-1 should not have succeeded" );
+    throw p;
+  }
+  catch (const very_bad_alloc&)
+  {
+    // Should catch this exception from careful_resource::do_allocate
+  }
+  catch (const std::bad_alloc&)
+  {
+    VERIFY( !"unsynchronized_pool_resource::do_allocate is not allowed to throw" );
+  }
+}
+
 int
 main()
 {
@@ -248,4 +309,5 @@ main()
   test04();
   test05();
   test06();
+  test07();
 }

