[PATCH] libstdc++: Add allocate_at_least (P0401) [PR118030]
Nathan Myers
ncm@cantrip.org
Tue Mar 10 12:28:38 GMT 2026
On 3/10/26 8:12 AM, Jonathan Wakely wrote:
> On Tue, 10 Mar 2026 at 12:05, Nathan Myers <ncm@cantrip.org> wrote:
>>
>> On 3/10/26 7:55 AM, Tomasz Kaminski wrote:
>>>
>>>
>>> On Tue, Mar 10, 2026 at 12:17 PM Nathan Myers <ncm@cantrip.org> wrote:
>>>
>>> On 3/10/26 6:56 AM, Jonathan Wakely wrote:
>>> > On Mon, 9 Mar 2026 at 15:18, Nathan Myers <ncm@cantrip.org> wrote:
>>> >>
>>> >> On 3/9/26 6:01 AM, Jonathan Wakely wrote:
>>> >>>
>>> >>>
>>> >>> On Sun, 8 Mar 2026, 23:39 Nathan Myers, <ncm@cantrip.org> wrote:
>>> >>>
>>> >>> Have you thought about how std::allocator<>::allocate_at_least
>>> >>> might usefully discover whether the ::op new (it is obliged by the
>>> >>> Standard to use) is the one in libstdc++, and will supply extra
>>> >>> details about memory allocated, not one the user has supplied and
>>> >>> the linker has substituted in its place? Ideally it would all be
>>> >>> decided at link time, and not conditionally in every call, but
>>> >>> linker magic is fragile even without LTO. And users can call ::op
>>> >>> new directly, with no #include to declare it.
>>> >>>
>>> >>> Maybe pass an extra argument to ::op new that the user's version
>>> >>> won't notice, and that ours may discover is present by looking at
>>> >>> the stack frame? The extra argument might be a pointer to a place
>>> >>> to scribble extra details. (A hint argument might be placed there,
>>> >>> besides.) But stack frame conventions vary too.
>>> >>>
>>> >>>
>>> >>> We don't need to do that. This feature is a customisation point
>>> >>> for non-standard allocators (and program-defined specializations
>>> >>> of std::allocator<User type>), we don't need to make
>>> >>> std::allocator use it. P0901 proposed changes to operator new
>>> >>> which would have been useful here, but that got abandoned.
>>> >>
>>> >> I don't understand that. If a user displaces ::operator new, they
>>> >> will expect std::allocator members to call it. The Standard seems
>>> >> to require all members obtain whatever memory they deliver by
>>> >> calling ::operator new. So, when calls to ::operator new are
>>> >> visible, we seem to be obliged to use it.
>>> >>
>>> >>> I did post a prototype in https://gcc.gnu.org/PR106477 which can
>>> >>> detect whether operator new has been replaced, but I don't think
>>> >>> we would need that here.
>>> >>>
>>> >>> We could just have a thread_local size_t* which is initially
>>> >>> null and which std::allocator would set to a local size_t, and
>>> >>> operator new could check for non-null and conditionally write
>>> >>> the allocated size to it. But that assumes that malloc provides
>>> >>> a way to get the size. If a user interposes their own malloc but
>>> >>> doesn't replace malloc_usable_size then you'd have UB.
>>> >>
>>> >> Thread-local storage is a big hammer. Linker magic seems fragile.
>>> >> We could make calling ::operator new(0) poke a global if it knows
>>> >> about that global:
>>> >>
>>> >>   std::atomic<int> __op_new_state{0};  // 0: unknown, 1: ours, -1: replaced
>>> >>
>>> >>   auto allocate_at_least(size_t __n) -> allocation_result<pointer>
>>> >>   {
>>> >>     auto __state = __op_new_state.load(memory_order_relaxed);
>>> >>     if (__state > 0)
>>> >>       return _M_allocate_at_least(__n);
>>> >>     else if (__state < 0)
>>> >>       return { static_cast<pointer>(::operator new(__n * sizeof(_Tp))), __n };
>>> >>
>>> >>     ::operator delete(::operator new(0));  // sample it
>>> >>     if (__op_new_state.load(memory_order_relaxed) == 0)  // still?
>>> >>       __op_new_state.store(-1, memory_order_relaxed);
>>> >>     return allocate_at_least(__n);
>>> >>   }
>>> >>
>>> >>   void* operator new(size_t n)
>>> >>   {
>>> >>     if (n == 0 && ::__op_new_state.load(memory_order_relaxed) == 0)
>>> >>       ::__op_new_state.store(1, memory_order_relaxed);
>>> >>     ...
>>> >>   }
>>> >>
>>> >> Relaxed semantics is safe here because all writers would write
>>> >> the same value. I don't think there is any repetitive performance
>>> >> penalty for relaxed reads to an atomic value that is not changing.
>>> >>
>>> >> Allowing the user to poke __op_new_state in their ::op new and
>>> >> supply their own __allocate_at_least to use is trivially more
>>> >> complicated.
>>> >
>>> > How would operator new return the real size to the caller?
>>>
>>> It doesn't. std::allocator<>::_M_allocate_at_least() does that job.
>>>
>>> What would _M_allocate_at_least do? As far as I know there is no
>>> interface in malloc that allows asking for and then using returned bytes.
>>> malloc_usable_size
>>> (https://man7.org/linux/man-pages/man3/malloc_usable_size.3.html)
>>> is close, but according to the docs it does not allow the extra bytes
>>> to be used without a realloc call.
>> _M_allocate_at_least may do anything it likes -- call jemalloc,
>> call mmap, whatever.
>
> But that means the memory comes from a different source, so can't be
> freed using the same allocator<T>::deallocate function.
::operator delete and allocator<>::deallocate can rely on the
same ::__op_new_state to determine that the memory was allocated
by libstdc++ code, not user code, so it can do anything it needs
to. Whichever interface delivered the memory has certainly salted
away enough information to free it properly, because that is what
allocators do.
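
To make that concrete, here is a minimal sketch, not the patch itself.
The names op_new_state, backend_allocate, backend_free and
sketch_allocator are all hypothetical, and malloc with 64-byte rounding
merely stands in for whatever backend _M_allocate_at_least would really
call. It assumes the flag has already settled to 1 or -1 before the
first allocation and never changes afterwards, and it ignores overflow
in n * sizeof(T) and over-aligned types.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical stand-in for ::__op_new_state:
//   1 = the library's operator new answered the probe,
//  -1 = the user has replaced operator new.
std::atomic<int> op_new_state{-1};

template <typename T>
struct result { T* ptr; std::size_t count; };

// Hypothetical private backend. It rounds the request up to a
// multiple of 64 bytes and reports the rounded size to the caller.
void* backend_allocate(std::size_t bytes, std::size_t& actual)
{
    actual = (bytes + 63) / 64 * 64;
    if (actual == 0)
        actual = 64;
    void* p = std::malloc(actual);
    if (!p)
        throw std::bad_alloc();
    return p;
}

void backend_free(void* p) noexcept { std::free(p); }

template <typename T>
struct sketch_allocator {
    using value_type = T;

    result<T> allocate_at_least(std::size_t n)
    {
        if (op_new_state.load(std::memory_order_relaxed) > 0) {
            // operator new is not user-visible: use the backend and
            // report how many objects really fit.
            std::size_t actual = 0;
            void* p = backend_allocate(n * sizeof(T), actual);
            return { static_cast<T*>(p), actual / sizeof(T) };
        }
        // The user's operator new must be called; no extra size is known.
        return { static_cast<T*>(::operator new(n * sizeof(T))), n };
    }

    void deallocate(T* p, std::size_t) noexcept
    {
        // The same flag picks the matching release function.
        if (op_new_state.load(std::memory_order_relaxed) > 0)
            backend_free(p);
        else
            ::operator delete(p);
    }
};
```

With the flag at 1, a request for 10 ints comes back with room for
64 / sizeof(int) of them, and deallocate returns the block to the same
backend. With the flag at -1, both calls go through ::operator new and
::operator delete, and the reported count is exactly what was asked for.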
>> I really don't understand what is difficult about this. Is there
>> some prescribed constraint I am not aware of? I don't think the
>> Standard requires that we use C library facilities.
>>
>>> All that is new in ::operator new is establishing for std::allocator
>>> that the user has not displaced ::op new(): seeing __op_new_state==1,
>>> allocate_at_least() knows it can do whatever it likes because calls
>>> to ::op new are not user-visible, so users cannot determine whether
>>> memory has been obtained from it.
>>>
>>> > operator new can't encode the result in the return value somehow,
>>> > because that would be an ABI break for all callers that aren't aware
>>> > that operator new would be doing that now.
>>> >
>>> > Poking something to the stack seems a lot less portable and more
>>> > fragile than using a thread-local variable. And likely to cause
>>> > problems with -fstack-protector and/or AddressSanitizer.
>>> > Maybe something with __builtin_return_address would be possible, but
>>> > operator new has no idea if the caller is opting in to having its
>>> > stack fiddled with (maybe the atomic was set by a caller in another
>>> > thread, and the caller in the current thread is not
>>> > allocate_at_least).
>>>
>>> Thread-local variables are a snake pit. Fortunately, we don't
>>> need anything like that. Literally all the machinery is right
>>> there in the code presented above, except the implementation of
>>> _M_allocate_at_least(), which may call jemalloc or whatever it
>>> likes, and communicate its extra information to the caller via
>>> an ordinary returned allocation_result<T*>{p, n}.
>>>
>>> -N
>>>
>>
>