This is the mail archive of the libstdc++@gcc.gnu.org mailing list for the libstdc++ project.



Re: mt allocator documentation (draft)


Felix Yen wrote:

When allocating:
> A quick look at _S_bin[ bin ].first[ thread_id ] tells us if there are any blocks
> of this size on the current thread's freelist. If this is not NULL - fine, just
> remove the block that _S_bin[ bin ].first[ thread_id ] points to from the list.
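The fast path described above can be sketched roughly as follows. This is a hypothetical simplification: `Block`, `Bin`, and `allocate_fast_path` are illustrative names, not the actual libstdc++ internals.

```cpp
#include <cstddef>

// Illustrative stand-ins for the allocator's internal structures.
struct Block { Block* next; };

struct Bin {
    Block** first;  // first[thread_id]: head of each thread's freelist
};

// Fast path of allocate(): if the current thread's freelist for this bin
// is non-empty, pop the head block. No lock is needed, because only the
// owning thread touches its own freelist head.
Block* allocate_fast_path(Bin& bin, std::size_t thread_id) {
    Block* b = bin.first[thread_id];
    if (b != nullptr)
        bin.first[thread_id] = b->next;  // unlink the head block
    return b;  // nullptr means "fall back to the global freelist"
}
```

A `nullptr` return corresponds to the "not NULL - fine" check failing, at which point the allocator must refill from the global pool.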


When deallocating:
>  always add the block to the end of the current thread's freelist

Consider a thread that consumes all the memory it allocates: it allocates or recycles some blocks to satisfy its first allocation request, and then cycles through every block in that initial batch before it reuses the first one. That might be the best approach, but it's also plausible to put deallocated blocks at the head of the free list so they're reused first. The documentation should explain why one approach was chosen over the other, and should address scenarios where sequences are constructed and destroyed. (This request may lead to complications.)

I admit that I haven't thought this through. This approach was chosen on an "it seems like a good idea" basis. It could in fact be better (from a cache point of view) to add blocks to the front of the list (in case an application inserts and removes only a few blocks). The "last" pointers could then be removed as well.
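The two policies under discussion can be contrasted in a small sketch. This is illustrative only (the `Block` struct and function names are hypothetical, not the real allocator code): tail insertion is the current FIFO scheme and is what forces the allocator to keep a "last" pointer; head insertion is the LIFO alternative, under which recently freed blocks (likely still warm in cache) are reused first and the "last" pointer becomes unnecessary.

```cpp
// Illustrative freelist node; not the actual libstdc++ structure.
struct Block { Block* next; };

// Current scheme: append to the tail, so blocks are reused in FIFO
// order. Keeping the tail reachable is why a "last" pointer exists.
void deallocate_to_tail(Block*& first, Block*& last, Block* b) {
    b->next = nullptr;
    if (last) last->next = b; else first = b;
    last = b;
}

// Alternative: push onto the head (LIFO). The most recently freed
// block is handed out first; no tail pointer is needed.
void deallocate_to_head(Block*& first, Block* b) {
    b->next = first;
    first = b;
}
```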


I feel that I need more feedback on this. Such a patch is very "doable" however!



Still deallocating:
> In order to reduce lock contention (since this requires this bin's mutex to
> be locked), this operation is also done in chunks of blocks (just like when
> chunks of blocks are moved from the global freelist to a thread's freelist).


When the allocator reuses blocks from the global free list, it's reasonable to take a chunk's worth of blocks if possible, and as many as possible otherwise. You have to acquire a lock to accurately measure the global free list's size, and once you've done this, it would be strange to allocate from the heap in the case where there is enough memory to satisfy the request, but less than a chunk's worth. There's also no point in leaving a block in the global free list in order to satisfy another thread's request.

Not sure that I understand. As I read your comment, this is exactly what happens: once we have locked the global pool (in allocate()), we grab blocks until we reach the end of the list _or_ until we have as many blocks as fit into a chunk. This limit exists to avoid getting "too much" memory on a thread's freelist, which would then just be moved back to the global freelist by the deallocate() routine. This background information should of course be part of the documentation...
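The refill step described here might look like the following sketch. It is a hypothetical simplification under stated assumptions: `Block`, `grab_from_global`, and `blocks_per_chunk` are illustrative names, and the real allocator's locking and list layout differ in detail.

```cpp
#include <cstddef>
#include <mutex>

// Illustrative freelist node; not the actual libstdc++ structure.
struct Block { Block* next; };

// With the bin's mutex held, detach blocks from the global freelist
// until it is empty or we hold a chunk's worth. The cap keeps any one
// thread's freelist from soaking up "too much" memory in one refill.
Block* grab_from_global(Block*& global_first, std::mutex& bin_mutex,
                        std::size_t blocks_per_chunk, std::size_t& grabbed) {
    std::lock_guard<std::mutex> lock(bin_mutex);
    Block* head = global_first;
    Block* last = nullptr;
    Block* cur = global_first;
    grabbed = 0;
    while (cur && grabbed < blocks_per_chunk) {
        last = cur;
        cur = cur->next;
        ++grabbed;
    }
    global_first = cur;              // global list keeps the remainder
    if (last) last->next = nullptr;  // detach the grabbed run
    return head;                     // caller links this onto its freelist
}
```

If the global list holds fewer blocks than a chunk, the loop simply takes everything, which matches the point above that it would be strange to hit the heap while usable blocks remain in the global pool.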




The corresponding deallocation scenario is more complicated. When a thread's free list reaches its maximum size, you can try to move a chunk's worth of blocks to the global free list, which could empty the thread's free list (potentially inefficient), or do something else, e.g. refer to chunk size in the maximum size computation. The documentation should provide more detail about this aspect of the algorithm.

I agree, and this is also the piece of code that should get the most attention, since it can further improve performance in real-world applications. I have some ideas that I will submit at some point (once the documentation and pending patches are done).
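One plausible shape for this overflow path is sketched below. It is an assumption-laden illustration, not the real implementation: names (`return_excess`, `max_threshold`, `blocks_per_chunk`) are hypothetical, and it assumes the threshold is at least a chunk larger than what remains, so a full chunk can always be detached.

```cpp
#include <cstddef>
#include <mutex>

// Illustrative freelist node; not the actual libstdc++ structure.
struct Block { Block* next; };

// Overflow path of deallocate(): once a thread's freelist grows past
// its maximum length, detach a chunk's worth of blocks from its head
// and splice them onto the global freelist under the bin mutex.
void return_excess(Block*& thread_first, std::size_t& thread_count,
                   Block*& global_first, std::mutex& bin_mutex,
                   std::size_t max_threshold, std::size_t blocks_per_chunk) {
    if (thread_count <= max_threshold)
        return;                        // common path: no lock taken at all
    // Detach blocks_per_chunk blocks from the head of the thread's list.
    Block* head = thread_first;
    Block* last = head;
    for (std::size_t i = 1; i < blocks_per_chunk && last->next; ++i)
        last = last->next;
    thread_first = last->next;
    thread_count -= blocks_per_chunk;  // assumes a full chunk was detached
    // Splice the detached run onto the global freelist.
    std::lock_guard<std::mutex> lock(bin_mutex);
    last->next = global_first;
    global_first = head;
}
```

Detaching from the head rather than moving the whole list is one way to avoid the "empty the thread's free list" problem raised above; folding the chunk size into the maximum-size computation, as suggested, would be the other.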


Brgds

/Stefan




Felix




