This is the mail archive of the
libstdc++@gcc.gnu.org
mailing list for the libstdc++ project.
Re: RFC: Hint parameter for allocators.
- From: Dhruv Matani <dhruvbird at gmx dot net>
- To: libstdc++ <libstdc++ at gcc dot gnu dot org>
- Cc: Loren James Rittle <rittle at labs dot mot dot com>
- Date: 23 Jan 2004 14:30:34 +0530
- Subject: Re: RFC: Hint parameter for allocators.
- Organization:
- References: <200401230319.i0N3JsrO035649@latour.rsch.comm.mot.com>
On Fri, 2004-01-23 at 08:49, Loren James Rittle wrote:
> Dhruv Matani writes:
> > I'd like to discuss the specific use of the hint parameter of a standard
> > allocator when used to specify the region of memory the allocator should
> > prefer when choosing it for the current request.
> [...]
>
> This is an interesting line of ideas. I do wonder if the dynamic
> complexity outweighs memory layout advantage.
That's the main question. I guess only studying a real application would
provide the answer.
> In a static system
> build, where precompute time is a one-time cost, this can be a big win.
I did not get you here. What precompute time are you talking about?
> > 4. Does anyone know of an allocator that makes use of the hint parameter
> > for locality preservation?
>
> Not in the context of a general allocator for use with STL. How are
> you passing a hint in your current effort (code example, etc)?
First, we need to modify the list's base class so that get node accepts
hints. Then, modify the create_node function to accept hints. Then, in
the insert (iterator pos, const_reference Data) function, we change the
line that calls create_node (Data) to
create_node (Data, pos._M_node->_M_prev);
pos._M_node->_M_prev is the last valid node in case of a non-empty list.
Then there's no need to change any driver program. Just re-compile any
existing test program that uses list and check the timings. I did that,
and did not notice any significant changes, so I suppose that:
1. It does not matter that much or:
2. In real application that runs for longer times, that would probably
make a difference?
Of course, you need to use an allocator that can make sense of the hint
parameter.
For the file attached, the controlling #define is USE_HINT_PARAMETER.
You can turn it on/off by simply commenting it away.
--
-Dhruv Matani.
http://www.geocities.com/dhruvbird/
/* Copyright (c) 2003
* Dhruv Matani.
*
* Permission to use, copy, modify, distribute and sell this software
* and its documentation for any purpose is hereby granted without fee,
* provided that the above copyright notice appear in all copies and
* that both that copyright notice and this permission notice appear
* in supporting documentation. No representations about the
* suitability of this software for any purpose. It is provided
* "as is" without express or implied warranty.
*/
//Bitmapped Allocator.
//====================
// This allocator will make use of 1 single bit to keep track of whther
// it has been allocated or not. A bit 1 indicates free, while 0
// indicates allocated. This has been done so that you can easily check a
// collection of bits for a free block. This kind of Bitmapped strategy
// works best for single object allocations, and with the STL type
// parameterized, we do not need to choose any size for the block which
// will be represented by a single bit. This will be the size of the
// parameter around which the allocator has been parameterized. Thus,
// close to optimal performace will result.
// Another issue would be whether to keep the all bitmaps in a
// separate area in memory, or to keep them near the actual blocks
// that will be given out or allocated for the client. After some
// testing, I've decided to keep these bitmaps close to the actual
// blocks. this will help in 2 ways. 1. Constant time access for the
// bitmap themselves, since no kind of look up will be needed to find
// the correct bitmap list or it's equivalent. 2. And also this would
// preserve the cache as far as possible. So in effect, this kind of
// an allocator might not prove beneficial from a purely cache point
// of view. But this allocator has been made to try and roll out the
// defects of the node_allocator, whrein the nodes get skewed about in
// memory, if they are not returned in the exact reverse order or in
// the same order in which they were allocated. Also, the
// vma_allocator's book keeping overhead is too much for small objects
// and single object allocations, though it preserves the locality of
// blocks very well when they are returned back to the allocator.
// Expected overhead per block would be 1 bit in memory. Also, once
// the address of the free list has been found, the cost for
// allocation/deallocation would be negligible, and is supposed to be
// conatant time. For these very reasons, it is very important to
// minimize the linear time costs, which include finding a free list
// with a free block while allocating, and finding the corresponding
// free list for a block while deallocating. Therefore, I have decided
// that the growth of the internal pool for this allocator will be
// exponential as compared to linear for node_allocator. There, linear
// time works well, because we are mainly concerned with speed of
// allocation/deallocation and memory consumption, whereas here, the
// allocation/deallocation part does have some linear/logarathmic
// complexity components in it. Thus, to try and minimize them would
// be a good thing to do at the cost of a little bit of memory.
// Another thing to be noted is the the pool size will double every
// time the internal pool gets exhausted, and all the free blocks have
// been given away. The initial size of the pool would be
// sizeof(int)*8 which is the nuber of bits in an integer, which can
// fit exactly in a CPU register. Hence, the term given is exponential
// growth of the internal pool.
#if !defined __STL_BITMAP_ALLOCATOR_HPP__
#define __STL_BITMAP_ALLOCATOR_HPP__
#include <cstddef>
//For std::size_t, and ptrdiff_t.
#include <utility>
//For std::pair.
#include <algorithm>
//std::find_if.
#include <vector>
//For the free list of exponentially growing memory blocks. At max,
//size of the vector should be not more than the number of bits in an
//integer or an unsigned integer.
#include <functional>
//For greater_equal, and less_equal.
#include <new>
#include <cheap_alloc.hpp>
//For std::cheap_allocator for std::vector.
#include <cassert>
#define ARCH_HAS_BSF
#define MAGIC_STOP_GROWTH 1048576
//If USE_HINT_PARAMETER is defined, the allocator may get potentially
//slower, but it will ensure that the locality of reference for memory
//blocks is preserved as far as possible, if a correct hint parameter
//is passed. If an incorrect or useless hint parameter is passed, then
//the allocator's performance may degrade by a huge margin.
#define USE_HINT_PARAMETER
//#define CHECK_FOR_ERRORS
namespace __gnu_cxx
{
namespace _aux_balloc {
static const unsigned int Bits_Per_Byte = 8;
static const unsigned int Bits_Per_Block = sizeof(unsigned int) * Bits_Per_Byte;
#if defined USE_HINT_PARAMETER
template <typename T>
class RAII_Swap {
T& T1;
T& T2;
RAII_Swap (RAII_Swap const&);
RAII_Swap& operator= (RAII_Swap const&);
public:
RAII_Swap (T& _t1, T& _t2) : T1(_t1), T2(_t2)
{
std::swap (T1, T2);
}
~RAII_Swap ()
{
std::swap (T1, T2);
}
};
#endif
//T should be a pointer type.
template <typename T>
class inclusive_between {
typedef T pointer;
pointer ptr_value;
typedef typename std::pair<T, T> block_pair;
public:
inclusive_between (pointer _ptr) : ptr_value(_ptr) { }
bool operator () (block_pair _bp)
{
if (std::less_equal<pointer> ()(ptr_value, _bp.second) &&
std::greater_equal<pointer> ()(ptr_value, _bp.first))
return true;
else
return false;
}
};
//Used to pass a Functor to functions by reference.
template <typename Functor>
class Functor_Ref {
Functor& FRef;
public:
Functor_Ref (Functor& _fref) : FRef(_fref) { }
//operator Functor& () { return FRef; }
template <typename X>
bool operator() (X _x) { return FRef (_x); }
};
//T should be a pointer type, and A is the Allocator for the vector.
template <typename T, typename A>
class ffit_finder {
typedef typename std::vector<std::pair<T, T>, A> BPVector;
typedef typename BPVector::difference_type counter_type;
typedef typename std::pair<T, T> block_pair;
unsigned int *pbitmap;
unsigned int _offset;
public:
ffit_finder () : pbitmap (0), _offset (0) { }
bool operator() (block_pair _bp)
{
//Set the _rover to the last unsigned integer, which is the
//bitmap to the first free block. Thus, the bitmaps are in exact
//reverse order of the actual memory layout. So, we count down
//the bimaps, which is the same as moving up the memory.
unsigned int *_rover = reinterpret_cast<unsigned int*>(_bp.first) - 1;
counter_type _diff = _bp.second - _bp.first + 1;
_diff /= Bits_Per_Block;
for (counter_type i = 0; i < _diff; ++i)
{
_offset = i;
if (*_rover)
{
pbitmap = _rover;
return true;
}
--_rover;
}
return false;
}
unsigned int *get () { return pbitmap; }
unsigned int offset () { return _offset * Bits_Per_Block; }
};
//T should be a pointer type.
template <typename T, typename A>
class Bit_Map_Counter {
typedef typename std::vector<std::pair<T, T>, A> BPVector;
typedef typename BPVector::size_type index_type;
typedef T pointer;
BPVector& _vbp;
unsigned int *_curr_bmap;
unsigned int *_last_bmap_in_block;
index_type _curr_index;
public:
//Use the 2nd parameter with care. Make sure that such an entry
//exists in the vector before passing that particular index to
//this ctor.
Bit_Map_Counter (BPVector& Rvbp, index_type Index = 0) : _vbp(Rvbp)
{
this->reset (Index);
}
void reset (index_type Index = 0)
{
_curr_index = Index;
_curr_bmap = reinterpret_cast<unsigned int*>(_vbp[_curr_index].first) - 1;
assert (Index <= _vbp.size() - 1);
_last_bmap_in_block = _curr_bmap -
((_vbp[_curr_index].second - _vbp[_curr_index].first + 1) / Bits_Per_Block - 1);
}
//Dangerous Function! Use with extreme care. Pass to this
//functions ONLY those values that are known to be correct,
//otherwise this will mess up big time.
void Set_Internal_Bit_Map (unsigned int *New_Internal_Marker)
{
_curr_bmap = New_Internal_Marker;
}
bool finished ()
{
return (_curr_bmap == 0);
}
Bit_Map_Counter& operator++ ()
{
if (_curr_bmap == _last_bmap_in_block)
{
if (++_curr_index == _vbp.size())
{
_curr_bmap = 0;
}
else
{
this->reset (_curr_index);
}
}
else
{
--_curr_bmap;
}
return *this;
}
unsigned int *get ()
{
return _curr_bmap;
}
pointer base () { return _vbp[_curr_index].first; }
unsigned int offset ()
{
return Bits_Per_Block * ((reinterpret_cast<unsigned int*>(this->base()) - _curr_bmap) - 1);
}
unsigned int where () { return _curr_index; }
};
}
template <class T> class bitmap_allocator;
// specialize for void:
template <> class bitmap_allocator<void> {
public:
typedef void* pointer;
typedef const void* const_pointer;
// reference-to-void members are impossible.
typedef void value_type;
template <class U> struct rebind { typedef bitmap_allocator<U> other; };
};
template <class T> class bitmap_allocator {
public:
typedef size_t size_type;
typedef ptrdiff_t difference_type;
typedef T* pointer;
typedef const T* const_pointer;
typedef T& reference;
typedef const T& const_reference;
typedef T value_type;
template <class U> struct rebind { typedef bitmap_allocator<U> other; };
private:
static const unsigned int Bits_Per_Byte = 8;
static const unsigned int Bits_Per_Block = sizeof(unsigned int) * Bits_Per_Byte;
//The following function has been written by Apurva
//Mehta. apurva@gmx.net.
static inline unsigned int first_right_bit (unsigned int num)
{
#if defined ARCH_HAS_BSF
int tmp = 0;
__asm__ ("bsf %1, %0"
: "=r" (tmp)
: "r" (num)
: "%eax", "%ebx", "%edx", "%ecx"
);
//I don't know whether these registers need to be clobbered?
return tmp;
#else
int ret_val = 0;
while (num % 2 == 0)
++ret_val, num = num/2;
return ret_val;
#endif
}
static inline void bit_allocate (unsigned int *pbmap, unsigned int Pos)
{
unsigned int _mask = 1 << Pos;
_mask = ~_mask;
*pbmap &= _mask;
}
static inline void bit_free (unsigned int *pbmap, unsigned int Pos)
{
unsigned int _mask = 1 << Pos;
*pbmap |= _mask;
}
static inline void *Memory_Get (size_t _sz) throw (std::bad_alloc)
{
return operator new (_sz);
}
static inline void Memory_Put (void *_ptr) throw ()
{
operator delete (_ptr);
}
typedef typename std::pair<pointer, pointer> block_pair;
typedef typename std::cheap_allocator<block_pair> BPVec_Allocator_Type;
typedef typename std::vector<block_pair, BPVec_Allocator_Type> BPVector;
#if defined CHECK_FOR_ERRORS
static void Check_For_Free_Blocks ()
{
typedef typename __gnu_cxx::_aux_balloc::ffit_finder<pointer, BPVec_Allocator_Type> FFF;
FFF _fff;
typedef typename BPVector::iterator bpiter;
bpiter _bpi = std::find_if (mem_blocks.begin(), mem_blocks.end(),
__gnu_cxx::_aux_balloc::Functor_Ref<FFF> (_fff));
assert (_bpi == mem_blocks.end());
}
#endif
static void Refill_Pool ()
{
#if defined CHECK_FOR_ERRORS
Check_For_Free_Blocks ();
#endif
const unsigned int Num_Bit_Maps = Block_Size / Bits_Per_Block;
const unsigned int _size_to_allocate =
Block_Size * sizeof(value_type) + Num_Bit_Maps*sizeof(unsigned int);
unsigned int *temp =
reinterpret_cast<unsigned int*>(Memory_Get (_size_to_allocate));
//The Header information goes at the Beginning of the Block.
block_pair _bp = std::make_pair (reinterpret_cast<pointer>(temp + Num_Bit_Maps),
reinterpret_cast<pointer>(temp + Num_Bit_Maps) + Block_Size - 1);
//Fill the Vector with this information.
mem_blocks.push_back (_bp);
unsigned int Bit_Mask = 0; //0 Indicates all Allocated.
Bit_Mask = ~Bit_Mask; //1 Indicates all Free.
for (unsigned int i = 0; i < Num_Bit_Maps; ++i)
temp[i] = Bit_Mask;
//On some implementations, operator new might throw bad_alloc, or
//malloc might fail if the size passed is too large, therefore, we
//limit the size passed to malloc or operator new.
if (Block_Size < MAGIC_STOP_GROWTH) Block_Size *= 2;
}
static void _init_me ()
{
if (!mem_blocks.empty ())
return;
mem_blocks.reserve (16);
Refill_Pool ();
static __gnu_cxx::_aux_balloc::Bit_Map_Counter<pointer, BPVec_Allocator_Type> _bmc (mem_blocks);
// Bit_Map_Counter *_bmc = new Bit_Map_Counter (mem_blocks);
last_request = &_bmc;
}
static BPVector mem_blocks;
static unsigned int Block_Size;
static __gnu_cxx::_aux_balloc::Bit_Map_Counter<pointer, BPVec_Allocator_Type> *last_request;
#if defined USE_HINT_PARAMETER
static __gnu_cxx::_aux_balloc::Bit_Map_Counter<pointer, BPVec_Allocator_Type> *hint_request;
#endif
public:
bitmap_allocator()
{
_init_me ();
}
//Copy Ctor means that a class of this type has already been
//instantiated. Meaning that we need not call _do_init() again.
bitmap_allocator(const bitmap_allocator&) { }
template <class U> bitmap_allocator(const bitmap_allocator<U>&) throw()
{
_init_me ();
}
//Dtor not needed.
//~bitmap_allocator() throw();
pointer Allocate_Single_Object ()
{
//The algorithm is something like this: The last_requst variable
//points to the last accessed Bit Map. When such a condition
//occurs, we try to find a free block in the current bitmap, or
//succeeding bitmaps until the last bitmap is reached. If no free
//block turns up, we resort to First Fit method. But, again, the
//First Fit is used only upto the point where we started the
//previous linear search.
while (last_request->finished () == false && (*(last_request->get()) == 0))
{
last_request->operator++ ();
}
if (last_request->finished ())
{
//Fall Back to First Fit algorithm.
typedef typename __gnu_cxx::_aux_balloc::ffit_finder<pointer, BPVec_Allocator_Type> FFF;
FFF _fff;
typedef typename BPVector::iterator bpiter;
bpiter _bpi = std::find_if (mem_blocks.begin(), mem_blocks.end(),
__gnu_cxx::_aux_balloc::Functor_Ref<FFF> (_fff));
if (_bpi != mem_blocks.end())
{
//Search was successful. Ok, now mark the first bit from
//the right as 0, meaning Allocated. This bit is obtained
//by calling get() on _fff.
unsigned int nz_bit = first_right_bit (*_fff.get());
bit_allocate (_fff.get(), nz_bit);
last_request->reset (_bpi - mem_blocks.begin());
//Now, get the address of the bit we marked as allocated.
pointer Ret_Val = _bpi->first + _fff.offset() + nz_bit;
return Ret_Val;
}
else
{
//Search was unsuccessful. We Add more memory to the pool
//by calling Refill_Pool().
Refill_Pool ();
//Reset the last_request structure to the first free
//block's bit map.
last_request->reset (mem_blocks.size () - 1);
//Now, mark that bit as allocated.
unsigned int nz_bit = first_right_bit (*last_request->get());
bit_allocate (last_request->get(), nz_bit);
pointer Ret_Val = last_request->base() + last_request->offset() + nz_bit;
return Ret_Val;
}
}
else
{
//last_request holds a pointer to a valid bit map, that points
//to a free block in memory.
unsigned int nz_bit = first_right_bit (*last_request->get());
bit_allocate (last_request->get(), nz_bit);
pointer Ret_Val = last_request->base() + last_request->offset() + nz_bit;
return Ret_Val;
}
}
pointer allocate (size_type n)
{
if (n != 1)
return reinterpret_cast<pointer>(Memory_Get (n * sizeof(value_type)));
else
return Allocate_Single_Object ();
}
pointer allocate (size_type n, typename bitmap_allocator<void>::const_pointer hint)
{
//I will probably implement this beahviour later. For no, just
//call the previous allocate function.
#if !defined USE_HINT_PARAMETER
return allocate (n);
#else
if (n != 1)
return reinterpret_cast<pointer>(Memory_Get (n * sizeof(value_type)));
else if (!hint)
return Allocate_Single_Object ();
typedef typename __gnu_cxx::_aux_balloc::Bit_Map_Counter<pointer, BPVec_Allocator_Type> bmc_type;
static bmc_type _bmc (mem_blocks);
hint_request = &_bmc;
pointer _phint = reinterpret_cast<pointer> (const_cast<void*> (hint));
if (hint_request->finished() == true)
hint_request->reset (mem_blocks.size () - 1);
if (__gnu_cxx::_aux_balloc::inclusive_between<pointer> (_phint) (mem_blocks[hint_request->where ()]))
//Condition satisfied. We have assumed correctly. The hint
//parameter is actually within the area returned by where().
{
//We first swap hint_request and last_request, and then call
//the default Single Object Allocation routione.
__gnu_cxx::_aux_balloc::RAII_Swap<bmc_type*> Auto_Kill (last_request, hint_request);
pointer Ret_Val = Allocate_Single_Object ();
return Ret_Val;
}
//Sorry! The current Area does not correspond well to the hint
//parameter. We first need to find the corresponding area.
typedef typename BPVector::iterator iterator;
typedef typename BPVector::difference_type diff_type;
iterator iter = std::find_if (mem_blocks.begin(), mem_blocks.end(),
__gnu_cxx::_aux_balloc::inclusive_between<pointer> (_phint));
assert (iter != mem_blocks.end());
//Get the position of the iterator that has been found.
diff_type _diff = iter - mem_blocks.begin();
int _displacement = _phint - mem_blocks[_diff].first;
unsigned int *_bit_mapC = reinterpret_cast<unsigned int*>(mem_blocks[_diff].first) - 1;
_bit_mapC -= (_displacement / Bits_Per_Block);
//Use a potentially Dangerous Function.
hint_request->reset (_diff);
hint_request->Set_Internal_Bit_Map (_bit_mapC);
//We first swap hint_request and last_request, and then call
//the default Single Object Allocation routione.
__gnu_cxx::_aux_balloc::RAII_Swap<bmc_type*> Auto_Kill (last_request, hint_request);
pointer Ret_Val = Allocate_Single_Object ();
return Ret_Val;
#endif
}
void deallocate(pointer p, size_type n)
{
if (n != 1)
Memory_Put (p);
else
Deallocte_Single_Object (p);
}
static void Deallocte_Single_Object (pointer p)
{
typedef typename BPVector::iterator iterator;
typedef typename BPVector::difference_type diff_type;
iterator iter = std::find_if (mem_blocks.begin(), mem_blocks.end(),
__gnu_cxx::_aux_balloc::inclusive_between<pointer> (p));
assert (iter != mem_blocks.end());
//Get the position of the iterator that has been found.
diff_type _diff = iter - mem_blocks.begin();
int _displacement = p - mem_blocks[_diff].first;
unsigned int _rotate = _displacement % Bits_Per_Block;
unsigned int *_bit_mapC = reinterpret_cast<unsigned int*>(mem_blocks[_diff].first) - 1;
_bit_mapC -= (_displacement / Bits_Per_Block);
bit_free (_bit_mapC, _rotate);
}
pointer address (reference r) const { return &r; }
const_pointer address (const_reference r) const { return &r; }
size_type max_size (void) const throw () { return (size_type()-1)/sizeof(value_type); }
void construct (pointer p, const_reference _data)
{
new (p) value_type (_data);
}
void destroy (pointer p)
{
p->~value_type();
}
};
template <typename T>
bitmap_allocator<T>::BPVector bitmap_allocator<T>::mem_blocks;
template <typename T>
unsigned int bitmap_allocator<T>::Block_Size = bitmap_allocator<T>::Bits_Per_Block;
template <typename T>
__gnu_cxx::_aux_balloc::Bit_Map_Counter
<bitmap_allocator<T>::pointer, bitmap_allocator<T>::BPVec_Allocator_Type>
*bitmap_allocator<T>::last_request;
#if defined USE_HINT_PARAMETER
template <typename T>
__gnu_cxx::_aux_balloc::Bit_Map_Counter
<bitmap_allocator<T>::pointer, bitmap_allocator<T>::BPVec_Allocator_Type>
*bitmap_allocator<T>::hint_request;
#endif
template <typename T1, typename T2>
bool operator== (const bitmap_allocator<T1>&, const bitmap_allocator<T2>&) throw()
{
return true;
}
template <typename T1, typename T2>
bool operator!= (const bitmap_allocator<T1>&, const bitmap_allocator<T2>&) throw()
{
return false;
}
}
#endif //__STL_BITMAP_ALLOCATOR_HPP__