This is the mail archive of the libstdc++@gcc.gnu.org mailing list for the libstdc++ project.
Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]
Other format:	[Raw text]
std::vector's freestore management

From: Daniel Kostecky <dk at kotelna dot sk>
To: "comp.std.c++ e-mail submission" <std-c++ at ncar dot ucar dot edu>, GCC std C++ library discussion <libstdc++ at gcc dot gnu dot org>
Date: Mon, 8 Apr 2002 20:47:51 +0200
Subject: std::vector's freestore management
Reply-to: Daniel Kostecky <dk at kotelna dot sk>
Dear C++ users,

The space overhead introduced in std::vector prevents the reallocation
from occuring too frequently. This in turn should make the programs
run faster, since less operations have to be performed.
Unfortunately, in special situations, e.g. when vectors consuming huge
amounts of memory are used, this space overhead may make the program
to run very slowly in swapping environments. Or even such a program
may fail with the out of memory error. The reason behind is that
the std::vector's space overhead is considerable in many vendors
implementations (roughly speaking twice the storage may be allocated
than it is required). Moreover, the standard provides the users
with no handy means to alter the behaviour defined
in the vendors' implementations.
A possible way how to circumvent this difficulty without compromising
the existing standards and the existing code is to enhance std::vector
by introducing special objects that will provide std::vector instances
with hints on what to do when the size is about to get changed.
These 'capacity hint' objects may be supplied by users when a special
behaviour is needed. If not supplied, std::vector will fall back
to the default behaviour.

Those interested in technical details are kindly ask to consult
the appendix.

Well, I'm afraid, that I'm about to create a patch against my favourite
vendor's, that is GNU's, std C++ library implementation. This will give
me more comfort when using std::vector in my projects.
I'm aware that it's generally not a good idea, but I failed to find
anything better. Since, it seems that I may easily fall in spending serious
effort, without obtaining relevant benefit, I would be much oblidged
for your critique, comments and advices.

Thank you for your help.

With best regards,

Daniel Kostecky

APPENDIX:

Title: std::vector's freestore management
Author: Daniel Kostecky, dk@kotelna.sk
Date: 2002/04/08

Content:

1. Terms
2. An example std::vector implementation
3. A possible solution
4. Links
5. References
6. Credits
7. Legal notice

1. Terms

In the following, the term container will refer to a container
which may allocate the memory for some extra elements in order
not to perform the reallocation too frequently, thus speeding up
its operations. An example of such a container is std::vector.
The term capacity will refer to the number of elements
for which the memory has been allocated. The term size will refer
to the number of the elements which are used or ready to be used
by the container's user. Obviously the memory for these elements
must be already allocated and initialized. The following trivial
assertion holds 0 <= size <= capacity.
Let sizeof(container_type) be the size in bytes of the container
object itself excluding the memory used by the elements and
let sizeof(element_type) be the size in bytes of the element.
The absolute and relative space overheads can be computed as follows

abs_overhead = (capacity - size) * sizeof(element_type)

rel_overhead = (sizeof(container_type) + capacity * sizeof(element_type)) /
               (sizeof(container_type) + size * sizeof(element_type))

The term enlarging-only container will refer to a container object
instance which size only increases during its lifetime.

2. An example std::vector implementation

As an example std::vendor implementation was choosen the implemenation
provided by GNU with the GCC compiler version 3.0.3.
Within this implementation when a std::vector instance increases
its size from 0 in steps of 1, the capacity is the lowest power of 2
greater or equal to the size. And when a std::vector instance decreases
its size, the capacity remains unchanged. This yields the following facts:

- the absolute space overhead is generally not bounded
- for enlarging only std::vector instances the relative space overhead
  cannot be bounded by a constant less than 2 (it seems that the bound
  exists and is equal to 2, but to be exact more arguments are needed)
- for std::vector instances, which size has decreased at some point
  during their lifetime the relative space overhead is generally not bounded

The term bounded is understood in the sense that there exist a constant,
the bound, which depends only on how the std::vector freestore management
is implemented. This bound shall not depend on the ranges of the types
used to store sizes, pointer differences, etc., nor this bound shall
depend on parameters of a particular host system, e.g. the total amount
of memory available.

The header file 'include/g++-v3/bits/stl_vector.h' can be inspected
briefly and it seems that the new capacity when the reallocation is needed
may be computed in several places. In the following such statements
are listed with the corresponding method name and the line number.

method: vector<_Tp, _Alloc>::_M_insert_aux(iterator, const _Tp&)
line  : 585
stat. : const size_type __len = __old_size != 0 ? 2 * __old_size : 1;

method: vector<_Tp, _Alloc>::_M_insert_aux(iterator)
line  : 619
stat. : const size_type __len = __old_size != 0 ? 2 * __old_size : 1;

method: vector<_Tp, _Alloc>::_M_fill_insert(iterator, size_type, const _Tp&)
line  : 665
stat. : const size_type __len = __old_size + max(__old_size, __n);
rem.  : __n is the number of the elements to be inserted

method: vector<_Tp, _Alloc>::_M_range_insert
line  : 729
stat. : const size_type __len = __old_size + max(__old_size, __n);
rem.  : __n is the size of the range to be inserted

This list may not be necessarily complete and it's provided mostly
for illustration. Regardless how the std::vector's freestore
management is implemented by a particular vendor, it seems
that it may not be suitable to some users in some special situations.
Small space overhead may result in too slow operations for some users.
On the other hand, a bigger space overhead may result in considerable
swapping activity or even out of memory errors in special situations
for other users.
I think, that according to the standard, the std::vector freestore
management should be implemeted in such a way, that an insertion operation
is reasonably fast. In order to fulfil this requirement an exponential
capacity growth with the overallocation quotient of 2 was choosen.
This strategy seems to be suitable for the most of the users.
Very likely many of them are not using huge vectors and even
when they do, the unwanted effects of the considerable space overhead
may be hidden by demand paging in their host environments.
Unfortunately, even demand paging may not hide the unwanted effects
in special situations. When a huge vector of small vectors is used,
such that the small vectors fit in a memory page, then no memory page
belonging to the composed container representation contains only
allocated, not initialized and thus unused space. It may happen that
the amount of the memory really used by such a composed container is well
bellow the total amount of memory available, but the amount of the memory
allocated by the container is much greater. In host environments with page
swapping, traversals of such a container may cause extensive swapping
activity and move the performance of the application to unacceptable levels.
In host environments without page swapping such application fails
with out of memory error. In this particular situation, it seems that
if the std::vector freestore management would be adjusted so that the
space overhead for the whole composed container is acceptable, that is
the allocated amount of memory is lower than the total amount of memory
available, the performance reduction linked to the fact that the reallocations
are occuring more frequently may be acceptable as well.
Finally, the application performance would be boosted by the fact that
no page swapping activity is present.

Remark to the version of gcc compiler used: The tests were performed against
the version 3.0.4 of GCC as well, with the same results. The header files
corresponding to std::vector, are the same in both versions.

Remark to the choice of the quotient for the exponential capacity growth:
In fact, there seems to be no reason for the overallocation quotient to be 2,
or any other fixed constant. In theory, the same complexity estimates seem
to be achieved when using any overallocation quotient greater than 1. 
However, the choice of 2 is reasonable, since it is the first integer
greater than 1 and the use of integer quotient provides the benefit,
that no floating point operations are involved.

Remark to complexity estimates (please skip, if not interested in theory):
The real value of saying that something is 'big O ( 1 )' seems to me not high.
Many times the term 'big O ( 1 )' seems to be misused. People use to say that
something is 'big O ( 1 )', while they want to say, there exists a small bound
for that thing. Moreover, technically speaking, one should say that something
belongs to 'big O ( 1 )', since 'big O ( 1 )', or more precisely
'big O (lambda: n -> 1)' is a set of functions. Since it may hurt that this
set can look not as nice as one wants it to look ( depends on the way one
chooses to look over sets of functions ) this remark ends.

3. A possible solution

A possible solution is to free std::vector from taking care
about the computations of the capacity in the case its size
is about to get changed and move this task to a different
object which will provide the std::vector instance with a hint
on what to do in such a case. Such capacity hint objects, call them
'caphint's, shall be able to answer the std::vector instances
the following questions:

- My size is about to get changed, I (std::vector instance) can tell
  you ('caphint' instance) my current capacity.
  Should I perform reallocation? If so, what should be my new capacity?

- The user wants to ensure that I (std::vector instance) have at least
  a given capacity, I can tell you ('caphint' instance) my current capacity.
  Should I perform reallocation? If so, what should be my new capacity?

Let 'caphint' be the class to implement such hints. The following
methods may be used to implement the answering the above questions:

bool caphint::hint_resize(
	std::size_t __requested_size,
	std::size_t & __capacity_in_out) const;

bool caphint::hint_reserve(
	std::size_t __requested_capacity,
	std::size_t & __capacity_in_out) const;

Let '__capacity_in', respectively '__capacity_out' be the value
of the parameter '__capacity_in_out' when entering, respectively
when leaving the method.
Obviously, the 'caphint' objects shall fulfil some requirements,
since a container instance may get corrupted, or other "bad" things may
happen when an instance will follow nonconforming capacity hints.
These requirements are the following:

Req.#1:
 hint_resize method: __capacity_out >= __requested_size
 hint_reserve method: __capacity_out >= __requested_capacity

Req.#2:
 both methods: (__capacity_in != __capacity_out) implies that the value
	returned was 'true'

A convenient way is to use a class hierarchy with 'caphint' as the abstract
base class giving the shape to the interface of the derived classes.
A sample implementation of the 'caphint' base class and few derived classes
already exists, see links for details. The derived classes already
implemented are the following:

safe_caphint - a wrapper class for the requirements checking
stl_caphint - mimics the behaviour of the implementation discussed above
abs_caphint - allows the user to control the absolute space overhead
rel_caphint - allows the user to control the relative space overhead

Both classes 'abs_caphint' and 'rel_caphint' have builtin functionality
that implements automated shrinking of the container. This functionality
is turned off by default. In the derived class 'rel_caphint' the relative
overhead was approximated by the quotient capacity over size in order
to simplify the implementation.

The ehancing of std::vector made by creating a patch against a vendor's
implemenation must not require a single line change in the existing codes.
Technically, any std::vector instance should be able to get a const
(inspector/questioner) access to its cooresponding 'caphint' object in order
to ask any of the above mentioned questions by calling the appropriate
methods. A similar approach as with the allocator template parameter
could be used. After the patching, the std::vector template declaration
first lines might be outlined as follows:

template <class _Tp, class _Alloc = allocator<_Tp>,
	class _Caphint = stl_caphint>
class vector : protected _Vector_base<_Tp, _Alloc>

And the declaration for one of the constructors might be outlined as follows:

vector(size_type __n, const _Tp& __value,
	const allocator_type& __a = allocator_type(),
	const caphint& __h = _Caphint())

This approach would not provide a possibility to change the corresponding
'caphint' reference during the lifetime of a std::vector instance.
If behaviour changing at runtime would be needed, there would be
a possibility to use an accordingly designed 'caphint' class
which behaviour would be changing, e.g. a proxy class.
Originally there was an idea to introduce a static table like structure
per each std::vector specialization and use simple tags (e.g. std::size_t)
for lookups (std::vector instance will have an additional tag member).
However, such an approach, seems to have impacts on thread safety.
As far as exception safety is concerned, there should be no impact there,
since the 'caphint' object takes care only about answering the capacity
related questions and the handling with the user elements would remain
untouched under the responsibility of std::vector.

Remark to the second requirement: The second requirement for 'caphint' object
can be made more tight by using equivalence instead of implication. However,
such a change seems to generate additional coding in parametrized 'caphint'
classes and provide no real benefit. Of course, it's assumed that the derived
class is designed in such a way that for sane combinations of the parameters,
the equivalence holds true.

Remark to the modification of the standard: The above stated ideas should
not be understood as a proposal for a change of the standard. However,
if it turns out wise to do so, it may be a good idea to base
the modifications on something like the following:

- The modified standard will guarantee the same as the original one
  when the default 'caphint', i.e. 'stl_caphint' is used.
  The default 'caphint' is used by default within the codes not aware
  about 'caphint' objects, thus the modified standard guarantees the same
  as the original for the codes written according to the original standard.

- The modified standard will guarantee that the std::vector instances
  will follow the hints provided by 'caphint' objects exactly,
  i.e. the hints will represent policies for them. This will give the users
  ability to make decisions about std::vector behaviour when using customized
  'caphint' derived classes.

- Some derived 'caphint' classes may be provided within the modified
  standard. The behaviour (performance, operations complexity, etc.)
  of std::vector instances when using such 'caphint's should be provided
  by the modified standard.

- The modified standard shall not provide any other guarantees,
  nor indications about the behaviour (performance, operations complexity,
  etc.) of std::vector instances when using 'caphint's not provided
  by the modified standard.

4. Links

An implementation of the 'caphint' stuff with some test programs is freely
avaiable from http://www.kotelna.sk/dk/sw/caphint/caphint-020408.tar.gz.
The site is hosted by Kotelna Consulting.

5. References

 C++ FAQ Lite by Marshall Cline, http://www.parashift.com/c++-faq-lite

 The C++ Programming Language by Bjarne Stroustrup, ISBN 0201700735
 
 C++ Standard Library Active Issues List (Revision 20),
  http://std.dkuug.dk/jtc1/sc22/wg21/docs/library20.zip
  http://std.dkuug.dk/jtc1/sc22/wg21/docs/lwg-active.html

6. Credits

I would like to thank to the following people

 Rastislav Senderak for his careful reading, critique and fruitful advices
 Martin Keseg for giving me shelter at 'kotelna.sk'.

7. Legal notice

No part of this work is in any way affiliated to my current employer
Alcatel Slovakia, a.s., Liptovsky Hradok, Slovakia. No part of this work
is in any way related to my position in Alcatel Slovakia. No part of this
work was done based on my managers tasks assigments or advices.
This work is partly related to my Ph.D. study at the Department of Numerical
and Optimization Methods, Faculty of Mathematics, Physics and Informatics,
Comenius University, Bratislava.

End of APPENDIX.

-- 
Daniel Kostecky, dk@kotelna.sk
Follow-Ups:
- Re: std::vector's freestore management
  - From: Phil Edwards
Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]