[C++] Should the complexity of std::list::size() be O(n) or O(1)?
Howard Hinnant
hhinnant@apple.com
Sat Nov 26 02:19:00 GMT 2005
On Nov 24, 2005, at 3:57 PM, Gabriel Dos Reis wrote:
> Certainly. My point was that I'm not convinced penalizing proper list
> operations is the right resolution.
I'm not convinced that labeling list::size() as improper is correct
characterization. I have no trouble thinking of applications which
need to know the size of the list. Admittedly I also have no trouble
thinking of applications which don't need to know the size of the list.
What is improper is having an O(N) size. It should either be O(1),
or not exist. The fact that it exists and is O(N) is what is
improper imho. I continue to come across code that uses list::size
in the mistaken belief that it is O(1).
http://gpstk.sourceforge.net/doxygen/FileFilter_8hpp-source.html
> 00103 std::vector<lItrType> data(dataVec.size()); //
> dataVec is std::list
> 00104 lItrType itr = dataVec.begin();
> 00105 typename std::vector<lItrType>::size_type i = 0;
> 00106 while (itr != dataVec.end())
> 00107 {
> 00108 data[i] = itr;
> 00109 i++;
> 00110 itr++;
> 00111 }
This is not a simple case of people not being sufficiently educated.
This is a very poor interface design on the part of std::list.
> 3. Leave it as is.
> 4. Be clear that size() is linear for list<>.
These are unacceptable because they leave in place the bad interface.
Fwiw, I've been working on an example which uses both list::size, and
list::splice-some-from-other. My motivation is to get real data on
the various effects, instead of just tossing words back and forth.
The results are obvious for examples that would use only size, or
only splice, so I find it more interesting to explore a case that
would use both.
The spec for the example I'm working with is:
Consider an aircraft seating program which has a function called
free_upgrade. This function tries to upgrade passengers from economy
to first class, but is subject to the size limits of first class
seating, and also subject to weight & balance rules to keep the
aircraft stable. It is also smart enough to not split up a party
that's traveling together. Here is how it might look:
/*
Upgrade as many economy passengers as you can to first class
without overbooking
first class, or violating weight/balance limits that state there
must be more
economy passengers than first class passengers.
Passengers travel in groups. Never partially upgrade a group.
Passengers
traveling together want to sit together. Grouped Passengers are
adjacent in
the lists.
Passenger address must remain stable as other data structures
will reference
individual Passengers.
*/
void free_upgrade(std::list<Passenger>& first_class,
std::list<Passenger>& economy)
{
// splices Passengers from economy to first_class subject to the
above constraints.
// splices a group at a time.
}
The parameters to the problem might look something like:
const std::size_t first_class_initial_size = 25;
const std::size_t first_class_limit = 50;
const std::size_t economy_class_initial_size = 100;
const std::size_t max_group_size = 5;
I'm giving each Passenger a first and last name. And Passengers that
are adjacent in the lists, and have the same last name are
constrained to travel together (either all or none of that subgroup
will be upgraded). Passenger groups occur with uniform randomness in
the range [1, max_group_size].
I've coded this up 3 ways so far:
1. Assuming an O(1) size, and being right.
2. Assuming an O(N) size, and being right - sizes are computed and
cached externally.
3. Assuming an O(1) size, but being wrong (size is actually O(N)).
The results vary greatly of course with the values of the initial
parameters. And obviously as the parameters are increased in value
the differences in the timing results become more exaggerated.
I've yet to find a case where solution 1 is not fastest. Solution 2
is usually a close second. The best run it has had is only 1% slower
(with the above parameters). The worst is 37% slower with:
const std::size_t first_class_initial_size = 2500;
const std::size_t first_class_limit = 5000;
const std::size_t economy_class_initial_size = 10000;
const std::size_t max_group_size = 100;
(I really don't care if the above numbers aren't realistic for an
aircraft - that is beside the point. I'm exploring performance
characteristics of list)
Case 3 is pretty much always a disaster. With the first set of
parameters it is 39% slower. With the latter set of parameters it is
73 times slower.
Imho, we need to do whatever it takes to make sure our customers
don't accidently fall into the case 3 trap.
-Howard
PS: Source code for this experiment is available, but I wanted to
give others a chance for independent verification.
PSS: Iteration, size and splice-some-from-other are the only list
functions being timed. Specifically left out of this analysis are
list assignment, resize and operator== (that is perhaps another test).
More information about the Libstdc++
mailing list