[C++] Should the complexity of std::list::size() be O(n) or O(1)?

Howard Hinnant hhinnant@apple.com
Sat Nov 26 02:19:00 GMT 2005


On Nov 24, 2005, at 3:57 PM, Gabriel Dos Reis wrote:

> Certainly.  My point was that I'm not convinced penalizing proper list
> operations is the right resolution.

I'm not convinced that labeling list::size() as improper is correct  
characterization.  I have no trouble thinking of applications which  
need to know the size of the list.  Admittedly I also have no trouble  
thinking of applications which don't need to know the size of the list.

What is improper is having an O(N) size.   It should either be O(1),  
or not exist.  The fact that it exists and is O(N) is what is  
improper imho.  I continue to come across code that uses list::size  
in the mistaken belief that it is O(1).

http://gpstk.sourceforge.net/doxygen/FileFilter_8hpp-source.html

> 00103       std::vector<lItrType> data(dataVec.size());         //  
> dataVec is std::list
> 00104       lItrType itr = dataVec.begin();
> 00105       typename std::vector<lItrType>::size_type i = 0;
> 00106       while (itr != dataVec.end())
> 00107       {
> 00108          data[i] = itr;
> 00109          i++;
> 00110          itr++;
> 00111       }

This is not a simple case of people not being sufficiently educated.   
This is a very poor interface design on the part of std::list.

>   3.  Leave it as is.
>   4.  Be clear that size() is linear for list<>.

These are unacceptable because they leave in place the bad interface.

Fwiw, I've been working on an example which uses both list::size, and  
list::splice-some-from-other.  My motivation is to get real data on  
the various effects, instead of just tossing words back and forth.   
The results are obvious for examples that would use only size, or  
only splice, so I find it more interesting to explore a case that  
would use both.

The spec for the example I'm working with is:

Consider an aircraft seating program which has a function called  
free_upgrade.  This function tries to upgrade passengers from economy  
to first class, but is subject to the size limits of first class  
seating, and also subject to weight & balance rules to keep the  
aircraft stable.  It is also smart enough to not split up a party  
that's traveling together.  Here is how it might look:

/*
     Upgrade as many economy passengers as you can to first class  
without overbooking
     first class, or violating weight/balance limits that state there  
must be more
     economy passengers than first class passengers.

     Passengers travel in groups.  Never partially upgrade a group.   
Passengers
     traveling together want to sit together.  Grouped Passengers are  
adjacent in
     the lists.

     Passenger address must remain stable as other data structures  
will reference
     individual Passengers.
*/

void free_upgrade(std::list<Passenger>& first_class,  
std::list<Passenger>& economy)
{
     // splices Passengers from economy to first_class subject to the  
above constraints.
     // splices a group at a time.
}

The parameters to the problem might look something like:

const std::size_t first_class_initial_size = 25;
const std::size_t first_class_limit = 50;
const std::size_t economy_class_initial_size = 100;
const std::size_t max_group_size = 5;

I'm giving each Passenger a first and last name.  And Passengers that  
are adjacent in the lists, and have the same last name are  
constrained to travel together (either all or none of that subgroup  
will be upgraded).  Passenger groups occur with uniform randomness in  
the range [1, max_group_size].

I've coded this up 3 ways so far:

1.  Assuming an O(1) size, and being right.
2.  Assuming an O(N) size, and being right - sizes are computed and  
cached externally.
3.  Assuming an O(1) size, but being wrong (size is actually O(N)).

The results vary greatly of course with the values of the initial  
parameters.  And obviously as the parameters are increased in value  
the differences in the timing results become more exaggerated.

I've yet to find a case where solution 1 is not fastest.  Solution 2  
is usually a close second.  The best run it has had is only 1% slower  
(with the above parameters).  The worst is 37% slower with:

const std::size_t first_class_initial_size = 2500;
const std::size_t first_class_limit = 5000;
const std::size_t economy_class_initial_size = 10000;
const std::size_t max_group_size = 100;

(I really don't care if the above numbers aren't realistic for an  
aircraft - that is beside the point.  I'm exploring performance  
characteristics of list)

Case 3 is pretty much always a disaster.  With the first set of  
parameters it is 39% slower.  With the latter set of parameters it is  
73 times slower.

Imho, we need to do whatever it takes to make sure our customers  
don't accidently fall into the case 3 trap.

-Howard

PS:  Source code for this experiment is available, but I wanted to  
give others a chance for independent verification.

PSS:  Iteration, size and splice-some-from-other are the only list  
functions being timed.  Specifically left out of this analysis are  
list assignment, resize and operator== (that is perhaps another test).



More information about the Libstdc++ mailing list