This is the mail archive of the libstdc++@gcc.gnu.org mailing list for the libstdc++ project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [C++] Should the complexity of std::list::size() be O(n) or O(1)?


On Nov 24, 2005, at 5:46 AM, Paolo Carlini wrote:

Personally, again, my only very serious concern is binary compatibility,
due to the additional storage of the size, which in any case will delay
this kind of change to the next library ABI. Otherwise, I would
certainly not resist to compelling evidence from actual usage that O (1)
is a must and much more important than splice.

<nod> Yup, this would definitely be an ABI-breaking change.


By the way, in my opinion Matt has a very good point that a size, in
principle, could be always *added* in user code, outside std::list
proper. I think that option overall leads to a third set of trade- offs.
It would be nice if a paper summarizing the issue could also compare to
that option (i.e., would be: O(N) size, O(1) size, O(N) size + external
size).

My informal note tries to go into this a little bit. The effect of an internally stored size has little effect on all but a few list member functions. For such members, storing the size externally is equivalent to storing it internally. There are a few member functions where storing the size externally has a negative effect:


1. (Nearly) all of the assignment members:

list& operator=(const list&);
iterator assign(iterator, iterator);  // forward or better
void assign(size_type, const T&);

If there is an internal O(1) size, these can all be given strong exception safety when T has a nothrow assignment.

The key is to (mentally) divide the source range up into two parts: 1) The first part of the source which can be assigned into the existing target nodes (just recycling the existing nodes). 2) The second part of the source which requires new nodes created and appended to the target (this part is empty if the source size is not greater than the target size).

After the subdivision of the source into these two parts, you first create a temporary list containing a copy of the source's second part. Then assign the source's first part into the target (recycling nodes). If there are extra target nodes left over, delete them and you're done. The temporary list will be empty. If the temporary is not empty, splice it onto the end of the target (in which case you didn't have to erase any target nodes). In either case, if T's assignment is nothrow, once you're past the stage of creating the temporary (and before you've modified the target), the rest of the operation is nothrow. This algorithm isn't practical (it is too expensive) unless you have an O(1) (internal) size.

This algorithm can be duplicated externally (if size is stored externally), but it is highly inconvenient compared to using the member assignments.

2. resize(size_type n, T t):

The first thing you need to do in this function is decide whether you're going to erase or insert. To do that you need to know if n < size(). With an O(N) size this decision costs O(min(n, size())) (with a relatively high k). With an O(1) size the cost is O(1) (with a very small k).

This algorithm can be duplicated externally (if size is stored externally), but it is moderately inconvenient compared to using the member resize.

3. operator==,!=:

If an internal O(1) size exists, these operators can check size() first, possibly short circuiting the element-by-element comparison.

This algorithm can be duplicated externally (if size is stored externally), but it is somewhat inconvenient compared to using the already coded operator== and !=.

Conclusion: Yes, you can store the size externally. But you loose performance and/or functionality in the above members that will also have to be recoded.

    void splice(iterator position, list& x, iterator first, iterator
last, size_type n);  // must be O(1)

is a better compromise.  I also realize that it *is* a compromise,
and not a 100% solution.

Well, somehow I like it, but certainly I find it "weird", don't know how
to better explain. I don't think there is something similar elsewhere in
the library: a quantity that the library could compute by itself and is
instead passed by the user with the requirement that must be consistent
with what the library would compute. Weird! ;)

It is somewhat analogous to:


basic_string(const charT* s);
basic_string(const charT* s, size_type n);

Now certainly you might want to pass a non-null-terminated char* to a string, and the second ctor is handy for that. But you also might already know the length of a null-terminated char* from previous computations, and use the second ctor for that case too. Why have basic_string do an O(n) computation that you've already gone to the trouble to compute?

An interface that forces you to throw away information that takes linear time to compute, is a suboptimal interface.

-Howard


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]