This is the mail archive of the libstdc++@gcc.gnu.org mailing list for the libstdc++ project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [C++] Should the complexity of std::list::size() be O(n) or O(1)?


On Nov 25, 2005, at 9:06 PM, Gabriel Dos Reis wrote:

Howard Hinnant <hhinnant@apple.com> writes:

| On Nov 24, 2005, at 3:57 PM, Gabriel Dos Reis wrote:
|
| > Certainly. My point was that I'm not convinced penalizing proper list
| > operations is the right resolution.
|
| I'm not convinced that labeling list::size() as improper is correct
| characterization.


Well, I don't know who said it was "improper".
Certainly, I don't believe it is a proper list operation as splice().

Sorry, the distinction between "improper" and "is (not) a proper list operation as splice()" is lost on me.


| I have no trouble thinking of applications which
| need to know the size of the list.

That is probably true, but it is beside the point.
The point is not whether an application may need size() on list or not.

Actually, and this is I think very important, this is very closely related to the point I'm trying to communicate. I'm trying to make a point, so I get to define the point. ;-)


Many C++ programs that make use of list today (nearly 8 years after the standard was ratified), make use of list::size, presumably because they need that information. And when they do make use of it, the odds are good (better than even) that the code will do so in a way such that the program will run slower (sometimes much slower) with an O(N) list::size (and O(1) splice-some-from-other). This (very partial) survey includes code that uses list::splice-some-from- other!

I've had the opportunity to review a fair amount of non-open source C+ + code. But you can also get this impression just by reviewing publicly available code. Search for C++ code that uses "list", "size", and "splice", or just "list" and "splice", or just "list" and "size". One good place to do this is:

http://www.koders.com/

Examples:
---
CacheFile.cpp:

void
CacheFile::cleanupMemCache() {
  if (!m_keep_in_memory) {
    if (m_page_cache_mem.size() > CACHE_SIZE) {
      ...
    }
  }
}

---

b2dconnectedranges.hxx:

... aNewConnectedComponent.maComponentList.size() == 1 ...

---

world.cc:
...
if (l.size() == 1) delete_agent(*itt);
...

This list goes on and on. I'm also searching this code base for just "list" and "splice" and trying to find an example that favors the O (1) splice-some-from-other tradeoff. I know it's in there somewhere, but I haven't found it yet. When I do find it, I'll bet it could easily make use of the the proposed O(1) variant of this function.

These are **your** customers. By the way they are coding, they are disagreeing with you. When they port their code from VC++ to libstdc+ +, their code will slow down. Are they all wrong? Or does our std::list have a suboptimal implementation?

Please point me to at least one public existing, (non-test-case example), use case of list that benefits from the O(N) list::size implementation. Mind you I'm not claiming it doesn't exist. I'm claiming it is hard to find and that you'll run across many more use cases that go the other way during your search. And when you do find it, I'll bet it is more easily fixed than other use cases (like implementing fixed-sized caches). I'm also not claiming that all use cases of splice-some-from-other can easily supply the count. Just that most can (to form a contiguous range of list iterators, you have to iterate, and can count those iterations). All I'm asking is for interested parties to survey real world code, and report back what you find.

| What is improper is having an O(N) size.   It should either be O(1),
| or not exist.

By the same token distance() shall be either O(1) or not exist.

I could be wrong. But my impression is that most C++ programmers are aware that distance will be O(N) with non-random-access iterators. Do you have any evidence at all (a single existing public use case) that you can point me to that suggests otherwise?


| The fact that it exists and is O(N) is what is
| improper imho.  I continue to come across code that uses list::size
| in the mistaken belief that it is O(1).

The claim wasn't whether people would be mistaken -- you'd always find
people mistaken about one aspect or the other.  The issue is whether
it is a proper list operation, as splice().

The issue is what do most of our customers expect (within the bounds of the standard). Will their code run better or worse under the competition's implementation? That is the issue. I want our implementation of the std::lib to be the best. And as vendors, we don't get to judge which one is best. Our customers enjoy that honor. If making a std-allowed change has the following effect:


65% customers see no change.
25% customers see some performance gain.
5% customers see dramatic performance gain.
5% customers see performance loss.

Is this not a good thing? Especially if you can council 95% of those who lost performance on how to get it back?

I can't claim these numbers as facts. But these are my honest impressions gathered from over the past 8 years. Not one customer ever complained about the Metrowerks implementation of list::splice- some-from-other. But this issue comes up continually for gcc. Does that alone not say something?

  I claim the answer is no.
An error was made in putting it there in the first place.  I rather
sequester that error instead of spreading through over the rest of
the interface.

I'll grant you that putting list::size in may have been an error. But a far larger error was putting it in, and allowing it to be O (N). Many examples of actual code in use today (want me to point to more?) attest to the fact that an O(N) size is a subtle run time error that is frequently encountered in the wild. Sometimes this error is easily fixed (e.g. use empty()). Sometimes it is not (size () compared to some non-zero number). It is these not-easy-to-fix cases that I find most worrisome.


Conversely I find it is far rarer to run across a use case (non-test- case, in-the-wild) of splice-some-from-other where the computation of the splice-from count is a significant performance penalty in the context of the code it is in, and in most of those cases, the client could count that value and pass it in with negligible cost. For example I've seen several uses of splice-some-from-other that are always splicing from the last element of the list (I can find pointers to these cases if there is interest). These use cases always have a splice-from count of 1 which is never going to lead to a performance problem if the splice-from count is computed by splice (and the count could trivially be supplied anyway). I ran across another example (which I can not supply a pointer to) that used splice-some-from other to splice the entire list. This use could trivially be fixed by using the splice-all-from-other signature.

[...]

| This is not a simple case of people not being sufficiently educated.
| This is a very poor interface design on the part of std::list.
|
| > 3. Leave it as is.
| > 4. Be clear that size() is linear for list<>.
|
| These are unacceptable because they leave in place the bad interface.


Then penalizing other proper list operations is worst.

We have a serious performance problem today with list. It does no good to wish it away because we disagree with a decision that was made nearly a decade ago. We need to deal with the state of affairs as it exists today. We need to take some positive action to make the libstdc++ list competitive compared to the competition in the context of our customer's code. We will not be able to make every single customer happy. But I do believe this function can be maximized, and that today we are not close to that optimum.


I have repeatedly tried to back up my opinions with evidence (and spent time gathering it for your benefit) (examples, pointers to actual code, etc.). I am fully aware of your opinion. Can you please point me towards supporting evidence so that I can better understand your position?

Thanks,
Howard


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]