Possible improvement to std::list::sort

David Kastrup dak@gnu.org
Wed Oct 8 06:36:00 GMT 2014


Ivo Doko <ivo.doko@gmail.com> writes:

> On 2014-10-08 05:48, David Kastrup wrote:
>> That being said, I am not a fan of special-casing pre-sorted cases in
>> generic routines.  If the case occurs often enough in practice, the
>> application should invest the extra effort of doing the check itself.
>> Mergesort "special-cases" presorted material already by requiring about
>> half the number of comparisons than usual.  Of course, that's still
>> O(n lg n) rather than O(n), but it will also have benefits for partially
>> sorted lists.
>
> Oh, just wanted to point out - I have not actually "special-cased" the
> pre-sorted cases.
>
> If you take a look at the code, you will see that the O(n) (actually
> just a single pass through the list - so, n) complexity for sorted and
> backwards-sorted lists is a simple consequence of the pre-merge pass
> which builds sorted regions out of pre-existing runs.

Natural merge sort rather than base-2.  The problem with that is that it
either leads to worse divisions in the divide-and-conquer (if you make
use of the discovered natural runs) or that you waste comparisons on
each pass (if you don't make use of the discovered natural runs).
Either affects the worst-case behavior, and merge sort degrades into
worst case behavior very easily (which is actually a feature since its
worst case is pretty good):

Merging two sorted unrelated lists of size n and m takes an average of

       1       1
n m (----- + -----)
     n + 1   m + 1

comparisons if memory serves me right.  So for n and m about equal,
that's less than 1 away from the worst case of n+m-1.

So since we are almost always talking about worst case behavior in the
mergesort case, one does not want to make it even "slightly worse".

That's different from quicksort where the worst case is much more of a
statistic anomaly.  There it pays off to make sure that degenerate input
does not trigger the worst case.

But with merge sort, you have the worst case behavior all the time, so
you don't want to make it even somewhat worse.

-- 
David Kastrup



More information about the Libstdc++ mailing list