This is the mail archive of the
libstdc++@gcc.gnu.org
mailing list for the libstdc++ project.
Re: [libstdc++] pdqsort - a faster std::sort
- From: Christopher Jefferson <chris at bubblescope dot net>
- To: Orson Peters <orsonpeters at gmail dot com>
- Cc: "libstdc++" <libstdc++ at gcc dot gnu dot org>
- Date: Wed, 8 Apr 2015 20:09:44 +0100
- Subject: Re: [libstdc++] pdqsort - a faster std::sort
- Authentication-results: sourceware.org; auth=none
- References: <CAJxLxMXy4gqAWfXj_p1tyyUSj4PqFp3ASzdxFtyDk2W1F_PdsQ at mail dot gmail dot com> <20150407135025 dot GF9755 at redhat dot com> <CAJxLxMVEW-+bSFyDs7sWytEMKVrhMpcQxk-9QetWem8ZtSeSxg at mail dot gmail dot com> <CAJxLxMWkj7kKteFh7YWHv5gr+Fge_2VeN6QdJ_TFqW2nUtin=w at mail dot gmail dot com> <CA+jCFLvDjSjPSF+dKDVuKWW-hsSK_LC3c=VwiFEhhx0EOyiqAA at mail dot gmail dot com> <CAJxLxMU_3vV1cozS1MnfSo5NNodW1J_1zZWuimdpCguO1cO-0w at mail dot gmail dot com> <CA+jCFLubcMg1OxspBjpcKjur6Vi6PJ=Ce_oSkA_b7OqcZR1UwQ at mail dot gmail dot com> <CAJxLxMX6CyVGxPgrZvvJ+Y62RN_86CQ8H0gFdUOHZde0UFJ2dA at mail dot gmail dot com> <CA+jCFLvUxRdmw71_H08ruDCf8QFmXcriE+G2_Cam4t5wQP-1gQ at mail dot gmail dot com> <CAJxLxMXfZ-GoTPO9uQ2Y9QyQJ_ep_+Ls4Di5q5A0B849Y32chg at mail dot gmail dot com>
Some of these things might be a bit lost to the depths of time (you
could dig through the mailing list archives). The aim when doing the
merge was to handle all methods consistently with minimal code
changes. Some algorithms (like std::find for example) get used in
funkier ways and with weirder operator==s which often only work with
exactly the const-ness the algorithm has provided previously. It's
nice to try not to break such code.
The main problems I remember are:
1) There are some weird proxy iterators out there, for example with
templated conversion operators, so you can never pass them to a
function which has more than one overload. These proxy iterators
(which at least used to exist in boost) don't work with libc++'s
__less. As clang/libc++ took the chance to tighten up many parts of
the compiler, it was reasonable for them to break this, but it would
probably be better to try not to.
2) Have to be careful to never add a 'const' where we didn't have one
before, to handle non-const operator< (once again not legal, and
broken by libc++, but there is a lot of code out there with this).
std::less<value_type> would do this for example.
Just to give you a feeling, I just threw the following code together.
I don't promise this shows all the problems, and I'm not saying your
code must accept this (I should probably add things like operator+=
for example), but I point out that this typechecks through gcc and
shows the basic proxy iterator problems.
struct X {};
bool operator<(X& x, X& y);
struct CrazyRef
{
// this turns us into an X, when required
template<typename T>
operator T();
template<typename T>
void operator=(T x);
};
bool operator<(CrazyRef x, X& y);
bool operator<(X& y, CrazyRef x);
bool operator<(CrazyRef x, CrazyRef y);
struct CrazyIt
{
CrazyRef operator*();
};
CrazyIt operator+(CrazyIt,int);
CrazyIt operator+(int, CrazyIt);
int operator-(CrazyIt, CrazyIt);
CrazyIt operator-(CrazyIt, int);
CrazyIt operator++(CrazyIt);
CrazyIt operator++(CrazyIt, int);
CrazyIt operator--(CrazyIt);
CrazyIt operator--(CrazyIt, int);
bool operator==(CrazyIt, CrazyIt);
bool operator!=(CrazyIt, CrazyIt);
bool operator<(CrazyIt, CrazyIt);
#include <algorithm>
namespace std
{
template<>
struct iterator_traits<CrazyIt>
{
typedef X value_type;
typedef CrazyRef reference;
typedef int difference_type;
typedef random_access_iterator_tag iterator_category;
};
}
int f()
{
CrazyIt i,j;
std::sort(i,j);
}
On 8 April 2015 at 17:06, Orson Peters <orsonpeters@gmail.com> wrote:
> Thanks for your explanation, but it would be nice if you could
> elaborate a bit more/link to examples or issues where using the
> comparator directly fails. I still don't fully grasp how it would fail
> or how std::less would end up not calling a < b.
>
> If it's specializations you're afraid of when std::less is used, we
> could make a private __op_less(a, b) that always performs a < b using
> operator<.
>
> However, merging the current way of doing things into pdqsort is more
> than just an annoyance - I think it could harm performance and/or
> correctness. Between creating all the copies of the comparison object
> in wrappers, and turning iter<iter comparators into val<iter or
> iter<val comparitors I don't think inlining will be fully preserved.
> And if a comparator holds state (I'm not up-to-date what the standard
> has to say about this), I wouldn't expect it to work at all.
>
> In fact, when proposing pdqsort to libc++ I was asked to do the exact
> opposite - take the comparator by reference in subroutines as to
> prevent copies of the comparator altogether.
>
>
> 2015-04-08 17:34 GMT+02:00 Christopher Jefferson <chris@bubblescope.net>:
>> On 8 April 2015 at 13:34, Orson Peters <orsonpeters@gmail.com> wrote:
>>>> This does raise one (possible) problem, which is if your code might be
>>>> slower in C++03 due to copies -- to be exact, I don't care if it's
>>>> slower than it is in C++11, the question is if, with large types,
>>>> pdqsort is slower than std::sort when both are compiled C++03 (I don't
>>>> think it will be, but perhaps another thing to check. Should be easy
>>>> however!)
>>>
>>> Oh oh, forgive me. I was confused so my previous statements weren't 100%
>>> accurate.
>>>
>>> In the case of swap(*a, *b), iter_swap can be used with no problems
>>> whatsoever
>>> in pdqsort. I still think that insertion sort should use moves, not swaps,
>>> despite creating copies in C++03. The current libstdc++ implementation seems
>>> to
>>> agree with me (__insertion_sort), so it won't cause a performance regression
>>> compared to the status quo there. I'll change the code in my repository to
>>> use
>>> iter_swap.
>>>
>>> However, moving the pivot to a local variable for speed does invoke a copy
>>> in
>>> C++03. A total of 3 moves happen per partition operation, so if I'm not
>>> mistaken
>>> this performance hit will be dwarfed by the amount of moves happening in the
>>> insertion sort step. It does not make sense to remove this optimization
>>> while
>>> leaving insertion sort as it is.
>>
>> I agree that this will probably be higher performance.
>>
>>>
>>> But that wasn't the thing that confused me during my study of libstdc++, and
>>> what my previous performance remarks were about.
>>>
>>> No, my confusion was regarding __gnu_cxx::__ops::__iter_comp_iter and co
>>> that
>>> are used in libstdc++s implementation. I do not understand these comparison
>>> function adaptors purposes, it only limits the power of the comparison
>>> function.
>>
>> The reason these look like they do is... complicated :)
>>
>> The problem they are trying to solve is that until fairly recently,
>> libstdc++ would have two copies of every method in <algorithm>, one
>> which compared using < (or ==), and one which used a user-provided
>> comparator.
>>
>> Attempts to merge these by saying something like "use
>> std::equal/std::less for comparator" failed, because there was code
>> which got very upset when '*i < *j' wasn't actually called, and didn't
>> work through a wrapper.
>>
>> Therefore we abstract out calling the comparator, or <, through
>> '__iter_comp_iter', as internally this does the '*i < *j' in one go,
>> avoiding any weird issues. While this doesn't look nice, it had the
>> advantage of not breaking any code (we are aware of) which worked when
>> we had two copies of every function.
>>
>> From your point of view, while annoying for fitting in gcc, I believe
>> you can use the same design as the existing sort (which dispatches
>> using either __iter_less_iter or __iter_comp_iter(comp)).
>>
>> Then you can transform this 'iter_comp_iter' into a _Val_comp_iter (to
>> do (value, *iter)) using __val_comp_iter, or a _Iter_comp_val' (to do
>> (*iter, value) using __iter_comp_val. This is I realise reasonably
>> painful, and will compare the comparators around quite a bit (although
>> they already tend to get copies a fair amount anyway), but it should
>> all compile away.
>>
>> You might also be able to use 'std::addressof' to take the address of
>> value, and then do comp(std::addressof(value), iter), but I haven't
>> 100% thought through that. I can't see a problem with it however.
>>
>> Chris
>>
>>
>>>
>>> Since I move the pivot to a local variable for speed, I am not compatible
>>> with
>>> these comparison adaptors. In pdqsorts code I use comp(*iter, value),
>>> comp(value, *iter) and comp(*iter, *iter).
>>>
>>>
>>>
>>>
>>>
>>> 2015-04-08 13:59 GMT+02:00 Christopher Jefferson <chris@bubblescope.net>:
>>>>
>>>> On 8 April 2015 at 11:12, Orson Peters <orsonpeters@gmail.com> wrote:
>>>> >> - I assume the sort works on move-only types in C++11? (I can see you
>>>> >> make use of move, but do you require it?).
>>>> >
>>>> > If C++11 is enabled, the implementation only moves or swaps. All moves
>>>> > or swaps are done using ADL-lookup unqualified swap or
>>>> > PDQSORT_PREFER_MOVE which becomes std::move on C++11.
>>>>
>>>> Great
>>>>
>>>> >> There was a reason (and I can't remember what it was now!) that we
>>>> >> used 'iter_swap' rather than 'swap(*a,*b)' on iterators. I think it
>>>> >> might be required to allow sorting std::vector<bool>? Either way, it
>>>> >> might be worth trying it.
>>>> >
>>>> > I noticed this when I was studying libstdc++s implementation, and I
>>>> > couldn't figure out why. The implementation of pdqsort does not
>>>> > currently do this, and for a good reason: it's faster to move the
>>>> > pivot into a local variable for speed.
>>>> >
>>>> > The compiler can't prove that *begin (where the pivot is kept during
>>>> > the partition operation) does not get changed during the partition
>>>> > operation, and thus has to load the memory for every comparison. This
>>>> > will always hit cache after the first few iterations, so it doesn't
>>>> > matter that much, but it's not nothing.
>>>>
>>>> Actually, moving into a local variable of type
>>>> iterator_traits<T>::value_type is OK. See for example
>>>> __insertion_sort, it uses _GLIBCXX_MOVE, like your
>>>> PDQSORT_PREFER_MOVE.
>>>>
>>>> The problem is if you write 'swap(*a, *b)', when
>>>> iterator_traits<T>::reference_type is a proxy reference. For example,
>>>> with vector<bool>, we end up instantiating swap<std::_Bit_iterator>,
>>>> not swap<bool> as you might expect, so the temporary created in swap
>>>> isn't a bool, it's a _Bit_iterator. Now in the case of vector<bool>,
>>>> we've made sure this works, but some other non-true iterators might
>>>> mess up in this sitution. The C++ standard is generally unclear on
>>>> proxy iterators, but it would be nice not to break them any more than
>>>> we have to. You should find (hopefully!) that replacing swap(*a,*b)
>>>> with std::iter_swap(a,b) makes no difference to performance.
>>>>
>>>> It is true, separately, that we probably swap rather than move more
>>>> than we should for C++11 code. This is because when the library is
>>>> compiled in C++03 mode, then std::swap on vectors is O(1), but copy is
>>>> O(n).
>>>>
>>>> This does raise one (possible) problem, which is if your code might be
>>>> slower in C++03 due to copies -- to be exact, I don't care if it's
>>>> slower than it is in C++11, the question is if, with large types,
>>>> pdqsort is slower than std::sort when both are compiled C++03 (I don't
>>>> think it will be, but perhaps another thing to check. Should be easy
>>>> however!)
>>>>
>>>> Chris
>>>
>>>