This is the mail archive of the
libstdc++@gcc.gnu.org
mailing list for the libstdc++ project.
Re: [libstdc++] pdqsort - a faster std::sort
- From: Orson Peters <orsonpeters at gmail dot com>
- To: Christopher Jefferson <chris at bubblescope dot net>
- Cc: libstdc++ at gcc dot gnu dot org
- Date: Thu, 9 Apr 2015 00:03:51 +0200
- Subject: Re: [libstdc++] pdqsort - a faster std::sort
- Authentication-results: sourceware.org; auth=none
- References: <CAJxLxMXy4gqAWfXj_p1tyyUSj4PqFp3ASzdxFtyDk2W1F_PdsQ at mail dot gmail dot com> <20150407135025 dot GF9755 at redhat dot com> <CAJxLxMVEW-+bSFyDs7sWytEMKVrhMpcQxk-9QetWem8ZtSeSxg at mail dot gmail dot com> <CAJxLxMWkj7kKteFh7YWHv5gr+Fge_2VeN6QdJ_TFqW2nUtin=w at mail dot gmail dot com> <CA+jCFLvDjSjPSF+dKDVuKWW-hsSK_LC3c=VwiFEhhx0EOyiqAA at mail dot gmail dot com> <CAJxLxMU_3vV1cozS1MnfSo5NNodW1J_1zZWuimdpCguO1cO-0w at mail dot gmail dot com> <CA+jCFLubcMg1OxspBjpcKjur6Vi6PJ=Ce_oSkA_b7OqcZR1UwQ at mail dot gmail dot com> <CAJxLxMX6CyVGxPgrZvvJ+Y62RN_86CQ8H0gFdUOHZde0UFJ2dA at mail dot gmail dot com> <CA+jCFLvUxRdmw71_H08ruDCf8QFmXcriE+G2_Cam4t5wQP-1gQ at mail dot gmail dot com> <CAJxLxMXfZ-GoTPO9uQ2Y9QyQJ_ep_+Ls4Di5q5A0B849Y32chg at mail dot gmail dot com> <CA+jCFLvtfsq6+0_r-iXAjE6JSBrxkDuaFu6FMguVhCghOdc0bw at mail dot gmail dot com>
Hmm, if you guys wish to support such arcane abuses of C++ for
backwards compatibility, I guess your solution is probably the sanest
way. I'll create a branch on my pdqsort repo to start working on the
changes needed to transform pdqsort in a state where it can be merged
into libstdc++.
Three relevant questions:
1. Should iter_swap be used like swap (using std::swap; followed by
unqualified lookup), or should it be fully qualified as
std::iter_swap?
2. Is there any particular reason _Iter_comp_iter and co store copies
of the comparator, and not a reference?
3. Are you guys/gals also found on some more informal interactive
channel than this mailing list (for example IRC)? In case I need
help with libstdc++ peculiarities, as to not spam this mailing
list with trivial questions.
On Wed, Apr 8, 2015 at 9:09 PM, Christopher Jefferson
<chris@bubblescope.net> wrote:
> Some of these things might be a bit lost to the depths of time (you
> could dig through the mailing list archives). The aim when doing the
> merge was to handle all methods consistently with minimal code
> changes. Some algorithms (like std::find for example) get used in
> funkier ways and with weirder operator==s which often only work with
> exactly the const-ness the algorithm has provided previously. It's
> nice to try not to break such code.
>
> The main problems I remember are:
>
> 1) There are some weird proxy iterators out there, for example with
> templated conversion operators, so you can never pass them to a
> function which has more than one overload. These proxy iterators
> (which at least used to exist in boost) don't work with libc++'s
> __less. As clang/libc++ took the chance to tighten up many parts of
> the compiler, it was reasonable for them to break this, but it would
> probably be better to try not to.
>
> 2) Have to be careful to never add a 'const' where we didn't have one
> before, to handle non-const operator< (once again not legal, and
> broken by libc++, but there is a lot of code out there with this).
> std::less<value_type> would do this for example.
>
> Just to give you a feeling, I just threw the following code together.
> I don't promise this shows all the problems, and I'm not saying your
> code must accept this (I should probably add things like operator+=
> for example), but I point out that this typechecks through gcc and
> shows the basic proxy iterator problems.
>
> struct X {};
>
> bool operator<(X& x, X& y);
>
> struct CrazyRef
> {
> // this turns us into an X, when required
> template<typename T>
> operator T();
> template<typename T>
> void operator=(T x);
> };
>
>
> bool operator<(CrazyRef x, X& y);
> bool operator<(X& y, CrazyRef x);
> bool operator<(CrazyRef x, CrazyRef y);
>
> struct CrazyIt
> {
> CrazyRef operator*();
> };
>
> CrazyIt operator+(CrazyIt,int);
> CrazyIt operator+(int, CrazyIt);
> int operator-(CrazyIt, CrazyIt);
> CrazyIt operator-(CrazyIt, int);
> CrazyIt operator++(CrazyIt);
> CrazyIt operator++(CrazyIt, int);
> CrazyIt operator--(CrazyIt);
> CrazyIt operator--(CrazyIt, int);
> bool operator==(CrazyIt, CrazyIt);
> bool operator!=(CrazyIt, CrazyIt);
> bool operator<(CrazyIt, CrazyIt);
>
> #include <algorithm>
>
> namespace std
> {
> template<>
> struct iterator_traits<CrazyIt>
> {
> typedef X value_type;
> typedef CrazyRef reference;
> typedef int difference_type;
> typedef random_access_iterator_tag iterator_category;
> };
> }
>
>
>
> int f()
> {
> CrazyIt i,j;
> std::sort(i,j);
> }
>
> On 8 April 2015 at 17:06, Orson Peters <orsonpeters@gmail.com> wrote:
>> Thanks for your explanation, but it would be nice if you could
>> elaborate a bit more/link to examples or issues where using the
>> comparator directly fails. I still don't fully grasp how it would fail
>> or how std::less would end up not calling a < b.
>>
>> If it's specializations you're afraid of when std::less is used, we
>> could make a private __op_less(a, b) that always performs a < b using
>> operator<.
>>
>> However, merging the current way of doing things into pdqsort is more
>> than just an annoyance - I think it could harm performance and/or
>> correctness. Between creating all the copies of the comparison object
>> in wrappers, and turning iter<iter comparators into val<iter or
>> iter<val comparitors I don't think inlining will be fully preserved.
>> And if a comparator holds state (I'm not up-to-date what the standard
>> has to say about this), I wouldn't expect it to work at all.
>>
>> In fact, when proposing pdqsort to libc++ I was asked to do the exact
>> opposite - take the comparator by reference in subroutines as to
>> prevent copies of the comparator altogether.
>>
>>
>> 2015-04-08 17:34 GMT+02:00 Christopher Jefferson <chris@bubblescope.net>:
>>> On 8 April 2015 at 13:34, Orson Peters <orsonpeters@gmail.com> wrote:
>>>>> This does raise one (possible) problem, which is if your code might be
>>>>> slower in C++03 due to copies -- to be exact, I don't care if it's
>>>>> slower than it is in C++11, the question is if, with large types,
>>>>> pdqsort is slower than std::sort when both are compiled C++03 (I don't
>>>>> think it will be, but perhaps another thing to check. Should be easy
>>>>> however!)
>>>>
>>>> Oh oh, forgive me. I was confused so my previous statements weren't 100%
>>>> accurate.
>>>>
>>>> In the case of swap(*a, *b), iter_swap can be used with no problems
>>>> whatsoever
>>>> in pdqsort. I still think that insertion sort should use moves, not swaps,
>>>> despite creating copies in C++03. The current libstdc++ implementation seems
>>>> to
>>>> agree with me (__insertion_sort), so it won't cause a performance regression
>>>> compared to the status quo there. I'll change the code in my repository to
>>>> use
>>>> iter_swap.
>>>>
>>>> However, moving the pivot to a local variable for speed does invoke a copy
>>>> in
>>>> C++03. A total of 3 moves happen per partition operation, so if I'm not
>>>> mistaken
>>>> this performance hit will be dwarfed by the amount of moves happening in the
>>>> insertion sort step. It does not make sense to remove this optimization
>>>> while
>>>> leaving insertion sort as it is.
>>>
>>> I agree that this will probably be higher performance.
>>>
>>>>
>>>> But that wasn't the thing that confused me during my study of libstdc++, and
>>>> what my previous performance remarks were about.
>>>>
>>>> No, my confusion was regarding __gnu_cxx::__ops::__iter_comp_iter and co
>>>> that
>>>> are used in libstdc++s implementation. I do not understand these comparison
>>>> function adaptors purposes, it only limits the power of the comparison
>>>> function.
>>>
>>> The reason these look like they do is... complicated :)
>>>
>>> The problem they are trying to solve is that until fairly recently,
>>> libstdc++ would have two copies of every method in <algorithm>, one
>>> which compared using < (or ==), and one which used a user-provided
>>> comparator.
>>>
>>> Attempts to merge these by saying something like "use
>>> std::equal/std::less for comparator" failed, because there was code
>>> which got very upset when '*i < *j' wasn't actually called, and didn't
>>> work through a wrapper.
>>>
>>> Therefore we abstract out calling the comparator, or <, through
>>> '__iter_comp_iter', as internally this does the '*i < *j' in one go,
>>> avoiding any weird issues. While this doesn't look nice, it had the
>>> advantage of not breaking any code (we are aware of) which worked when
>>> we had two copies of every function.
>>>
>>> From your point of view, while annoying for fitting in gcc, I believe
>>> you can use the same design as the existing sort (which dispatches
>>> using either __iter_less_iter or __iter_comp_iter(comp)).
>>>
>>> Then you can transform this 'iter_comp_iter' into a _Val_comp_iter (to
>>> do (value, *iter)) using __val_comp_iter, or a _Iter_comp_val' (to do
>>> (*iter, value) using __iter_comp_val. This is I realise reasonably
>>> painful, and will compare the comparators around quite a bit (although
>>> they already tend to get copies a fair amount anyway), but it should
>>> all compile away.
>>>
>>> You might also be able to use 'std::addressof' to take the address of
>>> value, and then do comp(std::addressof(value), iter), but I haven't
>>> 100% thought through that. I can't see a problem with it however.
>>>
>>> Chris
>>>
>>>
>>>>
>>>> Since I move the pivot to a local variable for speed, I am not compatible
>>>> with
>>>> these comparison adaptors. In pdqsorts code I use comp(*iter, value),
>>>> comp(value, *iter) and comp(*iter, *iter).
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> 2015-04-08 13:59 GMT+02:00 Christopher Jefferson <chris@bubblescope.net>:
>>>>>
>>>>> On 8 April 2015 at 11:12, Orson Peters <orsonpeters@gmail.com> wrote:
>>>>> >> - I assume the sort works on move-only types in C++11? (I can see you
>>>>> >> make use of move, but do you require it?).
>>>>> >
>>>>> > If C++11 is enabled, the implementation only moves or swaps. All moves
>>>>> > or swaps are done using ADL-lookup unqualified swap or
>>>>> > PDQSORT_PREFER_MOVE which becomes std::move on C++11.
>>>>>
>>>>> Great
>>>>>
>>>>> >> There was a reason (and I can't remember what it was now!) that we
>>>>> >> used 'iter_swap' rather than 'swap(*a,*b)' on iterators. I think it
>>>>> >> might be required to allow sorting std::vector<bool>? Either way, it
>>>>> >> might be worth trying it.
>>>>> >
>>>>> > I noticed this when I was studying libstdc++s implementation, and I
>>>>> > couldn't figure out why. The implementation of pdqsort does not
>>>>> > currently do this, and for a good reason: it's faster to move the
>>>>> > pivot into a local variable for speed.
>>>>> >
>>>>> > The compiler can't prove that *begin (where the pivot is kept during
>>>>> > the partition operation) does not get changed during the partition
>>>>> > operation, and thus has to load the memory for every comparison. This
>>>>> > will always hit cache after the first few iterations, so it doesn't
>>>>> > matter that much, but it's not nothing.
>>>>>
>>>>> Actually, moving into a local variable of type
>>>>> iterator_traits<T>::value_type is OK. See for example
>>>>> __insertion_sort, it uses _GLIBCXX_MOVE, like your
>>>>> PDQSORT_PREFER_MOVE.
>>>>>
>>>>> The problem is if you write 'swap(*a, *b)', when
>>>>> iterator_traits<T>::reference_type is a proxy reference. For example,
>>>>> with vector<bool>, we end up instantiating swap<std::_Bit_iterator>,
>>>>> not swap<bool> as you might expect, so the temporary created in swap
>>>>> isn't a bool, it's a _Bit_iterator. Now in the case of vector<bool>,
>>>>> we've made sure this works, but some other non-true iterators might
>>>>> mess up in this sitution. The C++ standard is generally unclear on
>>>>> proxy iterators, but it would be nice not to break them any more than
>>>>> we have to. You should find (hopefully!) that replacing swap(*a,*b)
>>>>> with std::iter_swap(a,b) makes no difference to performance.
>>>>>
>>>>> It is true, separately, that we probably swap rather than move more
>>>>> than we should for C++11 code. This is because when the library is
>>>>> compiled in C++03 mode, then std::swap on vectors is O(1), but copy is
>>>>> O(n).
>>>>>
>>>>> This does raise one (possible) problem, which is if your code might be
>>>>> slower in C++03 due to copies -- to be exact, I don't care if it's
>>>>> slower than it is in C++11, the question is if, with large types,
>>>>> pdqsort is slower than std::sort when both are compiled C++03 (I don't
>>>>> think it will be, but perhaps another thing to check. Should be easy
>>>>> however!)
>>>>>
>>>>> Chris
>>>>
>>>>