This is the mail archive of the libstdc++@gcc.gnu.org mailing list for the libstdc++ project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [libstdc++] pdqsort - a faster std::sort


Thanks for your explanation, but it would be nice if you could
elaborate a bit more/link to examples or issues where using the
comparator directly fails. I still don't fully grasp how it would fail
or how std::less would end up not calling a < b.

If it's specializations you're afraid of when std::less is used, we
could make a private __op_less(a, b) that always performs a < b using
operator<.

However, merging the current way of doing things into pdqsort is more
than just an annoyance - I think it could harm performance and/or
correctness. Between creating all the copies of the comparison object
in wrappers, and turning iter<iter comparators into val<iter or
iter<val comparitors I don't think inlining will be fully preserved.
And if a comparator holds state (I'm not up-to-date what the standard
has to say about this), I wouldn't expect it to work at all.

In fact, when proposing pdqsort to libc++ I was asked to do the exact
opposite - take the comparator by reference in subroutines as to
prevent copies of the comparator altogether.


2015-04-08 17:34 GMT+02:00 Christopher Jefferson <chris@bubblescope.net>:
> On 8 April 2015 at 13:34, Orson Peters <orsonpeters@gmail.com> wrote:
>>> This does raise one (possible) problem, which is if your code might be
>>> slower in C++03 due to copies -- to be exact, I don't care if it's
>>> slower than it is in C++11, the question is if, with large types,
>>> pdqsort is slower than std::sort when both are compiled C++03 (I don't
>>> think it will be, but perhaps another thing to check. Should be easy
>>> however!)
>>
>> Oh oh, forgive me. I was confused so my previous statements weren't 100%
>> accurate.
>>
>> In the case of swap(*a, *b), iter_swap can be used with no problems
>> whatsoever
>> in pdqsort. I still think that insertion sort should use moves, not swaps,
>> despite creating copies in C++03. The current libstdc++ implementation seems
>> to
>> agree with me (__insertion_sort), so it won't cause a performance regression
>> compared to the status quo there. I'll change the code in my repository to
>> use
>> iter_swap.
>>
>> However, moving the pivot to a local variable for speed does invoke a copy
>> in
>> C++03. A total of 3 moves happen per partition operation, so if I'm not
>> mistaken
>> this performance hit will be dwarfed by the amount of moves happening in the
>> insertion sort step. It does not make sense to remove this optimization
>> while
>> leaving insertion sort as it is.
>
> I agree that this will probably be higher performance.
>
>>
>> But that wasn't the thing that confused me during my study of libstdc++, and
>> what my previous performance remarks were about.
>>
>> No, my confusion was regarding __gnu_cxx::__ops::__iter_comp_iter and co
>> that
>> are used in libstdc++s implementation. I do not understand these comparison
>> function adaptors purposes, it only limits the power of the comparison
>> function.
>
> The reason these look like they do is... complicated :)
>
> The problem they are trying to solve is that until fairly recently,
> libstdc++ would have two copies of every method in <algorithm>, one
> which compared using < (or ==), and one which used a user-provided
> comparator.
>
> Attempts to merge these by saying something like "use
> std::equal/std::less for comparator" failed, because there was code
> which got very upset when '*i < *j' wasn't actually called, and didn't
> work through a wrapper.
>
> Therefore we abstract out calling the comparator, or <, through
> '__iter_comp_iter', as internally this does the '*i < *j' in one go,
> avoiding any weird issues. While this doesn't look nice, it had the
> advantage of not breaking any code (we are aware of) which worked when
> we had two copies of every function.
>
> From your point of view, while annoying for fitting in gcc, I believe
> you can use the same design as the existing sort (which dispatches
> using either __iter_less_iter or __iter_comp_iter(comp)).
>
> Then you can transform this 'iter_comp_iter' into a _Val_comp_iter (to
> do (value, *iter)) using __val_comp_iter, or a _Iter_comp_val' (to do
> (*iter, value) using __iter_comp_val. This is I realise reasonably
> painful, and will compare the comparators around quite a bit (although
> they already tend to get copies a fair amount anyway), but it should
> all compile away.
>
> You might also be able to use 'std::addressof' to take the address of
> value, and then do comp(std::addressof(value), iter), but I haven't
> 100% thought through that. I can't see a problem with it however.
>
> Chris
>
>
>>
>> Since I move the pivot to a local variable for speed, I am not compatible
>> with
>> these comparison adaptors. In pdqsorts code I use comp(*iter, value),
>> comp(value, *iter) and comp(*iter, *iter).
>>
>>
>>
>>
>>
>> 2015-04-08 13:59 GMT+02:00 Christopher Jefferson <chris@bubblescope.net>:
>>>
>>> On 8 April 2015 at 11:12, Orson Peters <orsonpeters@gmail.com> wrote:
>>> >>  - I assume the sort works on move-only types in C++11? (I can see you
>>> >> make use of move, but do you require it?).
>>> >
>>> > If C++11 is enabled, the implementation only moves or swaps. All moves
>>> > or swaps are done using ADL-lookup unqualified swap or
>>> > PDQSORT_PREFER_MOVE which becomes std::move on C++11.
>>>
>>> Great
>>>
>>> >> There was a reason (and I can't remember what it was now!) that we
>>> >> used 'iter_swap' rather than 'swap(*a,*b)' on iterators. I think it
>>> >> might be required to allow sorting std::vector<bool>? Either way, it
>>> >> might be worth trying it.
>>> >
>>> > I noticed this when I was studying libstdc++s implementation, and I
>>> > couldn't figure out why. The implementation of pdqsort does not
>>> > currently do this, and for a good reason: it's faster to move the
>>> > pivot into a local variable for speed.
>>> >
>>> > The compiler can't prove that *begin (where the pivot is kept during
>>> > the partition operation) does not get changed during the partition
>>> > operation, and thus has to load the memory for every comparison. This
>>> > will always hit cache after the first few iterations, so it doesn't
>>> > matter that much, but it's not nothing.
>>>
>>> Actually, moving into a local variable of type
>>> iterator_traits<T>::value_type is OK. See for example
>>> __insertion_sort, it uses _GLIBCXX_MOVE, like your
>>> PDQSORT_PREFER_MOVE.
>>>
>>> The problem is if you write 'swap(*a, *b)', when
>>> iterator_traits<T>::reference_type is a proxy reference. For example,
>>> with vector<bool>, we end up instantiating swap<std::_Bit_iterator>,
>>> not swap<bool> as you might expect, so the temporary created in swap
>>> isn't a bool, it's a _Bit_iterator. Now in the case of vector<bool>,
>>> we've made sure this works, but some other non-true iterators might
>>> mess up in this sitution. The C++ standard is generally unclear on
>>> proxy iterators, but it would be nice not to break them any more than
>>> we have to. You should find (hopefully!) that replacing swap(*a,*b)
>>> with std::iter_swap(a,b) makes no difference to performance.
>>>
>>> It is true, separately, that we probably swap rather than move more
>>> than we should for C++11 code. This is because when the library is
>>> compiled in C++03 mode, then std::swap on vectors is O(1), but copy is
>>> O(n).
>>>
>>> This does raise one (possible) problem, which is if your code might be
>>> slower in C++03 due to copies -- to be exact, I don't care if it's
>>> slower than it is in C++11, the question is if, with large types,
>>> pdqsort is slower than std::sort when both are compiled C++03 (I don't
>>> think it will be, but perhaps another thing to check. Should be easy
>>> however!)
>>>
>>> Chris
>>
>>


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]