This is the mail archive of the libstdc++@gcc.gnu.org mailing list for the libstdc++ project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [libstdc++] pdqsort - a faster std::sort


On 8 April 2015 at 13:34, Orson Peters <orsonpeters@gmail.com> wrote:
>> This does raise one (possible) problem, which is if your code might be
>> slower in C++03 due to copies -- to be exact, I don't care if it's
>> slower than it is in C++11, the question is if, with large types,
>> pdqsort is slower than std::sort when both are compiled C++03 (I don't
>> think it will be, but perhaps another thing to check. Should be easy
>> however!)
>
> Oh oh, forgive me. I was confused so my previous statements weren't 100%
> accurate.
>
> In the case of swap(*a, *b), iter_swap can be used with no problems
> whatsoever
> in pdqsort. I still think that insertion sort should use moves, not swaps,
> despite creating copies in C++03. The current libstdc++ implementation seems
> to
> agree with me (__insertion_sort), so it won't cause a performance regression
> compared to the status quo there. I'll change the code in my repository to
> use
> iter_swap.
>
> However, moving the pivot to a local variable for speed does invoke a copy
> in
> C++03. A total of 3 moves happen per partition operation, so if I'm not
> mistaken
> this performance hit will be dwarfed by the amount of moves happening in the
> insertion sort step. It does not make sense to remove this optimization
> while
> leaving insertion sort as it is.

I agree that this will probably be higher performance.

>
> But that wasn't the thing that confused me during my study of libstdc++, and
> what my previous performance remarks were about.
>
> No, my confusion was regarding __gnu_cxx::__ops::__iter_comp_iter and co
> that
> are used in libstdc++s implementation. I do not understand these comparison
> function adaptors purposes, it only limits the power of the comparison
> function.

The reason these look like they do is... complicated :)

The problem they are trying to solve is that until fairly recently,
libstdc++ would have two copies of every method in <algorithm>, one
which compared using < (or ==), and one which used a user-provided
comparator.

Attempts to merge these by saying something like "use
std::equal/std::less for comparator" failed, because there was code
which got very upset when '*i < *j' wasn't actually called, and didn't
work through a wrapper.

Therefore we abstract out calling the comparator, or <, through
'__iter_comp_iter', as internally this does the '*i < *j' in one go,
avoiding any weird issues. While this doesn't look nice, it had the
advantage of not breaking any code (we are aware of) which worked when
we had two copies of every function.

>From your point of view, while annoying for fitting in gcc, I believe
you can use the same design as the existing sort (which dispatches
using either __iter_less_iter or __iter_comp_iter(comp)).

Then you can transform this 'iter_comp_iter' into a _Val_comp_iter (to
do (value, *iter)) using __val_comp_iter, or a _Iter_comp_val' (to do
(*iter, value) using __iter_comp_val. This is I realise reasonably
painful, and will compare the comparators around quite a bit (although
they already tend to get copies a fair amount anyway), but it should
all compile away.

You might also be able to use 'std::addressof' to take the address of
value, and then do comp(std::addressof(value), iter), but I haven't
100% thought through that. I can't see a problem with it however.

Chris


>
> Since I move the pivot to a local variable for speed, I am not compatible
> with
> these comparison adaptors. In pdqsorts code I use comp(*iter, value),
> comp(value, *iter) and comp(*iter, *iter).
>
>
>
>
>
> 2015-04-08 13:59 GMT+02:00 Christopher Jefferson <chris@bubblescope.net>:
>>
>> On 8 April 2015 at 11:12, Orson Peters <orsonpeters@gmail.com> wrote:
>> >>  - I assume the sort works on move-only types in C++11? (I can see you
>> >> make use of move, but do you require it?).
>> >
>> > If C++11 is enabled, the implementation only moves or swaps. All moves
>> > or swaps are done using ADL-lookup unqualified swap or
>> > PDQSORT_PREFER_MOVE which becomes std::move on C++11.
>>
>> Great
>>
>> >> There was a reason (and I can't remember what it was now!) that we
>> >> used 'iter_swap' rather than 'swap(*a,*b)' on iterators. I think it
>> >> might be required to allow sorting std::vector<bool>? Either way, it
>> >> might be worth trying it.
>> >
>> > I noticed this when I was studying libstdc++s implementation, and I
>> > couldn't figure out why. The implementation of pdqsort does not
>> > currently do this, and for a good reason: it's faster to move the
>> > pivot into a local variable for speed.
>> >
>> > The compiler can't prove that *begin (where the pivot is kept during
>> > the partition operation) does not get changed during the partition
>> > operation, and thus has to load the memory for every comparison. This
>> > will always hit cache after the first few iterations, so it doesn't
>> > matter that much, but it's not nothing.
>>
>> Actually, moving into a local variable of type
>> iterator_traits<T>::value_type is OK. See for example
>> __insertion_sort, it uses _GLIBCXX_MOVE, like your
>> PDQSORT_PREFER_MOVE.
>>
>> The problem is if you write 'swap(*a, *b)', when
>> iterator_traits<T>::reference_type is a proxy reference. For example,
>> with vector<bool>, we end up instantiating swap<std::_Bit_iterator>,
>> not swap<bool> as you might expect, so the temporary created in swap
>> isn't a bool, it's a _Bit_iterator. Now in the case of vector<bool>,
>> we've made sure this works, but some other non-true iterators might
>> mess up in this sitution. The C++ standard is generally unclear on
>> proxy iterators, but it would be nice not to break them any more than
>> we have to. You should find (hopefully!) that replacing swap(*a,*b)
>> with std::iter_swap(a,b) makes no difference to performance.
>>
>> It is true, separately, that we probably swap rather than move more
>> than we should for C++11 code. This is because when the library is
>> compiled in C++03 mode, then std::swap on vectors is O(1), but copy is
>> O(n).
>>
>> This does raise one (possible) problem, which is if your code might be
>> slower in C++03 due to copies -- to be exact, I don't care if it's
>> slower than it is in C++11, the question is if, with large types,
>> pdqsort is slower than std::sort when both are compiled C++03 (I don't
>> think it will be, but perhaps another thing to check. Should be easy
>> however!)
>>
>> Chris
>
>


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]