This is the mail archive of the
libstdc++@gcc.gnu.org
mailing list for the libstdc++ project.
Re: Limit std::sort comparisons on small list
- From: Paolo Carlini <paolo dot carlini at oracle dot com>
- To: FranÃois Dumont <frs dot dumont at gmail dot com>
- Cc: "libstdc++ at gcc dot gnu dot org" <libstdc++ at gcc dot gnu dot org>, "chris at bubblescope dot net" <chris at bubblescope dot net>, Marc Glisse <marc dot glisse at inria dot fr>
- Date: Tue, 1 Oct 2013 23:35:39 +0200
- Subject: Re: Limit std::sort comparisons on small list
- Authentication-results: sourceware.org; auth=none
- References: <52409F6F dot 7040609 at gmail dot com> <alpine dot DEB dot 2 dot 10 dot 1309232317070 dot 4088 at laptop-mg dot saclay dot inria dot fr> <alpine dot DEB dot 2 dot 10 dot 1309290031480 dot 4104 at laptop-mg dot saclay dot inria dot fr> <CA+jCFLvhxsbAu+3Ti07pL0hg7UJfDgBzzbAV5EMwvMUtaa1ZYQ at mail dot gmail dot com> <5249A299 dot 3060103 at oracle dot com> <CA+jCFLu-r8SCWhSv-h+vrcSbWXWAXEiFxVhfNTUpworkXMJsRw at mail dot gmail dot com> <524B322A dot 1090703 at gmail dot com>
Hi,
> Il giorno 01/ott/2013, alle ore 22:35, FranÃois Dumont <frs.dumont@gmail.com> ha scritto:
>
>> On 09/30/2013 11:45 PM, Christopher Jefferson wrote:
>>> On 30 September 2013 17:11, Paolo Carlini <paolo.carlini@oracle.com> wrote:
>>> Hi,
>>>
>>>
>>>> On 09/29/2013 10:19 PM, Christopher Jefferson wrote:
>>>> (we also perform 2 comparisons where one should be enough, but that's a
>>>> different issue)
>>>> That is worth looking at (although increasingly, I want to investigate
>>>> doing a fresh sort altogether. In my own code I use an implementation
>>>> of timsort, which shows much better performance where data is partly
>>>> sorted. However, that's a different patch/discussion!)
>>> It's definitely a different patch / discussion. What I'm trying to
>>> understand is how old the issue is, whether is a relatively recent
>>> regression or it's an old issue. Because the behavior seems first blush
>>> really weird, but if it dates back to the HP / SGI times, it can probably
>>> wait a bit more. Any ideas?
>> This comes from way back in the SGI days by the look of things to me.
>>
>> The problem arises in __insertion_sort. The logic looks something like:
>>
>> if (*__i < *__first)
>> { Then *__i has to go all the way to the start of the array. Do that! }
>> else
>> { We don't know where *__i will go, but we know that if we start
>> moving it back 1 square at a time, we will stop either at or before
>> __first }
>>
>> That second part is __unguarded_linear_insert.
>>
>> In the case where actually *__i has to go all the way back to just
>> after first, we do one comparison too many (we could stop one place
>> before). However, the whole point of unguarded linear insert is that
>> we are trading off one comparison to remove the need for a boundary
>> condition check. We could add a check in the while of
>> __unguarded_linear_insert of that form (__i > __first), but in many
>> cases that would end up being more expensive than the comparisons,
>> particularly because it is only in very rare cases that we expect to
>> hit this case.
>>
>> So the short version is: We do the extra value comparison on purpose,
>> and taking it out will make the algorithm quite a lot more in iterator
>> comparisions.
>>
>> We could add a special case std::sort for very small lists, for
>> example up to size 4 or 5, and use a 'sort network'. Not sure how
>> often these cases come up, and if it is worth the code bloat.
>>
>> Chris
>
> If you don't mind I changed the subject as we all agree that the original discussion has changed.
>
> I also had a look on my side and was about to propose this
Interesting, thanks. Numbers? Could you please see which numbers you get if you run the new performace test provided by Chris + variants of the subtest using pseudo-random numbers? I would say for shorter and longer lists and also averaging over a few different starting random seeds. In any case many repetitions to decresse the variance of the numbers. If/when you have something please send immediately over the tests themselves too: I know that even keeping fix the x86_64 architecture there are rather noticeable relative differences depending on the specific cpu, thus in any case I would like to run the same tests on my machines.
Paolo