[C++ PATCH] Speed up inplace_merge algorithm & fix inefficient logic(PR libstdc++/83938)
François Dumont
frs.dumont@gmail.com
Tue Aug 21 20:34:00 GMT 2018
I missed a test that was failing because of this patch. So I revert a
small part of it and here is the new proposal.
Tested under Linux x86_64, ok to commit ?
François
On 24/07/2018 12:22, François Dumont wrote:
> Hi
>
> Â Â Â Any chance to review this patch ?
>
> François
>
>
> On 06/06/2018 18:39, François Dumont wrote:
>> Hi
>>
>> Â Â Â I review and rework this proposal. I noticed that the same idea
>> to limit buffer size within inplace_merge also apply to stable_sort.
>>
>> Â Â Â I also change the decision when buffer is too small to consider
>> the buffer size rather than going through successive cuts of the
>> original ranges. This way we won't cut the range more than necessary.
>> Note that if you prefer this second part could be committed seperatly.
>>
>> Â Â Â PR libstdc++/83938
>> Â Â Â * include/bits/stl_algo.h:
>> Â Â Â (__stable_partition_adaptive): When buffer to small cut range using
>> Â Â Â buffer size.
>> Â Â Â (__inplace_merge): Take temporary buffer length from smallest range.
>> Â Â Â (__merge_adaptive): When buffer too small consider smallest range
>> first
>> Â Â Â and cut based on buffer size.
>> Â Â Â (__stable_sort_adaptive): When buffer too small cut based on buffer
>> Â Â Â size.
>> Â Â Â (__stable_sort): Limit temporary buffer length.
>> Â Â Â * include/bits/stl_tempbuf.h (get_temporary_buffer): Change function
>> Â Â Â to reduce requested buffer length on allocation failure.
>>
>> Tested under Linux x86_64.
>>
>> Ok to commit ?
>>
>> François
>>
>>
>> On 25/01/2018 23:37, chang jc wrote:
>>> Hi:
>>>
>>> 1. The __len = (__len + 1) / 2; is as you suggested, need to modify as
>>> __len = (__len == 1) ? 0 : ((__len + 1) / 2);
>>>
>>> 2. The coding gain has been shown PR c++/83938; I re-post here
>>>
>>>
>>>
>>>
>>> Â Â 21
>>> Â Â 20
>>> Â Â 19
>>> Â Â 18
>>> Â Â 17
>>> Â Â 16
>>>
>>>
>>> Â Â 0.471136
>>> Â Â 0.625695
>>> Â Â 0.767262
>>> Â Â 0.907461
>>> Â Â 1.04838
>>> Â Â 1.19508
>>>
>>>
>>> Â Â 0.340845
>>> Â Â 0.48651
>>> Â Â 0.639139
>>> Â Â 0.770133
>>> Â Â 0.898454
>>> Â Â 1.04632
>>>
>>> it means when Merge [0, 4325376, 16777216); A is a sorted integer with
>>> 4325376 & B with 12451840 elements, total with 16M integers
>>>
>>> The proposed method has the speed up under given buffer size =, ex
>>> 2^16, 2^17, ... 2^21 in unit of sizeof(int), for example, 2^16 means
>>> given sizof(int)*64K bytes.
>>>
>>> 3. As your suggestion, _TmpBuf __buf should be rewrite.
>>>
>>> 4. It represents a fact that the intuitive idea to split from larger
>>> part is wrong.
>>>
>>> For example, if you have an input sorted array A & B, A has 8 integers
>>> & B has 24 integers. Given tmp buffer whose capacity = 4 integers.
>>>
>>> Current it tries to split from B, right?
>>>
>>> Then we have:
>>>
>>> A1 | A2Â B1 | B2
>>>
>>> B1 & B2 has 12 integers each, right?
>>>
>>> Current algorithm selects pivot as 13th integer from B, right?
>>>
>>> If the corresponding upper bound of A is 6th integer.
>>>
>>> Then it split in
>>>
>>> A1 = 5 | A2 = 3 | B1 = 12 | B2 = 12
>>>
>>> After rotate, we have two arrays to merge
>>>
>>> [A1 = 5 | B1 = 12]Â & [A2 = 3 | B2 = 12]
>>>
>>> Great, [A2 = 3 | B2 = 12] can use tmp buffer to merge.
>>>
>>> Sadly, [A1 = 5 | B1 = 12] CANNOT.
>>>
>>> So we do rotate again, split & merge the two split arrays from [A1 = 5
>>> | B1 = 12] again.
>>>
>>>
>>> But wait, if we always split from the smaller one instead of larger
>>> one.
>>>
>>> After rotate, it promises two split arrays both contain
>>> ceiling[small/2].
>>>
>>> Since tmp buffer also allocate by starting from sizeof(small) &
>>> recursively downgrade by ceiling[small/2^(# of fail allocate)].
>>>
>>> It means the allocated tmp buffer promises to be sufficient at the
>>> level of (# of fail allocate).
>>>
>>> Instead, you can see if split from large at level (# of fail allocate)
>>> several split array still CANNOT use tmp buffer to do buffered merge.
>>>
>>>
>>> As you know, buffered merge is far speed then (split, rotate, and
>>> merge two sub-arrays) (PR c++/83938 gives the profiling figures),
>>>
>>> the way should provide speedup.
>>>
>>>
>>> Thanks.
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> On 24/01/2018 18:23, François Dumont wrote:
>>>
>>> Hi
>>>
>>>
>>> Â Â Â Â It sounds like a very sensitive change to make but nothing
>>> worth figures.
>>> Do you have any bench showing the importance of the gain ?
>>>
>>> Â Â Â Â At least the memory usage optimization is obvious.
>>>
>>> On 19/01/2018 10:43, chang jc wrote:
>>>
>>> Current std::inplace_merge() suffers from performance issue by
>>> inefficient
>>>
>>> logic under limited memory,
>>>
>>> It leads to performance downgrade.
>>>
>>> Please help to review it.
>>>
>>> Index: include/bits/stl_algo.h
>>> ===================================================================
>>> --- include/bits/stl_algo.h   (revision 256871)
>>> +++ include/bits/stl_algo.h   (working copy)
>>> @@ -2437,7 +2437,7 @@
>>> Â Â Â Â Â Â Â Â _BidirectionalIterator __second_cut = __middle;
>>> Â Â Â Â Â Â Â Â _Distance __len11 = 0;
>>> Â Â Â Â Â Â Â Â _Distance __len22 = 0;
>>> -Â Â Â Â Â if (__len1 > __len2)
>>> +Â Â Â Â Â if (__len1 < __len2)
>>> Â Â Â Â Â Â Â Â Â Â {
>>> Â Â Â Â Â Â Â Â Â Â Â Â __len11 = __len1 / 2;
>>> Â Â Â Â Â Â Â Â Â Â Â Â std::advance(__first_cut, __len11);
>>> @@ -2539,9 +2539,15 @@
>>> Â Â Â Â Â Â Â Â const _DistanceType __len1 = std::distance(__first, __middle);
>>> Â Â Â Â Â Â Â Â const _DistanceType __len2 = std::distance(__middle, __last);
>>>
>>> +
>>>
>>> Â Â Â Â Â Â Â Â typedef _Temporary_buffer<_BidirectionalIterator, _ValueType>
>>> _TmpBuf;
>>>
>>> -Â Â Â Â Â _TmpBuf __buf(__first, __last);
>>> -
>>> +Â Â Â Â Â _BidirectionalIterator __start, __end;
>>> +Â Â Â Â Â if (__len1 < __len2) {
>>> +Â Â Â __start = __first; __end = __middle;
>>> +Â Â Â Â Â } else {
>>> +Â Â Â __start = __middle; __end = __last;
>>> +Â Â Â Â Â }
>>> +Â Â Â Â Â _TmpBuf __buf(__start, ___end);
>>>
>>> Note another optimization, we could introduce a _Temporary_buffer<>
>>> constructor
>>> in order to write:
>>>
>>> _TmpBuf __buf(std::min(__len1, __len2), __first);
>>>
>>> So that std::distance do not need to be called again.
>>>
>>> Â Â Â Â Â Â Â Â if (__buf.begin() == 0)
>>> Â Â Â Â Â Â std::__merge_without_buffer
>>> Â Â Â Â Â Â Â Â (__first, __middle, __last, __len1, __len2, __comp);
>>> Index: include/bits/stl_tempbuf.h
>>> ===================================================================
>>> --- include/bits/stl_tempbuf.h   (revision 256871)
>>> +++ include/bits/stl_tempbuf.h   (working copy)
>>> @@ -95,7 +95,7 @@
>>> Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â std::nothrow));
>>> Â Â Â Â Â Â Â Â if (__tmp != 0)
>>> Â Â Â Â Â Â Â Â Â Â return std::pair<_Tp*, ptrdiff_t>(__tmp, __len);
>>> -Â Â Â Â Â __len /= 2;
>>> +Â Â Â Â Â __len = (__len + 1) / 2;
>>>
>>> This part is problematic, if __len is 1 and allocation fails it will
>>> loop
>>> forever.
>>>
>>> It doesn't seem really necessary for your patch.
>>>
>>>
>>> 2018-01-20 4:05 GMT+08:00 Jason Merrill<jason@redhat.com>:
>>>
>>>> This is a libstdc++ bug and patch, not the C++ front end. So I'm
>>>> adding the libstdc++ list to CC.
>>>>
>>>> On Fri, Jan 19, 2018 at 3:02 AM, chang jc<r97922153@gmail.com>Â wrote:
>>>>> Current std::inplace_merge() suffers from performance issue by
>>>> inefficient
>>>>> logic under limited memory,
>>>>>
>>>>> It leads to performance downgrade.
>>>>>
>>>>> Please help to review it.
>>>>>
>>>>> Index: include/bits/stl_algo.h
>>>>> ===================================================================
>>>>> --- include/bits/stl_algo.h    (revision 256871)
>>>>> +++ include/bits/stl_algo.h    (working copy)
>>>>> @@ -2437,7 +2437,7 @@
>>>>> Â Â Â Â Â Â Â Â Â Â _BidirectionalIterator __second_cut = __middle;
>>>>> Â Â Â Â Â Â Â Â Â Â _Distance __len11 = 0;
>>>>> Â Â Â Â Â Â Â Â Â Â _Distance __len22 = 0;
>>>>> -Â Â Â Â Â Â Â Â if (__len1 > __len2)
>>>>> +Â Â Â Â Â Â Â Â if (__len1 < __len2)
>>>>> Â Â Â Â Â Â Â Â Â Â Â Â {
>>>>> Â Â Â Â Â Â Â Â Â Â Â Â Â Â __len11 = __len1 / 2;
>>>>> Â Â Â Â Â Â Â Â Â Â Â Â Â Â std::advance(__first_cut, __len11);
>>>>> @@ -2539,9 +2539,15 @@
>>>>> Â Â Â Â Â Â Â const _DistanceType __len1 = std::distance(__first,
>>>>> __middle);
>>>>> Â Â Â Â Â Â Â const _DistanceType __len2 = std::distance(__middle, __last);
>>>>>
>>>>> +
>>>>> Â Â Â Â Â Â Â typedef _Temporary_buffer<_BidirectionalIterator, _ValueType>
>>>> _TmpBuf;
>>>>> -Â Â Â Â Â _TmpBuf __buf(__first, __last);
>>>>> -
>>>>> +Â Â Â Â Â _BidirectionalIterator __start, __end;
>>>>> +Â Â Â Â Â if (__len1 < __len2) {
>>>>> +Â Â Â Â Â Â __start = __first; __end = __middle;
>>>>> +Â Â Â Â Â } else {
>>>>> +Â Â Â Â Â Â __start = __middle; __end = __last;
>>>>> +Â Â Â Â Â }
>>>>> +Â Â Â Â Â _TmpBuf __buf(__start, ___end);
>>>>> Â Â Â Â Â Â Â if (__buf.begin() == 0)
>>>>> Â Â Â Â Â Â Â Â std::__merge_without_buffer
>>>>> Â Â Â Â Â Â Â Â Â Â (__first, __middle, __last, __len1, __len2, __comp);
>>>>> Index: include/bits/stl_tempbuf.h
>>>>> ===================================================================
>>>>> --- include/bits/stl_tempbuf.h (revision 256871)
>>>>> +++ include/bits/stl_tempbuf.h (working copy)
>>>>> @@ -95,7 +95,7 @@
>>>>> std::nothrow));
>>>>> Â Â Â Â Â Â Â Â Â Â if (__tmp != 0)
>>>>> Â Â Â Â Â Â Â Â Â Â Â Â return std::pair<_Tp*, ptrdiff_t>(__tmp, __len);
>>>>> -Â Â Â Â Â Â Â Â __len /= 2;
>>>>> +Â Â Â Â Â Â Â Â __len = (__len + 1) / 2;
>>>>> Â Â Â Â Â Â Â Â }
>>>>> Â Â Â Â Â Â Â return std::pair<_Tp*, ptrdiff_t>(static_cast<_Tp*>(0), 0);
>>>>> Â Â Â Â Â }
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> Thanks
>>
>>
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: pr83938.patch
Type: text/x-patch
Size: 3921 bytes
Desc: not available
URL: <http://gcc.gnu.org/pipermail/libstdc++/attachments/20180821/c0900219/attachment.bin>
More information about the Libstdc++
mailing list