This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [C++ PATCH] Speed up inplace_merge algorithm & fix inefficient logic(PR libstdc++/83938)


On 29/10/18 07:06 +0100, François Dumont wrote:
Hi

    Some feedback regarding this patch ?

Sorry this got missed, please resubmit during stage 1.

You haven't CC'd the original patch author (chang jc) to give them a
chance to comment on your proposed changes to the patch.

The attached PDF on PR libstdc++/83938 has extensive discussion of the
performance issue, but I don't see any for your version. Some
performance benchmarks for your version would be helpful.




Thanks,
François

On 8/21/18 10:34 PM, François Dumont wrote:
I missed a test that was failing because of this patch. So I revert a small part of it and here is the new proposal.

Tested under Linux x86_64, ok to commit ?

François


On 24/07/2018 12:22, François Dumont wrote:
Hi

    Any chance to review this patch ?

François


On 06/06/2018 18:39, François Dumont wrote:
Hi

    I review and rework this proposal. I noticed that the same idea to limit buffer size within inplace_merge also apply to stable_sort.

    I also change the decision when buffer is too small to consider the buffer size rather than going through successive cuts of the original ranges. This way we won't cut the range more than necessary. Note that if you prefer this second part could be committed seperatly.

    PR libstdc++/83938
    * include/bits/stl_algo.h:
    (__stable_partition_adaptive): When buffer to small cut range using
    buffer size.
    (__inplace_merge): Take temporary buffer length from smallest range.     (__merge_adaptive): When buffer too small consider smallest range first
    and cut based on buffer size.
    (__stable_sort_adaptive): When buffer too small cut based on buffer
    size.
    (__stable_sort): Limit temporary buffer length.
    * include/bits/stl_tempbuf.h (get_temporary_buffer): Change function
    to reduce requested buffer length on allocation failure.

Tested under Linux x86_64.

Ok to commit ?

François


On 25/01/2018 23:37, chang jc wrote:
Hi:

1. The __len = (__len + 1) / 2; is as you suggested, need to modify as
__len = (__len == 1) ? 0 : ((__len + 1) / 2);

2. The coding gain has been shown  PR c++/83938; I re-post here




   21
   20
   19
   18
   17
   16


   0.471136
   0.625695
   0.767262
   0.907461
   1.04838
   1.19508


   0.340845
   0.48651
   0.639139
   0.770133
   0.898454
   1.04632

it means when Merge [0, 4325376, 16777216); A is a sorted integer with
4325376 & B with 12451840 elements, total with 16M integers

The proposed method has the speed up under given buffer size =, ex
2^16, 2^17, ... 2^21 in unit of sizeof(int), for example, 2^16 means
given sizof(int)*64K bytes.

3. As your suggestion, _TmpBuf __buf should be rewrite.

4. It represents a fact that the intuitive idea to split from larger
part is wrong.

For example, if you have an input sorted array A & B, A has 8 integers
& B has 24 integers. Given tmp buffer whose capacity = 4 integers.

Current it tries to split from B, right?

Then we have:

A1 | A2  B1 | B2

B1 & B2 has 12 integers each, right?

Current algorithm selects pivot as 13th integer from B, right?

If the corresponding upper bound of A is 6th integer.

Then it split in

A1 = 5 | A2 = 3 | B1 = 12 | B2 = 12

After rotate, we have two arrays to merge

[A1 = 5 | B1 = 12]  & [A2 = 3 | B2 = 12]

Great, [A2 = 3 | B2 = 12] can use tmp buffer to merge.

Sadly, [A1 = 5 | B1 = 12] CANNOT.

So we do rotate again, split & merge the two split arrays from [A1 = 5
| B1 = 12] again.


But wait, if we always split from the smaller one instead of larger one.

After rotate, it promises two split arrays both contain ceiling[small/2].

Since tmp buffer also allocate by starting from sizeof(small) &
recursively downgrade by ceiling[small/2^(# of fail allocate)].

It means the allocated tmp buffer promises to be sufficient at the
level of (# of fail allocate).

Instead, you can see if split from large at level (# of fail allocate)
several split array still CANNOT use  tmp buffer to do buffered merge.


As you know, buffered merge is far speed then (split, rotate, and
merge two sub-arrays) (PR c++/83938 gives the profiling figures),

the way should provide speedup.


Thanks.










On 24/01/2018 18:23, François Dumont wrote:

Hi


     It sounds like a very sensitive change to make but nothing worth figures.
Do you have any bench showing the importance of the gain ?

     At least the memory usage optimization is obvious.

On 19/01/2018 10:43, chang jc wrote:

Current std::inplace_merge() suffers from performance issue by inefficient

logic under limited memory,

It leads to performance downgrade.

Please help to review it.

Index: include/bits/stl_algo.h
===================================================================
--- include/bits/stl_algo.h    (revision 256871)
+++ include/bits/stl_algo.h    (working copy)
@@ -2437,7 +2437,7 @@
         _BidirectionalIterator __second_cut = __middle;
         _Distance __len11 = 0;
         _Distance __len22 = 0;
-      if (__len1 > __len2)
+      if (__len1 < __len2)
           {
             __len11 = __len1 / 2;
             std::advance(__first_cut, __len11);
@@ -2539,9 +2539,15 @@
         const _DistanceType __len1 = std::distance(__first, __middle);
         const _DistanceType __len2 = std::distance(__middle, __last);

+

         typedef _Temporary_buffer<_BidirectionalIterator, _ValueType>
_TmpBuf;

-      _TmpBuf __buf(__first, __last);
-
+      _BidirectionalIterator __start, __end;
+      if (__len1 < __len2) {
+    __start = __first; __end = __middle;
+      } else {
+    __start = __middle; __end = __last;
+      }
+      _TmpBuf __buf(__start, ___end);

Note another optimization, we could introduce a _Temporary_buffer<> constructor
in order to write:

_TmpBuf __buf(std::min(__len1, __len2), __first);

So that std::distance do not need to be called again.

         if (__buf.begin() == 0)
       std::__merge_without_buffer
         (__first, __middle, __last, __len1, __len2, __comp);
Index: include/bits/stl_tempbuf.h
===================================================================
--- include/bits/stl_tempbuf.h    (revision 256871)
+++ include/bits/stl_tempbuf.h    (working copy)
@@ -95,7 +95,7 @@
                               std::nothrow));
         if (__tmp != 0)
           return std::pair<_Tp*, ptrdiff_t>(__tmp, __len);
-      __len /= 2;
+      __len = (__len + 1) / 2;

This part is problematic, if __len is 1 and allocation fails it will loop
forever.

It doesn't seem really necessary for your patch.


2018-01-20 4:05 GMT+08:00 Jason Merrill<jason@redhat.com>:

This is a libstdc++ bug and patch, not the C++ front end.  So I'm
adding the libstdc++ list to CC.

On Fri, Jan 19, 2018 at 3:02 AM, chang jc<r97922153@gmail.com>  wrote:
Current std::inplace_merge() suffers from performance issue by
inefficient
logic under limited memory,

It leads to performance downgrade.

Please help to review it.

Index: include/bits/stl_algo.h
===================================================================
--- include/bits/stl_algo.h     (revision 256871)
+++ include/bits/stl_algo.h     (working copy)
@@ -2437,7 +2437,7 @@
           _BidirectionalIterator __second_cut = __middle;
           _Distance __len11 = 0;
           _Distance __len22 = 0;
-         if (__len1 > __len2)
+         if (__len1 < __len2)
             {
               __len11 = __len1 / 2;
               std::advance(__first_cut, __len11);
@@ -2539,9 +2539,15 @@
        const _DistanceType __len1 = std::distance(__first, __middle);         const _DistanceType __len2 = std::distance(__middle, __last);

+
        typedef _Temporary_buffer<_BidirectionalIterator, _ValueType>
_TmpBuf;
-      _TmpBuf __buf(__first, __last);
-
+      _BidirectionalIterator __start, __end;
+      if (__len1 < __len2) {
+       __start = __first; __end = __middle;
+      } else {
+       __start = __middle; __end = __last;
+      }
+      _TmpBuf __buf(__start, ___end);
        if (__buf.begin() == 0)
         std::__merge_without_buffer
           (__first, __middle, __last, __len1, __len2, __comp);
Index: include/bits/stl_tempbuf.h
===================================================================
--- include/bits/stl_tempbuf.h  (revision 256871)
+++ include/bits/stl_tempbuf.h  (working copy)
@@ -95,7 +95,7 @@
std::nothrow));
           if (__tmp != 0)
             return std::pair<_Tp*, ptrdiff_t>(__tmp, __len);
-         __len /= 2;
+         __len = (__len + 1) / 2;
         }
        return std::pair<_Tp*, ptrdiff_t>(static_cast<_Tp*>(0), 0);
      }




Thanks







Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]