[C++ PATCH] Speed up inplace_merge algorithm & fix inefficient logic(PR libstdc++/83938)

François Dumont frs.dumont@gmail.com
Sun Jun 9 14:27:00 GMT 2019


On 12/21/18 9:57 PM, Jonathan Wakely wrote:
> On 29/10/18 07:06 +0100, François Dumont wrote:
>> Hi
>>
>>     Some feedback regarding this patch ?
>
> Sorry this got missed, please resubmit during stage 1.
>
> You haven't CC'd the original patch author (chang jc) to give them a
> chance to comment on your proposed changes to the patch.
>
> The attached PDF on PR libstdc++/83938 has extensive discussion of the
> performance issue, but I don't see any for your version. Some
> performance benchmarks for your version would be helpful.

Here is this patch again.

This time it is much closer to John's original one, I just kept my 
change on the size of the temporary buffer which doesn't need to be as 
large as it is currently allocated, especially with John's patch.

The performance tests are showing the good improvements, attached are 
the before/after. Surprisingly I do not see any change regarding 
allocated memory, I guess measures are not accurate enough. However 
working on them also show that current implementation is fragile. If you 
reduce the allowed allocated memory too much you end up with a stack 
overflow because of the recursive implementation.

I already have a patch to keep on trying to allocate memory as long as 
above a given threshold rather than 0 but I am also working on making 
implementation non-recursive. We'll see if even a buffer of size 1 is 
better than no buffer at all then.

     PR libstdc++/83938
     * include/bits/stl_algo.h:
     (__merge_adaptive): When buffer too small consider smallest range first
     and cut based on buffer size.
     (__inplace_merge): Take temporary buffer length from smallest range.
     (__stable_sort): Limit temporary buffer length.
     * include/bits/stl_tempbuf.h (get_temporary_buffer): Change function
     to reduce requested buffer length on allocation failure.
     * testsuite/performance/25_algorithms/inplace_merge.cc: New.
     * testsuite/performance/25_algorithms/stable_sort.cc: Rework to allow
     execution with different level of memory.

François

-------------- next part --------------
inplace_merge.cc         	reverse                  	  19r   19u    0s        48mem    0pf 
inplace_merge.cc         	forwards                 	   0r    0u    0s         0mem    0pf 
inplace_merge.cc         	random                   	  18r   18u    0s         0mem    0pf 
inplace_merge.cc         	reverse                  	  10r   10u    0s         0mem    0pf 
inplace_merge.cc         	forwards                 	   8r    8u    0s         0mem    0pf 
inplace_merge.cc         	random                   	  22r   22u    0s         0mem    0pf 
inplace_merge.cc         	bench 1/2 memory         	1891r 1884u    7s       928mem    0pf 
inplace_merge.cc         	reverse                  	  19r   18u    0s         0mem    0pf 
inplace_merge.cc         	forwards                 	   0r    0u    0s         0mem    0pf 
inplace_merge.cc         	random                   	  18r   18u    0s         0mem    0pf 
inplace_merge.cc         	reverse                  	  10r   10u    0s         0mem    0pf 
inplace_merge.cc         	forwards                 	   3r    3u    0s         0mem    0pf 
inplace_merge.cc         	random                   	  22r   22u    0s         0mem    0pf 
inplace_merge.cc         	bench 1/4 memory         	1867r 1861u    6s       192mem    0pf 
inplace_merge.cc         	reverse                  	  17r   17u    0s         0mem    0pf 
inplace_merge.cc         	forwards                 	   0r    0u    0s         0mem    0pf 
inplace_merge.cc         	random                   	  25r   24u    0s         0mem    0pf 
inplace_merge.cc         	reverse                  	  28r   28u    0s         0mem    0pf 
inplace_merge.cc         	forwards                 	   0r    0u    0s         0mem    0pf 
inplace_merge.cc         	random                   	 289r  289u    0s         0mem    0pf 
inplace_merge.cc         	bench no memory          	2155r 2149u    6s       192mem    0pf 
stable_sort.cc           	reverse                  	 240r  240u    1s         0mem    0pf 
stable_sort.cc           	forwards                 	 205r  205u    0s         0mem    0pf 
stable_sort.cc           	random                   	 493r  493u    0s         0mem    0pf 
stable_sort.cc           	bench full memory        	 945r  939u    5s       752mem    0pf 
stable_sort.cc           	reverse                  	 236r  236u    0s         0mem    0pf 
stable_sort.cc           	forwards                 	 199r  199u    0s         0mem    0pf 
stable_sort.cc           	random                   	 487r  487u    0s         0mem    0pf 
stable_sort.cc           	bench 1/2 memory         	 928r  925u    3s        96mem    0pf 
stable_sort.cc           	reverse                  	 240r  240u    0s         0mem    0pf 
stable_sort.cc           	forwards                 	 203r  203u    0s         0mem    0pf 
stable_sort.cc           	random                   	 486r  487u    0s         0mem    0pf 
stable_sort.cc           	bench 1/4 memory         	 935r  932u    5s        96mem    0pf 
stable_sort.cc           	reverse                  	 829r  829u    0s         0mem    0pf 
stable_sort.cc           	forwards                 	 143r  143u    0s         0mem    0pf 
stable_sort.cc           	random                   	 487r  486u    0s         0mem    0pf 
stable_sort.cc           	bench no memory          	1465r 1461u    2s        96mem    0pf 
-------------- next part --------------
inplace_merge.cc         	reverse                  	  19r   19u    0s         0mem    0pf 
inplace_merge.cc         	forwards                 	   0r    0u    0s         0mem    0pf 
inplace_merge.cc         	random                   	  19r   19u    0s         0mem    0pf 
inplace_merge.cc         	reverse                  	  11r   10u    0s         0mem    0pf 
inplace_merge.cc         	forwards                 	   8r    8u    0s         0mem    0pf 
inplace_merge.cc         	random                   	  22r   22u    0s         0mem    0pf 
inplace_merge.cc         	bench 1/2 memory         	1938r 1925u   12s       928mem    0pf 
inplace_merge.cc         	reverse                  	  19r   19u    0s         0mem    0pf 
inplace_merge.cc         	forwards                 	   0r    0u    0s         0mem    0pf 
inplace_merge.cc         	random                   	  18r   19u    0s         0mem    0pf 
inplace_merge.cc         	reverse                  	  10r   11u    0s         0mem    0pf 
inplace_merge.cc         	forwards                 	   3r    2u    0s         0mem    0pf 
inplace_merge.cc         	random                   	  22r   22u    0s         0mem    0pf 
inplace_merge.cc         	bench 1/4 memory         	1922r 1915u    7s       192mem    0pf 
inplace_merge.cc         	reverse                  	  18r   17u    0s         0mem    0pf 
inplace_merge.cc         	forwards                 	   0r    0u    0s         0mem    0pf 
inplace_merge.cc         	random                   	  23r   24u    0s         0mem    0pf 
inplace_merge.cc         	reverse                  	  28r   27u    0s         0mem    0pf 
inplace_merge.cc         	forwards                 	   0r    0u    0s         0mem    0pf 
inplace_merge.cc         	random                   	 288r  289u    0s         0mem    0pf 
inplace_merge.cc         	bench no memory          	2204r 2196u    9s       192mem    0pf 
stable_sort.cc           	reverse                  	 234r  234u    0s         0mem    0pf 
stable_sort.cc           	forwards                 	 201r  200u    1s         0mem    0pf 
stable_sort.cc           	random                   	 485r  484u    1s         0mem    0pf 
stable_sort.cc           	bench full memory        	 927r  920u    5s       752mem    0pf 
stable_sort.cc           	reverse                  	 233r  231u    1s         0mem    0pf 
stable_sort.cc           	forwards                 	 198r  198u    0s         0mem    0pf 
stable_sort.cc           	random                   	 478r  478u    0s         0mem    0pf 
stable_sort.cc           	bench 1/2 memory         	 915r  910u    4s        96mem    0pf 
stable_sort.cc           	reverse                  	 236r  236u    0s         0mem    0pf 
stable_sort.cc           	forwards                 	 201r  200u    0s         0mem    0pf 
stable_sort.cc           	random                   	 478r  478u    1s         0mem    0pf 
stable_sort.cc           	bench 1/4 memory         	 921r  916u    4s        96mem    0pf 
stable_sort.cc           	reverse                  	 830r  830u    0s         0mem    0pf 
stable_sort.cc           	forwards                 	 146r  146u    0s         0mem    0pf 
stable_sort.cc           	random                   	 479r  478u    0s         0mem    0pf 
stable_sort.cc           	bench no memory          	1461r 1456u    5s        96mem    0pf 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: pr83938.patch
Type: text/x-patch
Size: 10790 bytes
Desc: not available
URL: <http://gcc.gnu.org/pipermail/libstdc++/attachments/20190609/9d0753e6/attachment.bin>


More information about the Libstdc++ mailing list