libstdc++/41975

François Dumont frs.dumont@gmail.com
Thu Oct 13 20:07:00 GMT 2011


On 10/12/2011 10:42 PM, Paolo Carlini wrote:
> Hi,
>> Here is an other step forward to fix this issue. In this step nodes 
>> are now doubly linked to each other. It avoids some loops on buckets 
>> but performance are still not ok. I have created a performance test, 
>> see attached 41975.cc.
> I think we can as well commit it. I can do it.
     You are right, to see the evolution the performance test should be 
pushed first before we start changing hashtable data structure.
>> Before the patch the result was:
>>
>> 41975.cc                        Container generation               
>> 2r    1u    0s  8452192mem    0pf
>> 41975.cc                        Container erase                   
>> 21r   21u    0s -6400096mem    0pf
>> 41975.cc                        Container iteration               
>> 10r   10u    0s       32mem    0pf
>>
>> With the attached patch it is:
>>
>> 41975.cc                        Container generation               
>> 3r    2u    0s 11652176mem    0pf
>> 41975.cc                        Container erase                  
>> 185r  182u    0s -9600080mem    0pf
>> 41975.cc                        Container iteration                
>> 0r    0u    0s       48mem    0pf
>>
>> Of course there are more memory manipulated. Regarding operations CPU 
>> consumption iteration is much better but erase is much worst. I think 
>> the problem is in the invocation of std::fill that cost a lot as soon 
>> as we start having a lot of buckets to update. This is why I have 
>> already started to work on a last evolution to avoid those calls.
> Excellent. I'm afraid I still don't have much to say/help and given 
> that I don't want to distract you anyway, but where does the 
> additional complexity of erase come from? Could you describe in a few 
> sentences the steps you go through at erase time?
1. Compute erased iterator bucket index. This is necessary because 
iterators do not contain a reference to the bucket anymore so that it 
doesn't have to be updated on incrementation
2. If erased iterator is first in the bucket, look for previous 
non-empty bucket
   2.1 Fill the range from previous non-empty bucket + 1 to iterator 
bucket with the node after erased iterator
3. unlink and deallocate the erased node

     I think the expensive operation is the 2.1. As soon as we start 
having a small number of elements compared to the number of buckets the 
fill operation becomes quite heavy. This is why my last modification 
goal is to avoid this fill operation. Rather than storing bucket N last 
node in bucket N+ 1 and followings up to next non-empty bucket I am 
going to store 2 nodes in each bucket, the begin and the end.
>
> I was also thinking that it would be great if you could add comments 
> to the code, even if some have to be redone when you rework the code 
> in the following drafts.
I already hesitated adding comments because there are not a lot in 
current implementation so I guessed that as soon as code is rather easy 
to read it doesn't need comment. But of course the code I write is 
necessary easy to read for me so I will try to find the right places to 
add comments.
> And an introductory comment right before _Hashtable describing the 
> data structures. Like, eg, the one in std::deque, very well done, IMO.
>
Ok, I will use deque description to write one for hashtable.
> Tomorrow, besides committing the performance testcase, I will also add 
> those missing count testcases. Just to be of some small help ;)
>
> As soon as I'm done with a couple of pending things, I should be able 
> to assist you a bit.
Thanks.

François



More information about the Libstdc++ mailing list