hashtable local iterator

Paolo Carlini paolo.carlini@oracle.com
Fri Dec 16 11:07:00 GMT 2011


Hi,
> On 12/14/2011 01:49 AM, Paolo Carlini wrote:
>> Hi,
>>> Hi
>>>
>>>     Since modification of hashtable implementation I was quite 
>>> unhappy of the impact on local_iterator and especially on the 
>>> end(size_type bkt) method that required to loop through all bucket 
>>> nodes to access the past-the-end one. This is why I would like to 
>>> propose, if it is still time, this new local iterator implementation 
>>> that allow to implement begin(size_t)/end(size_t) in constant time. 
>>> Note that:
>> well, either we do this or, alternately, I'm afraid we have to revert 
>> the whole thing because the current behavior makes the new 
>> implementation non-conforming, right? Actually, in my opinion you 
>> should have pointed out the issue much more explicitly much earlier. 
>> If you are aware of similar issues, breaking the conformance of the 
>> implementation, please say it now, I would hate large last minute 
>> reversions.
>
>     Sorry, I hadn't noticed that it was a non Standard conformity, I 
> though that not being O(N) with N the number of elements in the 
> container or even the number of bucket was enough but you are right 
> the C++11 Standard is really clear on operation complexity and current 
> implementation is wrong.
Well, yes, I think it's wrong. I'm concerned not just because of the 
possible impact on the typical user experience, but also because the 
problem may point to a more fundamental issue with the new 
implementation which we may want to catch as early as possible. And in 
terms of regressions, since we are so aggressively redesigning the whole 
thing we should be rather strict about regressions too, in my opinion.
>     I have one other point that do not make me happy, it is in the 
> management of the iterator hint in insert and emplace_hint methods. 
> libstdc++ do not make anything of it while the Standard say that 
> insertion should be achieved as close as possible to this hint. This 
> is not however a regression of the new implementation, former one was 
> also ignoring this hint information.
Good point. But I just had a look and things like Table 103 don't seem 
that different compared to TR1?! I think we are safe to ignore the hint, 
for now. But definitely something worth further exploring in terms of 
QoI. Note, the associative containers changed quite a bit in this area 
between C++98 and C++11, via a DR, which however we already implemented 
(the emplace methods would do the same).
>>
>>> - this new iterator type needs a pointer to its parent container to 
>>> work which is quite unusual on normal iterators.
>> On the spot, I cannot see anything wrong with this. Normally, when 
>> such kinds of links are present one must be careful about what 
>> happens when one of the two changes without telling the other. Are we 
>> risking, I don't know, local iterators becoming invalid when they 
>> shouldn't?
>>> - I had to make an overload of _M_bucket_index public to use it in 
>>> local iterator implem', do you prefer to introduce a friend 
>>> declaration ?
>> I guess friendship is much better.
>     I have done the modification to use friendship but with the issue 
> raised by Christopher and you it is finally not so important. This 
> swap issue in indeed rude. With current implementation I need at least 
> the hash functor and bucket count to get the bucket index of the nodes 
> on incrementation. I will study this problem but for the moment the 
> only solution I see is to re-introduce the 2 nodes per bucket I had 
> done in one of my propositions.
Ok good. Let's keep in touch, privately too, as usual, we don't want to 
make the release managers too much nervous. About your last observation, 
I would recommend you to seriously explore the idea: the last time I 
discusses these designs with other implementers, a few months ago, I'm 
pretty sure that designs using a "forward_list" for the nodes, then had 
2 nodes per bucket (ie, two pointers to local begin and local end 
essentially). I guess that at this point we really want to keep the 
"forward_list" nature of the links between the nodes stable but for the 
buckets we can be more flexible.

Ah, one last unrelated note: while you review the implementation, please 
double check that all the data members are correctly ordered in the 
layouts, roughly from the larger to the smaller, if in doubt, assuming 
64 bits for the pointers. Remember that we don't have any optimization 
pass reordering those (the last time I checked) and we don't want to 
stupidly waste space because one day, in a rush, we write (stupid example):

     bool b1member;
     void* pmember;
     bool b2member;

instead of
     ...
     void* pmember;
     bool b1member;
     bool b2member;
     ...

Ok, stupid things, but please keep an eye.

Thanks!
Paolo.



More information about the Libstdc++ mailing list