This is the mail archive of the libstdc++@gcc.gnu.org mailing list for the libstdc++ project.

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]
Other format:	[Raw text]

Re: Hashtable comment cleanups & renamings

From: François Dumont <frs dot dumont at gmail dot com>
To: Jonathan Wakely <jwakely at redhat dot com>
Cc: "libstdc++ at gcc dot gnu dot org" <libstdc++ at gcc dot gnu dot org>, gcc-patches <gcc-patches at gcc dot gnu dot org>
Date: Mon, 20 May 2019 07:51:54 +0200
Subject: Re: Hashtable comment cleanups & renamings
References: <08e5ed09-2950-09d9-56a5-f33cf1ca60a0@gmail.com> <20190517202426.GX2599@redhat.com>

On 5/17/19 10:24 PM, Jonathan Wakely wrote:

On 17/05/19 18:19 +0200, François Dumont wrote:
Hi
I got tired of '__n' being used in _Hashtable for many differentpurposes: node, bucket, bucket count, bucket hint. It makes the codedifficult to read. This code makes sure that __n is a node except issome very limited use cases where the method name is clear enough totell what __n means.
So I'd like to commit this patch which only change that and somecomments before moving forward to more serious stuff. The only codechange is a use of auto return type on _M_allocate_node.
    My main concern is the ChangeLog entry. Is the following entry ok ?

    Rename variables and cleanup comments.
    * include/bits/hashtable_policy.h
    * include/bits/hashtable.h

    Tested under Linux x86_64 (even if it can't be otherwise)

François
@@ -350,24 +347,24 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
      _M_base_alloc() { return *this; }

      __bucket_type*
-      _M_allocate_buckets(size_type __n)
+      _M_allocate_buckets(size_type __bkt_count)
This is a much more helpful name, thanks.
@@ -439,30 +436,31 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
      { }

      explicit
-      _Hashtable(size_type __n,
+      _Hashtable(size_type __bkt_hint,
This seems less helpful. Would __num_bkts_hint be clearer?
Or for consistency, __bkt_count_hint?

I thought also about a longer name like this one but then I consideredthat I didn't want to make it too long and that '__bkt_hint' was enoughknow that this parameter was related to the buckets. But I can use__bkt_count_hint if you prefer.

@@ -1415,9 +1414,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
    -> iterator
    {
      __hash_code __code = this->_M_hash_code(__k);
-      std::size_t __n = _M_bucket_index(__k, __code);
-      __node_type* __p = _M_find_node(__n, __k, __code);
-      return __p ? iterator(__p) : end();
+      std::size_t __bkt = _M_bucket_index(__k, __code);
+      __node_type* __n = _M_find_node(__bkt, __k, __code);
+      return __n ? iterator(__n) : end();


Is __n really an improvement over __p here?

Outside any context no but within _Hashtable implementation __n isalready used most of the time to indicate a node. This is patch just tryto fix this 'most of the time' part.


If you're changing it, __node or __ptr might be an improvement, but
changing __p to __n seems like unnecessary churn.

I'm not convinced that __n is a big enough improvement over __p to
bother changing dozens of lines, for not much benefit. All those
changes will make it slower to use git blame to track down when thigns
changed, and will make it harder to review diffs between trunk and
older branches.

Yes, this is why I wanted to commit it outside of any real change sothat this commit can be ignore from any git log or blame more easily.

So I can limit the patch to renaming __n occurences into __bkt,__bkt_count_hint, __bkt_count when possible and leave other __n but also__p & al untouch.

@@ -1479,17 +1478,17 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
    -> pair<iterator, iterator>
    {
      __hash_code __code = this->_M_hash_code(__k);
-      std::size_t __n = _M_bucket_index(__k, __code);
-      __node_type* __p = _M_find_node(__n, __k, __code);
+      std::size_t __bkt = _M_bucket_index(__k, __code);
+      __node_type* __n = _M_find_node(__bkt, __k, __code);

-      if (__p)
+      if (__n)
    {
-      __node_type* __p1 = __p->_M_next();
-      while (__p1 && _M_bucket_index(__p1) == __n
-         && this->_M_equals(__k, __code, __p1))
-        __p1 = __p1->_M_next();
+      __node_type* __n1 = __n->_M_next();


__p1 is not a good name, but __n1 is no better.

At least with __p the second pointer could be __q, which is a fairly
idiomatic pairing of letters :-)

How about __first and __last? Or __n and __next?  Even __n1 and __n2
seems better than __n and __n1. Those pointers end up being used for
the 'first' and 'second' members of a pair, so __n1 and __n2 makes
more sense than setting 'first' from __n and 'second' from __n1.

But I don't feel strongly about it, so if it's just me who dislikes
__n and __n1 then it doesn't matter.

diff --git a/libstdc++-v3/include/bits/hashtable_policy.hb/libstdc++-v3/include/bits/hashtable_policy.h

index a4d2a97f4f3..bb2e7b762ff 100644
--- a/libstdc++-v3/include/bits/hashtable_policy.h
+++ b/libstdc++-v3/include/bits/hashtable_policy.h
@@ -181,7 +181,7 @@ namespace __detail
   *  @tparam _Cache_hash_code  Boolean value. True if the value of
   *  the hash function is stored along with the value. This is a
   *  time-space tradeoff.  Storing it may improve lookup speed by
-   *  reducing the number of times we need to call the _Equal
+   *  reducing the number of times we need to call the _Hash


Doesn't it reduce both?

In _M_equals we don't bother calling the _Equal predicate if the
cached hash code doesn't match the one for the key we're comparing.

Sure it reduces both, I'll just add _Hash (but first)

References:
- Hashtable comment cleanups & renamings
  - From: François Dumont
- Re: Hashtable comment cleanups & renamings
  - From: Jonathan Wakely

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]