[PATCH 2/2][libstdc++]: Adjust probabilities of hashmap loop conditions
Jonathan Wakely
jwakely@redhat.com
Wed Dec 18 15:59:54 GMT 2024
On Wed, 18 Dec 2024 at 14:14, Tamar Christina <Tamar.Christina@arm.com> wrote:
>
> > e791e52ec329277474f3218d8a44cd37ded14ac3..8101d868d0c5f7ac4f97931a
> > > ffcf71d826c88094 100644
> > > > --- a/libstdc++-v3/include/bits/hashtable.h
> > > > +++ b/libstdc++-v3/include/bits/hashtable.h
> > > > @@ -2171,7 +2171,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> > > > if (this->_M_equals(__k, __code, *__p))
> > > > return __prev_p;
> > > >
> > > > - if (!__p->_M_nxt || _M_bucket_index(*__p->_M_next()) != __bkt)
> > > > + if (__builtin_expect (!__p->_M_nxt || _M_bucket_index(*__p-
> > >_M_next())
> > > != __bkt, 0))
> > > > break;
> > > > __prev_p = __p;
> > > > }
> > > > @@ -2201,7 +2201,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> > > > if (this->_M_equals_tr(__k, __code, *__p))
> > > > return __prev_p;
> > > >
> > > > - if (!__p->_M_nxt || _M_bucket_index(*__p->_M_next()) != __bkt)
> > > > + if (__builtin_expect (!__p->_M_nxt || _M_bucket_index(*__p-
> > > >_M_next()) != __bkt, 0))
> > > > break;
> > > > __prev_p = __p;
> > > > }
> > > > @@ -2228,7 +2228,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> > > > pointer_to(const_cast<__node_base&>(_M_before_begin));
> > > > while (__loc._M_before->_M_nxt)
> > > > {
> > > > - if (this->_M_key_equals(__k, *__loc._M_node()))
> > > > + if (__builtin_expect (this->_M_key_equals(__k, *__loc._M_node()), 1))
> > > > return __loc;
> > > > __loc._M_before = __loc._M_before->_M_nxt;
> > > > }
> > >
> > > The first two changes make sense to me: we're typically going to loop
> > > more than once when searching for an element, so the break is
> > > unlikely. The third change is a bit more surprising, because although
> > > it's likely we'll find an element *eventually* it is only likely once
> > > out of N iterations, not likely on every iteration. But if the numbers
> > > show it helps, then that just shows my intuition is probably wrong.
> > >
> > > If this is optimized for 'find' is it possible that this change
> > > pessimizes insertion (which also has to use _M_locate to find where to
> > > insert)?
> >
> > You're right there seems to be a uniform slowdown for insertions in small sized
> > hashtables of ~10 entries. They're about 10-14% slower with that change.
> >
> > As expected just the ones on _M_find_before_node does not cause any issues.
> >
> > Since the insertions into small hashtables don't run long enough I've also increased
> > the number of iterations to ~1 million even out the score a bit more but there's still
> > a sizable 5 regressions.
> >
> > I've kicked off a longer run removing the change on _M_locate and with some
> > more
> > variable size finds/inserts.
> >
> > So far it looks like the additional benefits gotten on _M_locate were
> > mainly on two tests. I'll look at the perf traces to figure out exactly why but I'll
> > respin
> > the patch without the _M_locate change and send it tomorrow morning once
> > benchmarks finish.
> >
>
> Hi,
>
> Here's a new patch and new numbers showing the improvements over the previous
> one and total improvement with the two together. This change seems to be mostly
> beneficial on larger sized hashmaps and otherwise no real losses on Insert or Find.
> (around the region of ~1% which look like noise).
>
> which results in ~0-10% extra on top of the previous patch.
>
> In table form:
>
> +-------------+---------------+-------+--------------------+-------------------+-----------------+
> | benchmark | Type | Size | Inline vs baseline | final vs baseline | final vs inline |
> +-------------+---------------+-------+--------------------+-------------------+-----------------+
> | find many | uint64_t | 11253 | -15.67% | -22.96% | -8.65% |
> | find many | uint64_t | 11253 | -16.74% | -23.37% | -7.96% |
> | find single | uint64_t | 345 | -5.88% | -11.54% | -6.02% |
> | find many | string | 11253 | -4.50% | -9.56% | -5.29% |
> | find single | uint64_t | 345 | -4.38% | -9.41% | -5.26% |
> | find single | shared string | 11253 | -6.67% | -11.00% | -4.64% |
> | find single | shared string | 11253 | -4.63% | -9.03% | -4.61% |
> | find single | shared string | 345 | -10.41% | -14.44% | -4.50% |
> | find many | string | 11253 | -3.41% | -7.51% | -4.24% |
> | find many | shared string | 11253 | -2.30% | -5.72% | -3.50% |
> | find many | string | 13 | 2.86% | -0.30% | -3.07% |
> | find single | string | 11253 | 4.47% | 1.34% | -3.00% |
> | find many | custom string | 11253 | 0.25% | -2.75% | -2.99% |
> | find single | uint64_t | 345 | 2.99% | 0.01% | -2.90% |
> | find single | shared string | 345 | -11.53% | -13.67% | -2.41% |
> | find single | uint64_t | 11253 | 0.49% | -1.59% | -2.07% |
> +-------------+---------------+-------+--------------------+-------------------+-----------------+
>
>
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
>
> Ok for master?
OK, thanks!
>
> Thanks,
> Tamar
>
> libstdc++-v3/ChangeLog:
>
> * include/bits/hashtable.h
> (_M_find_before_node): Make it likely that the map has at least one
> entry and so we do at least one iteration.
>
> -- inline copy of patch --
>
> diff --git a/libstdc++-v3/include/bits/hashtable.h b/libstdc++-v3/include/bits/hashtable.h
> index e791e52ec329277474f3218d8a44cd37ded14ac3..51437619b44ce8b93be6a1d223828fda5a4a4c45 100644
> --- a/libstdc++-v3/include/bits/hashtable.h
> +++ b/libstdc++-v3/include/bits/hashtable.h
> @@ -2171,7 +2171,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> if (this->_M_equals(__k, __code, *__p))
> return __prev_p;
>
> - if (!__p->_M_nxt || _M_bucket_index(*__p->_M_next()) != __bkt)
> + if (__builtin_expect (!__p->_M_nxt || _M_bucket_index(*__p->_M_next()) != __bkt, 0))
> break;
> __prev_p = __p;
> }
> @@ -2201,7 +2201,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> if (this->_M_equals_tr(__k, __code, *__p))
> return __prev_p;
>
> - if (!__p->_M_nxt || _M_bucket_index(*__p->_M_next()) != __bkt)
> + if (__builtin_expect (!__p->_M_nxt || _M_bucket_index(*__p->_M_next()) != __bkt, 0))
> break;
> __prev_p = __p;
> }
More information about the Libstdc++
mailing list