This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]
Other format:	[Raw text]

Re: Hashtable Small size optimization

From: Jonathan Wakely <jwakely at redhat dot com>
To: François Dumont <frs dot dumont at gmail dot com>
Cc: "libstdc++ at gcc dot gnu dot org" <libstdc++ at gcc dot gnu dot org>, gcc-patches <gcc-patches at gcc dot gnu dot org>
Date: Fri, 31 May 2019 12:44:31 +0100
Subject: Re: Hashtable Small size optimization
References: <efa88a3a-efc3-aade-85d6-697a7228d66c@gmail.com>

Re https://gcc.gnu.org/ml/gcc-patches/2018-10/msg00903.html


On 15/10/18 22:46 +0200, François Dumont wrote:

I started considering PR libstdc++/68303.
First thing was to write a dedicated performance test case, it is theunordered_small_size.cc I'd like to add with this patch.


Great, more performance tests are always good.

The first runs show a major difference between tr1 and stdimplementations, tr1 being much better:
std::tr1::unordered_set<int> without hash code cached: 1st insert 9r 9u 1s 14725920mem 0pfstd::tr1::unordered_set<int> with hash code cached: 1st insert 8r 9u 0s 14719680mem 0pfstd::unordered_set<int> without hash code cached: 1st insert 17r 17u 0s 16640080mem 0pfstd::unordered_set<int> with hash code cached: 1st insert 14r 14u 0s 16638656mem 0pf
I had a look in gdb to find out why and the answer was quite obvious.For 20 insertions tr1 implementation bucket count goes through [11,23] whereas for std it is [2, 5, 11, 23], so 2 more expensive rehash.
As unordered containers are dedicated to rather important number ofelements I propose to review the rehash policy with this patch so thatstd also starts at 11 on the 1st insertion. After the patch figuresare:
std::tr1::unordered_set<int> without hash code cached: 1st insert 9r 9u 0s 14725920mem 0pfstd::tr1::unordered_set<int> with hash code cached: 1st insert 8r 8u 0s 14719680mem 0pfstd::unordered_set<int> without hash code cached: 1st insert 15r 15u 0s 16640128mem 0pfstd::unordered_set<int> with hash code cached: 1st insert 12r 12u 0s 16638688mem 0pf


OK, that seems worthwhile then.

Moreover I noticed that performance tests are built with -O2, is thatintentional ? The std implementation uses more abstractions than tr1,looks like building with -O3 optimizes away most of those abstractionsmaking tr1 and std implementation much closer:


Yes, I think it's intentional that -O2 is used, because I think

that's the most commonly-used optimisation level. We don't want to

std::tr1::unordered_set<int> without hash code cached: 1st insert 2r 1u 1s 14725952mem 0pfstd::tr1::unordered_set<int> with hash code cached: 1st insert 2r 1u 0s 14719536mem 0pfstd::unordered_set<int> without hash code cached: 1st insert 2r 2u 0s 16640064mem 0pfstd::unordered_set<int> with hash code cached: 1st insert 2r 2u 0s 16638608mem 0pf


Hmm, interesting. I wonder if we can determine what is not being
optimized at -O2, and either tweak the code or improve the compiler.

Note that this patch also rework the alternative rehash policy basedon powers of 2 so that it also starts with a larger number of bucket(16) and respects LWG2156.
Last I had to wider the memory column so that alignment is preservedeven when memory diff is negative.
Tested under Linux x86_64.

Ok to commit ?


Does this patch still apply cleanly? I think it would be good to
commit it now, early in stage 1.

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]