This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: Hashtable Small size optimization


Re https://gcc.gnu.org/ml/gcc-patches/2018-10/msg00903.html


On 15/10/18 22:46 +0200, François Dumont wrote:
I started considering PR libstdc++/68303.

First thing was to write a dedicated performance test case, it is the unordered_small_size.cc I'd like to add with this patch.

Great, more performance tests are always good.

The first runs show a major difference between tr1 and std implementations, tr1 being much better:

std::tr1::unordered_set<int> without hash code cached: 1st insert       9r    9u    1s  14725920mem    0pf std::tr1::unordered_set<int> with hash code cached: 1st insert       8r    9u    0s  14719680mem    0pf std::unordered_set<int> without hash code cached: 1st insert      17r   17u    0s  16640080mem    0pf std::unordered_set<int> with hash code cached: 1st insert      14r   14u    0s  16638656mem    0pf

I had a look in gdb to find out why and the answer was quite obvious. For 20 insertions tr1 implementation bucket count goes through [11, 23] whereas for std it is [2, 5, 11, 23], so 2 more expensive rehash.

As unordered containers are dedicated to rather important number of elements I propose to review the rehash policy with this patch so that std also starts at 11 on the 1st insertion. After the patch figures are:

std::tr1::unordered_set<int> without hash code cached: 1st insert       9r    9u    0s  14725920mem    0pf std::tr1::unordered_set<int> with hash code cached: 1st insert       8r    8u    0s  14719680mem    0pf std::unordered_set<int> without hash code cached: 1st insert      15r   15u    0s  16640128mem    0pf std::unordered_set<int> with hash code cached: 1st insert      12r   12u    0s  16638688mem    0pf

OK, that seems worthwhile then.

Moreover I noticed that performance tests are built with -O2, is that intentional ? The std implementation uses more abstractions than tr1, looks like building with -O3 optimizes away most of those abstractions making tr1 and std implementation much closer:

Yes, I think it's intentional that -O2 is used, because I think
that's the most commonly-used optimisation level. We don't want to
std::tr1::unordered_set<int> without hash code cached: 1st insert       2r    1u    1s  14725952mem    0pf std::tr1::unordered_set<int> with hash code cached: 1st insert       2r    1u    0s  14719536mem    0pf std::unordered_set<int> without hash code cached: 1st insert       2r    2u    0s  16640064mem    0pf std::unordered_set<int> with hash code cached: 1st insert       2r    2u    0s  16638608mem    0pf

Hmm, interesting. I wonder if we can determine what is not being
optimized at -O2, and either tweak the code or improve the compiler.

Note that this patch also rework the alternative rehash policy based on powers of 2 so that it also starts with a larger number of bucket (16) and respects LWG2156.

Last I had to wider the memory column so that alignment is preserved even when memory diff is negative.

Tested under Linux x86_64.

Ok to commit ?

Does this patch still apply cleanly? I think it would be good to
commit it now, early in stage 1.



Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]