This is the mail archive of the
mailing list for the libstdc++ project.
Re: [Bug libstdc++/54075] [4.7.1] unordered_map insert still slower than 4.6.2
- From: Paolo Carlini <paolo dot carlini at oracle dot com>
- To: François Dumont <frs dot dumont at gmail dot com>
- Cc: Jonathan Wakely <jwakely dot gcc at gmail dot com>, "libstdc++ at gcc dot gnu dot org" <libstdc++ at gcc dot gnu dot org>
- Date: Tue, 4 Dec 2012 23:10:52 +0100
> I spent some more time checking why performance changed.
> After reverting the hash policy modification and using the original Foo implementation, I can't confirm the original numbers. The difference between caching the hash code and not caching it is indeed smaller in the std implementation than in the tr1 one. I can't explain why, just as I can't explain why the tr1 unordered_multiset behaves so badly compared to the std one.
> I want to point out that I am experiencing some reliability issues with the performance tests. When I started to roll things back I saw a major performance issue on the first std container bench. But when I swapped the std and tr1 benches, the performance issue appeared in the tr1 implementation. Attached you will find the modifications I have made to 54075 to let me challenge the performance. If you want to run the test on your side, don't hesitate.
Thanks for the additional effort on this. I believe that for 4.8 we should really thoroughly double check what we are changing: my point is that we don't want to change the caching policy in the light of numbers which aren't sound, even if theoretically we may find the idea appealing. Over the next days I will rerun the test on my machines, anyway.
Note, *very important*, that new performance testcases should always be carefully checked outside the testsuite (before tweaking them to use the performance infrastructure and adding them) to make sure the optimizers aren't optimizing whole loops away to nothing, or causing similar unwanted effects. Please do those checks with me, and please see whether the numbers you get become more stable if you make the tests more complex to counteract the effect of the optimizers (e.g., a typical pattern: have main return a sum obtained from partial sums computed in each iteration of the code under test). It may also be useful to run the tests at -O1: the relative performance differences should be much more trustworthy.
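To make the partial-sum pattern concrete, here is a minimal sketch, assuming a C++11 compiler. The names insert_checksum and run_benchmark are made up for illustration and are not part of the libstdc++ performance testsuite; the key and value types are likewise just an assumption, not the Foo type from the PR.

```cpp
#include <chrono>
#include <cstddef>
#include <iostream>
#include <unordered_map>
#include <utility>

// Hypothetical helper (not from the testsuite): inserts n keys and returns a
// checksum that depends on every insertion, so the loop has an observable
// result and cannot simply be discarded by the optimizer.
std::size_t insert_checksum(std::size_t n)
{
  std::unordered_map<std::size_t, std::size_t> m;
  std::size_t partial = 0;
  for (std::size_t i = 0; i < n; ++i)
    {
      // Read the value back through the iterator returned by insert.
      auto res = m.insert(std::make_pair(i, 2 * i));
      partial += res.first->second;
    }
  return partial + m.size();
}

// Hypothetical driver: times `rounds` repetitions and returns the total of
// the partial sums computed in each round.
std::size_t run_benchmark(std::size_t rounds, std::size_t n)
{
  std::size_t sum = 0;
  auto start = std::chrono::steady_clock::now();
  for (std::size_t r = 0; r < rounds; ++r)
    sum += insert_checksum(n);
  auto stop = std::chrono::steady_clock::now();
  auto ms = std::chrono::duration_cast<std::chrono::milliseconds>(stop - start).count();
  // Timing goes to stderr; the checksum is what the caller must consume.
  std::cerr << rounds << " rounds of " << n << " inserts: " << ms << " ms\n";
  return sum;
}
```

A main for this would do little more than return a value derived from run_benchmark(10, 100000), for instance the sum truncated to an int, so that the program's result depends on the work done in every round. Comparing the timings at -O1 and -O2 is then a quick way to see whether the optimizers are still removing anything.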