This is the mail archive of the
libstdc++@gcc.gnu.org
mailing list for the libstdc++ project.
Re: [ GSoC ] Bit Iterators, trivial relocatability, and performance
- From: Marc Glisse <marc dot glisse at inria dot fr>
- To: JeanHeyd Meneide <phdofthehouse at gmail dot com>
- Cc: libstdc++ at gcc dot gnu dot org
- Date: Mon, 4 Feb 2019 11:59:32 +0100 (CET)
- Subject: Re: [ GSoC ] Bit Iterators, trivial relocatability, and performance
- References: <CANHA4Og1r27tkcnp1B_i=7M3v86BDikTf__NZYKGKPPYr0oi3A@mail.gmail.com>
- Reply-to: libstdc++ at gcc dot gnu dot org
On Mon, 4 Feb 2019, JeanHeyd Meneide wrote:
Dear libstdc++-dev Community,
I hope this e-mail finds you all doing well. My name is JeanHeyd
Meneide and I am interested in participating in the Google Summer of Code,
2019.
I have put together a preliminary proposal for working on
_Bit_iterator and overloading several standard algorithms to improve their
performance characteristics, following the wisdom of the 7 year old Howard
Hinnant writeup on vector<bool> (
https://howardhinnant.github.io/onvectorbool.html). I also want to look
into algorithmic performance improvements related to
is_trivially_relocatable. The proposal can be found at:
https://github.com/ThePhD/gsoc-2019-bit/blob/master/proposal/gsoc-2019.pdf
I know the student application section for GSoC isn't open yet, but I
wanted to gain early feedback on this and was advised by people to post
here. (If this is the wrong place, I can post it elsewhere, such as the
typical gcc-dev list?)
I am not available enough during the summer to co-mentor, but the topic is
interesting and if you post to this list I'll likely be able to help
sometimes.
Some comments:
2§1 it is good to test against preexisting tests, but it wouldn't hurt to
add some new ones. The pitfalls will not be exactly the same, and while
the existing tests are fine for a naive implementation, they may not
include large consecutive groups of false (where some intrinsics have
undefined behavior), etc. I never tried to run the llvm checks on
libstdc++, but trying it once (without spending weeks on it if it is hard)
could be a nice sanity check.
2§3 code review really happens on the list, bugzilla is for reporting bugs
2.2 it isn't clear what you would optimize about iter_move/iter_swap,
which handle a single bit. If there is work to be done here, it is more
likely in the optimizers.
3 Don't oversell Arthur's proposal. It was one amongst several that
appeared along the years, and it doesn't seem to have been received much
better (i.e. people are interested but it isn't clear the approach is the
one we want).
What we have in libstdc++ now is much more limited than what Arthur
proposes. Although it is not included in it, since the original goal was
to speed up vector<string> and string is not trivially relocatable in
libstdc++.
The algorithms you propose to optimize seem suspicious.
std::copy: relocation destructs the original object, and std::copy
doesn't. We already use memmove for std::copy of trivial types, it may be
possible to refine the concept, but trivially relocatable isn't it.
std::find, std::equal: this is about trivial comparability, not
relocation, again a different concept.
I guess you could either look for other places to take advantage of
trivial relocatability, or investigate other trivial behavior traits.
I wouldn't call the work I did in libstdc++ considerable ;-)
One low-hanging fruit is to specialize __is_trivially_relocatable (or
whatever we rename that to) for pair (might as well do tuple at the same
time), which in libstdc++ is never trivial for ABI compatibility reasons.
I intended to do it myself (the patch shouldn't take more than 5 minutes
to write, writing tests and testing is the more costly part), but if
someone is doing a GSoC, I can leave it to them.
The remark above about optimizers is one I think is interesting when
contributing to a compiler, although others may find it less relevant: it
is good to tweak the library to improve performance, but when possible it
is even better to tweak the optimizers to generate the same good asm from
the old readable code. The second is often harder or even impossible, so
it is worth doing the first, but it should be kept in mind when you see a
low hanging fruit. And for this, it may be good to take a look at what the
compiler (with -O2 or -O3) writes with -fdump-tree-optimized. It is good
anyway because you will have a clearer picture of what your code does, you
will notice if popcount is surrounded by tons of tests or in a tight loop,
etc.
--
Marc Glisse