This is the mail archive of the libstdc++@gcc.gnu.org mailing list for the libstdc++ project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [ GSoC ] Bit Iterators, trivial relocatability, and performance


On Mon, 4 Feb 2019, JeanHeyd Meneide wrote:

Dear libstdc++-dev Community,

    I hope this e-mail finds you all doing well. My name is JeanHeyd
Meneide and I am interested in participating in the Google Summer of Code,
2019.

    I have put together a preliminary proposal for working on
_Bit_iterator and overloading several standard algorithms to improve their
performance characteristics, following the wisdom of the 7 year old Howard
Hinnant writeup on vector<bool> (
https://howardhinnant.github.io/onvectorbool.html). I also want to look
into algorithmic performance improvements related to
is_trivially_relocatable. The proposal can be found at:
https://github.com/ThePhD/gsoc-2019-bit/blob/master/proposal/gsoc-2019.pdf

    I know the student application section for GSoC isn't open yet, but I
wanted to gain early feedback on this and was advised by people to post
here. (If this is the wrong place, I can post it elsewhere, such as the
typical gcc-dev list?)

I am not available enough during the summer to co-mentor, but the topic is interesting and if you post to this list I'll likely be able to help sometimes.

Some comments:

2§1 it is good to test against preexisting tests, but it wouldn't hurt to add some new ones. The pitfalls will not be exactly the same, and while the existing tests are fine for a naive implementation, they may not include large consecutive groups of false (where some intrinsics have undefined behavior), etc. I never tried to run the llvm checks on libstdc++, but trying it once (without spending weeks on it if it is hard) could be a nice sanity check.

2§3 code review really happens on the list, bugzilla is for reporting bugs

2.2 it isn't clear what you would optimize about iter_move/iter_swap, which handle a single bit. If there is work to be done here, it is more likely in the optimizers.

3 Don't oversell Arthur's proposal. It was one amongst several that appeared along the years, and it doesn't seem to have been received much better (i.e. people are interested but it isn't clear the approach is the one we want).

What we have in libstdc++ now is much more limited than what Arthur proposes. Although it is not included in it, since the original goal was to speed up vector<string> and string is not trivially relocatable in libstdc++.

The algorithms you propose to optimize seem suspicious.

std::copy: relocation destructs the original object, and std::copy doesn't. We already use memmove for std::copy of trivial types, it may be possible to refine the concept, but trivially relocatable isn't it.

std::find, std::equal: this is about trivial comparability, not relocation, again a different concept.

I guess you could either look for other places to take advantage of trivial relocatability, or investigate other trivial behavior traits.

I wouldn't call the work I did in libstdc++ considerable ;-)

One low-hanging fruit is to specialize __is_trivially_relocatable (or whatever we rename that to) for pair (might as well do tuple at the same time), which in libstdc++ is never trivial for ABI compatibility reasons. I intended to do it myself (the patch shouldn't take more than 5 minutes to write, writing tests and testing is the more costly part), but if someone is doing a GSoC, I can leave it to them.


The remark above about optimizers is one I think is interesting when contributing to a compiler, although others may find it less relevant: it is good to tweak the library to improve performance, but when possible it is even better to tweak the optimizers to generate the same good asm from the old readable code. The second is often harder or even impossible, so it is worth doing the first, but it should be kept in mind when you see a low hanging fruit. And for this, it may be good to take a look at what the compiler (with -O2 or -O3) writes with -fdump-tree-optimized. It is good anyway because you will have a clearer picture of what your code does, you will notice if popcount is surrounded by tons of tests or in a tight loop, etc.

--
Marc Glisse


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]