This is the mail archive of the libstdc++@gcc.gnu.org mailing list for the libstdc++ project.



Re: [RFC] Proposal to contribute Intel’s implementation of C++17 parallel algorithms


On 29/11/17 12:29 +0000, Kukanov, Alexey wrote:
> Hello all,
>
> At Intel, we have developed an implementation of C++17 execution policies
> for algorithms (often referred to as Parallel STL). We hope to contribute it
> to libstdc++/GCC, so we would like to ask the community for comments on this.

That's very welcome news!

As I'm sure you know, those algos are one of the biggest pieces of
C++17 support that is still completely missing from libstdc++.

> The code is already published at GitHub (https://github.com/intel/parallelstl).
> It supports the C++17 standard execution policies (seq, par, par_unseq) as well as
> the experimental unsequenced policy (unseq) for SIMD execution. At the moment,
> about half of the C++17 standard algorithms that must support execution policies
> are implemented; a few more will be ready soon, and the work continues.
> The tests that we use are also available at GitHub; needless to say, we will
> contribute those as well.
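As an illustration of what it means for an algorithm to "support" execution policies, here is a minimal sketch of policy-based overload selection (requires C++17). The names `sketch::seq`, `sketch::par` and `sketch::reduce` are hypothetical stand-ins for `std::execution::seq`, `std::execution::par` and `std::reduce`, not code from the Intel library, and the parallel overload uses `std::async` only to keep the example short.

```cpp
#include <cassert>
#include <future>
#include <iterator>
#include <numeric>

// Hypothetical stand-ins for std::execution::sequenced_policy /
// parallel_policy and std::reduce; NOT the real <execution> API.
namespace sketch {
struct sequenced_policy {};
struct parallel_policy {};
inline constexpr sequenced_policy seq{};
inline constexpr parallel_policy par{};

// seq: plain left-to-right accumulation on the calling thread.
template <class It, class T>
T reduce(sequenced_policy, It first, It last, T init) {
  return std::accumulate(first, last, init);
}

// par: sum the two halves of the range concurrently, then combine.
// Assumes T is an arithmetic type, so T{} is the identity for +.
// std::async is used only for brevity; a real library would go through
// its threading backend (e.g. TBB) instead.
template <class It, class T>
T reduce(parallel_policy, It first, It last, T init) {
  const auto n = std::distance(first, last);
  assert(n >= 0);
  if (n < 2) return std::accumulate(first, last, init);
  const It mid = std::next(first, n / 2);
  auto left = std::async(std::launch::async,
                         [first, mid] { return std::accumulate(first, mid, T{}); });
  const T right = std::accumulate(mid, last, T{});
  return init + left.get() + right;
}
}  // namespace sketch
```

Summing the integers 1 to 1000 with either `sketch::reduce(sketch::seq, first, last, 0)` or `sketch::reduce(sketch::par, first, last, 0)` gives 500500; only the type of the first argument decides which implementation runs.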

> The implementation is not specific to Intel’s hardware. For thread-level parallelism
> it uses TBB* (https://www.threadingbuildingblocks.org/) but abstracts it with
> an internal API which can be implemented on top of other threading/parallel solutions,
> so it is for the community to decide which ones to use. For SIMD parallelism
> (unseq, par_unseq) we use #pragma omp simd directives; this is vendor-neutral
> and does not require any OpenMP runtime support.
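One way to picture that split (an internal parallel API that hides the threading runtime, plus `#pragma omp simd` for the unsequenced policies) is the minimal sketch below. It is an assumption-laden illustration rather than the Intel code: `sketch::thread_backend`, its `parallel_for(begin, end, body)` member and `sketch::saxpy` are hypothetical names, and plain `std::thread` stands in for TBB.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

namespace sketch {
// Hypothetical backend: all the algorithms need from it is
// parallel_for(begin, end, body), with body(b, e) processing [b, e).
// Here it is built on std::thread; a TBB, OpenMP or offloading backend
// could provide the same member instead.
struct thread_backend {
  unsigned num_threads = std::max(1u, std::thread::hardware_concurrency());

  template <class Body>
  void parallel_for(std::size_t begin, std::size_t end, Body body) const {
    assert(begin <= end && num_threads > 0);
    const std::size_t n = end - begin;
    // Ceiling division so that at most num_threads chunks are created.
    const std::size_t chunk = (n + num_threads - 1) / num_threads;
    std::vector<std::thread> workers;
    for (std::size_t b = begin; b < end; b += chunk)
      workers.emplace_back(body, b, std::min(b + chunk, end));
    for (std::thread& t : workers) t.join();
  }
};

// par_unseq-style y = a*x + y: the backend spreads chunks over threads,
// and the loop inside each chunk is marked for vectorization. With GCC,
// -fopenmp-simd enables only the simd directives (no libgomp needed);
// without that flag the pragma is ignored and the loop is compiled as
// an ordinary loop.
template <class Backend>
void saxpy(const Backend& backend, float a, const float* x, float* y,
           std::size_t n) {
  backend.parallel_for(0, n, [=](std::size_t b, std::size_t e) {
#pragma omp simd
    for (std::size_t i = b; i < e; ++i)
      y[i] = a * x[i] + y[i];
  });
}
}  // namespace sketch
```

Built with something like `g++ -std=c++17 -O2 -fopenmp-simd -pthread`, `sketch::saxpy(sketch::thread_backend{}, 2.0f, x, y, n)` computes `y[i] = 2*x[i] + y[i]` across threads without linking an OpenMP runtime. Supporting a different runtime (TBB, or an offloading backend) would only mean providing another type with the same `parallel_for` member.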

Great.

> The current implementation meets the spirit but not always the letter
> of the standard, because it has to be separate from but also coexist with
> implementations of standard C++ libraries. While preparing the contribution,
> we will address inconsistencies, adjust the code to meet community standards,
> and better integrate it into the standard library code.

Also great :-)

> We are also proposing that our implementation is included into libc++/LLVM.
> Compatibility between the implementations seems useful as it can potentially
> reduce the amount of work for everyone. We hope to keep the code mostly identical,
> and would like to know if you think that is too optimistic an expectation.

I think if you're planning to do the ongoing maintenance for the code
in both libstdc++ and in libc++ (and upstream in your own third-party
github repo too?) then we can be more flexible about requiring coding
style, naming etc. to meet the usual libstdc++ conventions. Where
reasonable I think we could treat the copy in libstdc++ as a
downstream fork of your upstream project, and we should try to make it
as easy as possible to sync changes from upstream into our copy (and
also ensure that changes are pushed upstream whenever possible).

> Obviously we plan to use appropriate open source licenses to meet the different
> projects’ requirements.

That's great. I'll investigate what the options are here. It might be
possible to simply take the code with the existing Apache license, and
treat it as a fairly distinct chunk of code within libstdc++ that is
licensed differently (again, similar to how libsanitizer is handled).
Or it might be better to have it licensed as GPLv3+exception, like the
rest of libstdc++. Either way, this is something we can work out.

> We expect to keep developing the code and will take the responsibility for
> maintaining it (with community contributions, of course). If there are other
> community efforts to implement parallel algorithms, we are willing to collaborate.

The only other effort I know of to implement this for libstdc++ is by
Pekka Jääskeläinen and relies on GCC's offloading support for GPGPUs
and other compute devices. He gave a talk about it at this year's GNU
Cauldron; the slides are at https://gcc.gnu.org/wiki/cauldron2017?action=AttachFile&do=get&target=parallel-stl-on-hsa-and-gcc-offloading-infra.pdf

The ideal might be for the two options to coexist, so that users of
heterogeneous devices can offload parallel tasks to them. I have no
idea if that would be something to provide as a backend for your
library's internal API, or if something else would need to be done.
I've CC'd Pekka on this email.

> We look forward to your feedback, both for the overall idea and (if supported)
> for the next steps we should take.

I'll look into the licensing options and get back to you. It's
possible this is too late to be added to GCC 8 (as our dev phase for
GCC 8 just finished). If it's entirely new C++17 code that doesn't
affect the default C++14 mode we might be able to bend the rules a
bit. We'd need to move quickly if we're going to do that though.
If it is too late then you could work in a branch that can be
merged to trunk after the GCC 8 release, so that you don't need to
wait for next April before adding anything to libstdc++.

I'll try to get you some answers about licensing and timetables, but
those are details we can sort out. I'm very happy to hear of your
plans and I think we definitely want to accept Intel's generous offer!

