
Re: GSOC Question about the parallelization project


>On Tue, Mar 20, 2018 at 3:49 PM, David Malcolm <dmalcolm@redhat.com> wrote:
>> On Tue, 2018-03-20 at 14:02 +0100, Richard Biener wrote:
>>> On Mon, Mar 19, 2018 at 9:55 PM, Richard Biener
>>> <richard.guenther@gmail.com> wrote:
>>> > On March 19, 2018 8:09:32 PM GMT+01:00, Sebastiaan Peters <sebaspe97@hotmail.com> wrote:
>>> > > > The goal should be to extend TU wise parallelism via make to
>>> > > > function wise parallelism within GCC.
>>> > >
>>> > > Could you please elaborate more on this?
>>> >
>>> > In the abstract sense you'd view the compilation process as separated
>>> > into N stages, with each function being processed by each stage.
>>> > You'd assign a thread to each stage and move the work items (the
>>> > functions) across the set of threads, honoring constraints such as an
>>> > IPA stage needing all functions to have completed the previous stage.
>>> > That makes it easier to model the constraints due to shared state
>>> > (like no pass operating on two functions at the same time) than a
>>> > model where you assign a thread to each function.
>>> >
>>> > You'll figure that the easiest point in the pipeline to try this
>>> > 'pipelining' is after IPA has completed and until RTL is generated.
>>> >
>>> > Ideally the pipelining would start as early as the front ends
>>> > finished parsing a function and ideally we'd have multiple
>>> > functions in the RTL pipeline.
>>> >
>>> > The main obstacles will be the global state in the compiler of
>>> > which there is the least during the GIMPLE passes (mostly cfun and
>>> > current_function_decl, plus globals in the individual passes, which
>>> > are easiest dealt with by not allowing a single pass to run in
>>> > multiple threads at the same time). TLS can be used for some of the
>>> > global state, plus of course some global data structures need
>>> > locking.
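
Just to check that I understand the model: would it look roughly like the sketch below? (A
standalone C++ toy with made-up names, nothing GCC-specific yet: one worker thread per stage,
functions flowing through queues between the stages.)

#include <condition_variable>
#include <cstdio>
#include <mutex>
#include <queue>
#include <thread>

/* Hand-off queue between two adjacent stages; -1 is an end-of-stream sentinel.  */
struct work_queue
{
  std::mutex m;
  std::condition_variable cv;
  std::queue<int> items;   /* function ids */

  void push (int fn)
  {
    { std::lock_guard<std::mutex> g (m); items.push (fn); }
    cv.notify_one ();
  }
  int pop ()
  {
    std::unique_lock<std::mutex> g (m);
    cv.wait (g, [this] { return !items.empty (); });
    int fn = items.front (); items.pop ();
    return fn;
  }
};

/* One thread per stage.  A stage is the only thread that ever runs "its" pass,
   so no pass operates on two functions at the same time, yet different stages
   work on different functions concurrently.  */
static void
run_stage (const char *name, work_queue *in, work_queue *out)
{
  for (;;)
    {
      int fn = in->pop ();
      if (fn == -1)
        { if (out) out->push (-1); return; }   /* propagate "done" */
      std::printf ("%s: function %d\n", name, fn);
      if (out) out->push (fn);
    }
}

int
main ()
{
  work_queue q1, q2, q3;
  std::thread early (run_stage, "early-gimple", &q1, &q2);
  std::thread late (run_stage, "late-gimple", &q2, &q3);
  std::thread expand (run_stage, "rtl-expand", &q3, nullptr);

  for (int fn = 0; fn < 8; ++fn)   /* the front end hands off functions */
    q1.push (fn);
  q1.push (-1);

  early.join (); late.join (); expand.join ();
}

An IPA-like stage would additionally have to wait until its input queue has seen the sentinel,
i.e. until the previous stage has finished all functions.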

This would mean that all the passes have to be analyzed individually for which global state
they use, and then locked/scheduled accordingly?

If that is the case, is there any documentation that describes the prerequisites of each pass?
I have looked at the internals documentation, but it seems that all of this still has to be created.

As for how this could be implemented, the easiest way seems to be to extend the pass
registration macros to accept a prerequisite pass. This would also encourage better
documentation, since the dependencies would become explicit instead of being implied by the
current ordering.
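
Something along these lines is what I have in mind (standalone toy; NEXT_PASS_WITH_PREREQ, the
dependency table, and the chosen dependency are made up for illustration, not the current
passes.def interface):

#include <cstdio>
#include <map>
#include <string>

/* Toy dependency table keyed by pass name.  */
static std::map<std::string, std::string> pass_prereq;

/* Stand-in for the registration the real NEXT_PASS macro does in passes.cc.  */
#define NEXT_PASS(PASS) \
  std::printf ("registered %s\n", #PASS)

/* Hypothetical variant that also records the prerequisite explicitly.  */
#define NEXT_PASS_WITH_PREREQ(PASS, PREREQ) \
  do { NEXT_PASS (PASS); pass_prereq[#PASS] = #PREREQ; } while (0)

int
main ()
{
  /* Mirrors the style of passes.def, but the dependency is written down
     instead of being implied by the order of the list.  */
  NEXT_PASS (pass_ccp);
  NEXT_PASS_WITH_PREREQ (pass_copy_prop, pass_ccp);

  std::printf ("pass_copy_prop requires %s\n",
               pass_prereq["pass_copy_prop"].c_str ());
}

A scheduler could consult such a table before running a pass on a function, and the table
would double as documentation of the ordering constraints.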

Assuming the dependencies of all the tree-ssa passes have to be analyzed individually,
this is my current timeline:
    - Parallelize the execution of the post-IPA, pre-RTL passes and a few tree-ssa passes (mid-May - early June)
    - Test for possible reproducibility issues in the binaries/debug info (early June - late June)
    - Parallelize the rest of the tree-ssa passes (late June - late July)
    - Update documentation and test again for reproducibility issues (late July - early August)

Would this be acceptable?

>>> Oh, and just to mention - there are a few things that may block
>>> adoption in the end, like whether builds are still reproducible (we
>>> allocate things like DECL_UID from global pools, and doing that
>>> somewhat randomly because of threading might - but not must - change
>>> code generation).  Or that some diagnostics will appear in
>>> non-deterministic order, or that dump files are messed up (both
>>> issues could be solved by code dealing with the issue, like buffering
>>> and doing a replay in program order).  I guess reproducibility is
>>> important when it comes down to debugging code-generation issues -
>>> I'd prefer to debug gcc when it doesn't run threaded, but if that
>>> doesn't reproduce an issue that's bad.
>>>
>>> So the most important "milestone" of this project is to identify such
>>> issues and
>>> document them somewhere.
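
The buffering/replay approach sounds doable. To check that I read it right, here is a
standalone toy of the idea (this does not touch the real diagnostic machinery):

#include <cstdio>
#include <mutex>
#include <string>
#include <vector>

/* Per-function buffer that worker threads append to.  */
struct fn_diagnostics
{
  std::mutex m;
  std::vector<std::string> messages;

  void add (const std::string &msg)
  {
    std::lock_guard<std::mutex> g (m);
    messages.push_back (msg);
  }
};

int
main ()
{
  /* Indexed by the order in which the front end finished the functions,
     which is what makes the final output deterministic regardless of
     thread timing.  */
  std::vector<fn_diagnostics> per_fn (3);

  /* Workers may finish in any order...  */
  per_fn[2].add ("warning: unused variable 'x'");
  per_fn[0].add ("warning: comparison is always true");

  /* ...but the replay happens in program order.  */
  for (size_t i = 0; i < per_fn.size (); ++i)
    for (const std::string &msg : per_fn[i].messages)
      std::printf ("fn %zu: %s\n", i, msg.c_str ());
}

The same scheme should work for dump files, at the cost of keeping the per-function buffers
around until the replay.
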
>>
>> One issue would be the garbage-collector: there are plenty of places in
>> GCC that have hidden assumptions that "a collection can't happen here"
>> (where we have temporaries that reference GC-managed objects, but which
>> aren't tracked by GC-roots).
>>
>> I had some patches for that back in 2014 that I think I managed to drop
>> on the floor (sorry):
>>   https://gcc.gnu.org/ml/gcc-patches/2014-11/msg01300.html
>>   https://gcc.gnu.org/ml/gcc-patches/2014-11/msg01340.html
>>   https://gcc.gnu.org/ml/gcc-patches/2014-11/msg01510.html

Would there be a way to easily create a static analyzer to find these untracked temporaries?
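
To make sure I would be looking for the right pattern: is this roughly the kind of hidden
assumption you mean? (Made-up function, not from the actual sources; it would only compile
inside GCC itself.)

tree
build_two_element_list (tree type)
{
  /* 'tmp' lives only in a local variable; it is not reachable from any
     GTY(())-marked root.  Today this is safe because nothing between the
     allocation and the last use can call ggc_collect ().  */
  tree tmp = build_tree_list (NULL_TREE, type);

  /* If another thread could trigger a collection at this point, 'tmp'
     might be swept away while it is still in use below.  */
  return chainon (tmp, build_tree_list (NULL_TREE, type));
}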

A quick look at the registered passes gives me a count of ~135 tree-ssa passes, so your code
for analyzing which globals are referenced where might come in handy when determining
whether passes can be parallelized easily.

>> The GC's allocator is used almost everywhere, and is probably not
>> thread-safe yet.
>Yes.  There's also global tree modification like chaining new
>pointer types into TYPE_POINTER_TO and friends so some
>helpers in tree.c need to be guarded as well.
>> FWIW I gave a talk at Cauldron 2013 about global state in GCC.  Beware:
>> it's five years out of date, but maybe it's still relevant in places?
>>   https://dmalcolm.fedorapeople.org/gcc/global-state/
>>   https://gcc.gnu.org/ml/gcc/2013-05/msg00015.html
>> (I tackled this for libgccjit by instead introducing a mutex, a "big
>> compiler lock", jit_mutex in gcc/jit/jit-playback.c, held by whichever
>> thread is calling into the rest of the compiler sources).
>>
>> Hope this is helpful
>> Dave
>>
>> [...]
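
Thanks, the talk and the jit_mutex approach are useful starting points. As a first, probably
far too coarse, idea for the tree.c helpers mentioned above, I picture wrappers along these
lines (the mutex name is made up, this would only compile inside GCC, and finding the right
granularity would be part of the project):

#include <mutex>

/* Hypothetical guard; whether one global mutex is acceptable, or much too
   coarse, is exactly what would have to be measured.  */
static std::mutex type_chain_mutex;

/* Wrapper around the existing helper: build_pointer_type walks and updates
   the TYPE_POINTER_TO chain of TO_TYPE, so the walk and the update have to
   happen under the same lock once several threads can build types.  */
tree
build_pointer_type_locked (tree to_type)
{
  std::lock_guard<std::mutex> guard (type_chain_mutex);
  return build_pointer_type (to_type);
}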

