This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: GSOC Question about the parallelization project


>On Wed, Mar 21, 2018 at 8:04 PM, Sebastiaan Peters
><sebaspe97@hotmail.com> wrote:
>>>On Wed, Mar 21, 2018 at 9:50 AM, Sebastiaan Peters
>>><sebaspe97@hotmail.com> wrote:
>>>>>On Tue, Mar 20, 2018 at 3:49 PM, David Malcolm <dmalcolm@redhat.com> wrote:
>>>>>> On Tue, 2018-03-20 at 14:02 +0100, Richard Biener wrote:
>>>>>>> On Mon, Mar 19, 2018 at 9:55 PM, Richard Biener
>>>>>>> <richard.guenther@gmail.com> wrote:
>>>>>>> > On March 19, 2018 8:09:32 PM GMT+01:00, Sebastiaan Peters <sebaspe9
>>>>>>> > 7@hotmail.com> wrote:
>>>>>>> > > > The goal should be to extend TU wise parallelism via make to
>>>>>>> > > > function
>>>>>>> > >
>>>>>>> > > wise parallelism within GCC.
>>>>>>> > >
>>>>>>> > > Could you please elaborate more on this?
>>>>>>> >
>>>>>>> > In the abstract sense you'd view the compilation process separated
>>>>>>> > into N stages, each function being processed by each. You'd assign
>>>>>>> > a thread to each stage and move the work items (the functions)
>>>>>>> > across the set of threads honoring constraints such as an IPA stage
>>>>>>> > needing all functions completed the previous stage. That allows you
>>>>>>> > to easier model the constraints due to shared state (like no pass
>>>>>>> > operating on two functions at the same time) compared to a model
>>>>>>> > where you assign a thread to each function.
>>>>>>> >
>>>>>>> > You'll figure that the easiest point in the pipeline to try this
>>>>>>> > 'pipelining' is after IPA has completed and until RTL is generated.
>>>>>>> >
>>>>>>> > Ideally the pipelining would start as early as the front ends
>>>>>>> > finished parsing a function and ideally we'd have multiple
>>>>>>> > functions in the RTL pipeline.
>>>>>>> >
>>>>>>> > The main obstacles will be the global state in the compiler of
>>>>>>> > which there is the least during the GIMPLE passes (mostly cfun and
>>>>>>> > current_function_decl plus globals in the individual passes which
>>>>>>> > is easiest dealt with by not allowing a single pass to run at the
>>>>>>> > same time in multiple threads). TLS can be used for some of the
>>>>>>> > global state plus of course some global data structures need
>>>>>>> > locking.
>>>>
>>>> This would mean that all the passes have to be individually analyzed for
>>>> which global state
>>>> they use, and lock/schedule them accordingly?
>>>
>>>Their private global state would be excempt by assuring that a pass never
>>>runs twice at the same time.
>>>
>>>The global state that remains should be the same for all passes we are talking
>>>about (during the late GIMPLE optimization phase which I'd tackle).
>>>
>>>> If this is the case, is there any documentation that describes the pre-reqs
>>>> for each pass?
>>>> I have looked at the internal documentation, however it seems that all of
>>>> this still has to be created?
>>>
>>>The prereqs are actually the same and not very well documented (if at all).
>>>There's the global GC memory pool where we allocate statements and
>>>stuff like that from (and luckyly statements themselves are function private).
>>>Then there's global trees like types ('int') where modification needs to be
>>>appropriately guarded.  Note that "modification" means for example
>>>building a new type for the address of 'int' given that all different
>>>pointer types
>>>to 'int' are chained in a list rooted in the tree for 'int'.  That
>>>means (a few?)
>>>tree building helpers need to be guarded with locks.  I don't have a great
>>>idea how to identify those apart from knowing them in advance or running
>>>into races ... my gut feeling is that there's not a lot to guard but I may
>>>be wrong ;)
>>
>> What does it mean to be a node of a type tree?
>> Does it describe information about that type,
>> or does it keep a reference to where something
>> of that type has been declared?
>
>On the GIMPLE level we have two main kinds of data objects, one is
>'gimple' (and derived clases) for statements and 'tree' for operands,
>types, declarations.  'tree's is also what GENERIC is represented in, the
>IL that gets produced by the frontends which is lowered to GIMPLE by
>the gimplifier.
>
>On GENERIC everything is a 'tree' (there's a class hierarchy of trees
>implemented in C ways via unions), types are, and so are decls
>and expressions (in GENERIC).  You can look at tree.h (and tree-core.h
>for implementation details) and for example see
>
>#define TYPE_POINTER_TO(NODE) (TYPE_CHECK (NODE)->type_common.pointer_to)
>
>where tree.c:build_pointer_type_for_mode does
>
  >/* First, if we already have a type for pointers to TO_TYPE and it's
     >the proper mode, use it.  */
  >for (t = TYPE_POINTER_TO (to_type); t; t = TYPE_NEXT_PTR_TO (t))
    >if (TYPE_MODE (t) == mode && TYPE_REF_CAN_ALIAS_ALL (t) == can_alias_all)
      >return t;
>
>so if a pass builds a new pointer type that doesn't already exist we
>modify this list.
>This means an existing type tree is modified even if the type itself
>isn't modified.
>
>>>> As to how this would be implemented, it seems the easiest way would be to
>>>> extend the macros to
>>>> accept a pre-req pass. This would encourage more documentation since the
>>>> dependencies
>>>> become explicit instead of the current implicit ordering.
>>>
>>>Actually the order is quite explicit.  Maybe I now understand your
>>>question - no,
>>>passes do not "communicate" between each other via global state, all such
>>>state is per function and the execution order of passes on a given function
>>>is hard-coded in passes.def.
>>>
>>>> Assuming the dependencies for the all the tree-ssa passes have to be
>>>> individually analyzed.
>>>> Currently I have this as my timeline:
>>>>     - Parallelize the execution of the post-IPA pre-RTL, and a few tree-ssa
>>>> passes (mid-may - early june)
>>>>     - Test for possible reproducibility issues for the binary/debug info
>>>> (early june - late june)
>>>>     - Parallelize the rest of tree-ssa (late june - late july)
>>>>     - Update documentation and test again for reproducibility issues (late
>>>> july - early august)
>>>>
>>>> Would this be acceptable?
>>>
>>>Sounds ambitious ;)  But yes, it sounds reasonable.  I don't exactly
>>>understand what "Parallelize the rest of tree-ssa" means though.  If
>>>you want to tackle a tiny bit more I'd rather include "RTL" by treating
>>>the whole RTL part as a single "pass" (as said only one function can
>>>be in RTL right now).
>>>
>>
>> I was under the assumption that passes had to be indivdually analysed
>> when I wrote that. The timeline is now updated to include some extra
>> time in the beginning to familiarize myself a bit more with the project,
>> and added the RTL as an optional deliverable:)
>> then
>> I added a draft of the proposal[0], any feedback would be highly appreciated.
>
>The poposal looks good to me!
>
>Thanks,
>Richard.

Awesome, I'll submit this is the coming days.

Thanks for all your help, it has helped tremendously.

I'm looking forward to (hopefully) work with you this summer.

Kind regards,

Sebastiaan Peters

>>>>>>> Oh, and just to mention - there are a few things that may block
>>>>>>> adoption in the end
>>>>>>> like whether builds are still reproducible (we allocate things like
>>>>>>> DECL_UID from
>>>>>>> global pools and doing that somewhat randomly because of threading
>>>>>>> might - but not
>>>>>>> must - change code generation).  Or that some diagnostics will appear
>>>>>>> in
>>>>>>> non-deterministic order, or that dump files are messed up (both
>>>>>>> issues could be
>>>>>>> solved by code dealing with the issue, like buffering and doing a re-
>>>>>>> play in
>>>>>>> program order).  I guess reproducability is important when it comes
>>>>>>> down to
>>>>>>> debugging code-generation issues - I'd prefer to debug gcc when it
>>>>>>> doesn't run
>>>>>>> threaded but if that doesn't reproduce an issue that's bad.
>>>>>>>
>>>>>>> So the most important "milestone" of this project is to identify such
>>>>>>> issues and
>>>>>>> document them somewhere.
>>>>>>
>>>>>> One issue would be the garbage-collector: there are plenty of places in
>>>>>> GCC that have hidden assumptions that "a collection can't happen here"
>>>>>> (where we have temporaries that reference GC-managed objects, but which
>>>>>> aren't tracked by GC-roots).
>>>>>>
>>>>>> I had some patches for that back in 2014 that I think I managed to drop
>>>>>> on the floor (sorry):
>>>>>>   https://gcc.gnu.org/ml/gcc-patches/2014-11/msg01300.html
>>>>>>   https://gcc.gnu.org/ml/gcc-patches/2014-11/msg01340.html
>>>>>>   https://gcc.gnu.org/ml/gcc-patches/2014-11/msg01510.html
>>>>
>>>> Would there be a way to easily create a static analyzer to find these
>>>> untracked temporaries?
>>>
>>>I don't think so.
>>>
>>>> A quick look at registered passes makes me count ~135 tree-ssa passes,
>>>> So your code on analyzing what globals are referenced where might come in
>>>> handy while analyzing if passes are easily parallelized.
>>>
>>>I think the "solution" to the garbage collecting issue is to simply keep
>>>that serialized.  It's currently done on-demand anyway at certain
>>>safe collection points so the work scheduler can simply hold off
>>>scheduling more work when the collector would want to run, waiting for
>>>running jobs to complete.
>>>
>>>Richard.
>>>
>>>>>> The GC's allocator is used almost everywhere, and is probably not
>>>>>> thread-safe yet.
>>>>>Yes.  There's also global tree modification like chaining new
>>>>>pointer types into TYPE_POINTER_TO and friends so some
>>>>>helpers in tree.c need to be guarded as well.
>>>>>> FWIW I gave a talk at Cauldron 2013 about global state in GCC.  Beware:
>>>>>> it's five years out-of-date, but maybe is still relevant in places?
>>>>>>   https://dmalcolm.fedorapeople.org/gcc/global-state/
>>>>>>   https://gcc.gnu.org/ml/gcc/2013-05/msg00015.html
>>>>>> (I tackled this for libgccjit by instead introducing a mutex, a "big
>>>>>> compiler lock", jit_mutex in gcc/jit/jit-playback.c, held by whichever
>>>>>> thread is calling into the rest of the compiler sources).
>>>>>>
>>>>>> Hope this is helpful
>>>>>> Dave
>>>>>>
>>>>>>
>>
>> [0] https://docs.google.com/document/d/1YkKYI3J-pKFfHLdxljWVoJ89l2oVGibxslIW3ikluz8/edit?usp=sharing

<https://docs.google.com/document/d/1YkKYI3J-pKFfHLdxljWVoJ89l2oVGibxslIW3ikluz8/edit?usp=sharing>


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]