GSOC

Richard Biener rguenther@suse.de
Tue May 7 13:18:00 GMT 2019


On Mon, 6 May 2019, Giuliano Belinassi wrote:

> Hi,
> 
> On 03/29, Richard Biener wrote:
> > On Thu, 28 Mar 2019, Giuliano Belinassi wrote:
> > 
> > > Hi, Richard
> > > 
> > > On 03/28, Richard Biener wrote:
> > > > On Wed, Mar 27, 2019 at 2:55 PM Giuliano Belinassi
> > > > <giuliano.belinassi@usp.br> wrote:
> > > > >
> > > > > Hi,
> > > > >
> > > > > On 03/26, Richard Biener wrote:
> > > > > > On Tue, 26 Mar 2019, David Malcolm wrote:
> > > > > >
> > > > > > > On Mon, 2019-03-25 at 19:51 -0400, nick wrote:
> > > > > > > > Greetings All,
> > > > > > > >
> > > > > > > > I would like to take up "parallelize compilation using
> > > > > > > > threads", or the project about not automatically promoting
> > > > > > > > C/C++ memory issues.  I asked about this before but did not
> > > > > > > > get a reply.  My proposal writing has never been great, so it
> > > > > > > > would be fine if someone could just review and double-check it.
> > > > > > > >
> > > > > > > > As for the other requirements, building gcc and running the
> > > > > > > > testsuite is fine.  Plus, I'm already working on gcc, so I'm
> > > > > > > > aware of most things, and this would be a great stepping stone
> > > > > > > > into more serious gcc development work.
> > > > > > > >
> > > > > > > > If sample code in mainline gcc is required, I sent out a
> > > > > > > > trial patch for this issue:
> > > > > > > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88395
> > > > > > > >
> > > > > > > > Cheers,
> > > > > > > >
> > > > > > > > Nick
> > > > > > >
> > > > > > > It's good to see that you've gotten as far as attaching a patch to BZ
> > > > > > > [1]
> > > > > > >
> > > > > > > I think someone was going to attempt the "parallelize compilation using
> > > > > > > threads" idea last year, but then pulled out before the summer; you may
> > > > > > > want to check the archives (or was that you?)
> > > > > >
> > > > > > There's also Giuliano Belinassi who is interested in the same project
> > > > > > (CCed).
> > > > >
> > > > > Yes, I will apply for this project, and I will submit the final version
> > > > > of my proposal by the end of the week.
> > > > >
> > > > > Currently, my target is the `expand_all_functions` routine, as most of
> > > > > the time is spent on it according to the experiments that I performed as
> > > > > part of my Master's research on compiler parallelization.
> > > > > (-O2, --disable-checking)
> > > > 
> > > > Yes, more specifically I think the realistic target is the GIMPLE part
> > > > of   execute_pass_list (cfun, g->get_passes ()->all_passes);  done in
> > > > cgraph_node::expand.  If you look at passes.def you'll see all_passes
> > > > also contains RTL expansion (pass_expand) and the RTL optimization
> > > > queue (pass_rest_of_compilation).  The RTL part isn't a realistic target.
> > > > Without changing the pass hierarchy the obvious part that can be
> > > > handled would be the pass_all_optimizations pass sub-queue of
> > > > all_passes since those are all passes that perform transforms on the
> > > > GIMPLE IL where we have all functions in this state at the same time
> > > > and where no interactions between the functions happen anymore
> > > > and thus functions can be processed in parallel (as much as make
> > > > processes individual translation units in parallel).
> > > > 
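The per-function independence described above can be sketched as a toy model in plain C++; `function_state` and `run_all_gimple_passes` are hypothetical stand-ins for illustration, not GCC API:

```cpp
#include <thread>
#include <vector>

// Hypothetical stand-in for a function's IL: here just an integer payload.
struct function_state { int il; };

// Stand-in for running the whole GIMPLE pass sub-queue on one function.
// In GCC this would be execute_pass_list (fn, all_passes); the point is
// that it touches only its own function's state, so calls are independent.
static void run_all_gimple_passes (function_state &fn)
{
  fn.il *= 2;   // pretend the pass list transforms the IL
}

// Process every function in parallel, one thread per function (a real
// implementation would use a fixed-size worker pool instead).
void expand_all_functions_parallel (std::vector<function_state> &fns)
{
  std::vector<std::thread> workers;
  for (auto &fn : fns)
    workers.emplace_back ([&fn] { run_all_gimple_passes (fn); });
  for (auto &w : workers)
    w.join ();
}
```

This mirrors how make processes translation units in parallel: no shared state between the units of work.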
> > > 
> > > Great. So if I understood correctly, I will need to split
> > > cgraph_node::expand() into three parts: IPA, GIMPLE and RTL, and then
> > > refactor `expand_all_functions` so that the loop
> > > 
> > >      for (i = new_order_pos - 1; i >= 0; i--)
> > > 
> > >  use these three functions, then partition
> > > 
> > >      g->get_passes()->all_passes
> > > 
> > > into get_passes()->gimple_passes and get_passes()->rtl_passes, so I
> > > can run RTL after GIMPLE is finished, to finally start the
> > > parallelization of per-function GIMPLE passes.
> > 
> > Yes, it involves refactoring of the loop - you may notice that
> > parts of the compilation pipeline are under control of the
> > pass manager (passes.c) but some is still manually driven
> > by symbol_table::compile.  Whether it's more convenient to
> > get more control stuffed to the pass manager and perform the
> > threading under its control (I'd say that would be the cleaner
> > design) or to try to do this in the current ad-hoc parts remains
> > to be seen.  You can see symbol_table::compile hands over
> > control to the pass manager multiple times, first ipa_passes ()
> > then all_late_ipa_passes and finally the expand_all_functions code.
> > 
> > I guess it would simplify things if you'd split all_passes
> > in passes.def at pass_expand like so:
> > 
> > diff --git a/gcc/passes.def b/gcc/passes.def
> > index 2fcd80e53a3..bb0453b36a7 100644
> > --- a/gcc/passes.def
> > +++ b/gcc/passes.def
> > @@ -403,11 +403,10 @@ along with GCC; see the file COPYING3.  If not see
> >    NEXT_PASS (pass_spectrev1);
> >    NEXT_PASS (pass_warn_function_noreturn);
> >    NEXT_PASS (pass_gen_hsail);
> > +  TERMINATE_PASS_LIST (all_passes)
> >  
> > -  NEXT_PASS (pass_expand);
> > -
> > -  NEXT_PASS (pass_rest_of_compilation);
> > -  PUSH_INSERT_PASSES_WITHIN (pass_rest_of_compilation)
> > +  INSERT_PASSES_AFTER (pass_rest_of_compilation)
> > +      NEXT_PASS (pass_expand);
> >        NEXT_PASS (pass_instantiate_virtual_regs);
> >        NEXT_PASS (pass_into_cfg_layout_mode);
> >        NEXT_PASS (pass_jump);
> > @@ -505,6 +504,5 @@ along with GCC; see the file COPYING3.  If not see
> >           NEXT_PASS (pass_final);
> >        POP_INSERT_PASSES ()
> >        NEXT_PASS (pass_df_finish);
> > -  POP_INSERT_PASSES ()
> >    NEXT_PASS (pass_clean_state);
> > -  TERMINATE_PASS_LIST (all_passes)
> > +  TERMINATE_PASS_LIST (pass_rest_of_compilation)
> > 
> > where to make things "work" again w/o threading you'd invoke
> > execute_pass_list (cfun, g->get_passes ()->pass_rest_of_compilation)
> > right after the all_passes invocation in cgraph_node::expand.
> > 
> > You then can refactor things so the loop over the 'order' array
> > is done twice, once over all_passes (the set you then parallelize)
> > and once over pass_rest_of_compilation (which you can't parallelize
> > because of being in RTL).
> >
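The two-sweep refactoring can be illustrated with a toy model (all names here are invented placeholders, not actual cgraph or pass-manager code):

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for a cgraph_node with a trace of what ran on it.
struct fn { std::string log; };

// Stand-ins for executing each pass list on one function's body.
static void run_all_passes (fn &f)          { f.log += "G"; } // GIMPLE part
static void run_rest_of_compilation (fn &f) { f.log += "R"; } // RTL part

// Refactored expand_all_functions: two sweeps over the same order.
// The first sweep is the candidate for parallelization; the second
// must stay serial because the RTL passes share global state.
void expand_all_functions (std::vector<fn> &order)
{
  for (auto &f : order)   // parallelizable sweep: all_passes
    run_all_passes (f);
  for (auto &f : order)   // serial sweep: pass_rest_of_compilation
    run_rest_of_compilation (f);
}
```

Every function goes through the GIMPLE part before any function enters the RTL part, which is exactly what splitting the pass list at pass_expand buys.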
> 
> I managed to get it working today. However, I found an issue with the
> statistics_fini_pass() and pass_init_dump_file(), which I had to
> comment out and force to `return false`, respectively.  Then I
> managed to compile some programs correctly with -O2. I have no idea why
> yet, but I will keep searching. I've attached my patch here.

It may be that you need to adjust the GCC_PASS_LISTS define in
pass_manager.h, changing pass_rest_of_compilation to a pass list
and also remove its "old" definition in passes.c.  Or it might
be simpler to not re-use pass_rest_of_compilation but wrap
the tail in a new all_passes2 or so.  You'll also see
a call to register_dump_files (all_passes) in passes.c where
you probably need to do the same for the new tail.

As usual grep is your best friend when figuring out what to do
(doing that right now myself).

Richard.

> 
> 
> > The above patch needs more changes in pass manager code - a chance
> > to dive into it a little since that's where you'd change code.
> > 
> > > > To simplify the task further, a useful constraint is to not have
> > > > a single optimization pass executed multiple times at the same time
> > > > (otherwise you have to look at pass specific global states as well),
> > > > thus the parallel part could be coded in a way that keeps, per
> > > > function, the state of which pass to execute next, and have a
> > > > scheduler pick a function whose next pass is "free", scheduling that
> > > > to a fixed set of worker threads.  There are no dependencies between
> > > > functions for the scheduling, but each pass has only one execution
> > > > resource
> > > > in the pipeline.  You can start processing an arbitrarily large number
> > > > of functions but slow functions will keep others from advancing across
> > > > the pass it executes on.
> > > >
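The scheduling scheme described above (per-function next-pass state plus one execution slot per pass) could look roughly like the following sketch; all names are invented for illustration, not GCC code:

```cpp
#include <atomic>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Toy pipeline: each pass is guarded by one mutex, so at most one function
// runs in a given pass at a time, while different functions may be in
// different passes concurrently.
struct pipeline
{
  int n_passes;
  std::vector<std::mutex> pass_slot;      // one execution resource per pass
  std::vector<std::atomic<int>> next;     // per-function: next pass to run
  std::vector<int> work_done;             // per-function: passes completed

  pipeline (int passes, int funcs)
    : n_passes (passes), pass_slot (passes), next (funcs), work_done (funcs) {}

  // Worker loop: repeatedly look for a function whose next pass slot is
  // free, "execute" that pass, then advance the function's state.
  void worker ()
  {
    bool progress = true;
    while (progress)
      {
        progress = false;
        for (std::size_t f = 0; f < next.size (); ++f)
          {
            int p = next[f].load ();
            if (p >= n_passes)
              continue;                   // this function is finished
            progress = true;              // work still pending somewhere
            std::unique_lock<std::mutex> lk (pass_slot[p], std::try_to_lock);
            if (!lk.owns_lock () || next[f].load () != p)
              continue;                   // pass busy, or function advanced
            ++work_done[f];               // "execute" pass p on function f
            next[f].store (p + 1);        // publish only after execution
          }
      }
  }
};

// Drive a fixed set of worker threads until all functions finish all passes.
void run_pipeline (pipeline &pl, int n_workers)
{
  std::vector<std::thread> ws;
  for (int i = 0; i < n_workers; ++i)
    ws.emplace_back ([&pl] { pl.worker (); });
  for (auto &w : ws)
    w.join ();
}
```

The busy-wait loop stands in for a real scheduler; the essential invariants are that a slow pass holds its slot (so later functions queue behind it) and that a function's next-pass counter only advances after the pass has run.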
> > > 
> > > Something like a pipeline? That is certainly a start, but if one pass is
> > > very slow wouldn't it bottleneck everything?
> > 
> > Yes, something like a pipeline.  It's true a slow pass would
> > bottleneck things - as said, we can selectively make passes
> > thread safe in such cases.
> > 
> > > > Passes could of course be individually marked as thread-safe
> > > > (multiple instances execute concurrently).
> > > > 
> > > > Garbage collection is already in control of the pass manager which
> > > > would also be the thread scheduler.  For GC the remaining issue
> > > > is allocation which passes occasionally do.  Locking is the short
> > > > term solution for GSoC I guess, long-term per-thread GC pools
> > > > might be better (to not slow down non-threaded parts of the compiler).
> > > > 
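The short-term locking approach for GC allocation might be sketched as follows; this is a placeholder built on malloc, not the real ggc allocator:

```cpp
#include <cstddef>
#include <cstdlib>
#include <mutex>

// Hypothetical wrapper around GCC's GC allocation entry point.  The
// short-term approach: serialize all GC allocations behind one global
// mutex so parallel passes can allocate safely.  Long-term, per-thread
// pools would avoid both this bottleneck and the overhead imposed on
// non-threaded parts of the compiler.
static std::mutex ggc_mutex;

void *ggc_alloc_locked (std::size_t size)
{
  std::lock_guard<std::mutex> guard (ggc_mutex);
  // In real GCC this would call into the ggc-page allocator; malloc is
  // used here purely as a stand-in.
  return std::malloc (size);
}
```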
> > > > Richard.
> > > > 
> > > > >
> > > > > Thank you,
> > > > > Giuliano.
> > > > >
> > > > >
> > > > > >
> > > > > > > IIRC Richard [CCed] was going to mentor, with me co-mentoring [2] - but
> > > > > > > I don't know if he's still interested/able to spare the cycles.
> > > > > >
> > > > > > I've offered mentoring to Giuliano, so yes.
> > > > > >
> > > > > > > That said, the parallel compilation one strikes me as very ambitious;
> > > > > > > it's not clear to me what could realistically be done as a GSoC
> > > > > > > project.  I think a good proposal on that would come up with some
> > > > > > > subset of the problem that's doable over a summer, whilst also being
> > > > > > > useful to the project.  The RTL infrastructure has a lot of global
> > > > > > > state, so maybe either focus on the gimple passes, or on fixing global
> > > > > > > state on the RTL side?  (I'm not sure)
> > > > > >
> > > > > > That was the original intent for the experiment.  There's also
> > > > > > the already somewhat parallel WPA stage in LTO compilation mode
> > > > > > (but it simply forks for the sake of simplicity...).
> > > > > >
> > > > > > > Or maybe a project to be more
> > > > > > > explicit about regions of the code that assume that the garbage-
> > > > > > > collector can't run within them?[3] (since the GC is state that would
> > > > > > > be shared by the threads).
> > > > > >
> > > > > > The GC will be one obstacle.  The original idea was to drive
> > > > > > parallelization on the pass level by the pass manager for the
> > > > > > GIMPLE passes, so serialization points would be in it.
> > > > > >
> > > > > > Richard.
> > > > > >
> > > > > > > Hope this is constructive/helpful
> > > > > > > Dave
> > > > > > >
> > > > > > > [1] though typically our workflow involved sending patches to the gcc-
> > > > > > > patches mailing list
> > > > > > > [2] as libgccjit maintainer I have an interest in global state within
> > > > > > > the compiler
> > > > > > > [3] I posted some ideas about this back in 2013 IIRC; probably
> > > > > > > massively bit-rotted since then.  I also gave a talk at Cauldron 2013
> > > > > > > about global state in the compiler (with a view to gcc-as-a-shared-
> > > > > > > library); likewise I expect much of the ideas there to be out-of-date);
> > > > > > > for libgccjit I went with a different approach
> > > 
> > > Thank you,
> > > Giuliano.
> > > 
> > 
> > -- 
> > Richard Biener <rguenther@suse.de>
> > SUSE Linux GmbH, Maxfeldstrasse 5, 90409 Nuernberg, Germany;
> > GF: Felix Imendörffer, Mary Higgins, Sri Rasiah; HRB 21284 (AG Nürnberg)
> 
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE Linux GmbH, Maxfeldstrasse 5, 90409 Nuernberg, Germany;
GF: Felix Imendörffer, Mary Higgins, Sri Rasiah; HRB 21284 (AG Nürnberg)

