[RFC] Getting LTO incremental linking work

Richard Biener rguenther@suse.de
Thu Nov 26 10:15:00 GMT 2015


On Wed, 25 Nov 2015, Jan Hubicka wrote:

> > > 
> > >  1) linker plugin is modified to pass -flinker-output to lto wrapper
> > >     linker-output is either dyn (.so), pie or exec
> > >     for incremental linking I added .rel for 3) and noltorel for 1)
> > > 
> > >     currently it does rel because 3) (nor 2) can not be done when incremnetal
> > >     linking is done on both LTO and non-LTO objects.
> > 
> > That's because the result would be a "fat" object where both pieces
> > would be needed.  Btw, I wonder why you are not running into the
> 
> Yep, we woud end up with both LTO and non-LTO in one object and because
> we have no way to claim just part of it in next linking, the non-LTO will
> be ignored (just as is the case with far objects)
> 
> > same issues as me when producing linker plugin output (the "merged"
> > LTO IL) that is LTO IL.  Ah, possibly because the link is incremental,
> > and thus all special-handling of LTO sections is disabled.
> 
> Yep, i just throw in the LTO IL and linker passes it through .
> > 
> > >     In this case linker
> > >     plugin output warings about code quality loss and switch to
> > >     noltorel.
> > >  2) with -flinker-ouptut the lto wrapper behaves same way as with
> > >     -flto-partition=none.
> > >  3) lto frontend parses -flinker-output and sets our internal flags accordingly.
> > >     I added new flag_incremental_linking to inform middle-end about the fact
> > >     that the output is going to be statically linked again.  This disables
> > >     the privatization of hidden symbols and if set to 2 it also triggers
> > >     the LTO IL streaming
> > 
> > I wonder why it behaves like -flto-partition=none in the case it does
> > not need to do LTO IL streaming (which I hope does LTO IL streaming
> > only?  or does this implement fat objects "correctly"?).  Can't
> 
> Yes, I do stream LTO il into assembler file, like normal -flto build would do
> for non-lto1 frontend.  So I produce one .s file that I need assembler to be
> called on.  By default lto-wrapper thinks we do WPA and it would look for list
> of ltrans partitions and execute ltranses that I do not want to happen.
> 
> Since no codegen is done we have no use for ltranses.  It would be nice to spit
> the .o file through simple-object interface.  Sadly we can't do that because
> simple-object won't put the LTO marker symbols in.  Something I want to track
> and drop assembler stage from LTO generaltion in general
> https://gcc.gnu.org/ml/gcc/2014-09/msg00340.html
> 
> Well, one case where WPA would help is production of fat-objects.  Currently
> it works (by compiling the LTO data into assembly again) but it is not done
> in parallel.  I suppose we could deal with this later - it is non-critical.
> My longer term plan is to make WPA parallelization independent of LTO - it
> makes sense when you build one large non-LTO object, too.
> 
> > we still parallelize the build via LTRANS and then incrementally
> > link the result (I suppose the linker will do that for us with the
> > linker plugin outputs already?)?
> > 
> > -flto-partition=none itself isn't more memory intensive than
> > WPA in these days, it's only about compile-time, correct?
> 
> It is.  Just by streaming everything in and out we "compress" the memory layout
> noticeably.  -flto-partition=one has smaller peak than -flto-partition=none.
> But again, here all this triggers with -ffat-objects only.
> > 
> > Your patch means that Andis/HJs work is no longer needed and we can
> > drop the section suffixes again?
> 
> Maybe. It is different implementation of same thing. They can be both used,
> though I suppose real incremental linking is better in longer term than
> section merging.
> > > 
> > > Does anyone see problems with this approach? I think this is easy enough 
> > > and fixes PR67548 so it may still get to mainline?
> > 
> > Yes, it would be a very nice feature to have indeed.
> > 
> > I don't see anything trying to change things with the collect2 path?
> 
> Hmm, with collect2 we don't even support static libraries, do we need to support
> incremental link?  I suppose collect2 can recognize -r and LTO objects and spawn
> the linker same way.
> > 
> > > I need to do more testing, but in general I think the implemntation is OK 
> > > as it is.  We need a way to force noltorel model for testsuite, as the
> > > new default will bypass codegen for all our -r -nostdlib testcases.
> > 
> > Maybe we can turn most of them to -shared?
> 
> Would that work on all targets? (i.e. mingw?).

We do have some testcases using -shared already, they require us to use
PIC flags though AFAICS.  -shared isn't 1:1 equivalent to -r -nostdlib...

> For testing purposes I suppose I will add a flag. It should also silence 
> the linker plugin warning about generating assembly early. -rno-lto 
> perhaps?

what about allowing -flinker-output=XXX at link time as a driver option
and avoiding to override it if already present?

> > >        struct cgraph_node *node = order[i];
> > >  
> > > -      if (node->has_gimple_body_p ())
> > > +      if (gimple_has_body_p (node->decl))
> > 
> > ?
> 
> node->has_gimple_body_p returns true for if gimple body is available, but not neccesarily
> read to memory (in WPA), while gimple_has_body_p returns true only when body is in memory.
> The statement renumbering which is guarded is not needed if we only shuffle the sections
> (and will ICE)
> > > +      /* It would be cool to produce .o file directly, but our current
> > > +	 simple objects does not contain the lto symbol markers.  Go the slow
> > > +	 way through the asm file.  */
> > 
> > We should get away from the symbol markers and instead rely on section
> > names.  Not in this patch of course.
> 
> Yes, we need to get simple-object interface somehow working here.  The symbols
> markers are documented by the LTO specification.  I do not mind that much of changing
> it. 
> For your debug work, I think simple-object will need quite some work to output
> dwarf anyway.  Perhaps something that can be done as part of SoC?

Well, my plan is still to not rely on simple-object but have binutils
fixed for the issues I encounter.

But yes, if we go the simple-object route we need to handle adding
symbols and parsing and rewriting relocations (ugh).  Basically
simple-object needs to handle full partial linking under the
constraint of all relocations being involed not needing resolving
but only (offset) rewriting.

Thus it needs a relocation representation and parsing and generating
code for them as well as the same for symbols.

> > 
> > > +      lang_hooks.lto.begin_section = lhd_begin_section;
> > > +      lang_hooks.lto.append_data = lhd_append_data;
> > > +      lang_hooks.lto.end_section = lhd_end_section;
> > > +      if (flag_ltrans)
> > > +	error ("-flinker-output=rel and -fltrans are mutually exclussive");
> > > +      break;
> > > +
> > > +    case LTO_LINKER_OUTPUT_NOLTOREL: /* .o: incremental link producing asm  */
> > > +      flag_whole_program = 0;
> > > +      flag_incremental_link = 1;
> > > +      break;
> > > +
> > > +    case LTO_LINKER_OUTPUT_DYN: /* .so: PID library */
> > > +      /* On some targets, like i386 it makes sense to build PIC library wihout
> > > +	 -fpic for performance reasons.  So no need to adjust flags.  */
> > > +      break;
> > > +
> > > +    case LTO_LINKER_OUTPUT_PIE: /* PIE binary */
> > > +      /* If -fPIC or -fPIE was used at compile time, be sure that
> > > +         flag_pie is 2.  */
> > > +      if (!flag_pie && flag_pic)
> > > +	flag_pie = flag_pic;
> > > +      flag_pic = 0;
> > 
> > The code doesn't seem to do what the comment says...
> 
> Hmm, indeed we want flag_pie = MAX (flag_pie, flag_pic)
> 
> > >  enum ld_plugin_status
> > > @@ -1100,6 +1138,9 @@ onload (struct ld_plugin_tv *tv)
> > >  	case LDPT_GOLD_VERSION:
> > >  	  gold_version = p->tv_u.tv_val;
> > >  	  break;
> > > +	case LDPT_LINKER_OUTPUT:
> > > +	  add_linker_output_option (p->tv_u.tv_val);
> > > +	  break;
> > >  	default:
> > >  	  break;
> > >  	}
> > 
> > I wonder what this does to old toolchains using the linker plugin
> > with this change.  I suppose it will fail with an "unknown option"
> > error.
> > 
> > Not sure what to do about this though given the plugin doesn't
> > really know which GCC it is targeting.  An idea would be to
> > spawn another enviroment from the driver like
> > COLLECT_GCC_LTO_WRAPPER_VER=2 and only adding this option if
> > that is present and >= 2?
> 
> I tough every GCC version ships its own linker plugin, so there should
> be no conflicts?

Well, "ships", yes.  But with plugin auto-loading in ld we end up
with a single lto-plugin.so file in the auto-load path and choosing
the "newest" is supposed to work with older GCC as well ...

Of course plugin auto-loading is used for ar and friends which
might not be affected here.  auto-loading isn't important for
ld itself (it won't work without using the GCC driver and the
GCC driver indeed explicitely loads its own plugin).

So maybe it's an on-issue...

Richard.



More information about the Gcc-patches mailing list