[RFC] Getting LTO incremental linking work

Jan Hubicka hubicka@ucw.cz
Wed Nov 25 18:54:00 GMT 2015

> > 
> >  1) linker plugin is modified to pass -flinker-output to lto wrapper
> >     linker-output is either dyn (.so), pie or exec
> >     for incremental linking I added .rel for 3) and noltorel for 1)
> > 
> >     currently it does rel because 3) (nor 2) can not be done when incremnetal
> >     linking is done on both LTO and non-LTO objects.
> That's because the result would be a "fat" object where both pieces
> would be needed.  Btw, I wonder why you are not running into the

Yep, we woud end up with both LTO and non-LTO in one object and because
we have no way to claim just part of it in next linking, the non-LTO will
be ignored (just as is the case with far objects)

> same issues as me when producing linker plugin output (the "merged"
> LTO IL) that is LTO IL.  Ah, possibly because the link is incremental,
> and thus all special-handling of LTO sections is disabled.

Yep, i just throw in the LTO IL and linker passes it through .
> >     In this case linker
> >     plugin output warings about code quality loss and switch to
> >     noltorel.
> >  2) with -flinker-ouptut the lto wrapper behaves same way as with
> >     -flto-partition=none.
> >  3) lto frontend parses -flinker-output and sets our internal flags accordingly.
> >     I added new flag_incremental_linking to inform middle-end about the fact
> >     that the output is going to be statically linked again.  This disables
> >     the privatization of hidden symbols and if set to 2 it also triggers
> >     the LTO IL streaming
> I wonder why it behaves like -flto-partition=none in the case it does
> not need to do LTO IL streaming (which I hope does LTO IL streaming
> only?  or does this implement fat objects "correctly"?).  Can't

Yes, I do stream LTO il into assembler file, like normal -flto build would do
for non-lto1 frontend.  So I produce one .s file that I need assembler to be
called on.  By default lto-wrapper thinks we do WPA and it would look for list
of ltrans partitions and execute ltranses that I do not want to happen.

Since no codegen is done we have no use for ltranses.  It would be nice to spit
the .o file through simple-object interface.  Sadly we can't do that because
simple-object won't put the LTO marker symbols in.  Something I want to track
and drop assembler stage from LTO generaltion in general

Well, one case where WPA would help is production of fat-objects.  Currently
it works (by compiling the LTO data into assembly again) but it is not done
in parallel.  I suppose we could deal with this later - it is non-critical.
My longer term plan is to make WPA parallelization independent of LTO - it
makes sense when you build one large non-LTO object, too.

> we still parallelize the build via LTRANS and then incrementally
> link the result (I suppose the linker will do that for us with the
> linker plugin outputs already?)?
> -flto-partition=none itself isn't more memory intensive than
> WPA in these days, it's only about compile-time, correct?

It is.  Just by streaming everything in and out we "compress" the memory layout
noticeably.  -flto-partition=one has smaller peak than -flto-partition=none.
But again, here all this triggers with -ffat-objects only.
> Your patch means that Andis/HJs work is no longer needed and we can
> drop the section suffixes again?

Maybe. It is different implementation of same thing. They can be both used,
though I suppose real incremental linking is better in longer term than
section merging.
> > 
> > Does anyone see problems with this approach? I think this is easy enough 
> > and fixes PR67548 so it may still get to mainline?
> Yes, it would be a very nice feature to have indeed.
> I don't see anything trying to change things with the collect2 path?

Hmm, with collect2 we don't even support static libraries, do we need to support
incremental link?  I suppose collect2 can recognize -r and LTO objects and spawn
the linker same way.
> > I need to do more testing, but in general I think the implemntation is OK 
> > as it is.  We need a way to force noltorel model for testsuite, as the
> > new default will bypass codegen for all our -r -nostdlib testcases.
> Maybe we can turn most of them to -shared?

Would that work on all targets? (i.e. mingw?).
For testing purposes I suppose I will add a flag. It should also silence the linker
plugin warning about generating assembly early. -rno-lto perhaps?
> >        struct cgraph_node *node = order[i];
> >  
> > -      if (node->has_gimple_body_p ())
> > +      if (gimple_has_body_p (node->decl))
> ?

node->has_gimple_body_p returns true for if gimple body is available, but not neccesarily
read to memory (in WPA), while gimple_has_body_p returns true only when body is in memory.
The statement renumbering which is guarded is not needed if we only shuffle the sections
(and will ICE)
> > +      /* It would be cool to produce .o file directly, but our current
> > +	 simple objects does not contain the lto symbol markers.  Go the slow
> > +	 way through the asm file.  */
> We should get away from the symbol markers and instead rely on section
> names.  Not in this patch of course.

Yes, we need to get simple-object interface somehow working here.  The symbols
markers are documented by the LTO specification.  I do not mind that much of changing
For your debug work, I think simple-object will need quite some work to output
dwarf anyway.  Perhaps something that can be done as part of SoC?
> > +      lang_hooks.lto.begin_section = lhd_begin_section;
> > +      lang_hooks.lto.append_data = lhd_append_data;
> > +      lang_hooks.lto.end_section = lhd_end_section;
> > +      if (flag_ltrans)
> > +	error ("-flinker-output=rel and -fltrans are mutually exclussive");
> > +      break;
> > +
> > +    case LTO_LINKER_OUTPUT_NOLTOREL: /* .o: incremental link producing asm  */
> > +      flag_whole_program = 0;
> > +      flag_incremental_link = 1;
> > +      break;
> > +
> > +    case LTO_LINKER_OUTPUT_DYN: /* .so: PID library */
> > +      /* On some targets, like i386 it makes sense to build PIC library wihout
> > +	 -fpic for performance reasons.  So no need to adjust flags.  */
> > +      break;
> > +
> > +    case LTO_LINKER_OUTPUT_PIE: /* PIE binary */
> > +      /* If -fPIC or -fPIE was used at compile time, be sure that
> > +         flag_pie is 2.  */
> > +      if (!flag_pie && flag_pic)
> > +	flag_pie = flag_pic;
> > +      flag_pic = 0;
> The code doesn't seem to do what the comment says...

Hmm, indeed we want flag_pie = MAX (flag_pie, flag_pic)

> >  enum ld_plugin_status
> > @@ -1100,6 +1138,9 @@ onload (struct ld_plugin_tv *tv)
> >  	  gold_version = p->tv_u.tv_val;
> >  	  break;
> > +	  add_linker_output_option (p->tv_u.tv_val);
> > +	  break;
> >  	default:
> >  	  break;
> >  	}
> I wonder what this does to old toolchains using the linker plugin
> with this change.  I suppose it will fail with an "unknown option"
> error.
> Not sure what to do about this though given the plugin doesn't
> really know which GCC it is targeting.  An idea would be to
> spawn another enviroment from the driver like
> COLLECT_GCC_LTO_WRAPPER_VER=2 and only adding this option if
> that is present and >= 2?

I tough every GCC version ships its own linker plugin, so there should
be no conflicts?
> I don't think LTO will work properly when you invoke ld directly
> as lto-wrapper expects COLLECT_GCC[_OPTIONS] to be set.

I will look into this incrementally. Linker plugin should be able to execute
GCC itself. We do not need to pass any options around and all we need to know
is where to find the wrapper.

> Otherwise the patch looks straight-forward to me...
> Thanks,
> Richard.

More information about the Gcc-patches mailing list