[RFC] Getting LTO incremental linking work
Thu Nov 26 10:15:00 GMT 2015
On Wed, 25 Nov 2015, Jan Hubicka wrote:
> > >
> > > 1) linker plugin is modified to pass -flinker-output to lto wrapper
> > > linker-output is either dyn (.so), pie or exec
> > > for incremental linking I added .rel for 3) and noltorel for 1)
> > >
> > > currently it does rel because 3) (nor 2) can not be done when incremnetal
> > > linking is done on both LTO and non-LTO objects.
> > That's because the result would be a "fat" object where both pieces
> > would be needed. Btw, I wonder why you are not running into the
> Yep, we woud end up with both LTO and non-LTO in one object and because
> we have no way to claim just part of it in next linking, the non-LTO will
> be ignored (just as is the case with far objects)
> > same issues as me when producing linker plugin output (the "merged"
> > LTO IL) that is LTO IL. Ah, possibly because the link is incremental,
> > and thus all special-handling of LTO sections is disabled.
> Yep, i just throw in the LTO IL and linker passes it through .
> > > In this case linker
> > > plugin output warings about code quality loss and switch to
> > > noltorel.
> > > 2) with -flinker-ouptut the lto wrapper behaves same way as with
> > > -flto-partition=none.
> > > 3) lto frontend parses -flinker-output and sets our internal flags accordingly.
> > > I added new flag_incremental_linking to inform middle-end about the fact
> > > that the output is going to be statically linked again. This disables
> > > the privatization of hidden symbols and if set to 2 it also triggers
> > > the LTO IL streaming
> > I wonder why it behaves like -flto-partition=none in the case it does
> > not need to do LTO IL streaming (which I hope does LTO IL streaming
> > only? or does this implement fat objects "correctly"?). Can't
> Yes, I do stream LTO il into assembler file, like normal -flto build would do
> for non-lto1 frontend. So I produce one .s file that I need assembler to be
> called on. By default lto-wrapper thinks we do WPA and it would look for list
> of ltrans partitions and execute ltranses that I do not want to happen.
> Since no codegen is done we have no use for ltranses. It would be nice to spit
> the .o file through simple-object interface. Sadly we can't do that because
> simple-object won't put the LTO marker symbols in. Something I want to track
> and drop assembler stage from LTO generaltion in general
> Well, one case where WPA would help is production of fat-objects. Currently
> it works (by compiling the LTO data into assembly again) but it is not done
> in parallel. I suppose we could deal with this later - it is non-critical.
> My longer term plan is to make WPA parallelization independent of LTO - it
> makes sense when you build one large non-LTO object, too.
> > we still parallelize the build via LTRANS and then incrementally
> > link the result (I suppose the linker will do that for us with the
> > linker plugin outputs already?)?
> > -flto-partition=none itself isn't more memory intensive than
> > WPA in these days, it's only about compile-time, correct?
> It is. Just by streaming everything in and out we "compress" the memory layout
> noticeably. -flto-partition=one has smaller peak than -flto-partition=none.
> But again, here all this triggers with -ffat-objects only.
> > Your patch means that Andis/HJs work is no longer needed and we can
> > drop the section suffixes again?
> Maybe. It is different implementation of same thing. They can be both used,
> though I suppose real incremental linking is better in longer term than
> section merging.
> > >
> > > Does anyone see problems with this approach? I think this is easy enough
> > > and fixes PR67548 so it may still get to mainline?
> > Yes, it would be a very nice feature to have indeed.
> > I don't see anything trying to change things with the collect2 path?
> Hmm, with collect2 we don't even support static libraries, do we need to support
> incremental link? I suppose collect2 can recognize -r and LTO objects and spawn
> the linker same way.
> > > I need to do more testing, but in general I think the implemntation is OK
> > > as it is. We need a way to force noltorel model for testsuite, as the
> > > new default will bypass codegen for all our -r -nostdlib testcases.
> > Maybe we can turn most of them to -shared?
> Would that work on all targets? (i.e. mingw?).
We do have some testcases using -shared already, they require us to use
PIC flags though AFAICS. -shared isn't 1:1 equivalent to -r -nostdlib...
> For testing purposes I suppose I will add a flag. It should also silence
> the linker plugin warning about generating assembly early. -rno-lto
what about allowing -flinker-output=XXX at link time as a driver option
and avoiding to override it if already present?
> > > struct cgraph_node *node = order[i];
> > >
> > > - if (node->has_gimple_body_p ())
> > > + if (gimple_has_body_p (node->decl))
> > ?
> node->has_gimple_body_p returns true for if gimple body is available, but not neccesarily
> read to memory (in WPA), while gimple_has_body_p returns true only when body is in memory.
> The statement renumbering which is guarded is not needed if we only shuffle the sections
> (and will ICE)
> > > + /* It would be cool to produce .o file directly, but our current
> > > + simple objects does not contain the lto symbol markers. Go the slow
> > > + way through the asm file. */
> > We should get away from the symbol markers and instead rely on section
> > names. Not in this patch of course.
> Yes, we need to get simple-object interface somehow working here. The symbols
> markers are documented by the LTO specification. I do not mind that much of changing
> For your debug work, I think simple-object will need quite some work to output
> dwarf anyway. Perhaps something that can be done as part of SoC?
Well, my plan is still to not rely on simple-object but have binutils
fixed for the issues I encounter.
But yes, if we go the simple-object route we need to handle adding
symbols and parsing and rewriting relocations (ugh). Basically
simple-object needs to handle full partial linking under the
constraint of all relocations being involed not needing resolving
but only (offset) rewriting.
Thus it needs a relocation representation and parsing and generating
code for them as well as the same for symbols.
> > > + lang_hooks.lto.begin_section = lhd_begin_section;
> > > + lang_hooks.lto.append_data = lhd_append_data;
> > > + lang_hooks.lto.end_section = lhd_end_section;
> > > + if (flag_ltrans)
> > > + error ("-flinker-output=rel and -fltrans are mutually exclussive");
> > > + break;
> > > +
> > > + case LTO_LINKER_OUTPUT_NOLTOREL: /* .o: incremental link producing asm */
> > > + flag_whole_program = 0;
> > > + flag_incremental_link = 1;
> > > + break;
> > > +
> > > + case LTO_LINKER_OUTPUT_DYN: /* .so: PID library */
> > > + /* On some targets, like i386 it makes sense to build PIC library wihout
> > > + -fpic for performance reasons. So no need to adjust flags. */
> > > + break;
> > > +
> > > + case LTO_LINKER_OUTPUT_PIE: /* PIE binary */
> > > + /* If -fPIC or -fPIE was used at compile time, be sure that
> > > + flag_pie is 2. */
> > > + if (!flag_pie && flag_pic)
> > > + flag_pie = flag_pic;
> > > + flag_pic = 0;
> > The code doesn't seem to do what the comment says...
> Hmm, indeed we want flag_pie = MAX (flag_pie, flag_pic)
> > > enum ld_plugin_status
> > > @@ -1100,6 +1138,9 @@ onload (struct ld_plugin_tv *tv)
> > > case LDPT_GOLD_VERSION:
> > > gold_version = p->tv_u.tv_val;
> > > break;
> > > + case LDPT_LINKER_OUTPUT:
> > > + add_linker_output_option (p->tv_u.tv_val);
> > > + break;
> > > default:
> > > break;
> > > }
> > I wonder what this does to old toolchains using the linker plugin
> > with this change. I suppose it will fail with an "unknown option"
> > error.
> > Not sure what to do about this though given the plugin doesn't
> > really know which GCC it is targeting. An idea would be to
> > spawn another enviroment from the driver like
> > COLLECT_GCC_LTO_WRAPPER_VER=2 and only adding this option if
> > that is present and >= 2?
> I tough every GCC version ships its own linker plugin, so there should
> be no conflicts?
Well, "ships", yes. But with plugin auto-loading in ld we end up
with a single lto-plugin.so file in the auto-load path and choosing
the "newest" is supposed to work with older GCC as well ...
Of course plugin auto-loading is used for ar and friends which
might not be affected here. auto-loading isn't important for
ld itself (it won't work without using the GCC driver and the
GCC driver indeed explicitely loads its own plugin).
So maybe it's an on-issue...
More information about the Gcc-patches