This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



Re: LTO inlining of transactional builtins



a) If a user provides a builtin implementation to LTO, it is discarded,
since by design LTO prefers builtins to user-provided versions of them.  In
LTO, builtins are their own prevailing decl.  There is an enhancement
request PR here:

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51997

It definitely should be the other way around, and builtins should get their proper entry in the now-existing symbol table.

Well, the way we stream builtin decls as special cases is indeed weird. I recall I once tried to remove that code and it led to some regressions, but in general it should not be necessary.

Yes, we seem to special-case builtins all over the place. I have a kludge disabling this, just to work on (b) below.



b) LTO streaming happens before TMMARK.  Since the TMMARK pass is the one
that instruments memory operations into __builtin_ITM_* calls, even if (a)
was fixed, LTRANS would have nothing to inline.

Which also means that this has nothing to do with LTO per se, just that you'd need LTO to see the bodies of the "builtins". Use a small C testcase where you provide the implementation of one of the builtins (well, the one you end up using) and you'll face the same issue.

Do I understand correctly that inlining the builtin at expansion time is not
good because the implementation detail may depend on how libitm was
configured?

Unfortunately, the tmmark pass can't be moved earlier, because the point is
to delay its work so memory and loop optimizations can do their thing before
memory operations are irreconcilably transformed into function calls.
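
To illustrate the point about delaying instrumentation, here is a minimal sketch in plain C. It does not use -fgnu-tm or libitm; barrier_read_u4 is a hypothetical stand-in for a __builtin_ITM_* load, and all names are made up for illustration. The two loops compute the same value, but the second has every load routed through a call, which is roughly what the optimizers would see if tmmark ran early.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical read barrier standing in for a __builtin_ITM_* load.
   In real TM code the body lives in libitm, so the compiler cannot
   see it at all; noinline only approximates that here.  */
__attribute__((noinline))
uint32_t barrier_read_u4(const uint32_t *addr)
{
    return *addr;
}

/* Before instrumentation: plain loads.  *scale is loop-invariant,
   so the load can typically be hoisted and the loop vectorized.  */
uint32_t sum_plain(const uint32_t *data, size_t n, const uint32_t *scale)
{
    uint32_t total = 0;
    for (size_t i = 0; i < n; i++)
        total += data[i] * *scale;
    return total;
}

/* After instrumentation: every load has become a function call,
   which memory and loop optimizations generally cannot look through.  */
uint32_t sum_instrumented(const uint32_t *data, size_t n,
                          const uint32_t *scale)
{
    uint32_t total = 0;
    for (size_t i = 0; i < n; i++)
        total += barrier_read_u4(&data[i]) * barrier_read_u4(scale);
    return total;
}

/* Both forms must agree on the result; only their optimizability differs.  */
void check_sums_agree(const uint32_t *data, size_t n, const uint32_t *scale)
{
    assert(sum_plain(data, n, scale) == sum_instrumented(data, n, scale));
}
```

Comparing the assembly of the two functions at -O2 is one way to see what is lost once the loads have turned into calls.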

This is the main problem, however. As Richi pointed out, even in C this won't work. We decide inlining at WPA time; after that no inlining is possible and all unreachable functions are removed. So when you invent new calls to builtins along the way, you can't expect them to be reasonably inlined.

Yes, I have been playing with marking any such provided builtins with cgraph_mark_force_output_node() in the IPA-tm pass. I assume that anyone linking with implementations of the TM builtins must either want them inlined, or want them in the final link. But your idea of a new inline attribute is cleaner and far more generic.



Also you would not have the TM builtin bodies available in your ltrans unit
because nothing calls them.  So anything that requires LTO (to see the
bodies in the first place) but does not expose the calls before LTO bytecode
output is not going to work.

Marking with cgraph_mark_force_output_node() in the IPA-tm pass fixes this.


Well, the only way I see here is to

a) have a special-purpose local inlining pass to handle these newly born builtins.
Basically you can re-purpose the early inliner for this and run it after your pass
(and we can generalize the machinery for other kinds of beasts if needed).
The early inliner fits better for this than the late inliner.

Yes, this is what I've been doing, but I paused for y'all's input when I had to either rematerialize the gimple bodies, or keep the gimple optimizations from removing them as each function got compiled.



b) introduce a new kind of function for these builtins. You need a sort of combination of the always_inline, extern and used attributes, but not quite. The new kind of function must
1) make the partitioner ship the functions into every partition,
2) make unreachable function removal not remove them even if they seem useless,
3) make code generation never produce offline copies of them even if they are not removed by the unreachable function pass, and
4) make the final check happy that this type of function may be kept in memory till the end of compilation.

If this seems necessary I can implement this for you, but I am always hesitant
to add a new type of function into the machinery - we already face the complexity
of having quite a few of them.
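
For comparison, the closest thing available today at the source level is probably GNU C's extern inline with the gnu_inline and always_inline attributes (the idiom glibc uses for its inline wrappers). The sketch below uses a hypothetical tm_read_u4 with an illustrative signature, not libitm's ABI. It only approximates point 3 above (no offline copy is emitted from this definition); it does nothing for the partitioner, unreachable function removal, or the final check, which is why a new kind of function would be needed.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical user-provided TM read barrier.  With gnu_inline, an
   "extern inline" definition is used only for inlining and never
   produces an out-of-line copy; always_inline forces the inlining
   even when optimization is off.  */
extern inline __attribute__((gnu_inline, always_inline))
uint32_t tm_read_u4(const uint32_t *addr)
{
    return *addr;
}

static uint32_t shared_counter = 41;

/* A direct call: the barrier body is inlined here, so no external
   definition of tm_read_u4 is needed at link time.  */
uint32_t read_and_bump(void)
{
    return tm_read_u4(&shared_counter) + 1;
}

/* Small self-check, callable from main() or a test harness.  */
void check_read_and_bump(void)
{
    assert(read_and_bump() == 42);
}
```

Note that this only helps for calls visible in the source; calls that a late pass such as tmmark creates after inlining decisions have been made would still be left uninlined.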

I would be delighted if you could work on this, if you think a more general solution than just forcing the node to be output is necessary. But first let's get rth's input, because I'm still unsure whether the payoff for inlining so late is sufficient to merit all this work.


Aldy

