This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



Re: gcc-c-api


On Tue, 2012-09-11 at 11:17 +0200, Richard Guenther wrote:

Sorry for the belated response; various comments inline throughout...

> On Tue, Sep 11, 2012 at 3:10 AM, David Malcolm <dmalcolm@redhat.com> wrote:
> > On Mon, 2012-09-10 at 17:20 +0200, Michael Matz wrote:
> >> Hi David,
> >>
> >> On Mon, 10 Sep 2012, David Malcolm wrote:
> >>
> >> > Is it possible for you to post your work-in-progress code somewhere?
> >>
> >> Attached.
> >
> > Many thanks for posting this!  Various comments inline below.
> >
> >> > I know that you don't feel it's ready for committing, but I would find
> >> > it helpful - I'm interested in understanding the general approach,
> >> > rather than seeing completeness or perfection.
> >>
> >> Some sort of brain dump follows:
> >>
> >> The idea is as follows: as first cut an introspection API that is tied to
> >> compiler IR concepts rather than GCC specifics.  As such it should be
> >> implementable also for other compilers, at least the trivial things that
> >> every traditional compiler will have.  So, we have functions, basic
> >> blocks, instructions, operands and operators.  Nothing of that should
> >> relate to tree or gimple or RTL.
> >
> > I see.  So there's a terminology issue here: we shouldn't refer to
> > "gimple" or "rtl", we should refer to "instructions" or "statements".
> >
> > [Possibly crazy idea: should the API actually refer to itself as GCC?
> > (with "gcc_" prefixes etc) If it's implementable by other compilers,
> > would another prefix be suitable?  I don't think this is a good idea,
> > but it makes for an interesting thought experiment]
> 
> I think the API shouldn't refer to GCC itself - in fact I was hoping
> that someone implemented the very same API for LLVM or Open64.  At
> least introspection should be compiler agnostic (in my tiny ideal
> world ;)).

I rather regret mentioning my "possibly crazy idea" - what I really need
is a stable API/ABI for a shim layer that hides differences between GCC
versions.  I worry that too much abstraction will prevent my plugin from
getting its job done.  Or, to put it another way, I'd rather have a good
API for talking to GCC than a mediocre API for talking to any compiler.

How many other plugin authors are out there, and what do they need?  I
may be in a strange position in that my python plugin has the direct
objective of exposing gcc's internals (albeit in an easier-to-use form
than C/C++), so my perspective may be skewed here.


> I also think that we can easily backport plugin API changes (read: additions,
> the API of course never "changes") to active release branches, so plugins
> using this API should run against all released GCC versions (for additions
> the API needs a way to identify its "version" though) and other compilers
> without re-compiling the plugin itself.

That would be useful, though do you mean 4.8 and 4.7, or do you also
include 4.6?  Would the initial creation of the plugin API count as "a
plugin API change", or is that stretching the meaning of your words too
far? :)
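
[A minimal sketch of the version check that backported additions would require. Every name here (api_version, api_get_version, api_is_compatible) is hypothetical and appears in neither proposed API; the assumption is a (major, minor) scheme in which additions only ever bump the minor number.]

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical version record: additions bump `minor`; `major` would only
   change if the "never changes" promise were ever broken. */
struct api_version {
    unsigned major;
    unsigned minor;
};

/* Stand-in for a query the compiler would export; the values are made up. */
static struct api_version api_get_version(void)
{
    struct api_version v = { 1, 2 };
    return v;
}

/* A plugin built against (need_major, need_minor) can run unchanged on a
   host with the same major version and at least that minor version. */
static bool api_is_compatible(struct api_version host,
                              unsigned need_major, unsigned need_minor)
{
    return host.major == need_major && host.minor >= need_minor;
}
```

[Under these assumptions a plugin would call api_is_compatible(api_get_version(), ...) once at load time and refuse to run on a host that predates the additions it relies on.]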


> >> Take for instance the (included) dump-plugin.  The goal would be, that
> >> depending on where you'd put that dumper in the pass pipeline it would
> >> work _unchanged_ on GENERIC, on GIMPLE and on RTL.  That goal isn't
> >> reached yet, once because the internal iteration just isn't
> >> implemented for e.g. RTL instruction stream, and once because the operand
> >> iterator API isn't well suited to the tree-like nesting in GENERIC and
> >> RTL currently.
> >
> > Interesting idea.
> >
> > I prefer having a little more type-safety, but it's a pain to achieve it
> > in C.  If you look at my proposed API [1], there are dozens of tiny
> > casting functions.  I like that it's typesafe, but it's somewhat
> > inelegant.
> 
> I suppose one could wrap a more type-safe C++ interface around the
> C API (well, or simply wrap a nice python API around it ;)).
> 
> >> [The intermediate goal was to redo the operand API to be tree-like at the
> >> base, and possibly write small wrappers to again expose the nicer
> >> interface that GIMPLE would provide (i.e. direct access to all read
> >> operands of an instruction).]
> >>
> >> Another thing I want is simplicity.  E.g. only the bare minimum of types
> >> should be exposed.  Note how the API itself for instance doesn't expose
> >> different types of collections, only a general Range which can enumerate
> >> all things, depending on how it's used (though the implementation has
> >> runtime checks for wrong usage).
> >
> > I seem to remember from earlier mailing list discussion there being a
> > preference for explicit iterator objects, rather than for_each functions
> > taking callbacks (my API uses the latter approach).
> 
> Yes (I didn't look into Micha's patch ... but I believe it uses iterators).

It does.

> > FWIW I don't expose any iterators directly in my plugin, I simply
> > generate a list of wrapper objects and return that.  But I suppose
> > others might.
> >
> >> There are some questions to be solved, e.g. memory management for those
> >> objects that aren't directly tied to GCC objects, e.g. Ranges right now.
> >> I do have a strong feeling about the relation of e.g. plugin Instructions
> >> and GCC gimple/rtx, in the sense that plugin authors should _not_ be
> >> required to manage memory for those things (same for BBs, functions,
> >> operands).
> >
> > I'm not sure how to parse what you wrote above, so I'm not quite sure
> > what your preference here is.  I see that you have range-creation
> > functions (e.g. "gcc_stmts"), which return a Range that's owned by the
> > caller, together with a cleanup function ("gcc_free_range") that must be
> > called exactly once assuming the Range was successfully created.  The
> > other entities (e.g. "Function") are in fact "really" just gcc structs
> > internally (e.g.
> >    "Function" = "struct Function_*" = "cast to (struct function *)" )
> > and those are GC-managed.
> 
> Yes, they are "handlers" that can be passed/returned by value and thus
> need no memory management.  That they relate to the internal GC pointer
> is an implementation detail.
> 
> > Currently in my proposed API I assume that all objects are GC-managed,
> > and that the user is required to register a callback hook to mark any
> > objects that they're referring to.  Does that seem like a sane strategy?
> > If so, would it make sense to make the iterators be GC-managed also?
> > I'm not sure about this, but it seems worth discussing.  It strikes me
> > that it's a common use-case to use an iterator within code, and it would
> > be a pain to have to track stack frames explicitly as GC roots, so
> > having iterators have their own lifetime-management may make that
> > simpler.  (Awkward questions: what happens if a GC collection happens
> > during a loop?  does the iterator hold a reference on the current
> > element? etc)
> 
> I think relying on GC is bad if you think of re-using the same API for
> other compilers - it puts restrictions on the API implementation side.
> If anything, I would prefer explicit reference counting ... but that can
> be tedious to use (again, easily abstracted with a C++ interface on top).
> 
> > [FWIW In my plugin I explicitly keep track of all of my live wrapper
> > objects in a linked list, so that when the GCC GC runs I can simply walk
> > the list and call the appropriate marking function on the underlying GCC
> > objects, so that they don't get swept away from under me]
> 
> There need to be rules about what state plugins can keep "live" across
> invocations of their per-function hook or other hook they are being invoked on.
> Simple rule: nothing ;)

I believe that such a rule is overly restrictive.  For example,
gcc-python-plugin supports creating custom attributes:
  http://gcc-python-plugin.readthedocs.org/en/latest/attributes.html
which leads to a callback being invoked by the parser when it encounters
various declarations marked with the attribute.  At this point it's
natural to want to store the declarations somewhere for use by a later
pass.  For example, I use this to mark functions with non-standard
reference-counting behavior in my reference-count checker:
http://gcc-python-plugin.readthedocs.org/en/latest/cpychecker.html#marking-functions-that-return-borrowed-references

Similarly consider the case of passes that gather per-function data,
then use that information within an interprocedural pass.  The extra
data needs to either be stored in a new field on the functions
(where?), or in a mapping from function->data, at which point you need
the keys to refer to FUNCTION_DECL instances, and thus need to be able
to store a reference to them, hence participate in the GC and mark them
as needed.
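
[A minimal sketch of the bookkeeping this implies: the plugin keeps every live wrapper on a linked list and, when the collector runs, walks the list marking the underlying objects. The types and functions here (gc_object, gc_mark, wrapper_new, wrapper_free, mark_live_wrappers) are hypothetical stand-ins, not GCC's actual interfaces.]

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a GC-managed compiler object. */
struct gc_object {
    int marked;
};

/* Stand-in for the collector's marking function. */
static void gc_mark(struct gc_object *obj)
{
    obj->marked = 1;
}

/* Plugin-side wrapper holding a reference to a compiler object plus any
   per-object data the plugin wants to keep across passes. */
struct wrapper {
    struct gc_object *inner;
    void *plugin_data;
    struct wrapper *prev, *next;
};

static struct wrapper *live_wrappers = NULL;

/* Create a wrapper and push it onto the list of live wrappers. */
static struct wrapper *wrapper_new(struct gc_object *inner, void *data)
{
    struct wrapper *w = malloc(sizeof *w);
    if (!w)
        return NULL;
    w->inner = inner;
    w->plugin_data = data;
    w->prev = NULL;
    w->next = live_wrappers;
    if (live_wrappers)
        live_wrappers->prev = w;
    live_wrappers = w;
    return w;
}

/* Unlink a wrapper; its object is no longer kept alive by the plugin. */
static void wrapper_free(struct wrapper *w)
{
    if (w->prev)
        w->prev->next = w->next;
    else
        live_wrappers = w->next;
    if (w->next)
        w->next->prev = w->prev;
    free(w);
}

/* Callback the plugin would register with the collector: mark every
   object that a live wrapper still refers to. */
static void mark_live_wrappers(void)
{
    for (struct wrapper *w = live_wrappers; w; w = w->next)
        gc_mark(w->inner);
}
```

[A function->data mapping would hang its per-function data off plugin_data, so that the keys stay alive for as long as the mapping refers to them.]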

I'd prefer the assumption that the underlying API does garbage
collection, given that this is an API to GCC which does garbage
collection.  (I'm assuming that GCC intends to continue using GC for
memory-management).

Alternatively to explicit GC and to explicit reference counting, there's
the acquire/release model, where the client has ownership of a handle to
the underlying object and must explicitly release the handle, but there
may be multiple handles to the same object.
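
[A minimal sketch of the acquire/release model, assuming the implementation pins an object for as long as any handle to it is outstanding. The names (object, handle, handle_acquire, handle_release, object_is_collectable) are hypothetical and appear in neither proposed API.]

```c
#include <assert.h>
#include <stdlib.h>

/* Stand-in for a compiler object; a non-zero pin count keeps it alive. */
struct object {
    int pin_count;
};

/* Client-owned handle; several handles may refer to the same object. */
struct handle {
    struct object *target;
};

/* Acquire a new handle to obj; the caller owns it and must release it. */
static struct handle *handle_acquire(struct object *obj)
{
    struct handle *h = malloc(sizeof *h);
    if (!h)
        return NULL;
    h->target = obj;
    obj->pin_count++;
    return h;
}

/* Release a handle exactly once; the object itself is left untouched. */
static void handle_release(struct handle *h)
{
    assert(h->target->pin_count > 0);
    h->target->pin_count--;
    free(h);
}

/* The collector may reclaim the object only when no handles remain. */
static int object_is_collectable(const struct object *obj)
{
    return obj->pin_count == 0;
}
```

[In this sketch the client never sees the count: it only owns handles, and how the object is kept alive behind them (a pin count here, GC roots in GCC) stays an implementation detail.]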

To consider a variant of your question: what is the state that plugins
can keep "live" *within* an invocation of their per-function hook or
other hook: what happens if the garbage collector runs during such a
hook?  Or is the collector disabled at entry to the hook and reenabled
upon return from the plugin?

> >> There are also other things missing: e.g. operators.  The current thing
> >> doesn't have access to the TREE_CODE/gimple_expr_code/RTX_CODE, and hence
> >> can't differ between an add and a mul.  Obviously that's less than
> >> optimal.  But the codes exposed for the plugin should have no relation to
> >> GCC codes, but again be general concepts.  So, externally the plugin would
> >> export codes like "GCC_ADD/SUB/MUL", whose enum values will remain stable
> >> forever, and internally they're mapped from the tree/rtx codes (that
> >> mapping can change as we add/remove some enum values from those).
> >
> > Right.  Currently I have a huge number of "subclasses" (implemented via
> > my inheritance-in-C hack), but having some new enums instead would make
> > more sense: your approach is much simpler.
> >
> >> Also types aren't included.  Though some things would be obvious: a
> >> gcc_type (Operand op) function, and some accessors like "gcc_arithmetic_p
> >> (Type)", "gcc_width (Type)", "gcc_unsigned_p (Type)" and so on.
> >
> > Yes.
> >
> > Some naming bikesheddery: I prefer a function name to identify what it
> > works on.  So I'd prefer "gcc_type_width(Type)" to "gcc_width(Type)";
> > similarly, I'd prefer "gcc_block_stmts(Block)" to "gcc_stmts(Block)".
> > For that matter I prefer "get", "iter", and "is", giving:
> >    int gcc_type_get_width(Type t);
> >
> >    Range gcc_block_iter_stmts(Block b);
> >
> >    bool gcc_type_is_unsigned(Type t);
> >
> > I think the improved clarity outweighs the extra typing.
> >
> > I notice that you capitalized the types, and that they don't have a
> > prefix.  I'm nervous about naming collisions in such a scheme.  In my
> > proposed API the types have names of the form "gcc_basic_block" and
> > "gcc_type".  (I got the impression from earlier discussions that the GNU
> > coding conventions eschew capitalization - but that could be because I
> > leapt in with my CamelCaseCraziness in an earlier thread [2]).
> >
> > Another naming nitpick: I've deliberately been thinking of this as a
> > "gcc api" rather than a "gcc plugin api".  My hope is that eventually
> > (and I know that this is a long way off) GCC will be embeddable and
> > suitable for use for JIT compilation, given that this seems to be the
> > one area where I find myself needing LLVM.
> 
> The primary goal of the C API should be simplicity and easy re-targeting.
> I would expect that most introspection plugin writers would use a nicer
> interface like python or C++.  Those can also provide high level functionality
> that builds upon pieces of the C API.
> 
> >> Obviously most useful accessors to the individual objects are missing.
> >> Those accessors again should be fairly unrelated to GCC specific concepts,
> >> but I would envision things like "gcc_volatile_p (Instruction i)" or
> >> perhaps "gcc_reg_p (Operand)".
> >>
> >> > In particular, as the maintainer of the gcc-python-plugin I want
> >> > something that will make my life easier.
> >>
> >> Well, the current version 0.0 certainly will not make your life easier.
> >> It misses almost everything ;)
> >
> > :)
> >
> >> But my goal was to set a ground interface
> >> that I would be pleased to work with as plugin author, _and_ that is
> >> easily maintainable by GCC authors (in a way this is actually more
> >> important), with the hope that actual plugin authors would extend it
> >> according to above principles as they need.
> >
> > Looking at your work, I'm thinking I might use it as inspiration to
> > rework the API I've written to be more like yours in some places.  I
> > think my work covers more ground than yours does, but it sounds like
> > some of the approaches you're taking are more likely to be acceptable by
> > the rest of the GCC maintainers.
> 
> Certainly a non-modifying API is a no-brainer and way easier to "design".
> 
> I suspect we will try to develop an instrumentation-kind modifying API as
> well (like, add a global variable, add an instruction that increments
> that, etc.).
> At some point you will run into the issue that such API will be no longer
> re-targetable as it (necessarily) exposes GCC implementation details.  At
> that point I will say - well, do not write a plugin, write a piece of
> GCC itself!
> (people of course may choose to disagree with me here)

To give some context, I'm primarily interested in static analysis: I'm
writing one-off passes that may only be of interest to an individual
project, or may slow down the compiler by an order-of-magnitude or
worse.  It's acceptable to do that if the analysis results are useful
enough, but I wouldn't expect such a patch to be merged into the main
GCC project.

[The python plugin doesn't allow any modification yet, but it should be
possible to do so if someone has a compelling use-case.]

[Though embedding Python inside GCC also allows for lots of fun side
projects: I've successfully embedded Django within gcc, giving gcc a
"serve-http" pass, where it stops compiling and instead serves http on a
port (potentially exposing the internal representation as dynamic HTML);
it never returns from this pass, serving HTTP until it's killed]


> >> > By that I mean: something that will make it easier to keep my plugin
> >> > compatible against "as many gcc versions as possible" (which currently
> >> > means 4.6 and 4.7, but I want to add 4.8 and so on), and minimize the
> >> > amount of recompiling I have to do over ABI issues within a GCC release.
> >>
> >> Yes, that's definitely the goal of my approach.  Few opaque types, few
> >> high level accessors, never to be changed again (ABI wise) in the future,
> >
> > Sounds good.
> >
> >>
> >> > My plugin is currently implemented in C (requiring C++ would be a pain
> >> > for me, but would be doable, I think).
> >>
> >> ... and the API C only, yes, definitely.
> >>
> >> I hope the above made some of the principles clear, even though the actual
> >> implementation is terribly lacking in features.
> >
> > I think so (with some clarifications as noted above).
> >
> > Is it worth me trying to take my proposed API in the direction you've
> > described, and trying to meet in the middle?  I have my static analysis
> > code working on top of my API, so I hope it will be relatively easy to
> > gradually tweak it to more closely match the desired API.
> 
> What I'd like to see for 4.8 is the core introspection plugin API be included.
> We will declare it BETA, which means we will likely turn it upside-down for
> 4.9 after getting feedback from users and re-targeters (I really hope someone
> will try ;)).  For 4.9 the goal would be to have a forever(!) stable API which
> we can then backport.
> 
> Input and help from everybody is appreciated, I'd say we can even start with
> committing the current state to trunk and do development in the public.

By "current state", are you referring to Michael's work, or to mine? (or
to whoever gets a patch in first? :) )

> From a RM perspective the plugin API work is really non-interesting (apart from
> that it should not break GCC build), so you can continue changing/enhancing
> the API throughout stage3 at least (which means at least 4 months from now).

Hope this is helpful
Dave

