This is the mail archive of the
mailing list for the GCC project.
On Wed, Sep 19, 2012 at 11:32 PM, David Malcolm <firstname.lastname@example.org> wrote:
> On Tue, 2012-09-11 at 11:17 +0200, Richard Guenther wrote:
> Sorry for the belated response; various comments inline throughout...
>> On Tue, Sep 11, 2012 at 3:10 AM, David Malcolm <email@example.com> wrote:
>> > On Mon, 2012-09-10 at 17:20 +0200, Michael Matz wrote:
>> >> Hi David,
>> >> On Mon, 10 Sep 2012, David Malcolm wrote:
>> >> > Is it possible for you to post your work-in-progress code somewhere?
>> >> Attached.
>> > Many thanks for posting this! Various comments inline below.
>> >> > I know that you don't feel it's ready for committing, but I would find
>> >> > it helpful - I'm interested in understanding the general approach,
>> >> > rather than seeing completeness or perfection.
>> >> Some sort of brain dump follows:
>> >> The idea is as follows: as first cut an introspection API that is tied to
>> >> compiler IR concepts rather than GCC specifics. As such it should be
>> >> implementable also for other compilers, at least the trivial things that
>> >> every traditional compiler will have. So, we have functions, basic
>> >> blocks, instructions, operands and operators. Nothing of that should
>> >> relate to tree or gimple or RTL.
>> > I see. So there's a terminology issue here: we shouldn't refer to
>> > "gimple" or "rtl", we should refer to "instructions" or "statements".
>> > [Possibly crazy idea: should the API actually refer to itself as GCC?
>> > (with "gcc_" prefixes etc) If it's implementable by other compilers,
>> > would another prefix be suitable? I don't think this is a good idea,
>> > but it makes for an interesting thought experiment]
>> I think the API shouldn't refer to GCC itself - in fact I was hoping that
>> someone would implement the very same API for LLVM or Open64. At least
>> introspection should be compiler agnostic (in my tiny ideal world ;)).
> I rather regret mentioning my "possibly crazy idea" - what I really need
> is a stable API/ABI for a shim layer that hides differences between GCC
> versions. I worry that too much abstraction will prevent my plugin from
> getting its job done. Or, to put it another way, I'd rather have a good
> API for talking to GCC than a mediocre API for talking to any compiler.
> How many other plugin authors are out there, and what do they need? I
> may be in a strange position in that my python plugin has the direct
> objective of exposing gcc's internals (albeit in an easier-to-use form
> than C/C++), so my perspective may be skewed here.
I see. Still I'd like us to start from this high-level perspective - we can add
GCC specific things later. Also I'm pretty confident that basic introspection
(thus iterating over something like a callgraph, a CFG or statements and their operands)
should be compiler independent.
>> I also think that we can easily backport plugin API changes (read: additions,
>> the API of course never "changes") to active release branches, so plugins
>> using this API should run against all released GCC versions (for additions
>> the API needs a way to identify its "version" though) and other compilers
>> without re-compiling the plugin itself.
> That would be useful, though do you mean 4.8 and 4.7, or do you also
> include 4.6? Would the initial creation of the plugin API count as "a
> plugin API change", or is that stretching the meaning of your words too
> far? :)
I'm not against adding it to 4.6.x if 4.6 is still maintained when we decide
the API is sufficiently stable (but I guess at that point we will have 4.7.x,
4.8.x in maintenance and 4.9 in development).
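
The version identification mentioned above could look roughly like the sketch below. Every name in it (`gcc_api_version`, `plugin_api_compatible`, `PLUGIN_BUILT_AGAINST_API`) is hypothetical and exists in no GCC release; the only assumption taken from the thread is that the API grows by additions and never changes.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical: the API revision this plugin was compiled against. */
#define PLUGIN_BUILT_AGAINST_API 3u

/* Stand-in for the compiler side. A real implementation would live in the
   compiler and report the highest API revision it provides. */
static unsigned api_version_provided = 5u;

unsigned gcc_api_version(void)
{
    return api_version_provided;
}

/* Because the API only ever gains entry points, a plugin can run whenever
   the compiler provides at least the revision the plugin needs. */
bool plugin_api_compatible(unsigned required)
{
    return gcc_api_version() >= required;
}

/* What a plugin's init hook might check before doing anything else. */
bool plugin_can_load(void)
{
    return plugin_api_compatible(PLUGIN_BUILT_AGAINST_API);
}
```

A single monotonically increasing revision number is enough under the "additions only" rule; a scheme with incompatible changes would need more than that.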
>> >> Take for instance the (included) dump-plugin. The goal would be, that
>> >> depending on where you'd put that dumper in the pass pipeline it would
>> >> work _unchanged_ on GENERIC, on GIMPLE and on RTL. That goal isn't
>> >> reached yet, once because the internal iteration just isn't
>> >> implemented for e.g. RTL instruction stream, and once because the operand
>> >> iterator API isn't well suited to the tree-like nesting in GENERIC and
>> >> RTL currently.
>> > Interesting idea.
>> > I prefer having a little more type-safety, but it's a pain to achieve it
>> > in C. If you look at my proposed API, there are dozens of tiny
>> > casting functions. I like that it's typesafe, but it's somewhat
>> > inelegant.
>> I suppose one could wrap a more type-safe C++ interface around the
>> C API (well, or simply wrap a nice python API around it ;)).
>> >> [The intermediate goal was to redo the operand API to be tree-like at the
>> >> base, and possibly write small wrappers to again expose the nicer
>> >> interface that GIMPLE would provide (i.e. direct access to all read
>> >> operands of an instruction).]
>> >> Another thing I want is simplicity. E.g. only the bare minimum of types
>> >> should be exposed. Note how the API itself for instance doesn't expose
>> >> different types of collections, only a general Range which can enumerate
>> >> all things, depending on how it's used (though the implementation has
>> >> runtime checks for wrong usage).
>> > I seem to remember from earlier mailing list discussion there being a
>> > preference for explicit iterator objects, rather than for_each functions
>> > taking callbacks (my API uses the latter approach).
>> Yes (I didn't look into Micha's patch ... but I believe it uses iterators).
> It does.
>> > FWIW I don't expose any iterators directly in my plugin, I simply
>> > generate a list of wrapper objects and return that. But I suppose
>> > others might.
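
To make the "one general Range with runtime checks" idea from the quoted text concrete, here is a minimal sketch. It is not taken from Michael's patch: the names `range_create`, `range_next` and `range_free` are invented for illustration, and plain `int`s stand in for IR objects.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct Range_ *Range;          /* opaque to plugin code */

enum range_kind { RANGE_STMTS, RANGE_BLOCKS };

struct Range_ {
    enum range_kind kind;              /* what this range enumerates */
    const int *cur, *end;              /* ints stand in for IR objects */
};

Range range_create(enum range_kind kind, const int *items, size_t n)
{
    Range r = malloc(sizeof *r);
    if (r) {
        r->kind = kind;
        r->cur = items;
        r->end = items + n;
    }
    return r;
}

/* Stores the next element in *out and returns 1, or returns 0 at the end.
   The caller states which kind it expects; a mismatch is a usage error
   that is caught at run time rather than by the type system. */
int range_next(Range r, enum range_kind expected, int *out)
{
    assert(r->kind == expected);
    if (r->cur == r->end)
        return 0;
    *out = *r->cur++;
    return 1;
}

void range_free(Range r)
{
    free(r);
}

/* Usage: an explicit iterator loop instead of a for_each callback. */
int demo_sum_stmts(void)
{
    static const int stmts[] = { 10, 20, 30 };
    int sum = 0, item;
    Range r = range_create(RANGE_STMTS, stmts, 3);
    if (!r)
        return -1;
    while (range_next(r, RANGE_STMTS, &item))
        sum += item;
    range_free(r);
    return sum;
}
```

The trade-off is the one discussed in the thread: a single Range type keeps the exposed surface small, at the cost of moving type errors from compile time to run time.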
>> >> There are some questions to be solved, e.g. memory management for those
>> >> objects that aren't directly tied to GCC objects, e.g. Ranges right now.
>> >> I do have a strong feeling about the relation of e.g. plugin Instructions
>> >> and GCC gimple/rtx, in the sense that plugin authors should _not_ be
>> >> required to manage memory for those things (same for BBs, functions,
>> >> operands).
>> > I'm not sure how to parse what you wrote above, so I'm not quite sure
>> > what your preference here is. I see that you have range-creation
>> > functions (e.g. "gcc_stmts"), which return a Range that's owned by the
>> > caller, together with a cleanup function ("gcc_free_range") that must be
>> > called exactly once assuming the Range was successfully created. The
>> > other entities (e.g. "Function") are in fact "really" just gcc structs
>> > internally (e.g.
>> > "Function" = "struct Function_*" = "cast to (struct function *)" )
>> > and those are GC-managed.
>> Yes, they are "handlers" that can be passed/returned by value and thus
>> need no memory management. That they relate to the internal GC pointer
>> is an implementation detail.
>> > Currently in my proposed API I assume that all objects are GC-managed,
>> > and that the user is required to register a callback hook to mark any
>> > objects that they're referring to. Does that seem like a sane strategy?
>> > If so, would it make sense to make the iterators be GC-managed also?
>> > I'm not sure about this, but it seems worth discussing. It strikes me
>> > that it's a common use-case to use an iterator within code, and it would
>> > be a pain to have to track stack frames explicitly as GC roots, so
>> > having iterators have their own lifetime-management may make that
>> > simpler. (Awkward questions: what happens if a GC collection happens
>> > during a loop? does the iterator hold a reference on the current
>> > element? etc)
>> I think relying on GC is bad if you think of re-using the same API for
>> other compilers - it puts restrictions on the API implementation side.
>> If, then I would prefer explicit reference counting ... but that can be tedious
>> to use (again, easily abstracted with a C++ interface on top).
>> > [FWIW In my plugin I explicitly keep track of all of my live wrapper
>> > objects in a linked list, so that when the GCC GC runs I can simply walk
>> > the list and call the appropriate marking function on the underlying GCC
>> > objects, so that they don't get swept away from under me]
>> There need to be rules about what state plugins can keep "live" across
>> invocations of their per-function hook or other hook they are being invoked on.
>> Simple rule: nothing ;)
> I believe that such a rule is overly restrictive. For example,
> gcc-python-plugin supports creating custom attributes:
> which leads to a callback being invoked by the parser when it encounters
> various declarations marked with the attribute. At this point it's
> natural to want to store the declarations somewhere for use by a later
> pass. For example, I use this to mark functions with non-standard
> reference-counting behavior in my reference-count checker:
But all data kept "live" here is associated with GCC internal data. There isn't
any data blob inside plugin-allocated memory that points to GCC internals.
> Similarly consider the case of passes that gather per-function data,
> then use that information within an interprocedural pass. The extra
> data needs to either be stored in a new field on the functions
> (where?), or in a mapping from function->data, at which point you need
> the keys to refer to FUNCTION_DECL instances, and thus need to be able
> to store a reference to them, hence participate in the GC and mark them
> as needed.
So the plugin API should offer a way to associate data with objects in the IL
(but the plugin shouldn't maintain data pointing to IL objects).
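
One possible shape for such an association facility is sketched below. The function names are hypothetical, the fixed-size table is only a placeholder for whatever the compiler would really use, and `struct Function_` is given a dummy body purely so the example is self-contained. The point is that the table lives on the compiler side, so the plugin never stores pointers into the IL itself.

```c
#include <assert.h>
#include <stddef.h>

struct Function_ { int dummy; };       /* stand-in for the real IL object */
typedef struct Function_ *Function;    /* opaque handle, passed by value */

#define MAX_SLOTS 64

/* Owned by the compiler, not by the plugin. */
static struct { Function fn; void *data; } slots[MAX_SLOTS];
static size_t n_slots;

/* Attach plugin data to a function. Returns 0 on success, -1 if full. */
int function_set_data(Function fn, void *data)
{
    for (size_t i = 0; i < n_slots; i++) {
        if (slots[i].fn == fn) {
            slots[i].data = data;      /* overwrite existing association */
            return 0;
        }
    }
    if (n_slots == MAX_SLOTS)
        return -1;
    slots[n_slots].fn = fn;
    slots[n_slots].data = data;
    n_slots++;
    return 0;
}

/* Look up plugin data for a function; NULL if nothing was attached. */
void *function_get_data(Function fn)
{
    for (size_t i = 0; i < n_slots; i++)
        if (slots[i].fn == fn)
            return slots[i].data;
    return NULL;
}
```

A per-function pass could call `function_set_data` with its gathered results, and a later interprocedural pass could fetch them with `function_get_data`.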
> I'd prefer the assumption that the underlying API does garbage
> collection, given that this is an API to GCC which does garbage
> collection. (I'm assuming that GCC intends to continue using GC for
> Alternatively to explicit GC and to explicit reference counting, there's
> the acquire/release model, where the client has ownership of a handle to
> the underlying object and must explicitly release the handle, but there
> may be multiple handles to the same object.
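
The acquire/release model described here can be sketched as follows. All names are invented for illustration, and the "object" is a toy struct rather than anything from GCC; the sketch only shows that several handles may refer to one object and that the object stays pinned until the last handle is released.

```c
#include <assert.h>
#include <stdlib.h>

struct object {
    int live_handles;                  /* how many handles still refer here */
    int payload;
};

typedef struct handle_ { struct object *obj; } *handle;

/* Hand out a new handle; the client owns it and must release it. */
handle handle_acquire(struct object *obj)
{
    handle h = malloc(sizeof *h);
    if (!h)
        return NULL;
    h->obj = obj;
    obj->live_handles++;
    return h;
}

/* Give a handle back. Each handle must be released exactly once. */
void handle_release(handle h)
{
    assert(h && h->obj->live_handles > 0);
    h->obj->live_handles--;
    free(h);
}

/* A collector would treat pinned objects as roots. */
int object_is_pinned(const struct object *obj)
{
    return obj->live_handles > 0;
}
```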
> To consider a variant of your question: what is the state that plugins
> can keep "live" *within* an invocation of their per-function hook or
> other hook: what happens if the garbage collector runs during such a
> hook? Or is the collector disabled at entry to the hook and reenabled
> upon return from the plugin?
The garbage collector is disabled at entry and reenabled at return
(all collections are explicit via calls to ggc_collect ()).
>> >> There are also other things missing: e.g. operators. The current thing
>> >> doesn't have access to the TREE_CODE/gimple_expr_code/RTX_CODE, and hence
>> >> can't distinguish between an add and a mul. Obviously that's less than
>> >> optimal. But the codes exposed for the plugin should have no relation to
>> >> GCC codes, but again be general concepts. So, externally the plugin would
>> >> export codes like "GCC_ADD/SUB/MUL", whose enum values will remain stable
>> >> forever, and internally they're mapped from the tree/rtx codes (that
>> >> mapping can change as we add/remove some enum values from those).
>> > Right. Currently I have a huge number of "subclasses" (implemented via
>> > my inheritance-in-C hack), but having some new enums instead would make
>> > more sense: your approach is much simpler.
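
The stable-codes idea can be illustrated with a small sketch. `GCC_ADD`, `GCC_SUB` and `GCC_MUL` are the names floated in the quoted text; everything else (`GCC_OP_UNKNOWN`, the `internal_code` enum and `gcc_opcode_from_internal`) is made up here, with `internal_code` standing in for the tree/rtx codes whose numbering may shift between releases.

```c
#include <assert.h>

/* Externally visible codes: values are written out explicitly and are
   never renumbered or reused, so compiled plugins keep working. */
enum gcc_opcode {
    GCC_OP_UNKNOWN = 0,
    GCC_ADD = 1,
    GCC_SUB = 2,
    GCC_MUL = 3
};

/* Stand-in for an internal enum. Entries may be added or removed, so the
   numeric values are not stable and must not leak to plugins. */
enum internal_code {
    INT_NOP,
    INT_PLUS_EXPR,
    INT_MINUS_EXPR,
    INT_MULT_EXPR
};

/* The only place that knows about both enums. When the internal enum
   changes, this mapping changes with it and the public values do not. */
enum gcc_opcode gcc_opcode_from_internal(enum internal_code c)
{
    switch (c) {
    case INT_PLUS_EXPR:  return GCC_ADD;
    case INT_MINUS_EXPR: return GCC_SUB;
    case INT_MULT_EXPR:  return GCC_MUL;
    default:             return GCC_OP_UNKNOWN;
    }
}
```

Using a `switch` rather than a lookup table indexed by the internal value keeps the mapping correct even if internal codes are reordered.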
>> >> Also types aren't included. Though some things would be obvious: a
>> >> gcc_type (Operand op) function, and some accessors like "gcc_arithmetic_p
>> >> (Type)", "gcc_width (Type)", "gcc_unsigned_p (Type)" and so on.
>> > Yes.
>> > Some naming bikesheddery: I prefer a function name to identify what it
>> > works on. So I'd prefer "gcc_type_width(Type)" to "gcc_width(Type)";
>> > similarly, I'd prefer "gcc_block_stmts(Block)" to "gcc_stmts(Block)".
>> > For that matter I prefer "get", "iter", and "is", giving:
>> > int gcc_type_get_width(Type t);
>> > Range gcc_block_iter_stmts(Block b);
>> > bool gcc_type_is_unsigned(Type t);
>> > I think the improved clarity outweighs the extra typing.
>> > I notice that you capitalized the types, and that they don't have a
>> > prefix. I'm nervous about naming collisions in such a scheme. In my
>> > proposed API the types have names of the form "gcc_basic_block" and
>> > "gcc_type". (I got the impression from earlier discussions that the GNU
>> > coding conventions eschew capitalization - but that could be because I
>> > leapt in with my CamelCaseCraziness in an earlier thread).
>> > Another naming nitpick: I've deliberately been thinking of this as a
>> > "gcc api" rather than a "gcc plugin api". My hope is that eventually
>> > (and I know that this is a long way off) GCC will be embeddable and
>> > suitable for use for JIT compilation, given that this seems to be the
>> > one area where I find myself needing LLVM.
>> The primary goal of the C API should be simplicity and easy re-targeting.
>> I would expect that most introspection plugin writers would use a nicer
>> interface like python or C++. Those can also provide high level functionality
>> that builds upon pieces of the C API.
>> >> Obviously most useful accessors to the individual objects are missing.
>> >> Those accessors again should be fairly unrelated to GCC specific concepts,
>> >> but I would envision things like "gcc_volatile_p (Instruction i)" or
>> >> perhaps "gcc_reg_p (Operand)".
>> >> > In particular, as the maintainer of the gcc-python-plugin I want
>> >> > something that will make my life easier.
>> >> Well, the current version 0.0 certainly will not make your life easier.
>> >> It misses almost everything ;)
>> > :)
>> >> But my goal was to set a ground interface
>> >> that I would be pleased to work with as plugin author, _and_ that is
>> >> easily maintainable by GCC authors (in a way this is actually more
>> >> important), with the hope that actual plugin authors would extend it
>> >> according to above principles as they need.
>> > Looking at your work, I'm thinking I might use it as inspiration to
>> > rework the API I've written to be more like yours in some places. I
>> > think my work covers more ground than yours does, but it sounds like
>> > some of the approaches you're taking are more likely to be acceptable by
>> > the rest of the GCC maintainers.
>> Certainly a non-modifying API is a no-brainer and way easier to "design".
>> I suspect we will try to develop an instrumentation-kind modifying API as
>> well (like, add a global variable, add an instruction that increments
>> that, etc.).
>> At some point you will run into the issue that such API will be no longer
>> re-targetable as it (necessarily) exposes GCC implementation details. At
>> that point I will say - well, do not write a plugin, write a piece of
>> GCC itself!
>> (people of course may choose to disagree with me here)
> To give some context, I'm primarily interested in static analysis: I'm
> writing one-off passes that may only be of interest to an individual
> project, or may slow down the compiler by an order-of-magnitude or
> worse. It's acceptable to do that if the analysis results are useful
> enough, but I wouldn't expect such a patch to be merged into the main
> GCC project.
> [The python plugin doesn't allow any modification yet, but it should be
> possible to do so if someone has a compelling use-case.]
> [Though embedding Python inside GCC also allows for lots of fun side
> projects: I've successfully embedded Django within gcc, giving gcc a
> "serve-http" pass, where it stops compiling and instead serves http on a
> port (potentially exposing the internal representation as dynamic HTML);
> it never returns from this pass, serving HTTP until it's killed]
I think static analysis is a perfect candidate for the proposed API,
and it _should_ be pretty compiler independent.
>> >> > By that I mean: something that will make it easier to keep my plugin
>> >> > compatible against "as many gcc versions as possible" (which currently
>> >> > means 4.6 and 4.7, but I want to add 4.8 and so on), and minimize the
>> >> > amount of recompiling I have to do over ABI issues within a GCC release.
>> >> Yes, that's definitely the goal of my approach. Few opaque types, few
>> >> high level accessors, never to be changed again (ABI wise) in the future,
>> > Sounds good.
>> >> > My plugin is currently implemented in C (requiring C++ would be a pain
>> >> > for me, but would be doable, I think).
>> >> ... and the API C only, yes, definitely.
>> >> I hope the above made some of the principles clear, even though the actual
>> >> implementation is terribly lacking in features.
>> > I think so (with some clarifications as noted above).
>> > Is it worth me trying to take my proposed API in the direction you've
>> > described, and trying to meet in the middle? I have my static analysis
>> > code working on top of my API, so I hope it will be relatively easy to
>> > gradually tweak it to more closely match the desired API.
>> What I'd like to see for 4.8 is the core introspection plugin API be included.
>> We will declare it BETA, which means we will likely turn it upside-down for
>> 4.9 after getting feedback from users and re-targeters (I really hope someone
>> will try ;)). For 4.9 the goal would be to have a forever(!) stable API which
>> we can then backport.
>> Input and help from everybody is appreciated, I'd say we can even start with
>> committing the current state to trunk and do development in the public.
> By "current state", are you referring to Michael's work, or to mine? (or
> to whoever gets a patch in first? :) )
For some weird reason I referred to Michael's work ;)) (possibly with the
operand iterator fixed first)
>> From a RM perspective the plugin API work is really non-interesting (apart from
>> that it should not break GCC build), so you can continue changing/enhancing
>> the API throughout stage3 at least (which means at least 4 months from now).
> Hope this is helpful
Yes, it is. Especially your focus on static analysis fits my focus on