This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



Re: gcc-c-api


On Mon, 2012-09-10 at 17:20 +0200, Michael Matz wrote:
> Hi David,
> 
> On Mon, 10 Sep 2012, David Malcolm wrote:
> 
> > Is it possible for you to post your work-in-progress code somewhere?
> 
> Attached.

Many thanks for posting this!  Various comments inline below.

> > I know that you don't feel it's ready for committing, but I would find 
> > it helpful - I'm interested in understanding the general approach, 
> > rather than seeing completeness or perfection.
> 
> Some sort of brain dump follows:
> 
> The idea is as follows: as first cut an introspection API that is tied to 
> compiler IR concepts rather than GCC specifics.  As such it should be 
> implementable also for other compilers, at least the trivial things that 
> every traditional compiler will have.  So, we have functions, basic 
> blocks, instructions, operands and operators.  Nothing of that should 
> relate to tree or gimple or RTL.

I see.  So there's a terminology issue here: we shouldn't refer to
"gimple" or "rtl", we should refer to "instructions" or "statements".
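To make the IR-neutral idea concrete, here is a minimal sketch. It assumes that plugin code only ever holds opaque handles (Function, Block, Instruction) and reaches inside them through accessor functions. The accessor names (fn_num_blocks, fn_block, block_num_insns, block_insn, insn_text) and the toy_* structs are hypothetical stand-ins for whatever GCC would put behind the handles. They are not GCC's data structures.

```c
#include <assert.h>
#include <stddef.h>

/* Opaque handles: plugin code holds only these pointers (hypothetical names). */
typedef struct toy_function *Function;
typedef struct toy_block *Block;
typedef struct toy_insn *Instruction;

/* Toy "compiler side"; in GCC these would be its own internal structures. */
struct toy_insn { const char *text; };
struct toy_block { struct toy_insn *insns; size_t n_insns; };
struct toy_function { struct toy_block *blocks; size_t n_blocks; };

/* Hypothetical accessors: the only way into a handle. */
size_t fn_num_blocks(Function f) { return f->n_blocks; }
Block fn_block(Function f, size_t i) { return &f->blocks[i]; }
size_t block_num_insns(Block b) { return b->n_insns; }
Instruction block_insn(Block b, size_t i) { return &b->insns[i]; }
const char *insn_text(Instruction i) { return i->text; }

/* Plugin side: counts instructions using accessors only, so it does not
   depend on whether the blocks hold statements or insns internally. */
size_t count_instructions(Function f)
{
  size_t total = 0;
  for (size_t i = 0; i < fn_num_blocks(f); i++)
    total += block_num_insns(fn_block(f, i));
  return total;
}

/* Builds a two-block toy function for demonstration. */
Function demo_function(void)
{
  static struct toy_insn b0[] = { { "a = 1" }, { "b = a + 2" } };
  static struct toy_insn b1[] = { { "return b" } };
  static struct toy_block blocks[] = { { b0, 2 }, { b1, 1 } };
  static struct toy_function fn = { blocks, 2 };
  return &fn;
}
```

In a real split between compiler and plugin, count_instructions never touches a struct field, so the compiler side could change the layout behind the handles without the plugin needing to know.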

[Possibly crazy idea: should the API actually refer to itself as GCC?
(with "gcc_" prefixes etc) If it's implementable by other compilers,
would another prefix be suitable?  I don't think this is a good idea,
but it makes for an interesting thought experiment]

> Take for instance the (included) dump-plugin.  The goal would be, that 
> depending on where you'd put that dumper in the pass pipeline it would 
> work _unchanged_ on GENERIC, on GIMPLE and on RTL.  That goal isn't 
> reached yet, partly because the internal iteration just isn't
> implemented for e.g. the RTL instruction stream, and partly because the
> operand iterator API currently isn't well suited to the tree-like nesting
> in GENERIC and RTL.

Interesting idea.
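One way the same dumper could run unchanged over different instruction streams is to hide each representation behind a small table of iteration hooks. The sketch below assumes that design. ir_ops, dump, and both toy representations are hypothetical: an array of strings stands in for one IR and a linked list for another, and dump produces identical output for both.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical iteration hooks that each IR representation supplies. */
struct ir_ops {
  const void *(*first)(const void *container);
  const void *(*next)(const void *insn);
  const char *(*text)(const void *insn);
};

/* Representation A: a NULL-terminated array of instruction strings. */
static const void *arr_first(const void *c)
{
  const char *const *p = c;
  return *p ? p : NULL;
}
static const void *arr_next(const void *i)
{
  const char *const *p = i;
  return p[1] ? p + 1 : NULL;
}
static const char *arr_text(const void *i) { return *(const char *const *)i; }
static const struct ir_ops array_ops = { arr_first, arr_next, arr_text };

/* Representation B: a singly linked chain of nodes. */
struct node { const char *text; const struct node *next; };
static const void *list_first(const void *c) { return c; }
static const void *list_next(const void *i) { return ((const struct node *)i)->next; }
static const char *list_text(const void *i) { return ((const struct node *)i)->text; }
static const struct ir_ops list_ops = { list_first, list_next, list_text };

/* The dumper: knows nothing about which representation it is walking.
   Writes "insn;insn;..." into out, truncating safely if out is too small. */
void dump(const struct ir_ops *ops, const void *container, char *out, size_t outsz)
{
  assert(outsz > 0);
  out[0] = '\0';
  for (const void *i = ops->first(container); i; i = ops->next(i)) {
    strncat(out, ops->text(i), outsz - strlen(out) - 1);
    strncat(out, ";", outsz - strlen(out) - 1);
  }
}

/* Returns 1 if both representations dump to the same expected text. */
int demo_same_dump(void)
{
  static const char *const arr[] = { "x = 1", "y = x", NULL };
  static const struct node n2 = { "y = x", NULL };
  static const struct node n1 = { "x = 1", &n2 };
  char a[64], b[64];
  dump(&array_ops, arr, a, sizeof a);
  dump(&list_ops, &n1, b, sizeof b);
  return strcmp(a, b) == 0 && strcmp(a, "x = 1;y = x;") == 0;
}
```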

I prefer having a little more type-safety, but it's a pain to achieve it
in C.  If you look at my proposed API [1], there are dozens of tiny
casting functions.  I like that it's typesafe, but it's somewhat
inelegant.

> [The intermediate goal was to redo the operand API to be tree-like at the 
> base, and possibly write small wrappers to again expose the nicer 
> interface that GIMPLE would provide (i.e. direct access to all read 
> operands of an instruction).]
> 
> Another thing I want is simplicity.  E.g. only the bare minimum of types 
> should be exposed.  Note how the API itself for instance doesn't expose 
> different types of collections, only a general Range which can enumerate 
> all things, depending on how it's used (though the implementation has 
> runtime checks for wrong usage).

I seem to remember from earlier mailing list discussion there being a
preference for explicit iterator objects, rather than for_each functions
taking callbacks (my API uses the latter approach).

FWIW I don't expose any iterators directly in my plugin, I simply
generate a list of wrapper objects and return that.  But I suppose
others might.
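For comparison, here are both iteration styles over the same toy statement list. All names are hypothetical; the point is only the shape of the calling code. With the callback style, caller state has to travel through a void * user_data parameter. With an iterator object, the loop body sits directly in the caller.

```c
#include <assert.h>
#include <stddef.h>

/* Toy statement container; statements are just ints here (assumption). */
struct stmt_list { const int *stmts; size_t n; };

/* Style 1: for_each with a callback; caller state travels via user_data. */
typedef void (*stmt_cb)(int stmt, void *user_data);

void for_each_stmt(const struct stmt_list *l, stmt_cb cb, void *user_data)
{
  for (size_t i = 0; i < l->n; i++)
    cb(l->stmts[i], user_data);
}

/* Style 2: an explicit iterator object that the caller drives. */
struct stmt_iter { const struct stmt_list *l; size_t pos; };

void stmt_iter_init(struct stmt_iter *it, const struct stmt_list *l)
{
  it->l = l;
  it->pos = 0;
}

/* Stores the next statement in *out; returns 0 when exhausted. */
int stmt_iter_next(struct stmt_iter *it, int *out)
{
  if (it->pos >= it->l->n)
    return 0;
  *out = it->l->stmts[it->pos++];
  return 1;
}

static void add_to_sum(int stmt, void *user_data) { *(int *)user_data += stmt; }

int sum_with_callback(const struct stmt_list *l)
{
  int sum = 0;
  for_each_stmt(l, add_to_sum, &sum);
  return sum;
}

int sum_with_iterator(const struct stmt_list *l)
{
  struct stmt_iter it;
  int stmt, sum = 0;
  stmt_iter_init(&it, l);
  while (stmt_iter_next(&it, &stmt))
    sum += stmt;   /* loop body lives in the caller */
  return sum;
}

static const int demo_stmts[] = { 1, 2, 3 };
static const struct stmt_list demo_list = { demo_stmts, 3 };
```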

> There are some questions to be solved, e.g. memory management for those 
> objects that aren't directly tied to GCC objects, e.g. Ranges right now.  
> I do have a strong feeling about the relation of e.g. plugin Instructions 
> and GCC gimple/rtx, in the sense that plugin authors should _not_ be 
> required to manage memory for those things (same for BBs, functions, 
> operands).

I'm not sure how to parse what you wrote above, so I'm not quite sure
what your preference here is.  I see that you have range-creation
functions (e.g. "gcc_stmts"), which return a Range that's owned by the
caller, together with a cleanup function ("gcc_free_range") that must be
called exactly once assuming the Range was successfully created.  The
other entities (e.g. "Function") are in fact "really" just gcc structs
internally (e.g. 
   "Function" = "struct Function_*" = "cast to (struct function *)" )
and those are GC-managed.
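The ownership rule described above can be sketched as follows. gcc_stmts and gcc_free_range are named after the functions in Michael's attachment, but their signatures here (Range gcc_stmts(Block), void gcc_free_range(Range)) and the toy bodies are assumptions, not his code. The caller frees each successfully created Range exactly once and never frees the Block, which stands in for a GC-managed GCC object.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy stand-ins (assumptions): a Block with a fixed statement count, and a
   heap-allocated Range that the caller owns. */
typedef struct toy_block { int n_stmts; } *Block;
typedef struct toy_range { int remaining; } *Range;

static int live_ranges = 0;   /* Ranges created but not yet freed */

/* Assumed shape: returns a caller-owned Range, or NULL on failure. */
Range gcc_stmts(Block b)
{
  Range r = malloc(sizeof *r);
  if (!r)
    return NULL;
  r->remaining = b->n_stmts;
  live_ranges++;
  return r;
}

/* Assumed shape: call exactly once per successfully created Range. */
void gcc_free_range(Range r)
{
  assert(r != NULL);
  free(r);
  live_ranges--;
}

int count_stmts(Block b)
{
  int n = 0;
  Range r = gcc_stmts(b);
  if (!r)                       /* creation failed: nothing to free */
    return -1;
  while (r->remaining-- > 0)    /* toy iteration in place of real accessors */
    n++;
  gcc_free_range(r);            /* the caller owns r, so the caller frees it */
  return n;                     /* b itself is never freed by the plugin */
}

static struct toy_block demo_block = { 4 };
```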

Currently in my proposed API I assume that all objects are GC-managed,
and that the user is required to register a callback hook to mark any
objects that they're referring to.  Does that seem like a sane strategy?
If so, would it make sense to make the iterators GC-managed as well?
I'm not sure about this, but it seems worth discussing.  It strikes me
that it's a common use-case to use an iterator within code, and it would
be a pain to have to track stack frames explicitly as GC roots, so
having iterators have their own lifetime-management may make that
simpler.  (Awkward questions: what happens if a GC collection happens
during a loop?  does the iterator hold a reference on the current
element? etc)

[FWIW In my plugin I explicitly keep track of all of my live wrapper
objects in a linked list, so that when the GCC GC runs I can simply walk
the list and call the appropriate marking function on the underlying GCC
objects, so that they don't get swept away from under me]
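Here is a minimal sketch of the live-wrapper list from the previous paragraph, with hypothetical names (struct wrapper, wrapper_new, wrapper_free, mark_live_wrappers). In a real plugin, mark_live_wrappers would be driven from GCC's GC marking callback (the PLUGIN_GGC_MARKING plugin event) and would invoke GCC's marking routines. Here the marking function is simply passed in as a parameter so the sketch is self-contained.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical wrapper around a GC-managed compiler object. */
struct wrapper {
  void *gcc_obj;                 /* the underlying object to keep alive */
  struct wrapper *prev, *next;   /* doubly linked list of live wrappers */
};

static struct wrapper *live_head = NULL;

struct wrapper *wrapper_new(void *gcc_obj)
{
  struct wrapper *w = malloc(sizeof *w);
  if (!w)
    return NULL;
  w->gcc_obj = gcc_obj;
  w->prev = NULL;
  w->next = live_head;
  if (live_head)
    live_head->prev = w;
  live_head = w;
  return w;
}

/* Unlinks and frees a wrapper; its object is no longer kept alive. */
void wrapper_free(struct wrapper *w)
{
  if (w->prev) w->prev->next = w->next; else live_head = w->next;
  if (w->next) w->next->prev = w->prev;
  free(w);
}

/* Called when the collector runs: mark every object still referenced. */
void mark_live_wrappers(void (*mark)(void *obj))
{
  for (struct wrapper *w = live_head; w; w = w->next)
    mark(w->gcc_obj);
}

static int marked;
static void count_mark(void *obj) { (void)obj; marked++; }

/* Creates three wrappers, drops one, and returns how many objects get marked. */
int demo_marks(void)
{
  static int a, b, c;   /* stand-ins for GCC objects */
  struct wrapper *wa = wrapper_new(&a), *wb = wrapper_new(&b), *wc = wrapper_new(&c);
  assert(wa && wb && wc);
  wrapper_free(wb);
  marked = 0;
  mark_live_wrappers(count_mark);
  wrapper_free(wa);
  wrapper_free(wc);
  return marked;
}
```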

> There are also other things missing: e.g. operators.  The current thing 
> doesn't have access to the TREE_CODE/gimple_expr_code/RTX_CODE, and hence 
> can't distinguish between an add and a mul.  Obviously that's less than 
> optimal.  But the codes exposed for the plugin should have no relation to 
> GCC codes, but again be general concepts.  So, externally the plugin would 
> export codes like "GCC_ADD/SUB/MUL", whose enum values will remain stable 
> forever, and internally they're mapped from the tree/rtx codes (that 
> mapping can change as we add/remove some enum values from those).

Right.  Currently I have a huge number of "subclasses" (implemented via
my inheritance-in-C hack), but having some new enums instead would make
more sense: your approach is much simpler.
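A minimal sketch of the stable-codes idea follows. It assumes that the external enum gets explicit values, frozen once published, and that a single switch maps internal codes onto it. The internal enum is a stand-in for GCC's tree or rtx codes. Every name except GCC_ADD/GCC_SUB/GCC_MUL (taken from Michael's message) is hypothetical.

```c
#include <assert.h>

/* Externally visible operator codes.  The explicit values are part of the
   ABI and never change once published. */
enum gcc_operator {
  GCC_OP_OTHER = 0,
  GCC_ADD = 1,
  GCC_SUB = 2,
  GCC_MUL = 3
};

/* Stand-in for an internal code enum whose order and contents may change
   between releases (hypothetical). */
enum internal_code { IC_NEGATE, IC_MULT, IC_PLUS, IC_MINUS };

/* The only place that knows both enums; updated when internal codes change. */
enum gcc_operator gcc_operator_from_internal(enum internal_code c)
{
  switch (c)
    {
    case IC_PLUS:  return GCC_ADD;
    case IC_MINUS: return GCC_SUB;
    case IC_MULT:  return GCC_MUL;
    default:       return GCC_OP_OTHER;   /* not (yet) exposed externally */
    }
}
```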

> Also types aren't included.  Though some things would be obvious: a 
> gcc_type (Operand op) function, and some accessors like "gcc_arithmetic_p 
> (Type)", "gcc_width (Type)", "gcc_unsigned_p (Type)" and so on.

Yes.
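Using the accessor names from Michael's message, a toy implementation might look like this. The Type and Operand handles, the toy_* structs, and the interpretation of width as a bit count are assumptions made only for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy representations (assumptions) behind opaque Type and Operand handles. */
enum toy_type_kind { TOY_INTEGER, TOY_REAL, TOY_POINTER, TOY_RECORD };
struct toy_type { enum toy_type_kind kind; int width; bool is_unsigned; };
struct toy_operand { const struct toy_type *type; };

typedef const struct toy_type *Type;
typedef const struct toy_operand *Operand;

Type gcc_type(Operand op)     { return op->type; }
bool gcc_arithmetic_p(Type t) { return t->kind == TOY_INTEGER || t->kind == TOY_REAL; }
int  gcc_width(Type t)        { return t->width; }   /* in bits (assumption) */
bool gcc_unsigned_p(Type t)   { return t->kind == TOY_INTEGER && t->is_unsigned; }

static const struct toy_type u32 = { TOY_INTEGER, 32, true };
static const struct toy_type rec = { TOY_RECORD, 64, false };
static const struct toy_operand op_u32 = { &u32 };
static const struct toy_operand op_rec = { &rec };
```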

Some naming bikesheddery: I prefer a function name to identify what it
works on.  So I'd prefer "gcc_type_width(Type)" to "gcc_width(Type)";
similarly, I'd prefer "gcc_block_stmts(Block)" to "gcc_stmts(Block)".
For that matter I prefer "get", "iter", and "is", giving:
   int gcc_type_get_width(Type t);

   Range gcc_block_iter_stmts(Block b);

   bool gcc_type_is_unsigned(Type t);

I think the improved clarity outweighs the extra typing.

I notice that you capitalized the types, and that they don't have a
prefix.  I'm nervous about naming collisions in such a scheme.  In my
proposed API the types have names of the form "gcc_basic_block" and
"gcc_type".  (I got the impression from earlier discussions that the GNU
coding conventions eschew capitalization - but that could be because I
leapt in with my CamelCaseCraziness in an earlier thread [2]).

Another naming nitpick: I've deliberately been thinking of this as a
"gcc api" rather than a "gcc plugin api".  My hope is that eventually
(and I know that this is a long way off) GCC will be embeddable and
suitable for use for JIT compilation, given that this seems to be the
one area where I find myself needing LLVM.

> Obviously most useful accessors to the individual objects are missing. 
> Those accessors again should be fairly unrelated to GCC specific concepts, 
> but I would envision things like "gcc_volatile_p (Instruction i)" or 
> perhaps "gcc_reg_p (Operand)".
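Predicates like these let plugin-side analyses stay IR-neutral. Here is a minimal sketch, assuming that gcc_volatile_p and gcc_reg_p behave as their names suggest. It adds two hypothetical operand accessors (gcc_num_operands, gcc_operand), and all structs are toy stand-ins.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy instruction/operand records (assumptions) behind opaque handles. */
struct toy_operand { bool in_register; };
struct toy_insn { bool is_volatile; const struct toy_operand *ops; size_t n_ops; };
typedef const struct toy_insn *Instruction;
typedef const struct toy_operand *Operand;

bool gcc_volatile_p(Instruction i) { return i->is_volatile; }
bool gcc_reg_p(Operand op)         { return op->in_register; }
/* Hypothetical accessors so plugin code never touches struct fields. */
size_t gcc_num_operands(Instruction i)        { return i->n_ops; }
Operand gcc_operand(Instruction i, size_t k)  { return &i->ops[k]; }

/* Plugin-side check written only against the API: counts instructions that
   are volatile or have at least one non-register operand. */
size_t count_memory_sensitive(const Instruction *insns, size_t n)
{
  size_t count = 0;
  for (size_t k = 0; k < n; k++) {
    bool hit = gcc_volatile_p(insns[k]);
    for (size_t j = 0; !hit && j < gcc_num_operands(insns[k]); j++)
      hit = !gcc_reg_p(gcc_operand(insns[k], j));
    if (hit)
      count++;
  }
  return count;
}

static const struct toy_operand regs[] = { { true }, { true } };
static const struct toy_operand mem[]  = { { true }, { false } };
static const struct toy_insn i0 = { false, regs, 2 };   /* registers only */
static const struct toy_insn i1 = { true,  regs, 2 };   /* volatile */
static const struct toy_insn i2 = { false, mem,  2 };   /* memory operand */

size_t demo_count_sensitive(void)
{
  const Instruction insns[] = { &i0, &i1, &i2 };
  return count_memory_sensitive(insns, 3);
}
```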
> 
> > In particular, as the maintainer of the gcc-python-plugin I want 
> > something that will make my life easier.
> 
> Well, the current version 0.0 certainly will not make your life easier.  
> It misses almost everything ;)  

:)

> But my goal was to set a ground interface 
> that I would be pleased to work with as plugin author, _and_ that is 
> easily maintainable by GCC authors (in a way this is actually more 
> important), with the hope that actual plugin authors would extend it 
> according to above principles as they need.

Looking at your work, I'm thinking I might use it as inspiration to
rework the API I've written to be more like yours in some places.  I
think my work covers more ground than yours does, but it sounds like
some of the approaches you're taking are more likely to be acceptable to
the rest of the GCC maintainers.

> > By that I mean: something that will make it easier to keep my plugin 
> > compatible against "as many gcc versions as possible" (which currently 
> > means 4.6 and 4.7, but I want to add 4.8 and so on), and minimize the 
> > amount of recompiling I have to do over ABI issues within a GCC release.  
> 
> Yes, that's definitely the goal of my approach.  Few opaque types, few 
> high level accessors, never to be changed again (ABI wise) in the future, 

Sounds good.

> 
> > My plugin is currently implemented in C (requiring C++ would be a pain 
> > for me, but would be doable, I think).
> 
> ... and the API C only, yes, definitely.
> 
> I hope the above made some of the principles clear, even though the actual 
> implementation is terribly lacking in features.

I think so (with some clarifications as noted above).

Is it worth me trying to take my proposed API in the direction you've
described, and trying to meet in the middle?  I have my static analysis
code working on top of my API, so I hope it will be relatively easy to
gradually tweak it to more closely match the desired API.

Many thanks again
Dave

[1]
http://git.fedorahosted.org/cgit/gcc-python-plugin.git/tree/gcc-c-api?h=proposed-plugin-api

[2] I intend to keep the CamelCase in my *plugin* - it makes it easy to
spot the boundary between the plugin itself vs the API it's talking to,
rather like having two different handwriting styles in written material
- but that's just my plugin, of course.

