This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



XML output from GCC


A couple of weeks ago, I posted to the list the idea of adding support to
g++ for writing out information about a program in XML.  I got several
positive responses, so I decided to look into the idea further.

Last Thursday I finally got a chance to start on a full implementation
of XML output for g++.  I've made quite a bit of progress and would
like to get some feedback.

I decided to go with XML because it is an existing standard, so there are
plenty of parsers available in several languages.  The whole idea is to
make C++ programs easy to parse for a variety of tools.  Binary formats
for fast lookups can be generated by another layer that reads the XML.

I also thought about Artem's suggestion of making this a separate
front-end, but unfortunately that would require maintaining a separate
parser.  By making it part of g++, we get all the template
specialization and instantiation (along with other nasty things) done
for us.  Perhaps someone could instead write a new back-end that
generates language-independent cross references rather than executable
code?  I can't imagine right now how it would be done, but who knows?

See below for the details of my implementation.  For now, I've just
dumped all the declarations in a translation unit into a nested
structure of XML tags (elements).  I'd like to split up the output by
namespace and class into multiple files (plus one index file).  This
way, a class that is parsed in many translation units (istream, for
example) only has to be output once.

I was thinking along the lines of:

index.xml:
<NamespaceFile namespace="::" file="ns001.xml"/>
<ClassFile class="::istream" file="class001.xml"/>
<ClassFile class="::ostream" file="class002.xml"/>
.....

ns001.xml:
<Namespace name="::">
  <ClassLink name="istream"/>
  <ClassLink name="ostream"/>
  ....
</Namespace>

class001.xml:
<Class name="istream" context="::" source_file="iostream"
       source_line="343">
  ....
</Class>

...and so on.  Anyone have suggestions on this?
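To make the "output once" idea concrete, here is a minimal sketch of the
bookkeeping the index file implies.  It is not part of xml.c: the class
XmlFileIndex and its member functions are hypothetical names, the
ns001/class001 numbering is copied from the layout above, and names are
assumed to need no XML escaping.  The first request for a class allocates
a file and tells the caller to write it; later requests (for example,
::istream showing up again in another translation unit) return the same
file and say it is already written.

```cpp
#include <cstdio>
#include <map>
#include <string>
#include <utility>

// Hypothetical helper: assigns each namespace or class exactly one output
// file, so a class parsed in many translation units is written only once.
class XmlFileIndex {
public:
    // Returns {file name, true} the first time a class is seen (the caller
    // should write the file), and {same file name, false} afterwards.
    std::pair<std::string, bool> fileForClass(const std::string& qualifiedName) {
        return lookup(classes_, qualifiedName, "class", classCount_);
    }

    // Same contract as fileForClass, but for namespaces (ns001.xml, ...).
    std::pair<std::string, bool> fileForNamespace(const std::string& qualifiedName) {
        return lookup(namespaces_, qualifiedName, "ns", nsCount_);
    }

    // Renders index.xml entries in the format proposed above.  Entries come
    // out sorted by qualified name because std::map keeps its keys ordered.
    std::string renderIndex() const {
        std::string out;
        for (const auto& entry : namespaces_)
            out += "<NamespaceFile namespace=\"" + entry.first +
                   "\" file=\"" + entry.second + "\"/>\n";
        for (const auto& entry : classes_)
            out += "<ClassFile class=\"" + entry.first +
                   "\" file=\"" + entry.second + "\"/>\n";
        return out;
    }

private:
    static std::pair<std::string, bool> lookup(
            std::map<std::string, std::string>& table, const std::string& name,
            const char* prefix, int& counter) {
        auto it = table.find(name);
        if (it != table.end())
            return {it->second, false};  // already assigned: skip re-output
        char buf[32];
        std::snprintf(buf, sizeof buf, "%s%03d.xml", prefix, ++counter);
        table.emplace(name, std::string(buf));
        return {std::string(buf), true};
    }

    std::map<std::string, std::string> namespaces_;  // "::" -> "ns001.xml"
    std::map<std::string, std::string> classes_;     // "::istream" -> "class001.xml"
    int nsCount_ = 0;
    int classCount_ = 0;
};
```

Keying on the fully qualified name is what lets separate compiler runs
agree on a class's identity.  Across translation units, this table would
have to be persisted (for instance by rereading index.xml) before the
second run could skip anything.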

Also, at the bottom of the message (after the example), I ask for a bit
of help on a couple of details.  Read it if you are feeling kind today.

-Brad

-----------------------------------------------------------------------

The attached xml.c goes in the gcc/cp directory.  In semantics.c,
finish_translation_unit() calls do_xml_output() at its end; the
prototype for do_xml_output() is added to cp-tree.h.  I also added a
-fxml flag that controls whether this call is made.

Attached is example.xml, which was generated by creating an example.cxx
file with only the line

#include <iostream>

and running my customized gcc on it with this command line:

g++ -fxml -c example.cxx >example.xml

It isn't really a valid XML file yet, since there is no header and no
DTD, but it is useful as an example.
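Besides the missing header, one detail that matters for C++ in particular
is escaping: names such as basic_istream<char>, operator<< or operator&&
contain characters that XML reserves for markup, so they cannot be copied
into attribute values verbatim.  The sketch below assumes output is built
as std::string.  The helper names xmlEscape, nameElement and
withXmlDeclaration are hypothetical and do not come from xml.c.

```cpp
#include <string>

// Escapes text for XML character data or a double-quoted attribute value.
// Template and operator names (f<char>, operator&&) need this.
std::string xmlEscape(const std::string& text) {
    std::string out;
    out.reserve(text.size());
    for (char c : text) {
        switch (c) {
            case '&':  out += "&amp;";  break;
            case '<':  out += "&lt;";   break;
            case '>':  out += "&gt;";   break;
            case '"':  out += "&quot;"; break;
            case '\'': out += "&apos;"; break;
            default:   out += c;        break;
        }
    }
    return out;
}

// Builds an empty element with a single escaped name attribute,
// e.g. <ClassLink name="..."/> from the layout proposed earlier.
std::string nameElement(const std::string& tag, const std::string& name) {
    return "<" + tag + " name=\"" + xmlEscape(name) + "\"/>";
}

// Prepends the XML declaration (the "header") to a document body.
std::string withXmlDeclaration(const std::string& body) {
    return "<?xml version=\"1.0\"?>\n" + body;
}
```

For example, nameElement("ClassLink", "basic_istream<char>") yields
<ClassLink name="basic_istream&lt;char&gt;"/>, which any conforming parser
reads back as the original name.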

Also attached is example.sds.xml, which was generated from the same
example.cxx by the "cppp2csf" tool from SDS (http://sds.sourceforge.net/).
When I first compared the outputs, I was amazed at the similarity, since
I hadn't looked at SDS at all before writing xml.c.  Does anyone have
thoughts on the advantages and disadvantages of each format?  I haven't
implemented cross references (or function bodies), but these could be
added.

-----------------------------------------------------------------------

A bit of help, please:

I'm working with the 2.95.2 source distribution, but have successfully
run it with a 2.96 snapshot.  There seems to be a problem with some
function declarations in the 2.95.2 version.

With this program as input:

template <typename T>
void f(void) {}

template <>
void f<char>(void) {}

template void f<int>(void);

int main(int argc, char* argv[])
{ f<char>(); return 0; }

I just get an empty global namespace as output.  Even with the checks
that suppress compiler-generated declarations turned off, I don't get
main or f at all when looping over cp_namespace_decls, not even a
TEMPLATE_DECL.  Does anyone know why?  It works fine in 2.96.

Also, the DECL_SOURCE_LINE for the name of a class gives the line on
which the definition ends ("};").  The closest I can get to the real
line is the DECL_SOURCE_LINE of the self-typedef of the class name that
the compiler puts in the scope of the class (at the "{").  Is there a
better way to get the line on which the class name's identifier
appears?

Thanks.


xml_gcc.tar.gz

