This is the mail archive of the fortran@gcc.gnu.org mailing list for the GNU Fortran project.



Base new module format on XML - RFC and some questions


Hi list,

after some non-discouraging remarks last Sunday, I have spent some
brain cycles on thinking about the benefits and drawbacks of basing a
new module file format on xml.
Before delving deeper into module.c in its full 5 KLOC glory, I would
greatly appreciate some feedback and answers to a few questions. I
thought a kind of Q&A style could help structure it a bit:

0) Why change the module format anyway?
You guys know better than me, I guess.
Salvatore had some things to say about bloat with complex class
hierarchies (factor 100 module/source IIRC).

1) Why do I think XML is suitable?
a. I think it is easiest to map the content of a module to something
which has an inherent tree-like structure - xml does.
b. By choosing a suitable layout, one can generate backward-compatible
module files (I always confuse myself over which direction that is -
Mikael knows :-)
c. XML files are human-readable; one could argue that the same applies
to the current format, but ... well, just look at the beauties.
d. XML files can be edited, displayed, transformed, hacked apart,
you-name-it, by masses of software out there, for whatever cool
applications people may think of. Apache FOP can generate pdf (think
documentation) from xml documents, for instance. Just saying; I
personally am not into this fancy stuff.
e. XML (and the whole w3c ecosystem) are established standards and
won't go away (not saying that Lisp will [though I think it should]
;-)

2) What are the benefits of using XML?
a. Expressiveness. Think of a straightforward format like <subroutine
pure="true" only="qux" is_bind_c="Qux"
symbol="__mod_bar_qux">foo</subroutine> with children <arg pos="1"
type="GFC_REAL_4" intent="inout" optional="true">fooarg</arg>. Not
elaborated, just off the top of my head.
b. Canonical single-pass parsing by reading a whole file into a DOM
tree. I gathered that one of the weaknesses of the current format is
the need to repeatedly search around in it - right? Same for XML, but
done in memory, possibly using XPath or such.
c. Bloat reduction:
c1: Do not include USEd modules. Instead use the XInclude facility to
do that. Think "pointer to other module file" (OTOH this makes it
necessary to determine the top-level module(s) of all USEd modules and
only parse that one - bit tricky)
c2: Do not copy super-class definitions (Salvatore thinks this is one
reason for bloat). I think this can be done by one of XLink, XPointer
or XWhatever. There definitely is some standard ("w3c recommendation")
syntax to express internal links.
c3: Compress the output XML files. libxml2 can be made to gzip output
files (via zlib) by a simple switch. Saves 80-90% space in my experience.
d. XML defaults to UTF-8. Not sure if this might ever be relevant. Nice
though, if it is.
e. Cleaner and more concise module.c, because file handling is
out-sourced to external xml lib; parsing by DOM-parsing routine,
locating stuff by XPath or SAX (depends on what should be done), and
the like.
f. XML files should be much more robust than the current Lisp'ish
ones. In conjunction with a DTD or Schema, module files can be checked
for correctness, and errors can be spotted pretty easily. No more need
for "you get what you deserve" and md5 sig.
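
Points 2a, 2b and 2c1 can be sketched roughly as follows. This is only an
illustration under stated assumptions: it uses Python's standard
xml.etree.ElementTree and ElementInclude as stand-ins for libxml2, the
element and attribute names (module, use, name, ...) are made up for the
example, and the two "files" live in a dict instead of on disk.

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

# Hypothetical module files; the layout is an assumption, not a proposal.
FILES = {
    "foo.mod.xml": """<module name="foo">
  <subroutine name="qux" pure="true" symbol="__mod_bar_qux">
    <arg pos="1" name="fooarg" type="GFC_REAL_4" intent="inout" optional="true"/>
  </subroutine>
</module>""",
    # bar USEs foo: it only points at foo's file instead of copying it.
    "bar.mod.xml": """<module name="bar" xmlns:xi="http://www.w3.org/2001/XInclude">
  <use><xi:include href="foo.mod.xml"/></use>
  <function name="baz" symbol="__mod_bar_baz"/>
</module>""",
}


def load_module(filename):
    """Parse one module file into a tree and expand its XInclude links."""
    root = ET.fromstring(FILES[filename])

    def loader(href, parse, encoding=None):
        # Resolve the "pointer to other module file" from the dict.
        return ET.fromstring(FILES[href])

    ElementInclude.include(root, loader=loader)
    return root


def find_symbol(root, name):
    """Look up the first element with the given name, anywhere in the tree."""
    return root.find(f".//*[@name='{name}']")


if __name__ == "__main__":
    bar = load_module("bar.mod.xml")
    qux = find_symbol(bar, "qux")
    print(qux.tag, qux.attrib)
    for arg in qux.findall("arg"):
        print("  arg", arg.get("pos"), arg.get("type"), arg.get("intent"))
```

The whole file is read once, and all later lookups are in-memory path
queries. ElementTree only supports a subset of XPath; libxml2 offers the
full thing.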

3) What are the drawbacks?
a. Dependency on an XML library. libxml2 seems like a good choice,
because it is very mature, has regular releases and seems to be
portable to almost anything, probably including the NetBSD toaster.
Linux and I guess *nix, BSD and Mac users won't even notice this (at
least it shouldn't be an issue), but the mingw guys might not be
amused. Then again, I recently built libxml2 with mingw and it worked.
In a nutshell: Yet another dependency, but an extremely portable one
(if libxml2).
b. Uncompressed XML is somewhat more verbose than the current Lispish
format; libxml2 offers on-the-fly compression using zlib, which
probably reverses that relation.
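
To get a feeling for point 3b, here is a small sketch that serializes a
synthetic module tree and compares the raw and gzip-compressed sizes. It
uses Python's standard gzip module as a stand-in for libxml2's zlib
support; the element names and the generated content are assumptions, and
such repetitive synthetic data probably overstates the savings compared to
real module files.

```python
import gzip
import xml.etree.ElementTree as ET


def build_module(n_procs):
    """Build a hypothetical module tree with n_procs similar subroutines."""
    root = ET.Element("module", name="bar")
    for i in range(n_procs):
        sub = ET.SubElement(root, "subroutine", name=f"proc{i}", pure="true")
        ET.SubElement(sub, "arg", pos="1", name="x",
                      type="GFC_REAL_4", intent="inout")
    return root


def serialized_sizes(root):
    """Return (uncompressed, gzip-compressed) sizes in bytes."""
    raw = ET.tostring(root, encoding="utf-8")
    return len(raw), len(gzip.compress(raw))


def roundtrip(root):
    """Write the tree gzip-compressed and parse it back again."""
    packed = gzip.compress(ET.tostring(root, encoding="utf-8"))
    return ET.fromstring(gzip.decompress(packed))


if __name__ == "__main__":
    raw, packed = serialized_sizes(build_module(200))
    print(f"raw: {raw} bytes, gzip: {packed} bytes")
```

Whether the compressed XML really ends up smaller than the current format
would have to be measured on real module files.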


As for my questions:
- Does the module stuff have to do anything more than translate
between one of the ("big, hairy") gfc_* structures and a file?
- How is the module info accessed by the compiler? Is it polling for
symbols (like "What is this foo thing?"), or does it search around in
a tree of those gfc_* things? I could not grok this from browsing the
module.c source.
- Finally, while this is all nice and peachy, starting this only makes
sense if someone would be willing to carry on if I drop the ball due
to time constraints (what is this dissertation thing anyway?)
At the very least, I will try and record this and the answers on the
gfortran wiki, so my ideas and your comments on them won't get lost.

Phew, long email - enough for now!

Cheers,
Dennis

