This is the mail archive of the fortran@gcc.gnu.org mailing list for the GNU Fortran project.

Re: Base new module format on XML - RFC and some questions

From: Janne Blomqvist <blomqvist dot janne at gmail dot com>
To: salvatore dot filippone at uniroma2 dot it
Cc: fortran at gcc dot gnu dot org
Date: Wed, 29 Sep 2010 14:51:20 +0300
Subject: Re: Base new module format on XML - RFC and some questions
References: <1285753638.2843.121.camel@localhost.localdomain>

On Wed, Sep 29, 2010 at 12:47, Salvatore Filippone
<salvatore.filippone@uniroma2.it> wrote:
> Jerry DeLisle wrote:
>>c. Bloat reduction:
>     >c1: Do not include USEd modules. Instead use the XInclude facility to
>     >do that. Think "pointer to other module file" (OTOH this makes it
>     >necessary to determine the top-level module(s) of all USEd modules and
>     >only parse that one - bit tricky)
>>
>>I don't quite understand what you are saying above. We don't "include",
>>we "use". Or are you referring to a syntax that transparently opens and
>>accesses files "behind the scenes" so to speak? Or are you referring to
>>some sort of abstraction layer? I am just not familiar with what
>>XInclude is.
>
> I think Dennis is referring to a phenomenon that I hinted at during the
> IRC meeting.
> When you have a module A that USEs another one B, what happens in
> the .mod file?
> You have basically two alternatives:
> 1. Include a copy of B.mod inside A.mod

Well, you don't need to copy B.mod verbatim. For things like renamed
symbols and USE ... ONLY: ..., A.mod can contain only the symbols that
are actually imported, under their renamed names. FWIW, gfortran
already does this.

> 2. Just put a directive in A.mod pointing to B.mod
>
> Solution number 1 means that any code that USEs A only needs access to
> the A.mod file, solution 2 requires access to both .mod files; in this
> respect, 1 is better. Different compilers go for different strategies;
> if I remember correctly, I found that Intel (at least in one of the
> older versions) was for 2, XLF is 1, NAG is 2.
>
> When you have a complex inheritance hierarchy, things can compound
> spectacularly.

IMHO option #1 is still the better one here, in that we need to read
fewer files. Yes, there might be some disk space bloat, but a lot of
the bloat can probably be fixed by reducing the amount of redundant
information in the mod files, without having to break the current
design where we incorporate transitive dependencies.
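
A minimal sketch of the trade-off between the two options, in Python
rather than anything gfortran actually uses. The module names, symbols,
and helper functions are hypothetical; the point is only that option 1
reads one file per USE while option 2 has to follow the whole chain.

```python
# Toy model: each module lists the modules it USEs directly and its own symbols.
# All module names and symbols here are hypothetical.
DIRECT_USES = {"A": ["B"], "B": ["C"], "C": []}
OWN_SYMBOLS = {"A": ["a_sub"], "B": ["b_type"], "C": ["c_const"]}


def transitive_uses(name):
    """Return every module reachable from `name` through USE, in first-seen order."""
    seen = []
    queue = list(DIRECT_USES[name])
    while queue:
        mod = queue.pop(0)
        if mod not in seen:
            seen.append(mod)
            queue.extend(DIRECT_USES[mod])
    return seen


def files_read_embedding(name):
    """Option 1: the .mod file embeds its transitive dependencies, so one file suffices."""
    return [name + ".mod"]


def files_read_pointer(name):
    """Option 2: the .mod file only points at the modules it USEs, so all must be read."""
    return [m + ".mod" for m in [name] + transitive_uses(name)]


def mod_file_symbols_embedding(name):
    """Symbols stored in the .mod file under option 1: own symbols plus transitive ones."""
    symbols = list(OWN_SYMBOLS[name])
    for mod in transitive_uses(name):
        symbols.extend(OWN_SYMBOLS[mod])
    return symbols
```

In this toy model, code that USEs A reads one file under option 1 and
three under option 2, at the price of A.mod carrying the symbols of B
and C as well.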

For another example of this, consider the newish Google Go programming
language, where one of the features that are explicitly mentioned is
fast compilation. Part of this is done by including transitive
dependencies in the generated interfaces. Another clever thing Go does
is that the interfaces are placed in a special ELF section in the
object files, so there is no need for separate .mod files. Though I'm
not proposing this for gfortran; for better or worse Fortran
programmers are already used to dealing with .mod files.

See pages 7 & 8 in

http://assets.en.oreilly.com/1/event/45/Another%20Go%20at%20Language%20Design%20Presentation.pdf

> Consider the total size in bytes of the .mod files for the F95 version
> of my library:
>   Intel      GNU 46         Nag
> 4090544     6883556      447891
>
> And if you look at the F03 version:
>              GNU 46         Nag
>           120325072      590488
>
> Now, because of the copying feature, in the current GNU implementation I
> can get away with making accessible only a subset of those modules, for
> a total of about 10 MB instead of 120, but during the build phase the
> GNU compiler is still pushing around all those bytes.
> So, this is one of those cases where you have a trade-off between
> memory/disk and time. I would think that more feedback is required
> before deciding (the golden rule is make the common case fast, and the
> rare case correct); I don't particularly care whether the final format
> is XML or something else, after all I am not supposed to read it, the
> compiler is.
> Anyway, the Nag module files are plain text, they are quite readable,
> and still small.

I'm slightly negative about the XML thing myself. Yes, there are
issues with how gfortran handles module files, but changing from the
current s-expr like syntax to XML will not fix those, per se. And then
there's the thing about requiring yet another build dependency
(libxml2). While XML, and the software ecosystem around it, certainly
has value as a data interchange format for structured data, in the
case of gfortran the only thing that reads and writes module files is
the compiler, so I'm not convinced about the value of XML here.

That being said, Dennis's idea for compressing the module files might be
good. And the GCC tree already has a copy of zlib, so no issue there.
E.g. if the size of the module file is over 4 KB (the typical size of
a disk block), then compress it, otherwise leave it uncompressed.
Apart from saving space, it might even make compiling faster by
reducing the amount of disk I/O. Wrt module size, the PR is
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40958
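
The size-threshold idea can be sketched as follows, using Python's
standard zlib binding for illustration. The one-byte tags that mark a
file as raw or compressed are an assumption made for this sketch; the
text above does not say how a reader would tell the two cases apart.

```python
import zlib

BLOCK_SIZE = 4096  # the 4 KB threshold suggested in the text

# Hypothetical one-byte tags so a reader can tell the two cases apart.
TAG_RAW = b"R"
TAG_ZLIB = b"Z"


def encode_module(text: bytes) -> bytes:
    """Compress module contents only when they exceed one disk block."""
    if len(text) > BLOCK_SIZE:
        return TAG_ZLIB + zlib.compress(text)
    return TAG_RAW + text


def decode_module(blob: bytes) -> bytes:
    """Recover the original module contents from either representation."""
    tag, payload = blob[:1], blob[1:]
    if tag == TAG_ZLIB:
        return zlib.decompress(payload)
    if tag == TAG_RAW:
        return payload
    raise ValueError("unknown module file tag")
```

Small module files stay byte-for-byte readable apart from the tag,
while large and repetitive ones shrink before they hit the disk.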

Another big win for compilation speed would be to cache a parsed
module, rather than rereading and parsing the same mod file over and
over again. See
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=25708 . As for the lazy
loading thing, I'm not sure it has such a big effect when one includes
the transitive dependencies, but it might still be a minor
optimization (i.e. read and parse used modules (if not already cached)
when needing symbols rather than when encountering a USE statement).
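
Caching and lazy loading could look roughly like this. It is a sketch
under stated assumptions, not gfortran's design: ModuleCache, LazyUse,
and the read_and_parse callback are hypothetical names, and a parsed
module is modelled as a plain dict of symbols.

```python
class ModuleCache:
    """Parse each module at most once per compiler run."""

    def __init__(self, read_and_parse):
        # read_and_parse: hypothetical callable, module name -> dict of symbols
        self._read_and_parse = read_and_parse
        self._parsed = {}
        self.parse_count = 0

    def get(self, name):
        if name not in self._parsed:
            self._parsed[name] = self._read_and_parse(name)
            self.parse_count += 1
        return self._parsed[name]


class LazyUse:
    """Records a USE statement but defers loading until a symbol is needed."""

    def __init__(self, cache, module_name):
        self._cache = cache
        self._module_name = module_name

    def lookup(self, symbol):
        # The module is read and parsed here, on first lookup, not at USE time.
        return self._cache.get(self._module_name)[symbol]
```

Two program units that USE the same module share one parsed copy, and a
USE whose symbols are never referenced costs no file read at all.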

-- 
Janne Blomqvist

References:
- Re: Base new module format on XML - RFC and some questions
  - From: Salvatore Filippone
