This is the mail archive of the fortran@gcc.gnu.org mailing list for the GNU Fortran project.

Re: Parsing Fortran

From: Steven Bosscher <stevenb dot gcc at gmail dot com>
To: fortran at gcc dot gnu dot org
Cc: Joe Krahn <krahn at niehs dot nih dot gov>
Date: Sat, 15 Jul 2006 01:14:55 +0200
Subject: Re: Parsing Fortran
References: <44B80421.8090404@niehs.nih.gov>

On Friday 14 July 2006 22:52, Joe Krahn wrote:
> Parsing Fortran is difficult. A hand-made parser, like the one developed
> by Andy V. can be fairly ugly.

Can be.  But Fortran parsers are generally hand-written.  For
example, the Cray front end uses a hand-crafted parser.  The "problem"
with the parsers of g95 and gfortran is that they are basically
just template matchers.  On the other hand, this approach does appear to
work reasonably well for Fortran.  In fact, the very first Fortran
compilers were of this style.

> In thinking about the problem, I came up with a decent scheme for
> getting most of the work done with a YACC style parser. The general idea
> is to split the parsing into two layers. The first layer tokenizes data,
> the second layer does the actual interpreting. I have it partly 
> implemented in Perl using Parse::Yapp. My idea is to make a small and
> simple but accurate Fortran parser, that can be used as a base tool for
> Fortran source code processing, for re-formatting, 'lint'-ing, etc.
>
> With the GFortran team having dealt with parsing, how effective does
> this parsing scheme sound? 

This sounds like just the normal lexing/parsing separation that one
would usually use for a language that can be tokenized at the lexical
level, and described with a context-free grammar.

Fortran can't be tokenized without parser feedback.  Think for example
about "MODULE PROCEDURE".  This could be the declaration of a module
called "procedure", or it could be the keyword pair MODULE PROCEDURE,
or, in fixed form where blanks are insignificant, it could be the
identifier "MODULEPROCEDURE".  There is no way to tell without parser
feedback.  Likewise, in fixed form "DO I = 1,2" could start out as
"DOI=1" or "DO I=1", and there is no way to tell which until you see
the comma (without it, "DO I = 1.2" is an assignment to a variable
named DOI).  The typical algorithm to work around this mess is Sale's
algorithm.
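For the fixed-form DO case, the core idea behind Sale's algorithm can be sketched as a classification pass that runs before tokenizing: strip the blanks, then look for an "=" and a "," that are not nested inside parentheses.  The Python below is only a simplified illustration in that spirit, not the published algorithm; the names squeeze, top_level and classify are hypothetical, and it ignores character literals, comments, continuation lines and statements such as the logical IF that a real implementation has to handle.

```python
# Simplified statement classifier in the spirit of Sale's algorithm.
# Hypothetical sketch: ignores character literals, Hollerith constants,
# comments, continuation lines and logical IF statements.

def squeeze(stmt: str) -> str:
    """Fixed form: blanks are insignificant, so drop them all."""
    return "".join(stmt.split()).upper()


def top_level(s: str, ch: str) -> list[int]:
    """Positions of `ch` that are not nested inside parentheses."""
    depth = 0
    hits = []
    for i, c in enumerate(s):
        if c == "(":
            depth += 1
        elif c == ")":
            depth -= 1
        elif c == ch and depth == 0:
            hits.append(i)
    return hits


def classify(stmt: str) -> str:
    s = squeeze(stmt)
    eqs = top_level(s, "=")
    if not eqs:
        return "other"  # CALL, GOTO, ... are left to keyword matching
    # In this subset, a top-level comma after the '=' marks a DO loop header.
    comma_after_eq = any(i > eqs[0] for i in top_level(s, ","))
    if s.startswith("DO") and comma_after_eq:
        return "do"
    return "assignment"
```

With this, "DO I = 1,2" classifies as "do" while "DO I = 1.2" classifies as "assignment", and the comma inside "X(1,2) = 3" is ignored because it sits inside parentheses.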

Context-free LALR(1) grammars for Fortran also do not exist, so you
can never write a complete Fortran parser with YACC.  The only parser
generator I know of that can _almost_ handle Fortran is Eli, which has
a Fortran grammar developed by Bill Clodius.  But this grammar is also
not complete, and it too relies on tricks to couple the scanner and
parser so that feedback can be passed from one to the other.

Your tool can probably handle all sane Fortran input, but not every
ugly little detail that the standard allows.  But for most jobs, being
able to handle e.g. only free form source with sanely named identifiers
is good enough.
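For free form source, where blanks are significant, a conventional regular-expression tokenizer is usually enough, as long as keywords are lexed as ordinary names (Fortran keywords are not reserved) and the parser decides what each name means.  A minimal Python sketch, assuming a hypothetical subset with no comments, continuation lines, character literals or dotted operators such as .EQ.:

```python
import re

# Token classes for a simplified free-form subset (hypothetical).
TOKEN_RE = re.compile(r"""
    (?P<real>\d+\.\d*(?:[EeDd][+-]?\d+)?)   # e.g. 1.5, 2.0D0
  | (?P<int>\d+)
  | (?P<name>[A-Za-z][A-Za-z0-9_]*)          # keywords are lexed as names too
  | (?P<op>\*\*|==|/=|<=|>=|::|[-+*/=(),<>%:])
  | (?P<ws>[ \t]+)
""", re.VERBOSE)


def tokenize(line: str) -> list[tuple[str, str]]:
    """Split one free-form line into (kind, text) pairs, dropping blanks."""
    tokens = []
    pos = 0
    while pos < len(line):
        m = TOKEN_RE.match(line, pos)
        if not m:
            raise SyntaxError(f"unexpected {line[pos]!r} at column {pos + 1}")
        if m.lastgroup != "ws":
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens
```

Because "DO" comes out as a plain name, tokenize("DO I = 1, 10") and tokenize("if (x) = 1") (an assignment to an array named IF) both tokenize cleanly, and it is left to the parser to tell keywords from identifiers.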

A "good" setup for parsing Fortran would be:
* some kind of pre-lexing with Sale's algorithm
* tokenizer
* recursive-descent parser with backtracking.

But writing something that handles all the cases is not an easy task.
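The backtracking part of that setup can be illustrated with the "MODULE PROCEDURE" example from earlier.  The Python below is a toy, not how gfortran or any other compiler is written: it takes fixed-form input with blanks removed, supports three hypothetical statement forms, and uses a single in_interface flag as a stand-in for the parser feedback (context) that decides which reading to try first.  For brevity it matches directly on characters rather than on a token stream.  Each alternative that fails raises Backtrack, and the parser rewinds to the saved position and tries the next one.

```python
import re

NAME = re.compile(r"[A-Z][A-Z0-9_]*")


class Backtrack(Exception):
    """Raised when an alternative does not match; the caller rewinds."""


class Parser:
    def __init__(self, text: str) -> None:
        self.s = "".join(text.split()).upper()  # fixed form: drop all blanks
        self.pos = 0

    def literal(self, text: str) -> None:
        if not self.s.startswith(text, self.pos):
            raise Backtrack(text)
        self.pos += len(text)

    def name(self) -> str:
        m = NAME.match(self.s, self.pos)  # greedy: longest possible name
        if not m:
            raise Backtrack("name")
        self.pos = m.end()
        return m.group()

    def end(self) -> None:
        if self.pos != len(self.s):
            raise Backtrack("end of statement")

    def first_of(self, *alternatives):
        start = self.pos
        for alt in alternatives:
            try:
                return alt()
            except Backtrack:
                self.pos = start  # rewind, then try the next reading
        raise Backtrack("no alternative matched")

    # MODULE PROCEDURE name [, name]...
    def module_procedure_stmt(self):
        self.literal("MODULE")
        self.literal("PROCEDURE")
        names = [self.name()]
        while self.s.startswith(",", self.pos):
            self.pos += 1
            names.append(self.name())
        self.end()
        return ("module-procedure", names)

    # MODULE name
    def module_stmt(self):
        self.literal("MODULE")
        module = self.name()
        self.end()
        return ("module", module)

    # name = name  (expressions reduced to a single name for brevity)
    def assignment_stmt(self):
        target = self.name()
        self.literal("=")
        value = self.name()
        self.end()
        return ("assignment", target, value)

    def statement(self, in_interface: bool):
        # Parser feedback: the context decides which reading is tried first.
        if in_interface:
            order = (self.module_procedure_stmt, self.module_stmt)
        else:
            order = (self.module_stmt, self.module_procedure_stmt)
        return self.first_of(*order, self.assignment_stmt)
```

With in_interface=True, "MODULE PROCEDURE FOO" parses as a MODULE PROCEDURE statement; with in_interface=False the same characters become a module named PROCEDUREFOO, and "MODULE PROCEDURE = X" falls through both MODULE readings to an assignment to MODULEPROCEDURE.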

> Are others here interested in a Perl-based
> code manipulation tool?

Always nice to have.

Regards,
Steven

Follow-Ups:
- Re: Parsing Fortran
  - From: Joe Krahn

References:
- Parsing Fortran
  - From: Joe Krahn
