This is the mail archive of the
mailing list for the GCC project.
Re: C++ lexer (GCC 3.1.1) requires knowledge of other C dialects
- From: dewar at gnat dot com (Robert Dewar)
- To: gary at Intrepid dot Com, per at bothner dot com
- Cc: gcc at gcc dot gnu dot org
- Date: Thu, 1 Aug 2002 06:21:15 -0400 (EDT)
- Subject: Re: C++ lexer (GCC 3.1.1) requires knowledge of other C dialects
<<There are tricks to solve most of these problems, but a
recursive-descent parser is just so much easier to understand, to debug,
and to handle special quirks that I much prefer it. Yes, the actual
grammar isn't as cleanly separated, and the source is slightly more
verbose, but I much prefer it.
When we use the phrase "recursive descent" what we mean in practice is
not a straight LL(1) parser. If we wanted a straight LL(1) parser, then
we could use an appropriate parser generator (though why would we do this
when LRK PG's are clearly preferable).
The point about recursive descent is that it is hand written code that can
use any techniques it likes to parse the source. It can do a bit of backup
or extra look ahead if it wants, it can keep information on the side and
roam around the tree etc. This is particularly use for two purposes:
1. The grammar used does not have to be exactly LL(1) or LR(1) or anything
else. In GNAT, we take advantage of this to use almost exactly the grammar
in the Ada RM which is certainly neither. But using the exact grammar in
the RM means that semantic analysis is much much cleaner. Since for a
language like Ada or C++, semantic analysis is far more complex than
parsing, that's a real advantage.
2. Error detection and correction is potentially much more effective. Now it
is true that YACC and BISON are perfectly awful examples of decades old
junk parsing technology, and that automatic parsing technology has got
hugely better in the last 20 years (it's a mystery to me why people would
even think of using such antiquated tools). Nevertheless, even with the
best table generated techniques (reference for example the work of Gerry
Fisher in connection with the Ada/Ed project, early -> mid 80's) I still
think you cannot match a hand written parser for error detection and
recovery, and the GNAT parser is intended to demonstrate this.
One bottom line here is that parsers are a pretty simple part of the process
no matter how written. In the case of the Ada parser, it's a relatively small
and easy fraction of the total front end, and actually if you look at the
parser (par-xxx.ad? files), much more than half the code has to do with
error detection and recovery, and that's the complicated part.