This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]
Other format:	[Raw text]

RE: C++ lexer (GCC 3.1.1) requires knowledge of other C dialects

From: "Gary Funck" <gary at Intrepid dot Com>
To: <gcc at gcc dot gnu dot org>
Date: Fri, 2 Aug 2002 08:59:27 -0700
Subject: RE: C++ lexer (GCC 3.1.1) requires knowledge of other C dialects


> -----Original Message-----
> From: Robert Dewar [mailto:dewar@gnat.com]
> Sent: Friday, August 02, 2002 7:54 AM
> To: gary@Intrepid.Com; gcc@gcc.gnu.org
> Subject: RE: C++ lexer (GCC 3.1.1) requires knowledge of other C
> dialects
>
>
> >(though they are explicitly public domain with a BSD-like
> copyright notice).
>
> That's nonsense. Something is either in the public domain or it
> is copyrighted.
> It cannot be both. If it has a copyright notice, then it is not
> in the public
> domain. People on this list of all lists should not misuse the phrase!
>

Thanks for the clarification. Here is their rights statement:
 http://www.antlr.org/rights.html
It is public-domain (not copyrighted), they ask that users preserve the
header comments and cite the use of the tool in research reports and
documentation.



> I still see a problem from a technical point of view. First I think that
> LL parser generators are inherently inferior to good LR parser generators.
> If you want to use automatic parser generation, a modern LR parser
> generator is a much better choice (that by the way excludes YACC and
> BISON :-)
>

This is an age-old debate. I've used both LALR and LL parser generators, and
prefer LL (well, modified LL, as implemented in PCCTS) partly because the
top-down nature of the parse is more intuitive, and actions added internally
to RHS rules do not run the risk of introducing new parse conflicts.
Generally, the resulting parsers are easier to understand and require fewer
tricks, though the need to perform left-factoring can make the resulting LL
grammar not as clear and intuitive as it might've been in its original
(non-factored) form.

Terence Parr's thesis discusses some of the pros/cons of using LL vs. LALR
and motivates the need to use k>1 look ahead techniques for LL(k) parsers:
http://www.antlr.org/papers/parr.phd.thesis.ps.gz

I am interested in knowing, which LALR parser generators are superior to
bison/yacc?


> The allegation that a given parser generator cannot parse a given language
> is always incorrect. Why? Because in practice what you do is to adjust the
> grammar to be suitable for the generator. Such adjustment is
> always possible.

I agree. I was going to say, relative to the ANTLR FAQ that said it was
impossible to use ANTLR to build a C++ parser, that this statement was
likely false. I think the author was saying that some unpleasant damage
would have to be done to the C++ BNF reference grammar description to build
a working C++ parser, to get it working with ANTLR.

There's some interesting work which improves the technology used to build
recursive descent parsers:
http://www.cs.rhul.ac.uk/research/languages/projects/grd.shtml

"GRD is a new style of parsing algorithm that allows you to protoype new
languages without regard to the underlying parsing technology. Once a
working prototype is obtained, GRD based parser generators can be used to
refine the language specification to improve parser performance.

The heart of GRD is a new backtracking recursive descent parsing algorithm
that can handle grammars requiring arbitrary amounts of lookahead and which
even behaves correctly when presented with ambiguous grammars. A faster
version uses a new property called follow determinism to decide when to
truncate the back tracking search."


>
> I still think that writing the parser by hand makes much better sense
> for the reasons I stated previously.
>

In my opinion, a parser built using a parser generator is generally easier
to understand, maintain, and extend than a hand built parser. I like the
fact that a parser generator will tell me if I've introduced ambiguities,
and point out where they are. I also generally like automatically generated
error recovery, as long as the parser generator gives me the tools to
augment it.

References:
- RE: C++ lexer (GCC 3.1.1) requires knowledge of other C dialects
  - From: Robert Dewar

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]