This is the mail archive of the mailing list for the GCC project.


Re: C++, libstdc++-v3 and, well, error messages

On 18 Nov 2000, Gabriel Dos_Reis wrote:
> CodeSourcery announced that it will turn the C++ parser into a recursive
> descent one -- which undoubtedly will allow better error diagnostics
> and error recovery.

Just a quick note here. It is far from obvious that recursive descent
parsers allow better error diagnostics and error recovery than table-driven
LR parsers. Both approaches can produce excellent results. I
believe that recursive descent parsers do have a slight edge in dealing
with very specialized situations, and the recursive descent parser in
GNAT was actually written to demonstrate that. In the case of Ada,
GNAT was competing here with the excellent diagnostics generated
by our previous Ada 83 compiler (Ada/Ed), which used a table-generated
parser with advanced error recovery and diagnostic techniques developed
by Fisher at NYU and Charles at IBM.

That being said, it is definitely true that the error messages currently
generated by GNU C and g++ are simply appallingly bad compared to the
standard of what can be achieved by *either* technique. This is because
the table-generated parser technology used in these compilers is decades
out of date, and really horrible.

It's an interesting question which way to go. On the one hand, I do think
that hand-written parsers (it is really the writing by hand, as opposed
to table generation, that is the issue; there is no inherent advantage
of LL(1) over LR methods here, quite the opposite in fact) can handle some
complex situations that table-generated methods have difficulty with
(e.g. distinguishing a semicolon from IS in Ada).

But there is a real cost in maintainability and flexibility. There is
no question that a hand-written parser, *particularly* one with very
sophisticated error detection and recovery, is going to be far harder
to maintain than a table-driven one, where basically you simply maintain
the grammar.
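
To illustrate the kind of special case meant by the semicolon/IS example
above, here is a minimal sketch of a hand-written recursive descent parser
that recognizes a probable ';'-for-'is' mistake and keeps going. It is
hypothetical Python for a toy Ada-like subset (the grammar, token
spelling, and messages are all assumptions), not GNAT's code. In this toy
grammar a ';' after the procedure name is only legal at end of input, so
the check is easy; in full Ada it is much harder, because a subprogram
declaration can legitimately be followed by further declarations or by
the enclosing unit's 'begin'.

```python
# Hypothetical recursive-descent parser for a toy Ada-like subset (not GNAT code):
#   unit ::= "procedure" NAME ( ";" | "is" {decl} "begin" {stmt} "end" [NAME] ";" )
#   decl ::= NAME ":" NAME ";"        stmt ::= "null" ";" | NAME ";"
KEYWORDS = {"procedure", "is", "begin", "end", "null"}


class Parser:
    def __init__(self, tokens):
        self.toks = tokens + ["<eof>"]
        self.i = 0
        self.errors = []

    def peek(self, k=0):
        # Look k tokens ahead, sticking at <eof> rather than running off the end.
        return self.toks[min(self.i + k, len(self.toks) - 1)]

    def is_name(self, tok):
        return tok not in KEYWORDS and tok not in {";", ":", "<eof>"}

    def expect(self, tok):
        if self.peek() == tok:
            self.i += 1
        else:
            self.errors.append(f"token {self.i}: expected {tok!r}, found {self.peek()!r}")

    def unit(self):
        self.expect("procedure")
        name = self.peek()
        self.i += 1
        if self.peek() == ";":
            after = self.peek(1)
            # Special-case diagnostic: if what follows ';' can only start a body
            # (a declaration "X :" or 'begin'), the user almost certainly meant
            # 'is'. Report that, then parse the body as if 'is' had been written.
            if after == "begin" or (self.is_name(after) and self.peek(2) == ":"):
                self.errors.append(f"token {self.i}: ';' should be 'is'")
                self.i += 1
                self.body(name)
            else:
                self.expect(";")  # a plain subprogram declaration
        else:
            self.expect("is")
            self.body(name)

    def body(self, name):
        while self.is_name(self.peek()) and self.peek(1) == ":":
            self.i += 3  # NAME ":" TYPE (the type name is not validated here)
            self.expect(";")
        self.expect("begin")
        while self.peek() == "null" or self.is_name(self.peek()):
            self.i += 1
            self.expect(";")
        self.expect("end")
        if self.peek() == name:
            self.i += 1
        self.expect(";")


def check(source):
    p = Parser(source.split())
    p.unit()
    if p.peek() != "<eof>":
        p.errors.append(f"token {p.i}: expected end of input")
    return p.errors
```

For example, `check("procedure P ; X : Integer ; begin null ; end P ;")`
yields the single message `token 2: ';' should be 'is'` and then parses the
rest cleanly. A parser driven only by a grammar table would typically stop
at `X` with a generic unexpected-token message instead.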

As I say, my bottom line is that the gain is worth this increase in
difficulty of modification and maintenance, since

  a) the parser is a comparatively simple part of any compiler

  b) the language is not changing rapidly in any case

But the "undoubtedly" in the quoted para above, while most certainly
true given the horrible starting point, should not lightly be taken
as meaning that everyone agrees that error detection and recovery
can better be done by hand.

In particular, one very important technique (of course NOT used in current
gcc front ends) is to use trial parse-ahead to determine the best choice
of token deletion/insertion/replacement. This can be a very powerful method,
and it is really ONLY practical in a table-driven environment.
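
A minimal sketch of trial parse-ahead, in Python. For brevity it uses a
small hand-built LL(1) predictive table rather than LR tables, and the
grammar, token names, the `LOOKAHEAD` value, and the scoring rule
(furthest trial progress wins, ties go to the first candidate tried) are
all assumptions for illustration; practical schemes also weigh each kind
of edit by a cost. What makes this cheap is that the entire parser state
is an explicit stack, so each candidate repair can be tried on a copy and
thrown away, which is much harder when the state lives in the call stack
of a recursive descent parser.

```python
# Trial parse-ahead repair over a table-driven LL(1) parser (hypothetical grammar):
#   S -> stmt S | e     stmt -> id = E ;     E -> T E'
#   E' -> + T E' | e    T -> id | num | ( E )
TERMINALS = ["id", "num", "=", ";", "+", "(", ")"]
NONTERMINALS = {"S", "stmt", "E", "E'", "T"}
TABLE = {  # (nonterminal, lookahead) -> right-hand side
    ("S", "id"): ["stmt", "S"], ("S", "$"): [],
    ("stmt", "id"): ["id", "=", "E", ";"],
    ("E", "id"): ["T", "E'"], ("E", "num"): ["T", "E'"], ("E", "("): ["T", "E'"],
    ("E'", "+"): ["+", "T", "E'"], ("E'", ";"): [], ("E'", ")"): [],
    ("T", "id"): ["id"], ("T", "num"): ["num"], ("T", "("): ["(", "E", ")"],
}
LOOKAHEAD = 3  # tokens a candidate repair must survive (an assumed value)


def run(stack, tokens, pos, limit=None):
    """Drive the parser from (stack, pos). Returns (status, pos, stack), where
    status is 'accept', 'error', or 'limit' (matched `limit` tokens)."""
    stack = list(stack)  # work on a copy so trial parses never disturb the caller
    consumed = 0
    while True:
        if limit is not None and consumed >= limit:
            return "limit", pos, stack
        tok = tokens[pos] if pos < len(tokens) else "$"
        if not stack:
            return ("accept" if tok == "$" else "error"), pos, stack
        top = stack.pop()
        if top in NONTERMINALS:
            rhs = TABLE.get((top, tok))
            if rhs is None:
                stack.append(top)  # leave the state resumable at the error point
                return "error", pos, stack
            stack.extend(reversed(rhs))
        elif top == tok:
            pos += 1
            consumed += 1
        else:
            stack.append(top)
            return "error", pos, stack


def candidates(tokens, pos):
    """Every single-token deletion, insertion, and replacement at `pos`."""
    yield ("delete", None), tokens[:pos] + tokens[pos + 1:]
    for t in TERMINALS:
        yield ("insert", t), tokens[:pos] + [t] + tokens[pos:]
        yield ("replace", t), tokens[:pos] + [t] + tokens[pos + 1:]


def trial_score(stack, tokens, pos):
    """How far a trial parse gets past the error point; acceptance beats all."""
    status, new_pos, _ = run(stack, tokens, pos, limit=LOOKAHEAD)
    return float("inf") if status == "accept" else new_pos - pos


def repair(tokens, max_fixes=10):
    """Parse, and at each error apply the edit whose trial parse gets furthest."""
    stack, pos, fixes = ["S"], 0, []
    while len(fixes) <= max_fixes:
        status, pos, stack = run(stack, tokens, pos)
        if status == "accept":
            return tokens, fixes
        (kind, tok), tokens = max(candidates(tokens, pos),
                                  key=lambda c: trial_score(stack, c[1], pos))
        fixes.append((kind, tok, pos))
    raise SyntaxError("too many errors; giving up")
```

For `x = a + ; y = b ;`, i.e. `repair(["id", "=", "id", "+", ";", "id",
"=", "id", ";"])`, the chosen fix is `("insert", "id", 4)`: deleting the
`;` or inserting `(` fails within a token or so, while inserting an
operand lets the trial parse run on through the next statement.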

By the way, in accordance with the normal terminology in the compiler world,
I use "parser" to mean solely the phase of the compiler that generates a
syntax tree, and I exclude semantic analysis. (I mention this because
messages on Fortran 90 seem to use the term parser to include both
aspects. That can be a bit confusing, since for modern complex languages,
parsing in the sense I use the term still remains a basically simple
process, although error detection can complicate it considerably: well
over half the code in the GNAT parser is for error handling. Static
semantic analysis, by contrast, can be a very complex process.)
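
To make the distinction concrete, a minimal Python sketch of the two
phases for a hypothetical toy language (the grammar, tuple node shapes,
and the declare-before-use rule are assumptions for illustration, not
taken from GNAT or g++). The input `var x ; x = y ;` is syntactically well
formed, so the parser accepts it; only the semantic pass notices that `y`
was never declared.

```python
# Phase 1, "parsing" in the narrow sense: build a syntax tree and nothing else.
# Hypothetical grammar: program ::= { "var" NAME ";" | NAME "=" NAME ";" }
def parse(tokens):
    def tok(k):
        return tokens[k] if k < len(tokens) else "<eof>"

    tree, i = [], 0
    while i < len(tokens):
        if tok(i) == "var" and tok(i + 2) == ";":
            tree.append(("decl", tok(i + 1)))
            i += 3
        elif tok(i + 1) == "=" and tok(i + 3) == ";":
            tree.append(("assign", tok(i), tok(i + 2)))
            i += 4
        else:
            raise SyntaxError(f"syntax error at token {i}")
    return tree


# Phase 2, static semantic analysis: walk the tree and apply rules the grammar
# does not express, here "every name must be declared before it is used".
def check_semantics(tree):
    declared, errors = set(), []
    for node in tree:
        if node[0] == "decl":
            declared.add(node[1])
        else:
            errors.extend(f"{n!r} is not declared" for n in node[1:] if n not in declared)
    return errors
```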

Robert Dewar
