This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



new parser: error recovery needs work


The new parser doesn't produce useful diagnostics in the presence of
common errors.  Since the old parser did better, this is a regression.
Some examples:

1) mis-spelling a type name.

unsinged i;

with the old parser produces

pe.cpp:1: error: 'unsinged' is used as a type, but is not defined as a type.

with the new parser we get

pe.cpp:1: error: expected constructor, destructor, or type conversion
pe.cpp:1: error: expected `,' or `;'

2) forgetting to provide a template argument list

#include <vector>
std::vector foo;

with the old parser produces

v1.cpp:3: invalid use of template-name 'std::vector' in a declarator
v1.cpp:3: syntax error before `;' token

and with the new parser produces

v1.cpp:3: error: expected constructor, destructor, or type conversion
v1.cpp:3: error: expected `,' or `;'

3) forgetting the std:: namespace (common for gcc-2.95.x code)

#include <vector>
vector<int> foo;

with the new parser:

v1.cpp:3: error: expected constructor, destructor, or type conversion
v1.cpp:3: error: expected `,' or `;'

with the old parser:

v2.cpp:3: error: 'vector' is used as a type, but is not defined as a type.
(not quite correct, but at least it's a good hint)

The old parser's treatment of #1 and #3 is due to an error recovery rule
that I came up with after Gerald pointed out that the 2.x -> 3.x
conversion would be a disaster without it (the only thing the 3.0-pre
compiler would say to typical 2.95.x code was lots of "syntax error
before `;' token" messages).  The hack was pretty simple: it attempts to
match

IDENTIFIER optional_template_arg_list IDENTIFIER optional_arg_list ';'

which is a sequence that cannot occur in legal C++.  This was good enough
to catch a number of common mistakes.
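
A minimal sketch of how such a rule could be checked over a token stream
follows.  It is not the old parser's actual grammar rule: the Tok and Token
types and both function names are hypothetical, and it assumes the lexer
reports known type names as something other than Tok::Identifier, so a match
can only begin with a name that is not a declared type.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified token kinds; not GCC's real token representation.
// Assumption: known type names are lexed as Tok::Other, never Tok::Identifier.
enum class Tok { Identifier, Less, Greater, LParen, RParen, Semicolon, Other };

struct Token {
    Tok kind;
    std::string text;
};

// If a group begins at pos (toks[pos] is `open`), advance pos past its
// matching `close`.  Returns false only when a group starts but is never
// closed before the next ';' or the end of input.
static bool skip_optional_group(const std::vector<Token>& toks,
                                std::size_t& pos, Tok open, Tok close) {
    assert(pos <= toks.size());
    if (pos == toks.size() || toks[pos].kind != open)
        return true;  // the group is optional and absent
    int depth = 0;
    for (std::size_t i = pos; i < toks.size(); ++i) {
        if (toks[i].kind == open) {
            ++depth;
        } else if (toks[i].kind == close) {
            if (--depth == 0) {
                pos = i + 1;
                return true;
            }
        } else if (toks[i].kind == Tok::Semicolon) {
            return false;  // statement ended inside the group
        }
    }
    return false;
}

// Matches IDENTIFIER [ <...> ] IDENTIFIER [ (...) ] ';' at the start of toks.
// Returns the first identifier (the suspected type name), or "" if no match.
std::string match_undeclared_type(const std::vector<Token>& toks) {
    std::size_t pos = 0;
    if (pos == toks.size() || toks[pos].kind != Tok::Identifier) return "";
    const std::string name = toks[pos++].text;
    if (!skip_optional_group(toks, pos, Tok::Less, Tok::Greater)) return "";
    if (pos == toks.size() || toks[pos].kind != Tok::Identifier) return "";
    ++pos;
    if (!skip_optional_group(toks, pos, Tok::LParen, Tok::RParen)) return "";
    if (pos == toks.size() || toks[pos].kind != Tok::Semicolon) return "";
    return name;
}
```

Fed the tokens of "unsinged i;" or "vector<int> foo;", the matcher hands back
the first identifier, which is the name to report as "used as a type, but is
not defined as a type".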

The new parser's structure should make it possible to do better.  The key
is to attempt a reasonable guess as to what might have been intended;
in some cases, there is really only one possibility.

For example, if we have two unknown identifiers in a row, the only way the
code could have been meant to be legal is if the first names a type and
declares the second; either the type name is mis-spelled or its
declaration was forgotten.

If we have a template identifier followed by another identifier, then
the likelihood is that the template argument list was forgotten.
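
The two cases just described could be told apart roughly as sketched here.
This is only an illustration: NameKind and guess_intent are made-up names,
the classification is assumed to come from ordinary name lookup, and the
wording of the template message is invented (the first message reuses the old
parser's wording).

```cpp
#include <cassert>
#include <string>

// Hypothetical result of looking up an identifier; not a GCC type.
enum class NameKind { Unknown, TemplateName, Other };

// Given two adjacent identifiers, where the second one is undeclared,
// return a guess at what the user meant, or "" if no guess applies.
std::string guess_intent(const std::string& first,
                         NameKind first_kind, NameKind second_kind) {
    assert(!first.empty());
    if (second_kind != NameKind::Unknown)
        return "";  // only handle "something followed by a new name"
    if (first_kind == NameKind::Unknown)
        // Two unknown names in a row: the first must have been meant as a type.
        return "'" + first + "' is used as a type, but is not defined as a type";
    if (first_kind == NameKind::TemplateName)
        // Template name directly followed by a declarator name.
        return "missing template argument list after '" + first + "'";
    return "";
}
```

With this, "unsinged i;" would take the first branch and "std::vector foo;"
the second.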

If a symbol is unknown but there is a matching symbol in the std::
namespace, it's possible to print a "did you mean std::foo?" message.
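
A sketch of that lookup, under the assumption that the compiler can ask
whether a name is declared in namespace std.  Here a small hard-coded set
stands in for that query; std_names and suggest_std_qualification are
hypothetical names, not existing GCC functions.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical stand-in for "is this name declared in namespace std?".
// A real compiler would consult its symbol table instead of a fixed list.
static const std::set<std::string>& std_names() {
    static const std::set<std::string> names = {
        "vector", "string", "map", "cout", "endl"};
    return names;
}

// Returns a hint for an undeclared name if std:: has a matching symbol,
// or "" when there is nothing to suggest.
std::string suggest_std_qualification(const std::string& unknown) {
    assert(!unknown.empty());
    if (std_names().count(unknown) != 0)
        return "did you mean 'std::" + unknown + "'?";
    return "";
}
```

For example 3 above, looking up "vector" would yield the hint, while a plain
mis-spelling such as "unsinged" would yield nothing and fall through to the
other guesses.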

The new parser might want to use a strategy that goes something like this:
make a guess as to what was intended.  If a complete statement can be
parsed according to that guess, then keep it.  Otherwise try a second
guess, if there is one available; failing that, skip to some
synchronizing token.
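
That strategy can be sketched as a small driver loop.  Everything here is
illustrative: Guess, Recovery and recover are hypothetical names, tokens are
plain strings, ';' is assumed to be the synchronizing token, and each guess
is a callback that tries to parse one complete statement from a tentative
position.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// One candidate interpretation of the erroneous input (hypothetical type).
struct Guess {
    std::string diagnostic;
    // Tries to parse one complete statement starting at pos.  On success it
    // advances pos past the statement and returns true.
    std::function<bool(const std::vector<std::string>&, std::size_t&)> parse;
};

struct Recovery {
    std::string diagnostic;  // message to report
    std::size_t next;        // where normal parsing resumes
};

// Try each guess in order and keep the first under which a whole statement
// parses.  If none works, skip past the next ';' (assumed sync token).
Recovery recover(const std::vector<std::string>& toks, std::size_t pos,
                 const std::vector<Guess>& guesses) {
    assert(pos <= toks.size());
    for (const Guess& g : guesses) {
        std::size_t tentative = pos;  // discarded if the guess fails
        if (g.parse(toks, tentative))
            return {g.diagnostic, tentative};
    }
    while (pos < toks.size() && toks[pos] != ";") ++pos;
    if (pos < toks.size()) ++pos;  // consume the ';' itself
    return {"syntax error", pos};
}
```

Because each guess works on its own copy of the position, a failed guess
leaves no trace, which is what makes trying a second guess cheap.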


