Suggestion for improving C++ parser memory usage

Roger Sayle roger@eyesopen.com
Thu Dec 30 01:47:00 GMT 2004


I have a suggestion/technical question to ask of the C++ parser gurus.
The idea came from investigating some of the memory usage regressions
inherent in the new recursive descent parser, particularly PRs 10349
and 12454.

The issue is that in deep {if-then-else}* nests, each successive clause
is parsed within the context of the preceding clause.  This leads to
a parse stack and memory usage that are linear in the number of statements.
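
As a rough illustration of the nesting (a toy model only, not GCC's parser),
the sketch below parses the hypothetical grammar
stmt := ';' | 'if' stmt [ 'else' stmt ], with conditions omitted, and records
the deepest recursion reached.  The token kinds, struct and function names
are all invented for this example.

```c
#include <stddef.h>

/* Hypothetical token kinds for a toy grammar (not GCC's):
   stmt := ';' | 'if' stmt [ 'else' stmt ]   (conditions omitted).  */
enum tok { T_IF, T_ELSE, T_SEMI, T_EOF };

struct toy_parser {
  const enum tok *toks;   /* token stream, terminated by T_EOF */
  size_t pos;             /* index of the next unread token */
  int depth;              /* current recursion depth */
  int max_depth;          /* deepest recursion seen so far */
};

static void parse_stmt(struct toy_parser *p)
{
  if (++p->depth > p->max_depth)
    p->max_depth = p->depth;

  if (p->toks[p->pos] == T_IF) {
    p->pos++;
    parse_stmt(p);                    /* then-clause */
    if (p->toks[p->pos] == T_ELSE) {
      p->pos++;
      parse_stmt(p);                  /* else-clause, parsed while still
                                         inside the enclosing 'if' */
    }
  } else if (p->toks[p->pos] == T_SEMI) {
    p->pos++;
  }

  p->depth--;
}

/* Returns the maximum recursion depth needed to parse one statement.  */
int max_parse_depth(const enum tok *toks)
{
  struct toy_parser p = { toks, 0, 0, 0 };
  parse_stmt(&p);
  return p.max_depth;
}
```

In this toy, every additional "else if" link adds one level of recursion,
so the depth grows linearly with the length of the chain.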

The suggestion/question is whether it is possible to avoid the allocation
and deallocation of "scopes" in cp_parser_implicitly_scoped_statement by
using a single token lookahead to confirm that the following statement
doesn't/can't need an enclosing scope?  The calls to begin_compound_stmt
and finish_compound_stmt account for much of the memory usage in these
pathological cases, but avoiding these calls should also help most C++
code.


The first observation is that cp_parser_implicitly_scoped_statement
can be tweaked such that the return type is void rather than tree,
as its result is never used.  Next a call to cp_lexer_peek_token can be
used to enter a switch statement that determines if the following keyword
or token can't possibly need its own scope.  For example, ";" certainly
can't, and hopefully neither can RID_IF and many of the remaining C++
constructs.

From my limited understanding of C++ parsing, statements such as
RID_IF, RID_WHILE, RID_DO and RID_FOR don't/can't use an immediately
enclosing scope, as they open new scopes for their conditions and
bodies.
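
A minimal sketch of the lookahead test, assuming the reasoning above holds.
The enum and function here are hypothetical stand-ins; they only echo the
RID_* names and are not GCC's actual declarations or the body of
cp_parser_implicitly_scoped_statement.

```c
#include <stdbool.h>

/* Hypothetical classification of the single lookahead token.  The names
   merely mirror the keywords discussed above.  */
enum toy_token {
  TOY_SEMICOLON,   /* ";" : an empty statement */
  TOY_RID_IF,
  TOY_RID_WHILE,
  TOY_RID_DO,
  TOY_RID_FOR,
  TOY_OTHER        /* anything else, e.g. the start of a declaration */
};

/* Returns true if the statement starting at NEXT should be wrapped in an
   implicit scope.  Assumption: only the tokens listed explicitly are known
   to be safe; everything else conservatively keeps the scope.  */
bool toy_needs_implicit_scope(enum toy_token next)
{
  switch (next) {
  case TOY_SEMICOLON:
  case TOY_RID_IF:
  case TOY_RID_WHILE:
  case TOY_RID_DO:
  case TOY_RID_FOR:
    return false;   /* skip the begin/finish scope pair */
  default:
    return true;    /* unknown construct: keep today's behaviour */
  }
}
```

The default case keeps the existing behaviour, so a keyword missing from the
list costs only a lost optimization rather than a wrong parse.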


Is such an optimization possible/permissible?

Roger
--


