This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



cpplib: redoing the lexer


This is a rough outline, in stages, of changes I'm planning to make to
the lexer.  Comments are welcome.

1) Return lexing to a forward-looking process rather than
backward-looking.  This will still be single-pass.

2) Step 1) enables a move to token-at-a-time rather than line-at-a-time
lexing.

3) Step 2) enables better tracking of lexer state within the lexer
itself, e.g. by processing directives as they are lexed.  With this, we
should be able to take various information out of the header files,
optimize false conditional skipping, move the check for use of
poisoned identifiers to one place, clean up handling of <...> in
#include directives, etc.

4) Rework memory management, to distinguish better between permanent
allocations and temporary allocations, probably using obstacks.  The
lexer state tracking makes this easier, e.g. the expansion of a macro
when in #define should go into permanent storage.  This should save us
having to re-allocate memory in handlers like do_define.

5) With improved memory management, move towards N-token lookahead
and/or lookback, where N is a fixed constant.  This is useful within
cpplib itself when looking for '(' during testing for macros, but
should be more important to front-end parsers when cpplib is
integrated, particularly C++, I suspect.

6) Move more functionality into the preprocessor stage, e.g. ISO
string concatenation, and maybe interpretation of integers and/or
floats.

7) (Longer term) I think the above changes should enable us to
describe token streams, e.g. for precompiled headers, in a much more
compact format than the current 16-bytes-per-token + string /
identifier overhead.  I think it should be possible to reach less than
4 bytes per token, with many tokens just being a single "type" byte,
with a special whitespace token.
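
To make 7) a little more concrete, here is a rough sketch of what a
variable-length token encoding could look like.  Everything in it is
an assumption made for illustration: the token types, the struct
token layout, the 2-byte table index and the function names are
hypothetical and are not cpplib's actual data structures.  Punctuators
and the whitespace token take one "type" byte; identifiers and numbers
take the type byte plus an index into a separately stored spelling
table.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical token types; NOT cpplib's real enumeration.  */
enum tok_type {
    TOK_SPACE,   /* the special whitespace token: type byte only */
    TOK_LPAREN,  /* punctuators: type byte only */
    TOK_RPAREN,
    TOK_PLUS,
    TOK_SEMI,
    TOK_IDENT,   /* type byte + 2-byte index into an identifier table */
    TOK_NUMBER   /* type byte + 2-byte index into a spelling table */
};

/* Hypothetical in-memory token, as a lexer might hand it over.  */
struct token {
    enum tok_type type;
    uint16_t index;      /* meaningful only for TOK_IDENT / TOK_NUMBER */
};

static int has_payload(enum tok_type t)
{
    return t == TOK_IDENT || t == TOK_NUMBER;
}

/* Write one token to buf.  Returns bytes written, or 0 if it does
   not fit in the avail bytes remaining.  */
size_t encode_token(const struct token *tok, unsigned char *buf, size_t avail)
{
    size_t need;

    assert(tok != NULL && buf != NULL);
    need = has_payload(tok->type) ? 3 : 1;
    if (avail < need)
        return 0;
    buf[0] = (unsigned char) tok->type;
    if (need == 3) {
        buf[1] = (unsigned char) (tok->index & 0xff);  /* low byte */
        buf[2] = (unsigned char) (tok->index >> 8);    /* high byte */
    }
    return need;
}

/* Read one token from buf.  Returns bytes consumed, or 0 if the
   buffer ends before the token is complete.  */
size_t decode_token(const unsigned char *buf, size_t avail, struct token *tok)
{
    if (avail < 1)
        return 0;
    tok->type = (enum tok_type) buf[0];
    tok->index = 0;
    if (!has_payload(tok->type))
        return 1;
    if (avail < 3)
        return 0;
    tok->index = (uint16_t) (buf[1] | (buf[2] << 8));
    return 3;
}

/* Encode n tokens back to back.  Returns total bytes written, or 0
   if the buffer is too small (also 0 for an empty stream).  */
size_t encode_stream(const struct token *toks, size_t n,
                     unsigned char *buf, size_t avail)
{
    size_t used = 0, i;

    for (i = 0; i < n; i++) {
        size_t w = encode_token(&toks[i], buf + used, avail - used);
        if (w == 0)
            return 0;
        used += w;
    }
    return used;
}
```

Under these assumptions the nine tokens of "f(x + 1);" (with the two
spaces as whitespace tokens) encode to 15 bytes, not counting the
spelling table, since six of them are a bare type byte.  A real format
would also have to decide how spellings are stored and shared, which
this sketch leaves out.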

Neil.
