Re: Very Fast: Directly Coded Lexical Analyzer
- From: "Frank Schaefer" <frank_r_schaefer at gmx dot net>
- To: "J.C. Pizarro" <jcpiza at gmail dot com>
- Cc: gcc at gcc dot gnu dot org
- Date: Fri, 01 Jun 2007 09:28:28 +0200
- Subject: Re: Very Fast: Directly Coded Lexical Analyzer
- References: <998d0e4a0705311334w7eba684cwcfe1ed70c4b63add@mail.gmail.com>
> Like e.g. the generated code
> IF match-char1 THEN ..
> ELSIF match-char2 THEN ..
> ELSIF match-char3 THEN ..
> ..
> END
> ?
Similar. It identifies ranges of character code points and
brackets them via binary bracketing.
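For illustration, here is a toy sketch of that idea in plain C (an
assumed example, not quex's actual output): the comparisons nest
around the boundaries of the sorted ranges, so every test halves the
set of remaining candidates instead of discarding just one, as the
linear ELSIF chain does:

    #include <stdio.h>

    /* Toy sketch of binary bracketing: each comparison halves the
     * set of candidate character ranges instead of testing them
     * one after the other.                                        */
    static const char *classify(int c)
    {
        if (c < 'A') {                      /* lower half: below 'A' */
            if (c < '0')        return "other";
            else if (c <= '9')  return "digit";
            else                return "other";
        } else {                            /* upper half: 'A' and up */
            if (c <= 'Z')       return "upper";
            else if (c < 'a')   return "other";
            else if (c <= 'z')  return "lower";
            else                return "other";
        }
    }

    int main(void)
    {
        const char *input = "a7Q!";
        for (const char *p = input; *p; ++p)
            printf("'%c' -> %s\n", *p, classify((unsigned char)*p));
        return 0;
    }

With N ranges the linear chain needs O(N) comparisons in the worst
case; the bracketed version needs O(log N).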
> Why complicate things? A table-driven deterministic finite
> automaton (DFA) lets us express this efficiently with regular
> expressions (REs).
Because the directly coded approach avoids the housekeeping overhead
for the lexer state. As you might have realized, the REs in quex
follow the syntax of flex/lex.
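A rough sketch of the difference, on the identifier pattern
[a-z][a-z0-9]* (my own example here, not generated code; the real
generated engines look different):

    #include <ctype.h>
    #include <stdio.h>

    /* Table-driven: an explicit state variable plus a transition
     * table; every character pays for the table lookup and the
     * loop's housekeeping of 'state'.                             */
    enum { ERR, START, IN_ID };              /* states  */
    enum { C_LOWER, C_DIGIT, C_OTHER };      /* classes */

    static int char_class(int c)
    {
        if (islower(c)) return C_LOWER;
        if (isdigit(c)) return C_DIGIT;
        return C_OTHER;
    }

    static const int table[3][3] = {
        /*           lower   digit  other */
        /* ERR   */ { ERR,   ERR,   ERR },
        /* START */ { IN_ID, ERR,   ERR },
        /* IN_ID */ { IN_ID, IN_ID, ERR },
    };

    static int match_table(const char *p)
    {
        int state = START, len = 0, next;
        while ((next = table[state]
                       [char_class((unsigned char)p[len])]) != ERR) {
            state = next;
            ++len;
        }
        return state == IN_ID ? len : 0;
    }

    /* Directly coded: the state is the position in the code itself;
     * no state variable, no table lookup.                          */
    static int match_direct(const char *p)
    {
        const char *q = p;
        if (!islower((unsigned char)*q)) return 0;  /* state START */
        ++q;
        while (islower((unsigned char)*q)           /* state IN_ID */
               || isdigit((unsigned char)*q))
            ++q;
        return (int)(q - p);
    }

    int main(void)
    {
        printf("%d %d\n", match_table("ab9+"), match_direct("ab9+"));
        return 0;
    }

The table-driven loop touches 'state' and the table on every single
character; in the directly coded version the state is implicit in
the program counter.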
> Also, with the Ken Thompson & Kleene theories, it is not easy to
> hand-write the implementation of an analyzer or analyzer generator.
The analyzer generator does, indeed, perform the Thompson
Construction and the repetition of state machines according to the
Kleene Closure with epsilon transitions. And it is done. If you have
any criticism of the coverage of the unit tests in
core_engine/state_machine/TEST, you are free to add new ones. Let me
know.
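In case it clarifies things, here is a toy version of those two
pieces (for illustration only; this is not the quex code): every
operator yields an NFA fragment with one start and one accept state,
and the Kleene closure is exactly an epsilon loop plus an epsilon
bypass:

    #include <stdlib.h>
    #include <stdio.h>

    enum { EPS = -1 };

    typedef struct State State;
    struct State {
        int    c;           /* matched character, or EPS    */
        State *out, *out2;  /* at most two successor states */
    };

    typedef struct { State *start, *accept; } Frag;

    static State *new_state(int c, State *out, State *out2)
    {
        State *s = malloc(sizeof *s);
        s->c = c; s->out = out; s->out2 = out2;
        return s;
    }

    static Frag lit(int c)           /* one character */
    {
        State *acc = new_state(EPS, NULL, NULL);
        return (Frag){ new_state(c, acc, NULL), acc };
    }

    static Frag cat(Frag a, Frag b)  /* concatenation */
    {
        a.accept->out = b.start;
        return (Frag){ a.start, b.accept };
    }

    static Frag alt(Frag a, Frag b)  /* alternation: eps fan-out/in */
    {
        State *acc = new_state(EPS, NULL, NULL);
        a.accept->out = acc;
        b.accept->out = acc;
        return (Frag){ new_state(EPS, a.start, b.start), acc };
    }

    static Frag star(Frag a)         /* Kleene closure via epsilon */
    {
        State *acc = new_state(EPS, NULL, NULL);
        a.accept->out  = a.start;    /* epsilon loop: repeat    */
        a.accept->out2 = acc;        /* epsilon bypass: or stop */
        return (Frag){ new_state(EPS, a.start, acc), acc };
    }

    /* NFA simulation: keep the set of active states, following
     * epsilon edges whenever a state enters the set.             */
    typedef struct { State *s[64]; int n; } Set;

    static void add(Set *set, State *s)
    {
        if (!s) return;
        for (int i = 0; i < set->n; ++i)
            if (set->s[i] == s) return;       /* already present */
        set->s[set->n++] = s;
        if (s->c == EPS) { add(set, s->out); add(set, s->out2); }
    }

    static int match(Frag f, const char *in)
    {
        Set cur = { {0}, 0 };
        add(&cur, f.start);
        for (; *in; ++in) {
            Set nxt = { {0}, 0 };
            for (int i = 0; i < cur.n; ++i)
                if (cur.s[i]->c == *in)
                    add(&nxt, cur.s[i]->out);
            cur = nxt;
        }
        for (int i = 0; i < cur.n; ++i)
            if (cur.s[i] == f.accept) return 1;
        return 0;
    }

    int main(void)
    {
        Frag re = cat(star(alt(lit('a'), lit('b'))), lit('c'));
        printf("%d %d\n", match(re, "abbac"),   /* 1: (a|b)*c */
                          match(re, "xc"));     /* 0          */
        return 0;
    }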
> Your quex analyzer generator could be unstable, while the lex/flex
> generators are stable.
Well, I have tried to write unit tests for almost all functionality
and features of the core engine. This gives me confidence that the
thing is stable to a high degree.
> Obtaining a 200-250% speed gain won't be possible for this GCC
> optimizing compiler because of http://en.wikipedia.org/wiki/Amdahl%27s_law
Amdahl's Law talks about parallelism; that is not the case here. I
apply a different approach to lexical analysis, as mentioned above.
If you are talking about the influence of a new lexer on the speed
of the whole compiler, I would agree that a speed-up factor of 2 is
not possible if lexical analysis constitutes less than 50% of the
total computation effort (simple math).
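Spelled out: if lexical analysis takes a fraction p of the total
compile time and the lexer becomes s times faster, the whole compiler
speeds up by 1 / ((1 - p) + p / s). A quick check, with p = 0.10 as
an assumed figure for illustration:

    #include <stdio.h>

    /* Fixed-fraction argument: a component that is fraction p of
     * the runtime, sped up by factor s, speeds up the whole by
     * 1 / ((1 - p) + p / s).                                     */
    int main(void)
    {
        double p = 0.10;   /* assumed: lexing = 10% of compile time */
        for (double s = 2.0; s <= 20000.0; s *= 100.0)
            printf("lexer %7.0fx faster -> compiler %.3fx faster\n",
                   s, 1.0 / ((1.0 - p) + p / s));
        return 0;
    }

Even an infinitely fast lexer then caps the overall gain at
1 / (1 - p), i.e. about 1.11.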
> GCC spends more time optimizing than analyzing lexically, unless
> the optimization option -O is deactivated.
Here I agree, and this supports the previous point.
> The basic transformations of the Ken Thompson & Kleene theories,
> like the NFA <-> DFA transformations, are hard too.
Well, textbooks describe the process with enough clarity. If you
doubt the functionality of my subset construction, then again,
please argue against the unit tests.
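The textbook algorithm fits on a page. A toy subset construction for
illustration (not the quex code), with NFA state sets represented as
bit masks:

    #include <stdio.h>

    /* Toy subset construction: NFA states are bits in an unsigned
     * mask, so each DFA state is one mask.  The hard-coded NFA
     * recognizes (a|b)*ab with states q0..q2 (q2 accepting); it
     * has no epsilon edges, so the closure step is trivial here.  */
    enum { NSTATES = 3, NSYMS = 2 };       /* symbols: 0='a', 1='b' */

    static const unsigned move_tab[NSTATES][NSYMS] = {
        /* q0 */ { (1u<<0) | (1u<<1), 1u<<0 }, /* loop / start "ab" */
        /* q1 */ { 0, 1u<<2 },                 /* 'b' completes "ab" */
        /* q2 */ { 0, 0 },
    };

    static unsigned move_set(unsigned set, int x)
    {
        unsigned out = 0;
        for (int q = 0; q < NSTATES; ++q)
            if (set & (1u << q))
                out |= move_tab[q][x];
        return out;
    }

    int main(void)
    {
        unsigned dfa[1u << NSTATES]; /* discovered DFA states (masks) */
        int trans[1u << NSTATES][NSYMS];
        int ndfa = 1;
        dfa[0] = 1u << 0;            /* start: the set {q0} */

        for (int i = 0; i < ndfa; ++i)     /* worklist of new states */
            for (int x = 0; x < NSYMS; ++x) {
                unsigned t = move_set(dfa[i], x);
                int j;
                for (j = 0; j < ndfa; ++j)  /* seen this set before? */
                    if (dfa[j] == t) break;
                if (j == ndfa) dfa[ndfa++] = t;
                trans[i][x] = j;
            }

        for (int i = 0; i < ndfa; ++i)
            printf("D%d (mask %u): a->D%d  b->D%d%s\n", i, dfa[i],
                   trans[i][0], trans[i][1],
                   (dfa[i] & (1u << 2)) ? "  [accept]" : "");
        return 0;
    }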
Best Regards
Frank
--
// Dr.-Ing. Frank-René Schäfer, Bodelschwinghstr. 28
// D-50170 Kerpen
// Tel.: 49+176/22 02 58 59;