This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



Re: Very Fast: Directly Coded Lexical Analyzer


On 5/31/07, Joseph S. Myers <joseph@codesourcery.com> wrote:
> Zack had some ideas a few years ago (I don't think they were ever posted
> to a public list) about how to speed up _cpp_clean_line in particular, and
> some or all of translation phases 1 to 3 in general.  The idea is that you
> have several Mealy machines (state machines where all the work happens in
> transitions), where edges apply for a given set of input characters in a
> given state and describe the actions to be taken in that case.  Actions
> include both passing output to another machine, and emitting diagnostics.
> So you start with converting character sets to UTF-8, then strip trailing
> whitespace and canonicalise newlines, then convert trigraphs, then remove
> backslash-newline pairs, then strip comments, then split the file into
> preprocessing tokens.
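
As a rough illustration of the chained-machine idea quoted above, here
is a minimal sketch in C with two of the stages (newline
canonicalisation and backslash-newline removal) feeding a buffer.  All
names (stage, emit, clean, and so on) are hypothetical and are not
taken from cpplib; diagnostics and the other stages are left out.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical sketch, not cpplib code.  Each stage is a small Mealy
   machine: all work happens in step(), which may pass output on to
   the next stage. */
typedef struct stage stage;
struct stage {
    void (*step)(stage *self, int c);  /* transition on one input char */
    stage *next;                       /* downstream machine, if any */
    int state;
    char *buf;                         /* used by the sink only */
    size_t len;
};

static void emit(stage *s, int c) { s->next->step(s->next, c); }

/* Stage 1: CR LF and lone CR both become LF. */
enum { NL_START, NL_SAW_CR };
static void newline_step(stage *s, int c)
{
    if (s->state == NL_SAW_CR) {
        s->state = NL_START;
        if (c == '\n')
            return;             /* the CR already produced an LF */
    }
    if (c == '\r') {
        s->state = NL_SAW_CR;
        emit(s, '\n');
    } else {
        emit(s, c);             /* EOF is passed through as well */
    }
}

/* Stage 2: drop each backslash that is immediately followed by LF. */
enum { BS_START, BS_SAW_BACKSLASH };
static void splice_step(stage *s, int c)
{
    if (s->state == BS_SAW_BACKSLASH) {
        s->state = BS_START;
        if (c == '\n')
            return;             /* drop the backslash-newline pair */
        emit(s, '\\');          /* the held backslash was ordinary */
    }
    if (c == '\\')
        s->state = BS_SAW_BACKSLASH;
    else
        emit(s, c);
}

/* Final stage: collect characters; EOF terminates the string. */
static void sink_step(stage *s, int c)
{
    if (c != EOF)
        s->buf[s->len++] = (char)c;
    else
        s->buf[s->len] = '\0';
}

/* Run IN through newline -> splice -> sink.
   OUT must have room for strlen(IN) + 1 bytes. */
static void clean(const char *in, char *out)
{
    stage sink = { sink_step, NULL, 0, out, 0 };
    stage splice = { splice_step, &sink, BS_START, NULL, 0 };
    stage newline = { newline_step, &splice, NL_START, NULL, 0 };

    for (; *in; in++)
        newline.step(&newline, (unsigned char)*in);
    newline.step(&newline, EOF);
}
```

Because the newline stage runs first, a backslash followed by CR LF is
spliced the same way as a backslash followed by a bare LF.  A
table-driven or directly coded version would replace the function
pointers, but the data flow between machines would be the same.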

This is an interesting idea.  I spent some time around the beginning
of the year trying to optimize the preprocessor to improve distcc
performance.  I've got some baseline oprofile results available at
http://docs.google.com/Doc?id=ddn4ddq4_34fsdqhv.  These were taken
with a Google-modified version of gcc-4.1.1, so your mileage may vary.
They show that _cpp_lex_direct, _cpp_clean_line, and lex_identifier
collectively account for about 35% of preprocessing time.

I tried to improve performance without substantial changes to
functionality, but was only able to grab a few percentage points here
and there.  Ultimately, I implemented an optimized "directives-only"
preprocessor, which does the bare minimum of preprocessing required
to handle include directives correctly.  It gives about a 30%
decrease in preprocessor time.  The patch is available at
http://gcc.gnu.org/ml/gcc-patches/2007-02/msg02178.html with an
updated summary at
http://gcc.gnu.org/ml/gcc-patches/2007-03/msg00786.html.  It's still
pending review at this time.
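
To give a feel for why a directives-only pass can be cheaper, here is
a deliberately simplified sketch in C.  It is not the approach taken
in the patch above; the names is_include_line and count_includes are
hypothetical.  It assumes comments and line splices have already been
dealt with, and it ignores conditionals and macros, which a real
implementation has to track in order to decide which includes are
live.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch: look only at lines that start a directive and
   skip everything else without tokenizing it. */

/* Skip spaces and tabs. */
static const char *skip_blank(const char *p)
{
    while (*p == ' ' || *p == '\t')
        p++;
    return p;
}

/* Nonzero if the line starting at P is an #include directive.
   Longer names such as "includes" or "include_next" do not match. */
static int is_include_line(const char *p)
{
    p = skip_blank(p);
    if (*p != '#')
        return 0;
    p = skip_blank(p + 1);
    if (strncmp(p, "include", 7) != 0)
        return 0;
    return !(isalnum((unsigned char)p[7]) || p[7] == '_');
}

/* Count #include lines in SRC, a NUL-terminated buffer with LF line
   endings. */
static size_t count_includes(const char *src)
{
    size_t n = 0;

    while (*src) {
        const char *eol;

        if (is_include_line(src))
            n++;
        eol = strchr(src, '\n');
        if (!eol)
            break;
        src = eol + 1;
    }
    return n;
}
```

Non-directive lines cost one strchr call each here, rather than a full
pass through the lexer.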

For comparison, the preprocessor in gcc-2.95 runs in about one third
of the time taken by the preprocessor in gcc-4.1.1 (regardless of
whether or not traditional mode is used).  A good chunk of the time in
4.1.1, though, seems to be spent in a long tail of routines that each
consume less than one percent of the total.

Ollie

