This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.
Re: gcc compile-time performance
- From: dewar at gnat dot com (Robert Dewar)
- To: neil at daikokuya dot demon dot co dot uk, zack at codesourcery dot com
- Cc: aoliva at redhat dot com, chip dot cuntz at earthling dot net, davem at redhat dot com, dewar at gnat dot com, gcc at gcc dot gnu dot org, jh at suse dot cz
- Date: Sun, 19 May 2002 14:03:17 -0400 (EDT)
- Subject: Re: gcc compile-time performance
<<We shouldn't be doing mbchar calls in the fast path, though. I
thought we were going to pre-convert to UTF8 a line at a time, then [...]>>
Preconverting to UTF-8 is a pretty inefficient way of handling things.
For example, take a typical Shift-JIS encoded source. Relatively few
characters will actually be encoded, and Shift-JIS is far easier to deal
with than UTF-8. I see how it is easier to deal with only one
encoding in the lexer, but really the experience with GNAT showed that
it is not that hard to do things efficiently and deal with all the
encodings directly.

Basically the approach is the following:
- For codes in the range 16#00# to 16#7F#, take the code as it is (this
  covers the vast majority of cases in practice).

- If that code is ESC, jump to the section of code dealing with the
  escape-sequence encodings.

- For codes in the range 16#80# to 16#FF# we incur one test, to see if an
  upper-half encoding is in effect; if not, take the character as it is
  (there won't be many of these in practice). If so, go interpret the
  encoding. This is really not hard at all.