This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.

Re: gcc compile-time performance

From: dewar at gnat dot com (Robert Dewar)
To: neil at daikokuya dot demon dot co dot uk, zack at codesourcery dot com
Cc: aoliva at redhat dot com, chip dot cuntz at earthling dot net, davem at redhat dot com,dewar at gnat dot com, gcc at gcc dot gnu dot org, jh at suse dot cz
Date: Sun, 19 May 2002 14:03:17 -0400 (EDT)
Subject: Re: gcc compile-time performance

<<We shouldn't be doing mbchar calls in the fast path, though.  I
thought we were going to pre-convert to UTF8 a line at a time, then
>>

Preconverting to UTF-8 is a pretty inefficient way of handling things.
For example, take a typical Shift-JIS encoded source: relatively few
characters will actually be encoded, and Shift-JIS is far easier to deal
with than UTF-8. I see why it is easier to deal with only one encoding
in the lexer, but really the experience with GNAT showed that it is not
that hard to handle all the different encodings efficiently.

Basically the approach is the following:

For codes in the range 16#00# to 16#7F#, take the code as it is (this
covers the vast majority of cases in practice).

If that code is ESC, jump to the section of code that deals with the
escape sequences.

For codes in the range 16#80# to 16#FF#, we incur one test to see
whether we are using an encoding. If not, take the character as it is
(there won't be many of these in practice); if so, go interpret the
encoding. This is really not hard at all.
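The three-way dispatch described above might look roughly like this in C. It is a minimal sketch, not GNAT's actual scanner. The names (`next_char`, `enum source_encoding`, the helper functions) are hypothetical. The escape form is assumed to be ESC followed by four hex digits, modelled on GNAT's hex ESC encoding. A Shift-JIS double-byte character is returned as its raw lead/trail code rather than mapped to Unicode.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical encoding modes; names are illustrative only. */
enum source_encoding { ENC_NONE, ENC_SHIFT_JIS };

#define ESC 0x1B

/* Value of one hex digit, or -1 if c is not a hex digit. */
static int hex_value(unsigned char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

/* Assumed escape form: ESC h h h h (four hex digits).
   Returns bytes consumed, or 0 if the sequence is malformed/truncated. */
static size_t decode_escape(const unsigned char *p, size_t avail, uint32_t *code)
{
    if (avail < 5) return 0;
    uint32_t v = 0;
    for (int i = 1; i <= 4; i++) {
        int h = hex_value(p[i]);
        if (h < 0) return 0;
        v = v * 16 + (uint32_t)h;
    }
    *code = v;
    return 5;
}

/* Shift-JIS: bytes 0x81-0x9F and 0xE0-0xFC are lead bytes followed by a
   trail byte in 0x40-0x7E or 0x80-0xFC; other high bytes (e.g. half-width
   katakana 0xA1-0xDF) stand alone. Returns bytes consumed, 0 on error. */
static size_t decode_shift_jis(const unsigned char *p, size_t avail, uint32_t *code)
{
    unsigned char lead = p[0];
    if ((lead >= 0x81 && lead <= 0x9F) || (lead >= 0xE0 && lead <= 0xFC)) {
        if (avail < 2) return 0;
        unsigned char trail = p[1];
        if (trail < 0x40 || trail == 0x7F || trail > 0xFC) return 0;
        *code = ((uint32_t)lead << 8) | trail;  /* raw code, not Unicode */
        return 2;
    }
    *code = lead;
    return 1;
}

/* Decode one character starting at p. Returns bytes consumed, 0 on error. */
size_t next_char(const unsigned char *p, size_t avail,
                 enum source_encoding enc, uint32_t *code)
{
    if (avail == 0) return 0;
    unsigned char c = p[0];
    if (c < 0x80) {
        if (c != ESC) {            /* fast path: plain 7-bit code */
            *code = c;
            return 1;
        }
        return decode_escape(p, avail, code);
    }
    if (enc == ENC_NONE) {         /* the one extra test for 0x80-0xFF */
        *code = c;
        return 1;
    }
    return decode_shift_jis(p, avail, code);
}
```

A 7-bit character costs one range comparison plus the ESC check; the encoding test is reached only for bytes at or above 0x80, which matches the cost model in the message.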
