This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



Re: gcc compile-time performance


Robert Dewar wrote:-

> Again, I don't see that affecting basic lexical scanning, and why would
> one ever want to look ahead or backwards in the character stream for the
> lexical analyzer?  (I agree it makes life harder for general text
> processing, and indeed the peek functions in Ada.Text_IO (which supports
> all these encoding methods) are a pain, but I see no impact on the
> lexical analysis.)
> 
> I do not understand why trigraphs make life harder here, so I probably am
> missing some key point.

You need to look ahead many times: on seeing '.', for example, you need
two more chars to see if it's '...'.  But that lookahead can be
arbitrarily long, because the input could be '.\\n.\\n.', with a
backslash-newline between the dots.  If you're using the mb functions,
what do you do with the chars you've just read in if the 3rd one wasn't
a dot?
You can't just go back to after the initial dot, because the mb functions
have state.  So I imagine you have to buffer them elsewhere, and that
means maintaining a buffer that needs to be checked whenever you read
a character.  It gets nasty.
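To make the buffering concrete, here is a minimal sketch in C of the kind
of pushback buffer I mean.  The names (struct reader, reader_get,
reader_unget, match_ellipsis_tail, PUSHBACK_MAX) are made up for
illustration and are not anything in cpplib; it assumes mbrtowc() as the
decoder and ignores backslash-newline entirely.  Note the test of npushed
that every single read has to pay for.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

#define PUSHBACK_MAX 8          /* hypothetical limit, enough for this sketch */

struct reader {
    const char *p;              /* next undecoded byte */
    const char *end;            /* one past the last byte */
    mbstate_t st;               /* conversion state; we never rewind it */
    wint_t pushed[PUSHBACK_MAX];/* already-decoded chars, used as a stack */
    int npushed;
};

static void reader_init(struct reader *r, const char *s, size_t n)
{
    r->p = s;
    r->end = s + n;
    memset(&r->st, 0, sizeof r->st);   /* initial conversion state */
    r->npushed = 0;
}

/* Next character, or WEOF at end of input or on a decoding error. */
static wint_t reader_get(struct reader *r)
{
    wchar_t wc;
    size_t k;

    if (r->npushed > 0)                /* the check paid on every read */
        return r->pushed[--r->npushed];
    if (r->p == r->end)
        return WEOF;
    k = mbrtowc(&wc, r->p, (size_t)(r->end - r->p), &r->st);
    if (k == (size_t)-1 || k == (size_t)-2)
        return WEOF;
    r->p += (k == 0) ? 1 : k;          /* a decoded NUL consumes one byte */
    return (wint_t)wc;
}

/* Give back a decoded character instead of seeking in the byte stream. */
static void reader_unget(struct reader *r, wint_t c)
{
    assert(r->npushed < PUSHBACK_MAX);
    r->pushed[r->npushed++] = c;
}

/* Called after a '.' has been read: returns 1 if the next two characters
   are also dots (consuming them), else 0 with the input left untouched. */
static int match_ellipsis_tail(struct reader *r)
{
    wint_t a = reader_get(r);
    wint_t b;

    if (a != L'.') {
        if (a != WEOF)
            reader_unget(r, a);
        return 0;
    }
    b = reader_get(r);
    if (b != L'.') {
        if (b != WEOF)
            reader_unget(r, b);        /* push in reverse order so that */
        reader_unget(r, a);            /* 'a' is the next char read     */
        return 0;
    }
    return 1;
}
```

With the input "..x", reading the first dot and then calling
match_ellipsis_tail() returns 0, and the next two reads still yield '.'
and 'x'.  Once backslash-newline can appear between the dots, a fixed
PUSHBACK_MAX no longer suffices, which is where it gets nasty.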

I think a better solution is to read in whole logical lines before doing
any kind of tokenization, possibly converting them to UTF-8, since then
lookahead and look-back are not a problem.  But that leads to other
issues, like knowing which line and column of the physical source file
any given character of the logical line came from.  [And how do you get
this info if using iconv()?]
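As a rough sketch of the logical-line idea, the C below splices
backslash-newline out of one logical line and keeps a side table giving
the physical line and column of every surviving character.  Everything
here (struct src_pos, splice_logical_line, the one-entry-per-byte map) is
an assumption made for illustration; it counts columns in bytes, does no
character set conversion, and does not touch trigraphs.  It says nothing
about how to keep such a table in step with iconv(), which is the part I
don't know how to do.

```c
#include <assert.h>
#include <stddef.h>

struct src_pos {
    unsigned line;              /* 1-based physical line */
    unsigned col;               /* 1-based physical column, in bytes */
};

/*
 * Copy one logical line from src[0..n) into out, deleting every
 * backslash-newline pair, and record in map[i] where out[i] came from.
 * On entry *line is the physical line number of src[0]; on return it has
 * been advanced past the physical lines consumed.  Stores the logical
 * length in *out_len and returns the number of source bytes consumed,
 * including the terminating newline if there was one.
 */
static size_t splice_logical_line(const char *src, size_t n,
                                  char *out, struct src_pos *map, size_t cap,
                                  size_t *out_len, unsigned *line)
{
    size_t i = 0, o = 0;
    unsigned col = 1;

    while (i < n) {
        if (src[i] == '\\' && i + 1 < n && src[i + 1] == '\n') {
            i += 2;             /* splice: drop both bytes */
            ++*line;
            col = 1;
            continue;
        }
        if (src[i] == '\n') {
            ++i;                /* end of the logical line */
            ++*line;
            break;
        }
        assert(o < cap);        /* caller must supply enough room */
        out[o] = src[i];
        map[o].line = *line;
        map[o].col = col;
        ++o;
        ++i;
        ++col;
    }
    *out_len = o;
    return i;
}
```

Fed the three physical lines '.\', '.\' and '.;', this produces the
logical line '...;', in which the lexer can look ahead and back freely,
and map[] records that the second dot came from line 2, column 1.  The
cost is one table entry per character.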

I can't see a really clean solution to these issues.  However, I'm
no expert on the mb stuff, so I could be missing something.

Neil.

