This is the mail archive of the
java@gcc.gnu.org
mailing list for the Java project.
Re: Solaris -vs- iconv
Tom Tromey <tromey@redhat.com> writes:
> * We want to use UCS-2 in the lexer.
> Well, ok, we probably don't really *need* to. We currently do
> because I didn't feel like rewriting the whole lexer. However using
> UCS-2 here is reasonable since it makes parts of the code cleaner.
My intuition would be that the lexer should use UTF-8. This is what
we should be using for identifiers and assembler labels. It is also
(or at some point should be) the preferred encoding for input files.
So I think we should optimize for UTF-8 input files.
> Using something like UTF-8 would mean returning strings and such.
I don't understand this. Do you mean what the lexer returns as the
value of a character token? That seems like an issue completely
unrelated to what kind of buffers the lexer and parser use.
I don't see any difference in terms of programming complexity or
performance between a buffer of UCS-2 characters and a buffer of
bytes in UTF-8 encoding.
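
To illustrate the comparison, here is a minimal sketch in C. It is not
code from the gcj lexer: the function names are hypothetical, and the
identifier test is deliberately simplified (every non-ASCII code unit is
treated as an identifier character). The scanning loop is the same for
both buffer types, because every byte of a multi-byte UTF-8 sequence is
0x80 or above and so can never be mistaken for an ASCII delimiter. Only
when the actual character value is needed does the UTF-8 side require a
small decode step.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified identifier test (an assumption for this sketch): ASCII
   letters, digits, '_' and '$', plus every non-ASCII code unit.  */
static int
is_ident_unit (unsigned int c)
{
  return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
         || (c >= '0' && c <= '9') || c == '_' || c == '$' || c >= 0x80;
}

/* Count the identifier code units at the start of a UCS-2 buffer.  */
size_t
scan_ident_ucs2 (const uint16_t *buf, size_t len)
{
  size_t i = 0;
  while (i < len && is_ident_unit (buf[i]))
    i++;
  return i;
}

/* The same loop over UTF-8 bytes.  Lead and continuation bytes are all
   >= 0x80, so they are swept up as identifier units here.  */
size_t
scan_ident_utf8 (const unsigned char *buf, size_t len)
{
  size_t i = 0;
  while (i < len && is_ident_unit (buf[i]))
    i++;
  return i;
}

/* Decode one UTF-8 sequence of up to three bytes, which covers the
   UCS-2 range.  Stores the code point in *out and returns the number
   of bytes consumed, or 0 for truncated or malformed input.  This
   sketch does not reject overlong encodings or surrogates.  */
size_t
utf8_decode (const unsigned char *p, size_t len, unsigned int *out)
{
  assert (p != NULL && out != NULL);
  if (len >= 1 && p[0] < 0x80)
    {
      *out = p[0];
      return 1;
    }
  if (len >= 2 && (p[0] & 0xE0) == 0xC0 && (p[1] & 0xC0) == 0x80)
    {
      *out = ((p[0] & 0x1Fu) << 6) | (p[1] & 0x3Fu);
      return 2;
    }
  if (len >= 3 && (p[0] & 0xF0) == 0xE0
      && (p[1] & 0xC0) == 0x80 && (p[2] & 0xC0) == 0x80)
    {
      *out = ((p[0] & 0x0Fu) << 12) | ((p[1] & 0x3Fu) << 6)
             | (p[2] & 0x3Fu);
      return 3;
    }
  return 0;
}
```

The two scan functions differ only in the element type of the buffer.
The returned counts are in code units, so the same identifier may give a
larger count in UTF-8 than in UCS-2.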
--
--Per Bothner
per@bothner.com http://www.bothner.com/~per/