This is the mail archive of the java@gcc.gnu.org mailing list for the Java project.



Re: Solaris -vs- iconv


Tom Tromey <tromey@redhat.com> writes:

> * We want to use UCS-2 in the lexer.
>   Well, ok, we probably don't really *need* to.  We currently do
>   because I didn't feel like rewriting the whole lexer.  However using
>   UCS-2 here is reasonable since it makes parts of the code cleaner.

My intuition would be that the lexer should use UTF-8.  This is what
we should be using for identifiers and assembler labels.  It is also
(or at some point should be) the preferred encoding for input files.
So I think we should optimize for UTF-8 input files.
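As a rough illustration of the cost on the UCS-2 side: if the lexer
buffers UCS-2, every identifier has to be re-encoded before it can be
used as a UTF-8 symbol or assembler label.  A minimal sketch of that
step in C, assuming a hypothetical helper `ucs2_to_utf8` (not an
existing gcj function) and ignoring surrogate pairs, so each UCS-2
unit is treated as one BMP code point:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode n UCS-2 units as UTF-8 into out.
   out must have room for 3*n bytes plus a terminating NUL.
   Assumption: surrogates are encoded one unit at a time (no pair
   handling), which is enough for BMP-only identifiers. */
size_t ucs2_to_utf8(const uint16_t *in, size_t n, unsigned char *out)
{
    size_t j = 0;
    for (size_t i = 0; i < n; i++) {
        uint16_t c = in[i];
        if (c < 0x80) {                       /* 1 byte: ASCII */
            out[j++] = (unsigned char) c;
        } else if (c < 0x800) {               /* 2 bytes */
            out[j++] = (unsigned char) (0xC0 | (c >> 6));
            out[j++] = (unsigned char) (0x80 | (c & 0x3F));
        } else {                              /* 3 bytes */
            out[j++] = (unsigned char) (0xE0 | (c >> 12));
            out[j++] = (unsigned char) (0x80 | ((c >> 6) & 0x3F));
            out[j++] = (unsigned char) (0x80 | (c & 0x3F));
        }
    }
    out[j] = '\0';
    return j;                                 /* bytes written, excluding NUL */
}
```

With UTF-8 input and UTF-8 buffers, this conversion simply
disappears: the identifier's bytes can be used as-is.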

>   Using something like UTF-8 would mean returning strings and such.

I don't understand this.  Do you mean what the lexer returns as the
value of a character token?  That seems like an issue completely
unrelated to what kind of buffers the lexer and parser use.
I don't see any difference in terms of programming complexity or
performance between a buffer of UCS-2 characters and a buffer of
bytes in UTF-8 encoding.
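To make the buffer comparison concrete, here is a minimal sketch in C
of a "next character" routine over each kind of buffer.  The names
`next_ucs2` and `next_utf8` are hypothetical, not taken from gcj; the
UTF-8 reader assumes well-formed input of at most three bytes per
character (the BMP range that UCS-2 also covers) and does no
validation or truncation checks.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical UCS-2 reader: one 16-bit unit per character.
   Returns the character, or -1 at end of input. */
int next_ucs2(const uint16_t *buf, size_t len, size_t *pos)
{
    if (*pos >= len)
        return -1;
    return buf[(*pos)++];
}

/* Hypothetical UTF-8 reader: decodes 1- to 3-byte sequences.
   Assumes well-formed input; a truncated sequence at the end of
   the buffer would be read past len. Returns -1 at end of input. */
int next_utf8(const unsigned char *buf, size_t len, size_t *pos)
{
    if (*pos >= len)
        return -1;
    unsigned char b = buf[(*pos)++];
    if (b < 0x80)
        return b;                          /* ASCII fast path */
    if ((b & 0xE0) == 0xC0) {              /* 2-byte sequence */
        int c = (b & 0x1F) << 6;
        c |= buf[(*pos)++] & 0x3F;
        return c;
    }
    int c = (b & 0x0F) << 12;              /* 3-byte sequence */
    c |= (buf[(*pos)++] & 0x3F) << 6;
    c |= buf[(*pos)++] & 0x3F;
    return c;
}
```

Either way the rest of the lexer sees the same stream of characters;
the UTF-8 version only adds a branch on the lead byte, and for mostly
ASCII source it takes the one-byte path almost every time.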
-- 
	--Per Bothner
per@bothner.com   http://www.bothner.com/~per/

