This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.
Re: thoughts on martin's proposed patch for GCC and UTF-8
- To: bothner at cygnus dot com
- Subject: Re: thoughts on martin's proposed patch for GCC and UTF-8
- From: Paul Eggert <eggert at twinsun dot com>
- Date: Mon, 21 Dec 1998 19:43:51 -0800 (PST)
- CC: rms at gnu dot org, amylaar at cygnus dot co dot uk, martin at mira dot isdn dot cs dot tu-berlin dot de, gcc2 at gnu dot org, egcs at cygnus dot com
- References: <199812220245.SAA05358@cygnus.com>
Date: Mon, 21 Dec 1998 18:45:09 -0800
From: Per Bothner <bothner@cygnus.com>
> Yes, we could have auto-detection for C but not Java,
> but that does seem rather clumsy.
It would be nice to use the same method for all languages, yes.
This is a good argument against autodetection.
> libc should be written in UTF-8, but an
> application may be written in a local character set.
libc's identifiers use only the "C" subset of ASCII, and therefore
libc will link to an application written in any locale, even if we use
the native multibyte encoding for identifiers.
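This can be checked directly: an identifier drawn from the "C" subset of ASCII encodes to the same byte sequence in every ASCII-compatible encoding, so object files produced under different locales still agree on the symbol name. A minimal sketch (the list of encodings is illustrative, not exhaustive):

```python
# An ASCII-only identifier, as libc uses for its exported symbols.
ident = "strlen"

# ASCII-compatible encodings an application might be written in.
# (Illustrative list; any ASCII superset behaves the same way.)
encodings = ["ascii", "latin-1", "utf-8", "euc-jp", "shift_jis"]

encoded = {enc: ident.encode(enc) for enc in encodings}

# Every encoding yields the identical byte sequence, which is why
# libc links against applications written in any of these locales.
assert len(set(encoded.values())) == 1
```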
> Given that [.o] symbols have to be in a common character encoding,
> it follows that you cannot possibly do autodetection, at least not
> for identifiers.
I don't see how this follows. The compiler could use autodetection to
discover the input character set, and then translate the identifiers'
characters to UTF-8 when outputting assembly language.
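The translation step described above amounts to a decode/re-encode pass: once the input character set is known (whether by autodetection or a compiler flag), each identifier's bytes are decoded from that set and re-emitted as UTF-8 when the symbol is written out. A hypothetical sketch, with the source character set taken as given rather than autodetected:

```python
def identifier_to_utf8(ident_bytes: bytes, source_charset: str) -> bytes:
    """Translate an identifier from the input character set to UTF-8,
    the common encoding assumed here for symbols in object files.
    (source_charset would come from autodetection or a user option.)"""
    return ident_bytes.decode(source_charset).encode("utf-8")

# A Latin-1 source file containing the identifier "naïve":
# 0xEF is "ï" in Latin-1; in UTF-8 it becomes the pair 0xC3 0xAF.
latin1_ident = b"na\xefve"
assert identifier_to_utf8(latin1_ident, "latin-1") == b"na\xc3\xafve"
```

The key property is that the object-file symbol no longer depends on the locale of the source file, which is exactly what a common encoding for `.o` symbols requires.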