This is the mail archive of the
gcc@gcc.gnu.org
mailing list for the GCC project.
Re: thoughts on martin's proposed patch for GCC and UTF-8
- To: eggert@twinsun.com
- Subject: Re: thoughts on martin's proposed patch for GCC and UTF-8
- From: Martin von Loewis <martin@mira.isdn.cs.tu-berlin.de>
- Date: Sat, 12 Dec 1998 11:18:00 +0100
- CC: bothner@cygnus.com, gcc2@gnu.org, egcs@cygnus.com
- References: <199812100702.XAA26400@cygnus.com> <199812120323.TAA10442@shade.twinsun.com>
> I have misgivings about having GCC support multiple locales
> simultaneously.
So how about this:
gcc/g++ process strictly-conforming input that is already in the basic
character set (plus \u escapes), in the way that the standards
mandate. Object files then contain UTF-8 (or \u escapes, for C++).
gcc/g++ also process input based on the current locale, and pass the
input unmodified to the output.
There is no interworking between the two (i.e., characters in the
current locale are not related to \u escapes at all).
This means that the compiler, in locale-aware mode, would not be
strictly conforming, but so what? People could ask their editors to
save files in the C/C++-style encoding if they want portable source
files, or use filters.
If this sounds like a reasonable strategy, we only need to worry about
how to combine the two algorithms, i.e., how we arrange to process
identifiers and strings both with the C wchar functions and by
recognizing \u escapes.
What do you think?
Martin