[PATCH] PR18785: Support non-native execution charsets
Kai Henningsen
kaih@khms.westfalen.de
Thu Dec 23 00:12:00 GMT 2004
zack@codesourcery.com (Zack Weinberg) wrote on 22.12.04 in <87hdmeglfk.fsf@codesourcery.com>:
> * The source character set: the encoding used by internal processing
> in translation phases 1b-4 (1a is the conversion from input to
> source character set). This has several major constraints on it:
>
> - It has to be a proper multibyte character set as C99 defines
> that term (5.2.1.2p1). It may NOT have a state-dependent
> encoding.
>
> - It has to be isomorphic to ISO 10646 (Unicode) so that \u, \U
> escapes are meaningful. (Because of this, the source
> character set cannot be a single-byte encoding.)
>
> - All characters within the basic source character set must have
> the same code points that they do in ...
>
> * The host character set: that is, the narrow execution character
> set of the host machine. At present this is always either ASCII
> or EBCDIC, and we assume that whichever variant of EBCDIC is in
> use does not alter the code points corresponding to the basic
> source character set.
You do realize, I hope, that not all EBCDIC codepages use the same
codepoints for at least {}[] (and probably more)? That makes the "must
have the same code points" requirement rather hard to satisfy.
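
The variation is easy to check empirically. What follows is a minimal
sketch, not anything from the patch under discussion: it uses the three
EBCDIC codecs that ship with CPython (cp037, cp273, cp500) as stand-ins
for "EBCDIC variants", and the names BASIC_SOURCE_CHARS, EBCDIC_CODECS
and varying_code_points are made up for illustration. It lists the
graphic characters of the C99 basic source character set whose byte
value is not the same in all three code pages.

```python
import string

# Graphic characters of the C99 basic source character set (5.2.1p3).
BASIC_SOURCE_CHARS = (
    string.ascii_letters + string.digits + "!\"#%&'()*+,-./:;<=>?[\\]^_{|}~"
)

# Assumption: these three CPython codecs stand in for "EBCDIC variants".
EBCDIC_CODECS = ("cp037", "cp273", "cp500")


def varying_code_points(chars=BASIC_SOURCE_CHARS, codecs=EBCDIC_CODECS):
    """Return {char: {codec: byte}} for chars whose byte differs by codec."""
    result = {}
    for ch in chars:
        # Each of these code pages is single-byte, so take byte 0.
        values = {name: ch.encode(name)[0] for name in codecs}
        if len(set(values.values())) > 1:
            result[ch] = values
    return result


if __name__ == "__main__":
    for ch, values in varying_code_points().items():
        per_codec = ", ".join(f"{n}=0x{b:02X}" for n, b in values.items())
        print(f"{ch!r}: {per_codec}")
```

Letters and digits never show up in the output, but the square brackets
do, so a compiler cannot assume one fixed EBCDIC code point for them.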
I still believe that rule is utterly misguided. Trying to use UTF-EBCDIC
really is ALWAYS a mistake.
Regards, Kai