[PATCH] PR18785: Support non-native execution charsets

Thu Dec 23 00:12:00 GMT 2004

zack@codesourcery.com (Zack Weinberg) wrote on 22.12.04 in <87hdmeglfk.fsf@codesourcery.com>:

>   * The source character set: the encoding used by internal processing
>     in translation phases 1b-4 (1a is the conversion from input to
>     source character set).  This has several major constraints on it:
>
>       - It has to be a proper multibyte character set as C99 defines
>         that term (5.2.1.2p1).  It may NOT have a state-dependent
>         encoding.
>
>       - It has to be isomorphic to ISO 10646 (Unicode) so that \u, \U
>         escapes are meaningful.  (Because of this, the source
>         character set cannot be a single-byte encoding.)
>
>       - All characters within the basic source character set must have
>         the same code points that they do in ...
>
>   * The host character set: that is, the narrow execution character
>     set of the host machine.  At present this is always either ASCII
>     or EBCDIC, and we assume that whichever variant of EBCDIC is in
>     use does not alter the code points corresponding to the basic
>     source character set.

You do realize, I hope, that not all EBCDIC code pages assign the same
code points to at least {}[] (and probably more)? That makes the "must
have the same code points" requirement rather hard to satisfy.
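
For illustration, here is a minimal sketch of that check. It assumes
the EBCDIC codecs bundled with Python (cp037, cp500, cp1026) are
faithful to the real code pages; the helper names are made up for this
example and have nothing to do with GCC or the patch.

```python
# Compare where a few basic source characters land in several EBCDIC
# code pages. The codec names are Python's; treating them as accurate
# models of the real code pages is an assumption of this sketch.
CODEPAGES = ("cp037", "cp500", "cp1026")
CHARS = "{}[]"


def code_points(codepage):
    """Map each character in CHARS to its single-byte value in codepage."""
    return {ch: ch.encode(codepage)[0] for ch in CHARS}


def varying_chars(table):
    """Return the characters whose byte value differs between code pages."""
    return [ch for ch in CHARS
            if len({row[ch] for row in table.values()}) > 1]


table = {cp: code_points(cp) for cp in CODEPAGES}

for cp, row in table.items():
    print(cp, " ".join(f"{ch}=0x{b:02X}" for ch, b in row.items()))
print("not stable across code pages:", "".join(varying_chars(table)))
```

Any character reported on the last line is one for which "the same code
points as the host" has no single answer.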

I still believe that rule is utterly misguided. Trying to use UTF-EBCDIC
really is ALWAYS a mistake.

MfG Kai