This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



Re: thoughts on martin's proposed patch for GCC and UTF-8


On Mon, 21 Dec 1998 18:57:01 -0800 (PST), Paul Eggert wrote:
>   Date: Mon, 21 Dec 1998 10:00:13 -0500
>   From: Zack Weinberg <zack@rabi.columbia.edu>
>
>   GCC should only care about the character set, not the rest of the
>   locale.  Therefore, it makes sense to use the charset names from the
>   iconv library (part of glibc 2.1, also in Solaris and probably
>   elsewhere) which are the names standardized by the MIME RFCs.
>
>This is a good suggestion.  I assume you're saying that GCC should use
>directives like `#charset "SJIS"' rather than directives like `#locale
>"ja"', since the other attributes of "ja" are not important for GCC.

Yes.

You have a point about this being something that belongs in the
environment.  I don't have any experience in this field and can't say
what makes the most sense.

>Unfortunately, this suggestion doesn't solve the problem of unportable
>directives in practice, because the charset+encoding names are not
>standardized well either.

The MIME RFCs try to standardize charset+encoding names.  It's the
closest to a proper standard there is, and all the iconv(3)
implementations I know of (all two of them :)  support all those
names.

I'm tempted to suggest that we use iconv to convert everything to
UTF-8 (Java seems to need this, and consistency is good), but only when
the system provides iconv.  When it isn't available, we wouldn't even
try to support extended charsets.  Trying to support all the different
incompatible encoding libraries out there would be a nightmare, and
importing glibc's iconv is impractical: it's more than 4 MB of code.
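
To make the idea concrete, here is a rough sketch of what converting
input to UTF-8 with the system iconv could look like.  It uses only the
standard iconv_open/iconv/iconv_close calls; the helper name to_utf8,
its signature, and the fixed-size output buffer are made up for
illustration and are not part of any patch or of GCC.

```c
#include <iconv.h>
#include <stddef.h>

/* Hypothetical helper (not GCC code): convert INLEN bytes of IN, encoded
   in CHARSET (an iconv/MIME charset name), to NUL-terminated UTF-8 in OUT.
   Returns the number of bytes written, excluding the NUL, or -1 if the
   charset is unknown, the input is invalid, or OUT is too small.  */
long
to_utf8 (const char *charset, const char *in, size_t inlen,
         char *out, size_t outsize)
{
  iconv_t cd;
  char *inp = (char *) in;   /* iconv takes char **, but does not modify the input bytes */
  char *outp = out;
  size_t inleft = inlen;
  size_t outleft;

  if (outsize == 0)
    return -1;
  outleft = outsize - 1;     /* reserve room for the trailing NUL */

  cd = iconv_open ("UTF-8", charset);   /* arguments are (tocode, fromcode) */
  if (cd == (iconv_t) -1)
    return -1;               /* charset not supported by this iconv */

  /* Convert the input, then flush any pending shift state.  */
  if (iconv (cd, &inp, &inleft, &outp, &outleft) == (size_t) -1
      || iconv (cd, NULL, NULL, &outp, &outleft) == (size_t) -1)
    {
      iconv_close (cd);
      return -1;
    }

  iconv_close (cd);
  *outp = '\0';
  return (long) (outp - out);
}
```

A caller would pass whatever charset name the source file declares, for
example to_utf8 ("SJIS", buf, len, out, sizeof out), and fall back to
plain byte handling when the call returns -1 or when the system has no
iconv at all.  Which names are accepted depends on the iconv
implementation installed.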

zw

