This is the mail archive of the
libstdc++@sourceware.cygnus.com
mailing list for the libstdc++ project.
Re: FW: Unicode and C++
- To: Shiv at pspl dot co dot in
- Subject: Re: FW: Unicode and C++
- From: "Martin v. Loewis" <martin at loewis dot home dot cs dot tu-berlin dot de>
- Date: Fri, 7 Jul 2000 12:00:07 +0200
- CC: libstdc++ at sourceware dot cygnus dot com
- References: <000601bfe7eb$6fdbcd70$8d02a8c0@intranet.pspl.co.in>
> | - The encoding of wchar_t
>
> Isn't that implicitly supposed to mean Unicode?
Not in the C++ standard, which leaves it implementation-defined.
> I for one do not know of any system where it means anything other
> than Unicode.
All of the Unix systems use such a non-Unicode wchar_t encoding in their
EUC locales; see
http://cns-web.bu.edu/pub/djohnson/web_files/i18n/euc.html
Linux is an exception to the rule: it uses ISO 10646 for wchar_t in
all locales.
A C99 implementation may define __STDC_ISO_10646__ (as a yyyymmL
constant) to indicate that wchar_t values are ISO 10646 code points.
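A minimal sketch of checking this at compile time from C++, assuming only the
standard __STDC_ISO_10646__ macro and <climits>. The helper names
iso10646_revision and wchar_bits are hypothetical. On glibc-based Linux the
macro is typically predefined; elsewhere it may be absent, in which case the
wchar_t encoding remains implementation-defined.

```cpp
#include <climits>  // CHAR_BIT

// Hypothetical helper: returns the yyyymmL value of __STDC_ISO_10646__
// when the implementation promises that wchar_t holds ISO 10646 code
// points, or 0 when it makes no such promise.
long iso10646_revision() {
#ifdef __STDC_ISO_10646__
    return __STDC_ISO_10646__;
#else
    return 0;
#endif
}

// Hypothetical helper: width of wchar_t in bits. The width is also
// implementation-defined (commonly 32 on Unix, 16 on Windows).
unsigned wchar_bits() {
    return static_cast<unsigned>(sizeof(wchar_t) * CHAR_BIT);
}
```

A nonzero result from iso10646_revision() is a promise about the encoding;
wchar_bits() on its own says nothing about which encoding is used.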
> Well this is the most problematic but can anyone tell me why *NIXes
> chose 32bit wchar_t?
For one thing, ISO 10646 codes each character in four octets
(UCS-4). Furthermore, the BMP (Basic Multilingual Plane) is not
sufficient in the long run.
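To illustrate why 16 bits per unit falls short, here is a sketch of UTF-16
encoding for a single code point (it uses the C++11 types char16_t and
char32_t; the helper name to_utf16 is hypothetical). A BMP character fits in
one 16-bit unit, but anything above U+FFFF needs a surrogate pair, whereas a
32-bit wchar_t (UCS-4) always holds one character per unit.

```cpp
#include <string>

// Hypothetical helper: encode one ISO 10646 code point as UTF-16.
// Assumes cp is a valid scalar value (<= 0x10FFFF, not a surrogate).
std::u16string to_utf16(char32_t cp) {
    if (cp <= 0xFFFF) {
        // BMP: a single 16-bit unit is enough.
        return std::u16string(1, static_cast<char16_t>(cp));
    }
    // Outside the BMP: subtract 0x10000, leaving 20 significant bits,
    // then split them 10/10 across a high and a low surrogate.
    char32_t v = cp - 0x10000;
    char16_t high = static_cast<char16_t>(0xD800 + (v >> 10));
    char16_t low  = static_cast<char16_t>(0xDC00 + (v & 0x3FF));
    return std::u16string{high, low};
}
```

For example, U+1D11E (in plane 1) comes out as the two units D834 DD1E, so
code that assumes one 16-bit unit per character would split it in half.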
> It seems that for most of the living languages 16bit UTF-16 or the
> BMP plane of ISO-10646 is more than enough.
It is not nearly enough. Assignments to plane 1 and plane 2 are in
progress, and plane 14 is reserved for language tagging. See the
Unicode Consortium pages for details.
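A small sketch of the plane arithmetic behind this: each ISO 10646 plane holds
0x10000 (65,536) code points, so the plane number is the code point shifted
right by 16 bits. The helper names plane_of and beyond_bmp are hypothetical.
The sample code points used in the checks come from later Unicode versions:
U+1D11E MUSICAL SYMBOL G CLEF (plane 1), U+20000, the first CJK Extension B
ideograph (plane 2), and U+E0001 LANGUAGE TAG (plane 14).

```cpp
// Hypothetical helper: the ISO 10646 plane of a code point.
// Plane 0 is the BMP; planes 1, 2 and 14 are the ones mentioned above.
unsigned plane_of(char32_t cp) {
    return static_cast<unsigned>(cp >> 16);
}

// Hypothetical helper: true if the code point lies outside the BMP and
// therefore cannot be stored in a single 16-bit unit.
bool beyond_bmp(char32_t cp) {
    return plane_of(cp) != 0;
}
```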
Regards,
Martin