This is the mail archive of the
gcc@gcc.gnu.org
mailing list for the GCC project.
Re: Proposal for 2 Byte Unicode implementation in gcc and glibc
- To: Andrew Cunningham <andjc at ozemail dot com dot au>
- Subject: Re: Proposal for 2 Byte Unicode implementation in gcc and glibc
- From: Jamie Lokier <egcs at tantalophile dot demon dot co dot uk>
- Date: Fri, 4 Aug 2000 15:20:56 +0200
- Cc: linux-utf8 at nl dot linux dot org, sap-list at redhat dot com, gcc at gcc dot gnu dot org, libc-hacker at sources dot redhat dot com, "Nuesser, Wilhelm" <wilhelm dot nuesser at sap dot com>, "Rohland, Hans-Christoph" <hans-christoph dot rohland at sap dot com>
- References: <816D93CCC927D31188570008C75D1DE1011A0BDF@dbwdfx1a.wdf.sap-ag.de> <000e01bffe12$b0075440$7dd2223f@libadmin>
Andrew Cunningham wrote:
> any implementation of utf-16 must include the capacity to correctly handle
> valid surrogate pairs. You can't restrict utf-16 characters to 2 bytes.
That's why conversion from utf-16 to utf-32 should be analogous to
conversion from utf-8 to wchar_t, à la mbtowcs, and so on. The usual
rules about character-by-character processing apply: one character may
span more than one code unit. You may wish to use utf32_t
for the intermediate characters, e.g. in a simple parser.
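
As a rough sketch of what I mean (not a proposed glibc interface): the
function name utf16_to_utf32, the utf16_t typedef and the return
convention below are all made up for illustration, and utf32_t is
simply assumed to be a 32-bit unsigned integer. It decodes one
character at a time, mbtowc-style, and handles surrogate pairs:

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed typedefs for illustration only; neither is an existing glibc type. */
typedef uint16_t utf16_t;
typedef uint32_t utf32_t;

/*
 * Decode one code point from src, which holds n UTF-16 code units.
 * Modeled loosely on mbtowc (hypothetical interface):
 *   returns 1 or 2  = number of code units consumed, *out is set
 *   returns 0       = no input (n == 0)
 *   returns -1      = ill-formed or truncated input, *out untouched
 */
static int utf16_to_utf32(utf32_t *out, const utf16_t *src, size_t n)
{
    if (n == 0)
        return 0;

    utf16_t hi = src[0];

    /* Anything outside D800..DFFF is a complete character by itself. */
    if (hi < 0xD800 || hi > 0xDFFF) {
        *out = hi;
        return 1;
    }

    /* A low surrogate (DC00..DFFF) must not come first. */
    if (hi >= 0xDC00)
        return -1;

    /* A high surrogate needs a following low surrogate. */
    if (n < 2)
        return -1;

    utf16_t lo = src[1];
    if (lo < 0xDC00 || lo > 0xDFFF)
        return -1;

    /* Combine the two 10-bit halves and add the 0x10000 offset. */
    *out = 0x10000 + (((utf32_t)(hi - 0xD800) << 10) | (utf32_t)(lo - 0xDC00));
    return 2;
}
```

A simple parser would call this in a loop, advance by the returned
count, and do all its comparisons on the utf32_t value rather than on
individual 16-bit units.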
-- Jamie