This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.

Re: Proposal for 2 Byte Unicode implementation in gcc and glibc

To: Andrew Cunningham <andjc at ozemail dot com dot au>
Subject: Re: Proposal for 2 Byte Unicode implementation in gcc and glibc
From: Jamie Lokier <egcs at tantalophile dot demon dot co dot uk>
Date: Fri, 4 Aug 2000 15:20:56 +0200
Cc: linux-utf8 at nl dot linux dot org, sap-list at redhat dot com, gcc at gcc dot gnu dot org, libc-hacker at sources dot redhat dot com, "Nuesser, Wilhelm" <wilhelm dot nuesser at sap dot com>, "Rohland, Hans-Christoph" <hans-christoph dot rohland at sap dot com>
References: <816D93CCC927D31188570008C75D1DE1011A0BDF@dbwdfx1a.wdf.sap-ag.de> <000e01bffe12$b0075440$7dd2223f@libadmin>

Andrew Cunningham wrote:
> any implementation of utf-16 must include the capacity to correctly handle
> valid surrogate pairs. You can't restrict utf-16 characters to 2 bytes.

That's why conversion from utf-16 to utf-32 should be analogous to
conversion from utf-8 to wchar_t, à la mbtowcs.  The same rules about
character-by-character processing apply: a single character may span
more than one code unit (a surrogate pair in utf-16).  You may wish to
use utf32_t for the intermediate characters, e.g. in a simple parser.
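
As a rough illustration of the analogy, here is a minimal sketch of a
decoder that consumes one character at a time from a utf-16 buffer, much
as mbtowc does for a multibyte string.  The name utf16_decode and its
return convention are made up for this example, and utf32_t is assumed
here to be a plain 32-bit unsigned typedef; none of this is an existing
gcc or glibc interface.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: utf32_t is simply a 32-bit unsigned integer. */
typedef uint32_t utf32_t;

/*
 * Hypothetical helper: decode one character from src[0..n).
 * Returns the number of 16-bit units consumed (1 or 2),
 * 0 if the buffer is empty, or -1 if the input starts with an
 * unpaired or truncated surrogate.
 */
static int utf16_decode(const uint16_t *src, size_t n, utf32_t *out)
{
    if (n == 0)
        return 0;

    uint16_t hi = src[0];

    /* Anything outside D800..DFFF is a complete character by itself. */
    if (hi < 0xD800 || hi > 0xDFFF) {
        *out = hi;
        return 1;
    }

    /* A low surrogate (DC00..DFFF) must not come first. */
    if (hi >= 0xDC00)
        return -1;

    /* A high surrogate needs a following unit. */
    if (n < 2)
        return -1;

    uint16_t lo = src[1];
    if (lo < 0xDC00 || lo > 0xDFFF)
        return -1;

    /* Combine the two 10-bit halves and add the 0x10000 offset. */
    *out = 0x10000 + (((utf32_t)(hi - 0xD800) << 10) | (utf32_t)(lo - 0xDC00));
    return 2;
}
```

A parser would call this in a loop, advancing by the returned count and
working only on the utf32_t values, so it never sees half of a pair.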

-- Jamie

References:
- Proposal for 2 Byte Unicode implementation in gcc and glibc
  - From: Nuesser, Wilhelm
- Re: Proposal for 2 Byte Unicode implementation in gcc and glibc
  - From: Andrew Cunningham
