UTF-8, UTF-16 and UTF-32

Eljay Love-Jensen eljay@adobe.com
Sun Aug 24 19:11:00 GMT 2008


Hi Dallas,

> Once again, there are no legacy issues because no one is currently using
> 16-bit Unicode in GCC, it does not exist.

I'm using UTF-16 Unicode in GCC.  I've done so for years.

I do not use wchar_t to specify UTF-16 Unicode, since that is not portable.

The same code runs on different platforms, the Windows platform being
compiled with MSVC++.
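
A minimal sketch of the general idea, not my production code: pick a
code-unit type that is exactly 16 bits everywhere instead of wchar_t, which
is typically 32 bits with GCC on Linux and 16 bits with MSVC++.  The names
utf16_unit and append_utf16 are made up for illustration, and the sketch
assumes <cstdint> is available (older compilers may need <stdint.h> or an
equivalent typedef).

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical name: a UTF-16 code unit that is 16 bits on every platform.
typedef std::uint16_t utf16_unit;

// Hypothetical helper: append one Unicode code point to a UTF-16 sequence.
void append_utf16(std::vector<utf16_unit>& out, std::uint32_t cp)
{
    // Surrogate values and anything above U+10FFFF are not valid code points.
    assert(cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF));

    if (cp < 0x10000) {
        // Basic Multilingual Plane: a single code unit.
        out.push_back(static_cast<utf16_unit>(cp));
    } else {
        // Supplementary planes: split the 20-bit offset into a surrogate pair.
        cp -= 0x10000;
        out.push_back(static_cast<utf16_unit>(0xD800 + (cp >> 10)));
        out.push_back(static_cast<utf16_unit>(0xDC00 + (cp & 0x3FF)));
    }
}
```

Because nothing here depends on the width of wchar_t, the same source
should behave identically under GCC and MSVC++.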

That said, what you say is not without merit: C and C++ do not specify the
character set, let alone the encoding of that character set.

> So I have to ask - what are your arguments for not providing support for all
> three, 8-bit, 16-bit and 32-bit Unicode strings?

It is not part of ISO 9899 (for C), nor ISO 14882 (for C++).

There are languages which support UTF-8, UTF-16, and UTF-32 Unicode strings.
C and C++ are not those languages.

There are support libraries for Unicode (UTF-8, UTF-16, and UTF-32) for C
and C++.  They work on Linux and on Windows.  You are at liberty to use
those.
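
To give a sense of the kind of work such libraries take off your hands,
here is a hand-rolled sketch of one conversion, from a code point (a single
UTF-32 code unit) to UTF-8.  The function name append_utf8 is hypothetical
and is not the API of any real library; a real library also handles
decoding, validation of malformed input, normalization, and much more.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical helper, not taken from any real library: append the UTF-8
// encoding of one code point to a byte string.
void append_utf8(std::string& out, std::uint32_t cp)
{
    // Surrogate values and anything above U+10FFFF are not valid code points.
    assert(cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF));

    if (cp < 0x80) {
        // 1 byte: 0xxxxxxx
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        // 2 bytes: 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}
```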

If you use Microsoft's extensions to C++, your code is no longer C++... it
is MS-C++.  Portability will be a problem, at least until Microsoft comes
out with MSVC++ for Linux, OS X, and whatever other platforms you are
interested in.

Maybe a future version of C and/or C++ will be more Unicode friendly.

Sincerely,
--Eljay


