This is the mail archive of the libstdc++@gcc.gnu.org mailing list for the libstdc++ project.


Re: Understanding implementation details of wcout and wchar_t vs char32_t

From: Paolo Carlini <paolo dot carlini at oracle dot com>
To: phorgan1 at yahoo dot com
Cc: libstdc++ at gcc dot gnu dot org
Date: Sun, 28 Feb 2010 11:59:16 +0100
Subject: Re: Understanding implementation details of wcout and wchar_t vs char32_t
References: <4B8A1DDA.7060501@yahoo.com>

Hi,

this is a big topic, which indeed should either be covered more
completely in the docs or discussed on the gcc-help mailing list
(the OR is not exclusive ;). We also have some open PRs in this area.

Anyway, a few basic clarifications:
> I suspect that this is off topic for the libstdc++ mailing list, but
> also suspect that the only people that point me straight are some of
> the people on this list.  If you can send me towards anywhere more
> appropriate I'd be grateful.
>
> I want to learn how to write code that can store and deal with any
> code point, so unicode 5.2 is a no brainer.  I've written a code
> conversion facet to wchar_t to char that converts between utf-8
> external and wchar_t unicode internally.  It's tested and works fine,
> but should it instead use char32_t for its internal representation?
>
> As I've read the specs (both C and C++ since C++ doesn't specify much
> about stdout/in/err), I have been surprised at how much is either
> listed as implementation defined or not specified at all, and maybe
> some of what I need to know is where to read about the gcc libstdc++
> implementation in particular?  I'd love it if things were well enough
> specified so that I could write portable code.  If you could help this
> boy learn to fish, I'd be grateful.
As you correctly point out, the C++03 standard is pretty vague in
this area. Now, since you are interested in char16_t and char32_t, which
only officially exist in C++1x, I would first suggest looking closely at
the latest working draft (WD):

  http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2010/n3035.pdf

but at the same time I have to warn you that at the moment very little
of the C++1x-specific facilities in this area is implemented. In
particular, the code conversion facets between the internal char16_t and
char32_t representations and the external char-based representations
are not implemented yet. Help is certainly welcome. At the moment our
plan is to wait a bit for glibc support, which we normally use as the
lower-level layer in this area.
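
To make concrete what such a conversion between an external char-based
representation and an internal char32_t one involves, here is a minimal
hand-written sketch of a UTF-8 to UTF-32 decoder. It is only an
illustration: decode_utf8 is a hypothetical name, not a libstdc++
facility and not a codecvt facet, and it assumes a compiler that already
supports char32_t and std::u32string.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper (not part of libstdc++): decode a UTF-8 byte
// sequence into UTF-32 code points, throwing on ill-formed input.
std::u32string decode_utf8(const std::string& in)
{
    std::u32string out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char lead = static_cast<unsigned char>(in[i]);
        char32_t cp;
        std::size_t extra;  // number of continuation bytes expected

        if (lead < 0x80)                { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else throw std::invalid_argument("invalid UTF-8 lead byte");

        if (extra > in.size() - i - 1)
            throw std::invalid_argument("truncated UTF-8 sequence");

        // Each continuation byte contributes its low six bits.
        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char c = static_cast<unsigned char>(in[i + k]);
            if ((c & 0xC0) != 0x80)
                throw std::invalid_argument("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (c & 0x3F);
        }

        // Reject overlong encodings, surrogates and out-of-range values.
        static const char32_t min_for_len[] = { 0x0, 0x80, 0x800, 0x10000 };
        if (cp < min_for_len[extra] || cp > 0x10FFFF
            || (cp >= 0xD800 && cp <= 0xDFFF))
            throw std::invalid_argument("ill-formed UTF-8 sequence");

        out.push_back(cp);
        i += extra + 1;
    }
    return out;
}
```

For instance, decode_utf8("\xE2\x82\xAC") yields a one-element string
holding U+20AC. A real facet would additionally have to cope with
partial input across buffer boundaries, which this sketch ignores.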

That said, note also that we do not provide any sort of char <-> char
conversions, which can certainly be useful in some cases. Also, by
default, wcin & co. are *nonconverting* and synced character by
character with the C I/O. The user can change both at once by calling
sync_with_stdio(false).
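
As a rough sketch of that last point: the helper below (a hypothetical
name, not something we ship) turns off the synchronization with C stdio
and then imbues the standard wide streams with a locale, so that its
codecvt facet is used for conversion. It assumes it is called before any
I/O has been performed on the standard streams.

```cpp
#include <iostream>
#include <locale>

// Hypothetical helper: switch the standard wide streams from the
// default synced, nonconverting mode to converting mode.
// Assumption: no I/O has been done on the standard streams yet.
void use_converting_wide_streams(const std::locale& loc)
{
    // Drop the character-by-character synchronization with C stdio.
    std::ios_base::sync_with_stdio(false);

    // From here on, conversions go through the codecvt facet of loc.
    std::wcout.imbue(loc);
    std::wcin.imbue(loc);
    std::wcerr.imbue(loc);
}
```

A typical call would be use_converting_wide_streams(std::locale("")),
which picks up the locale named by the environment; note that
std::locale("") throws std::runtime_error if that locale is not
available on the system.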

As I tried to say already, help in this area is definitely welcome,
under the normal rules of course. If you are interested, see:

    http://gcc.gnu.org/contribute.html

and ask me privately for the FSF questionnaire, which must be returned to
get the copyright assignment forms.

Paolo.

References:
- Understanding implementation details of wcout and wchar_t vs char32_t
  - From: Patrick Horgan
