This is the mail archive of the libstdc++@gcc.gnu.org mailing list for the libstdc++ project.
Re: Understanding implementation details of wcout and wchar_t vs char32_t
- From: Paolo Carlini <paolo dot carlini at oracle dot com>
- To: phorgan1 at yahoo dot com
- Cc: libstdc++ at gcc dot gnu dot org
- Date: Sun, 28 Feb 2010 11:59:16 +0100
- Subject: Re: Understanding implementation details of wcout and wchar_t vs char32_t
- References: <4B8A1DDA.7060501@yahoo.com>
Hi,
this is a big topic, which indeed should either be covered more
completely in the docs or be discussed on the gcc-help mailing list
(the OR is not exclusive ;) We also have some open PRs in this area.
Anyway, a few basic clarifications:
> I suspect that this is off topic for the libstdc++ mailing list, but
> also suspect that the only people who can point me straight are some of
> the people on this list. If you can send me towards anywhere more
> appropriate I'd be grateful.
>
> I want to learn how to write code that can store and deal with any
> code point, so Unicode 5.2 is a no-brainer. I've written a code
> conversion facet from wchar_t to char that converts between UTF-8
> externally and wchar_t Unicode internally. It's tested and works fine,
> but should it instead use char32_t for its internal representation?
>
> As I've read the specs (both C and C++, since C++ doesn't specify much
> about stdout/in/err), I have been surprised at how much is either
> listed as implementation-defined or not specified at all, and maybe
> some of what I need to know is where to read about the gcc libstdc++
> implementation in particular? I'd love it if things were well enough
> specified that I could write portable code. If you could help this
> boy learn to fish, I'd be grateful.
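The conversion the poster describes (UTF-8 externally, one code point per wide character internally) boils down to encoding each code point as one to four bytes. The following is a minimal sketch, not the poster's facet and not libstdc++ code; the helper names `append_utf8` and `utf32_to_utf8` are hypothetical. It takes `char32_t` because, unlike `wchar_t` (whose width is implementation-defined), `char32_t` is guaranteed wide enough for any code point up to U+10FFFF.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical helper: append the UTF-8 encoding of one Unicode code point.
// Rejects UTF-16 surrogates (U+D800..U+DFFF) and values above U+10FFFF.
void append_utf8(std::string& out, char32_t cp) {
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        throw std::invalid_argument("not a Unicode scalar value");
    if (cp < 0x80) {                 // 1 byte:  0xxxxxxx
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {         // 2 bytes: 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {       // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {                         // 4 bytes: 11110xxx 10xxxxxx x3
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Hypothetical helper: encode a whole UTF-32 string as UTF-8.
std::string utf32_to_utf8(const std::u32string& in) {
    std::string out;
    for (char32_t cp : in) append_utf8(out, cp);
    return out;
}
```

For example, `utf32_to_utf8(U"\u00e9")` yields the two bytes `0xC3 0xA9`. A real `codecvt` facet additionally has to handle partial output buffers and report errors through `codecvt_base::result` rather than exceptions.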
As you correctly point out, the C++03 standard is pretty vague in
this area. Now, since you are interested in char16_t and char32_t, which
only officially exist in C++1x, I would first suggest looking closely at
the latest WD:
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2010/n3035.pdf
but at the same time I have to warn you that at the moment very little
of the C++1x-specific facilities in this area is implemented. In
particular, the code conversion facets between the internal char16_t and
char32_t representations and the external char-based representations
are still missing. Help is certainly welcome. At the moment our plan is
to wait a bit for glibc support, which we normally use as the lower-level
layer in this area.
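For reference, the C++11 standard specifies `std::codecvt<char32_t, char, std::mbstate_t>` as converting between UTF-32 and UTF-8 (this specialization is deprecated since C++20). The sketch below assumes a toolchain that implements it, which, per this message, libstdc++ did not yet do at the time of writing. The function name `u32_to_utf8_via_facet` is hypothetical.

```cpp
#include <cstddef>    // std::size_t
#include <cwchar>     // std::mbstate_t
#include <locale>     // std::codecvt, std::codecvt_base, std::use_facet
#include <stdexcept>
#include <string>

// Hypothetical helper: UTF-32 -> UTF-8 through the standard C++11 facet
// std::codecvt<char32_t, char, std::mbstate_t> of the classic locale.
std::string u32_to_utf8_via_facet(const std::u32string& in) {
    using facet_t = std::codecvt<char32_t, char, std::mbstate_t>;
    const facet_t& f = std::use_facet<facet_t>(std::locale::classic());

    std::mbstate_t state{};
    // max_length(): the most external chars a single internal char may need,
    // so this buffer is large enough for the whole input.
    std::string out(in.size() * static_cast<std::size_t>(f.max_length()), '\0');
    const char32_t* from_next = nullptr;
    char* to_next = nullptr;
    std::codecvt_base::result r =
        f.out(state, in.data(), in.data() + in.size(), from_next,
              &out[0], &out[0] + out.size(), to_next);
    if (r != std::codecvt_base::ok)
        throw std::range_error("UTF-32 -> UTF-8 conversion failed");
    out.resize(static_cast<std::size_t>(to_next - &out[0]));  // drop unused tail
    return out;
}
```

Going through the facet keeps the conversion inside the locale machinery, which is what `basic_filebuf` uses when a stream of that character type is imbued with the locale.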
That said, note also that we do not provide any sort of char <-> char
conversions, which can certainly be useful in some cases. Also, by
default, wcin & co. are *nonconverting* and synced char by char with the
C I/O. The user can change both at once by calling
std::ios_base::sync_with_stdio(false).
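A minimal sketch of both points, assuming only standard C++ calls. The standard requires `std::codecvt<char, char, std::mbstate_t>` to be the degenerate, non-converting facet, which `always_noconv()` reports. Unsyncing the standard streams and then imbuing `std::wcout` lets the stream's own `codecvt<wchar_t, char, mbstate_t>` facet take part in output. The helper names `char_char_is_noconv` and `use_converting_wcout` are hypothetical.

```cpp
#include <cwchar>     // std::mbstate_t
#include <iostream>   // std::wcout, std::ios_base
#include <locale>     // std::locale, std::codecvt, std::use_facet

// Hypothetical helper: true if the char <-> char facet of `loc` performs
// no conversion at all (the standard specialization never converts).
bool char_char_is_noconv(const std::locale& loc) {
    using facet_t = std::codecvt<char, char, std::mbstate_t>;
    return std::use_facet<facet_t>(loc).always_noconv();
}

// Hypothetical helper: detach the standard streams from C stdio, then imbue
// std::wcout with `loc` so that its codecvt<wchar_t, char, mbstate_t> facet
// is used for output. Call it before any I/O on the standard streams.
void use_converting_wcout(const std::locale& loc) {
    std::ios_base::sync_with_stdio(false);
    std::wcout.imbue(loc);
}
```

A typical call would be `use_converting_wcout(std::locale(""))` to pick up the environment's locale (for example a UTF-8 one). Note that `std::locale("")` throws `std::runtime_error` if the environment names a locale the system does not know.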
As I tried to say already, help in this area is definitely welcome,
under the normal rules, of course. If you are interested:
http://gcc.gnu.org/contribute.html
and ask me privately for the FSF questionnaire, which must be returned to
get the copyright assignment forms.
Paolo.