This is the mail archive of the mailing list for the libstdc++ project.


Re: Understanding implementation details of wcout and wchar_t vs char32_t


This is a big topic, which indeed should be either covered more
completely in the docs or discussed on the gcc-help mailing list
(the OR is not exclusive ;). We also have some open PRs in this area.

Anyway, a few basic clarifications:
> I suspect that this is off topic for the libstdc++ mailing list, but
> also suspect that the only people that point me straight are some of
> the people on this list.  If you can send me towards anywhere more
> appropriate I'd be grateful.
> I want to learn how to write code that can store and deal with any
> code point, so unicode 5.2 is a no brainer.  I've written a code
> conversion facet from wchar_t to char that converts between utf-8
> external and wchar_t unicode internally.  It's tested and works fine,
> but should it instead use char32_t for its internal representation?
> As I've read the specs (both C and C++ since C++ doesn't specify much
> about stdout/in/err), I have been surprised at how much is either
> listed as implementation defined or not specified at all, and maybe
> some of what I need to know is where to read about the gcc libstdc++
> implementation in particular?  I'd love if things were well enough
> specified so that I could write portable code.  If you could help this
> boy learn to fish, I'd be grateful.
As you correctly point out, the C++03 standard is pretty vague in
this area. Now, since you are interested in char16_t and char32_t, which
only officially exist in C++1x, I would first suggest looking closely at
the latest WD:

but at the same time I have to warn you that at the moment very little
of the C++1x-specific facilities in this area is implemented. In
particular, the code conversion facets between the internal char16_t and
char32_t representations and the external char-based representations
are missing. Help is certainly welcome. At the moment our plan is to wait
a bit for glibc support, which we normally use as the lower-level layer
in this area.
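
To make concrete what such a facet has to do on the output side, here is
a minimal hand-written sketch of the char32_t to UTF-8 direction. It is
not the libstdc++ implementation and not a codecvt facet; the names
append_utf8 and to_utf8 are hypothetical, and replacing invalid input
with U+FFFD is an assumption of this sketch, not something the standard
mandates.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper, not a libstdc++ API: append the UTF-8 encoding of
// one code point to `out`. Returns false (and appends nothing) for
// surrogates and for values above U+10FFFF.
bool append_utf8(char32_t cp, std::string& out)
{
    if (cp >= 0xD800 && cp <= 0xDFFF) return false;  // surrogate range
    if (cp > 0x10FFFF) return false;                 // beyond Unicode

    if (cp < 0x80) {                                 // 1 byte: 0xxxxxxx
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {                         // 2 bytes: 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {                       // 3 bytes
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {                                         // 4 bytes
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return true;
}

// Hypothetical helper: convert a whole UTF-32 string, substituting
// U+FFFD for anything append_utf8 rejects (an assumption of this sketch).
std::string to_utf8(const std::u32string& in)
{
    std::string out;
    for (char32_t cp : in)
        if (!append_utf8(cp, out))
            append_utf8(0xFFFD, out);
    return out;
}
```

A real facet would also need the input direction, partial-sequence
handling across buffer boundaries, and error reporting through the
codecvt result codes, none of which is shown here.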

That said, note also that we do not provide any sort of char <-> char
conversions, which can certainly be useful in some cases. Also, by
default, wcin & co. are *nonconverting* and synced char by char to the
C I/O. The user can change both at once by calling
sync_with_stdio(false).
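
A minimal sketch of that switch, assuming it runs before any I/O has
been done on the standard streams. The helper name
enable_converting_wide_io is hypothetical; which locale to pass in
(typically std::locale(""), which can throw if the environment names an
unavailable locale) is up to the caller.

```cpp
#include <cassert>
#include <ios>
#include <iostream>
#include <locale>

// Hypothetical helper: detach the standard streams from C stdio and
// install `loc` on the wide streams, so that their stream buffers use
// that locale's codecvt facet for wchar_t <-> char conversion.
void enable_converting_wide_io(const std::locale& loc)
{
    // Call this before any input or output on the standard streams.
    std::ios_base::sync_with_stdio(false);

    std::wcin.imbue(loc);
    std::wcout.imbue(loc);
    std::wcerr.imbue(loc);
}
```

Typical use would be a call such as
enable_converting_wide_io(std::locale("")) as the first statement of
main(), followed by ordinary std::wcout << L"..." output.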

As I tried to say already, help in this area is definitely welcome,
under the normal rules of course. If you are interested:

and ask me privately for the FSF questionnaire, which must be returned
to get the copyright assignment forms.

