UTF-8 support - char or wchar_t?

Paolo Carlini pcarlini@suse.de
Sun Jun 20 18:11:00 GMT 2004


Ole Laursen wrote:

> The closing comments in my bug report seem to suggest that I should
> use the wchar_t equivalents of the stringstream classes. As far as I
> know, that would mean that I get 4 byte wide character strings in
> UCS-32 (but am I guaranteed that?). I can perhaps find a way to
> convert this to UTF-8, but doesn't libstdc++ support UTF-8 in a more
> direct manner?

As I tried to explain in the PR, in the ISO C++ Standard the thousands
separator is a *single* character in the internal encoding
(numpunct<charT>::thousands_sep() returns exactly one charT); there is
*no* doubt about this. In general, then, that internal encoding had
better be sufficiently wide, i.e., wchar_t. Afterwards, you need a
converting stream in order to actually convert the internal
representation to the external representation (UTF-8, for instance)
and vice versa. This happens with fstreams, which (as mandated by the
ISO standard, again) automatically use the codecvt facet for this
purpose during I/O. You only have to imbue the stream with the
appropriate locale.
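A minimal sketch of the approach described above, assuming C++11 or
later. The facet class thin_space_numpunct and the function
write_grouped_utf8 are hypothetical names for illustration, and U+2009
THIN SPACE stands in for any separator that needs more than one byte in
UTF-8. Instead of a named system locale such as "en_US.UTF-8" (whose
availability varies between systems), the sketch swaps in
std::codecvt_utf8<wchar_t> (added in C++11, deprecated since C++17) as
the codecvt facet, so it does not depend on installed locales. A
wostringstream would not perform the conversion: only the file stream
buffer consults codecvt.

```cpp
#include <cassert>
#include <codecvt>   // std::codecvt_utf8 (C++11, deprecated since C++17)
#include <fstream>
#include <iterator>
#include <locale>
#include <string>

// Hypothetical numpunct facet: the thousands separator is a single wchar_t,
// here U+2009 THIN SPACE, which takes three bytes once encoded as UTF-8.
struct thin_space_numpunct : std::numpunct<wchar_t> {
protected:
    wchar_t do_thousands_sep() const override { return L'\u2009'; }
    std::string do_grouping() const override { return "\3"; }  // groups of 3 digits
};

// Writes `value` through a wofstream whose locale converts the internal
// wchar_t representation to UTF-8, then returns the raw bytes of the file.
std::string write_grouped_utf8(long value, const std::string& path) {
    // Start from the classic locale, add a wchar_t <-> UTF-8 codecvt facet,
    // then the custom numpunct facet. The locale takes ownership of both.
    std::locale utf8(std::locale::classic(), new std::codecvt_utf8<wchar_t>);
    std::locale loc(utf8, new thin_space_numpunct);

    std::wofstream out;
    out.imbue(loc);   // imbue before opening, i.e. before any I/O
    out.open(path);
    out << value;     // formatted as wchar_t digits and separators
    out.close();      // flushes: the filebuf converts the wchar_t text to UTF-8
    assert(!out.fail() && "write or UTF-8 conversion failed");

    // Read the external (UTF-8) bytes back without any conversion.
    std::ifstream in(path, std::ios::binary);
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}
```

For example, write_grouped_utf8(1234567, "grouped.txt") should return
the bytes "1\xE2\x80\x89" "234\xE2\x80\x89" "567": the digits are
grouped by three, and each wchar_t separator reaches the file as the
three-byte UTF-8 sequence E2 80 89.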

Frankly, I'm not sure I understand in detail what you are trying to
accomplish via the C++ run-time library: the facilities available are
only those mandated by the Standard, nothing more (and not much less, we
sincerely hope ;)

For more information, besides the Standard itself, I would recommend
the nice books by Josuttis ("The C++ Standard Library") and by
Langer & Kreft ("Standard C++ IOStreams and Locales").

Paolo.

P.S. As I explained in the PR, you need GCC 3.4.0 for iostreams and
UTF-8 locales to work well together.
