This is the mail archive of the
libstdc++@gcc.gnu.org
mailing list for the libstdc++ project.
[RFC 0/2] C++11 codecvt specializations.
- From: Rüdiger Sonderfeld <ruediger at c-plusplus dot de>
- To: libstdc++ at gcc dot gnu dot org
- Date: Fri, 25 Apr 2014 23:44:56 +0200
- Subject: [RFC 0/2] C++11 codecvt specializations.
Hello,
I started implementing a patch to add the codecvt specializations required by
the C++11 standard (22.4.1.4.3/Table 81 [locale.codecvt]) and the standard
code conversion facets (22.5 [locale.stdcvt]). Adding them requires a
significant number of changes and several design decisions, so I wanted
to ask for some feedback now.
The implementation requires some code to convert between UTF-32, UTF-8, and
UTF-16. I couldn't find any code for it in libstdc++. I looked into gnulib,
but it seems the conversion there is done through iconv, which might not be
the best choice if the codecs are known in advance. I had some existing code
for such a conversion, which I adapted and added to the patch.
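To make the kind of conversion meant here concrete, the following is a minimal sketch of one direction, UTF-32 to UTF-16. It is not the code from the patch; the name sketch_utf32_to_utf16 and its interface are assumptions made for illustration only.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not libstdc++ code): encode one UTF-32 code point
// as UTF-16. Returns the number of char16_t units written (1 or 2), or 0
// if cp is not a valid Unicode scalar value.
inline std::size_t
sketch_utf32_to_utf16(char32_t cp, char16_t (&out)[2])
{
  if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
    return 0;                        // out of range or a lone surrogate
  if (cp < 0x10000)
    {
      out[0] = static_cast<char16_t>(cp);   // BMP: a single unit
      return 1;
    }
  cp -= 0x10000;                     // 20 bits remain, split 10/10
  out[0] = static_cast<char16_t>(0xD800 + (cp >> 10));    // high surrogate
  out[1] = static_cast<char16_t>(0xDC00 + (cp & 0x3FF));  // low surrogate
  return 2;
}

// Usage example: U+1F600 needs a surrogate pair, U+20AC does not.
inline void
sketch_utf16_demo()
{
  char16_t buf[2];
  assert(sketch_utf32_to_utf16(U'\U0001F600', buf) == 2);
  assert(buf[0] == 0xD83D && buf[1] == 0xDE00);
  assert(sketch_utf32_to_utf16(U'\u20AC', buf) == 1);
  assert(buf[0] == 0x20AC);
}
```

Because both encodings are fixed in advance, the conversion is a few shifts and range checks and needs no run-time codec lookup.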
I'm not sure whether my code leaves the from_next pointer at the right place
if it encounters an error while converting UTF-8. If it discovers an invalid
sequence, the from_next pointer basically points to the first byte that breaks
the sequence. I think this is the right behavior according to the standard,
and it has the advantage that this might be the point to recover from (e.g.,
in a text stream with one invalid byte, "a\xFFb", from_next would point to b
and allow restarting the decoding, simply skipping the error). But maybe my
interpretation of the standard is wrong, since the specification seems a bit
vague to me (22.4.1.4.2.2).
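The error-position question can be illustrated with a small stand-alone decoder. This is a sketch under assumptions, not the code from the patch: sketch_result and sketch_utf8_to_utf32 are hypothetical names, and the sketch takes one possible reading of the rule, namely that from_next stops on the offending byte itself. Under that reading an invalid lead byte such as 0xFF is the breaking byte, so the caller resumes at from_next + 1; the patch may place the pointer differently.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical result codes, mirroring std::codecvt_base::result.
enum class sketch_result { ok, partial, error };

// Hypothetical helper: decode UTF-8 from [from, from_end) into UTF-32.
inline sketch_result
sketch_utf8_to_utf32(const char* from, const char* from_end,
                     const char*& from_next,
                     char32_t* to, char32_t* to_end, char32_t*& to_next)
{
  from_next = from;
  to_next = to;
  while (from_next != from_end)
    {
      if (to_next == to_end)
        return sketch_result::partial;        // output buffer is full
      const unsigned char lead = static_cast<unsigned char>(*from_next);
      char32_t cp, lowest;
      int trail;
      if (lead < 0x80)                { cp = lead;        lowest = 0;       trail = 0; }
      else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; lowest = 0x80;    trail = 1; }
      else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; lowest = 0x800;   trail = 2; }
      else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; lowest = 0x10000; trail = 3; }
      else
        return sketch_result::error;          // invalid lead: from_next stays on it
      const char* p = from_next + 1;
      for (int i = 0; i < trail; ++i, ++p)
        {
          if (p == from_end)
            return sketch_result::partial;    // truncated: from_next stays on lead
          const unsigned char c = static_cast<unsigned char>(*p);
          if ((c & 0xC0) != 0x80)
            {
              from_next = p;                  // the byte that breaks the sequence
              return sketch_result::error;
            }
          cp = (cp << 6) | (c & 0x3F);
        }
      // Overlong forms, surrogates and values past U+10FFFF are rejected
      // with from_next left at the start of the sequence (a simplification).
      if (cp < lowest || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return sketch_result::error;
      *to_next++ = cp;
      from_next = p;
    }
  return sketch_result::ok;
}

// Usage example: decode 'a', 0xFF, 'b', then skip the bad byte and resume.
inline void
sketch_utf8_demo()
{
  const char in[] = { 'a', '\xFF', 'b' };
  char32_t out[3];
  const char* from_next;
  char32_t* to_next;
  sketch_result r = sketch_utf8_to_utf32(in, in + 3, from_next,
                                         out, out + 3, to_next);
  assert(r == sketch_result::error);
  assert(from_next == in + 1);                // stopped on the 0xFF byte
  assert(to_next == out + 1 && out[0] == U'a');
  r = sketch_utf8_to_utf32(from_next + 1, in + 3, from_next,
                           to_next, out + 3, to_next);
  assert(r == sketch_result::ok && from_next == in + 3 && out[1] == U'b');
}
```

When a lead byte is followed by a non-continuation byte (for example 0xC3 followed by b), this sketch leaves from_next on the b, so decoding can restart there directly.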
It seems there are no facet specializations for char32_t (or char16_t) in
libstdc++ at all at the moment, and they probably need to be added. Right now
I only add the specializations needed for codecvt. That's why
c32locale-inst.cc does not include locale-inst.cc.
For implementing [locale.stdcvt] I'm not sure how best to approach it. The
standard adds two template parameters `unsigned long _Maxcode' and
`codecvt_mode _Mode'. Implementing this as a template would require exposing
the internal UTF-8 conversion functions, and a user might end up instantiating
a lot of code. I think the best approach would be to introduce a base class
which takes _Maxcode and _Mode as constructor arguments and hands them
internally to the conversion functions. Kinda like:
  template<typename _Elem>
    struct __codecvt_utf8_base
    : codecvt<_Elem, char, mbstate_t>
    {
      __codecvt_utf8_base(unsigned long __maxcode, codecvt_mode __mode);
      // ...
    };

  template<typename _Elem, unsigned long _Maxcode, codecvt_mode _Mode>
    struct codecvt_utf8
    : __codecvt_utf8_base<_Elem>
    {
      codecvt_utf8() : __codecvt_utf8_base<_Elem>(_Maxcode, _Mode) { }
      // ...
    };
(a bit simplified)
That way specializations of __codecvt_utf8_base for char32_t, char16_t, and
wchar_t can be implemented in the library instead of the header, similar to
codecvt right now. But I'm not sure if this is allowed by the standard, since
it explicitly defines `codecvt_utf8' to derive directly from `codecvt'.
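The trade-off in this design can be shown with a stand-alone analogue that does not touch the real facets. Everything here is hypothetical (sketch_mode, sketch_utf8_base, sketch_codecvt_utf8); the point is only that the logic lives once in a base that receives the limits at run time, while the template is a thin forwarding shell.

```cpp
#include <cassert>

// Hypothetical stand-in for std::codecvt_mode, using the same bit values.
enum sketch_mode
{
  sketch_little_endian = 1,
  sketch_generate_header = 2,
  sketch_consume_header = 4
};

// Hypothetical base: it is not parameterized on the limits, so its member
// functions could be defined once inside the library.
class sketch_utf8_base
{
public:
  sketch_utf8_base(unsigned long maxcode, sketch_mode mode)
  : maxcode_(maxcode), mode_(mode) { }

  // Run-time check that a templated version would do at compile time.
  bool accepts(char32_t cp) const
  { return cp <= maxcode_ && cp <= 0x10FFFF; }

  bool generates_header() const
  { return (mode_ & sketch_generate_header) != 0; }

private:
  unsigned long maxcode_;
  sketch_mode mode_;
};

// Thin template shell: each instantiation only adds a trivial constructor.
template<unsigned long Maxcode = 0x10FFFF,
         sketch_mode Mode = static_cast<sketch_mode>(0)>
  struct sketch_codecvt_utf8 : sketch_utf8_base
  {
    sketch_codecvt_utf8() : sketch_utf8_base(Maxcode, Mode) { }
  };

// Usage example: two instantiations, one shared implementation.
inline void
sketch_base_demo()
{
  sketch_codecvt_utf8<> full;
  sketch_codecvt_utf8<0xFFFF, sketch_generate_header> bmp_only;
  assert(full.accepts(U'\U0001F600'));
  assert(!bmp_only.accepts(U'\U0001F600'));
  assert(bmp_only.generates_header() && !full.generates_header());
}
```

The cost is that the limits become run-time data rather than compile-time constants, and the class hierarchy gains an extra level that the standard's synopsis does not show.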
I'm also not sure how to treat wchar_t. Is it defined for libstdc++ to be
generally UCS4 (UTF-32)? Or does libstdc++ also support legacy systems where
wchar_t is only UCS2?
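One way to make that question checkable at compile time is to test the range of wchar_t. The sketch below is an assumption-laden illustration: sketch_wchar_holds_ucs4 and sketch_fits_in_wchar are hypothetical names, and the check only establishes whether wchar_t is wide enough for every code point, not which encoding a platform actually stores in it.

```cpp
#include <cassert>
#include <limits>

// True when wchar_t can represent every code point up to U+10FFFF.
// This says nothing about the encoding the platform actually uses.
constexpr bool sketch_wchar_holds_ucs4 =
  static_cast<unsigned long long>(std::numeric_limits<wchar_t>::max())
    >= 0x10FFFFull;

// Hypothetical helper: can this code point be stored in one wchar_t?
inline bool
sketch_fits_in_wchar(char32_t cp)
{
  return static_cast<unsigned long long>(cp)
    <= static_cast<unsigned long long>(std::numeric_limits<wchar_t>::max());
}

// Usage example: ASCII always fits; U+1F600 fits only in a wide wchar_t.
inline void
sketch_wchar_demo()
{
  assert(sketch_fits_in_wchar(U'a'));
  assert(sketch_fits_in_wchar(U'\U0001F600') == sketch_wchar_holds_ucs4);
}
```

A facet for wchar_t could branch on such a constant to choose between a single-unit and a 16-bit code path.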
Regards,
Rüdiger
Rüdiger Sonderfeld (2):
libstdc++: Use _GLIBCXX_NOEXCEPT for codecvt.
libstdc++: Add codecvt<char32_t, char, mbstate_t>.
libstdc++-v3/config/abi/pre/gnu.ver | 5 +
libstdc++-v3/include/bits/codecvt.h | 106 ++++-
libstdc++-v3/include/bits/locale_facets.h | 6 +-
libstdc++-v3/src/c++11/Makefile.am | 6 +-
libstdc++-v3/src/c++11/Makefile.in | 17 +-
libstdc++-v3/src/c++11/c32locale-inst.cc | 57 +++
libstdc++-v3/src/c++11/codecvt.cc | 410 +++++++++++++++++
libstdc++-v3/src/c++11/locale_init.cc | 487 +++++++++++++++++++++
libstdc++-v3/src/c++11/localename.cc | 357 +++++++++++++++
libstdc++-v3/src/c++98/Makefile.am | 3 -
libstdc++-v3/src/c++98/Makefile.in | 12 +-
libstdc++-v3/src/c++98/codecvt.cc | 151 -------
libstdc++-v3/src/c++98/locale_init.cc | 472 --------------------
libstdc++-v3/src/c++98/localename.cc | 352 ---------------
.../testsuite/22_locale/codecvt/char32_t/1.cc | 101 +++++
15 files changed, 1532 insertions(+), 1010 deletions(-)
create mode 100644 libstdc++-v3/src/c++11/c32locale-inst.cc
create mode 100644 libstdc++-v3/src/c++11/codecvt.cc
create mode 100644 libstdc++-v3/src/c++11/locale_init.cc
create mode 100644 libstdc++-v3/src/c++11/localename.cc
delete mode 100644 libstdc++-v3/src/c++98/codecvt.cc
delete mode 100644 libstdc++-v3/src/c++98/locale_init.cc
delete mode 100644 libstdc++-v3/src/c++98/localename.cc
create mode 100644 libstdc++-v3/testsuite/22_locale/codecvt/char32_t/1.cc
--
1.9.2