This is the mail archive of the libstdc++@gcc.gnu.org mailing list for the libstdc++ project.



[RFC 0/2] C++11 codecvt specializations.


Hello,

I started implementing a patch to add the codecvt specializations required by 
the C++11 standard (22.4.1.4.3/Table 81 [locale.codecvt]) and the standard 
code conversion facets (22.5 [locale.stdcvt]).  Adding them requires a 
significant number of changes and several design decisions, so I would like 
to ask for some feedback now.

The implementation requires some code to convert between UTF-32, UTF-8, and 
UTF-16.  I couldn't find any code for this in libstdc++.  I looked into gnulib, 
but there the conversion seems to be done through iconv, which might not be 
the best choice if the codecs are known in advance.  I had some existing code 
for such a conversion, which I adapted and added to the patch.
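
As an illustration of the kind of conversion involved (this is not the code 
from the patch), a minimal UTF-32 to UTF-8 encoder could look like the sketch 
below.  The function name append_utf8 is hypothetical.

```cpp
#include <string>

// Hypothetical helper, not part of libstdc++ or the patch.
// Appends the UTF-8 encoding of cp to out.  Returns false (and appends
// nothing) if cp is a surrogate or lies above U+10FFFF.
bool append_utf8(char32_t cp, std::string& out)
{
    if (cp < 0x80) {
        // 1 byte: 0xxxxxxx
        out.push_back(static_cast<char>(cp));
    } else if (cp < 0x800) {
        // 2 bytes: 110xxxxx 10xxxxxx
        out.push_back(static_cast<char>(0xC0 | (cp >> 6)));
        out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    } else if (cp < 0x10000) {
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return false;  // surrogates are not valid scalar values
        // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        out.push_back(static_cast<char>(0xE0 | (cp >> 12)));
        out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
        out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    } else if (cp <= 0x10FFFF) {
        // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        out.push_back(static_cast<char>(0xF0 | (cp >> 18)));
        out.push_back(static_cast<char>(0x80 | ((cp >> 12) & 0x3F)));
        out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
        out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    } else {
        return false;  // beyond the Unicode code space
    }
    return true;
}
```

Because the encoding is fixed, the whole conversion is a handful of shifts and 
masks, with no iconv descriptor to open or look up.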

I'm not sure whether my code leaves the from_next pointer in the right place 
when it encounters an error while converting UTF-8.  If it discovers an invalid 
sequence, the from_next pointer basically points to the first byte that breaks 
the sequence.  I think this is the right behavior according to the standard, 
and it has the advantage that this might be the point to recover from (e.g., 
for a text stream containing one invalid byte, "a\xFFb", from_next would point 
to b and allow restarting the decoding, simply skipping the error).  But maybe 
my interpretation of the standard is wrong, since the specification seems a bit 
vague to me (22.4.1.4.2.2).
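
To make the recovery idea concrete, here is a small stand-alone decoder 
sketch.  It is not the code from the patch, and the names Utf8Result and 
decode_utf8 are hypothetical.  It assumes one possible reading of the behavior 
described above: on an error, from_next is left at the position where decoding 
could resume (just past an invalid lead byte, or at the byte that failed to 
continue a multi-byte sequence).  Overlong forms and surrogates are not checked.

```cpp
#include <string>

// Hypothetical result codes, loosely mirroring std::codecvt_base::result.
enum class Utf8Result { ok, partial, error };

// Decodes UTF-8 from [from, from_end) and appends code points to out.
// On return, from_next marks where decoding stopped.
Utf8Result decode_utf8(const char* from, const char* from_end,
                       const char*& from_next, std::u32string& out)
{
    from_next = from;
    while (from_next != from_end) {
        unsigned char lead = static_cast<unsigned char>(*from_next);
        int extra;      // number of continuation bytes expected
        char32_t cp;    // code point being assembled
        if (lead < 0x80)                { extra = 0; cp = lead; }
        else if ((lead & 0xE0) == 0xC0) { extra = 1; cp = lead & 0x1F; }
        else if ((lead & 0xF0) == 0xE0) { extra = 2; cp = lead & 0x0F; }
        else if ((lead & 0xF8) == 0xF0) { extra = 3; cp = lead & 0x07; }
        else {
            ++from_next;  // assumption: step past an invalid lead byte
            return Utf8Result::error;
        }
        const char* p = from_next + 1;
        for (int i = 0; i < extra; ++i, ++p) {
            if (p == from_end)
                return Utf8Result::partial;  // from_next stays at the lead byte
            unsigned char c = static_cast<unsigned char>(*p);
            if ((c & 0xC0) != 0x80) {
                from_next = p;  // the byte that breaks the sequence
                return Utf8Result::error;
            }
            cp = (cp << 6) | (c & 0x3F);
        }
        out.push_back(cp);
        from_next = p;
    }
    return Utf8Result::ok;
}
```

With the input "a\xFFb", the first call would stop with an error after 
decoding a, leaving from_next at b; calling the function again from from_next 
decodes the rest.  Whether the standard actually permits this placement of 
from_next is exactly the open question.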

It seems there are currently no facet specializations for char32_t (or 
char16_t) in libstdc++ at all, and they probably need to be added.  Right now 
I only add the specializations needed for codecvt.  That's why 
c32locale-inst.cc does not include locale-inst.cc.

For implementing [locale.stdcvt] I'm not sure how best to approach it.  The 
standard adds two template parameters, `unsigned long _Maxcode' and 
`codecvt_mode _Mode'.  Implementing this as a template would require exposing 
the internal UTF-8 conversion functions, and a user might end up instantiating 
a lot of code.  I think the best approach would be to introduce a base class 
that takes _Maxcode and _Mode as constructor arguments and passes them 
internally to the conversion functions.  Something like

template<typename _Elem>
struct __codecvt_utf8_base
  : codecvt<_Elem, char, mbstate_t>
{
  __codecvt_utf8_base(unsigned long __maxcode, codecvt_mode __mode);
  // ...
};

template<typename _Elem, unsigned long _Maxcode, codecvt_mode _Mode>
struct codecvt_utf8
  : __codecvt_utf8_base<_Elem>
{
  codecvt_utf8() : __codecvt_utf8_base<_Elem>(_Maxcode, _Mode) { }
  // ...
};

(a bit simplified)

That way specializations of __codecvt_utf8_base for char32_t, char16_t, and 
wchar_t can be implemented in the library instead of the header, similar to 
codecvt right now.  But I'm not sure whether this is allowed by the standard, 
since it explicitly defines `codecvt_utf8' to derive directly from `codecvt'.

I'm also not sure how to treat wchar_t.  Is it defined for libstdc++ to be 
generally UCS4 (UTF-32)?  Or does libstdc++ also support legacy systems where 
wchar_t is only UCS2?
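
For illustration only, the distinction could be detected at compile time from 
the range of wchar_t.  The names WideEncoding and wide_encoding are 
hypothetical, and this sketch says nothing about what libstdc++ actually 
guarantees.

```cpp
#include <cwchar>   // WCHAR_MAX

// Hypothetical classification of what a single wchar_t can carry.
enum class WideEncoding { ucs2_or_utf16, ucs4 };

constexpr WideEncoding wide_encoding()
{
    // U+10FFFF is the largest Unicode code point.  If wchar_t can hold it,
    // one wchar_t per code point suffices (UCS4/UTF-32); otherwise the
    // target is limited to UCS2 or needs UTF-16 surrogate pairs.
    return WCHAR_MAX >= 0x10FFFF ? WideEncoding::ucs4
                                 : WideEncoding::ucs2_or_utf16;
}
```

A wchar_t specialization could branch on such a check to reuse either the 
UTF-32 or the UTF-16 code path.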

Regards,
Rüdiger

Rüdiger Sonderfeld (2):
  libstdc++: Use _GLIBCXX_NOEXCEPT for codecvt.
  libstdc++: Add codecvt<char32_t, char, mbstate_t>.

 libstdc++-v3/config/abi/pre/gnu.ver                |   5 +
 libstdc++-v3/include/bits/codecvt.h                | 106 ++++-
 libstdc++-v3/include/bits/locale_facets.h          |   6 +-
 libstdc++-v3/src/c++11/Makefile.am                 |   6 +-
 libstdc++-v3/src/c++11/Makefile.in                 |  17 +-
 libstdc++-v3/src/c++11/c32locale-inst.cc           |  57 +++
 libstdc++-v3/src/c++11/codecvt.cc                  | 410 +++++++++++++++++
 libstdc++-v3/src/c++11/locale_init.cc              | 487 +++++++++++++++++++++
 libstdc++-v3/src/c++11/localename.cc               | 357 +++++++++++++++
 libstdc++-v3/src/c++98/Makefile.am                 |   3 -
 libstdc++-v3/src/c++98/Makefile.in                 |  12 +-
 libstdc++-v3/src/c++98/codecvt.cc                  | 151 -------
 libstdc++-v3/src/c++98/locale_init.cc              | 472 --------------------
 libstdc++-v3/src/c++98/localename.cc               | 352 ---------------
 .../testsuite/22_locale/codecvt/char32_t/1.cc      | 101 +++++
 15 files changed, 1532 insertions(+), 1010 deletions(-)
 create mode 100644 libstdc++-v3/src/c++11/c32locale-inst.cc
 create mode 100644 libstdc++-v3/src/c++11/codecvt.cc
 create mode 100644 libstdc++-v3/src/c++11/locale_init.cc
 create mode 100644 libstdc++-v3/src/c++11/localename.cc
 delete mode 100644 libstdc++-v3/src/c++98/codecvt.cc
 delete mode 100644 libstdc++-v3/src/c++98/locale_init.cc
 delete mode 100644 libstdc++-v3/src/c++98/localename.cc
 create mode 100644 libstdc++-v3/testsuite/22_locale/codecvt/char32_t/1.cc

-- 
1.9.2

