[PATCH 0/4]: C++ P1423R3 char8_t remediation implementation

Jonathan Wakely jwakely@redhat.com
Fri Nov 29 21:45:00 GMT 2019


On 29/11/19 19:48 +0000, Jonathan Wakely wrote:
>On 29/11/19 17:45 +0000, Jonathan Wakely wrote:
>>On 15/09/19 15:39 -0400, Tom Honermann wrote:
>>>This series of patches provides an implementation of the changes 
>>>for C++ proposal P1423R3 [1].
>>>
>>>These changes do not impact default libstdc++ behavior for C++17 
>>>and earlier; they are only active for C++2a or when the -fchar8_t 
>>>option is specified.
>>>
>>>Tested x86_64-linux.
>>>
>>>Patch 1: Decouple constraints for u8path from path constructors.
>>>Patch 2: Update __cpp_lib_char8_t feature test macro value, add 
>>>deleted operators, update u8path.
>>>Patch 3: Updates to existing tests.
>>>Patch 4: New tests.
>>>
>>>Tom.
>>>
>>>[1]: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2019/p1423r3.html
>>
>>It took a while, but I've committed these four patches, with just some
>>minor whitespace changes and changelog tweaks.
>
>Running the new tests revealed a latent bug on Windows, where
>experimental::filesystem::u8path(const Source&) assumed the input
>was an iterator over a NTCTS. That worked for a const char* but not a
>std::string or experimental::string_view.
>
>The attached patch fixes that (and simplifies the #if and if-constexpr
>conditions for Windows) but there's a remaining bug. Constructing a
>experimental::filesystem::path from a char8_t string doesn't do the
>right thing on Windows, so these cases fails:
>
> fs::path p(u8"\xf0\x9d\x84\x9e");
> VERIFY( p.u8string() == u8"\U0001D11E" );
>
> p = fs::u8path(u8"\xf0\x9d\x84\x9e");
> VERIFY( p.u8string() == u8"\U0001D11E" );
>
>It works correctly for std::filesystem::path, just not the TS version.

I think this is the fix needed for the TS code:

--- a/libstdc++-v3/include/experimental/bits/fs_path.h
+++ b/libstdc++-v3/include/experimental/bits/fs_path.h
@@ -765,7 +765,14 @@ namespace __detail
       {
 #ifdef _GLIBCXX_USE_CHAR8_T
        if constexpr (is_same<_CharT, char8_t>::value)
-         return _S_wconvert((const char*)__f, (const char*)__l, true_type());
+         {
+           const char* __f2 = (const char*)__f;
+           const char* __l2 = (const char*)__l;
+           std::wstring __wstr;
+           std::codecvt_utf8_utf16<wchar_t> __wcvt;
+           if (__str_codecvt_in_all(__f2, __l2, __wstr, __wcvt))
+             return __wstr;
+         }
        else
 #endif
          {

The current code uses std::codecvt<wchar_t, char, mbstate_t> but when
we know the input is UTF-8 encoded we should use codecvt_utf8_utf16
(which is what the C++17 code already does for char8_t input).

I'll add that the patch I'm testing.



More information about the Libstdc++ mailing list