This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.

[Bug libstdc++/80041] std::codecvt_utf16<wchar_t> converts to UTF-8 not UTF-16

From: "redi at gcc dot gnu.org" <gcc-bugzilla at gcc dot gnu dot org>
To: gcc-bugs at gcc dot gnu dot org
Date: Fri, 17 Mar 2017 19:29:02 +0000
Subject: [Bug libstdc++/80041] std::codecvt_utf16<wchar_t> converts to UTF-8 not UTF-16
Auto-submitted: auto-generated
References: <bug-80041-4@http.gcc.gnu.org/bugzilla/>

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80041

--- Comment #3 from Jonathan Wakely <redi at gcc dot gnu.org> ---
Author: redi
Date: Fri Mar 17 19:28:29 2017
New Revision: 246246

URL: https://gcc.gnu.org/viewcvs?rev=246246&root=gcc&view=rev
Log:
Backport <codecvt> fixes from trunk

Fix alignment bugs in std::codecvt_utf16

        * src/c++11/codecvt.cc (range): Add non-type template parameter and
        define overloaded operators for reading and writing code units.
        (range<Elem, false>): Define partial specialization for accessing
        wide characters in potentially unaligned byte ranges.
        (ucs2_span(const char16_t*, const char16_t*, ...))
        (ucs4_span(const char16_t*, const char16_t*, ...)): Change parameters
        to range<const char16_t, false> in order to avoid unaligned reads.
        (__codecvt_utf16_base<char16_t>::do_out)
        (__codecvt_utf16_base<char32_t>::do_out)
        (__codecvt_utf16_base<wchar_t>::do_out): Use range specialization for
        unaligned data to avoid unaligned writes.
        (__codecvt_utf16_base<char16_t>::do_in)
        (__codecvt_utf16_base<char32_t>::do_in)
        (__codecvt_utf16_base<wchar_t>::do_in): Likewise for reads. Return
        error if there are unprocessable trailing bytes.
        (__codecvt_utf16_base<char16_t>::do_length)
        (__codecvt_utf16_base<char32_t>::do_length)
        (__codecvt_utf16_base<wchar_t>::do_length): Pass arguments of type
        range<const char16_t, false> to span functions.
        * testsuite/22_locale/codecvt/codecvt_utf16/misaligned.cc: New test.
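
For readers unfamiliar with the alignment problem named in the entry above: the external side of codecvt_utf16 is a char buffer, so a 16-bit code unit may start at any byte address. The following is a minimal sketch of one common way to access such data safely, by copying bytes with std::memcpy. The helper names load_unaligned_u16 and store_unaligned_u16 are hypothetical; this is not the range<Elem, false> specialization from codecvt.cc, whose implementation is not shown in this message.

```cpp
#include <cstring>

// Hypothetical helper (assumption, not libstdc++ code): read one 16-bit
// code unit from a byte pointer that carries no alignment guarantee.
inline char16_t load_unaligned_u16(const char* p)
{
    char16_t unit;
    std::memcpy(&unit, p, sizeof unit);  // byte-wise copy, safe at any address
    return unit;
}

// Hypothetical helper (assumption): write one 16-bit code unit to a byte
// pointer that carries no alignment guarantee.
inline void store_unaligned_u16(char* p, char16_t unit)
{
    std::memcpy(p, &unit, sizeof unit);
}
```

Casting the char pointer to char16_t* and dereferencing it instead would be undefined behaviour when the address is odd, which is the kind of access the entry describes avoiding.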

PR libstdc++/79980 fix target type of cast

        PR libstdc++/79980
        * src/c++11/codecvt.cc (to_integer(codecvt_mode)): Fix target type.

PR libstdc++/80041 fix codecvt_utf16<wchar_t> to use UTF-16 not UTF-8

        PR libstdc++/80041
        * src/c++11/codecvt.cc (__codecvt_utf16_base<wchar_t>::do_out)
        (__codecvt_utf16_base<wchar_t>::do_in): Convert char arguments to
        char16_t to work with UTF-16 instead of UTF-8.
        * testsuite/22_locale/codecvt/codecvt_utf16/80041.cc: New test.
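
A small sketch of how the behaviour in PR libstdc++/80041 can be observed from user code, assuming a library that contains this fix. The function name to_utf16_bytes is hypothetical, and this is not the content of the new 80041.cc test. Note that std::wstring_convert and std::codecvt_utf16 are deprecated since C++17, so a compiler may warn.

```cpp
#include <codecvt>
#include <locale>
#include <string>

// Hypothetical helper (assumption): convert a wide string to the external
// byte sequence produced by std::codecvt_utf16<wchar_t>. With the default
// mode the facet emits big-endian UTF-16 and no byte order mark.
inline std::string to_utf16_bytes(const std::wstring& ws)
{
    std::wstring_convert<std::codecvt_utf16<wchar_t>> conv;
    return conv.to_bytes(ws);
}
```

For U+20AC the UTF-16BE form is the two bytes 0x20 0xAC, whereas UTF-8 would need the three bytes 0xE2 0x82 0xAC, so the length of the result alone distinguishes the two encodings.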

Fix encoding() and max_length() values for codecvt facets

        * src/c++11/codecvt.cc (codecvt<char16_t, char, mbstate_t>)
        (codecvt<char32_t, char, mbstate_t>, __codecvt_utf8_base<char16_t>)
        (__codecvt_utf8_base<char32_t>, __codecvt_utf8_base<wchar_t>)
        (__codecvt_utf16_base<char16_t>, __codecvt_utf16_base<char32_t>)
        (__codecvt_utf16_base<wchar_t>, __codecvt_utf8_utf16_base<char16_t>)
        (__codecvt_utf8_utf16_base<char32_t>)
        (__codecvt_utf8_utf16_base<wchar_t>): Fix do_encoding() and
        do_max_length() return values.
        * testsuite/22_locale/codecvt/codecvt_utf16/members.cc: New test.
        * testsuite/22_locale/codecvt/codecvt_utf8/members.cc: New test.
        * testsuite/22_locale/codecvt/codecvt_utf8_utf16/members.cc: New test.

PR libstdc++/79980 fix BOM detection, maxcode checks, UCS2 handling

        PR libstdc++/79980
        * include/bits/locale_conv.h (__do_str_codecvt): Set __count on
        error path.
        * src/c++11/codecvt.cc (operator&=, operator|=, operator~): Overloads
        for manipulating codecvt_mode values.
        (read_utf16_bom): Compare input to BOM constants instead of integral
        constants that depend on endianness.  Take mode parameter by
        reference and adjust it, to distinguish between no BOM present and
        UTF-16BE BOM present.
        (ucs4_in, ucs2_span, ucs4_span): Adjust calls to read_utf16_bom.
        (surrogates): New enumeration type.
        (utf16_in, utf16_out): Add surrogates parameter to choose between
        UTF-16 and UCS2 behaviour.
        (utf16_span, ucs2_span): Use std::min not std::max.
        (ucs2_out): Use std::min not std::max.  Disallow surrogate pairs.
        (ucs2_in): Likewise. Adjust calls to read_utf16_bom.
        * testsuite/22_locale/codecvt/codecvt_utf16/79980.cc: New test.
        * testsuite/22_locale/codecvt/codecvt_utf8/79980.cc: New test.
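
The read_utf16_bom item above describes comparing the input to BOM constants rather than to integers whose byte layout depends on the host. A minimal sketch of that idea follows, assuming a byte-wise comparison with std::memcmp. The names bom_kind and detect_utf16_bom are hypothetical, and the real read_utf16_bom also adjusts a codecvt_mode argument, which is not modelled here.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical result type (assumption, not part of libstdc++).
enum class bom_kind { none, utf16_be, utf16_le };

// Compare the first two bytes of the input against the two possible
// UTF-16 byte order marks. Because the comparison is done on bytes, the
// result does not depend on the endianness of the host.
inline bom_kind detect_utf16_bom(const char* data, std::size_t size)
{
    static const unsigned char be[2] = {0xFE, 0xFF};
    static const unsigned char le[2] = {0xFF, 0xFE};
    if (size >= 2) {
        if (std::memcmp(data, be, 2) == 0)
            return bom_kind::utf16_be;
        if (std::memcmp(data, le, 2) == 0)
            return bom_kind::utf16_le;
    }
    return bom_kind::none;
}
```

Returning three distinct states mirrors the distinction the entry draws between "no BOM present" and "UTF-16BE BOM present".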

PR libstdc++/79511 fix endianness of UTF-16 data

        PR libstdc++/79511
        * src/c++11/codecvt.cc (write_utf16_code_point): Don't write 0xffff
        as a surrogate pair.
        (__codecvt_utf8_utf16_base<char32_t>::do_in): Use native endianness
        for internal representation.
        (__codecvt_utf8_utf16_base<wchar_t>::do_in): Likewise.
        * testsuite/22_locale/codecvt/codecvt_utf8_utf16/79511.cc: New test.
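
The write_utf16_code_point item above concerns the boundary between single code units and surrogate pairs. As a sketch of the standard UTF-16 encoding rule (not the libstdc++ function itself), a code point up to and including 0xFFFF fits in one code unit, and only code points from 0x10000 upward need a surrogate pair. The function name encode_utf16 and its return convention are assumptions made for illustration.

```cpp
#include <cstddef>

// Hypothetical helper (assumption): encode one code point as UTF-16 code
// units in native order. Returns the number of units written (1 or 2),
// or 0 if the code point cannot be encoded.
inline std::size_t encode_utf16(char32_t cp, char16_t out[2])
{
    if (cp < 0x10000) {
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;                        // lone surrogates are not valid
        out[0] = static_cast<char16_t>(cp);  // includes 0xFFFF: one unit
        return 1;
    }
    if (cp > 0x10FFFF)
        return 0;                            // beyond the Unicode range
    cp -= 0x10000;
    out[0] = static_cast<char16_t>(0xD800 + (cp >> 10));    // high surrogate
    out[1] = static_cast<char16_t>(0xDC00 + (cp & 0x3FF));  // low surrogate
    return 2;
}
```

An off-by-one in the first comparison (for example `cp < 0xFFFF`) would send 0xFFFF down the surrogate-pair path, which is the kind of mistake the entry describes fixing.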

Added:
    branches/gcc-6-branch/libstdc++-v3/testsuite/22_locale/codecvt/codecvt_utf16/79980.cc
    branches/gcc-6-branch/libstdc++-v3/testsuite/22_locale/codecvt/codecvt_utf16/80041.cc
    branches/gcc-6-branch/libstdc++-v3/testsuite/22_locale/codecvt/codecvt_utf16/members.cc
    branches/gcc-6-branch/libstdc++-v3/testsuite/22_locale/codecvt/codecvt_utf16/misaligned.cc
    branches/gcc-6-branch/libstdc++-v3/testsuite/22_locale/codecvt/codecvt_utf8/79980.cc
    branches/gcc-6-branch/libstdc++-v3/testsuite/22_locale/codecvt/codecvt_utf8/members.cc
    branches/gcc-6-branch/libstdc++-v3/testsuite/22_locale/codecvt/codecvt_utf8_utf16/79511.cc
    branches/gcc-6-branch/libstdc++-v3/testsuite/22_locale/codecvt/codecvt_utf8_utf16/members.cc
Modified:
    branches/gcc-6-branch/libstdc++-v3/ChangeLog
    branches/gcc-6-branch/libstdc++-v3/include/bits/locale_conv.h
    branches/gcc-6-branch/libstdc++-v3/src/c++11/codecvt.cc
    branches/gcc-6-branch/libstdc++-v3/testsuite/22_locale/codecvt/char16_t.cc

References:
- [Bug libstdc++/80041] New: std::codecvt_utf16<wchar_t> converts to UTF-8 not UTF-16
  - From: redi at gcc dot gnu.org
