This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.



[Bug libstdc++/66441] New: wstring_convert not working correctly


https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66441

            Bug ID: 66441
           Summary: wstring_convert not working correctly
           Product: gcc
           Version: 5.1.1
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: libstdc++
          Assignee: unassigned at gcc dot gnu.org
          Reporter: lcarreon at bigpond dot net.au
  Target Milestone: ---

Created attachment 35707
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=35707&action=edit
Test program which demonstrates the issue

In my opinion, there is a problem with the way wstring_convert behaves.

I use the 32-bit and 64-bit editions of Fedora 22, which include GCC 5.1.1.

Attached is a test program that demonstrates the problem.  It converts a UTF-8
string that has no BOM into a UCS-4 string, and then converts the UCS-4 string
into a UTF-16LE string.  The test program has two parts: 1) one performs the
conversions using the codecvt facets directly, and 2) the other performs them
using wstring_convert.
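The attachment itself is not reproduced in this archive.  As a rough sketch
only (an assumption about the facets involved, not the attached code), the
wstring_convert half of such a test might look like the following.  It assumes
codecvt_utf8<char32_t> for UTF-8 to UCS-4 and codecvt_utf16<char32_t> with
generate_header | little_endian for UCS-4 to UTF-16LE; the helper names
utf8_to_ucs4, ucs4_to_utf16le, print_bytes and Utf16LeFacet are hypothetical.

```cpp
#include <codecvt>
#include <cstdio>
#include <locale>
#include <string>

// Hypothetical helper: UTF-8 bytes -> UCS-4 code points.
// codecvt_utf8<char32_t> converts between UTF-8 and char32_t.
std::u32string utf8_to_ucs4(const std::string& utf8) {
    std::wstring_convert<std::codecvt_utf8<char32_t>, char32_t> conv;
    return conv.from_bytes(utf8);
}

// Assumed facet: little-endian UTF-16 output with a BOM (bytes ff fe)
// requested via generate_header.
using Utf16LeFacet = std::codecvt_utf16<char32_t, 0x10ffff,
    std::codecvt_mode(std::generate_header | std::little_endian)>;

// Hypothetical helper: UCS-4 -> UTF-16LE through wstring_convert,
// i.e. the conversion path this report says misbehaves.
std::string ucs4_to_utf16le(const std::u32string& ucs4) {
    std::wstring_convert<Utf16LeFacet, char32_t> conv;
    return conv.to_bytes(ucs4);
}

// Print bytes as lowercase hex in the report's style, e.g. "ff fe 50 0 ...".
void print_bytes(const std::string& bytes) {
    for (unsigned char c : bytes)
        std::printf("%x ", static_cast<unsigned>(c));
    std::printf("\n");
}
```

Passing ucs4_to_utf16le(utf8_to_ucs4(...)) to print_bytes should show exactly
one leading ff fe on a correct implementation; the wstring_convert output
shown in this report has extra ones.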

I compiled the test program using the following command:

g++ -std=c++14 -o test_convert test_convert.cpp

The test program produces the following output:

UTF-8 string=Provençal
UTF-8 string length=10

Test conversion using codecvt facets directly:
UCS-4 string=50 72 6f 76 65 6e e7 61 6c 
UCS-4 string length=9
UTF-16LE string=ff fe 50 0 72 0 6f 0 76 0 65 0 6e 0 e7 0 61 0 6c 0 
UTF-16LE string length=20

Test conversion using wstring_convert:
UCS-4 string=50 72 6f 76 65 6e e7 61 6c 
UCS-4 string length=9
UTF-16LE string=ff fe 50 0 72 0 6f 0 76 0 65 0 ff fe 6e 0 e7 0 ff fe 61 0 6c 0 
UTF-16LE string length=24

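In my opinion, the result generated by the codecvt facets is the correct
result.  Notice that the UTF-16LE result generated by wstring_convert contains
three occurrences of the BOM (ff fe) instead of one, which is why it is 24 bytes
long instead of 20.  Only a single BOM at the start of the string is correct.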
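For comparison, the expected bytes can be pinned down with a small
hand-written encoder.  The helpers below, encode_utf16le_with_bom and
count_boms, are hypothetical illustrations and not part of libstdc++: the
first writes a single BOM followed by each code point as little-endian UTF-16
(using surrogate pairs above U+FFFF), and the second counts U+FEFF code units,
i.e. the byte pair ff fe at an even offset.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical reference encoder: UCS-4 -> UTF-16LE with exactly one BOM.
std::string encode_utf16le_with_bom(const std::u32string& ucs4) {
    std::string out;
    // Append one UTF-16 code unit, low byte first (little endian).
    auto put_unit = [&out](char16_t unit) {
        out.push_back(static_cast<char>(unit & 0xff));
        out.push_back(static_cast<char>((unit >> 8) & 0xff));
    };
    put_unit(0xfeff);  // BOM, written once: bytes ff fe
    for (char32_t cp : ucs4) {
        // Reject values that are not Unicode scalar values.
        assert(cp <= 0x10ffff && !(cp >= 0xd800 && cp <= 0xdfff));
        if (cp < 0x10000) {
            put_unit(static_cast<char16_t>(cp));
        } else {
            cp -= 0x10000;
            put_unit(static_cast<char16_t>(0xd800 + (cp >> 10)));    // high surrogate
            put_unit(static_cast<char16_t>(0xdc00 + (cp & 0x3ff)));  // low surrogate
        }
    }
    return out;
}

// Hypothetical helper: count ff fe pairs at even offsets of a UTF-16LE string.
std::size_t count_boms(const std::string& bytes) {
    std::size_t n = 0;
    for (std::size_t i = 0; i + 1 < bytes.size(); i += 2)
        if (static_cast<unsigned char>(bytes[i]) == 0xff &&
            static_cast<unsigned char>(bytes[i + 1]) == 0xfe)
            ++n;
    return n;
}
```

Applied to U"Proven\u00e7al", the encoder yields the same 20 bytes as the
codecvt facet output above; applied to the 24-byte wstring_convert output,
count_boms reports 3.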

I hope I have given enough information concerning this issue.
