This is the mail archive of the
gcc-bugs@gcc.gnu.org
mailing list for the GCC project.
[Bug libstdc++/66441] New: wstring_convert not working correctly
- From: "lcarreon at bigpond dot net.au" <gcc-bugzilla at gcc dot gnu dot org>
- To: gcc-bugs at gcc dot gnu dot org
- Date: Sat, 06 Jun 2015 06:34:52 +0000
- Subject: [Bug libstdc++/66441] New: wstring_convert not working correctly
- Auto-submitted: auto-generated
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66441
Bug ID: 66441
Summary: wstring_convert not working correctly
Product: gcc
Version: 5.1.1
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: libstdc++
Assignee: unassigned at gcc dot gnu.org
Reporter: lcarreon at bigpond dot net.au
Target Milestone: ---
Created attachment 35707
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=35707&action=edit
Test program which demonstrates the issue
In my opinion, wstring_convert behaves incorrectly.
I use Fedora 22 (both 32-bit and 64-bit), which includes GCC 5.1.1.
Attached is a test program that demonstrates the problem. It converts a UTF-8
string without a BOM into a UCS-4 string, and then converts the UCS-4 string
into a UTF-16LE string. The test program has two parts: 1) it performs the
conversion using the codecvt facets directly, and 2) it performs the same
conversion using wstring_convert.
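For readers without the attachment, the two conversion paths described above
might look roughly like the sketch below. This is not the attached test
program: the helper names (utf8_to_ucs4, ucs4_to_utf16le,
ucs4_to_utf16le_direct) are hypothetical, and the use of
generate_header | little_endian for the UTF-16 facet is an assumption inferred
from the output shown in this report.

```cpp
#include <cassert>
#include <codecvt>
#include <cstddef>
#include <cwchar>
#include <locale>
#include <string>

// Assumed facet configuration: UTF-16LE output with a leading BOM.
constexpr std::codecvt_mode kUtf16Mode =
    static_cast<std::codecvt_mode>(std::generate_header | std::little_endian);
using Utf8Facet  = std::codecvt_utf8<char32_t>;
using Utf16Facet = std::codecvt_utf16<char32_t, 0x10ffff, kUtf16Mode>;

// UTF-8 bytes -> UCS-4 code points, via wstring_convert.
std::u32string utf8_to_ucs4(const std::string& utf8) {
    std::wstring_convert<Utf8Facet, char32_t> conv;
    return conv.from_bytes(utf8);
}

// UCS-4 -> UTF-16LE bytes, via wstring_convert (the path the report questions).
std::string ucs4_to_utf16le(const std::u32string& ucs4) {
    std::wstring_convert<Utf16Facet, char32_t> conv;
    return conv.to_bytes(ucs4);
}

// UCS-4 -> UTF-16LE bytes, calling the facet once with a buffer that is
// large enough for the whole result.
std::string ucs4_to_utf16le_direct(const std::u32string& ucs4) {
    Utf16Facet facet;
    std::mbstate_t state{};
    // Worst case: 4 bytes (a surrogate pair) per code point, plus a 2-byte BOM.
    std::string out(ucs4.size() * 4 + 2, '\0');
    const char32_t* from_next = nullptr;
    char* to_next = nullptr;
    std::codecvt_base::result res =
        facet.out(state, ucs4.data(), ucs4.data() + ucs4.size(), from_next,
                  &out[0], &out[0] + out.size(), to_next);
    assert(res == std::codecvt_base::ok);
    (void)res;  // silence unused-variable warnings when NDEBUG is defined
    out.resize(static_cast<std::size_t>(to_next - &out[0]));
    return out;
}
```

Calling ucs4_to_utf16le and ucs4_to_utf16le_direct on the same UCS-4 input and
comparing the two byte strings is enough to show whether the wstring_convert
path differs from the single facet call.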
I compiled the test program using the following command:
g++ -std=c++14 -o test_convert test_convert.cpp
This test program generates the following result:
UTF-8 string=Provençal
UTF-8 string length=10
Test conversion using codecvt facets directly:
UCS-4 string=50 72 6f 76 65 6e e7 61 6c
UCS-4 string length=9
UTF-16LE string=ff fe 50 0 72 0 6f 0 76 0 65 0 6e 0 e7 0 61 0 6c 0
UTF-16LE string length=20
Test conversion using wstring_convert:
UCS-4 string=50 72 6f 76 65 6e e7 61 6c
UCS-4 string length=9
UTF-16LE string=ff fe 50 0 72 0 6f 0 76 0 65 0 ff fe 6e 0 e7 0 ff fe 61 0 6c 0
UTF-16LE string length=24
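In my opinion, the result generated by the codecvt facets is the correct
result. Notice that the UTF-16LE result generated by wstring_convert contains
three occurrences of the BOM (ff fe): one at the start, one before 6e, and one
before 61. Only the leading BOM should be present; the two extra BOMs account
for the 4 additional bytes (24 instead of 20).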
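The BOM count in the two UTF-16LE dumps above can be checked mechanically. The
following is a minimal sketch; count_utf16le_boms is a hypothetical helper,
not part of the attached test program. It treats every ff fe pair at an even
byte offset as a BOM, so a legitimate U+FEFF character inside the text would
be counted as well.

```cpp
#include <cstddef>
#include <string>

// Counts FF FE byte pairs that start at an even offset, i.e. at positions
// where a UTF-16LE code unit begins.
std::size_t count_utf16le_boms(const std::string& bytes) {
    std::size_t count = 0;
    for (std::size_t i = 0; i + 1 < bytes.size(); i += 2) {
        if (static_cast<unsigned char>(bytes[i]) == 0xff &&
            static_cast<unsigned char>(bytes[i + 1]) == 0xfe) {
            ++count;
        }
    }
    return count;
}
```

Applied to the byte sequences listed in this report, the helper gives 1 for
the 20-byte result from the direct facet call and 3 for the 24-byte result
from wstring_convert.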
I hope I have given enough information concerning this issue.