37298 – no output when use wcout and cout, wrong utf16 -> utf8 conversion

Status:	RESOLVED DUPLICATE of bug 35353

Alias:	None

Product:	gcc
Classification:	Unclassified
Component:	libstdc++
Version:	4.2.3

Importance:	P3 trivial
Target Milestone:	---
Assignee:	Not yet assigned to anyone

Reported:	2008-08-31 16:19 UTC by Jacek Jaworski
Modified:	2008-08-31 20:57 UTC
CC List:	3 users

Description Jacek Jaworski 2008-08-31 16:19:35 UTC

Hi!
I wrote my first std::wstring program and found a bug in libstdc++!
   
*  the exact version of GCC
gcc version 4.2.3 (4.2.3-6mnb1)

* the system type;
Linux Mandriva 2008.1 PowerPack

* the options given when GCC was configured/built;
./configure --prefix=/usr --libexecdir=/usr/lib --with-slibdir=/lib --mandir=/usr/share/man --infodir=/usr/share/info --enable-checking=release --enable-languages=c,c++,ada,fortran,objc,obj-c++,java --host=i586-manbo-linux-gnu --with-cpu=generic --with-system-zlib --enable-threads=posix --enable-shared --enable-long-long --enable-__cxa_atexit --disable-libunwind-exceptions --enable-clocale=gnu --enable-java-awt=gtk --with-java-home=/usr/lib/jvm/java-1.4.2-gcj-1.4.2.0/jre --enable-gtk-cairo --disable-libjava-multilib --enable-ssp --disable-libssp

* the complete command line that triggers the bug;
g++ couttest.cpp -o couttest; ./couttest

* the compiler output (error messages, warnings, etc.)
There are no error or warning messages.

Important thing:
$ locale
LANG=pl_PL.UTF-8
...

I wrote this program (couttest.cpp):
#include <string>
#include <iostream>
#include <locale>
int main()
{
   std::wstring wstr(L"letters1:ąśłółżź");
   std::string str("letters2:ąśłółżź");

   std::wcout.imbue(std::locale(""));
   std::cout.imbue(std::locale(""));
   
   std::wcout << wstr << std::endl;
   std::cout << str << std::endl;
}
Output is:
letters1:???????

The second line doesn't appear.
Another problem: why can't I see the Polish letters? I expected the conversion from UTF-16 to UTF-8 to be quite trivial. Besides, I think this conversion is a common case.

But let's change the above program:
#include <string>
#include <iostream>
#include <locale>
int main()
{
   std::wstring wstr(L"letters1:ąśłółżź");
   std::string str("letters2:ąśłółżź");

   std::wcout.imbue(std::locale(""));
   std::cout.imbue(std::locale(""));

   std::cout << str << std::endl;
   std::wcout << wstr << std::endl;
}
Output is:
letters2:&#261;&#347;&#322;ó&#322;&#380;&#378;
letters1:[B&#65533;B|z

Both lines appear. But in this case the wcout output is different from that of the first program - why? The correct cout output is no surprise, as the sources are in UTF-8 - in this case there is no conversion.


I think there are two bugs: 1) a whole line is missing from the first program's output; 2) wrong output (wrong conversion from UTF-16 to UTF-8).

Jacek Jaworski

Comment 1 Paolo Carlini 2008-08-31 18:15:07 UTC


*** This bug has been marked as a duplicate of 35353 ***

Comment 2 Paolo Carlini 2008-08-31 18:24:19 UTC

By the way, your testcases are, strictly speaking, invalid, because you cannot mix operations on cout and wcout, per 27.3 (there is some discussion in libstdc++/11705). With that fixed, the issue is the same as 35353: currently, by default, cout, wcout, etc. are non-converting in our implementation.
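
The rule in 27.3 mirrors the C rule that a FILE stream takes its orientation (byte or wide) from the first operation performed on it. Below is a minimal sketch of that C-level behaviour, assuming std::tmpfile() succeeds on the system; the function name orientation_after_first_write is made up for illustration. If cout and wcout both end up writing through the C stdout stream, this would likely explain why the order of the two output statements changes what appears.

```cpp
#include <cstdio>
#include <cwchar>

// Hypothetical helper: opens a fresh stream, performs one write, and
// reports the orientation the stream ended up with.
// Returns > 0 for wide-oriented, < 0 for byte-oriented, and 0 if no
// orientation was set (or if the temporary file could not be opened).
int orientation_after_first_write(bool wide_first)
{
    std::FILE* fp = std::tmpfile();   // fresh stream, no orientation yet
    if (!fp)
        return 0;

    if (wide_first)
        std::fputws(L"wide\n", fp);   // first operation is a wide one
    else
        std::fputs("narrow\n", fp);   // first operation is a narrow one

    // Mode 0 only queries the orientation; it does not change it.
    int orientation = std::fwide(fp, 0);
    std::fclose(fp);
    return orientation;
}
```

Once the orientation is fixed, operations of the other kind on the same stream are not expected to work, which is why a testcase should stick to either cout or wcout.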

Comment 3 Jacek Jaworski 2008-08-31 20:14:16 UTC

> currently, by
> default, cout, wcout, etc, are non-converting in our implementation.

Very strange.... UTF-16 -> UTF-8 conversion seems to be very easy, and is required for
proper output on contemporary Linux. I don't know what special info is
required if UTF-16 and UTF-8 are the same standard. But some strange conversion
is performed, because the output is neither UTF-16 nor UTF-8 - so what is the
output's format? Is it some format of your own? Is there any reason for this?
I want to use UTF-16 because I don't want to care about encodings. I expected that
it would work straightforwardly. So, what should I do? Create a UTF-8 encoding? In my
example I do this because UTF-8 is the default setting on my system, but it
doesn't work. Should I choose a Polish locale? But if I create a Polish
locale, and some day I add a Russian translation, what will happen then?
In my opinion this isn't the Unicode way! I think Unicode means not having to care about
character encoding.

Jacek Jaworski

Comment 4 Paolo Carlini 2008-08-31 20:57:36 UTC

(In reply to comment #3)
> Very strange.... UTF-16 -> UTF-8 conversion seems to be very easy, and is required for
> proper output on contemporary Linux.

By the way, contemporary Linux, or any Linux for that matter, is normally part of a GNU / Linux system and in that case wchar_t is always 32 bits wide, UCS-4 encoding. UTF-16 doesn't play an important role. See the glibc docs for further information. Our implementation, if sync_with_stdio(false) is called, is perfectly able to convert back and forth from an internal wchar_t (UCS-4) encoding to an external char (UTF-8) encoding, via the delivered codecvt facet.
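
The wchar_t to char conversion described above can be tried in isolation by calling a locale's codecvt facet directly. The following is only a sketch: the helper name to_external is hypothetical, and the result depends on which locales are installed (a UTF-8 locale such as "C.UTF-8" is assumed to exist if UTF-8 output is wanted).

```cpp
#include <cwchar>
#include <locale>
#include <stdexcept>
#include <string>

// Hypothetical helper: converts a wide string to the external (narrow)
// encoding of `loc` using that locale's own codecvt facet.
std::string to_external(const std::wstring& in, const std::locale& loc)
{
    typedef std::codecvt<wchar_t, char, std::mbstate_t> facet_type;
    const facet_type& cvt = std::use_facet<facet_type>(loc);

    if (in.empty())
        return std::string();

    std::mbstate_t state = std::mbstate_t();
    // max_length() bounds how many chars one wchar_t can expand to.
    std::string out(in.size() * cvt.max_length(), '\0');

    const wchar_t* from_next = 0;
    char* to_next = 0;
    std::codecvt_base::result r = cvt.out(
        state,
        in.data(), in.data() + in.size(), from_next,
        &out[0], &out[0] + out.size(), to_next);

    if (r != std::codecvt_base::ok)
        throw std::runtime_error("conversion failed");

    out.resize(to_next - &out[0]);   // keep only the bytes actually written
    return out;
}
```

For stream output, the approach described in this comment would be to call std::ios_base::sync_with_stdio(false) before any output is performed and then imbue std::wcout with std::locale(""), so that the stream applies the same facet itself.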