Hi! I wrote my first std::wstring program and have found a bug in libstdc++!

* the exact version of GCC:
gcc version 4.2.3 (4.2.3-6mnb1)

* the system type:
Linux Mandriva 2008.1 PowerPack

* the options given when GCC was configured/built:
./configure --prefix=/usr --libexecdir=/usr/lib --with-slibdir=/lib --mandir=/usr/share/man --infodir=/usr/share/info --enable-checking=release --enable-languages=c,c++,ada,fortran,objc,obj-c++,java --host=i586-manbo-linux-gnu --with-cpu=generic --with-system-zlib --enable-threads=posix --enable-shared --enable-long-long --enable-__cxa_atexit --disable-libunwind-exceptions --enable-clocale=gnu --enable-java-awt=gtk --with-java-home=/usr/lib/jvm/java-1.4.2-gcj-1.4.2.0/jre --enable-gtk-cairo --disable-libjava-multilib --enable-ssp --disable-libssp

* the complete command line that triggers the bug:
g++ couttest.cpp -o couttest; ./couttest

* the compiler output (error messages, warnings, etc.):
There are no error or warning messages.

Important thing:
$ locale
LANG=pl_PL.UTF-8
...

I wrote this program (couttest.cpp):

#include <string>
#include <iostream>
#include <locale>

int main()
{
    std::wstring wstr(L"letters1:ąśłółżź");
    std::string str("letters2:ąśłółżź");
    std::wcout.imbue(std::locale(""));
    std::cout.imbue(std::locale(""));
    std::wcout << wstr << std::endl;
    std::cout << str << std::endl;
}

Output is:
letters1:???????

The second line doesn't appear. Another problem: why can't I see the Polish letters??? I expected the conversion from UTF-16 to UTF-8 to be quite trivial. Besides, I think that this conversion is a common case.

But let's change the above program:

#include <string>
#include <iostream>
#include <locale>

int main()
{
    std::wstring wstr(L"letters1:ąśłółżź");
    std::string str("letters2:ąśłółżź");
    std::wcout.imbue(std::locale(""));
    std::cout.imbue(std::locale(""));
    std::cout << str << std::endl;
    std::wcout << wstr << std::endl;
}

Output is:
letters2:ąśłółżź
letters1:[B�B|z

Both lines appear. But in this case the wcout output is different from the first program's - why??? The correct cout output is no surprise, as the sources are in UTF-8 - in this case there is no conversion.

I think that there are two bugs:
1) the whole second line is missing from the first program's output
2) wrong output (wrong conversion from UTF-16 to UTF-8).

Jacek Jaworski
*** This bug has been marked as a duplicate of 35353 ***
By the way, your testcases are, strictly speaking, invalid, because you cannot mix operations on cout and wcout, per 27.3 (there is some discussion in libstdc++/11705). With that fixed, the issue is the same as 35353: currently, by default, cout, wcout, etc. are non-converting in our implementation.
Subject: Re: no output when use wcout and cout, wrong utf16 -> utf8 conversion

> currently, by
> default, cout, wcout, etc, are non-converting in our implementation.

Very strange... UTF-16 -> UTF-8 conversion seems to be very easy, and is required for proper output on contemporary Linux. I don't know what special information is required if UTF-16 and UTF-8 are the same standard. But some strange conversion is performed, because the output is neither UTF-16 nor UTF-8 - so what is the output's format? Is it some format of your own? Is there any reason for this?

I want to use UTF-16 because I don't want to care about encodings. I expected it to work straightforwardly. So, what should I do? Create a UTF-8 encoding? In my example I do this, because UTF-8 is the default setting on my system, but it doesn't work. Should I choose the Polish locale? But if I create a Polish locale, and some day add a Russian translation, what will happen then? In my opinion that isn't the Unicode way! I think that with Unicode one shouldn't have to care about character encoding.

Jacek Jaworski
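
On the question of what format the garbled wcout output is in: one possible reading, offered only as an observation about the bytes shown in the report and not as a statement about libstdc++ internals, is that the visible characters match the low 8 bits of each Polish letter's Unicode code point (ś is U+015B and 0x5B is '[', ł is U+0142 and 0x42 is 'B', ż is U+017C and 0x7C is '|', ź is U+017A and 0x7A is 'z'). The sketch below reproduces that pattern; low_bytes is a hypothetical helper written for illustration, not a library function.

```cpp
#include <string>

// Hypothetical helper (not part of libstdc++): keeps only the low 8 bits
// of every wide character, i.e. a "non-converting" narrowing.
std::string low_bytes(const std::wstring& wide)
{
    std::string out;
    out.reserve(wide.size());
    for (std::wstring::size_type i = 0; i < wide.size(); ++i) {
        // Mask the code point down to a single byte and store it as a char.
        unsigned long code_point = static_cast<unsigned long>(wide[i]);
        out.push_back(static_cast<char>(code_point & 0xFFu));
    }
    return out;
}
```

Calling low_bytes(L"\u015b\u0142\u017c\u017a") gives "[B|z", the same visible characters as in the second program's output. ą (U+0105) and ó (U+00F3) map to bytes 0x05 and 0xF3, which are not printable UTF-8 on their own.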
(In reply to comment #3)
> Very strange.... uft16->utf8 conversion seems be very easy, and required for
> proper output on contemporary Linux.

By the way, contemporary Linux, or any Linux for that matter, is normally part of a GNU/Linux system, and in that case wchar_t is always 32 bits wide, with UCS-4 encoding. UTF-16 doesn't play an important role. See the glibc docs for further information.

Our implementation, if sync_with_stdio(false) is called, is perfectly able to convert back and forth between an internal wchar_t (UCS-4) encoding and an external char (UTF-8) encoding, via the delivered codecvt facet.
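
A minimal sketch of the setup described above, assuming libstdc++ on a GNU/Linux system with a UTF-8 locale in the environment. The helper name setup_wide_output is hypothetical; sync_with_stdio, imbue and std::locale are the standard calls. The helper has to run before any other I/O on the standard streams, and the program should then write only to wcout, not to cout as well.

```cpp
#include <iostream>
#include <locale>
#include <stdexcept>

// Hypothetical helper: prepares std::wcout for converting output.
// Returns false if the requested locale is not available on this system.
bool setup_wide_output(const char* locale_name = "")
{
    // Detach the C++ streams from C stdio so that wcout gets its own
    // buffer, which applies the codecvt facet of the imbued locale.
    std::ios_base::sync_with_stdio(false);
    try {
        // "" selects the locale named by the environment (e.g. LANG).
        std::wcout.imbue(std::locale(locale_name));
        return true;
    } catch (const std::runtime_error&) {
        // std::locale throws when the named locale cannot be constructed.
        return false;
    }
}
```

From main, the intended use is to call setup_wide_output() first and then write, for example, std::wcout << L"letters1:ąśłółżź" << std::endl; with LANG=pl_PL.UTF-8, as in the original report.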