Bug 35353 - C++ wide character locale doesn't work
Status: NEW
Alias: None
Product: gcc
Classification: Unclassified
Component: libstdc++
Version: 4.1.2
Importance: P3 enhancement
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords:
Duplicates: 33852 37298 37673
Depends on:
Blocks:
 
Reported: 2008-02-24 14:07 UTC by Ioannis Vranos
Modified: 2024-03-17 23:10 UTC (History)
CC List: 5 users

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2009-09-14 21:10:27


Attachments
The main.ii file produced by -save-temps option (90.20 KB, text/plain)
2008-02-24 14:11 UTC, Ioannis Vranos
The produced main.s file (1.68 KB, text/plain)
2008-02-24 14:15 UTC, Ioannis Vranos
Screenshot of the standard I/O of the working code and of the non-working code. (18.15 KB, image/png)
2008-02-24 14:23 UTC, Ioannis Vranos

Description Ioannis Vranos 2008-02-24 14:07:13 UTC
The following code works:

#include <iostream>
#include <clocale>
#include <string>

int main()
{
    using namespace std;
    
    char *p= setlocale( LC_ALL, "greek" );

    if (!p)
      cerr<< "NULL returned!\n";

    wstring ws;
    
    wcin>> ws;
    
    wcout<< ws<< endl;
} 

[john@localhost src]$ ./foobar-cpp 
Δοκιμαστικό
Δοκιμαστικό
[john@localhost src]$ 



The following code DOES NOT work:

#include <iostream>
#include <locale>
#include <string>

int main()
{
    using namespace std;
    
    wcout.imbue(locale("greek")); 

    wstring ws;
    
    wcin>> ws;
    
    wcout<< ws<< endl;
} 


[john@localhost src]$ ./foobar-cpp 
Δοκιμαστικό

[john@localhost src]$ 


For the code that does not work:

[john@localhost src]$ g++ -v -save-temps -ansi -pedantic-errors -Wall main.cc -o foobar-cpp 
Using built-in specs.
Target: i386-redhat-linux
Configured with: ../configure --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --enable-shared --enable-threads=posix --enable-checking=release --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-libgcj-multifile --enable-languages=c,c++,objc,obj-c++,java,fortran,ada --enable-java-awt=gtk --disable-dssi --enable-plugin --with-java-home=/usr/lib/jvm/java-1.4.2-gcj-1.4.2.0/jre --with-cpu=generic --host=i386-redhat-linux
Thread model: posix
gcc version 4.1.2 20070626 (Red Hat 4.1.2-14)
 /usr/libexec/gcc/i386-redhat-linux/4.1.2/cc1plus -E -quiet -v -D_GNU_SOURCE main.cc -mtune=generic -ansi -pedantic-errors -Wall -fpch-preprocess -o main.ii
ignoring nonexistent directory "/usr/lib/gcc/i386-redhat-linux/4.1.2/../../../../i386-redhat-linux/include"
#include "..." search starts here:
#include <...> search starts here:
 /usr/lib/gcc/i386-redhat-linux/4.1.2/../../../../include/c++/4.1.2
 /usr/lib/gcc/i386-redhat-linux/4.1.2/../../../../include/c++/4.1.2/i386-redhat-linux
 /usr/lib/gcc/i386-redhat-linux/4.1.2/../../../../include/c++/4.1.2/backward
 /usr/local/include
 /usr/lib/gcc/i386-redhat-linux/4.1.2/include
 /usr/include
End of search list.
 /usr/libexec/gcc/i386-redhat-linux/4.1.2/cc1plus -fpreprocessed main.ii -quiet -dumpbase main.cc -mtune=generic -ansi -auxbase main -pedantic-errors -Wall -ansi -version -o main.s
GNU C++ version 4.1.2 20070626 (Red Hat 4.1.2-14) (i386-redhat-linux)
        compiled by GNU C version 4.1.2 20070626 (Red Hat 4.1.2-14).
GGC heuristics: --param ggc-min-expand=99 --param ggc-min-heapsize=129413
Compiler executable checksum: a9d7d7ea3146608fff5ae7eec9c8ae61
 as -V -Qy -o main.o main.s
GNU assembler version 2.17.50.0.6-5.el5 (i386-redhat-linux) using BFD version 2.17.50.0.6-5.el5 20061020
 /usr/libexec/gcc/i386-redhat-linux/4.1.2/collect2 --eh-frame-hdr -m elf_i386 --hash-style=gnu -dynamic-linker /lib/ld-linux.so.2 -o foobar-cpp /usr/lib/gcc/i386-redhat-linux/4.1.2/../../../crt1.o /usr/lib/gcc/i386-redhat-linux/4.1.2/../../../crti.o /usr/lib/gcc/i386-redhat-linux/4.1.2/crtbegin.o -L/usr/lib/gcc/i386-redhat-linux/4.1.2 -L/usr/lib/gcc/i386-redhat-linux/4.1.2 -L/usr/lib/gcc/i386-redhat-linux/4.1.2/../../.. main.o -lstdc++ -lm -lgcc_s -lgcc -lc -lgcc_s -lgcc /usr/lib/gcc/i386-redhat-linux/4.1.2/crtend.o /usr/lib/gcc/i386-redhat-linux/4.1.2/../../../crtn.o
[john@localhost src]$
Comment 1 Ioannis Vranos 2008-02-24 14:11:33 UTC
Created attachment 15217 [details]
The main.ii file produced by  -save-temps option

This is the file created by the 

g++ -v -save-temps -ansi -pedantic-errors -Wall main.cc -o foobar-cpp 

command, on the non-working code.
Comment 2 Ioannis Vranos 2008-02-24 14:15:26 UTC
Created attachment 15218 [details]
The produced main.s file

The main.s file produced by "g++ -v -save-temps -ansi -pedantic-errors -Wall main.cc -o foobar-cpp"
Comment 3 Paolo Carlini 2008-02-24 14:18:15 UTC
Not a bug, given our implementation-defined behavior: the standard streams (cin, wcin, and so on) are by default synced with stdio (per the standard requirements) and thus not converting. You can either call sync_with_stdio(false) before any I/O or use a converting stream, such as an fstream.
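
A minimal sketch of the second option (a converting stream), not taken from libstdc++ itself: a wofstream imbued with a locale converts wide characters to the external encoding as it writes. The helper names write_wide and read_bytes and the file path are hypothetical, and the locale passed in is assumed to be installed on the system.

```cpp
#include <fstream>
#include <iterator>
#include <locale>
#include <string>

// Hypothetical helper: write wide text through a converting file stream.
// The codecvt facet of `loc` turns each wchar_t into external bytes.
bool write_wide(const std::string& path, const std::wstring& text,
                const std::locale& loc)
{
    std::wofstream out;
    out.imbue(loc);          // install the locale before the file is opened
    out.open(path.c_str());
    if (!out)
        return false;
    out << text;
    out.close();             // flush, so conversion errors reach the stream state
    return !out.fail();
}

// Hypothetical helper: read the file back as raw bytes to inspect the encoding.
std::string read_bytes(const std::string& path)
{
    std::ifstream in(path.c_str(), std::ios::binary);
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}
```

With a named locale such as the "greek" one from the report, write_wide("out.txt", ws, std::locale("greek")) should produce the text in that locale's external encoding; the locale constructor throws if the name is not known to the system.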
Comment 4 Ioannis Vranos 2008-02-24 14:23:59 UTC
Created attachment 15219 [details]
Screenshot of the standard I/O of the working code and of the non-working code.

This screenshot shows the I/O of the working code and of the non-working code respectively.
Comment 5 Ioannis Vranos 2008-02-24 14:35:08 UTC
sync_with_stdio(false) doesn't work. In fact, the program crashes.

See the screenshot in my latest attachment for the difference between the working C++ code and the non-working C++ code.
Comment 6 Paolo Carlini 2008-02-24 14:40:34 UTC
sync_with_stdio(false) works, and is tested dozens of times a day in our testsuites. And that is only half of my answer. Please understand what I said, study the details of the ISO C++ Standard and then come back.
Comment 7 Ioannis Vranos 2008-02-25 12:02:45 UTC
I am sorry for insisting on this, but I think there is an issue, and I want the best for GCC. So please have a look at the messages of this link:

http://tinyurl.com/384u3n and use Unicode (UTF-8) character encoding in your browser, to see the issues.


Thanks.
Comment 8 Ioannis Vranos 2008-02-25 12:12:42 UTC
Summary of the case:

What doesn't work:

#include <iostream>
#include <locale>
#include <string>


int main()
{
    using namespace std;



    wcin.imbue(locale("greek"));
    wcout.imbue(locale("greek"));

    wstring ws;

    wcin>> ws;

    wcout<< ws<< endl;
} 


What works (under either of 2 conditions):

1. When a "locale::global()" statement is used:

#include <iostream>
#include <locale>
#include <string>


int main()
{
    using namespace std;

    locale::global(locale("en_US"));

    wcin.imbue(locale("greek"));
    wcout.imbue(locale("greek"));

    wstring ws;

    wcin>> ws;

    wcout<< ws<< endl;
} 


2. When an "ios_base::sync_with_stdio(false)" statement is used:

#include <iostream>
#include <locale>
#include <string>


int main()
{
    using namespace std;

    ios_base::sync_with_stdio(false);

    wcin.imbue(locale("greek"));
    wcout.imbue(locale("greek"));

    wstring ws;

    wcin>> ws;

    wcout<< ws<< endl;
} 
Comment 9 Paolo Carlini 2008-02-25 12:44:30 UTC
Maybe we can improve the behavior when stdio is synced: we could transcode each wchar_t and sync after each transcoding. Very likely, you can also simulate that behavior right now by using sync_with_stdio(false) plus a custom single-char I/O buffer. In any case, any enhancement will be implemented only when binary compatibility is next broken.
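
The per-character idea could look roughly like this at the C level. It is only an illustration of "transcode each wchar_t and sync after each one", not how libstdc++ implements wcout; put_transcoded and put_wide_string are hypothetical names, and the conversion uses whatever locale setlocale() has selected.

```cpp
#include <climits>
#include <cstdio>
#include <cwchar>
#include <string>

// Hypothetical helper: transcode one wide character with the current C locale
// and push the resulting bytes to the stdio stream immediately.
// Returns false if the character is not representable or the write fails.
bool put_transcoded(wchar_t wc, std::mbstate_t& state, std::FILE* out)
{
    char buf[MB_LEN_MAX];
    const std::size_t n = std::wcrtomb(buf, wc, &state);
    if (n == static_cast<std::size_t>(-1))
        return false;                 // wc has no encoding in this locale
    if (std::fwrite(buf, 1, n, out) != n)
        return false;
    return std::fflush(out) == 0;     // "sync" after every single character
}

// Hypothetical helper: apply the per-character step to a whole string.
bool put_wide_string(const std::wstring& ws, std::FILE* out)
{
    std::mbstate_t state = std::mbstate_t();
    for (std::wstring::size_type i = 0; i < ws.size(); ++i)
        if (!put_transcoded(ws[i], state, out))
            return false;
    return true;
}
```

Flushing after every character keeps the byte stream in step with any interleaved stdio calls, at an obvious cost in performance.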
Comment 10 Paolo Carlini 2008-02-25 13:11:14 UTC
Note, anyway, that there is a serious blocker to any enhancement in this area (and of course it explains the current behavior): if wcin & co are converting, they deal with the underlying stream as a narrow-character oriented stream. But when the stream is synced it must be possible to mix char-by-char with wchar_t C stdio operations, which require a wide-character orientation of the stream, whereas, per C99 7.19.2, the orientation of a stream cannot be changed after opening.
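
The orientation rule can be checked directly with fwide(). A small sketch; orientation_after_switch_attempt is a hypothetical name:

```cpp
#include <cstdio>
#include <cwchar>

// Hypothetical helper: give the stream a byte orientation, then ask for a
// wide orientation and report what the stream actually has afterwards
// (negative = byte-oriented, positive = wide-oriented, zero = none yet).
int orientation_after_switch_attempt(std::FILE* f)
{
    std::fwide(f, -1);        // takes effect only if f has no orientation yet
    return std::fwide(f, 1);  // cannot change an orientation already set
}
```

Called on a stream that has not been used for wide I/O, the function should return a negative value: the second fwide() call does not override the byte orientation.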
Comment 11 Paolo Carlini 2008-02-25 14:18:48 UTC
About my last reply: I checked, and within the current implementation of the underlying I/O the last issue (per libstdc++/9662) doesn't exist anymore. In other words, with sync_with_stdio(false), C++ I/O on wcin/wcout doesn't change the orientation of the stream to byte (i.e., fwide < 0). Good.

We have to re-investigate all the other reasons that led to the separate non-converting synced (default) implementation of wcin & co...
Comment 12 Paolo Carlini 2008-08-31 18:15:07 UTC
*** Bug 37298 has been marked as a duplicate of this bug. ***
Comment 13 Paolo Carlini 2008-09-30 10:21:55 UTC
*** Bug 37673 has been marked as a duplicate of this bug. ***
Comment 14 Paolo Carlini 2010-02-21 01:31:53 UTC
*** Bug 33852 has been marked as a duplicate of this bug. ***
Comment 15 Luca Barbieri 2010-04-08 02:33:09 UTC
Why can't wcout simply convert to the selected encoding, and append the results to the cout buffer, as if the converted string had been directly output to cout?

I'm not sure about the implementation details, but I fail to see how anything could prevent adopting this rather obvious solution.

Of course, if cout is in the middle of the byte sequence of a character, this will not result in sensible output, but that is user error and I fail to see how such use could be made meaningful.

BTW, doesn't cout share the stdout buffer via the GNU libio FILE/iostream sharing mechanism, making sync_with_stdio do nothing anyway?
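
A rough sketch of the conversion step being proposed here, assuming the locale's codecvt<wchar_t, char, mbstate_t> facet does the work. to_narrow is a hypothetical helper, not an existing libstdc++ function, and it leaves aside the question of how the buffers are shared:

```cpp
#include <cwchar>
#include <locale>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: convert a wide string to the external encoding of `loc`.
std::string to_narrow(const std::wstring& ws, const std::locale& loc)
{
    typedef std::codecvt<wchar_t, char, std::mbstate_t> facet_type;
    const facet_type& cvt = std::use_facet<facet_type>(loc);

    if (ws.empty())
        return std::string();

    std::mbstate_t state = std::mbstate_t();
    // max_length() bounds the number of bytes one wchar_t can expand to.
    std::vector<char> out(ws.size() * cvt.max_length());
    const wchar_t* from_next = 0;
    char* to_next = 0;

    const std::codecvt_base::result r =
        cvt.out(state, ws.data(), ws.data() + ws.size(), from_next,
                &out[0], &out[0] + out.size(), to_next);
    if (r != std::codecvt_base::ok)
        throw std::runtime_error("wide string not convertible in this locale");

    return std::string(&out[0], to_next);
}
```

The result could then be written with std::cout << to_narrow(ws, loc), which appends the converted bytes to the cout buffer as if they had been output there directly.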

Comment 16 Paolo Carlini 2010-04-08 09:44:24 UTC
We may make progress on this for 4.6.0, but I make no promises. In the meantime, I would recommend studying the relevant bits of the Standard and the current implementation of these features (I remind you that this is Free Software, thus no mysteries, no need for black-box thinking), then going ahead and proposing a patch (after having filed the required Copyright Assignment). Thanks.