c/3804: Extended ASCII "wide" characters not behaving with UTF-8 locale

Alex Eulenberg alex@rent-a-mind.com
Wed Jul 25 13:48:00 GMT 2001


First item: I do not think it's a problem with the library I'm linking
with my executable (glibc-2.2.2). The compiler should ensure that the two
statements

    printf("%ls\n", L"Sch\366ne Gr\374\337e"); // escape codes in source
    printf("%ls\n", L"Schöne Grüße"); // 8bit characters in source

be equivalent in the object code. My test code shows that they are treated
differently. The latter works, and the former doesn't, when the executable
is run under a UTF-8 locale. As I said, I get good results when I compile
with gcc-2.95.3. It breaks with gcc-3.0 and higher.
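
One way to separate the compiler from the library here is to look at the
wchar_t values stored for the escaped literal, without going through
printf() or the locale at all. The following is only a minimal sketch:
the helper first_mismatch() and the table gruesse[] are hypothetical
names, and the expected values assume an ASCII-based execution character
set plus the ISO-8859-1 code points 0xF6, 0xFC and 0xDF for the three
non-ASCII letters.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/* Hypothetical helper: returns the index of the first element of s that
   differs from expected[0..n-1], n if s is longer than n elements, or -1
   if s holds exactly the n expected values. */
static long first_mismatch(const wchar_t *s, const unsigned long *expected,
                           size_t n)
{
    size_t i;

    assert(s != NULL && expected != NULL);
    for (i = 0; i < n; i++) {
        if (s[i] == L'\0' || (unsigned long)s[i] != expected[i])
            return (long)i;
    }
    return s[n] == L'\0' ? -1 : (long)n;
}

/* Assumed code points for "Schöne Grüße" (ASCII letters plus the
   ISO-8859-1 values of the three non-ASCII letters). */
static const unsigned long gruesse[] = {
    'S', 'c', 'h', 0xF6, 'n', 'e', ' ', 'G', 'r', 0xFC, 0xDF, 'e'
};

/* The escaped literal from the report. In a wide string literal an octal
   escape gives the value of the wide character directly. */
static const wchar_t escaped[] = L"Sch\366ne Gr\374\337e";
```

If first_mismatch(escaped, gruesse, 12) returns -1, the escaped literal
was stored as expected, and any difference in output would have to come
from the conversion done at run time rather than from the object code.
The same check could be applied to the literal written with 8-bit
characters, whose stored values depend on how the compiler reads the
source file.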

Second item: According to the Sun compiler documentation
http://docs.sun.com/htmlcoll/coll.33.7/iso-8859-1/CUG/tguide.html#785 , for
all ANSI/ISO Compilers, "When the compilation system encounters a wide
character constant or wide string literal, each multibyte character is
converted into a wide character, as if by calling the mbtowc() function."
I would prefer to be able to use gcc to compile source code encoded in
UTF-8, with UTF-8 string and character literals (I can manage with ASCII
identifiers). There is no way to do that now, either through locale (why
not?) or compiler directives. If there are no plans to put in this
capability, it should be documented as a missing feature.
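
The "as if by calling the mbtowc() function" rule quoted above can be
illustrated at run time. This is a rough sketch, not a description of
what any compiler does internally: mb_to_wide() is a hypothetical helper
that converts a multibyte string one character at a time under whatever
LC_CTYPE locale is currently active.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: convert the multibyte string src to wide
   characters with mbtowc(), using the active LC_CTYPE locale.
   Returns the number of wide characters written (not counting the
   terminator), or -1 on an invalid sequence or if dst is too small. */
static long mb_to_wide(const char *src, wchar_t *dst, size_t cap)
{
    size_t remaining;
    size_t n = 0;

    assert(src != NULL && dst != NULL);
    if (cap == 0)
        return -1;

    remaining = strlen(src);
    (void)mbtowc(NULL, NULL, 0);        /* reset any shift state */

    while (remaining > 0) {
        int len;

        if (n + 1 >= cap)
            return -1;                  /* no room for char + terminator */
        len = mbtowc(&dst[n], src, remaining);
        if (len <= 0)
            return -1;                  /* invalid or incomplete sequence */
        src += len;
        remaining -= (size_t)len;
        n++;
    }
    dst[n] = L'\0';
    return (long)n;
}
```

In the default "C" locale this only handles plain ASCII reliably. After a
successful setlocale(LC_CTYPE, "en_US.UTF-8") (assuming that locale is
installed), passing the UTF-8 bytes "Sch\303\266ne" should give 0xF6 as
the fourth wide character on glibc, which is the value one would hope a
UTF-8 aware compiler stores for L"Schöne".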

--Alex

On 25 Jul 2001 neil@gcc.gnu.org wrote:

> Synopsis: Extended ASCII "wide" characters not behaving with UTF-8 locale
> 
> State-Changed-From-To: open->analyzed
> State-Changed-By: neil
> State-Changed-When: Tue Jul 24 23:49:41 2001
> State-Changed-Why:
>     Your first item might be a C library bug.
>     
>     Your second item is not the way GCC is heading.  GCC will not interpret translation units based upon locale settings (and note that it is files that need to be translated, strings are just part of the picture).  It currently is not implemented, though.
> 
> http://gcc.gnu.org/cgi-bin/gnatsweb.pl?cmd=view&pr=3804&database=gcc
> 



