This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



Re: char & wchar_t encoding


> What character encodings are used for char and wchar_t (Latin-1, ASCII,
> Unicode, ISO-10646, etc.) ? Is gcc's implementation of the encodings
> different for different platforms?

For GCC 2.95, the answer depends on whether --enable-c-mbchar was given
during configuration.

If gcc was configured without --enable-c-mbchar, it treats the input
source as-is and widens each source character to a wide character
with the same encoding value.
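
A minimal sketch of that widening rule, for illustration only. The
helper name widen_as_is is hypothetical and this is not GCC's own code;
it also assumes source bytes are treated as unsigned values, which is
what makes the Latin-1 reading below work out.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/* Hypothetical helper (not taken from GCC): copy n source bytes into
   dst, giving each wide character the same numeric value as the byte
   it came from. */
void widen_as_is(const char *src, size_t n, wchar_t *dst)
{
    size_t i;

    assert(src != NULL && dst != NULL);
    for (i = 0; i < n; i++) {
        /* Assumption: go through unsigned char so that bytes >= 0x80
           are not sign-extended on targets where char is signed. */
        dst[i] = (wchar_t)(unsigned char)src[i];
    }
}
```

Under this sketch the byte 0xE9 becomes the wide character 0xE9, which
is the same code point in ISO 8859-1 and in ISO 10646.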

One way to look at this is: For a conforming program, input uses only
the "basic character set" (i.e. ASCII without the dollar sign)(*). GCC
then emits wide characters into the binary that are Unicode encoded.

Another way to view this: The implementation-defined input character
encoding is ISO 8859-1, and the execution character set is ISO 10646.
The size of wchar_t depends on the architecture: it is either
BITS_PER_WORD or BITS_PER_WORD/2 bits (on vxm68k and vxsparc, it is 8).
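
Since the width varies by target, portable code may want to check it
rather than assume it. A small sketch using only standard headers; the
function names are hypothetical, chosen for this example.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <wchar.h>

/* Width of wchar_t in bits on the target this is compiled for. */
size_t wchar_t_bits(void)
{
    return sizeof(wchar_t) * CHAR_BIT;
}

/* Nonzero if code_point can be represented as a wchar_t here. */
int fits_in_wchar(unsigned long code_point)
{
    return code_point <= (unsigned long)WCHAR_MAX;
}

/* Store code_point in *out; it must fit in a wchar_t on this target. */
void store_code_point(unsigned long code_point, wchar_t *out)
{
    assert(out != NULL);
    assert(fits_in_wchar(code_point));
    *out = (wchar_t)code_point;
}
```

On a target with a narrow wchar_t, fits_in_wchar reports which
ISO 10646 code points cannot be stored in a single wide character.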

If gcc is configured with --enable-c-mbchar, the interpretation of the
source character set, and the execution character set, both depend on
the compile-time locale. The compiler uses mbtowc to convert the input
string to wide characters.
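
The locale-dependent conversion can be sketched with the standard
mbtowc interface. This is an illustration under assumptions, not the
compiler's actual code: decode_in_locale is a hypothetical helper, and
it changes the process-wide LC_CTYPE locale as a side effect.

```c
#include <assert.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: decode the multibyte string src under the named
   locale, storing at most cap wide characters in dst. Returns the
   number stored, or (size_t)-1 if the locale is unavailable or the
   input contains an invalid sequence. Stops early if dst fills up. */
size_t decode_in_locale(const char *locale, const char *src,
                        wchar_t *dst, size_t cap)
{
    size_t remaining;
    size_t count = 0;

    assert(locale != NULL && src != NULL && dst != NULL);

    if (setlocale(LC_CTYPE, locale) == NULL)
        return (size_t)-1;               /* locale not available */

    (void)mbtowc(NULL, NULL, 0);         /* reset any shift state */

    remaining = strlen(src);
    while (remaining > 0 && count < cap) {
        int len = mbtowc(&dst[count], src, remaining);
        if (len <= 0)
            return (size_t)-1;           /* invalid multibyte sequence */
        src += len;
        remaining -= (size_t)len;
        count++;
    }
    return count;
}
```

The same bytes can decode to different wide characters depending on the
locale argument, which is why the result is tied to the locale in
effect when the compiler runs.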

Some locale names starting with "C-" are handled as special cases.
Please look at the documentation for the LANG environment variable.

Regards,
Martin

