This is the mail archive of the
gcc@gcc.gnu.org
mailing list for the GCC project.
Re: char & wchar_t encoding
> What character encodings are used for char and wchar_t (Latin-1, ASCII,
> Unicode, ISO-10646, etc.) ? Is gcc's implementation of the encodings
> different for different platforms?
For 2.95, it depends on whether --enable-c-mbchar was given during
configuration.
If gcc was configured without --enable-c-mbchar, it treats the input
source 'as-is', and widens each source character to a wide character
with the same encoding value.
One way to look at this is: For a conforming program, input uses only
the "basic character set" (i.e. ASCII without the dollar sign)(*). GCC
will then put Unicode-encoded wide characters into the binary.
Another way to view this: The implementation-defined input character
encoding is ISO 8859-1, and the execution character set is ISO 10646.
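To illustrate the 'as-is' widening described above, here is a minimal sketch in C. It is not GCC's actual code; widen_as_is is a hypothetical helper name chosen for this example. It copies each byte into a wchar_t with the same numeric value, which is why ISO 8859-1 input ends up as the matching ISO 10646 code points.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/* Hypothetical helper (not part of GCC): widen each byte of src to a
   wchar_t holding the same numeric value.  Writes at most dst_len - 1
   wide characters plus a terminating L'\0', and returns the number of
   wide characters written, not counting the terminator. */
size_t widen_as_is(const char *src, wchar_t *dst, size_t dst_len)
{
    size_t i = 0;

    assert(src != NULL && dst != NULL && dst_len > 0);

    while (src[i] != '\0' && i + 1 < dst_len) {
        /* Go through unsigned char so bytes >= 0x80 are not sign-extended. */
        dst[i] = (wchar_t)(unsigned char)src[i];
        i++;
    }
    dst[i] = L'\0';
    return i;
}
```

For example, the two-byte input "A\xE9" (a Latin-1 e-acute after the A) widens to the values 0x41 and 0xE9, which are also the ISO 10646 code points for those characters.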
The size of wchar_t depends on the architecture: it is either
BITS_PER_WORD or BITS_PER_WORD/2 (on vxm68k and vxsparc, it is 8).
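Since the width varies by target, a program can check it rather than assume it. The following is only a sketch; wchar_bits is a hypothetical name for this example.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical helper: width of wchar_t in bits on the target this
   file is compiled for. */
size_t wchar_bits(void)
{
    size_t bits = sizeof(wchar_t) * CHAR_BIT;

    /* The C standard requires CHAR_BIT >= 8, so this always holds. */
    assert(bits >= 8);
    return bits;
}
```

Code that stores ISO 10646 values beyond 16 bits in a wchar_t would want to check that wchar_bits() is large enough first.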
If gcc is configured with --enable-c-mbchar, the interpretation of the
source character set, and the execution character set, both depend on
the compile-time locale. The compiler uses mbtowc to convert the input
string to wide characters.
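The mbtowc step can be sketched with the standard C library as follows. This illustrates the technique, not the compiler's own source; convert_with_mbtowc is a hypothetical name. The result depends on whichever locale is active when it runs, just as the compiler's interpretation depends on the compile-time locale.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: convert a multibyte string to wide characters
   one character at a time with mbtowc, using the current locale.
   Writes at most dst_len - 1 wide characters plus a terminating L'\0'.
   Returns the number of wide characters written, or (size_t)-1 if an
   invalid multibyte sequence is found. */
size_t convert_with_mbtowc(const char *src, wchar_t *dst, size_t dst_len)
{
    size_t remaining;
    size_t out = 0;

    assert(src != NULL && dst != NULL && dst_len > 0);
    remaining = strlen(src);

    /* Reset mbtowc's internal shift state before starting. */
    (void)mbtowc(NULL, NULL, 0);

    while (remaining > 0 && out + 1 < dst_len) {
        int n = mbtowc(&dst[out], src, remaining);
        if (n < 0)
            return (size_t)-1;   /* invalid sequence in this locale */
        if (n == 0)
            break;               /* null character reached */
        src += n;
        remaining -= (size_t)n;
        out++;
    }
    dst[out] = L'\0';
    return out;
}
```

A program starts in the "C" locale; calling setlocale(LC_CTYPE, "") beforehand makes mbtowc follow the locale set in the environment instead.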
Some locales whose names start with "C-" are special-cased. Please look
at the documentation for the LANG variable.
Regards,
Martin