This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.



Re: Compiling files not encoded with system settings



Ross Ridge wrote:
Nicolas De Rico wrote:
The file hi-utf16.c, created with Notepad and saved in "unicode",
contains a BOM which is, in essence, a small header at the beginning of
the file that indicates the encoding.

It's not a header that indicates the encoding. It's a header that indicates the byte order of the 16-bit values that follow when the encoding is already known to be UTF-16. When the encoding is known to be UTF-16LE or UTF-16BE, there shouldn't be any "BOM" present at the start of a C file, since a "BOM" in the correct byte order is actually the Unicode "zero-width non-breaking space" character, which isn't valid as the first character in a C file. Similarly, there shouldn't be a BOM at the start of a UTF-8 C file, especially since UTF-8 encoded files don't have a byte order.
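
As a rough illustration of the byte-order point, here is a minimal sketch in C. It assumes the input is already known to be UTF-16 and only asks which byte order makes the first 16-bit value read as U+FEFF. The function names are hypothetical and are not part of GCC.

```c
#include <stdint.h>

/* Read two bytes as one 16-bit code unit, most significant byte first. */
static uint16_t read_u16_be(const unsigned char *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

/* Read two bytes as one 16-bit code unit, least significant byte first. */
static uint16_t read_u16_le(const unsigned char *p)
{
    return (uint16_t)((p[1] << 8) | p[0]);
}

/*
 * Hypothetical helper: given the first two bytes of data already known
 * to be UTF-16, report the byte order implied by a leading U+FEFF.
 * Returns 1 for big-endian, 0 for little-endian, and -1 when the first
 * code unit is not U+FEFF in either order (no BOM present).
 */
int utf16_byte_order(const unsigned char *p)
{
    if (read_u16_be(p) == 0xFEFF)
        return 1;
    if (read_u16_le(p) == 0xFEFF)
        return 0;
    return -1;
}
```

Read in the wrong order, the same two bytes give 0xFFFE, which is not an assigned character. That is what lets a reader of UTF-16 data tell the two byte orders apart.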

The presence of what looks to be a UTF-16 BOM header can be used as part
of a heuristic to guess the encoding of a file, but I don't think it's a
good idea for GCC to be guessing the encoding of files.

Of course, stdio.h is stored in UTF-8 on the system so trying to convert
it from UTF-16 will fail right away.

It would probably be more accurate to describe "stdio.h" as an ASCII file.



It's true that stdio.h is ASCII. I wasn't thinking properly: files saved in UTF-8 or LATIN-1 (for example) should compile properly if using the proper setting.


But how can someone use gcc to compile a file created with Visual C++ and saved in unicode?

Microsoft puts a BOM at the start of UTF-16 files. It even does so for UTF-8 files that are saved with Notepad (this can be confirmed using 'od -x'). This allows their programs to detect the encoding automatically. Note that vim seems to be able to detect the encoding using the BOM.
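
For reference, the kind of BOM-based detection described above can be sketched in a few lines of C. This is only an illustration of the heuristic, not something GCC does. The function name `sniff_bom` and the returned labels are hypothetical, and the sketch assumes the first few bytes of the file have already been read into a buffer.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: look at the first bytes of a file and return a
 * label for the BOM found there, or NULL when no known BOM is present.
 * This is a guess, not proof: a file without a BOM may still be UTF-8
 * or UTF-16, and FF FE is also the start of a UTF-32LE BOM.
 */
const char *sniff_bom(const unsigned char *buf, size_t len)
{
    if (len >= 3 && memcmp(buf, "\xEF\xBB\xBF", 3) == 0)
        return "UTF-8";      /* U+FEFF encoded as UTF-8 */
    if (len >= 2 && memcmp(buf, "\xFF\xFE", 2) == 0)
        return "UTF-16LE";   /* U+FEFF, least significant byte first */
    if (len >= 2 && memcmp(buf, "\xFE\xFF", 2) == 0)
        return "UTF-16BE";   /* U+FEFF, most significant byte first */
    return NULL;
}
```

A plain ASCII header such as stdio.h yields NULL here, so a tool built on this check would still need a default encoding for files that carry no BOM.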

