I need to port my project to VS2005, and the source code contains some UTF-8 strings that are not suitable to represent by escaping, so I have to add a UTF-8 BOM to make VS2005 recognize the encoding. But after I added the UTF-8 BOM, gcc can't compile the file anymore; even with -finput-charset=UTF-8, it still reports errors about \357 \273 \277. Can you fix this problem? Escaping is troublesome because there are too many strings, and it makes the source code unreadable. VS2005 definitely needs the UTF-8 BOM, while gcc cannot accept it at present. Thank you!
Please attach a testcase. See http://gcc.gnu.org/bugs.html for more information. W.
Actually I already know this is not handled. In fact any of the BOMs are not handled.
I think some BOMs will be handled by iconv. In particular I tried UTF-16 and this seemed to work ok. UTF-8 is a special problem in two ways. First, glibc's iconv does not appear to recognize the UTF-8 BOM. And, even if it did, we special-case UTF-8 (at least on non-EBCDIC hosts). This could be fixed in files.c without too much difficulty (it makes a few inconvenient assumptions), except that files.c does not know the name of the source charset.
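For illustration, recognizing the UTF-8 BOM comes down to comparing the first three bytes of the input with EF BB BF (the \357 \273 \277 from the report, written in hex). Below is a minimal sketch, assuming a hypothetical helper named `utf8_bom_length`; it is not libcpp code and says nothing about where in charset.c or files.c such a check would belong.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of libcpp): return the number of
   leading bytes occupied by a UTF-8 BOM, i.e. 3, or 0 if the buffer
   does not start with one.  */
size_t
utf8_bom_length (const unsigned char *buf, size_t len)
{
  static const unsigned char bom[3] = { 0xEF, 0xBB, 0xBF };

  /* A buffer shorter than three bytes cannot contain a full BOM.  */
  if (len >= sizeof bom && memcmp (buf, bom, sizeof bom) == 0)
    return sizeof bom;
  return 0;
}
```

A caller would advance its read pointer by the returned count before handing the buffer to the lexer. A UTF-16 BOM would need a different check; this sketch only covers the UTF-8 case discussed here.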
Testing a patch.
Fixed on trunk. As I doubt this will be back-ported to 4.3.x, I am closing the bug.
Subject: Bug 33415

Author: tromey
Date: Mon Apr 21 14:02:00 2008
New Revision: 134507

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=134507
Log:
libcpp
    PR libcpp/33415:
    * charset.c (_cpp_convert_input): Add buffer_start argument. Ignore UTF-8 BOM if seen.
    * internal.h (_cpp_convert_input): Add argument.
    * files.c (struct _cpp_file) <buffer_start>: New field.
    (destroy_cpp_file): Free buffer_start, not buffer.
    (_cpp_pop_file_buffer): Likewise.
    (read_file_guts): Update.
gcc/testsuite
    PR libcpp/33415:
    * gcc.dg/cpp/pr33415.c: New file.

Added:
    trunk/gcc/testsuite/gcc.dg/cpp/pr33415.c
Modified:
    trunk/gcc/testsuite/ChangeLog
    trunk/libcpp/ChangeLog
    trunk/libcpp/charset.c
    trunk/libcpp/files.c
    trunk/libcpp/internal.h
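The log mentions a new buffer_start field and freeing buffer_start rather than buffer. A plausible reading is that once the BOM is skipped, the pointer the lexer reads from no longer equals the pointer returned by the allocator, so the two must be tracked separately. The sketch below illustrates that idea only; `struct demo_file`, `demo_file_load`, and `demo_file_destroy` are hypothetical names and are not the real libcpp definitions.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a file record; NOT the real struct _cpp_file.  */
struct demo_file
{
  unsigned char *buffer_start; /* pointer returned by malloc; the one to free */
  unsigned char *buffer;       /* first byte the lexer should see */
  size_t length;               /* number of bytes available at buffer */
};

/* Copy SRC into a fresh allocation and position BUFFER past a leading
   UTF-8 BOM, if any.  Returns 0 on success, -1 on allocation failure.  */
int
demo_file_load (struct demo_file *f, const unsigned char *src, size_t len)
{
  static const unsigned char bom[3] = { 0xEF, 0xBB, 0xBF };
  size_t skip = 0;

  f->buffer_start = malloc (len ? len : 1);
  if (f->buffer_start == NULL)
    return -1;
  memcpy (f->buffer_start, src, len);

  if (len >= sizeof bom && memcmp (src, bom, sizeof bom) == 0)
    skip = sizeof bom;

  f->buffer = f->buffer_start + skip;
  f->length = len - skip;
  return 0;
}

/* Release the allocation.  Passing f->buffer to free would be undefined
   behaviour whenever a BOM was skipped, because that pointer lies inside
   the allocation rather than at its start.  */
void
demo_file_destroy (struct demo_file *f)
{
  free (f->buffer_start);
  f->buffer_start = NULL;
  f->buffer = NULL;
  f->length = 0;
}
```

Keeping both pointers costs one extra field but avoids copying the whole buffer down by three bytes just to drop the BOM.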
*** Bug 40441 has been marked as a duplicate of this bug. ***
Hi experts,

I am facing the same kind of problem. I have a C++ application that uses Unicode strings, and I had compiled the solution using Visual Studio 2012. The file is saved as UTF-8 with a BOM (byte order mark). When I compile the same file on Linux, I get the following errors:

error: stray '\239' in program
1: error: stray '\187' in program
1: error: stray '\191' in program

I found that gcc does not support a BOM in the C++ file. If I remove the BOM from the file, the errors are resolved. Is there a way to compile my application when its files are saved as UTF-8 with a BOM? I am compiling the application on "Red Hat Enterprise Linux 4", where the GCC version is 3.4.6. Please help me in this regard.

Thanks & Regards,
Vikas
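The three stray bytes in these messages, 239 187 191 in decimal, are the same UTF-8 BOM reported earlier as \357 \273 \277 in octal (EF BB BF in hex). For a compiler that does not skip the BOM, one workaround is to strip it from a copy of the source before compiling. The following is a minimal sketch of such a filter; `copy_without_bom` is a hypothetical name, and the function assumes the caller has opened both streams in binary mode.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical filter: copy IN to OUT, dropping a leading UTF-8 BOM if
   one is present.  Returns 0 on success, -1 on an I/O error.  */
int
copy_without_bom (FILE *in, FILE *out)
{
  static const unsigned char bom[3] = { 0xEF, 0xBB, 0xBF };
  unsigned char head[3];
  unsigned char chunk[4096];
  size_t n;

  /* Read up to three bytes and write them back unless they are the BOM.  */
  n = fread (head, 1, sizeof head, in);
  if (!(n == sizeof head && memcmp (head, bom, sizeof bom) == 0))
    {
      if (fwrite (head, 1, n, out) != n)
        return -1;
    }

  /* Copy the remainder of the stream unchanged.  */
  while ((n = fread (chunk, 1, sizeof chunk, in)) > 0)
    if (fwrite (chunk, 1, n, out) != n)
      return -1;

  return ferror (in) ? -1 : 0;
}
```

A small driver could call this with the original file as IN and a new file as OUT, then compile the new file. The original stays untouched, so Visual Studio can continue to use the BOM-prefixed version.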
Vikas: This was fixed in GCC in 2008. The version of GCC you are using (3.4.6) was released in 2006 and the entire 3.4.x tree is in fact from 2004. It is time for you to upgrade your system after almost a decade if there are features you need.
And please don't use Bugzilla for questions about using GCC; use the gcc-help mailing list instead. Thanks.