Bug 33415

Summary: Can't compile .cpp file with UTF-8 BOM.
Product: gcc Reporter: Hu Zheng <huzheng_001>
Component: preprocessor Assignee: Tom Tromey <tromey>
Severity: normal CC: bangerth, dh.liu, gcc-bugs, neil, pinskia, tromey
Priority: P3    
Version: 4.1.2   
Target Milestone: 4.4.0   
Host: Target:
Build: Known to work:
Known to fail: Last reconfirmed: 2008-04-16 21:29:21

Description Hu Zheng 2007-09-13 10:04:15 UTC
I need to port my project to vs2005, and the source code contains some UTF-8 strings that are not suitable for representation by escapes, so I have to add a UTF-8 BOM to make vs2005 recognize the encoding. But after I added the UTF-8 BOM, gcc can't compile the file anymore; even with -finput-charset=UTF-8, it still reports errors about \357 \273 \277.
Can you fix this problem?

Escaping is troublesome: there are too many such strings, and the escapes make the source code unreadable.
vs2005 definitely needs the UTF-8 BOM,
while gcc currently can't accept it.

Thank you!
Comment 1 Wolfgang Bangerth 2007-09-14 04:12:37 UTC
Please attach a testcase. See
for more information.
Comment 2 Andrew Pinski 2007-09-14 09:28:32 UTC
Actually, I already know this is not handled.  In fact, none of the BOMs are handled.
Comment 3 Tom Tromey 2008-04-16 20:37:27 UTC
I think some BOMs will be handled by iconv.
In particular I tried UTF-16 and this seemed to work ok.

UTF-8 is a special problem in two ways.  First, glibc's iconv does not
appear to recognize the UTF-8 BOM.

And, even if it did, we special-case UTF-8 (at least on non-EBCDIC hosts).

This could be fixed in files.c without too much difficulty (it makes a few
inconvenient assumptions), except that files.c does not know the name of the
source charset.
Comment 4 Tom Tromey 2008-04-16 21:29:21 UTC
Testing a patch.
Comment 5 Tom Tromey 2008-04-21 14:02:46 UTC
Fixed on trunk.
As I doubt this will be back-ported to 4.3.x, I am closing the bug.
Comment 6 Tom Tromey 2008-04-21 14:02:52 UTC
Subject: Bug 33415

Author: tromey
Date: Mon Apr 21 14:02:00 2008
New Revision: 134507

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=134507
	PR libcpp/33415:
	* charset.c (_cpp_convert_input): Add buffer_start argument.
	Ignore UTF-8 BOM if seen.
	* internal.h (_cpp_convert_input): Add argument.
	* files.c (struct _cpp_file) <buffer_start>: New field.
	(destroy_cpp_file): Free buffer_start, not buffer.
	(_cpp_pop_file_buffer): Likewise.
	(read_file_guts): Update.
	PR libcpp/33415:
	* gcc.dg/cpp/pr33415.c: New file.
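
The ChangeLog entries above can be illustrated with a minimal sketch. This is not the code from revision 134507; it only shows the general idea of skipping a leading UTF-8 BOM (bytes EF BB BF, which are the \357 \273 \277 from the original report) while keeping a separate pointer to the start of the allocation so that it can still be freed. The names struct source_buf, load_source and free_source are hypothetical; only the field names buffer and buffer_start echo the ChangeLog.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a file's in-memory contents.  It is not
   libcpp's struct _cpp_file; only the two pointer names echo the
   ChangeLog.  */
struct source_buf
{
  unsigned char *buffer_start;  /* What malloc returned; pass this to free.  */
  const unsigned char *buffer;  /* First byte the lexer should see.  */
  size_t len;                   /* Number of bytes available at buffer.  */
};

/* Copy LEN bytes of DATA into a fresh allocation and step over a
   leading UTF-8 BOM if one is present.  On allocation failure,
   buffer_start and buffer are NULL and len is 0.  */
static struct source_buf
load_source (const unsigned char *data, size_t len)
{
  struct source_buf sb = { NULL, NULL, 0 };

  sb.buffer_start = malloc (len + 1);
  if (sb.buffer_start == NULL)
    return sb;

  memcpy (sb.buffer_start, data, len);
  sb.buffer_start[len] = '\0';
  sb.buffer = sb.buffer_start;
  sb.len = len;

  /* The UTF-8 BOM is the three bytes EF BB BF.  */
  if (len >= 3
      && sb.buffer[0] == 0xEF
      && sb.buffer[1] == 0xBB
      && sb.buffer[2] == 0xBF)
    {
      sb.buffer += 3;
      sb.len -= 3;
    }

  return sb;
}

/* Free the original allocation.  Freeing sb->buffer would be wrong
   whenever a BOM was skipped, because that pointer is then three
   bytes past the address malloc returned.  */
static void
free_source (struct source_buf *sb)
{
  assert (sb != NULL);
  free (sb->buffer_start);
  sb->buffer_start = NULL;
  sb->buffer = NULL;
  sb->len = 0;
}
```

Keeping two pointers is what makes the "Free buffer_start, not buffer" entries necessary: once the visible start of the text can differ from the start of the allocation, every place that releases the buffer has to use the allocation pointer.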


Comment 7 Joseph S. Myers 2009-06-14 23:03:33 UTC
*** Bug 40441 has been marked as a duplicate of this bug. ***
Comment 8 Vikas 2013-03-25 05:02:57 UTC
Hi Experts
I am facing the same kind of problem......

I have a C++ application that uses Unicode strings, and I compiled the solution using Visual Studio 2012. The file is saved as UTF-8 with a BOM (byte order mark). When I build the same file on Linux, I get the following errors:
 error: stray '\239' in program
 1: error: stray '\187' in program
 1: error: stray '\191' in program
I found that gcc doesn't support a BOM in the C++ file. If I remove the BOM from the file, the errors are resolved. Is there a way I can compile my application when it contains files saved as UTF-8 with a BOM?
I am compiling the application on Red Hat Enterprise Linux 4, where the GCC version is 3.4.6.

Please help me in this regard.

Thanks & Regards
Comment 9 Wolfgang Bangerth 2013-03-25 13:50:00 UTC
Vikas: This was fixed in GCC in 2008. The version of GCC you are using (3.4.6) was released in 2006 and the entire 3.4.x tree is in fact from 2004. It is time for you to upgrade your system after almost a decade if there are features you need.
Comment 10 Jonathan Wakely 2013-03-25 14:15:17 UTC
And please don't use Bugzilla for questions about using GCC, use the gcc-help mailing list, thanks.