This is the mail archive of the gcc-help@gcc.gnu.org mailing list for the GCC project.


RE: How to compile c++ code without strip off utf-8 BOM?

From: "John (Eljay) Love-Jensen" <eljay at adobe dot com>
To: Dancefire <dancefire at gmail dot com>, "gcc-help at gcc dot gnu dot org" <gcc-help at gcc dot gnu dot org>
Date: Tue, 17 Feb 2009 05:23:10 -0800
Subject: RE: How to compile c++ code without strip off utf-8 BOM?
References: <e074d31a0902142343l1f22f47cp4d2b14a72455525f@mail.gmail.com>,<e074d31a0902162308o7414e45boc1bf0c50de61a603@mail.gmail.com>

Hi Tao Wang,

My test.cpp source is UTF-8 with BOM.

If I compile it like this...

g++ -x c++ <(xxd -g 1 -s 3 test.cpp | xxd -g 1 -s -3 -r) -o a.out

... that strips the first three bytes from the file before the compiler sees it.  For test.cpp, those three bytes happen to be the BOM (ef bb bf).

You may want to create a little 'stripBOM' program that behaves like 'cat', but gobbles the BOM if present.

Or you could use awk, sed, perl, or your favorite-text-munging-tool-of-choice to perform the same conversion.  I just used xxd because it was quick, for illustrative purposes.  (There's probably a more suitable Unix tool than xxd for this kind of cat-with-offset, but you'd want something that filters out the BOM only when it is present, rather than always skipping three bytes.)
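
As a rough sketch of the sed route mentioned above, the shell function below acts like 'cat' but drops a leading UTF-8 BOM only when one is present.  The name strip_bom is made up for illustration and is not a standard tool; the sketch assumes a POSIX-style printf and sed, and the usage line assumes bash for the <( ) process substitution.

```shell
# strip_bom (hypothetical helper): copy the named files, or stdin, to stdout,
# removing a UTF-8 BOM (bytes EF BB BF) if it starts the first line.
strip_bom() {
    # Build the three BOM bytes from octal escapes, which POSIX printf accepts.
    bom=$(printf '\357\273\277')
    # LC_ALL=C makes sed match raw bytes; the "1" address limits the
    # substitution to the first line, so later lines pass through untouched.
    LC_ALL=C sed "1s/^${bom}//" "$@"
}

# Possible usage, mirroring the xxd command above:
#   g++ -x c++ <(strip_bom test.cpp) -o a.out
```

Unlike the fixed three-byte offset, this leaves a file without a BOM unchanged, so the same command line should work for both kinds of source file.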

HTH,
--Eljay

Follow-Ups:
- Re: How to compile c++ code without strip off utf-8 BOM?
  - From: Dario Saccavino

References:
- Re: How to compile c++ code without strip off utf-8 BOM?
  - From: Dancefire
