This is the mail archive of the fortran@gcc.gnu.org mailing list for the GNU Fortran project.

Re: [gfortran,patch] Ignore byte order mark at start of file

From: FX Coudert <fxcoudert at gmail dot com>
To: Brooks Moses <brooks dot moses at codesourcery dot com>
Cc: fortran at gcc dot gnu dot org, gcc-patches at gcc dot gnu dot org
Date: Wed, 25 Apr 2007 23:32:10 +0200
Subject: Re: [gfortran,patch] Ignore byte order mark at start of file
References: <02088E24-2A38-4C0F-8639-B9AD5F8F3A9B@gmail.com> <462FC4EB.90402@codesourcery.com>

One question -- how is it that we end up handling UTF-16 files correctly aside from the BOM, anyway?

I haven't looked very hard into this. Andrew Pinski suggested on IRC that it might be due to the front-end simply ignoring NUL bytes.
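
To make that hypothesis concrete, here is a minimal sketch, not taken from the gfortran sources: drop_nul_bytes is a hypothetical helper that only shows what "ignoring NUL bytes" would do to ASCII text stored as UTF-16. Whether the front-end actually behaves this way is the open question above.

```c
#include <assert.h>
#include <stddef.h>

/* Illustration only (hypothetical helper, not gfortran code): copy LEN
   bytes from SRC to DST while dropping NUL bytes.  DST must have room
   for LEN bytes.  Returns the number of bytes written.  */
size_t
drop_nul_bytes (const char *src, size_t len, char *dst)
{
  size_t i, n = 0;

  assert (src != NULL && dst != NULL);

  for (i = 0; i < len; i++)
    if (src[i] != '\0')
      dst[n++] = src[i];

  return n;
}
```

For ASCII text in UTF-16, every other byte is zero, so the bytes 'e' 00 'n' 00 'd' 00 would come out as "end". A non-ASCII character generally has no zero byte to drop, so this would not explain correct handling of such characters in string literals.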

I can't seem to figure out why it works, but it seems to. As such, unless there's some particular reason to expect that it will always work correctly (and that includes not doing bizarre things with non-ASCII characters in string literals, for instance), we should probably issue a warning if a UTF-16 BOM is found.

This point is unrelated, in my view, to the BOM issue. (If only because I easily found how to fix the BOM thing, while I have no idea how the encoding is handled, if it is handled at all.)

And I'm not convinced that we shouldn't issue a warning for a UTF-8 BOM, either...

I thought about that, but then we'd need to add a flag to get rid of it (because having a warning for every single file in a compilation, about something that is purely mechanical and has no potential for trouble, would be a nuisance). And since I don't see a potential for abuse, it seemed simpler as is.

This part assumes that line[1] and line[2] exist. However, line[] is allocated in load_line as having length maxlen, which is set to gfc_option.free_line_length if we have free-form source with limited line lengths, and there is no guarantee that free_line_length is 3 or higher.

See Tobias' answer. I will add the check in the code, anyway.
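
For reference, a length-checked version of the BOM test could look roughly like the sketch below. This is not the code from the patch: bom_length and its len parameter are hypothetical names, and the sketch assumes the caller knows how many bytes of the buffer are valid.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not the function from the patch: return the
   number of leading bytes of LINE that form a byte order mark, or 0 if
   there is none.  LEN is the number of valid bytes in LINE, so no byte
   past the end of the buffer is ever read.  */
size_t
bom_length (const char *line, size_t len)
{
  const unsigned char *p = (const unsigned char *) line;

  assert (line != NULL || len == 0);

  /* UTF-8 BOM: EF BB BF.  */
  if (len >= 3 && p[0] == 0xEF && p[1] == 0xBB && p[2] == 0xBF)
    return 3;

  /* UTF-16 BOM: FF FE (little endian) or FE FF (big endian).  */
  if (len >= 2 && ((p[0] == 0xFF && p[1] == 0xFE)
		   || (p[0] == 0xFE && p[1] == 0xFF)))
    return 2;

  return 0;
}
```

The caller would skip that many bytes before scanning the first line. Comparing through unsigned char avoids depending on whether plain char is signed on the host.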

+ int n = (line[1] == '\xBB' ? 3 : 2);
I don't think the parentheses around this expression are needed.

I find it easier on the eye, but the general coding style appears to agree with you. I'll change it.

Does the testsuite harness know what to do with a "dg-do compile" that's in UTF-16 format?

Yes: all these testcases are run and checked. I don't know how it does it :)


Many thanks to Tobias and you for the review!
FX

Follow-Ups:
- Re: [gfortran,patch] Ignore byte order mark at start of file
  - From: Brooks Moses
- Re: [gfortran,patch] Ignore byte order mark at start of file
  - From: Jerry DeLisle

References:
- [gfortran,patch] Ignore byte order mark at start of file
  - From: FX Coudert
- Re: [gfortran,patch] Ignore byte order mark at start of file
  - From: Brooks Moses
