This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: [gfortran,patch] Ignore byte order mark at start of file
- From: Tobias Schlüter <tobias dot schlueter at physik dot uni-muenchen dot de>
- To: Brooks Moses <brooks dot moses at codesourcery dot com>
- Cc: fortran at gcc dot gnu dot org, gcc-patches at gcc dot gnu dot org
- Date: Fri, 27 Apr 2007 09:14:08 +0200
- Subject: Re: [gfortran,patch] Ignore byte order mark at start of file
- References: <02088E24-2A38-4C0F-8639-B9AD5F8F3A9B@gmail.com> <462FC4EB.90402@codesourcery.com>
Brooks Moses wrote:
Index: gcc/fortran/scanner.c
===================================================================
@@ -1467,6 +1469,24 @@
       if (feof (input) && len == 0)
         break;
+      /* If this is the first line of the file, it can contain a byte
+         order mark (BOM), which we will ignore:
+           FF FE is UTF-16 little endian,
+           FE FF is UTF-16 big endian,
+           EF BB BF is UTF-8. */
+      if (first_line && ((line[0] == '\xFF' && line[1] == '\xFE')
+                         || (line[0] == '\xFE' && line[1] == '\xFF')
+                         || (line[0] == '\xEF' && line[1] == '\xBB'
+                             && line[2] == '\xBF')))
This part assumes that line[1] and line[2] exist. However, line[] is
allocated in load_line as having length maxlen, which is set to
gfc_option.free_line_length if we have free-form source with limited
line lengths, and there is no guarantee that free_line_length is 3 or
higher.
Thus, these should be conditioned on line_len > 2 or 3 as appropriate.
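A minimal sketch of the length-guarded test described above. The helper name bom_length and its len parameter are hypothetical and do not appear in scanner.c; len stands for the number of valid bytes in the buffer. Each index is read only after the length check that makes it safe.

```c
#include <stddef.h>

/* Hypothetical helper, not part of scanner.c: return the number of
   leading bytes of BUF that form a byte order mark, or 0 if there is
   none.  LEN is the number of valid bytes in BUF.  */
static size_t
bom_length (const char *buf, size_t len)
{
  /* Compare as unsigned char so the result does not depend on
     whether plain char is signed.  */
  const unsigned char *p = (const unsigned char *) buf;

  /* UTF-8 BOM: needs three valid bytes.  */
  if (len >= 3 && p[0] == 0xEF && p[1] == 0xBB && p[2] == 0xBF)
    return 3;

  /* UTF-16 BOM, little or big endian: needs two valid bytes.  */
  if (len >= 2
      && ((p[0] == 0xFF && p[1] == 0xFE)
          || (p[0] == 0xFE && p[1] == 0xFF)))
    return 2;

  return 0;
}
```

Because && short-circuits, p[1] and p[2] are never evaluated when the buffer is too short to hold them.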
Actually, loading the BOM into the line-buffer and then throwing away
the first two characters is wrong for another reason: it would reduce
the available line length in an unexpected way. If a 2-byte BOM is
present and no other options are given, the first line in a
free-form file would be truncated after 130 characters instead of the
expected 132.
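One way to avoid this, sketched here under assumptions rather than taken from the patch, is to consume the BOM from the stream before the first line is loaded, so that it never occupies space in the line buffer. The helper name skip_bom is hypothetical. The sketch assumes the stream is seekable and positioned at the start of the file; input from a pipe would need a different approach.

```c
#include <stdio.h>

/* Hypothetical helper, not part of scanner.c: if INPUT starts with a
   byte order mark, leave the stream positioned just past it;
   otherwise leave it at the start.  Returns the number of bytes
   skipped, or -1 if repositioning fails.  Assumes INPUT is seekable
   and currently at offset 0.  */
static int
skip_bom (FILE *input)
{
  unsigned char b[3];
  size_t n = fread (b, 1, sizeof b, input);
  int skip = 0;

  if (n >= 3 && b[0] == 0xEF && b[1] == 0xBB && b[2] == 0xBF)
    skip = 3;                   /* UTF-8 */
  else if (n >= 2
           && ((b[0] == 0xFF && b[1] == 0xFE)
               || (b[0] == 0xFE && b[1] == 0xFF)))
    skip = 2;                   /* UTF-16, either byte order */

  /* Reposition just past the BOM, or back to the start if there is
     none.  fseek also clears the end-of-file indicator that fread
     may have set on a very short file.  */
  if (fseek (input, (long) skip, SEEK_SET) != 0)
    return -1;

  return skip;
}
```

With the BOM consumed up front, the first line is subject to the same length limit as every other line.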
- Tobi