This is the mail archive of the fortran@gcc.gnu.org mailing list for
the GNU Fortran project.
Re: [gfortran,patch] Ignore byte order mark at start of file
One question -- how is it that we end up handling UTF-16 files
correctly aside from the BOM, anyway?
I haven't looked into this very closely. Andrew Pinski suggested on IRC
that it might be due to the front-end simply ignoring NUL bytes.
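To illustrate that suggestion (it is only a guess, not a description of
what the front-end actually does): in UTF-16, every ASCII character is
encoded as its ASCII byte paired with a zero byte, so a reader that
drops zero bytes ends up with the plain ASCII text. A minimal sketch,
where strip_nul_bytes is a hypothetical helper and not a gfortran
function:

```c
#include <stddef.h>

/* Hypothetical helper, for illustration only: copy the LEN bytes at
   SRC to DST, dropping every zero byte.  Returns the number of bytes
   written.  DST must have room for LEN bytes and is not terminated.  */
size_t
strip_nul_bytes (const char *src, size_t len, char *dst)
{
  size_t i, n = 0;

  for (i = 0; i < len; i++)
    if (src[i] != '\0')
      dst[n++] = src[i];

  return n;
}
```

Fed the six bytes 'e' 0 'n' 0 'd' 0 (UTF-16LE for "end"), this writes
the three bytes "end". Anything outside ASCII would not survive the
same treatment, which is the concern about string literals.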
I can't seem to figure out why it works, but it seems to. As such,
unless there's some particular reason to expect that it will always
work correctly (and that includes not doing bizarre things with non-
ASCII characters in string literals, for instance), then we should
probably issue a warning if a UTF-16 BOM is found.
In my view, this point is unrelated to the BOM issue, if only because
I easily found how to fix the BOM handling, while I have no idea how
the encoding is handled, if it is handled at all.
And I'm not convinced that we shouldn't issue a warning for a UTF-8
BOM, either...
I thought about that, but then we'd need to add a flag to get rid of
it (because a warning for every single file in a compilation, about
something that is purely mechanical and has no potential for trouble,
would be a nuisance). And since I don't see a potential for abuse, it
seemed simpler as is.
This part assumes that line[1] and line[2] exist. However, line[]
is allocated in load_line as having length maxlen, which is set to
gfc_option.free_line_length if we have free-form source with
limited line lengths, and there is no guarantee that
free_line_length is 3 or higher.
See Tobias' answer. I will add the check in the code, anyway.
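For reference, a length-checked BOM test could look roughly like the
sketch below. This is not the code from the patch: bom_length and its
len parameter are hypothetical names, and it assumes the caller knows
how many bytes of line[] are valid. It recognises the UTF-8 mark
(EF BB BF) and the two UTF-16 marks (FF FE and FE FF).

```c
#include <stddef.h>

/* Hypothetical helper, for illustration only: return the size in
   bytes of a byte order mark at the start of LINE (3 for UTF-8, 2 for
   UTF-16), or 0 if there is none.  LEN is the number of valid bytes
   in LINE; no byte at index LEN or beyond is ever read.  */
size_t
bom_length (const char *line, size_t len)
{
  /* Compare as unsigned char so the result does not depend on whether
     plain char is signed on the host.  */
  const unsigned char *p = (const unsigned char *) line;

  if (len >= 3 && p[0] == 0xEF && p[1] == 0xBB && p[2] == 0xBF)
    return 3;

  if (len >= 2 && ((p[0] == 0xFF && p[1] == 0xFE)
                   || (p[0] == 0xFE && p[1] == 0xFF)))
    return 2;

  return 0;
}
```

Checking the length before each index is what keeps this safe when the
line buffer is shorter than three bytes.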
+ int n = (line[1] == '\xBB' ? 3 : 2);
I don't think the parentheses around this expression are needed.
I find it easier on the eye, but the general coding style appears to
agree with you. I'll change it.
Does the testsuite harness know what to do with a "dg-do compile"
that's in UTF-16 format?
Yes: all these testcases are run and checked. I don't know how it
does it :)
Many thanks to Tobias and you for the review!
FX