Bug 31645

Summary: Error on reading Byte Order Mark
Product: gcc Reporter: Francois-Xavier Coudert <fxcoudert>
Component: fortran Assignee: Francois-Xavier Coudert <fxcoudert>
Status: RESOLVED FIXED    
Severity: enhancement CC: gcc-bugs
Priority: P3 Keywords: patch
Version: 4.3.0   
Target Milestone: 4.3.0   
URL: http://gcc.gnu.org/ml/gcc-patches/2007-04/msg01731.html
Host: Target:
Build: Known to work:
Known to fail: Last reconfirmed: 2007-04-21 09:24:13

Description Francois-Xavier Coudert 2007-04-21 09:22:46 UTC
We should probably handle files that begin with a byte order mark (BOM; see http://en.wikipedia.org/wiki/Byte_Order_Mark), because some editors (such as Windows Notepad) write one. We currently say:

$ xxd bom.f
0000000: fffe 2020 2020 2020 7072 696e 7420 2a2c  ..      print *,
0000010: 2022 4865 6c6c 6f20 776f 726c 6422 0a20   "Hello world". 
0000020: 2020 2020 2065 6e64                           end
$ gfortran bom.f 
bom.f:1.1:

\xFF\xFE      print *, "Hello world"                                          
1
Error: Non-numeric character in statement label at (1)
bom.f:1.2:

\xFF\xFE      print *, "Hello world"                                          
 1
Error: Invalid character in name at (1)
Comment 1 Andrew Pinski 2007-04-21 16:23:30 UTC
Note that you might also need to add support to the preprocessor (which means adding it to the C family of languages, which is a good thing). You might want to support not just the UTF-8 BOM but also the UTF-16 and UTF-32 ones.
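
For illustration only, here is a minimal sketch of how the BOM variants mentioned above could be recognized at the start of a buffer. It is not the code from scanner.c; the helper name bom_length and its interface are hypothetical, and the byte sequences are the standard Unicode BOM encodings.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not taken from GCC's scanner.c: return the number
   of leading bytes in BUF that form a byte order mark, or 0 if there is
   none.  LEN is the number of valid bytes in BUF.  */
static size_t
bom_length (const unsigned char *buf, size_t len)
{
  static const struct { size_t n; unsigned char bytes[4]; } boms[] = {
    /* The 4-byte marks come first, because the UTF-32 LE mark begins
       with the same two bytes as the UTF-16 LE mark.  */
    { 4, { 0x00, 0x00, 0xFE, 0xFF } },  /* UTF-32 BE */
    { 4, { 0xFF, 0xFE, 0x00, 0x00 } },  /* UTF-32 LE */
    { 3, { 0xEF, 0xBB, 0xBF } },        /* UTF-8 */
    { 2, { 0xFE, 0xFF } },              /* UTF-16 BE */
    { 2, { 0xFF, 0xFE } },              /* UTF-16 LE */
  };
  size_t i;

  for (i = 0; i < sizeof boms / sizeof boms[0]; i++)
    if (len >= boms[i].n && memcmp (buf, boms[i].bytes, boms[i].n) == 0)
      return boms[i].n;

  return 0;
}
```

A caller would skip bom_length (line, len) bytes on the first line of a file before scanning it. For the bom.f dump in the description, which starts with ff fe followed by spaces, the helper returns 2. One caveat of this sketch: a UTF-16 LE file whose first character after the mark is NUL would be taken for UTF-32 LE.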
Comment 2 Francois-Xavier Coudert 2007-04-29 11:46:15 UTC
Subject: Bug 31645

Author: fxcoudert
Date: Sun Apr 29 11:45:57 2007
New Revision: 124274

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=124274
Log:
	PR fortran/31645

	* scanner.c (load_file): Discard the byte order mark if one is
	found on the first non-preprocessor line of a file.

	* testsuite/gfortran.dg/bom_error.f90: New test.
	* testsuite/gfortran.dg/bom_include.f90: New test.
	* testsuite/gfortran.dg/bom_UTF16-LE.f90: New test.
	* testsuite/gfortran.dg/bom_UTF16-BE.f90: New test.
	* testsuite/gfortran.dg/bom_UTF-8.f90: New test.
	* testsuite/gfortran.dg/bom_UTF-32.f90: New test.
	* testsuite/gfortran.dg/bom_UTF-8.F90: New test.
	* testsuite/gfortran.dg/bom_include.inc: New file.

Added:
    trunk/gcc/testsuite/gfortran.dg/bom_UTF-32.f90
    trunk/gcc/testsuite/gfortran.dg/bom_UTF-8.F90
    trunk/gcc/testsuite/gfortran.dg/bom_UTF-8.f90
    trunk/gcc/testsuite/gfortran.dg/bom_UTF16-BE.f90
    trunk/gcc/testsuite/gfortran.dg/bom_UTF16-LE.f90
    trunk/gcc/testsuite/gfortran.dg/bom_error.f90
    trunk/gcc/testsuite/gfortran.dg/bom_include.f90
    trunk/gcc/testsuite/gfortran.dg/bom_include.inc
Modified:
    trunk/gcc/fortran/ChangeLog
    trunk/gcc/fortran/scanner.c
    trunk/gcc/testsuite/ChangeLog

Comment 3 Francois-Xavier Coudert 2007-04-29 12:31:51 UTC
Fixed