We should probably take care of files that begin with a byte order mark (BOM; see http://en.wikipedia.org/wiki/Byte_Order_Mark), because some editors (like Windows Notepad) use them. We currently say:

$ xxd bom.f
0000000: fffe 2020 2020 2020 7072 696e 7420 2a2c  ..      print *,
0000010: 2022 4865 6c6c 6f20 776f 726c 6422 0a20   "Hello world".
0000020: 2020 2020 2065 6e64                           end
$ gfortran bom.f
bom.f:1.1:

\xFF\xFE      print *, "Hello world"
1
Error: Non-numeric character in statement label at (1)
bom.f:1.2:

\xFF\xFE      print *, "Hello world"
 1
Error: Invalid character in name at (1)
Note that you might also need to add support to the preprocessor, which means adding it to the C family of languages (a good thing). You might want to support not just the UTF-8 BOM but the UTF-16 and UTF-32 ones too.
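To make the suggestion concrete, here is a minimal sketch of recognising the UTF-8, UTF-16 and UTF-32 marks at the start of a buffer. It is not the gfortran or cpp code; the function name bom_length and the table layout are made up for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of GCC): return the length in bytes of a
   byte order mark at the start of buf, or 0 if there is none.  */
size_t bom_length(const unsigned char *buf, size_t len)
{
    /* The UTF-32 marks come first: the UTF-32 LE mark (FF FE 00 00)
       begins with the UTF-16 LE mark (FF FE), so the longer pattern
       has to be tried before the shorter one.  */
    static const struct {
        unsigned char bytes[4];
        size_t n;
    } marks[] = {
        { { 0x00, 0x00, 0xFE, 0xFF }, 4 },  /* UTF-32, big endian */
        { { 0xFF, 0xFE, 0x00, 0x00 }, 4 },  /* UTF-32, little endian */
        { { 0xEF, 0xBB, 0xBF, 0x00 }, 3 },  /* UTF-8 */
        { { 0xFE, 0xFF, 0x00, 0x00 }, 2 },  /* UTF-16, big endian */
        { { 0xFF, 0xFE, 0x00, 0x00 }, 2 },  /* UTF-16, little endian */
    };
    size_t i;

    for (i = 0; i < sizeof marks / sizeof marks[0]; i++)
        if (len >= marks[i].n && memcmp(buf, marks[i].bytes, marks[i].n) == 0)
            return marks[i].n;

    return 0;
}
```

A caller would skip bom_length(buf, len) bytes before scanning the first line. One caveat of this ordering: UTF-16 LE text whose first character after the mark is U+0000 would be taken for UTF-32 LE, which seems acceptable for source files.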
Subject: Bug 31645

Author: fxcoudert
Date: Sun Apr 29 11:45:57 2007
New Revision: 124274

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=124274
Log:
    PR fortran/31645

    * scanner.c (load_file): Discard the byte order mark if one is
    found on the first non-preprocessor line of a file.

    * testsuite/gfortran.dg/bom_error.f90: New test.
    * testsuite/gfortran.dg/bom_include.f90: New test.
    * testsuite/gfortran.dg/bom_UTF16-LE.f90: New test.
    * testsuite/gfortran.dg/bom_UTF16-BE.f90: New test.
    * testsuite/gfortran.dg/bom_UTF-8.f90: New test.
    * testsuite/gfortran.dg/bom_UTF-32.f90: New test.
    * testsuite/gfortran.dg/bom_UTF-8.F90: New test.
    * testsuite/gfortran.dg/bom_include.inc: New file.

Added:
    trunk/gcc/testsuite/gfortran.dg/bom_UTF-32.f90
    trunk/gcc/testsuite/gfortran.dg/bom_UTF-8.F90
    trunk/gcc/testsuite/gfortran.dg/bom_UTF-8.f90
    trunk/gcc/testsuite/gfortran.dg/bom_UTF16-BE.f90
    trunk/gcc/testsuite/gfortran.dg/bom_UTF16-LE.f90
    trunk/gcc/testsuite/gfortran.dg/bom_error.f90
    trunk/gcc/testsuite/gfortran.dg/bom_include.f90
    trunk/gcc/testsuite/gfortran.dg/bom_include.inc
Modified:
    trunk/gcc/fortran/ChangeLog
    trunk/gcc/fortran/scanner.c
    trunk/gcc/testsuite/ChangeLog
Fixed