31645 – Error on reading Byte Order Mark

Summary: Error on reading Byte Order Mark

Status:	RESOLVED FIXED

Alias:	None

Product:	gcc
Classification:	Unclassified
Component:	fortran
Version:	4.3.0

Importance:	P3 enhancement
Target Milestone:	4.3.0
Assignee:	Francois-Xavier Coudert

URL:	http://gcc.gnu.org/ml/gcc-patches/200...
Keywords:	patch

Depends on:
Blocks:

Reported:	2007-04-21 09:22 UTC by Francois-Xavier Coudert
Modified:	2007-04-29 12:31 UTC
CC List:	1 user

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed:	2007-04-21 09:24:13

Description Francois-Xavier Coudert 2007-04-21 09:22:46 UTC

We should probably handle files that begin with a byte order mark (BOM; see http://en.wikipedia.org/wiki/Byte_Order_Mark), because some editors (such as Windows Notepad) write one. Currently, gfortran fails on such a file:

$ xxd bom.f
0000000: fffe 2020 2020 2020 7072 696e 7420 2a2c  ..      print *,
0000010: 2022 4865 6c6c 6f20 776f 726c 6422 0a20   "Hello world". 
0000020: 2020 2020 2065 6e64                           end
$ gfortran bom.f 
bom.f:1.1:

\xFF\xFE      print *, "Hello world"                                          
1
Error: Non-numeric character in statement label at (1)
bom.f:1.2:

\xFF\xFE      print *, "Hello world"                                          
 1
Error: Invalid character in name at (1)

Comment 1 Andrew Pinski 2007-04-21 16:23:30 UTC

Note that you might also need to add support to the preprocessor (which means adding it to the C family of languages, which is a good thing). You might want to support not just the UTF-8 BOM but also the UTF-16 and UTF-32 ones.
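
For reference, the BOM variants mentioned above can be told apart by their leading bytes. The following is only a rough sketch of such a check, not code from GCC: the names `bom_kind` and `detect_bom` are hypothetical, and it assumes the start of the file is already available in a buffer.

```c
#include <stddef.h>

/* Hypothetical names for illustration; none of this is taken from GCC. */
typedef enum {
    BOM_NONE,
    BOM_UTF8,      /* EF BB BF    */
    BOM_UTF16_LE,  /* FF FE       */
    BOM_UTF16_BE,  /* FE FF       */
    BOM_UTF32_LE,  /* FF FE 00 00 */
    BOM_UTF32_BE   /* 00 00 FE FF */
} bom_kind;

/* Classify the BOM (if any) at the start of buf and store its length
   in *bom_len.  The UTF-32 patterns are tested first because the
   UTF-32 LE mark begins with the same two bytes as the UTF-16 LE mark;
   as a consequence, a UTF-16 LE file whose first character is NUL is
   reported as UTF-32 LE by this sketch.  */
static bom_kind detect_bom(const unsigned char *buf, size_t len,
                           size_t *bom_len)
{
    if (len >= 4 && buf[0] == 0xFF && buf[1] == 0xFE
        && buf[2] == 0x00 && buf[3] == 0x00) {
        *bom_len = 4;
        return BOM_UTF32_LE;
    }
    if (len >= 4 && buf[0] == 0x00 && buf[1] == 0x00
        && buf[2] == 0xFE && buf[3] == 0xFF) {
        *bom_len = 4;
        return BOM_UTF32_BE;
    }
    if (len >= 3 && buf[0] == 0xEF && buf[1] == 0xBB && buf[2] == 0xBF) {
        *bom_len = 3;
        return BOM_UTF8;
    }
    if (len >= 2 && buf[0] == 0xFF && buf[1] == 0xFE) {
        *bom_len = 2;
        return BOM_UTF16_LE;
    }
    if (len >= 2 && buf[0] == 0xFE && buf[1] == 0xFF) {
        *bom_len = 2;
        return BOM_UTF16_BE;
    }
    *bom_len = 0;
    return BOM_NONE;
}
```

Applied to the first bytes of the bom.f dump in the description (FF FE followed by spaces), this sketch would report `BOM_UTF16_LE` with a length of 2.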

Comment 2 Francois-Xavier Coudert 2007-04-29 11:46:15 UTC

Subject: Bug 31645

Author: fxcoudert
Date: Sun Apr 29 11:45:57 2007
New Revision: 124274

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=124274
Log:
	PR fortran/31645

	* scanner.c (load_file): Discard the byte order mark if one is
	found on the first non-preprocessor line of a file.

	* testsuite/gfortran.dg/bom_error.f90: New test.
	* testsuite/gfortran.dg/bom_include.f90: New test.
	* testsuite/gfortran.dg/bom_UTF16-LE.f90: New test.
	* testsuite/gfortran.dg/bom_UTF16-BE.f90: New test.
	* testsuite/gfortran.dg/bom_UTF-8.f90: New test.
	* testsuite/gfortran.dg/bom_UTF-32.f90: New test.
	* testsuite/gfortran.dg/bom_UTF-8.F90: New test.
	* testsuite/gfortran.dg/bom_include.inc: New file.

Added:
    trunk/gcc/testsuite/gfortran.dg/bom_UTF-32.f90
    trunk/gcc/testsuite/gfortran.dg/bom_UTF-8.F90
    trunk/gcc/testsuite/gfortran.dg/bom_UTF-8.f90
    trunk/gcc/testsuite/gfortran.dg/bom_UTF16-BE.f90
    trunk/gcc/testsuite/gfortran.dg/bom_UTF16-LE.f90
    trunk/gcc/testsuite/gfortran.dg/bom_error.f90
    trunk/gcc/testsuite/gfortran.dg/bom_include.f90
    trunk/gcc/testsuite/gfortran.dg/bom_include.inc
Modified:
    trunk/gcc/fortran/ChangeLog
    trunk/gcc/fortran/scanner.c
    trunk/gcc/testsuite/ChangeLog
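
The behaviour described in the log entry above (discarding the mark on the first non-preprocessor line) could look roughly like the sketch below. This is an illustration only, not the code committed to scanner.c: the function name `skip_leading_bom` and the flag `awaiting_first_line` are hypothetical, only the UTF-8 mark is handled, and it assumes that preprocessor lines are exactly those beginning with `#`.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not GCC code.  Call it once per line, in file
   order, with *awaiting_first_line initialised to true.  It returns
   the position in the line where scanning should start.  */
static const char *skip_leading_bom(const char *line, size_t len,
                                    bool *awaiting_first_line)
{
    /* Assumption: a line starting with '#' is a preprocessor line and
       does not count as the first line of source text.  */
    if (len > 0 && line[0] == '#')
        return line;

    if (*awaiting_first_line) {
        *awaiting_first_line = false;
        /* Only the UTF-8 mark (EF BB BF) is skipped in this sketch.  */
        if (len >= 3 && memcmp(line, "\xEF\xBB\xBF", 3) == 0)
            return line + 3;
    }
    return line;
}
```

Keeping the flag set across `#` lines means a mark that follows preprocessor output is still dropped, while the same three bytes appearing on any later line are left alone.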

Comment 3 Francois-Xavier Coudert 2007-04-29 12:31:51 UTC

Fixed