This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



Re: EOF character in parser


DJ Delorie <dj@redhat.com> writes:

>> There is no special handling of 0x1a in cpplib.
>
> There should be, but it's probably not a big issue.

It's hard, because there are other reasons why 0x1a might appear in a
file.  For instance, consider a file containing characters outside the
ASCII range, which has been run through a conversion program prior to
compilation.  Characters that aren't in the target charset are
correctly replaced with 0x1a bytes (ASCII SUBstitute).  This should
not abort translation.  The meaning of the program might not be
affected - for instance, if all the problem characters are in comments.
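
To make the comment case concrete, here is a rough sketch (not cpplib code) that counts the 0x1a bytes falling outside comments. The function name count_sub_outside_comments is made up for illustration, and the scanner deliberately ignores string and character literals, so take it as an approximation only.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: count 0x1a (ASCII SUB) bytes that fall outside
   C comments.  Simplification: string and character literals are not
   tracked, so a comment opener inside a literal is misread.  */
static size_t
count_sub_outside_comments (const char *buf, size_t len)
{
  enum { CODE, BLOCK_COMMENT, LINE_COMMENT } state = CODE;
  size_t count = 0;
  size_t i = 0;

  assert (buf != NULL || len == 0);

  while (i < len)
    {
      char c = buf[i];
      char next = (i + 1 < len) ? buf[i + 1] : '\0';

      switch (state)
        {
        case CODE:
          if (c == '/' && next == '*')
            {
              state = BLOCK_COMMENT;
              i += 2;           /* skip both opener characters */
              continue;
            }
          if (c == '/' && next == '/')
            {
              state = LINE_COMMENT;
              i += 2;
              continue;
            }
          if (c == 0x1a)
            count++;            /* a SUB byte the compiler would see */
          break;

        case BLOCK_COMMENT:
          if (c == '*' && next == '/')
            {
              state = CODE;
              i += 2;           /* skip both closer characters */
              continue;
            }
          break;

        case LINE_COMMENT:
          if (c == '\n')
            state = CODE;
          break;
        }
      i++;
    }
  return count;
}
```

A result of zero means every SUB byte sits inside a comment, which is the case where the meaning of the program is unaffected.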

So the correct behavior is to treat 0x1a as EOF only if the underlying
OS does the same, and cpplib doesn't currently know whether it does.
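
As a sketch of what "only if the underlying OS does the same" could look like, assuming the host convention can be picked up at compile time: the macro HOST_CTRL_Z_IS_EOF and the function logical_length are invented for this example, and the predefined macros tested are my guess at a reasonable set, not anything cpplib defines.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: these predefined macros identify hosts whose text-mode
   I/O treats 0x1a (Ctrl-Z) as end of file.  */
#if defined (_WIN32) || defined (__MSDOS__)
# define HOST_CTRL_Z_IS_EOF 1
#else
# define HOST_CTRL_Z_IS_EOF 0
#endif

/* Hypothetical helper: return the number of bytes of BUF that belong
   to the file.  When CTRL_Z_IS_EOF is nonzero, the file ends just
   before the first 0x1a; otherwise 0x1a is an ordinary byte.  */
static size_t
logical_length (const unsigned char *buf, size_t len, int ctrl_z_is_eof)
{
  size_t i;

  assert (buf != NULL || len == 0);

  if (!ctrl_z_is_eof)
    return len;

  for (i = 0; i < len; i++)
    if (buf[i] == 0x1a)
      return i;

  return len;
}
```

A caller would pass HOST_CTRL_Z_IS_EOF as the last argument, so a buffer read in binary mode is cut at Ctrl-Z only on hosts that have that convention.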

Given that O_BINARY has been in cpplib's open() call since 3.0 and no
one has complained about 0x1a not being treated as EOF yet, I am not
inclined to do anything about it.  However, now that we are no longer
using mmap to read the file, the O_BINARY might be unnecessary, and
removing it would eliminate this potential issue.  If you would like
to audit cpplib for any assumptions that would be broken by removing
it, and submit a patch, I'll consider it for 3.4.  (Note that the
internal handling of \r\n line breaks needs to remain, so that people
can copy files from DOS to Unix and use them without conversion.)
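
For reference, the kind of \r\n handling meant here can be sketched as below. This is an illustration only, not the code cpplib actually uses; normalize_crlf is a made-up name, and leaving a lone \r untouched is an assumption of the sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: collapse each "\r\n" pair in BUF to "\n", in
   place, and return the new length.  A '\r' that is not followed by
   '\n' is left alone.  */
static size_t
normalize_crlf (char *buf, size_t len)
{
  size_t in;
  size_t out = 0;

  assert (buf != NULL || len == 0);

  for (in = 0; in < len; in++)
    {
      /* Drop the '\r' of a pair; the '\n' is copied on the next pass.  */
      if (buf[in] == '\r' && in + 1 < len && buf[in + 1] == '\n')
        continue;
      buf[out++] = buf[in];
    }
  return out;
}
```

Because the conversion is done on the buffer, it does not depend on whether the file was opened in text or binary mode, which is why it works for DOS files copied to Unix.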

zw

