This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: [libcpp] [PATCH] Handle DOS EOF character for DJGPP
DJ Delorie <dj@redhat.com> writes:
>> ^Z can legitimately appear in the middle of a file even under DJGPP.
>
> Microsoft disagrees with you.
>
> ^Z can NOT legitimately appear in the middle of a text file under
> MS-DOS derivatives, because by definition, it marks the end of the
> text. Bytes after the ^Z are not technically part of the file, but
> part of the disk buffers used to hold that file. It's a stupid
> design, left over from the CP/M days, but that's the way it is
> nonetheless.
>
> I, personally, would not bother to support editors that actually
> create files with ^Z in them, because proper support for ^Z is
> sloppy at best in most MS-DOS and Windows programs, and you cannot
> count on apps to correctly stop at the first ^Z character, so such
> editors are going to have lots of problems anyway. But, that's no
> excuse to be wrong about *why* we're allowing ^Z.

This is *another* reason to allow ^Z in the middle of a file. My
reason is not invalid just because, in a world where there was only
MS-DOS and all programs implemented its text-file formatting
convention correctly, the scenario I outlined would never come up.

Stepping back a little, the overarching design goals in this part of
cpplib are, first, consistent behavior across platforms, and second,
tolerance for cross-platform compatibility headaches. Single-platform
compatibility logic (for instance, the "unreliable stat size" logic
for VMS) is tolerated only to the extent that it doesn't get in the
way of the platform-generic logic.

It seems to me that the MS-DOS end-of-text convention qualifies as a
cross-platform compatibility headache that should be tolerated. To do
so properly, we must consider not only files created on MS-DOS using a
correct implementation of that convention, but also files created
using an incorrect implementation of that convention; files created on
a different operating system entirely and transferred to an MS-DOS
system using a program ignorant of the convention (e.g. binary mode
FTP, NFS mounts without translation); conversely, files originally
created on MS-DOS and bearing a correctly-positioned ^Z, then
transferred to a different operating system using a program ignorant
of the convention; and so on.

Given all this, I see only two things to do:

1) Document that GCC deliberately does not honor this aspect of the
   DOS text file conventions, and leave the code alone. (If, as
   reported, this is causing bootstrap failure due to the generator
   programs producing files with trailing ^Z, fix that in the
   generators.)

2) Ignore a ^Z if it appears immediately before what I will continue
   to call the true end of file. Preserve ^Z anywhere else. Do this
   on all hosts, not just DJGPP. (Note that this needs to happen
   after character-set conversion, because of the possibility that
   that ^Z is not a standalone character but a second-or-subsequent
   byte of a multibyte character.)

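To make option 2 concrete, here is a rough sketch of the check, not a
patch against cpplib. The helper name strip_trailing_ctrl_z and the
CTRL_Z macro are made up for illustration, and the sketch assumes the
buffer has already been through character-set conversion, so that a
0x1A byte at the end really is a standalone ^Z. It is host-independent
and drops at most one ^Z, and only when that ^Z is the very last byte.

```c
#include <assert.h>
#include <stddef.h>

/* DOS end-of-text marker (Control-Z, ASCII SUB). */
#define CTRL_Z 0x1A

/* Hypothetical helper, not part of cpplib: return the length of BUF
   with a single ^Z dropped if it is the very last byte.  BUF is
   assumed to be already converted to the source character set, so a
   trailing 0x1A cannot be part of a multibyte character.  A ^Z
   anywhere else in the buffer is preserved. */
static size_t
strip_trailing_ctrl_z (const unsigned char *buf, size_t len)
{
  assert (buf != NULL || len == 0);

  if (len > 0 && buf[len - 1] == CTRL_Z)
    return len - 1;

  return len;
}
```

A caller would run this once on the converted buffer and treat the
returned length as the true end of file. A ^Z in the middle of the
buffer, or one that is followed by any other byte, is left in place.
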
The hypothetical option 3 corresponding to Andreas' patch is not
acceptable, because it introduces divergent behavior between hosts
(thus rendering it more difficult to transport files between DOS and
other operating systems) and breaks situations where ^Z does not
signify end of file.

Currently I am leaning toward option 2 on the theory that this will
solve the majority of cases where this comes up in practice, and
should not break anything that works now; in particular, there is no
well-formed C program incorporating a ^Z as the very last standalone
character of meaningful content.

zw