In the attached file, there is a comment terminated with a line-termination character (\) followed by spaces. This should NOT be considered a line terminator, yet gcc considers it as such. From 2.1/2 in the C++03 standard: "Each instance of a new-line character and an immediately preceding backslash character is deleted, splicing physical source lines to form logical source lines." That is, only backslashes immediately followed by a newline are considered line terminators. The existing behavior of gcc violates the standard and conflicts with the behavior of other popular C++ compilers (EDG, MSVC).
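To make the quoted reading of 2.1/2 concrete, here is a minimal sketch of phase-2 splicing exactly as the standard words it. It is not GCC's, EDG's or MSVC's actual code. A backslash is deleted only when the very next character is a new-line. Both function names (`splice_lines_strict`, `check_strict_splice`) are hypothetical. Under this rule a backslash-space-newline sequence passes through unchanged, so a comment ending that way would not continue onto the next line.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: strict translation phase 2 as worded in C++03 2.1.
// Deletes each backslash that is IMMEDIATELY followed by a new-line,
// together with that new-line. Every other character is copied unchanged.
std::string splice_lines_strict(const std::string& src) {
    std::string out;
    out.reserve(src.size());
    for (std::string::size_type i = 0; i < src.size(); ++i) {
        if (src[i] == '\\' && i + 1 < src.size() && src[i + 1] == '\n') {
            ++i;  // together with the loop increment, skips backslash and new-line
            continue;
        }
        out += src[i];
    }
    return out;
}

// Illustrative usage checks.
void check_strict_splice() {
    // Backslash immediately before new-line: both characters are deleted.
    assert(splice_lines_strict("a\\\nb") == "ab");
    // Backslash, space, new-line: nothing is deleted, no splice happens.
    assert(splice_lines_strict("a\\ \nb") == "a\\ \nb");
}
```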
Created attachment 16843
Compile with: g++ -Wall test.cpp
Subject: Re: New: Incorrect handling of line termination character with trailing spaces

On Sat, 6 Dec 2008, eric dot niebler at gmail dot com wrote:

> In the attached file, there is a comment terminated with a line-termination
> character (\) followed by spaces. This should NOT be considered a line
> terminator, yet gcc considers it as such. From 2.1/2 in the C++03 standard:
>
> "Each instance of a new-line character and an immediately preceding backslash
> character is deleted, splicing physical source lines to form logical source
> lines."

This (removal of such spaces) is part of how GCC defines the implementation-defined mapping in translation phase 1. There are no input files that GCC interprets as representing a program that enters phase 2 with backslash-space at the end of a line.

> That is, only backslashes immediately followed by a newline are considered line
> terminators. The existing behavior of gcc violates the standard and conflicts
> with the behavior of other popular C++ compilers (EDG, MSVC).

No, it conforms to the standard but does not allow certain programs to be represented. (I think this is a bad idea, but that's another matter.)
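The phase-1 mapping described in that reply can be sketched as follows. The sketch assumes the mapping simply drops horizontal whitespace between a backslash and the end of the line. It skips only spaces and tabs, and the exact set of characters GCC skips is an assumption. The output still contains backslash-newline, which phase 2 then splices, so the following line joins the comment. `map_backslash_space_newline` and `check_map` are hypothetical names, not libcpp functions.

```cpp
#include <cassert>
#include <string>

// Hypothetical sketch of a phase-1 mapping like the one described for GCC:
// a backslash followed only by spaces/tabs up to a new-line is rewritten as
// backslash + new-line, which phase 2 will then splice.
// ASSUMPTION: only ' ' and '\t' are treated as skippable whitespace here.
std::string map_backslash_space_newline(const std::string& src) {
    std::string out;
    out.reserve(src.size());
    std::string::size_type i = 0;
    while (i < src.size()) {
        if (src[i] == '\\') {
            // Look past any run of spaces/tabs after the backslash.
            std::string::size_type j = i + 1;
            while (j < src.size() && (src[j] == ' ' || src[j] == '\t')) ++j;
            if (j < src.size() && src[j] == '\n') {
                out += "\\\n";  // drop the whitespace, keep backslash + new-line
                i = j + 1;
                continue;
            }
        }
        out += src[i];  // any other character is copied as-is
        ++i;
    }
    return out;
}

// Illustrative usage checks.
void check_map() {
    // Trailing spaces after the backslash are removed.
    assert(map_backslash_space_newline("a\\   \nb") == "a\\\nb");
    // A backslash not at the end of a line is left alone.
    assert(map_backslash_space_newline("a\\ b") == "a\\ b");
}
```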
If you are referring to 2.1/1 ...

"Physical source file characters are mapped, in an implementation-defined manner, to the basic source character set (introducing new-line characters for end-of-line indicators) if necessary. Trigraph sequences (2.3) are replaced by corresponding single-character internal representations. Any source file character not in the basic source character set (2.2) is replaced by the universal-character-name that designates that character. (An implementation may use any internal encoding, so long as an actual extended character encountered in the source file, and the same extended character expressed in the source file as a universal-character-name (i.e. using the \uXXXX notation), are handled equivalently.)"

I read this as permitting a mapping of characters, but not a deletion of characters, which is what gcc is doing. The only deletion of characters I see permitted is the deletion of a newline and an IMMEDIATELY preceding backslash.
(In reply to comment #3)
> I read this as permitting a mapping of characters, but not a deletion of
> characters, which is what gcc is doing. The only deletion of characters I see
> permitted is the deletion of a newline and an IMMEDIATELY preceding backslash.

I don't think it is deleting in the sense you are thinking of; it maps a backslash followed by spaces followed by a newline into just a backslash followed by a newline, which phase 2 then splices as usual.
*** This bug has been marked as a duplicate of 8270 ***