Bug 38433 - Incorrect handling of line termination character with trailing spaces
Summary: Incorrect handling of line termination character with trailing spaces
Status: RESOLVED DUPLICATE of bug 8270
Alias: None
Product: gcc
Classification: Unclassified
Component: preprocessor
Version: 4.3.0
Importance: P3 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2008-12-06 23:57 UTC by Eric Niebler
Modified: 2008-12-07 00:58 UTC

See Also:
Host:
Target: i686-pc-cygwin
Build: Configured with: ../gcc-4.3.0/configure --enable-languages=c,c++
Known to work:
Known to fail:
Last reconfirmed:


Attachments
Compile with: g++ -Wall test.cpp (192 bytes, text/plain)
2008-12-06 23:59 UTC, Eric Niebler

Description Eric Niebler 2008-12-06 23:57:58 UTC
In the attached file, there is a comment terminated with a line-termination character (\) followed by spaces. This should NOT be considered a line terminator, yet gcc considers it as such. From 2.1/2 in the C++03 standard:

"Each instance of a new-line character and an immediately preceding backslash character is deleted, splicing physical source lines to form logical source lines."

That is, only backslashes immediately followed by a newline are considered line terminators. The existing behavior of gcc violates the standard and conflicts with the behavior of other popular C++ compilers (EDG, MSVC).
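To make the reading of 2.1/2 above concrete, here is a minimal sketch of phase 2 line splicing taken literally. It is an illustration only, not GCC's preprocessor code, and the helper name `splice_lines` is hypothetical. Under this reading, a backslash followed by spaces and then a new-line is left untouched, so a `//` comment ends at the end of its physical line.

```cpp
#include <cstddef>
#include <string>

// Phase 2 read literally: delete each backslash that is IMMEDIATELY
// followed by a new-line, together with that new-line.
// `splice_lines` is a hypothetical helper name, not a GCC function.
std::string splice_lines(const std::string& src) {
    std::string out;
    out.reserve(src.size());
    for (std::size_t i = 0; i < src.size(); ++i) {
        if (src[i] == '\\' && i + 1 < src.size() && src[i + 1] == '\n') {
            ++i;  // also skip the new-line, joining the two physical lines
            continue;
        }
        out.push_back(src[i]);
    }
    return out;
}
```

With this sketch, `splice_lines("// c \\\nint x;\n")` joins the two lines into `"// c int x;\n"`, while `splice_lines("// c \\  \nint x;\n")` returns its input unchanged because two spaces separate the backslash from the new-line.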
Comment 1 Eric Niebler 2008-12-06 23:59:01 UTC
Created attachment 16843
Compile with: g++ -Wall test.cpp
Comment 2 jsm-csl@polyomino.org.uk 2008-12-07 00:28:01 UTC
Subject: Re: New: Incorrect handling of line termination character with trailing spaces

On Sat, 6 Dec 2008, eric dot niebler at gmail dot com wrote:

> In the attached file, there is a comment terminated with a line-termination
> character (\) followed by spaces. This should NOT be considered a line
> terminator, yet gcc considers it as such. From 2.1/2 in the C++03 standard:
> 
> "Each instance of a new-line character and an immediately preceding backslash
> character is deleted, splicing physical source lines to form logical source
> lines."

This (removal of such spaces) is part of how GCC defines the 
implementation-defined mapping in translation phase 1.  There are no input 
files that GCC interprets as representing a program that enters phase 2 
with backslash-space at the end of a line.

> That is, only backslashes immediately followed by a newline are considered line
> terminators. The existing behavior of gcc violates the standard and conflicts
> with the behavior of other popular C++ compilers (EDG, MSVC).

No, it conforms to the standard but does not allow certain programs to be 
represented.  (I think this is a bad idea, but that's another matter.)
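
The phase 1 mapping described in this reply can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual GCC implementation: `drop_space_after_backslash` is a hypothetical name, and treating only spaces and tabs as the removable whitespace is an assumption.

```cpp
#include <cstddef>
#include <string>

// Hypothetical phase 1 step: remove any run of spaces or tabs that sits
// between a backslash and the new-line that ends the physical line.
// Assumption: only ' ' and '\t' count as removable whitespace here.
std::string drop_space_after_backslash(const std::string& src) {
    std::string out;
    out.reserve(src.size());
    std::size_t i = 0;
    while (i < src.size()) {
        out.push_back(src[i]);
        if (src[i] == '\\') {
            // Look ahead past spaces and tabs.
            std::size_t j = i + 1;
            while (j < src.size() && (src[j] == ' ' || src[j] == '\t')) {
                ++j;
            }
            if (j < src.size() && src[j] == '\n') {
                i = j;  // resume at the new-line, dropping the whitespace run
                continue;
            }
        }
        ++i;
    }
    return out;
}
```

After such a mapping, no line that reaches phase 2 ends in backslash-space, so a strict phase 2 splice sees a backslash immediately followed by a new-line and joins the lines. A backslash followed by spaces and then other text on the same line is left as it was.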

Comment 3 Eric Niebler 2008-12-07 00:46:27 UTC
If you are referring to 2.1/1 ...

"Physical source file characters are mapped, in an implementation-defined manner, to the basic source character set (introducing new-line characters for end-of-line indicators) if necessary. Trigraph sequences (2.3) are replaced by corresponding single-character internal representations. Any source file character not in the basic source character set (2.2) is replaced by the universal-character-name that designates that character. (An implementation may use any internal encoding, so long as an actual extended character encountered in the source file, and the same extended character expressed in the source file as a universal-character-name (i.e. using the \uXXXX notation), are handled equivalently.)"

I read this as permitting a mapping of characters, but not a deletion of characters, which is what gcc is doing. The only deletion of characters I see permitted is the deletion of a newline and an IMMEDIATELY preceding backslash.
Comment 4 Andrew Pinski 2008-12-07 00:54:38 UTC
(In reply to comment #3)
> I read this as permitting a mapping of characters, but not a deletion of
> characters, which is what gcc is doing. The only deletion of characters I see
> permitted is the deletion of a newline and an IMMEDIATELY preceding backslash.

I don't think it is deleting in the sense you are thinking of; it maps backslash followed by spaces followed by a return into just a return.
Comment 5 Andrew Pinski 2008-12-07 00:58:34 UTC

*** This bug has been marked as a duplicate of 8270 ***