Bug 8270 - [12/13/14/15 Regression] back-slash white space newline with comments, no warning
Status: NEW
Alias: None
Product: gcc
Classification: Unclassified
Component: preprocessor
Version: 3.2
Importance: P5 minor
Target Milestone: 12.5
Assignee: Not yet assigned to anyone
URL:
Keywords: diagnostic
Duplicates: 3786 5735 15519 24531 38433 44355
Depends on: 263
Blocks: 24531
Reported: 2002-10-17 15:26 UTC by eschmidt
Modified: 2024-07-19 12:52 UTC

See Also:
Host:
Target:
Build:
Known to work:
Known to fail: 4.0.4
Last reconfirmed: 2005-10-26 00:25:10


Description eschmidt 2002-10-17 15:26:00 UTC
The preprocessor does not remove the extension allowing extra white space in a backslash-newline, even when GCC is invoked with -ansi

gcc -ansi bug.c -o bug

bug.c:1:13: warning: backslash and newline separated by space
bug.c:3: parse error before "return"

An executable file should have been created.

Release:
gcc 3.2

Environment:
Configured with: /devel/gcc32/gnu/gcc-3.2/configure i586-pc-msdosdjgpp --prefix=/dev/env/DJDIR --disable-nls
Thread model: single

How-To-Repeat:
#define foo \ 
int main(void) {
  return 0;
}
Comment 1 eschmidt 2002-10-17 15:26:00 UTC
Fix:
Workaround: put a comment between the backslash and the newline.
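
A minimal sketch of that workaround, assuming the intent is for `foo` to expand to a lone backslash and for the following definition to stay outside the macro. The function name `answer` is hypothetical and stands in for `main` from the report.

```c
/* The block comment after the backslash means the physical line no longer
   ends in a backslash (with or without trailing white space), so the
   #define is not spliced onto the next line. */
#define foo \ /* keeps the backslash away from the newline */

/* Not part of the macro: the #define above ends on its own line. */
int answer(void) {
  return 0;
}
```

Because `foo` is never expanded, the stray backslash in its replacement list does not reach the compiler proper.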
Comment 2 Neil Booth 2002-10-21 20:57:43 UTC
From: Neil Booth <neil@daikokuya.co.uk>
To: eschmidt@safeaccess.com, Zack Weinberg <zack@codesourcery.com>
Cc: gcc-gnats@gcc.gnu.org
Subject: Re: preprocessor/8270: back-slash newline extension can't be removed
Date: Mon, 21 Oct 2002 20:57:43 +0100

 eschmidt@safeaccess.com wrote:-
 
 > >Synopsis:       back-slash newline extension can't be removed
 
 [...]
 
 > The preprocessor does not remove the extension allowing extra white space in a backslash-newline, even when GCC is invoked with -ansi
 > 
 > gcc -ansi bug.c -o bug
 > 
 > bug.c:1:13: warning: backslash and newline separated by space
 > bug.c:3: parse error before "return"
 > 
 > An executable file should have been created.
 
 Zack, I favour closing this, since the behaviour can be avoided if
 deemed necessary by judicious insertion of a C comment.  Agreed?
 
 Neil.
Comment 3 Zack Weinberg 2002-10-22 14:19:04 UTC
From: Zack Weinberg <zack@codesourcery.com>
To: Neil Booth <neil@daikokuya.co.uk>
Cc: eschmidt@safeaccess.com, gcc-gnats@gcc.gnu.org
Subject: Re: preprocessor/8270: back-slash newline extension can't be removed
Date: Tue, 22 Oct 2002 14:19:04 -0700

 On Mon, Oct 21, 2002 at 08:57:43PM +0100, Neil Booth wrote:
 > eschmidt@safeaccess.com wrote:-
 > 
 > > >Synopsis:       back-slash newline extension can't be removed
 > 
 > [...]
 > 
 > > The preprocessor does not remove the extension allowing extra white space in a backslash-newline, even when GCC is invoked with -ansi
 > > 
 > > gcc -ansi bug.c -o bug
 > > 
 > > bug.c:1:13: warning: backslash and newline separated by space
 > > bug.c:3: parse error before "return"
 > > 
 > > An executable file should have been created.
 > 
 > Zack, I favour closing this, since the behaviour can be avoided if
 > deemed necessary by judicious insertion of a C comment.  Agreed?
 
 Agreed.  I don't consider it necessary to cut any slack for people
 whose code depends on trailing whitespace.
 
 zw
Comment 4 Neil Booth 2002-10-22 14:23:58 UTC
State-Changed-From-To: open->closed
State-Changed-Why: We feel that in the rare cases where a backslash newline is not intended, a C-style comment can be inserted.
Comment 5 Andrew Pinski 2004-09-08 17:28:43 UTC
*** Bug 15519 has been marked as a duplicate of this bug. ***
Comment 6 Andrew Pinski 2005-10-25 23:45:15 UTC
Reopening to ...
Comment 7 Andrew Pinski 2005-10-25 23:45:45 UTC
To close as invalid.
Comment 8 Andrew Pinski 2005-10-25 23:46:11 UTC
*** Bug 24531 has been marked as a duplicate of this bug. ***
Comment 9 Eric Christopher 2005-10-26 00:24:18 UTC
reopening
Comment 10 Eric Christopher 2005-10-26 00:26:06 UTC
int main (int argc, char * const argv[]) {

    //  |_____________|            |______________\ 
    if (1)
    {
        printf("Hello");
    }
    else
    {
        printf("There");
    }

    return 0;
}

Note that there's a space after the comment above.

To clarify this a bit more:

http://gcc.gnu.org/ml/gcc/2005-10/msg00819.html

That is Howard Hinnant's comment on the subject, in the thread that started this.

There's a space at the end of the line that has the comment.
Comment 12 Andrew Pinski 2005-10-26 00:45:07 UTC
Hmm, looks like the diagnostic issue is a regression.
Comment 13 Andrew Pinski 2005-10-26 00:46:11 UTC
The diagnostic issue is a regression.
Comment 14 Andrew Pinski 2005-10-26 00:46:51 UTC
Caused by the patch referenced in comment #11.
Comment 15 Andrew Pinski 2005-10-26 00:58:39 UTC
One more previous discussion, this time started from someone at Apple:
http://gcc.gnu.org/ml/gcc/2002-11/msg00267.html
Comment 16 Andrew Pinski 2005-10-26 01:02:05 UTC
More discussions:
http://gcc.gnu.org/ml/gcc-patches/2000-08/msg01118.html
Comment 17 Andrew Pinski 2005-10-26 01:05:15 UTC
Some more:
http://gcc.gnu.org/ml/gcc-patches/2001-04/msg00543.html
Comment 19 Andrew Pinski 2005-10-26 01:17:07 UTC
http://gcc.gnu.org/ml/gcc/2000-05/msg01032.html


Knowing the history around this is important, and if you don't believe that, then getting this right is not going to happen.
Comment 20 Andrew Pinski 2005-10-26 01:31:25 UTC
Maybe the last one:
http://gcc.gnu.org/ml/gcc-bugs/2000-10/msg00117.html

There are most likely more.  All were found using Google; maybe people should do that sometimes.
Comment 21 Andrew Pinski 2005-10-26 01:33:13 UTC
Oh, the last discussion of this problem was in PR 15519
Comment 22 Andrew Pinski 2005-10-26 01:37:31 UTC
Another one from earlier this year:
http://gcc.gnu.org/ml/gcc-patches/2005-03/msg01685.html
Comment 23 Andrew Pinski 2005-10-26 01:39:35 UTC
From 2003:
http://gcc.gnu.org/ml/gcc/2003-11/msg00105.html
Comment 24 Andrew Pinski 2005-10-26 01:42:19 UTC
*** Bug 3786 has been marked as a duplicate of this bug. ***
Comment 25 Andrew Pinski 2005-10-26 01:59:21 UTC
*** Bug 5735 has been marked as a duplicate of this bug. ***
Comment 26 Andrew Pinski 2005-10-26 02:00:44 UTC
Note changing this behavior will most likely get PR 263 wrong.
Comment 27 Andrew Pinski 2005-10-26 02:12:18 UTC
One more note, we do get a warning with -W -Wall:
t.c:1:1: warning: multi-line comment

Which is our recommended warning level anyway. Yes, this is inconsistent, but that was discussed in comment #11.
Comment 28 Andrew Pinski 2005-10-26 06:29:25 UTC
You might as well look into PR 24024, which is only semi-related, but since you are looking at the preprocessor and backslashes, it might be easy to fix that one too.
Comment 29 Andrew Pinski 2005-10-26 21:41:56 UTC
Hmm, the consensus is that at the least we should warn for comments.  But the consensus from non-Apple people seems to be not to change the behavior.
Comment 30 Eric Christopher 2005-10-26 21:46:18 UTC
That would be the consensus from Andrew, not from people concerned that deal with language issues routinely.
Comment 31 Andrew Pinski 2005-10-26 21:49:03 UTC
(In reply to comment #30)
> That would be the consensus from Andrew, not from people concerned that deal
> with language issues routinely.

Wait a minute, if you actually look at the people arguing for the change, it is only Apple employees.  Joe has said we should not change it.  It looks like DJ is saying the same in the new thread which shows the real issues with the other compilers implementation.
Comment 32 Neil Booth 2005-10-26 23:07:09 UTC
Subject: Re:  [3.4/4.0/4.1 Regression] back-slash newline extension can't be removed

pinskia at gcc dot gnu dot org wrote:-

> > That would be the consensus from Andrew, not from people concerned that deal
> > with language issues routinely.
> 
> Wait a minute, if you actually look at the people arguing for the change, it
> is only Apple employees.  Joe has said we should not change it.  It looks like
> DJ is saying the same in the new thread which shows the real issues with the
> other compilers implementation.

I've said we should change it, I don't work for Apple.  Please stop
trying to claim your opinion is some kind of consensus.

Neil.
Comment 33 DJ Delorie 2005-10-26 23:13:47 UTC
Subject: Re:  [3.4/4.0/4.1 Regression] back-slash newline extension can't be removed


> It looks like DJ is saying the same in the new thread which shows
> the real issues with the other compilers implementation.

I would be in favor of treating \r differently from other whitespace
for the purposes of reporting this error.  The cr-lf-newline mess is
different from the trailing space mess.
Comment 34 Mark Mitchell 2005-10-30 21:43:13 UTC
This is a usability issue (and, maybe, a pedantic standards-conformance issue?), but is not release-critical.
Comment 35 Eric Christopher 2006-06-08 21:06:14 UTC
I'm unlikely to work on this...
Comment 36 Gabriel Dos Reis 2007-01-18 02:37:10 UTC
A fix is not going to happen for GCC-4.0.x
Comment 37 Joseph S. Myers 2008-07-04 16:28:58 UTC
Closing 4.1 branch.
Comment 38 Andrew Pinski 2008-12-07 00:58:34 UTC
*** Bug 38433 has been marked as a duplicate of this bug. ***
Comment 39 Andrew Pinski 2008-12-07 01:01:09 UTC
From JSM in PR 38433:

On Sat, 6 Dec 2008, eric dot niebler at gmail dot com wrote:

> In the attached file, there is a comment terminated with a line-termination
> character (\) followed by spaces. This should NOT be considered a line
> terminator, yet gcc considers it as such. From 2.1/2 in the C++03 standard:
> 
> "Each instance of a new-line character and an immediately preceding backslash
> character is deleted, splicing physical source lines to form logical source
> lines."

This (removal of such spaces) is part of how GCC defines the 
implementation-defined mapping in translation phase 1.  There are no input 
files that GCC interprets as representing a program that enters phase 2 
with backslash-space at the end of a line.

> That is, only backslashes immediately followed by a newline are considered line
> terminators. The existing behavior of gcc violates the standard and conflicts
> with the behavior of other popular C++ compilers (EDG, MSVC).

No, it conforms to the standard but does not allow certain programs to be 
represented.  (I think this is a bad idea, but that's another matter.)

--- CUT ---
Which explains why this is conforming to the standard and is allowed.
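
The two readings being argued over can be sketched as two predicates on a physical line (given without its terminating newline). This is only an illustration: the names `is_hspace`, `splices_strict` and `splices_gnu` are hypothetical, and the set of "horizontal" white space characters is an assumption, not GCC's actual table.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Assumption: horizontal white space means space, tab, form feed and
   vertical tab.  Illustrative only. */
static bool is_hspace(char c) {
  return c == ' ' || c == '\t' || c == '\f' || c == '\v';
}

/* Strict reading: the line splices only if its very last character
   (just before the newline) is a backslash. */
bool splices_strict(const char *line) {
  assert(line != NULL);
  size_t n = strlen(line);
  return n > 0 && line[n - 1] == '\\';
}

/* Reading described in this comment: trailing horizontal white space is
   treated as part of the end-of-line indicator, so it is skipped before
   looking for the backslash. */
bool splices_gnu(const char *line) {
  assert(line != NULL);
  size_t n = strlen(line);
  while (n > 0 && is_hspace(line[n - 1]))
    n--;
  return n > 0 && line[n - 1] == '\\';
}
```

The two predicates disagree only on lines that end in a backslash followed by white space, which is exactly the case this bug is about.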
Comment 40 Joseph S. Myers 2009-03-31 16:12:11 UTC
Closing 4.2 branch.
Comment 41 Richard Biener 2009-08-04 12:25:53 UTC
GCC 4.3.4 is being released, adjusting target milestone.
Comment 42 Richard Biener 2010-05-22 18:09:55 UTC
GCC 4.3.5 is being released, adjusting target milestone.
Comment 43 Andrew Pinski 2010-06-01 18:24:59 UTC
*** Bug 44355 has been marked as a duplicate of this bug. ***
Comment 44 Richard Biener 2011-06-27 12:11:52 UTC
4.3 branch is being closed, moving to 4.4.7 target.
Comment 45 Jakub Jelinek 2012-03-13 12:44:39 UTC
4.4 branch is being closed, moving to 4.5.4 target.
Comment 46 Jakub Jelinek 2013-04-12 15:15:22 UTC
GCC 4.6.4 has been released and the branch has been closed.
Comment 47 GoWhoopee 2013-12-11 12:41:42 UTC
There is a rule: single line comments are extended by backslash newline.
Ludicrous as it is, this rule is not optional.

Failure to observe this rule by a compiler is criminal: the result is that lines of code which a correctly written syntax highlighter shows as code that will be compiled, are not.
This has cost this company several days in time. We write code for the military: the cost could have been far worse had the unusual behaviour not been noticed.
Compilers MUST NOT randomly ignore lines of code!

Programmers frequently illustrate code with ASCII art and backslash is a valuable ASCII art character. To ensure the single-line comment is not extended and the clarity of the ASCII art remains, a space would be inserted after any trailing backslash.
ASCII art (or any attempt at explanation of the code) in a comment is NOT bad coding style (as stated by Atmel technical support).
Any editor or compiler operation that trims spaces from lines MUST NOT trim spaces all the way back to a backslash because that will change the programmer's intent by creating backslash newline where there was none!
That should be obvious!

GCC isn't alone in missing this crucial point.
Atmel's version of MS Visual Studio offers a [Tools][Options][Environment][Custom Settings][Remove whitespaces trailing end of line, while saving the document] which does what it says, silently breaking code as it does so.
Their tech support said their environment did what it said on the tin and they couldn't impose themselves on the GCC community, but they could write to the syntax highlight software company and ask them to break their code too!

Oh, how we laughed...

The solution should be simple: when trimming white-space from the right of a line of code, don't create backslash newline where it didn't exist before.

Please reconsider and stop gcc from changing our code without our permission.
Comment 48 Andrew Pinski 2013-12-11 18:33:36 UTC
(In reply to GoWhoopee from comment #47)
> Please reconsider and stop gcc from changing our code without our permission.

It is not changing your code at all.  Read comment #39 to understand this issue in light of the full standard.
Comment 49 GoWhoopee 2013-12-12 08:24:11 UTC
I've read all the comments and all those on linked forums and I have no idea how you struggle with this!

If a compiler changes backslash space into backslash newline and consequently deletes the newline it is changing the meaning of the code!

All other development environments people here have used don't do this and gcc shouldn't!

Here's an example of code your compiler changes:

#define HIGH_SPEED_TURRET   // \ 
#define SAFETY_LOCKED_ON    //  >------------- Critical Configuration
#define NEVER_PRIME_MISSILE // /

The programmer put backslash space and the syntax highlighter correctly showed the safety was locked on.

Sleep well.
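
One defensive pattern, not proposed anywhere in this thread and offered only as a hedged sketch, is to keep the art in block comments that close on the same line and to fail the build if a critical macro went missing. The accessor name `safety_locked_on` is hypothetical.

```c
/* Each piece of ASCII art sits in a block comment that closes on the same
   line, so no line here can end in a backslash plus white space. */
#define HIGH_SPEED_TURRET   /* \  */
#define SAFETY_LOCKED_ON    /*  >------------- Critical Configuration */
#define NEVER_PRIME_MISSILE /* /  */

/* Stop the build if the critical definition was swallowed by a comment. */
#ifndef SAFETY_LOCKED_ON
#error "SAFETY_LOCKED_ON is not defined; check for a spliced comment line"
#endif

/* Hypothetical accessor so the configuration can also be checked at
   run time. */
int safety_locked_on(void) {
#ifdef SAFETY_LOCKED_ON
  return 1;
#else
  return 0;
#endif
}
```

This does not change how any compiler splices lines; it only makes the configuration independent of trailing white space.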
Comment 50 Mikael Pettersson 2013-12-12 15:02:42 UTC
(In reply to Andrew Pinski from comment #48)
> (In reply to GoWhoopee from comment #47)
> > Please reconsider and stop gcc from changing our code without our permission.
> 
> It is not changing your code at all.  Read comment #39 to understand this
> issue at full understanding of the standard.

I'm looking at N1570 section 5.1.1.2 "Translation phases".  Phase 1 only maps multibyte characters and trigraphs.  Backslash-space-newline is neither, so it should be preserved as-is to phase 2.  The splicing in phase 2 then shouldn't occur because of the space.  Or am I missing something?
Comment 51 GoWhoopee 2013-12-13 08:15:35 UTC
http://web.cs.dal.ca/~vlado/pl/C_Standard_2011-n1570.pdf

That's the principle, but not what happens with gcc...

Phase 2 says, "Each instance of a backslash character (\) immediately followed by a newline character is deleted, splicing physical source lines to form logical source lines.", and explicitly states that, "Only the last backslash on any physical source line shall be eligible for such a splice.".

I wonder if all trailing white-space is being trimmed from each source line before or during the first Translation Phase?
Comment 52 GoWhoopee 2013-12-13 10:48:36 UTC
Whitespace is required by Translation Phase 3, consequently Translation Phase 1 should not be changing whitespace at all, only mapping multibyte characters and trigraphs.

Comment #39: Indicates that gcc is known to work incorrectly, "This (removal of such spaces) is part of how GCC defines the implementation-defined mapping in translation phase 1.": the removal of white-space is not mapping multibyte characters or trigraphs, it is removing critical information from Translation Phases 2 and 3 resulting in misinterpretation of the source code.

Looking at the 4.8.2 source, libcpp\lex.c line 1427, there is a fix when parsing raw strings, after the event:
______________________________________________
static void
lex_raw_string (cpp_reader *pfile, cpp_token *token, const uchar *base,
		const uchar *cur)
{
[...]
	  switch (note->type)
	    {
	    case '\\':
	    case ' ':
	      /* Restore backslash followed by newline.  */
	      BUF_APPEND (base, cur - base);
	      base = cur;
	      BUF_APPEND ("\\", 1);
	    after_backslash:
	      if (note->type == ' ')
		{
		  /* GNU backslash whitespace newline extension.  FIXME
		     could be any sequence of non-vertical space.  When we
		     can properly restore any such sequence, we should mark
		     this note as handled so _cpp_process_line_notes
		     doesn't warn.  */
		  BUF_APPEND (" ", 1);
		}

	      BUF_APPEND ("\n", 1);
	      break;
______________________________________________

but fixing all the varieties of broken things after the event wouldn't be necessary if Translation Phase 1 didn't trim whitespace.

If Translation Phase 1 is required to trim whitespace for some reason (performance, perhaps) then it should trim multiple consecutive spaces down to exactly one space; which wouldn't break Translation Phase 2 and 3.

Does that sound like a sensible compromise?
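
One possible reading of that compromise, combined with the rule from comment #47 (never create a backslash-newline that was not there before), might look like the sketch below. The function name is hypothetical, only spaces and tabs are considered, and this is not how libcpp works.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Trim trailing spaces and tabs from `line` in place.  If the trimmed run
   directly follows a backslash, keep exactly one space so that trimming
   never turns backslash-space into a bare trailing backslash.
   Returns the new length. */
size_t trim_trailing_keep_backslash_gap(char *line) {
  assert(line != NULL);
  size_t n = strlen(line);
  size_t end = n;
  while (end > 0 && (line[end - 1] == ' ' || line[end - 1] == '\t'))
    end--;
  if (end < n && end > 0 && line[end - 1] == '\\') {
    line[end] = ' '; /* overwrite the first trimmed character */
    end++;
  }
  line[end] = '\0';
  return end;
}
```

A line that already ends in a bare backslash is left untouched, so genuine continuations keep working.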
Comment 53 Richard Biener 2014-06-12 13:40:58 UTC
The 4.7 branch is being closed, moving target milestone to 4.8.4.
Comment 54 Jakub Jelinek 2014-12-19 13:23:36 UTC
GCC 4.8.4 has been released.
Comment 55 Kai Tietz 2015-03-19 13:23:26 UTC
Well, by looking into the standard ISO/IEC 9899:TC3 I found the following statement:

5.1.1.2 Translation phases
"2. Each instance of a backslash character (\) immediately followed by a new-line
character is deleted, splicing physical source lines to form logical source lines.
Only the last backslash on any physical source line shall be eligible for being part of such a splice. A source file that is not empty shall end in a new-line character, which shall not be immediately preceded by a backslash character before any such splicing takes place."

For ISO/IEC 14882:2003 we see at topic "2 Lexical Convention"

"2 Each instance of a new-line character and an immediately preceding backslash character is deleted, splicing physical source lines to form logical source lines. If, as a result, a character sequence that matches the syntax of a universal-character-name is produced, the behavior is undefined. If a source file that is not empty does not end in a new-line character, or ends in a new-line character immediately preceded by a backslash character, the behavior is undefined."

So the handling of backslash whitespace newline is clearly a gnu-extension and not part of the standard.

I suggest something like this patch for fixing the standard requirement.  Additionally we could check here for cpp_option lang being gnu-style to allow 'backslash, whitespace, newline' too.

Index: lex.c
===================================================================
--- lex.c       (Revision 221514)
+++ lex.c       (Arbeitskopie)
@@ -896,6 +896,11 @@ _cpp_clean_line (cpp_reader *pfile)
        p--;
       if (p - 1 != pbackslash)
        goto done;
+      if (p != d)
+       {
+         ++p;
+         goto done;
+       }

       /* Have an escaped newline; process it and proceed to
         the slow path.  */
@@ -917,13 +922,19 @@ _cpp_clean_line (cpp_reader *pfile)
              if (s == buffer->rlimit)
                break;

-             /* Escaped?  */
+             /* Escaped?
+                But make sure it isn't a backslash followed by a
+                whitespace.  */
              p = d;
              while (p != buffer->next_line && is_nvspace (p[-1]))
                p--;
              if (p == buffer->next_line || p[-1] != '\\')
                break;
-
+             if (p != d)
+               {
+                 ++p;
+                 break;
+               }
              add_line_note (buffer, p - 1, p != d ? ' ': '\\');
              d = p - 2;
              buffer->next_line = p - 1;
Comment 56 doug mcilroy 2015-03-21 14:40:04 UTC
(In reply to Kai Tietz from comment #55)
Comment #55 overlooks the Standard's translation phase 1, which replaces an implementation-defined end-of-line indicator with a new-line character. GCC's convention of including in the end-of-line indicator any white space that is preceded by a backslash conforms, though it may be a surprise.

The surprise is perversely out of sympathy with the raison d'etre of the standard--maximal portability. It is incompatible with the most direct (and historically prior) implementations, wherein the end-of-line indicator is simply a new-line character.

A suitable fix is to warn when white space occurs in an end-of-line indicator. This will break no code that GCC currently compiles, yet draw attention to the nonportable construct.

Here is what the C11 standard says about the end-of-line indicator:

5.1.1.2
Physical source file multibyte characters are mapped, in an implementation defined manner, to the source character set (introducing new-line characters for end-of-line indicators) if necessary.

5.2.1 paragraph 3
In source files, there shall be some way of indicating the end of each line of text; this International Standard treats such an end-of-line indicator as if it were a single new-line character.
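
The warning suggested in this comment can be sketched as a stand-alone scan over a source buffer. This is an illustrative C99 sketch, not libcpp code: the function name `warn_backslash_space` is hypothetical, only spaces and tabs are treated as white space, and the message text is borrowed from the diagnostic quoted in the original report.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Scan `src` and count every physical line that ends in a backslash
   followed by at least one space or tab.  If `out` is non-NULL, a
   diagnostic is printed there for each such line. */
size_t warn_backslash_space(const char *src, FILE *out) {
  assert(src != NULL);
  size_t warnings = 0;
  size_t lineno = 1;
  const char *line = src;
  for (const char *p = src;; p++) {
    if (*p == '\n' || *p == '\0') {
      const char *end = p;
      while (end > line && (end[-1] == ' ' || end[-1] == '\t'))
        end--;
      /* end < p means some trailing white space was skipped. */
      if (end < p && end > line && end[-1] == '\\') {
        warnings++;
        if (out != NULL)
          fprintf(out, "line %zu: backslash and newline separated by space\n",
                  lineno);
      }
      if (*p == '\0')
        break;
      line = p + 1;
      lineno++;
    }
  }
  return warnings;
}
```

Because the scan works on physical lines, it flags the construct inside comments as well as in code.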
Comment 57 Kai Tietz 2015-03-23 09:44:39 UTC
(In reply to doug mcilroy from comment #56)
> (In reply to Kai Tietz from comment #55)
> Comment #55 overlooks the Standard's translation phase 1, which replaces an
> implementation-defined end-of-line indicator with a new-line character.
> GCC's convention of including in the end-of-line indicator any white space
> that is preceded by a backslash conforms, though it may be a surprise.

Sure, sorry for omitting that.  The common understanding of a "multibyte" new-line (the term is indeed misleading here) is the combination of '\r' and '\n'.  So treating any whitespace plus new-line as a single end-of-line indicator is valid, but it does have semantic differences.
 
> The surprise is perversely out of sympathy with the raison d'etre of the
> standard--maximal portability. It is incompatible with the most direct (and
> historically prior) implementations, wherein the end-of-line indicator is
> simply a new-line character.

Agreed, and we should at least consider providing an option, besides the necessary warning, to not strip whitespace from the right-hand side of lines that end in a backslash.
Should we use an existing option (like -ansi), or introduce new option for this?
 
> A suitable fix is to warn when white space occurs in an end-of-line
> indicator. This will break no code that GCC currently compiles, yet draw
> attention to the nonportable construct.

Well, in general we do warn, but not within comments.  For C-style comments there is indeed not much reason to warn, as there is no semantic difference.  But for C++-style comments we should, as here a semantic difference can indeed occur with the gnu-style end-of-line treatment.
Comment 58 Richard Biener 2015-06-23 08:12:40 UTC
The gcc-4_8-branch is being closed, re-targeting regressions to 4.9.3.
Comment 59 Jakub Jelinek 2015-06-26 19:51:09 UTC
GCC 4.9.3 has been released.
Comment 60 Richard Biener 2016-08-03 08:37:09 UTC
GCC 4.9 branch is being closed
Comment 61 doug mcilroy 2017-11-03 13:40:46 UTC
Contrary to comment #57, the GCC convention does affect the interpretation of C-style comments. GCC rejects this Christmas tree with trailing spaces.

/*
  /\    
 /**\   
/****\  
*/
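
Why the tree is rejected can be traced with a small sketch that applies line splicing under both rules and then looks for the end of the comment. The helper names `splice_lines` and `after_block_comment` are hypothetical, only spaces and tabs count as white space, and the code illustrates the behaviour described in this thread rather than libcpp itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Copy `src` to `dst`, deleting each backslash-newline pair.  When `gnu`
   is true, spaces and tabs between the backslash and the newline are
   deleted too.  `dst` must have room for strlen(src) + 1 bytes. */
void splice_lines(const char *src, char *dst, bool gnu) {
  assert(src != NULL && dst != NULL);
  size_t out = 0;
  for (size_t i = 0; src[i] != '\0'; i++) {
    if (src[i] == '\\') {
      size_t j = i + 1;
      if (gnu)
        while (src[j] == ' ' || src[j] == '\t')
          j++;
      if (src[j] == '\n') {
        i = j; /* skip the backslash, any white space, and the newline */
        continue;
      }
    }
    dst[out++] = src[i];
  }
  dst[out] = '\0';
}

/* `text` must begin with a block-comment opener.  Returns a pointer just
   past the first closing delimiter, or NULL if the comment never ends. */
const char *after_block_comment(const char *text) {
  assert(text != NULL && strncmp(text, "/*", 2) == 0);
  const char *close = strstr(text + 2, "*/");
  return close != NULL ? close + 2 : NULL;
}
```

Under the strict rule nothing is spliced and the comment runs to the last line. Under the GNU-style rule the three art lines are joined with the final line, the last star of the second art line meets the leading slash of the third, and the comment closes early, leaving five stars and a slash to be parsed as code.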
Comment 62 Jakub Jelinek 2018-10-26 10:18:05 UTC
GCC 6 branch is being closed
Comment 63 Richard Biener 2019-11-14 07:48:29 UTC
The GCC 7 branch is being closed, re-targeting to GCC 8.4.
Comment 64 Jakub Jelinek 2020-03-04 09:40:06 UTC
GCC 8.4.0 has been released, adjusting target milestone.
Comment 65 Jakub Jelinek 2021-05-14 09:45:07 UTC
GCC 8 branch is being closed.
Comment 66 Richard Biener 2021-06-01 08:03:27 UTC
GCC 9.4 is being released, retargeting bugs to GCC 9.5.
Comment 67 Richard Biener 2022-05-27 09:33:00 UTC
GCC 9 branch is being closed
Comment 68 Jakub Jelinek 2022-06-28 10:28:42 UTC
GCC 10.4 is being released, retargeting bugs to GCC 10.5.
Comment 69 Richard Biener 2023-07-07 10:28:05 UTC
GCC 10 branch is being closed.
Comment 70 veronica alphonso 2024-03-10 12:09:18 UTC
Any update on this bug?
Comment 71 Richard Biener 2024-07-19 12:52:36 UTC
GCC 11 branch is being closed.