This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



gcc 2.95.1 fixes for backslashes in #line, #include


The following recent change to GCC's preprocessor broke ANSI C conformance:

	Mon Dec  7 17:55:06 1998  Mike Stump  <mrs@wrs.com>

		* cccp.c (ignore_escape_flag): Add support for \ as `natural'
		characters in file names in #line to be consistent
		with #include handling....

I realize that the change was made for the benefit of compiling for
Microsoft, but the C standard requires that the file names in #line
directives be parsed as C string literals.  For example, the following
program must output a single " character followed by a newline, but
GCC 2.95.1 incorrectly rejects this program:

	#line 1 "\""
	main () { puts (__FILE__); }

Also, the old behavior was more useful, since it made it easier to
write programs that generate C programs from arbitrarily-named files.

The patch below fixes this problem for cccp, and also fixes several
related problems as noted below.  It changes the default behavior to
be ANSI-compliant, as with previous versions of GCC; it also adds a
new option -fbs-filenames for people who want the Microsoft-like
behavior.

I haven't fixed cpplib, though I note that it already parses #include
and #line filenames as plain strings; I'll CC this message to Zack
Weinberg so that he can look at the issue.

Here are the other, related problems fixed by this patch.

* `gcc -ansi -pedantic' does not emit a diagnostic for the line
  
  #include "stdio" ".h"

  as the C standard requires.

* Peter Seebach, a member of the ANSI C committee, recently reported
  that there's an error in the GCC documentation.  When talking about
  GCC's behavior with `#include "foo\bar"', the GCC manual says

    None of the character escape sequences appropriate to string
    constants in C are processed.  Thus, @samp{#include "x\n\\y"}
    specifies a filename containing three backslashes.  It is not
    clear why this behavior is ever useful, but the ANSI standard
    specifies it.

  The last quoted sentence is incorrect, because the ANSI standard says
  only that the behavior of backslashes in file names is defined by
  the implementation.

  For consistency and utility, I suggest that GCC instead treat
  special characters in `#include "file"' just as it treats them in
  `#line "file"'.  I suspect that this was the preferred behavior when
  cpp was originally written, but that cpp's authors mistakenly
  thought that the C standard prohibited it.  This is a strict
  extension to the C standard except for one small detail: the ANSI
  grammar requires a diagnostic for `#include "...\"..."'.  So the
  patch below emits a diagnostic for that particular case (but only if
  the -pedantic option is enabled).

  I realize that this change to `#include "..."' semantics may
  discomfit people who use Microsoft file names.  But they can work around
  any problems with -fbs-filenames, and anyway we should be
  encouraging people to use portable file names that are compatible
  with the ANSI standard, instead of Microsoft file names that are
  incompatible with the ANSI standard.

(Also, this change simplifies the code, so it must be right. :-)


1999-08-23  Paul Eggert  <eggert@twinsun.com>

	Handle `#line 1 "foo\bar"' in conformance to the C Standard,
	by parsing escape sequences in #line file names.  Also, change
	`#include "foo\bar"' so that it's consistent with #line
	"foo\bar".  Add a new option -fbs-filenames that disables
	backslash processing in include directive file names, for
	people who prefer Microsoft filename semantics.

	* extend.texi (#include "string"): New section.
	* cpp.texi, invoke.texi: New option -fbs-filenames;
	include directive file names are now string literals by default.

	* gcc.c (default_compilers): Add -fbs-filenames.

	* cccp.c (bs_filenames): Renamed from ignore_escape_flag.
	Now defaults to zero.
	(print_help, main): New option -fbs-filenames.
	(handle_directive): Do not treat strings specially in #include
	and #line directives.
	(do_include): If !bs_filenames,
	convert escapes to the represented chars.
	But warn about \" if pedantic, since ANSI C requires this.
	Warn about concatenated string literals in #include if pedantic.
	(skip_if_group): Do not treat strings specially in #include
	directives.

===================================================================
RCS file: cccp.c,v
retrieving revision 2.95
retrieving revision 2.95.1.1
diff -u -r2.95 -r2.95.1.1
--- cccp.c	1999/06/01 17:10:01	2.95
+++ cccp.c	1999/08/24 00:55:55	2.95.1.1
@@ -834,10 +834,11 @@
 /* Name of output file, for error messages.  */
 static char *out_fname;
 
-/* Nonzero to ignore \ in string constants.  Use to treat #line 1 "A:\file.h
-   as a non-form feed.  If you want it to be a form feed, you must use
-   # 1 "\f".  */
-static int ignore_escape_flag = 1;
+/* Nonzero to disable escapes in user directive file name literals.
+   Use this to treat #line 1 "A:\file.h" as a non-form feed.
+   This is incompatible with the C standard, so it is off by default.
+   The (non-user) directive `# 1 "\f"' is unaffected by this option.  */
+static int bs_filenames;
 
 /* Stack of conditionals currently in progress
    (including both successful and failing conditionals).  */
@@ -1136,6 +1137,7 @@
   printf ("  -pedantic                 Issue all warnings demanded by strict ANSI C\n");
   printf ("  -traditional              Follow K&R pre-processor behaviour\n");
   printf ("  -trigraphs                Support ANSI C trigraphs\n");
+  printf ("  -fbs-filenames	       \\ is ordinary in #include \"...\"\n");
   printf ("  -lang-c                   Assume that the input sources are in C\n");
   printf ("  -lang-c89                 Assume that the input is C89; depricated\n");
   printf ("  -lang-c++                 Assume that the input sources are in C++\n");
@@ -1536,6 +1538,10 @@
 	  user_label_prefix = "_";
 	else if (!strcmp (argv[i], "-fno-leading-underscore"))
 	  user_label_prefix = "";
+	else if (!strcmp (argv[i], "-fbs-filenames"))
+	  bs_filenames = 1;
+	else if (!strcmp (argv[i], "-fno-bs-filenames"))
+	  bs_filenames = 0;
 	break;
 
       case 'M':
@@ -3669,8 +3675,6 @@
   /* Record where the directive started.  do_xifdef needs this.  */
   directive_start = bp - 1;
 
-  ignore_escape_flag = 1;
-
   /* Skip whitespace and \-newline.  */
   while (1) {
     if (is_hor_space[*bp]) {
@@ -3733,7 +3737,6 @@
 	pedwarn ("`#' followed by integer");
       after_ident = ident;
       kt = line_directive_table;
-      ignore_escape_flag = 0;
       goto old_linenum;
     }
 
@@ -3801,23 +3804,6 @@
 	  break;
 
 	case '"':
-	  /* "..." is special for #include.  */
-	  if (IS_INCLUDE_DIRECTIVE_TYPE (kt->type)) {
-	    while (bp < limit && *bp != '\n') {
-	      if (*bp == '"') {
-		bp++;
-		break;
-	      }
-	      if (*bp == '\\' && bp[1] == '\n') {
-		ip->lineno++;
-		copy_directive = 1;
-		bp++;
-	      }
-	      bp++;
-	    }
-	    break;
-	  }
-	  /* Fall through.  */
 	case '\'':
 	  bp = skip_quoted_string (bp - 1, limit, ip->lineno, &ip->lineno, &copy_directive, &unterminated);
 	  /* Don't bother calling the directive if we already got an error
@@ -4378,6 +4364,13 @@
 	    *fend = *fin++;
 	    if (*fend == '"')
 	      break;
+	    if (*fend == '\\' && !bs_filenames) {
+	      char *finc = (char *) fin - 1;
+	      if (*fin == '\"' && pedantic)
+		pedwarn ("file name contains \"");
+	      *fend = parse_escape (&finc, (HOST_WIDEST_INT) (U_CHAR) -1);
+	      fin = (U_CHAR *) finc;
+	    }
 	    fend++;
 	  }
 	  if (fin == limit)
@@ -4385,10 +4378,11 @@
 	  /* If not at the end, there had better be another string.  */
 	  /* Skip just horiz space, and don't go past limit.  */
 	  while (fin != limit && is_hor_space[*fin]) fin++;
-	  if (fin != limit && *fin == '\"')
-	    fin++;
-	  else
+	  if (fin == limit || *fin != '\"')
 	    goto fail;
+	  if (pedantic)
+	    pedwarn ("concatenated string literals in #include");
+	  fin++;
 	}
       }
 
@@ -6853,7 +6847,7 @@
 	return 0;
 
       case '\\':
-	if (! ignore_escape_flag)
+	if (! bs_filenames)
 	  {
 	    char *bpc = (char *) bp;
 	    HOST_WIDEST_INT c = parse_escape (&bpc, (HOST_WIDEST_INT) (U_CHAR) (-1));
@@ -7439,21 +7433,6 @@
       }
       break;
     case '\"':
-      if (skipping_include_directive) {
-	while (bp < endb && *bp != '\n') {
-	  if (*bp == '"') {
-	    bp++;
-	    break;
-	  }
-	  if (*bp == '\\' && bp[1] == '\n') {
-	    ip->lineno++;
-	    bp++;
-	  }
-	  bp++;
-	}
-	break;
-      }
-      /* Fall through.  */
     case '\'':
       bp = skip_quoted_string (bp - 1, endb, ip->lineno, &ip->lineno,
 			       NULL_PTR, NULL_PTR);
===================================================================
RCS file: cpp.texi,v
retrieving revision 2.95
retrieving revision 2.95.1.1
diff -u -r2.95 -r2.95.1.1
--- cpp.texi	1999/05/17 23:37:18	2.95
+++ cpp.texi	1999/08/24 00:05:53	2.95.1.1
@@ -333,13 +333,17 @@
 that the current input file refers to.  (If the @samp{-I-} option is
 used, the special treatment of the current directory is inhibited.)
 
-The argument @var{file} may not contain @samp{"} characters.  If
-backslashes occur within @var{file}, they are considered ordinary text
-characters, not escape characters.  None of the character escape
-sequences appropriate to string constants in C are processed.  Thus,
-@samp{#include "x\n\\y"} specifies a filename containing three
-backslashes.  It is not clear why this behavior is ever useful, but
-the ANSI standard specifies it.
+The argument @var{file} is a string literal.  Any backslashes within
+@var{file} are treated as normal string escape characters.  Thus,
+@samp{#include "\n\\\""} specifies a file name containing a newline, a
+backslash, and a quote.  This is an extension to the ANSI standard,
+which prohibits quotes in @var{file}, and which says that
+backslashes have implementation-defined behavior.  @xref{Invocation},
+for how to disable escape processing in @var{file}.
+
+An @samp{#include} directive can use a concatenation of string literals.
+For example, @samp{#include "foo" ".h"} is equivalent to @samp{#include
+"foo.h"}.  This is also an extension to ANSI C.
 
 @item #include @var{anything else}
 @cindex computed @samp{#include}
@@ -2422,7 +2426,8 @@
 came originally from source file @var{filename} and its line number there
 was @var{linenum}.  Keep in mind that @var{filename} is not just a
 file name; it is surrounded by doublequote characters so that it looks
-like a string constant.
+like a string constant.  @xref{Invocation}, for how to disable escape
+processing in @var{filename}.
 
 @item #line @var{anything else}
 @var{anything else} is checked for macro calls, which are expanded.
@@ -2937,6 +2942,12 @@
 The 199x C standard plus GNU extensions.
 @end table
 
+@item -fbs-filenames
+@findex -fbs-filenames
+In the file name string literals of @samp{#include} and @samp{#line}
+directives, treat @samp{\} as an ordinary character instead of as the
+initial character of an escape sequence.
+
 @item -Wp,-lint
 @findex -lint
 Look for commands to the program checker @code{lint} embedded in
===================================================================
RCS file: extend.texi,v
retrieving revision 2.95
retrieving revision 2.95.1.1
diff -u -r2.95 -r2.95.1.1
--- extend.texi	1999/04/14 05:34:34	2.95
+++ extend.texi	1999/08/24 00:05:53	2.95.1.1
@@ -48,6 +48,7 @@
 * Function Attributes:: Declaring that functions have no side effects,
                          or that they can never return.
 * Function Prototypes:: Prototype declarations and old-style definitions.
+* #include "string"::   Include directives can have arbitrary string literals.
 * C++ Comments::        C++ comments are recognized.
 * Dollar Signs::        Dollar sign is allowed in identifiers.
 * Character Escapes::   @samp{\e} stands for the character @key{ESC}.
@@ -96,6 +97,7 @@
 * Function Attributes:: Declaring that functions have no side effects,
                          or that they can never return.
 * Function Prototypes:: Prototype declarations and old-style definitions.
+* #include "string"::   Include directives can have arbitrary string literals.
 * C++ Comments::        C++ comments are recognized.
 * Dollar Signs::        Dollar sign is allowed in identifiers.
 * Character Escapes::   @samp{\e} stands for the character @key{ESC}.
@@ -1760,6 +1762,30 @@
 GNU C++ does not support old-style function definitions, so this
 extension is irrelevant.
 
+@node #include "string"
+@section #include "string"
+@cindex #include "string"
+@cindex include directives and string literals
+@cindex string literals in include directives
+
+In GNU C, @samp{#include} directives can contain arbitrary string
+literals.  Any backslashes within @var{file} are treated as string
+escape characters.  Thus, @samp{#include "\n\\\""} specifies a file name
+containing a newline, a backslash, and a quote.  This extends ANSI C,
+which prohibits quotes in @var{file}, and which says that backslashes
+have implementation-defined behavior.
+
+Also, in GNU C, @samp{#include} directives can use a concatenation of
+string literals.  For example, @samp{#include "foo" ".h"} is equivalent
+to @samp{#include "foo.h"}.
+
+Some non-GNU hosts treat @samp{\} as a file name directory separator.
+On such hosts you should normally use @samp{/} instead of @samp{\},
+since @samp{/} also works and is more portable to GNU hosts.  However,
+if for some reason you cannot fix the backslashes in directives like
+@samp{#include "dir\foo.h"}, you can use the @samp{-fbs-filenames}
+option instead (@pxref{C Dialect Options}).
+
 @node C++ Comments
 @section C++ Style Comments
 @cindex //
===================================================================
RCS file: gcc.c,v
retrieving revision 2.95
retrieving revision 2.95.1.1
diff -u -r2.95 -r2.95.1.1
--- gcc.c	1999/08/05 08:44:13	2.95
+++ gcc.c	1999/08/24 00:05:53	2.95.1.1
@@ -605,6 +605,7 @@
 	%{ffast-math:-D__FAST_MATH__}\
         %{traditional} %{ftraditional:-traditional}\
         %{traditional-cpp:-traditional}\
+	%{fbs-filenames} %{fno-bs-filenames}\
 	%{fleading-underscore} %{fno-leading-underscore}\
 	%{g*} %{W*} %{w} %{pedantic*} %{H} %{d*} %C %{D*} %{U*} %{i*} %Z\
         %i %{E:%W{o*}}%{M:%W{o*}}%{MM:%W{o*}}\n}\
@@ -641,6 +642,7 @@
 	%{ffast-math:-D__FAST_MATH__}\
         %{traditional} %{ftraditional:-traditional}\
         %{traditional-cpp:-traditional}\
+	%{fbs-filenames} %{fno-bs-filenames}\
 	%{fleading-underscore} %{fno-leading-underscore}\
 	%{g*} %{W*} %{w} %{pedantic*} %{H} %{d*} %C %{D*} %{U*} %{i*} %Z\
         %i %{!M:%{!MM:%{!E:%{!pipe:%g.i}}}}%{E:%W{o*}}%{M:%W{o*}}%{MM:%W{o*}} |\n",
@@ -669,6 +671,7 @@
 	%{ffast-math:-D__FAST_MATH__}\
         %{traditional} %{ftraditional:-traditional}\
         %{traditional-cpp:-traditional}\
+	%{fbs-filenames} %{fno-bs-filenames}\
 	%{fleading-underscore} %{fno-leading-underscore}\
 	%{g*} %{W*} %{w} %{pedantic*} %{H} %{d*} %C %{D*} %{U*} %{i*} %Z\
         %i %W{o*}}\
@@ -686,6 +689,7 @@
 	%{ffast-math:-D__FAST_MATH__}\
         %{traditional} %{ftraditional:-traditional}\
         %{traditional-cpp:-traditional}\
+	%{fbs-filenames} %{fno-bs-filenames}\
 	%{fleading-underscore} %{fno-leading-underscore}\
 	%{g*} %{W*} %{w} %{pedantic*} %{H} %{d*} %C %{D*} %{U*} %{i*} %Z\
         %i %W{o*}"}},
@@ -715,6 +719,7 @@
 	%{ffast-math:-D__FAST_MATH__}\
         %{traditional} %{ftraditional:-traditional}\
         %{traditional-cpp:-traditional}\
+	%{fbs-filenames} %{fno-bs-filenames}\
 	%{fleading-underscore} %{fno-leading-underscore}\
 	%{g*} %{W*} %{w} %{pedantic*} %{H} %{d*} %C %{D*} %{U*} %{i*} %Z\
         %i %{!M:%{!MM:%{!E:%{!pipe:%g.s}}}}%{E:%W{o*}}%{M:%W{o*}}%{MM:%W{o*}} |\n",
===================================================================
RCS file: invoke.texi,v
retrieving revision 2.95
retrieving revision 2.95.1.1
diff -u -r2.95 -r2.95.1.1
--- invoke.texi	1999/08/11 06:50:31	2.95
+++ invoke.texi	1999/08/24 00:05:53	2.95.1.1
@@ -93,7 +93,8 @@
 @item C Language Options
 @xref{C Dialect Options,,Options Controlling C Dialect}.
 @smallexample
--ansi -flang-isoc9x -fallow-single-precision  -fcond-mismatch  -fno-asm
+-ansi  -flang-isoc9x  -fallow-single-precision
+-fno-asm  -fbs-filenames  -fcond-mismatch
 -fno-builtin  -ffreestanding  -fhosted  -fsigned-bitfields  -fsigned-char
 -funsigned-bitfields  -funsigned-char  -fwritable-strings
 -traditional  -traditional-cpp  -trigraphs
@@ -851,6 +852,11 @@
 string constants can contain the newline character as typed.)
 @end itemize
 
+@item -fbs-filenames
+In the file name string literals of @samp{#include} and @samp{#line}
+directives, treat @samp{\} as an ordinary character instead of as the
+initial character of an escape sequence.
+
 @item -fcond-mismatch
 Allow conditional expressions with mismatched types in the second and
 third arguments.  The value of such an expression is void.
