This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
gcc 2.95.1 fixes for backslashes in #line, #include
- To: gcc-patches@gcc.gnu.org
- Subject: gcc 2.95.1 fixes for backslashes in #line, #include
- From: Paul Eggert <eggert@twinsun.com>
- Date: Mon, 23 Aug 1999 18:34:57 -0700 (PDT)
- CC: Mike Stump <mrs@wrs.com>, Peter Seebach <seebs@plethora.net>, Zack Weinberg <zack@rabi.columbia.edu>
The following recent change to GCC's preprocessor broke ANSI C conformance:
Mon Dec 7 17:55:06 1998 Mike Stump <mrs@wrs.com>
* cccp.c (ignore_escape_flag): Add support for \ as `natural'
characters in file names in #line to be consistent
with #include handling....
I realize that the change was made for the benefit of people compiling
for Microsoft systems, but the C standard requires that the file names
in #line directives be parsed as C strings. For example, the following
program must output a single " character followed by a newline, but
GCC 2.95.1 incorrectly rejects this program:
#line 1 "\""
main () { puts (__FILE__); }
Also, the old behavior was more useful, since it made it easier to
write programs that generate C programs from arbitrarily-named files.
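As a rough illustration of the conforming behavior, here is a small
self-checking variant of the program above. The function name
presumed_file_name is my own invention for this sketch and is not part
of GCC; the sketch assumes a preprocessor that parses the #line operand
as a C string.

```c
#include <string.h>

/* The #line operand below is the string literal "\"", so a conforming
   preprocessor should treat the presumed file name as the single
   character ".  */
#line 1 "\""

/* Hypothetical helper for this sketch: report the presumed file name
   seen at this point in the translation unit.  */
static const char *
presumed_file_name (void)
{
  return __FILE__;
}
```

A caller can compare the result against "\"" with strcmp. A
preprocessor that treats the backslash as an ordinary character sees
the string end at the second quote mark and then finds a stray third
one, which is why it rejects the directive.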
The patch below fixes this problem for cccp, and also fixes several
related problems as noted below. It changes the default behavior to
be ANSI-compliant, as with previous versions of GCC; it also adds a
new option -fbs-filenames for people who want the Microsoft-like
behavior.
I haven't fixed cpplib, though I note that it already parses #include
and #line filenames as plain strings; I'll CC this message to Zack
Weinberg so that he can look at the issue.
Here are the other, related problems fixed by this patch.
* `gcc -ansi -pedantic' does not emit a diagnostic for the line
#include "stdio" ".h"
as the C standard requires.
* Peter Seebach, a member of the ANSI C committee, recently reported
that there's an error in the GCC documentation. When talking about
GCC's behavior with `#include "foo\bar"', the GCC manual says
None of the character escape sequences appropriate to string
constants in C are processed. Thus, @samp{#include "x\n\\y"}
specifies a filename containing three backslashes. It is not
clear why this behavior is ever useful, but the ANSI standard
specifies it.
The last quoted sentence is incorrect, because the ANSI standard says
only that the behavior of backslashes in file names is defined by
the implementation.
For consistency and utility, I suggest that GCC instead treat
special characters in `#include "file"' just as it treats them in
`#line "file"'. I suspect that this was the preferred behavior when
cpp was originally written, but that cpp's authors mistakenly
thought that the C standard prohibited it. This is a strict
extension to the C standard except for one small detail: the ANSI
grammar requires a diagnostic for `#include "...\"..."'. So the
patch below emits a diagnostic for that particular case (but only if
the -pedantic option is enabled).
I realize that this change to `#include "..."' semantics may
discomfit people who use Microsoft file names. But they can work around
any problems with -fbs-filenames, and anyway we should be
encouraging people to use portable file names that are compatible
with the ANSI standard, instead of Microsoft file names that are
incompatible with the ANSI standard.
(Also, this change simplifies the code, so it must be right. :-)
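To make the two interpretations concrete, here is a minimal sketch of
the file name scan, written independently of cccp.c. The function
decode_file_name and its parameters are hypothetical names used only
for this illustration; the real patch calls parse_escape and handles
every C escape, while this sketch handles only a few.

```c
#include <stddef.h>

/* Hypothetical helper, not taken from cccp.c.  IN points just past the
   opening '"' of a file name literal.  Copy the name into OUT (at most
   OUT_SIZE bytes including the terminating null) and return a pointer
   just past the closing '"', or NULL on error.

   If BS_FILENAMES is nonzero, '\\' is an ordinary character and the
   first '"' always ends the name.  Otherwise a small subset of C
   escapes is decoded; any other escape is rejected by this sketch.  */
static const char *
decode_file_name (const char *in, int bs_filenames,
                  char *out, size_t out_size)
{
  size_t n = 0;

  if (out_size == 0)
    return NULL;

  while (*in != '\0' && *in != '"')
    {
      char c = *in++;

      if (c == '\\' && !bs_filenames)
        {
          switch (*in)
            {
            case 'n':  c = '\n'; break;
            case 't':  c = '\t'; break;
            case 'f':  c = '\f'; break;
            case '\\': c = '\\'; break;
            case '"':  c = '"';  break;
            default:   return NULL;  /* escape not covered here */
            }
          in++;
        }

      if (n + 1 >= out_size)
        return NULL;                 /* name does not fit in OUT */
      out[n++] = c;
    }

  if (*in != '"')
    return NULL;                     /* unterminated literal */
  out[n] = '\0';
  return in + 1;
}
```

Given the directive text x\n\\y" this yields the four characters x,
newline, backslash, y by default, and the six characters x\n\\y
(three backslashes) when bs_filenames is set, which is the behavior
the old manual text described.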
1999-08-23 Paul Eggert <eggert@twinsun.com>
Handle `#line 1 "foo\bar"' in conformance to the C Standard,
by parsing escape sequences in #line file names. Also, change
`#include "foo\bar"' so that it's consistent with #line
"foo\bar". Add a new option -fbs-filenames that disables
backslash processing in include directive file names, for
people who prefer Microsoft filename semantics.
* extend.texi (#include "string"): New section.
* cpp.texi, invoke.texi: New option -fbs-filenames;
include directive file names are now string literals by default.
* gcc.c (default_compilers): Add -fbs-filenames.
* cccp.c (bs_filenames): Renamed from ignore_escape_flag.
Now defaults to zero.
(print_help, main): New option -fbs-filenames.
(handle_directive): Do not treat strings specially in #include
and #line directives.
(do_include): If !bs_filenames,
convert escapes to the represented chars.
But warn about \" if pedantic, since ANSI C requires this.
Warn about concatenated string literals in #include if pedantic.
(skip_if_group): Do not treat strings specially in #include
directives.
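The concatenation case in do_include can be pictured with the
following sketch. It is not the cccp.c code: join_include_literals is
a hypothetical name, escape processing is left out to keep it short,
and the caller is assumed to issue the pedantic warning when more than
one literal was seen.

```c
#include <stddef.h>

/* Hypothetical helper, not taken from cccp.c.  IN is the operand of an
   #include directive, e.g. "stdio" ".h" (quotes included).  Join the
   adjacent literals into OUT and return how many literals were seen,
   or 0 if the operand is malformed or does not fit in OUT_SIZE bytes.
   No escape processing is done in this sketch.  */
static int
join_include_literals (const char *in, char *out, size_t out_size)
{
  size_t n = 0;
  int pieces = 0;

  if (out_size == 0)
    return 0;

  for (;;)
    {
      /* Skip just horizontal space between literals.  */
      while (*in == ' ' || *in == '\t')
        in++;
      if (*in != '"')
        break;
      in++;                          /* opening quote */

      while (*in != '"')
        {
          if (*in == '\0' || n + 1 >= out_size)
            return 0;                /* unterminated or too long */
          out[n++] = *in++;
        }
      in++;                          /* closing quote */
      pieces++;
    }

  if (*in != '\0')
    return 0;                        /* something other than a string */
  out[n] = '\0';
  return pieces;
}
```

A return value greater than 1 is the situation in which ANSI C
requires a diagnostic, so a caller in the spirit of this patch would
warn only when that happens and pedantic is set.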
===================================================================
RCS file: cccp.c,v
retrieving revision 2.95
retrieving revision 2.95.1.1
diff -u -r2.95 -r2.95.1.1
--- cccp.c 1999/06/01 17:10:01 2.95
+++ cccp.c 1999/08/24 00:55:55 2.95.1.1
@@ -834,10 +834,11 @@
/* Name of output file, for error messages. */
static char *out_fname;
-/* Nonzero to ignore \ in string constants. Use to treat #line 1 "A:\file.h
- as a non-form feed. If you want it to be a form feed, you must use
- # 1 "\f". */
-static int ignore_escape_flag = 1;
+/* Nonzero to disable escapes in user directive file name literals.
+ Use this to treat #line 1 "A:\file.h" as a non-form feed.
+ This is incompatible with the C standard, so it is off by default.
+ The (non-user) directive `# 1 "\f"' is unaffected by this option. */
+static int bs_filenames;
/* Stack of conditionals currently in progress
(including both successful and failing conditionals). */
@@ -1136,6 +1137,7 @@
printf (" -pedantic Issue all warnings demanded by strict ANSI C\n");
printf (" -traditional Follow K&R pre-processor behaviour\n");
printf (" -trigraphs Support ANSI C trigraphs\n");
+ printf (" -fbs-filenames \\ is ordinary in #include \"...\"\n");
printf (" -lang-c Assume that the input sources are in C\n");
printf (" -lang-c89 Assume that the input is C89; depricated\n");
printf (" -lang-c++ Assume that the input sources are in C++\n");
@@ -1536,6 +1538,10 @@
user_label_prefix = "_";
else if (!strcmp (argv[i], "-fno-leading-underscore"))
user_label_prefix = "";
+ else if (!strcmp (argv[i], "-fbs-filenames"))
+ bs_filenames = 1;
+ else if (!strcmp (argv[i], "-fno-bs-filenames"))
+ bs_filenames = 0;
break;
case 'M':
@@ -3669,8 +3675,6 @@
/* Record where the directive started. do_xifdef needs this. */
directive_start = bp - 1;
- ignore_escape_flag = 1;
-
/* Skip whitespace and \-newline. */
while (1) {
if (is_hor_space[*bp]) {
@@ -3733,7 +3737,6 @@
pedwarn ("`#' followed by integer");
after_ident = ident;
kt = line_directive_table;
- ignore_escape_flag = 0;
goto old_linenum;
}
@@ -3801,23 +3804,6 @@
break;
case '"':
- /* "..." is special for #include. */
- if (IS_INCLUDE_DIRECTIVE_TYPE (kt->type)) {
- while (bp < limit && *bp != '\n') {
- if (*bp == '"') {
- bp++;
- break;
- }
- if (*bp == '\\' && bp[1] == '\n') {
- ip->lineno++;
- copy_directive = 1;
- bp++;
- }
- bp++;
- }
- break;
- }
- /* Fall through. */
case '\'':
bp = skip_quoted_string (bp - 1, limit, ip->lineno, &ip->lineno, &copy_directive, &unterminated);
/* Don't bother calling the directive if we already got an error
@@ -4378,6 +4364,13 @@
*fend = *fin++;
if (*fend == '"')
break;
+ if (*fend == '\\' && !bs_filenames) {
+ char *finc = (char *) fin - 1;
+ if (*fin == '\"' && pedantic)
+ pedwarn ("file name contains \"");
+ *fend = parse_escape (&finc, (HOST_WIDEST_INT) (U_CHAR) -1);
+ fin = (U_CHAR *) finc;
+ }
fend++;
}
if (fin == limit)
@@ -4385,10 +4378,11 @@
/* If not at the end, there had better be another string. */
/* Skip just horiz space, and don't go past limit. */
while (fin != limit && is_hor_space[*fin]) fin++;
- if (fin != limit && *fin == '\"')
- fin++;
- else
+ if (fin == limit || *fin != '\"')
goto fail;
+ if (pedantic)
+ pedwarn ("concatenated string literals in #include");
+ fin++;
}
}
@@ -6853,7 +6847,7 @@
return 0;
case '\\':
- if (! ignore_escape_flag)
+ if (! bs_filenames)
{
char *bpc = (char *) bp;
HOST_WIDEST_INT c = parse_escape (&bpc, (HOST_WIDEST_INT) (U_CHAR) (-1));
@@ -7439,21 +7433,6 @@
}
break;
case '\"':
- if (skipping_include_directive) {
- while (bp < endb && *bp != '\n') {
- if (*bp == '"') {
- bp++;
- break;
- }
- if (*bp == '\\' && bp[1] == '\n') {
- ip->lineno++;
- bp++;
- }
- bp++;
- }
- break;
- }
- /* Fall through. */
case '\'':
bp = skip_quoted_string (bp - 1, endb, ip->lineno, &ip->lineno,
NULL_PTR, NULL_PTR);
===================================================================
RCS file: cpp.texi,v
retrieving revision 2.95
retrieving revision 2.95.1.1
diff -u -r2.95 -r2.95.1.1
--- cpp.texi 1999/05/17 23:37:18 2.95
+++ cpp.texi 1999/08/24 00:05:53 2.95.1.1
@@ -333,13 +333,17 @@
that the current input file refers to. (If the @samp{-I-} option is
used, the special treatment of the current directory is inhibited.)
-The argument @var{file} may not contain @samp{"} characters. If
-backslashes occur within @var{file}, they are considered ordinary text
-characters, not escape characters. None of the character escape
-sequences appropriate to string constants in C are processed. Thus,
-@samp{#include "x\n\\y"} specifies a filename containing three
-backslashes. It is not clear why this behavior is ever useful, but
-the ANSI standard specifies it.
+The argument @var{file} is a string literal. Any backslashes within
+@var{file} are treated as normal string escape characters. Thus,
+@samp{#include "\n\\\""} specifies a file name containing a newline, a
+backslash, and a quote. This is an extension to the ANSI standard,
+which prohibits quotes in @var{file}, and which says that
+backslashes have implementation-defined behavior. @xref{Invocation},
+for how to disable escape processing in @var{file}.
+
+An @samp{#include} directive can use a concatenation of string literals.
+For example, @samp{#include "foo" ".h"} is equivalent to @samp{#include
+"foo.h"}. This is also an extension to ANSI C.
@item #include @var{anything else}
@cindex computed @samp{#include}
@@ -2422,7 +2426,8 @@
came originally from source file @var{filename} and its line number there
was @var{linenum}. Keep in mind that @var{filename} is not just a
file name; it is surrounded by doublequote characters so that it looks
-like a string constant.
+like a string constant. @xref{Invocation}, for how to disable escape
+processing in @var{filename}.
@item #line @var{anything else}
@var{anything else} is checked for macro calls, which are expanded.
@@ -2937,6 +2942,12 @@
The 199x C standard plus GNU extensions.
@end table
+@item -fbs-filenames
+@findex -fbs-filenames
+In the file name string literals of @samp{#include} and @samp{#line}
+directives, treat @samp{\} as an ordinary character instead of as the
+initial character of an escape sequence.
+
@item -Wp,-lint
@findex -lint
Look for commands to the program checker @code{lint} embedded in
===================================================================
RCS file: extend.texi,v
retrieving revision 2.95
retrieving revision 2.95.1.1
diff -u -r2.95 -r2.95.1.1
--- extend.texi 1999/04/14 05:34:34 2.95
+++ extend.texi 1999/08/24 00:05:53 2.95.1.1
@@ -48,6 +48,7 @@
* Function Attributes:: Declaring that functions have no side effects,
or that they can never return.
* Function Prototypes:: Prototype declarations and old-style definitions.
+* #include "string":: Include directives can have arbitrary string literals.
* C++ Comments:: C++ comments are recognized.
* Dollar Signs:: Dollar sign is allowed in identifiers.
* Character Escapes:: @samp{\e} stands for the character @key{ESC}.
@@ -96,6 +97,7 @@
* Function Attributes:: Declaring that functions have no side effects,
or that they can never return.
* Function Prototypes:: Prototype declarations and old-style definitions.
+* #include "string":: Include directives can have arbitrary string literals.
* C++ Comments:: C++ comments are recognized.
* Dollar Signs:: Dollar sign is allowed in identifiers.
* Character Escapes:: @samp{\e} stands for the character @key{ESC}.
@@ -1760,6 +1762,30 @@
GNU C++ does not support old-style function definitions, so this
extension is irrelevant.
+@node #include "string"
+@section #include "string"
+@cindex #include "string"
+@cindex include directives and string literals
+@cindex string literals in include directives
+
+In GNU C, @samp{#include} directives can contain arbitrary string
+literals. Any backslashes within @var{file} are treated as string
+escape characters. Thus, @samp{#include "\n\\\""} specifies a file name
+containing a newline, a backslash, and a quote. This extends ANSI C,
+which prohibits quotes in @var{file}, and which says that backslashes
+have implementation-defined behavior.
+
+Also, in GNU C, @samp{#include} directives can use a concatenation of
+string literals. For example, @samp{#include "foo" ".h"} is equivalent
+to @samp{#include "foo.h"}.
+
+Some non-GNU hosts treat @samp{\} as a file name directory separator.
+On such hosts you should normally use @samp{/} instead of @samp{\},
+since @samp{/} also works and is more portable to GNU hosts. However,
+if for some reason you cannot fix the backslashes in directives like
+@samp{#include "dir\foo.h"}, you can use the @samp{-fbs-filenames}
+option instead (@pxref{C Dialect Options}).
+
@node C++ Comments
@section C++ Style Comments
@cindex //
===================================================================
RCS file: gcc.c,v
retrieving revision 2.95
retrieving revision 2.95.1.1
diff -u -r2.95 -r2.95.1.1
--- gcc.c 1999/08/05 08:44:13 2.95
+++ gcc.c 1999/08/24 00:05:53 2.95.1.1
@@ -605,6 +605,7 @@
%{ffast-math:-D__FAST_MATH__}\
%{traditional} %{ftraditional:-traditional}\
%{traditional-cpp:-traditional}\
+ %{fbs-filenames} %{fno-bs-filenames}\
%{fleading-underscore} %{fno-leading-underscore}\
%{g*} %{W*} %{w} %{pedantic*} %{H} %{d*} %C %{D*} %{U*} %{i*} %Z\
%i %{E:%W{o*}}%{M:%W{o*}}%{MM:%W{o*}}\n}\
@@ -641,6 +642,7 @@
%{ffast-math:-D__FAST_MATH__}\
%{traditional} %{ftraditional:-traditional}\
%{traditional-cpp:-traditional}\
+ %{fbs-filenames} %{fno-bs-filenames}\
%{fleading-underscore} %{fno-leading-underscore}\
%{g*} %{W*} %{w} %{pedantic*} %{H} %{d*} %C %{D*} %{U*} %{i*} %Z\
%i %{!M:%{!MM:%{!E:%{!pipe:%g.i}}}}%{E:%W{o*}}%{M:%W{o*}}%{MM:%W{o*}} |\n",
@@ -669,6 +671,7 @@
%{ffast-math:-D__FAST_MATH__}\
%{traditional} %{ftraditional:-traditional}\
%{traditional-cpp:-traditional}\
+ %{fbs-filenames} %{fno-bs-filenames}\
%{fleading-underscore} %{fno-leading-underscore}\
%{g*} %{W*} %{w} %{pedantic*} %{H} %{d*} %C %{D*} %{U*} %{i*} %Z\
%i %W{o*}}\
@@ -686,6 +689,7 @@
%{ffast-math:-D__FAST_MATH__}\
%{traditional} %{ftraditional:-traditional}\
%{traditional-cpp:-traditional}\
+ %{fbs-filenames} %{fno-bs-filenames}\
%{fleading-underscore} %{fno-leading-underscore}\
%{g*} %{W*} %{w} %{pedantic*} %{H} %{d*} %C %{D*} %{U*} %{i*} %Z\
%i %W{o*}"}},
@@ -715,6 +719,7 @@
%{ffast-math:-D__FAST_MATH__}\
%{traditional} %{ftraditional:-traditional}\
%{traditional-cpp:-traditional}\
+ %{fbs-filenames} %{fno-bs-filenames}\
%{fleading-underscore} %{fno-leading-underscore}\
%{g*} %{W*} %{w} %{pedantic*} %{H} %{d*} %C %{D*} %{U*} %{i*} %Z\
%i %{!M:%{!MM:%{!E:%{!pipe:%g.s}}}}%{E:%W{o*}}%{M:%W{o*}}%{MM:%W{o*}} |\n",
===================================================================
RCS file: invoke.texi,v
retrieving revision 2.95
retrieving revision 2.95.1.1
diff -u -r2.95 -r2.95.1.1
--- invoke.texi 1999/08/11 06:50:31 2.95
+++ invoke.texi 1999/08/24 00:05:53 2.95.1.1
@@ -93,7 +93,8 @@
@item C Language Options
@xref{C Dialect Options,,Options Controlling C Dialect}.
@smallexample
--ansi -flang-isoc9x -fallow-single-precision -fcond-mismatch -fno-asm
+-ansi -flang-isoc9x -fallow-single-precision
+-fno-asm -fbs-filenames -fcond-mismatch
-fno-builtin -ffreestanding -fhosted -fsigned-bitfields -fsigned-char
-funsigned-bitfields -funsigned-char -fwritable-strings
-traditional -traditional-cpp -trigraphs
@@ -851,6 +852,11 @@
string constants can contain the newline character as typed.)
@end itemize
+@item -fbs-filenames
+In the file name string literals of @samp{#include} and @samp{#line}
+directives, treat @samp{\} as an ordinary character instead of as the
+initial character of an escape sequence.
+
@item -fcond-mismatch
Allow conditional expressions with mismatched types in the second and
third arguments. The value of such an expression is void.