This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Patch to fix handling of incomplete header names
- From: "Joseph S. Myers" <jsm at polyomino dot org dot uk>
- To: gcc-patches at gcc dot gnu dot org
- Date: Thu, 5 Feb 2004 20:42:42 +0000 (UTC)
- Subject: Patch to fix handling of incomplete header names
The C standard defines how a source file is lexed into preprocessing
tokens by a greedy algorithm; there is no such thing as a partial
preprocessing token. An unmatched ' or " yields undefined behavior
(the standard specifies this for a ' or " character that ends up as a
single-character token), so there is no need to actually lex such a
character as a token of its own. There is no such laxity for header
names: a <... with no closing > on the line is not a header name and
must be lexed as ordinary tokens. Zack said in a previous discussion
on comp.std.c
<http://groups.google.com/groups?selm=9csqt9%24omd%241%40nntp.Stanford.EDU>
that "major architectural changes" would be needed for this to work.
This no longer seems to be the case; the following simple patch
implements the required semantics.
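
For illustration only, the rule can be sketched as a toy C function.
This is not the cpplib code: toy_lex_angle, struct toy_token and the
TOY_* kinds are hypothetical names invented for the sketch, and it
ignores comments, trigraphs and line splicing.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical token kinds for this sketch; not GCC's cpp_ttype.  */
enum toy_type { TOY_HEADER_NAME, TOY_LESS };

struct toy_token
{
  enum toy_type type;
  size_t len;			/* Characters consumed from the input.  */
};

/* Lex the token starting at S[0], which must be '<', in a context
   where a header name is permitted (after #include).  If a '>'
   appears before the end of the line, the whole <...> spelling is
   one header-name token.  Otherwise there is no header name here,
   so only the '<' is consumed and the caller relexes the rest of
   the line as ordinary tokens.  */
static struct toy_token
toy_lex_angle (const char *s)
{
  struct toy_token tok;
  const char *p;

  assert (s != NULL && s[0] == '<');

  for (p = s + 1; *p != '\0' && *p != '\n'; p++)
    if (*p == '>')
      {
	tok.type = TOY_HEADER_NAME;
	tok.len = (size_t) (p - s) + 1;
	return tok;
      }

  tok.type = TOY_LESS;
  tok.len = 1;
  return tok;
}
```

With this sketch, "<stddef.h>" comes back as a single header-name
token, while "<stddef.h" followed by a newline comes back as just "<",
leaving "stddef", ".", and "h" to be lexed (and macro expanded) as
normal tokens.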
Bootstrapped with no regressions on i686-pc-linux-gnu. OK to commit
to mainline and 3.4 branch (first two cases in the testcase are
regressions from 2.95)?
2004-02-05  Joseph S. Myers  <jsm@polyomino.org.uk>

	* cpplex.c (lex_string): Return a CPP_LESS token for missing '>'
	in a header name.
	(_cpp_lex_direct): Handle this.

testsuite:
2004-02-05  Joseph S. Myers  <jsm@polyomino.org.uk>

	* gcc.dg/cpp/include4.c: New test.

--- GCC/gcc/cpplex.c.orig 2003-11-02 10:06:48.000000000 +0000
+++ GCC/gcc/cpplex.c 2004-02-05 00:46:44.000000000 +0000
@@ -549,7 +549,8 @@ create_literal (cpp_reader *pfile, cpp_t
 /* Lexes a string, character constant, or angle-bracketed header file
    name. The stored string contains the spelling, including opening
    quote and leading any leading 'L'. It returns the type of the
-   literal, or CPP_OTHER if it was not properly terminated.
+   literal, or CPP_OTHER if it was not properly terminated, or CPP_LESS
+   for an unterminated header name which must be relexed as normal tokens.
 
    The spelling is NUL-terminated, but it is not guaranteed that this
    is the first NUL since embedded NULs are preserved. */
@@ -584,6 +585,14 @@ lex_string (cpp_reader *pfile, cpp_token
       else if (c == '\n')
         {
           cur--;
+          /* Unmatched quotes always yield undefined behavior, but
+             greedy lexing means that what appears to be an unterminated
+             header name may actually be a legitimate sequence of tokens. */
+          if (terminator == '>')
+            {
+              token->type = CPP_LESS;
+              return;
+            }
           type = CPP_OTHER;
           break;
         }
@@ -959,7 +968,8 @@ _cpp_lex_direct (cpp_reader *pfile)
       if (pfile->state.angled_headers)
         {
           lex_string (pfile, result, buffer->cur - 1);
-          break;
+          if (result->type != CPP_LESS)
+            break;
         }
 
       result->type = CPP_LESS;
--- GCC/gcc/testsuite/gcc.dg/cpp/include4.c 2002-08-26 16:21:36.000000000 +0000
+++ GCC/gcc/testsuite/gcc.dg/cpp/include4.c 2004-02-05 01:03:02.000000000 +0000
@@ -0,0 +1,14 @@
+/* There is no such thing as an incomplete preprocessing token,
+   so "#include <stddef.h" must be interpreted as a sequence of tokens,
+   of which the "h" then gets macro expanded. Likewise the other
+   examples. */
+
+#define h h>
+#include <stddef.h
+#undef h
+
+#define foo stddef.h>
+#include <foo
+
+#include <foo /*
+> */
--
Joseph S. Myers
jsm@polyomino.org.uk