This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Patch to fix handling of incomplete header names


The C standard specifies that a source file is lexed into preprocessing
tokens greedily; there is no such thing as a partial preprocessing
token.  An unmatched ' or " yields undefined behavior (the standard
specifies this for a lone ' or " that ends up as a token of its own),
so there is no need to implement lexing that makes ' or " a single
token.  There is no such laxity for a header name <... with a missing
>: it must be lexed as a sequence of ordinary tokens.  Zack said in a
previous discussion on comp.std.c
<http://groups.google.com/groups?selm=9csqt9%24omd%241%40nntp.Stanford.EDU>
that "major architectural changes" would be needed for this to work.
That no longer seems to be the case; the following simple patch
implements the required semantics.
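
As a rough illustration of the fallback (a standalone sketch, not the
code in cpplex.c: the names sketch_kind, SK_HEADER_NAME, SK_LESS and
lex_angled are made up for this example, and it ignores comments,
line splicing and everything else the real lexer handles), the rule
is: scan for '>' up to the end of the line, and if none is found,
give back just the '<' so that the rest is lexed as ordinary tokens.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical token kinds for this sketch; these are not GCC's
   cpp_ttype values.  */
enum sketch_kind { SK_HEADER_NAME, SK_LESS };

/* Lex the text at LINE, which must start with '<', in a context
   where a header name is acceptable.  If a '>' is found before the
   end of the line, the token is the whole <...> header name.
   Otherwise the token is only the '<', and the caller is expected
   to relex what follows as ordinary tokens.  The length of the
   token is stored in *LEN.  */
static enum sketch_kind
lex_angled (const char *line, size_t *len)
{
  const char *p;

  assert (line[0] == '<');

  for (p = line + 1; *p != '\0' && *p != '\n'; p++)
    if (*p == '>')
      {
        /* Properly terminated: the token includes both brackets.  */
        *len = (size_t) (p - line) + 1;
        return SK_HEADER_NAME;
      }

  /* No '>' on this line: only the '<' is consumed.  */
  *len = 1;
  return SK_LESS;
}
```

For "<stddef.h>" this sketch yields a header name of length 10; for
"<stddef.h" it yields a one-character '<' token, after which
"stddef", "." and "h" would be lexed normally, so that "h" is open to
macro expansion as in the testcase.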

Bootstrapped with no regressions on i686-pc-linux-gnu.  OK to commit
to mainline and 3.4 branch (first two cases in the testcase are
regressions from 2.95)?

2004-02-05  Joseph S. Myers  <jsm@polyomino.org.uk>

	* cpplex.c (lex_string): Return a CPP_LESS token for missing '>'
	in a header name.
	(_cpp_lex_direct): Handle this.

testsuite:
2004-02-05  Joseph S. Myers  <jsm@polyomino.org.uk>

	* gcc.dg/cpp/include4.c: New test.

--- GCC/gcc/cpplex.c.orig	2003-11-02 10:06:48.000000000 +0000
+++ GCC/gcc/cpplex.c	2004-02-05 00:46:44.000000000 +0000
@@ -549,7 +549,8 @@ create_literal (cpp_reader *pfile, cpp_t
 /* Lexes a string, character constant, or angle-bracketed header file
    name.  The stored string contains the spelling, including opening
    quote and leading any leading 'L'.  It returns the type of the
-   literal, or CPP_OTHER if it was not properly terminated.
+   literal, or CPP_OTHER if it was not properly terminated, or CPP_LESS
+   for an unterminated header name which must be relexed as normal tokens.
 
    The spelling is NUL-terminated, but it is not guaranteed that this
    is the first NUL since embedded NULs are preserved.  */
@@ -584,6 +585,14 @@ lex_string (cpp_reader *pfile, cpp_token
       else if (c == '\n')
 	{
 	  cur--;
+	  /* Unmatched quotes always yield undefined behavior, but
+	     greedy lexing means that what appears to be an unterminated
+	     header name may actually be a legitimate sequence of tokens.  */
+	  if (terminator == '>')
+	    {
+	      token->type = CPP_LESS;
+	      return;
+	    }
 	  type = CPP_OTHER;
 	  break;
 	}
@@ -959,7 +968,8 @@ _cpp_lex_direct (cpp_reader *pfile)
       if (pfile->state.angled_headers)
 	{
 	  lex_string (pfile, result, buffer->cur - 1);
-	  break;
+	  if (result->type != CPP_LESS)
+	    break;
 	}
 
       result->type = CPP_LESS;
--- GCC/gcc/testsuite/gcc.dg/cpp/include4.c	2002-08-26 16:21:36.000000000 +0000
+++ GCC/gcc/testsuite/gcc.dg/cpp/include4.c	2004-02-05 01:03:02.000000000 +0000
@@ -0,0 +1,14 @@
+/* There is no such thing as an incomplete preprocessing token,
+   so "#include <stddef.h" must be interpreted as a sequence of tokens,
+   of which the "h" then gets macro expanded.  Likewise the other
+   examples.  */
+
+#define h h>
+#include <stddef.h
+#undef h
+
+#define foo stddef.h>
+#include <foo
+
+#include <foo /*
+> */

-- 
Joseph S. Myers
jsm@polyomino.org.uk

