Fix handling of incomplete header names (committed)

Joseph S. Myers joseph@codesourcery.com
Sun Feb 22 01:09:00 GMT 2009


I posted essentially this patch five years ago
<http://article.gmane.org/gmane.comp.gcc.patches/54558> to fix a
regression in a corner case of lexing (see that message for more
details).

I still think this is a bug in GCC, not in the standard: when the
committee later had the opportunity to address the lexing rules in
DR#324, it did not change the greedy algorithm.  As the C and C++
front-end maintainers are now also preprocessor maintainers, and no
current preprocessor maintainer objected in the previous discussion, I
have now committed this updated patch.  Bootstrapped with no
regressions on i686-pc-linux-gnu.
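
For readers unfamiliar with the corner case, the fallback can be
illustrated with a minimal standalone sketch.  This is not the libcpp
code: the names tok_kind, TOK_HEADER_NAME, TOK_LESS and lex_angle are
hypothetical and exist only for this example.  It assumes the lexer is
positioned at a '<' in an #include context and shows only the decision
between "header name" and "plain '<' token".

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical token kinds for this sketch; not libcpp's enum.  */
enum tok_kind { TOK_HEADER_NAME, TOK_LESS };

/* S points at a '<'.  If a '>' appears before the end of the line,
   the whole <...> spelling is a header name and *LEN is its length
   including both brackets.  Otherwise fall back to a lone '<' token
   (*LEN is 1) so the rest of the line is lexed as ordinary tokens,
   which may then be macro expanded.  */
static enum tok_kind
lex_angle (const char *s, size_t *len)
{
  size_t i = 1;

  while (s[i] != '\0' && s[i] != '\n')
    {
      if (s[i] == '>')
        {
          *len = i + 1;
          return TOK_HEADER_NAME;
        }
      i++;
    }

  /* No '>' on this line: not a header name after all.  */
  *len = 1;
  return TOK_LESS;
}

/* Small usage check for the two cases.  */
static void
check_examples (void)
{
  size_t n;

  assert (lex_angle ("<stddef.h>\n", &n) == TOK_HEADER_NAME && n == 10);
  assert (lex_angle ("<stddef.h\n", &n) == TOK_LESS && n == 1);
}
```

In the second case the caller would go on to lex "stddef", ".", and
"h" as normal tokens, which is what allows a macro such as
"#define h h>" to supply the closing bracket.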

libcpp:
2009-02-21  Joseph Myers  <joseph@codesourcery.com>

	* lex.c (lex_string): Return a CPP_LESS token for missing '>' in a
	header name.
	(_cpp_lex_direct): Handle this.

gcc/testsuite:
2009-02-21  Joseph Myers  <joseph@codesourcery.com>

	* gcc.dg/cpp/include4.c: New test.

Index: gcc/testsuite/gcc.dg/cpp/include4.c
===================================================================
--- gcc/testsuite/gcc.dg/cpp/include4.c	(revision 0)
+++ gcc/testsuite/gcc.dg/cpp/include4.c	(revision 0)
@@ -0,0 +1,14 @@
+/* Preprocessing tokens are always formed according to a greedy algorithm,
+   so "#include <stddef.h" must be interpreted as a sequence of tokens,
+   of which the "h" then gets macro expanded.  Likewise the other
+   examples.  */
+
+#define h h>
+#include <stddef.h
+#undef h
+
+#define foo stddef.h>
+#include <foo
+
+#include <foo /*
+> */
Index: libcpp/lex.c
===================================================================
--- libcpp/lex.c	(revision 144344)
+++ libcpp/lex.c	(working copy)
@@ -613,7 +613,9 @@ create_literal (cpp_reader *pfile, cpp_t
 /* Lexes a string, character constant, or angle-bracketed header file
    name.  The stored string contains the spelling, including opening
    quote and leading any leading 'L', 'u' or 'U'.  It returns the type
-   of the literal, or CPP_OTHER if it was not properly terminated.
+   of the literal, or CPP_OTHER if it was not properly terminated, or
+   CPP_LESS for an unterminated header name which must be relexed as
+   normal tokens.
 
    The spelling is NUL-terminated, but it is not guaranteed that this
    is the first NUL since embedded NULs are preserved.  */
@@ -652,6 +654,14 @@ lex_string (cpp_reader *pfile, cpp_token
       else if (c == '\n')
 	{
 	  cur--;
+	  /* Unmatched quotes always yield undefined behavior, but
+	     greedy lexing means that what appears to be an unterminated
+	     header name may actually be a legitimate sequence of tokens.  */
+	  if (terminator == '>')
+	    {
+	      token->type = CPP_LESS;
+	      return;
+	    }
 	  type = CPP_OTHER;
 	  break;
 	}
@@ -1181,7 +1191,8 @@ _cpp_lex_direct (cpp_reader *pfile)
       if (pfile->state.angled_headers)
 	{
 	  lex_string (pfile, result, buffer->cur - 1);
-	  break;
+	  if (result->type != CPP_LESS)
+	    break;
 	}
 
       result->type = CPP_LESS;

-- 
Joseph S. Myers
joseph@codesourcery.com


