This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Attempt to fix #17964 - need advice wrt diagnostic side-effects


Here is an attempt to fix PR 17964, which reports that, in C++,
cpplib's diagnostics about bad escapes and similar problems in string
constants are tagged with the wrong line.  The fix is conceptually
straightforward but lengthy in execution, and it has consequences on
which I need advice.

The problem is simply that cpp_interpret_string() and its subroutines
have been using the default "current input location" for all their
diagnostics.  In C++, the entire file is scanned before parsing, and
strings are only interpreted during parsing, so cpplib's idea of the
current input location is way off.  We cannot resync cpplib's idea of
the current input location with the C++ front end's, because cpplib
always operates with mapped locations, and if USE_MAPPED_LOCATION is
false in the front end, the conversion from 'source_location' to
'location_t' is irreversible.
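
To make the irreversibility concrete, here is a toy model in C.  It is
only a sketch: the toy_ names and the bit layout are invented for
illustration and are not cpplib's real line-map encoding.  It assumes
that a mapped location packs line and column into one integer, while
the unmapped form keeps only file and line, so two distinct mapped
locations can flatten to the same value.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-ins, not GCC's real types.  */
typedef unsigned int toy_source_location;   /* plays the role of source_location */
typedef struct
{
  const char *file;
  int line;
} toy_location_t;                           /* plays the role of the unmapped location_t */

/* Assumed toy encoding: low 7 bits hold the column, the rest the line.  */
#define TOY_COLUMN_BITS 7

static toy_source_location
toy_make_loc (int line, int column)
{
  return ((unsigned int) line << TOY_COLUMN_BITS) | (unsigned int) column;
}

/* Flatten a mapped location to file + line.  The column is dropped,
   which is why the result cannot be converted back.  */
static toy_location_t
toy_flatten (toy_source_location loc)
{
  toy_location_t fe;
  fe.file = "t.c";                          /* single-file toy */
  fe.line = (int) (loc >> TOY_COLUMN_BITS);
  return fe;
}

static int
toy_same (toy_location_t a, toy_location_t b)
{
  return a.line == b.line && strcmp (a.file, b.file) == 0;
}
```

Under these assumptions, toy_make_loc (3, 5) and toy_make_loc (3, 40)
are different values, but their flattened forms compare equal, so
nothing can recover which one was meant.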

So the fix is to make cpp_interpret_string grow an explicit argument
specifying the source location of the string, and pass that down from
all callers.  This is straightforward everywhere but the C++ front
end, where we also have to save source locations as source_locations
rather than location_ts, because of the aforementioned impossibility
of converting from the latter to the former.  This is a desirable
change in itself, as it saves considerable memory (one pointer per
cp_token structure, of which there are thousands).  It does mean
cp/parser.c has to duplicate more code from c-lex.c, but that problem
will go away once Joseph's C parser rewrite lands.
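
The shape of the change can be sketched with a toy interpreter.  This
is not the cpplib code: every toy_ name is hypothetical, and the escape
check is deliberately minimal.  It only contrasts a routine that
diagnoses at a global "current location" with one that takes the
location from its caller.

```c
#include <assert.h>

typedef unsigned int toy_loc;

/* Where the scanner currently is.  In C++ this is already end-of-file
   by the time strings are interpreted.  */
static toy_loc toy_current_loc;

/* Location attached to the most recent diagnostic (0 = none yet).  */
static toy_loc toy_last_diag_loc;

/* Return nonzero if S contains an escape this toy does not know.
   Only \n, \t, \\ and \" count as known.  */
static int
toy_has_unknown_escape (const char *s)
{
  for (; *s; s++)
    if (*s == '\\')
      {
        s++;
        if (*s == '\0')
          return 1;             /* trailing backslash */
        if (*s != 'n' && *s != 't' && *s != '\\' && *s != '"')
          return 1;
      }
  return 0;
}

/* Old shape: diagnose at the scanner's current location.  */
static int
toy_interpret_implicit (const char *s)
{
  if (toy_has_unknown_escape (s))
    {
      toy_last_diag_loc = toy_current_loc;
      return 0;
    }
  return 1;
}

/* New shape: the caller says where the string came from.  */
static int
toy_interpret_explicit (const char *s, toy_loc loc)
{
  if (toy_has_unknown_escape (s))
    {
      toy_last_diag_loc = loc;
      return 0;
    }
  return 1;
}
```

With toy_current_loc sitting at the end of the file, the implicit
version blames the end of the file for a bad escape, while the explicit
version blames whatever location the caller saved with the token.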

Now for the consequences.  cpplib has always marked tokens from macro
expansion (not from arguments) with the location of the macro
*definition*.  Using these locations, instead of the flattened
locations that one gets from the line_change callback, means that
diagnostics triggered inside macros will indicate the location of the
macro definition.  For example (see gcc.dg/cpp/charconst-4.c for a
more complete version of this test):

/* { dg-do preprocess } */
/* { dg-options "-Wno-multichar" } */

#define TOO_LONG 'abcde'  // { dg-warning "too long" }
#if TOO_LONG != 'bcde'    // no diagnostic on this line
#error
#endif
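
The marking rule itself can be modelled in a few lines of C.  This is
a hedged sketch with made-up names (toy_token, toy_expand), not the
cpplib expander: tokens copied from the macro body keep the line of
the definition, and only tokens substituted from the argument carry
the line of the use.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical token: a spelling, the line it was lexed on, and
   whether it names the macro's (single) parameter.  */
typedef struct
{
  const char *spelling;
  unsigned int line;
  int is_param;
} toy_token;

/* Expand a one-parameter macro whose body is BODY[0..N-1], replacing
   each parameter token with ARG.  Body tokens keep their definition
   line; ARG keeps the line where the macro was used.  Returns the
   number of tokens written to OUT.  */
static size_t
toy_expand (const toy_token *body, size_t n, toy_token arg, toy_token *out)
{
  size_t i;
  for (i = 0; i < n; i++)
    out[i] = body[i].is_param ? arg : body[i];
  return n;
}
```

For a body "( x )" defined on line 1 and an argument "y" written on
line 7, the parentheses in the expansion report line 1 and only "y"
reports line 7.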

In the C front end this change only affects cpp_interpret_string
diagnostics, because we are only using source_locations directly for
those.  However, in the C++ front end we are now preserving
source_locations for *all* tokens, and setting input_location from
each one of them.  That means *all* diagnostics triggered by text from
macro expansions will now indicate the location of the macro
definition, which is not always the right thing.  For instance,
g++.old-deja/g++.other/null1.C and g++.old-deja/g++.other/vaarg3.C now
issue all their diagnostics at the definitions of NULL and va_arg
respectively, which is useless to the user.

I'd appreciate any advice as to how to solve this problem.  I do think
that in the general case it is more useful for diagnostics triggered
inside macro definitions to indicate the line of the macro definition,
particularly for complicated macros.  (This was in fact one of my
original, long-postponed goals for integrated preprocessing.)  And I
would like not to make this patch much more complicated than it
already is.  However, in the case of null1.C and vaarg3.C the
diagnostics were clearly in the right place before and are no longer.

zw

===================================================================
Index: gcc/c-common.c
--- gcc/c-common.c	29 Jan 2005 16:12:33 -0000	1.602
+++ gcc/c-common.c	10 Feb 2005 00:29:26 -0000
@@ -736,24 +736,9 @@ fname_as_string (int pretty_p)
   if (current_function_decl)
     name = lang_hooks.decl_printable_name (current_function_decl, vrb);
 
-  if (c_lex_string_translate)
-    {
-      int len = strlen (name) + 3; /* Two for '"'s.  One for NULL.  */
-      cpp_string cstr = { 0, 0 }, strname;
-
-      namep = XNEWVEC (char, len);
-      snprintf (namep, len, "\"%s\"", name);
-      strname.text = (unsigned char *) namep;
-      strname.len = len - 1;
-
-      if (cpp_interpret_string (parse_in, &strname, 1, &cstr, false))
-	{
-	  XDELETEVEC (namep);
-	  return (char *) cstr.text;
-	}
-    }
-  else
-    namep = xstrdup (name);
+  namep = cpp_convert_to_exec_charset (parse_in, name, /*wide=*/false);
+  if (!namep)
+    abort ();
 
   return namep;
 }
===================================================================
Index: gcc/c-lex.c
--- gcc/c-lex.c	27 Oct 2004 17:24:20 -0000	1.242
+++ gcc/c-lex.c	10 Feb 2005 00:29:26 -0000
@@ -60,22 +60,17 @@ int c_header_level;	 /* depth in C heade
    to the untranslated one.  */
 int c_lex_string_translate = 1;
 
-/* True if strings should be passed to the caller of c_lex completely
-   unmolested (no concatenation, no translation).  */
-bool c_lex_return_raw_strings = false;
-
 static tree interpret_integer (const cpp_token *, unsigned int);
 static tree interpret_float (const cpp_token *, unsigned int);
 static enum integer_type_kind narrowest_unsigned_type
 	(unsigned HOST_WIDE_INT, unsigned HOST_WIDE_INT, unsigned int);
 static enum integer_type_kind narrowest_signed_type
 	(unsigned HOST_WIDE_INT, unsigned HOST_WIDE_INT, unsigned int);
-static enum cpp_ttype lex_string (const cpp_token *, tree *, bool);
-static tree lex_charconst (const cpp_token *);
+static enum cpp_ttype c_interpret_string (const cpp_token *, tree *, bool);
 static void update_header_times (const char *);
 static int dump_one_header (splay_tree_node, void *);
 static void cb_line_change (cpp_reader *, const cpp_token *, int);
-static void cb_ident (cpp_reader *, unsigned int, const cpp_string *);
+static void cb_ident (cpp_reader *, const cpp_token *);
 static void cb_def_pragma (cpp_reader *, unsigned int);
 static void cb_define (cpp_reader *, unsigned int, cpp_hashnode *);
 static void cb_undef (cpp_reader *, unsigned int, cpp_hashnode *);
@@ -180,21 +175,20 @@ dump_time_statistics (void)
 
 static void
 cb_ident (cpp_reader * ARG_UNUSED (pfile),
-	  unsigned int ARG_UNUSED (line),
-	  const cpp_string * ARG_UNUSED (str))
+	  const cpp_token * ARG_UNUSED (token))
 {
-#ifdef ASM_OUTPUT_IDENT
-  if (!flag_no_ident)
+  /* Convert escapes in the string.  Do this unconditionally, so we get
+     all appropriate diagnostics.  */
+  cpp_string cstr = { 0, 0 };
+  if (cpp_interpret_string (pfile, &token->val.str, 1, &cstr, false,
+			    token->src_loc))
     {
-      /* Convert escapes in the string.  */
-      cpp_string cstr = { 0, 0 };
-      if (cpp_interpret_string (pfile, str, 1, &cstr, false))
-	{
-	  ASM_OUTPUT_IDENT (asm_out_file, (const char *) cstr.text);
-	  free ((void *) cstr.text);
-	}
-    }
+#ifdef ASM_OUTPUT_IDENT
+      if (!flag_no_ident)
+	ASM_OUTPUT_IDENT (asm_out_file, (const char *) cstr.text);
 #endif
+      free ((void *) cstr.text);
+    }
 }
 
 /* Called at the start of every non-empty line.  TOKEN is the first
@@ -354,28 +348,21 @@ c_lex_with_flags (tree *value, unsigned 
       break;
 
     case CPP_NUMBER:
-      {
-	unsigned int flags = cpp_classify_number (parse_in, tok);
+      *value = c_interpret_number (tok);
+      break;
 
-	switch (flags & CPP_N_CATEGORY)
-	  {
-	  case CPP_N_INVALID:
-	    /* cpplib has issued an error.  */
-	    *value = error_mark_node;
-	    break;
-
-	  case CPP_N_INTEGER:
-	    *value = interpret_integer (tok, flags);
-	    break;
-
-	  case CPP_N_FLOATING:
-	    *value = interpret_float (tok, flags);
-	    break;
-
-	  default:
-	    gcc_unreachable ();
-	  }
-      }
+    case CPP_CHAR:
+    case CPP_WCHAR:
+      *value = c_interpret_charconst (tok);
+      break;
+
+    case CPP_STRING:
+    case CPP_WSTRING:
+      type = c_interpret_string (tok, value, false);
+      break;
+
+    case CPP_PRAGMA:
+      *value = build_string (tok->val.str.len, (char *) tok->val.str.text);
       break;
 
     case CPP_ATSIGN:
@@ -394,7 +381,7 @@ c_lex_with_flags (tree *value, unsigned 
 	      
 	    case CPP_STRING:
 	    case CPP_WSTRING:
-	      type = lex_string (tok, value, true);
+	      type = c_interpret_string (tok, value, true);
 	      break;
 
 	    case CPP_NAME:
@@ -440,25 +427,6 @@ c_lex_with_flags (tree *value, unsigned 
       }
       goto retry;
 
-    case CPP_CHAR:
-    case CPP_WCHAR:
-      *value = lex_charconst (tok);
-      break;
-
-    case CPP_STRING:
-    case CPP_WSTRING:
-      if (!c_lex_return_raw_strings)
-	{
-	  type = lex_string (tok, value, false);
-	  break;
-	}
-      
-      /* FALLTHROUGH */
-
-    case CPP_PRAGMA:
-      *value = build_string (tok->val.str.len, (char *) tok->val.str.text);
-      break;
-
       /* These tokens should not be visible outside cpplib.  */
     case CPP_HEADER_NAME:
     case CPP_COMMENT:
@@ -489,6 +457,10 @@ c_lex (tree *value)
 {
   return c_lex_with_flags (value, NULL);
 }
+
+/* Token value interpretation routines.  The exported ones are used by
+   the C++ parser, which cannot use c_lex(_with_flags) -- see cp/parser.c
+   for details.  */
 
 /* Returns the narrowest C-visible unsigned type, starting with the
    minimum specified by FLAGS, that can fit HIGH:LOW, or itk_none if
@@ -691,6 +663,31 @@ interpret_float (const cpp_token *token,
   return value;
 }
 
+/* Interpret TOK, a preprocessing number.  Returns a tree representing
+   the number, or error_mark_node for a syntactically invalid number.  */
+tree
+c_interpret_number (const cpp_token *tok)
+{
+  unsigned int flags = cpp_classify_number (parse_in, tok);
+
+  switch (flags & CPP_N_CATEGORY)
+    {
+    case CPP_N_INVALID:
+      /* cpplib has issued an error.  */
+      return error_mark_node;
+
+    case CPP_N_INTEGER:
+      return interpret_integer (tok, flags);
+
+    case CPP_N_FLOATING:
+      return interpret_float (tok, flags);
+
+    default:
+      gcc_unreachable ();
+    }
+}
+
+
 /* Convert a series of STRING and/or WSTRING tokens into a tree,
    performing string constant concatenation.  TOK is the first of
    these.  VALP is the location to write the string into.  OBJC_STRING
@@ -705,16 +702,20 @@ interpret_float (const cpp_token *token,
    sequences do not continue across the boundary between two strings in
    a series (6.4.5p7), so we must not lose the boundaries.  Therefore
    cpp_interpret_string takes a vector of cpp_string structures, which
-   we must arrange to provide.  */
+   we must arrange to provide.
+
+   FIXME: This routine is not usable by the C++ front end, due to its
+   decidedly different token-buffering scheme.  */
 
 static enum cpp_ttype
-lex_string (const cpp_token *tok, tree *valp, bool objc_string)
+c_interpret_string (const cpp_token *tok, tree *valp, bool objc_string)
 {
   tree value;
   bool wide = false;
   size_t concats = 0;
   struct obstack str_ob;
   cpp_string istr;
+  source_location loc_first = tok->src_loc;
 
   /* Try to avoid the overhead of creating and destroying an obstack
      for the common case of just one string.  */
@@ -767,7 +768,7 @@ lex_string (const cpp_token *tok, tree *
 
   if ((c_lex_string_translate
        ? cpp_interpret_string : cpp_interpret_string_notranslate)
-      (parse_in, strs, concats + 1, &istr, wide))
+      (parse_in, strs, concats + 1, &istr, wide, loc_first))
     {
       value = build_string (istr.len, (char *) istr.text);
       free ((void *) istr.text);
@@ -776,7 +777,8 @@ lex_string (const cpp_token *tok, tree *
 	{
 	  int xlated = cpp_interpret_string_notranslate (parse_in, strs,
 							 concats + 1,
-							 &istr, wide);
+							 &istr, wide,
+							 loc_first);
 	  /* Assume that, if we managed to translate the string above,
 	     then the untranslated parsing will always succeed.  */
 	  gcc_assert (xlated);
@@ -817,9 +819,10 @@ lex_string (const cpp_token *tok, tree *
   return objc_string ? CPP_OBJC_STRING : wide ? CPP_WSTRING : CPP_STRING;
 }
 
-/* Converts a (possibly wide) character constant token into a tree.  */
-static tree
-lex_charconst (const cpp_token *token)
+/* Convert TOKEN, a (possibly wide) character constant token, into
+   a tree constant.  */
+tree
+c_interpret_charconst (const cpp_token *token)
 {
   cppchar_t result;
   tree type, value;
===================================================================
Index: gcc/c-ppoutput.c
--- gcc/c-ppoutput.c	28 Nov 2004 23:29:41 -0000	1.24
+++ gcc/c-ppoutput.c	10 Feb 2005 00:29:26 -0000
@@ -55,7 +55,7 @@ static void cb_define (cpp_reader *, sou
 static void cb_undef (cpp_reader *, source_location, cpp_hashnode *);
 static void cb_include (cpp_reader *, source_location, const unsigned char *,
 			const char *, int);
-static void cb_ident (cpp_reader *, source_location, const cpp_string *);
+static void cb_ident (cpp_reader *, const cpp_token *);
 static void cb_def_pragma (cpp_reader *, source_location);
 static void cb_read_pch (cpp_reader *pfile, const char *name,
 			 int fd, const char *orig_name);
@@ -300,11 +300,10 @@ cb_line_change (cpp_reader *pfile, const
 }
 
 static void
-cb_ident (cpp_reader *pfile ATTRIBUTE_UNUSED, source_location line,
-	  const cpp_string *str)
+cb_ident (cpp_reader *pfile ATTRIBUTE_UNUSED, const cpp_token *tok)
 {
-  maybe_print_line (line);
-  fprintf (print.outf, "#ident %s\n", str->text);
+  maybe_print_line (tok->src_loc);
+  fprintf (print.outf, "#ident %s\n", tok->val.str.text);
   print.src_line++;
 }
 
===================================================================
Index: gcc/c-pragma.h
--- gcc/c-pragma.h	29 Nov 2004 18:53:54 -0000	1.45
+++ gcc/c-pragma.h	10 Feb 2005 00:29:26 -0000
@@ -67,14 +67,13 @@ extern void add_to_renaming_pragma_list 
 extern enum cpp_ttype c_lex (tree *);
 extern enum cpp_ttype c_lex_with_flags (tree *, unsigned char *);
 
+extern tree c_interpret_number (const cpp_token *);
+extern tree c_interpret_charconst (const cpp_token *);
+
 /* If 1, then lex strings into the execution character set.  
    If 0, lex strings into the host character set.
    If -1, lex both, and chain them together, such that the former
    is the TREE_CHAIN of the latter.  */
 extern int c_lex_string_translate;
 
-/* If true, strings should be passed to the caller of c_lex completely
-   unmolested (no concatenation, no translation).  */
-extern bool c_lex_return_raw_strings;
-
 #endif /* GCC_C_PRAGMA_H */
===================================================================
Index: gcc/cp/parser.c
--- gcc/cp/parser.c	9 Feb 2005 02:53:38 -0000	1.313
+++ gcc/cp/parser.c	10 Feb 2005 00:29:55 -0000
@@ -61,7 +61,7 @@ typedef struct cp_token GTY (())
   /* The value associated with this token, if any.  */
   tree value;
   /* The location at which this token was found.  */
-  location_t location;
+  source_location location;
 } cp_token;
 
 /* We use a stack of token pointer for saving token sets.  */
@@ -70,12 +70,7 @@ DEF_VEC_MALLOC_P (cp_token_position);
 
 static const cp_token eof_token =
 {
-  CPP_EOF, RID_MAX, 0, 0, 0, NULL_TREE,
-#if USE_MAPPED_LOCATION
-  0
-#else
-  {0, 0}
-#endif
+  CPP_EOF, RID_MAX, 0, 0, 0, NULL_TREE, 0
 };
 
 /* The cp_lexer structure represents the C++ lexer.  It is responsible
@@ -248,9 +243,6 @@ cp_lexer_new_main (void)
   /* Tell cpplib we want CPP_PRAGMA tokens.  */
   cpp_get_options (parse_in)->defer_pragmas = true;
 
-  /* Tell c_lex not to merge string constants.  */
-  c_lex_return_raw_strings = true;
-
   c_common_no_more_pch ();
 
   /* Allocate the memory.  */
@@ -289,11 +281,6 @@ cp_lexer_new_main (void)
   lexer->last_token = pos;
   lexer->next_token = lexer->buffer_length ? buffer : (cp_token *)&eof_token;
 
-  /* Pragma processing (via cpp_handle_deferred_pragma) may result in
-     direct calls to c_lex.  Those callers all expect c_lex to do
-     string constant concatenation.  */
-  c_lex_return_raw_strings = false;
-
   gcc_assert (lexer->next_token->type != CPP_PURGED);
   return lexer;
 }
@@ -371,19 +358,30 @@ cp_lexer_saving_tokens (const cp_lexer* 
 }
 
 /* Store the next token from the preprocessor in *TOKEN.  Return true
-   if we reach EOF.  */
+   if we reach EOF.
+
+   This routine does not use c_lex_with_flags because we need to store
+   cpplib's source_location values in order to feed them back to
+   cpp_interpret_string later.  In !USE_MAPPED_LOCATION mode, the
+   conversion from source_location to location_t values is
+   irreversible.  */
 
 static void
-cp_lexer_get_preprocessor_token (cp_lexer *lexer ATTRIBUTE_UNUSED ,
-                                 cp_token *token)
+cp_lexer_get_preprocessor_token (cp_lexer *ARG_UNUSED (lexer),
+				 cp_token *token)
 {
   static int is_extern_c = 0;
+  const cpp_token *ctoken;
 
-   /* Get a new token from the preprocessor.  */
-  token->type = c_lex_with_flags (&token->value, &token->flags);
-  token->location = input_location;
-  token->in_system_header = in_system_header;
+ retry:
+  ctoken = cpp_get_token (parse_in);
+  token->type     = ctoken->type;
+  token->flags    = ctoken->flags;
+  token->location = ctoken->src_loc;
+  token->keyword  = RID_MAX;
+  token->value    = 0;
 
+  token->in_system_header = in_system_header;
   /* On some systems, some header files are surrounded by an 
      implicit extern "C" block.  Set a flag in the token if it
      comes from such a header.  */
@@ -391,22 +389,112 @@ cp_lexer_get_preprocessor_token (cp_lexe
   pending_lang_change = 0;
   token->implicit_extern_c = is_extern_c > 0;
 
-  /* Check to see if this token is a keyword.  */
-  if (token->type == CPP_NAME
-      && C_IS_RESERVED_WORD (token->value))
+  switch (ctoken->type)
     {
-      /* Mark this token as a keyword.  */
-      token->type = CPP_KEYWORD;
-      /* Record which keyword.  */
-      token->keyword = C_RID_CODE (token->value);
-      /* Update the value.  Some keywords are mapped to particular
-	 entities, rather than simply having the value of the
-	 corresponding IDENTIFIER_NODE.  For example, `__const' is
-	 mapped to `const'.  */
-      token->value = ridpointers[token->keyword];
+    case CPP_PADDING:
+      /* Padding tokens can just be discarded.  */
+      goto retry;
+
+    case CPP_NAME:
+      /* Identifier.  Convert cpplib's version of the symbol structure
+	 to the front end's version, and check for a keyword.  */
+      token->value = HT_IDENT_TO_GCC_IDENT (HT_NODE (ctoken->val.node));
+      if (C_IS_RESERVED_WORD (token->value))
+	{
+	  token->type = CPP_KEYWORD;
+	  token->keyword = C_RID_CODE (token->value);
+	  /* Update the value.  Some keywords are mapped to particular
+	     entities, rather than simply having the value of the
+	     corresponding IDENTIFIER_NODE.  For example, `__const' is
+	     mapped to `const'.  */
+	  token->value = ridpointers[token->keyword];
+	}
+      else
+	token->keyword = RID_MAX;
+      break;
+
+    case CPP_NUMBER:
+      token->value = c_interpret_number (ctoken);
+      break;
+
+    case CPP_CHAR:
+    case CPP_WCHAR:
+      token->value = c_interpret_charconst (ctoken);
+      break;
+
+    case CPP_STRING:
+    case CPP_WSTRING:
+    case CPP_PRAGMA:
+      /* String and pragma processing is deferred until we know the
+	 context.  Pragmas look like strings at this stage.
+
+	 FIXME: Do not build an entire STRING_CST node at this point,
+	 it's guaranteed to be thrown away and reconstructed later.
+	 Ideal would be to have a global vector of cpp_string
+	 structures, and have the token value point into that vector.
+	 This would not only avoid the throwaway tree nodes, it would
+	 eliminate the need for scratch memory in
+	 cp_parser_string_literal.  */
+      token->value = build_string (ctoken->val.str.len,
+				   (char *) ctoken->val.str.text);
+      break;
+
+    case CPP_ATSIGN:
+      /* In Objective-C++, an @ may give the next token special
+	 significance.  FIXME: not yet implemented; fall through
+	 to erroneous token cases.  */
+
+    case CPP_HASH:
+    case CPP_PASTE:
+      /* These are preprocessor operators, and should never appear in
+	 running text.  */
+      {
+	unsigned char name[4];
+	*cpp_spell_token (parse_in, ctoken, name) = 0;
+	error ("stray %qs in program", name);
+      }
+      goto retry;
+
+    case CPP_OTHER:
+      /* Either an un-terminated string constant, or a character which
+	 doesn't correspond to a normal C token.  Always an error.  */
+      {
+	int c = ctoken->val.str.text[0];
+	if (c == '"' || c == '\'')
+	  error ("missing terminating %c character", c);
+	else if (ISGRAPH (c))
+	  error ("stray %qc in program", c);
+	else
+	  error ("stray %<\\%o%> in program", c);
+      }
+      goto retry;
+
+    case CPP_HEADER_NAME:
+    case CPP_COMMENT:
+    case CPP_MACRO_ARG:
+      /* These tokens should not be visible outside cpplib.  */
+      gcc_unreachable ();
+
+    default:
+      /* All other tokens are punctuation, and are entirely defined by
+	 their token type.  */
+      break;
     }
-  else
-    token->keyword = RID_MAX;
+}
+
+/* Helper function: convert a source_location to a location_t.  */
+static inline location_t
+cp_lexer_get_location_t (source_location cpp_loc)
+{
+#ifdef USE_MAPPED_LOCATION
+  return cpp_loc;
+#else
+  location_t fe_loc;
+  const struct line_map *map = linemap_lookup (&line_table, cpp_loc);
+  fe_loc.file = map->to_file;
+  fe_loc.line = SOURCE_LINE (map, cpp_loc);
+  return fe_loc;
+#endif
 }
 
 /* Update the globals input_location and in_system_header from TOKEN.  */
@@ -415,7 +503,7 @@ cp_lexer_set_source_position_from_token 
 {
   if (token->type != CPP_EOF)
     {
-      input_location = token->location;
+      input_location = cp_lexer_get_location_t (token->location);
       in_system_header = token->in_system_header;
     }
 }
@@ -552,7 +640,7 @@ cp_lexer_purge_token (cp_lexer *lexer)
   
   gcc_assert (tok != &eof_token);
   tok->type = CPP_PURGED;
-  tok->location = UNKNOWN_LOCATION;
+  tok->location = 0;
   tok->value = NULL_TREE;
   tok->keyword = RID_MAX;
 
@@ -586,7 +674,7 @@ cp_lexer_purge_tokens_after (cp_lexer *l
   for ( tok += 1; tok != peek; tok += 1)
     {
       tok->type = CPP_PURGED;
-      tok->location = UNKNOWN_LOCATION;
+      tok->location = 0;
       tok->value = NULL_TREE;
       tok->keyword = RID_MAX;
     }
@@ -2478,6 +2566,7 @@ cp_parser_string_literal (cp_parser *par
   struct obstack str_ob;
   cpp_string str, istr, *strs;
   cp_token *tok;
+  source_location loc_first;
 
   tok = cp_lexer_peek_token (parser->lexer);
   if (!cp_parser_is_string_literal (tok))
@@ -2486,6 +2575,8 @@ cp_parser_string_literal (cp_parser *par
       return error_mark_node;
     }
 
+  loc_first = tok->location;
+
   /* Try to avoid the overhead of creating and destroying an obstack
      for the common case of just one string.  */
   if (!cp_parser_is_string_literal
@@ -2531,7 +2622,7 @@ cp_parser_string_literal (cp_parser *par
     }
 
   if ((translate ? cpp_interpret_string : cpp_interpret_string_notranslate)
-      (parse_in, strs, count, &istr, wide))
+      (parse_in, strs, count, &istr, wide, loc_first))
     {
       value = build_string (istr.len, (char *)istr.text);
       free ((void *)istr.text);
@@ -5879,7 +5970,7 @@ cp_parser_statement (cp_parser* parser, 
 {
   tree statement;
   cp_token *token;
-  location_t statement_location;
+  source_location statement_location;
 
   /* There is no statement yet.  */
   statement = NULL_TREE;
@@ -5969,7 +6060,7 @@ cp_parser_statement (cp_parser* parser, 
 
   /* Set the line number for the statement.  */
   if (statement && STATEMENT_CODE_P (TREE_CODE (statement)))
-    SET_EXPR_LOCATION (statement, statement_location);
+    SET_EXPR_LOCATION (statement, cp_lexer_get_location_t (statement_location));
 }
 
 /* Parse a labeled-statement.
@@ -13039,7 +13130,10 @@ cp_parser_member_declaration (cp_parser*
 	{
 	  cp_token *token = cp_lexer_peek_token (parser->lexer);
 	  if (pedantic && !token->in_system_header)
-	    pedwarn ("%Hextra %<;%>", &token->location);
+	    {
+	      location_t tloc = cp_lexer_get_location_t (token->location);
+	      pedwarn ("%Hextra %<;%>", &tloc);
+	    }
 	}
       else
 	{
@@ -15196,9 +15290,9 @@ cp_parser_enclosed_template_argument_lis
 	    global source location is still on the token before the
 	    '>>', so we need to say explicitly where we want it.  */
 	  cp_token *token = cp_lexer_peek_token (parser->lexer);
+	  location_t tloc = cp_lexer_get_location_t (token->location);
 	  error ("%H%<>>%> should be %<> >%> "
-		 "within a nested template argument list",
-		 &token->location);
+		 "within a nested template argument list", &tloc);
 
 	  /* ??? Proper recovery should terminate two levels of
 	     template argument list here.  */
===================================================================
Index: gcc/testsuite/g++.dg/parse/pragma2.C
--- gcc/testsuite/g++.dg/parse/pragma2.C	23 Dec 2004 22:19:54 -0000	1.1
+++ gcc/testsuite/g++.dg/parse/pragma2.C	10 Feb 2005 00:31:06 -0000
@@ -1,8 +1,6 @@
 // PR c++/17595
 
-// Ideally, the #pragma error would come one line further down, but it
-// does not.
-int f(int x, // { dg-error "not allowed here" }
-#pragma interface 
+int f(int x,
+#pragma interface  // { dg-error "not allowed here" }
       // The parser gets confused and issues an error on the next line.
       int y); // { dg-bogus "" "" { xfail *-*-* } } 
===================================================================
Index: gcc/testsuite/g++.dg/warn/string-1.c
--- gcc/testsuite/g++.dg/warn/string-1.c	1 Jan 1970 00:00:00 -0000
+++ gcc/testsuite/g++.dg/warn/string-1.c	10 Feb 2005 00:31:35 -0000
@@ -0,0 +1,6 @@
+/* Diagnostics for unknown escapes should appear on the line with
+   the string constant.  PR 17964.  */
+/* { dg-do compile } */
+
+char *p = "\q";  /* { dg-warning "unknown escape" } */
+int i;
===================================================================
Index: gcc/testsuite/g++.old-deja/g++.mike/p10769a.C
--- gcc/testsuite/g++.old-deja/g++.mike/p10769a.C	1 May 2003 02:02:45 -0000	1.9
+++ gcc/testsuite/g++.old-deja/g++.mike/p10769a.C	1 Jan 1970 00:00:00 -0000
@@ -1,46 +0,0 @@
-// { dg-do run  }
-// { dg-options "-Wno-pmf-conversions" }
-// prms-id: 10769
-
-#define PMF2PF(PMF) ((void (*)())(PMF))
-
-int ok = 0;
-
-class A {
-public:
-  void f1a() { ok += 3; }
-  void f1b() { ok += 5; }
-  void f2a() { ok += 7; }
-  void f2b() { }
-  static void (*table[2][2])();
-  void main();
-} a;
-
-void (*A::table[2][2])()
-  = { { PMF2PF(&A::f1a), PMF2PF(&A::f1b) },
-      { PMF2PF(&A::f2a), PMF2PF(&A::f1b) },
-  };
-
-void
-dispatch (A *obj, int i, int j)
-{
-  (*(void (*)(A *))A::table[i][j])(obj);
-}
-
-void A::main() {
-  dispatch (&a, 0, 0);
-  void (A::*mPtr)() = &A::f1a;
-
-  (*(void (*)(A*))PMF2PF(mPtr))(&a);
-  (*(void (*)(A*))PMF2PF(f2a))(&a); // { dg-bogus "" "" { xfail *-*-* } }  
-}
-
-int main() {
-  a.A::main();
-  dispatch (&a, 0, 1);
-  void (A::*mPtr)() = &A::f1b;
-
-  (*(void (*)(A*))PMF2PF(a.*mPtr))(&a);
-  (*(void (*)(A*))PMF2PF(a.f2a))(&a); // { dg-bogus "" "" { xfail *-*-* } }  
-  return ok != 3+3+5+5+7+7;
-}
===================================================================
Index: gcc/testsuite/g++.old-deja/g++.mike/p10769b.C
--- gcc/testsuite/g++.old-deja/g++.mike/p10769b.C	1 May 2003 02:02:45 -0000	1.5
+++ gcc/testsuite/g++.old-deja/g++.mike/p10769b.C	1 Jan 1970 00:00:00 -0000
@@ -1,26 +0,0 @@
-// { dg-do assemble  }
-// { dg-options "" }
-// prms-id: 10769
-
-#define PMF2PF(PMF) ((void (*)())(PMF))
-
-class A {
-public:
-  void f1a() { }
-  void main();
-} a;
-
-class B {
-public:
-  void bf1() { }
-} b;
-
-void A::main() {
-  void (B::*mPtrB)(B*);
-  (*(void (*)(A*))PMF2PF(mPtrB))(&b);	// { dg-error "" } 
-}
-
-int main() {
-  void (A::*mPtr)() = &A::f1a;
-  (*(void (*)(A*))PMF2PF(mPtr))(&a);	// { dg-error "" } 
-}
===================================================================
Index: gcc/testsuite/gcc.dg/cpp/charconst-4.c
--- gcc/testsuite/gcc.dg/cpp/charconst-4.c	3 Aug 2004 08:22:22 -0000	1.2
+++ gcc/testsuite/gcc.dg/cpp/charconst-4.c	10 Feb 2005 00:32:03 -0000
@@ -3,50 +3,35 @@
 /* { dg-do run } */
 /* { dg-options "-Wno-multichar -fsigned-char" } */
 
-/* This tests how overly-long multichar charconsts are truncated, and
-   whether "short" multichar charconsts are incorrectly sign extended
-   (regardless of char signedness).  Preprocessor is used so that we
-   have only one place where the too long warning is generated, so
-   that the test works for all targets.
-
+/* This tests how overly-long multichar charconsts are truncated.
    Neil Booth, 8 May 2002.  */
 
 #include <limits.h>
 
 extern void abort (void);
 
+#define TOO_LONG_1 '!\234abcdefg'  /* { dg-warning "too long" "used in #if" } */
+#define TOO_LONG_2 '%\234abcdefg'  /* { dg-warning "too long" "used in if()" } */
+#define TOO_LONG_3 '#\234abcdefg'  /* { dg-bogus   "too long" "not used" } */
+
 #if INT_MAX == 32767
-# define LONG_CHARCONST '!\234a'
-# define SHORT_CHARCONST '\234a'
-# define POS_CHARCONST '\1'
+# define SHORT_CHARCONST 'fg'
 #elif INT_MAX == 2147483647
-# define LONG_CHARCONST '!\234abc'
-# define SHORT_CHARCONST '\234abc'
-# define POS_CHARCONST '\234a'
+# define SHORT_CHARCONST 'defg'
 #elif INT_MAX == 9223372036854775807
-# define LONG_CHARCONST '!\234abcdefg'
 # define SHORT_CHARCONST '\234abcdefg'
-# define POS_CHARCONST '\234a'
 #else
-/* Target int size not handled, do something that won't fail.  */
-# define LONG_CHARCONST '\234a'
-# define SHORT_CHARCONST '\234a'
-# define POS_CHARCONST '\1'
+/* Target int size not handled, cannot do this test.  */
+# error Test case must be extended to handle INT_MAX for this target.
 #endif
 
-#if POS_CHARCONST < 0
-# error Charconst incorrectly sign-extended
-#endif
-
-#if LONG_CHARCONST != SHORT_CHARCONST /* { dg-warning "too long" "" } */
+#if TOO_LONG_1 != SHORT_CHARCONST
 # error Overly long charconst truncates wrongly for preprocessor
 #endif
 
 int main ()
 {
-  if (POS_CHARCONST < 0)
-    abort ();
-  if (LONG_CHARCONST != SHORT_CHARCONST)  /* { dg-warning "too long" "" } */
+  if (TOO_LONG_2 != SHORT_CHARCONST)
     abort ();
   return 0;
 }
===================================================================
Index: gcc/testsuite/gcc.dg/cpp/charconst-5.c
--- gcc/testsuite/gcc.dg/cpp/charconst-5.c	1 Jan 1970 00:00:00 -0000
+++ gcc/testsuite/gcc.dg/cpp/charconst-5.c	10 Feb 2005 00:32:03 -0000
@@ -0,0 +1,59 @@
+/* Copyright 2005 Free Software Foundation, Inc.  */
+
+/* { dg-do run } */
+/* { dg-options "-Wno-multichar -fsigned-char" } */
+
+/* This test verifies that character constants are always
+   zero-extended, regardless of the signedness of individual chars.
+   (In C, character constants always have type int.)
+
+   Test originally by Neil Booth, 2002; split from charconst-4.c, 2005.  */
+
+#include <limits.h>
+
+extern void abort (void);
+
+#if CHAR_BIT != 8
+# error Test case must be extended to handle CHAR_BIT != 8
+#endif
+
+/* This single-character character constant has a value that is
+   negative as a 'char' value.  When extended to 'int', however,
+   it should be zero-extended and therefore become positive.  */
+#define POS_CHARCONST '\x81'
+
+#if POS_CHARCONST < 0
+# error Charconst incorrectly sign-extended
+#endif
+
+void test1(void)
+{
+  if (POS_CHARCONST < 0)
+    abort ();
+}
+
+#if INT_MAX > 32767
+/* Both halves of this two-character character constant have negative
+   'char' values, but again, when extended to 'int' the value should
+   become positive.  */
+#define POS_CHARCONST_2 '\x81\x81'
+
+#if POS_CHARCONST_2 < 0
+# error Charconst incorrectly sign-extended
+#endif
+
+void test2(void)
+{
+  if (POS_CHARCONST_2 < 0)
+    abort ();
+}
+#else /* INT_MAX <= 32767 */
+void test2(void) {}
+#endif
+
+int
+main(void)
+{
+  test1();
+  test2();
+}
===================================================================
Index: libcpp/charset.c
--- libcpp/charset.c	18 Sep 2004 00:56:19 -0000	1.3
+++ libcpp/charset.c	10 Feb 2005 00:32:28 -0000
@@ -792,11 +792,16 @@ ucn_valid_in_identifier (cpp_reader *pfi
    invalid character.
 
    IDENTIFIER_POS is 0 when not in an identifier, 1 for the start of
-   an identifier, or 2 otherwise.  */
+   an identifier, or 2 otherwise.
+
+   LOC is the location of the surrounding token, for use in
+   diagnostics.  For the future: adjust to the location of the UCN
+   within the token.  */
 
 cppchar_t
 _cpp_valid_ucn (cpp_reader *pfile, const uchar **pstr,
-		const uchar *limit, int identifier_pos)
+		const uchar *limit, int identifier_pos,
+		source_location loc)
 {
   cppchar_t result, c;
   unsigned int length;
@@ -804,10 +809,10 @@ _cpp_valid_ucn (cpp_reader *pfile, const
   const uchar *base = str - 2;
 
   if (!CPP_OPTION (pfile, cplusplus) && !CPP_OPTION (pfile, c99))
-    cpp_error (pfile, CPP_DL_WARNING,
+    cpp_error_with_line (pfile, CPP_DL_WARNING, loc, 0,
 	       "universal character names are only valid in C++ and C99");
   else if (CPP_WTRADITIONAL (pfile) && identifier_pos == 0)
-    cpp_error (pfile, CPP_DL_WARNING,
+    cpp_error_with_line (pfile, CPP_DL_WARNING, loc, 0,
 	       "the meaning of '\\%c' is different in traditional C",
 	       (int) str[-1]);
 
@@ -833,7 +838,7 @@ _cpp_valid_ucn (cpp_reader *pfile, const
   if (length)
     {
       /* We'll error when we try it out as the start of an identifier.  */
-      cpp_error (pfile, CPP_DL_ERROR,
+      cpp_error_with_line (pfile, CPP_DL_ERROR, loc, 0,
 		 "incomplete universal character name %.*s",
 		 (int) (str - base), base);
       result = 1;
@@ -845,7 +850,7 @@ _cpp_valid_ucn (cpp_reader *pfile, const
 	   || (result & 0x80000000)
 	   || (result >= 0xD800 && result <= 0xDFFF))
     {
-      cpp_error (pfile, CPP_DL_ERROR,
+      cpp_error_with_line (pfile, CPP_DL_ERROR, loc, 0,
 		 "%.*s is not a valid universal character",
 		 (int) (str - base), base);
       result = 1;
@@ -855,11 +860,11 @@ _cpp_valid_ucn (cpp_reader *pfile, const
       int validity = ucn_valid_in_identifier (pfile, result);
 
       if (validity == 0)
-	cpp_error (pfile, CPP_DL_ERROR,
+	cpp_error_with_line (pfile, CPP_DL_ERROR, loc, 0,
 		   "universal character %.*s is not valid in an identifier",
 		   (int) (str - base), base);
       else if (validity == 2 && identifier_pos == 1)
-	cpp_error (pfile, CPP_DL_ERROR,
+	cpp_error_with_line (pfile, CPP_DL_ERROR, loc, 0,
    "universal character %.*s is not valid at the start of an identifier",
 		   (int) (str - base), base);
     }
@@ -875,7 +880,8 @@ _cpp_valid_ucn (cpp_reader *pfile, const
    An advanced pointer is returned.  Issues all relevant diagnostics.  */
 static const uchar *
 convert_ucn (cpp_reader *pfile, const uchar *from, const uchar *limit,
-	     struct _cpp_strbuf *tbuf, bool wide)
+	     struct _cpp_strbuf *tbuf, bool wide,
+	     source_location loc)
 {
   cppchar_t ucn;
   uchar buf[6];
@@ -886,7 +892,7 @@ convert_ucn (cpp_reader *pfile, const uc
     = wide ? pfile->wide_cset_desc : pfile->narrow_cset_desc;
 
   from++;  /* Skip u/U.  */
-  ucn = _cpp_valid_ucn (pfile, &from, limit, 0);
+  ucn = _cpp_valid_ucn (pfile, &from, limit, 0, loc);
 
   rval = one_cppchar_to_utf8 (ucn, &bufp, &bytesleft);
   if (rval)
@@ -959,7 +965,8 @@ emit_numeric_escape (cpp_reader *pfile, 
    number.  You can, e.g. generate surrogate pairs this way.  */
 static const uchar *
 convert_hex (cpp_reader *pfile, const uchar *from, const uchar *limit,
-	     struct _cpp_strbuf *tbuf, bool wide)
+	     struct _cpp_strbuf *tbuf, bool wide,
+	     source_location loc)
 {
   cppchar_t c, n = 0, overflow = 0;
   int digits_found = 0;
@@ -968,7 +975,7 @@ convert_hex (cpp_reader *pfile, const uc
   size_t mask = width_to_mask (width);
 
   if (CPP_WTRADITIONAL (pfile))
-    cpp_error (pfile, CPP_DL_WARNING,
+    cpp_error_with_line (pfile, CPP_DL_WARNING, loc, 0,
 	       "the meaning of '\\x' is different in traditional C");
 
   from++;  /* Skip 'x'.  */
@@ -985,14 +992,14 @@ convert_hex (cpp_reader *pfile, const uc
 
   if (!digits_found)
     {
-      cpp_error (pfile, CPP_DL_ERROR,
+      cpp_error_with_line (pfile, CPP_DL_ERROR, loc, 0,
 		 "\\x used with no following hex digits");
       return from;
     }
 
   if (overflow | (n != (n & mask)))
     {
-      cpp_error (pfile, CPP_DL_PEDWARN,
+      cpp_error_with_line (pfile, CPP_DL_PEDWARN, loc, 0,
 		 "hex escape sequence out of range");
       n &= mask;
     }
@@ -1010,7 +1017,8 @@ convert_hex (cpp_reader *pfile, const uc
    number.  */
 static const uchar *
 convert_oct (cpp_reader *pfile, const uchar *from, const uchar *limit,
-	     struct _cpp_strbuf *tbuf, bool wide)
+	     struct _cpp_strbuf *tbuf, bool wide,
+	     source_location loc)
 {
   size_t count = 0;
   cppchar_t c, n = 0;
@@ -1031,7 +1039,7 @@ convert_oct (cpp_reader *pfile, const uc
 
   if (n != (n & mask))
     {
-      cpp_error (pfile, CPP_DL_PEDWARN,
+      cpp_error_with_line (pfile, CPP_DL_PEDWARN, loc, 0,
 		 "octal escape sequence out of range");
       n &= mask;
     }
@@ -1047,7 +1055,8 @@ convert_oct (cpp_reader *pfile, const uc
    pointer.  Handles all relevant diagnostics.  */
 static const uchar *
 convert_escape (cpp_reader *pfile, const uchar *from, const uchar *limit,
-		struct _cpp_strbuf *tbuf, bool wide)
+		struct _cpp_strbuf *tbuf, bool wide,
+		source_location loc)
 {
   /* Values of \a \b \e \f \n \r \t \v respectively.  */
 #if HOST_CHARSET == HOST_CHARSET_ASCII
@@ -1067,15 +1076,15 @@ convert_escape (cpp_reader *pfile, const
     {
       /* UCNs, hex escapes, and octal escapes are processed separately.  */
     case 'u': case 'U':
-      return convert_ucn (pfile, from, limit, tbuf, wide);
+      return convert_ucn (pfile, from, limit, tbuf, wide, loc);
 
     case 'x':
-      return convert_hex (pfile, from, limit, tbuf, wide);
+      return convert_hex (pfile, from, limit, tbuf, wide, loc);
       break;
 
     case '0':  case '1':  case '2':  case '3':
     case '4':  case '5':  case '6':  case '7':
-      return convert_oct (pfile, from, limit, tbuf, wide);
+      return convert_oct (pfile, from, limit, tbuf, wide, loc);
 
       /* Various letter escapes.  Get the appropriate host-charset
 	 value into C.  */
@@ -1099,14 +1108,14 @@ convert_escape (cpp_reader *pfile, const
 
     case 'a':
       if (CPP_WTRADITIONAL (pfile))
-	cpp_error (pfile, CPP_DL_WARNING,
+	cpp_error_with_line (pfile, CPP_DL_WARNING, loc, 0,
 		   "the meaning of '\\a' is different in traditional C");
       c = charconsts[0];
       break;
 
     case 'e': case 'E':
       if (CPP_PEDANTIC (pfile))
-	cpp_error (pfile, CPP_DL_PEDWARN,
+	cpp_error_with_line (pfile, CPP_DL_PEDWARN, loc, 0,
 		   "non-ISO-standard escape sequence, '\\%c'", (int) c);
       c = charconsts[2];
       break;
@@ -1114,10 +1123,10 @@ convert_escape (cpp_reader *pfile, const
     default:
     unknown:
       if (ISGRAPH (c))
-	cpp_error (pfile, CPP_DL_PEDWARN,
+	cpp_error_with_line (pfile, CPP_DL_PEDWARN, loc, 0,
 		   "unknown escape sequence '\\%c'", (int) c);
       else
-	cpp_error (pfile, CPP_DL_PEDWARN,
+	cpp_error_with_line (pfile, CPP_DL_PEDWARN, loc, 0,
 		   "unknown escape sequence: '\\%03o'", (int) c);
     }
 
@@ -1134,10 +1143,14 @@ convert_escape (cpp_reader *pfile, const
    escape sequences translated, and finally all are to be
    concatenated.  WIDE indicates whether or not to produce a wide
    string.  The result is written into TO.  Returns true for success,
-   false for failure.  */
+   false for failure.  LOC indicates the token position for
+   diagnostics; currently this should be the position of the first
+   string in the sequence, as we do not yet adjust it to the token,
+   or the position within the token, that triggered the diagnostic.  */
 bool
 cpp_interpret_string (cpp_reader *pfile, const cpp_string *from, size_t count,
-		      cpp_string *to, bool wide)
+		      cpp_string *to, bool wide,
+		      source_location loc)
 {
   struct _cpp_strbuf tbuf;
   const uchar *p, *base, *limit;
@@ -1171,7 +1184,7 @@ cpp_interpret_string (cpp_reader *pfile,
 	  if (p == limit)
 	    break;
 
-	  p = convert_escape (pfile, p + 1, limit, &tbuf, wide);
+	  p = convert_escape (pfile, p + 1, limit, &tbuf, wide, loc);
 	}
     }
   /* NUL-terminate the 'to' buffer and translate it to a cpp_string
@@ -1188,11 +1201,13 @@ cpp_interpret_string (cpp_reader *pfile,
   return false;
 }
 
-/* Subroutine of do_line and do_linemarker.  Convert escape sequences
-   in a string, but do not perform character set conversion.  */
+/* As above, but do not perform character set conversion.  Used when
+   the string is not written to the object file as data, but rather
+   communicates something to the compiler, e.g. #line, asm("opcode").  */
 bool
 cpp_interpret_string_notranslate (cpp_reader *pfile, const cpp_string *from,
-				  size_t count,	cpp_string *to, bool wide)
+				  size_t count,	cpp_string *to, bool wide,
+				  source_location loc)
 {
   struct cset_converter save_narrow_cset_desc = pfile->narrow_cset_desc;
   bool retval;
@@ -1200,7 +1215,7 @@ cpp_interpret_string_notranslate (cpp_re
   pfile->narrow_cset_desc.func = convert_no_conversion;
   pfile->narrow_cset_desc.cd = (iconv_t) -1;
 
-  retval = cpp_interpret_string (pfile, from, count, to, wide);
+  retval = cpp_interpret_string (pfile, from, count, to, wide, loc);
 
   pfile->narrow_cset_desc = save_narrow_cset_desc;
   return retval;
@@ -1213,7 +1228,8 @@ cpp_interpret_string_notranslate (cpp_re
    cpp_interpret_charconst.  */
 static cppchar_t
 narrow_str_to_charconst (cpp_reader *pfile, cpp_string str,
-			 unsigned int *pchars_seen, int *unsignedp)
+			 unsigned int *pchars_seen, int *unsignedp,
+			 source_location loc)
 {
   size_t width = CPP_OPTION (pfile, char_precision);
   size_t max_chars = CPP_OPTION (pfile, int_precision) / width;
@@ -1245,11 +1261,12 @@ narrow_str_to_charconst (cpp_reader *pfi
   if (i > max_chars)
     {
       i = max_chars;
-      cpp_error (pfile, CPP_DL_WARNING,
+      cpp_error_with_line (pfile, CPP_DL_WARNING, loc, 0,
 		 "character constant too long for its type");
     }
   else if (i > 1 && CPP_OPTION (pfile, warn_multichar))
-    cpp_error (pfile, CPP_DL_WARNING, "multi-character character constant");
+    cpp_error_with_line (pfile, CPP_DL_WARNING, loc, 0,
+			 "multi-character character constant");
 
   /* Multichar constants are of type int and therefore signed.  */
   if (i > 1)
@@ -1282,7 +1299,8 @@ narrow_str_to_charconst (cpp_reader *pfi
    cpp_interpret_charconst.  */
 static cppchar_t
 wide_str_to_charconst (cpp_reader *pfile, cpp_string str,
-		       unsigned int *pchars_seen, int *unsignedp)
+		       unsigned int *pchars_seen, int *unsignedp,
+		       source_location loc)
 {
   bool bigend = CPP_OPTION (pfile, bytes_big_endian);
   size_t width = CPP_OPTION (pfile, wchar_precision);
@@ -1308,7 +1326,7 @@ wide_str_to_charconst (cpp_reader *pfile
      character exactly fills a wchar_t, so a multi-character wide
      character constant is guaranteed to overflow.  */
   if (off > 0)
-    cpp_error (pfile, CPP_DL_WARNING,
+    cpp_error_with_line (pfile, CPP_DL_WARNING, loc, 0,
 	       "character constant too long for its type");
 
   /* Truncate the constant to its natural width, and simultaneously
@@ -1337,20 +1355,22 @@ cpp_interpret_charconst (cpp_reader *pfi
   cpp_string str = { 0, 0 };
   bool wide = (token->type == CPP_WCHAR);
   cppchar_t result;
+  source_location loc = token->src_loc;
 
   /* an empty constant will appear as L'' or '' */
   if (token->val.str.len == (size_t) (2 + wide))
     {
-      cpp_error (pfile, CPP_DL_ERROR, "empty character constant");
+      cpp_error_with_line (pfile, CPP_DL_ERROR, loc, 0,
+			   "empty character constant");
       return 0;
     }
-  else if (!cpp_interpret_string (pfile, &token->val.str, 1, &str, wide))
+  else if (!cpp_interpret_string (pfile, &token->val.str, 1, &str, wide, loc))
     return 0;
 
   if (wide)
-    result = wide_str_to_charconst (pfile, str, pchars_seen, unsignedp);
+    result = wide_str_to_charconst (pfile, str, pchars_seen, unsignedp, loc);
   else
-    result = narrow_str_to_charconst (pfile, str, pchars_seen, unsignedp);
+    result = narrow_str_to_charconst (pfile, str, pchars_seen, unsignedp, loc);
 
   if (str.text != token->val.str.text)
     free ((void *)str.text);
@@ -1358,6 +1378,40 @@ cpp_interpret_charconst (cpp_reader *pfi
   return result;
 }
 
+/* Utility routine for use by front ends.  STR is a NUL-terminated
+   string in the source character set.  Convert it to the execution
+   character set.  WIDE indicates whether we want the narrow or the
+   wide set.  Does not process escape sequences.  Returns a pointer to
+   the converted, NUL-terminated string, or NULL on failure.  No
+   diagnostic is issued on failure, as the caller is probably going
+   to issue an ICE.  */
+char *
+cpp_convert_to_exec_charset (cpp_reader *pfile, const char *str, bool wide)
+{
+  const unsigned char *ustr = (const unsigned char *)str;
+  size_t len = strlen (str);
+  struct _cpp_strbuf tbuf;
+  struct cset_converter cvt
+    = wide ? pfile->wide_cset_desc : pfile->narrow_cset_desc;
+
+  tbuf.asize = MAX (OUTBUF_BLOCK_SIZE, len);
+  tbuf.text = xmalloc (tbuf.asize);
+  tbuf.len = 0;
+
+  if (APPLY_CONVERSION (cvt, ustr, len, &tbuf))
+    {
+      emit_numeric_escape (pfile, 0, &tbuf, wide);  /* NUL terminate */
+      tbuf.text = xrealloc (tbuf.text, tbuf.len);
+      return (char *)tbuf.text;
+    }
+  else
+    {
+      free (tbuf.text);
+      return 0;
+    }
+}
+
+
 /* Convert an input buffer (containing the complete contents of one
    source file) from INPUT_CHARSET to the source character set.  INPUT
    points to the input buffer, SIZE is its allocated size, and LEN is
===================================================================
Index: libcpp/directives.c
--- libcpp/directives.c	2 Jan 2005 01:32:21 -0000	1.11
+++ libcpp/directives.c	10 Feb 2005 00:32:28 -0000
@@ -802,7 +802,7 @@ do_line (cpp_reader *pfile)
     {
       cpp_string s = { 0, 0 };
       if (cpp_interpret_string_notranslate (pfile, &token->val.str, 1,
-					    &s, false))
+					    &s, false, token->src_loc))
 	new_file = (const char *)s.text;
       check_eol (pfile);
     }
@@ -855,7 +855,7 @@ do_linemarker (cpp_reader *pfile)
     {
       cpp_string s = { 0, 0 };
       if (cpp_interpret_string_notranslate (pfile, &token->val.str,
-					    1, &s, false))
+					    1, &s, false, token->src_loc))
 	new_file = (const char *)s.text;
 
       new_sysp = 0;
@@ -949,7 +949,7 @@ do_ident (cpp_reader *pfile)
   if (str->type != CPP_STRING)
     cpp_error (pfile, CPP_DL_ERROR, "invalid #ident directive");
   else if (pfile->cb.ident)
-    pfile->cb.ident (pfile, pfile->directive_line, &str->val.str);
+    pfile->cb.ident (pfile, str);
 
   check_eol (pfile);
 }
===================================================================
Index: libcpp/internal.h
--- libcpp/internal.h	2 Jan 2005 01:32:21 -0000	1.11
+++ libcpp/internal.h	10 Feb 2005 00:32:29 -0000
@@ -565,7 +565,7 @@ extern size_t _cpp_replacement_text_len 
 
 /* In charset.c.  */
 extern cppchar_t _cpp_valid_ucn (cpp_reader *, const unsigned char **,
-				 const unsigned char *, int);
+				 const unsigned char *, int, source_location);
 extern void _cpp_destroy_iconv (cpp_reader *);
 extern unsigned char *_cpp_convert_input (cpp_reader *, const char *,
 					  unsigned char *, size_t, size_t,
===================================================================
Index: libcpp/lex.c
--- libcpp/lex.c	9 Sep 2004 19:16:55 -0000	1.5
+++ libcpp/lex.c	10 Feb 2005 00:32:29 -0000
@@ -456,8 +456,11 @@ forms_identifier_p (cpp_reader *pfile, i
   if (0 && *buffer->cur == '\\'
       && (buffer->cur[1] == 'u' || buffer->cur[1] == 'U'))
     {
+      /* UCNs in identifiers are not accepted anyway in traditional mode,
+	 so we needn't worry about them.  */
+      source_location loc = pfile->cur_token[-1].src_loc;
       buffer->cur += 2;
-      if (_cpp_valid_ucn (pfile, &buffer->cur, buffer->rlimit, 1 + !first))
+      if (_cpp_valid_ucn (pfile, &buffer->cur, buffer->rlimit, 1 + !first, loc))
 	return true;
       buffer->cur -= 2;
     }
@@ -566,8 +569,8 @@ create_literal (cpp_reader *pfile, cpp_t
 
 /* Lexes a string, character constant, or angle-bracketed header file
    name.  The stored string contains the spelling, including opening
-   quote and leading any leading 'L'.  It returns the type of the
-   literal, or CPP_OTHER if it was not properly terminated.
+   quote and any leading 'L'.  It returns the type of the literal, or
+   CPP_OTHER if it was not properly terminated.
 
    The spelling is NUL-terminated, but it is not guaranteed that this
    is the first NUL since embedded NULs are preserved.  */
===================================================================
Index: libcpp/include/cpplib.h
--- libcpp/include/cpplib.h	11 Jan 2005 18:24:12 -0000	1.8
+++ libcpp/include/cpplib.h	10 Feb 2005 00:32:51 -0000
@@ -442,7 +442,7 @@ struct cpp_callbacks
 		   const char *, int);
   void (*define) (cpp_reader *, unsigned int, cpp_hashnode *);
   void (*undef) (cpp_reader *, unsigned int, cpp_hashnode *);
-  void (*ident) (cpp_reader *, unsigned int, const cpp_string *);
+  void (*ident) (cpp_reader *, const cpp_token *);
   void (*def_pragma) (cpp_reader *, unsigned int);
   int (*valid_pch) (cpp_reader *, const char *, int);
   void (*read_pch) (cpp_reader *, const char *, int, const char *);
@@ -654,10 +654,12 @@ extern cppchar_t cpp_interpret_charconst
 /* Evaluate a vector of CPP_STRING or CPP_WSTRING tokens.  */
 extern bool cpp_interpret_string (cpp_reader *,
 				  const cpp_string *, size_t,
-				  cpp_string *, bool);
+				  cpp_string *, bool, source_location);
 extern bool cpp_interpret_string_notranslate (cpp_reader *,
 					      const cpp_string *, size_t,
-					      cpp_string *, bool);
+					      cpp_string *, bool,
+					      source_location);
+extern char *cpp_convert_to_exec_charset (cpp_reader *, const char *, bool);
 
 /* Used to register macros and assertions, perhaps from the command line.
    The text is the same as the command line argument.  */
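
For readers skimming the patch, the recurring change in charset.c is that
each diagnostic helper now takes the token's location as an explicit
argument rather than relying on the reader's notion of the current line.
The following is a minimal standalone sketch of that pattern, not libcpp
code: loc_t, struct reader, report_at and check_escapes are hypothetical
stand-ins for source_location, cpp_reader, cpp_error_with_line and
convert_escape, and the escape checking is deliberately simplified.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for source_location; not a libcpp type.  */
typedef unsigned int loc_t;

/* Hypothetical stand-in for cpp_reader.  */
struct reader
{
  loc_t cur_loc;        /* Where the lexer currently is.  */
  loc_t last_diag_loc;  /* Location attached to the latest diagnostic.  */
  int diag_count;       /* Number of diagnostics issued so far.  */
};

/* Report MSG at the explicit location LOC.  This plays the role that
   cpp_error_with_line plays in the patch; R->cur_loc is never read.  */
static void
report_at (struct reader *r, loc_t loc, const char *msg)
{
  r->last_diag_loc = loc;
  r->diag_count++;
  fprintf (stderr, "line %u: %s\n", loc, msg);
}

/* Scan the NUL-terminated spelling STR and complain about any escape
   other than \n, \t, \\ and \".  LOC is the location of the token that
   STR came from.  Returns the number of diagnostics issued.  */
static int
check_escapes (struct reader *r, const char *str, loc_t loc)
{
  int before;
  const char *p;

  assert (r != NULL && str != NULL);
  before = r->diag_count;

  for (p = str; *p; p++)
    if (*p == '\\' && p[1] != '\0')
      {
        p++;  /* Step onto the character after the backslash.  */
        if (strchr ("nt\\\"", *p) == NULL)
          report_at (r, loc, "unknown escape sequence");
      }

  return r->diag_count - before;
}
```

A caller that lexed the whole file first and interprets strings later, as
the C++ front end does, would pass the location saved with the token, so
the diagnostic is tagged with that line even though cur_loc has long since
moved on.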

