This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



PR 18785: alternative patch


PR 18785 reports, in essence, that now that we have this
-fexec-charset command line option, it needs to affect the values that
are currently hardwired as TARGET_NEWLINE, TARGET_DIGIT0, etc.  For
example, 

extern int isdigit(int);
int main(void) { return isdigit('3'); }

will be miscompiled with -fexec-charset=IBM1047 (aka EBCDIC).  Worse,
it will be miscompiled even on a target where IBM1047 is the default,
because the narrow-execution-charset setting only affects cpplib, not
the value of TARGET_DIGIT0.
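To make the failure concrete, here is a minimal sketch of the folded
test with the execution character set's code for '0' passed in as a
parameter.  The name folded_isdigit and the two constants are made up
for illustration; the values are the ASCII and IBM1047 codes for '0'.

```c
/* Hypothetical model of the folded form of isdigit (C):
   (unsigned) C - DIGIT0 <= 9, where DIGIT0 is the execution
   character set's code for '0'.  If C is below DIGIT0, the unsigned
   subtraction wraps around to a huge value, so a single comparison
   checks both ends of the digit range.  */
static int
folded_isdigit (int c, unsigned int digit0)
{
  return (unsigned int) c - digit0 <= 9;
}

/* The digit zero in ASCII and in IBM1047 (EBCDIC).  */
enum { ASCII_DIGIT0 = 0x30, EBCDIC_DIGIT0 = 0xF0 };
```

With the ASCII value hardwired, an IBM1047 '3' (0xF3) is rejected;
with the IBM1047 value 0xF0 it is accepted.  The arithmetic still
works when both operands are sign-extended, as they are under
-fsigned-char: (unsigned) -13 - (unsigned) -16 is 3.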

In December, Roger Sayle proposed a patch for this which attempted to
decide whether the host and target execution character sets were the
same; if not, it built up a large conversion table.  For reference,
that patch is archived at
<http://gcc.gnu.org/ml/gcc-patches/2004-12/msg01925.html>.

I'm not fond of this approach; I prefer to ask cpplib, case by case,
for the value of each character in the target execution character set.
This allows much simpler logic at the sites where the values are used,
and avoids introducing new globals.  It might be a little slower than
the big table, but it is unlikely that these values will ever be
queried on a critical path.
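For illustration only, this is what a per-character query looks like
from the caller's side: a hypothetical host_to_ibm1047 with the same
contract as the new cpp_host_to_exec_charset (ASCII host, basic source
characters only, 0 on failure).  The few hard-coded IBM1047 values
stand in for the iconv conversion that cpplib actually performs.

```c
/* Hypothetical per-character query.  C is a basic-source character in
   the (assumed ASCII) host encoding; the result is its single-byte
   IBM1047 code, or 0 if C is out of range or not covered here.  */
static unsigned int
host_to_ibm1047 (unsigned int c)
{
  if (c > 0x7e)                 /* Outside the ASCII basic range.  */
    return 0;
  if (c >= '0' && c <= '9')
    return 0xF0 + (c - '0');
  if (c >= 'a' && c <= 'i')
    return 0x81 + (c - 'a');
  if (c >= 'A' && c <= 'I')
    return 0xC1 + (c - 'A');
  switch (c)
    {
    case ' ':
      return 0x40;
    case '\n':
      return 0x25;
    default:
      return 0;                 /* Not covered by this sketch.  */
    }
}
```

A caller such as fold_builtin_isdigit asks about '0' alone and gives up
if it gets 0 back, so no table of all the values ever has to be built.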

The appended patch implements my preferred approach.  The bulk of it
is introducing the new cpplib API and the new langhook that wraps it.
Non-C-family front ends that implement -fexec-charset will need to do
their own thing; we can talk about refactoring cpplib to make its
charset.c usable apart from the rest of it (rather like mkdeps.c and
line-map.c are now) if that's appropriate, in the 4.1 timeframe.
The fix for the exact problem reported in PR18785 is then the very
simple change to builtins.c:fold_builtin_isdigit.  Joseph pointed out
back in December that printf handling needs similar treatment, but
let's get the basic approach nailed down first.
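The langhook layer is the usual table-of-function-pointers
arrangement.  A miniature version, with made-up names throughout,
looks roughly like this: the default entry returns its argument
unchanged (as lhd_to_target_charset does), and a front end may install
its own converter.

```c
/* Hypothetical miniature of the langhook arrangement; none of these
   names are GCC's.  */
typedef long (*to_target_charset_fn) (long);

struct mini_lang_hooks
{
  /* Convert a basic-source character to the target character set.  */
  to_target_charset_fn to_target_charset;
};

/* Default, like lhd_to_target_charset: assume host == target.  */
static long
identity_to_target_charset (long c)
{
  return c;
}

/* A front end with its own knowledge.  This toy converter only knows
   the IBM1047 digits and reports 0 ("unknown") for anything else.  */
static long
toy_ebcdic_to_target_charset (long c)
{
  return (c >= '0' && c <= '9') ? 0xF0 + (c - '0') : 0;
}

static const struct mini_lang_hooks default_hooks
  = { identity_to_target_charset };
static const struct mini_lang_hooks toy_ebcdic_hooks
  = { toy_ebcdic_to_target_charset };
```

Front ends outside the C family therefore keep their current behavior:
they inherit the identity default until they grow their own
-fexec-charset support.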

I also eliminated all other uses of the target-character-value
macros.  Besides the above, they were used in three places to
pretty-print string constants - either to be read by the user, or to
be read by the assembler.  This is definitely wrong when applied to
the assembler.  GCC's not expecting the assembler to do conversion;
when (hosted on an ASCII machine) it emits

  .string "1234\nabcd"

it is shorthand for

  .byte 49,50,51,52,10,97,98,99,100,0

even when -fexec-charset=IBM1047.  (If that string appeared in the
*input*, we'd see

  .string "\361\362\363\364%\201\202\203\204"

in the assembly.)  Now, the only way we can be certain that the
assembler is doing what we think is to stop using .string and similar
directives altogether, but I think it will work in practice to assume
that the assembler is working in the host character set.  Accordingly,
MIPS and ARM now just query ISPRINT to decide between an octal escape
and a literal character.  Interestingly, i386 is still generating "\n"
in its assembly output; I'm not sure where that's coming from.
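The .string/.byte equivalence can be checked mechanically.  This
hypothetical helper (format_byte_directive is not a GCC function)
writes out the raw byte values that a .string directive stands for,
including the terminating NUL, as decimal .byte operands.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: render STR plus its terminating NUL (which
   .string implies) as a ".byte" directive with decimal operands, i.e.
   the exact bytes GCC means when it emits .string STR.  Output is
   truncated, never overrun, if OUTSIZE is too small.  */
static void
format_byte_directive (const char *str, char *out, size_t outsize)
{
  size_t used = (size_t) snprintf (out, outsize, ".byte ");
  size_t n = strlen (str);

  for (size_t i = 0; i <= n && used < outsize; i++)
    used += (size_t) snprintf (out + used, outsize - used, "%s%u",
                               i ? "," : "",
                               (unsigned int) (unsigned char) str[i]);
}
```

On an ASCII host, format_byte_directive ("1234\nabcd", buf, sizeof buf)
yields ".byte 49,50,51,52,10,97,98,99,100,0".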

The question is a little murkier when applied to c-pretty-print.c.
However, I think that when looking at debug dumps, the user is going
to prefer to see strings that correspond directly to the values the
compiler is working with - in other words, they're in the same boat as
the assembler.  So I did the same thing there.
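The rule that mips_output_ascii and output_ascii_pseudo_op now share
(and that pp_c_char follows, additionally escaping ') fits in a few
lines.  This is a hypothetical stand-alone version, not GCC code: the
standard isprint, in the default C locale, stands in for GCC's
ISPRINT, and the caller supplies the output buffer.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical sketch of the escaping rule: bytes printable in the
   *host* character set are copied (with \ and " backslash-escaped);
   all others become three-digit octal escapes.  No named escapes
   such as \n are generated.  OUT must hold 4 * strlen (STR) + 1
   bytes.  */
static void
escape_for_ascii_directive (const char *str, char *out)
{
  size_t len = strlen (str);

  for (size_t i = 0; i < len; i++)
    {
      unsigned char c = (unsigned char) str[i];

      if (isprint (c))
        {
          if (c == '\\' || c == '"')
            *out++ = '\\';
          *out++ = (char) c;
        }
      else
        out += sprintf (out, "\\%03o", (unsigned int) c);
    }
  *out = '\0';
}
```

Given "1234\nabcd" it produces 1234\012abcd; given the IBM1047 bytes
for "12" it produces \361\362, because those bytes are not printable
in an ASCII host's character set.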

Lightly tested: the above testcase was compiled by an amd64-linux
native compiler four ways (no special options; -fno-signed-char;
-fexec-charset=IBM1047; and -fexec-charset=IBM1047 -fno-signed-char),
and the assembly output was inspected by hand in each case.
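Char signedness matters here because c_common_to_target_charset (in
the c-common.c hunk below) has to translate between GCC's convention
for character constants and cpplib's.  A minimal sketch of that round
trip, assuming 8-bit chars on host and target; the function names are
hypothetical, and it subtracts 0x100 instead of using the patch's
shift pair so that it stays portable C.

```c
/* Hypothetical sketch, assuming 8-bit chars on host and target.
   cpplib deals in unsigned character codes; GCC's character
   constants are sign-extended under -fsigned-char and zero-extended
   under -fno-signed-char.  */
#define CHAR_CODE_MASK 0xFFUL

/* Strip any sign extension before handing C to cpplib:
   -16 and 0xF0 both become 0xF0.  */
static unsigned long
to_cpplib_char (long c)
{
  return (unsigned long) c & CHAR_CODE_MASK;
}

/* Reapply GCC's convention to a code that cpplib returned.  */
static long
to_gcc_char_constant (unsigned long uc, int signed_char)
{
  uc &= CHAR_CODE_MASK;
  if (signed_char && (uc & 0x80))
    return (long) uc - 0x100;   /* e.g. 0xF0 -> -16 */
  return (long) uc;             /* e.g. 0xF0 -> 240 */
}
```

So under -fexec-charset=IBM1047, '0' comes back as -16 with signed
chars and as 240 with -fno-signed-char.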

Thoughts?

zw

libcpp:
        * charset.c (LAST_POSSIBLY_BASIC_SOURCE_CHAR): New helper macro.
        (cpp_host_to_exec_charset): New function.
        * include/cpplib.h: Declare cpp_host_to_exec_charset.

gcc:
        * langhooks.h (struct lang_hooks): Add to_target_charset.
        * langhooks.c (lhd_to_target_charset): New function.
        * langhooks-def.h: Declare lhd_to_target_charset.
        (LANG_HOOKS_TO_TARGET_CHARSET): New macro.
        (LANG_HOOKS_INITIALIZER): Update.
        * c-common.c (c_common_to_target_charset): New function.
        * c-common.h: Declare it.
        * c-objc-common.h (LANG_HOOKS_TO_TARGET_CHARSET): Set to
        c_common_to_target_charset.

        * defaults.h (TARGET_BELL, TARGET_BS, TARGET_CR, TARGET_DIGIT0)
        (TARGET_ESC, TARGET_FF, TARGET_NEWLINE, TARGET_TAB, TARGET_VT):
        Delete definitions.
        * system.h: Poison them.
        * doc/tm.texi: Don't discuss them.
        * builtins.c (fold_builtin_isdigit): Use lang_hooks.to_target_charset.
        * c-pretty-print.c (pp_c_integer_constant): Don't use pp_c_char.
        (pp_c_char): Do not attempt to generate letter escapes for
        newline, tab, etc.
        * config/arm/arm.c (output_ascii_pseudo_op): Likewise.
        * config/mips/mips.c (mips_output_ascii): Likewise.
gcc/cp:
        * cp-objcp-common.h (LANG_HOOKS_TO_TARGET_CHARSET): Set to
        c_common_to_target_charset.  Delete bogus comment.

===================================================================
Index: gcc/builtins.c
--- gcc/builtins.c	12 Feb 2005 11:34:20 -0000	1.422
+++ gcc/builtins.c	15 Feb 2005 02:07:10 -0000
@@ -7619,11 +7619,18 @@ fold_builtin_isdigit (tree arglist)
   else
     {
       /* Transform isdigit(c) -> (unsigned)(c) - '0' <= 9.  */
-      /* According to the C standard, isdigit is unaffected by locale.  */
-      tree arg = TREE_VALUE (arglist);
-      arg = fold_convert (unsigned_type_node, arg);
+      /* According to the C standard, isdigit is unaffected by locale.
+	 However, it definitely is affected by the target character set.  */
+      tree arg;
+      unsigned HOST_WIDE_INT target_digit0
+	= lang_hooks.to_target_charset ('0');
+
+      if (target_digit0 == 0)
+	return NULL_TREE;
+
+      arg = fold_convert (unsigned_type_node, TREE_VALUE (arglist));
       arg = build2 (MINUS_EXPR, unsigned_type_node, arg,
-		    build_int_cst (unsigned_type_node, TARGET_DIGIT0));
+		    build_int_cst (unsigned_type_node, target_digit0));
       arg = build2 (LE_EXPR, integer_type_node, arg,
 		    build_int_cst (unsigned_type_node, 9));
       arg = fold (arg);
===================================================================
Index: gcc/c-common.c
--- gcc/c-common.c	12 Feb 2005 00:26:44 -0000	1.604
+++ gcc/c-common.c	15 Feb 2005 02:07:10 -0000
@@ -5620,6 +5620,27 @@ c_warn_unused_result (tree *top_p)
     }
 }
 
+/* Convert a character from the host to the target execution character
+   set.  cpplib handles this, mostly.  */
+
+HOST_WIDE_INT
+c_common_to_target_charset (HOST_WIDE_INT c)
+{
+  /* Character constants in GCC proper are sign-extended under -fsigned-char,
+     zero-extended under -fno-signed-char.  cpplib insists that characters
+     and character constants are always unsigned.  Hence we must convert
+     back and forth.  */
+  cppchar_t uc = ((cppchar_t)c) & ((((cppchar_t)1) << CHAR_BIT)-1);
+
+  uc = cpp_host_to_exec_charset (parse_in, uc);
+
+  if (flag_signed_char)
+    return ((HOST_WIDE_INT)uc) << (HOST_BITS_PER_WIDE_INT - CHAR_TYPE_SIZE)
+			       >> (HOST_BITS_PER_WIDE_INT - CHAR_TYPE_SIZE);
+  else
+    return uc;
+}
+
 /* Build the result of __builtin_offsetof.  EXPR is a nested sequence of
    component references, with an INDIRECT_REF at the bottom; much like
    the traditional rendering of offsetof as a macro.  Returns the folded
===================================================================
Index: gcc/c-common.h
--- gcc/c-common.h	27 Jan 2005 07:32:11 -0000	1.276
+++ gcc/c-common.h	15 Feb 2005 02:07:10 -0000
@@ -688,12 +688,14 @@ extern bool c_promoting_integer_type_p (
 extern int self_promoting_args_p (tree);
 extern tree strip_array_types (tree);
 extern tree strip_pointer_operator (tree);
+extern HOST_WIDE_INT c_common_to_target_charset (HOST_WIDE_INT);
 
 /* This is the basic parsing function.  */
 extern void c_parse_file (void);
 /* This is misnamed, it actually performs end-of-compilation processing.  */
 extern void finish_file	(void);
 
+
 /* These macros provide convenient access to the various _STMT nodes.  */
 
 /* Nonzero if this statement should be considered a full-expression,
===================================================================
Index: gcc/c-objc-common.h
--- gcc/c-objc-common.h	2 Nov 2004 20:29:16 -0000	2.2
+++ gcc/c-objc-common.h	15 Feb 2005 02:07:10 -0000
@@ -117,6 +117,8 @@ extern void c_initialize_diagnostics (di
 #define LANG_HOOKS_TYPE_PROMOTES_TO c_type_promotes_to
 #undef LANG_HOOKS_REGISTER_BUILTIN_TYPE
 #define LANG_HOOKS_REGISTER_BUILTIN_TYPE c_register_builtin_type
+#undef LANG_HOOKS_TO_TARGET_CHARSET
+#define LANG_HOOKS_TO_TARGET_CHARSET c_common_to_target_charset
 
 /* The C front end's scoping structure is very different from
    that expected by the language-independent code; it is best
===================================================================
Index: gcc/c-pretty-print.c
--- gcc/c-pretty-print.c	7 Sep 2004 10:18:59 -0000	1.57
+++ gcc/c-pretty-print.c	15 Feb 2005 02:07:10 -0000
@@ -712,50 +712,37 @@ pp_c_function_definition (c_pretty_print
 
 /* Expressions.  */
 
-/* Print out a c-char.  */
+/* Print out a c-char.  This is called solely for characters which are
+   in the *target* execution character set.  We ought to convert them
+   back to the *host* execution character set before printing, but we
+   have no way to do this at present.  A decent compromise is to print
+   all characters as if they were in the host execution character set,
+   and not attempt to recover any named escape characters, but render
+   all unprintables as octal escapes.  If the host and target character
+   sets are the same, this produces relatively readable output.  If they
+   are not the same, strings may appear as gibberish, but that's okay
+   (in fact, it may well be what the reader wants, e.g. if they are looking
+   to see if conversion to the target character set happened correctly).
+
+   A special case: we need to prefix \, ", and ' with backslashes.  It is
+   correct to do so for the *host*'s \, ", and ', because the rest of the
+   file appears in the host character set.  */
 
 static void
 pp_c_char (c_pretty_printer *pp, int c)
 {
-  switch (c)
+  if (ISPRINT (c))
     {
-    case TARGET_NEWLINE:
-      pp_string (pp, "\\n");
-      break;
-    case TARGET_TAB:
-      pp_string (pp, "\\t");
-      break;
-    case TARGET_VT:
-      pp_string (pp, "\\v");
-      break;
-    case TARGET_BS:
-      pp_string (pp, "\\b");
-      break;
-    case TARGET_CR:
-      pp_string (pp, "\\r");
-      break;
-    case TARGET_FF:
-      pp_string (pp, "\\f");
-      break;
-    case TARGET_BELL:
-      pp_string (pp, "\\a");
-      break;
-    case '\\':
-      pp_string (pp, "\\\\");
-      break;
-    case '\'':
-      pp_string (pp, "\\'");
-      break;
-    case '\"':
-      pp_string (pp, "\\\"");
-      break;
-    default:
-      if (ISPRINT (c))
-	pp_character (pp, c);
-      else
-	pp_scalar (pp, "\\%03o", (unsigned) c);
-      break;
+      switch (c)
+	{
+	case '\\': pp_string (pp, "\\\\"); break;
+	case '\'': pp_string (pp, "\\\'"); break;
+	case '\"': pp_string (pp, "\\\""); break;
+	default:   pp_character (pp, c);
+	}
     }
+  else
+    pp_scalar (pp, "\\%03o", (unsigned) c);
 }
 
 /* Print out a STRING literal.  */
@@ -785,7 +772,7 @@ pp_c_integer_constant (c_pretty_printer 
     {
       if (tree_int_cst_sgn (i) < 0)
         {
-          pp_c_char (pp, '-');
+          pp_character (pp, '-');
           i = build_int_cst_wide (NULL_TREE,
 				  -TREE_INT_CST_LOW (i),
 				  ~TREE_INT_CST_HIGH (i)
===================================================================
Index: gcc/defaults.h
--- gcc/defaults.h	18 Jan 2005 11:36:04 -0000	1.167
+++ gcc/defaults.h	15 Feb 2005 02:07:10 -0000
@@ -36,19 +36,6 @@ Software Foundation, 59 Temple Place - S
 		  obstack_chunk_alloc,			\
 		  obstack_chunk_free)
 
-/* Define default standard character escape sequences.  */
-#ifndef TARGET_BELL
-#  define TARGET_BELL 007
-#  define TARGET_BS 010
-#  define TARGET_CR 015
-#  define TARGET_DIGIT0 060
-#  define TARGET_ESC 033
-#  define TARGET_FF 014
-#  define TARGET_NEWLINE 012
-#  define TARGET_TAB 011
-#  define TARGET_VT 013
-#endif
-
 /* Store in OUTPUT a string (made with alloca) containing an
    assembler-name for a local static variable or function named NAME.
    LABELNO is an integer which is different for each call.  */
===================================================================
Index: gcc/langhooks-def.h
--- gcc/langhooks-def.h	18 Jan 2005 11:36:15 -0000	1.96
+++ gcc/langhooks-def.h	15 Feb 2005 02:07:10 -0000
@@ -68,6 +68,7 @@ extern bool lhd_decl_ok_for_sibcall (tre
 extern const char *lhd_comdat_group (tree);
 extern tree lhd_expr_size (tree);
 extern size_t lhd_tree_size (enum tree_code);
+extern HOST_WIDE_INT lhd_to_target_charset (HOST_WIDE_INT);
 
 /* Declarations of default tree inlining hooks.  */
 extern tree lhd_tree_inlining_walk_subtrees (tree *, int *, walk_tree_fn,
@@ -122,6 +123,7 @@ extern int lhd_gimplify_expr (tree *, tr
 #define LANG_HOOKS_TREE_SIZE		lhd_tree_size
 #define LANG_HOOKS_TYPES_COMPATIBLE_P	lhd_types_compatible_p
 #define LANG_HOOKS_BUILTIN_FUNCTION	builtin_function
+#define LANG_HOOKS_TO_TARGET_CHARSET	lhd_to_target_charset
 
 #define LANG_HOOKS_FUNCTION_INIT	lhd_do_nothing_f
 #define LANG_HOOKS_FUNCTION_FINAL	lhd_do_nothing_f
@@ -285,6 +287,7 @@ extern tree lhd_make_node (enum tree_cod
   LANG_HOOKS_GET_CALLEE_FNDECL, \
   LANG_HOOKS_PRINT_ERROR_FUNCTION, \
   LANG_HOOKS_EXPR_SIZE, \
+  LANG_HOOKS_TO_TARGET_CHARSET, \
   LANG_HOOKS_ATTRIBUTE_TABLE, \
   LANG_HOOKS_COMMON_ATTRIBUTE_TABLE, \
   LANG_HOOKS_FORMAT_ATTRIBUTE_TABLE, \
===================================================================
Index: gcc/langhooks.c
--- gcc/langhooks.c	14 Oct 2004 23:15:19 -0000	1.79
+++ gcc/langhooks.c	15 Feb 2005 02:07:12 -0000
@@ -546,3 +546,9 @@ lhd_make_node (enum tree_code code)
 {
   return make_node (code);
 }
+
+HOST_WIDE_INT
+lhd_to_target_charset (HOST_WIDE_INT c)
+{
+  return c;
+}
===================================================================
Index: gcc/langhooks.h
--- gcc/langhooks.h	14 Oct 2004 23:15:20 -0000	1.102
+++ gcc/langhooks.h	15 Feb 2005 02:07:12 -0000
@@ -374,6 +374,15 @@ struct lang_hooks
      semantics in cases that it doesn't want to handle specially.  */
   tree (*expr_size) (tree);
 
+  /* Convert a character from the host's to the target's character
+     set.  The character should be in what C calls the "basic source
+     character set" (roughly, the set of characters defined by plain
+     old ASCII).  The default is to return the character unchanged,
+     which is correct in most circumstances.  Note that both argument
+     and result should be sign-extended under -fsigned-char,
+     zero-extended under -fno-signed-char.  */
+  HOST_WIDE_INT (*to_target_charset) (HOST_WIDE_INT);
+
   /* Pointers to machine-independent attribute tables, for front ends
      using attribs.c.  If one is NULL, it is ignored.  Respectively, a
      table of attributes specific to the language, a table of
===================================================================
Index: gcc/system.h
--- gcc/system.h	30 Dec 2004 03:07:37 -0000	1.242
+++ gcc/system.h	15 Feb 2005 02:07:13 -0000
@@ -660,7 +660,8 @@ extern void fancy_abort (const char *, i
 	PUT_SDB_SRC_FILE STABS_GCC_MARKER DBX_OUTPUT_FUNCTION_END	   \
 	DBX_OUTPUT_GCC_MARKER DBX_FINISH_SYMBOL SDB_GENERATE_FAKE	   \
 	NON_SAVING_SETJMP TARGET_LATE_RTL_PROLOGUE_EPILOGUE		   \
-	CASE_DROPS_THROUGH
+	CASE_DROPS_THROUGH TARGET_BELL TARGET_BS TARGET_CR TARGET_DIGIT0   \
+        TARGET_ESC TARGET_FF TARGET_NEWLINE TARGET_TAB TARGET_VT
 
 /* Hooks that are no longer used.  */
  #pragma GCC poison LANG_HOOKS_FUNCTION_MARK LANG_HOOKS_FUNCTION_FREE	\
===================================================================
Index: gcc/config/arm/arm.c
--- gcc/config/arm/arm.c	1 Feb 2005 14:06:51 -0000	1.428
+++ gcc/config/arm/arm.c	15 Feb 2005 02:07:14 -0000
@@ -8634,8 +8634,14 @@ int_log2 (HOST_WIDE_INT power)
   return shift;
 }
 
-/* Output a .ascii pseudo-op, keeping track of lengths.  This is because
-   /bin/as is horribly restrictive.  */
+/* Output a .ascii pseudo-op, keeping track of lengths.  This is
+   because /bin/as is horribly restrictive.  The judgement about
+   whether or not each character is 'printable' (and can be output as
+   is) or not (and must be printed with an octal escape) must be made
+   with reference to the *host* character set -- the situation is
+   similar to that discussed in the comments above pp_c_char in
+   c-pretty-print.c.  */
+
 #define MAX_ASCII_LEN 51
 
 void
@@ -8656,57 +8662,20 @@ output_ascii_pseudo_op (FILE *stream, co
 	  len_so_far = 0;
 	}
 
-      switch (c)
+      if (ISPRINT (c))
 	{
-	case TARGET_TAB:
-	  fputs ("\\t", stream);
-	  len_so_far += 2;
-	  break;
-
-	case TARGET_FF:
-	  fputs ("\\f", stream);
-	  len_so_far += 2;
-	  break;
-
-	case TARGET_BS:
-	  fputs ("\\b", stream);
-	  len_so_far += 2;
-	  break;
-
-	case TARGET_CR:
-	  fputs ("\\r", stream);
-	  len_so_far += 2;
-	  break;
-
-	case TARGET_NEWLINE:
-	  fputs ("\\n", stream);
-	  c = p [i + 1];
-	  if ((c >= ' ' && c <= '~')
-	      || c == TARGET_TAB)
-	    /* This is a good place for a line break.  */
-	    len_so_far = MAX_ASCII_LEN;
-	  else
-	    len_so_far += 2;
-	  break;
-
-	case '\"':
-	case '\\':
-	  putc ('\\', stream);
-	  len_so_far++;
-	  /* Drop through.  */
-
-	default:
-	  if (c >= ' ' && c <= '~')
+	  if (c == '\\' || c == '\"')
 	    {
-	      putc (c, stream);
+	      putc ('\\', stream);
 	      len_so_far++;
 	    }
-	  else
-	    {
-	      fprintf (stream, "\\%03o", c);
-	      len_so_far += 4;
-	    }
-	  break;
+	  putc (c, stream);
+	  len_so_far++;
+	}
+      else
+	{
+	  fprintf (stream, "\\%03o", c);
+	  len_so_far += 4;
 	}
     }
 
===================================================================
Index: gcc/config/mips/mips.c
--- gcc/config/mips/mips.c	7 Feb 2005 15:53:35 -0000	1.485
+++ gcc/config/mips/mips.c	15 Feb 2005 02:07:16 -0000
@@ -5135,56 +5135,20 @@ mips_output_ascii (FILE *stream, const c
     {
       register int c = string[i];
 
-      switch (c)
+      if (ISPRINT (c))
 	{
-	case '\"':
-	case '\\':
-	  putc ('\\', stream);
-	  putc (c, stream);
-	  cur_pos += 2;
-	  break;
-
-	case TARGET_NEWLINE:
-	  fputs ("\\n", stream);
-	  if (i+1 < len
-	      && (((c = string[i+1]) >= '\040' && c <= '~')
-		  || c == TARGET_TAB))
-	    cur_pos = 32767;		/* break right here */
-	  else
-	    cur_pos += 2;
-	  break;
-
-	case TARGET_TAB:
-	  fputs ("\\t", stream);
-	  cur_pos += 2;
-	  break;
-
-	case TARGET_FF:
-	  fputs ("\\f", stream);
-	  cur_pos += 2;
-	  break;
-
-	case TARGET_BS:
-	  fputs ("\\b", stream);
-	  cur_pos += 2;
-	  break;
-
-	case TARGET_CR:
-	  fputs ("\\r", stream);
-	  cur_pos += 2;
-	  break;
-
-	default:
-	  if (c >= ' ' && c < 0177)
+	  if (c == '\\' || c == '\"')
 	    {
-	      putc (c, stream);
+	      putc ('\\', stream);
 	      cur_pos++;
 	    }
-	  else
-	    {
-	      fprintf (stream, "\\%03o", c);
-	      cur_pos += 4;
-	    }
+	  putc (c, stream);
+	  cur_pos++;
+	}
+      else
+	{
+	  fprintf (stream, "\\%03o", c);
+	  cur_pos += 4;
 	}
 
       if (cur_pos > 72 && i+1 < len)
===================================================================
Index: gcc/cp/cp-objcp-common.h
--- gcc/cp/cp-objcp-common.h	2 Nov 2004 20:29:21 -0000	1.5
+++ gcc/cp/cp-objcp-common.h	15 Feb 2005 02:07:16 -0000
@@ -159,6 +159,8 @@ extern tree objcp_tsubst_copy_and_build 
 #define LANG_HOOKS_TYPE_PROMOTES_TO cxx_type_promotes_to
 #undef LANG_HOOKS_REGISTER_BUILTIN_TYPE
 #define LANG_HOOKS_REGISTER_BUILTIN_TYPE c_register_builtin_type
+#undef LANG_HOOKS_TO_TARGET_CHARSET
+#define LANG_HOOKS_TO_TARGET_CHARSET c_common_to_target_charset
 #undef LANG_HOOKS_GIMPLIFY_EXPR
 #define LANG_HOOKS_GIMPLIFY_EXPR cp_gimplify_expr
 
===================================================================
Index: gcc/doc/tm.texi
--- gcc/doc/tm.texi	30 Jan 2005 15:36:09 -0000	1.411
+++ gcc/doc/tm.texi	15 Feb 2005 02:07:24 -0000
@@ -31,7 +31,6 @@ through the macros defined in the @file{
 * Per-Function Data::   Defining data structures for per-function information.
 * Storage Layout::      Defining sizes and alignments of data.
 * Type Layout::         Defining sizes and properties of basic user data types.
-* Escape Sequences::    Defining the value of target character escape sequences
 * Registers::           Naming and describing the hardware registers.
 * Register Classes::    Defining the classes of hardware registers.
 * Stack and Calling::   Defining which way the stack grows and by how much.
@@ -1816,42 +1815,6 @@ specified by @code{TARGET_VTABLE_ENTRY_A
 of words in each data entry.
 @end defmac
 
-@node Escape Sequences
-@section Target Character Escape Sequences
-@cindex escape sequences
-
-By default, GCC assumes that the C character escape sequences and other
-characters take on their ASCII values for the target.  If this is not
-correct, you must explicitly define all of the macros below.  All of
-them must evaluate to constants; they are used in @code{case}
-statements.
-
-@findex TARGET_BELL
-@findex TARGET_BS
-@findex TARGET_CR
-@findex TARGET_DIGIT0
-@findex TARGET_ESC
-@findex TARGET_FF
-@findex TARGET_NEWLINE
-@findex TARGET_TAB
-@findex TARGET_VT
-@multitable {@code{TARGET_NEWLINE}} {Escape} {ASCII character}
-@item Macro                 @tab Escape             @tab ASCII character
-@item @code{TARGET_BELL}    @tab @kbd{\a}           @tab @code{07}, @code{BEL}
-@item @code{TARGET_BS}      @tab @kbd{\b}           @tab @code{08}, @code{BS}
-@item @code{TARGET_CR}      @tab @kbd{\r}           @tab @code{0D}, @code{CR}
-@item @code{TARGET_DIGIT0}  @tab @kbd{0}            @tab @code{30}, @code{ZERO}
-@item @code{TARGET_ESC}     @tab @kbd{\e}, @kbd{\E} @tab @code{1B}, @code{ESC}
-@item @code{TARGET_FF}      @tab @kbd{\f}           @tab @code{0C}, @code{FF}
-@item @code{TARGET_NEWLINE} @tab @kbd{\n}           @tab @code{0A}, @code{LF}
-@item @code{TARGET_TAB}     @tab @kbd{\t}           @tab @code{09}, @code{HT}
-@item @code{TARGET_VT}      @tab @kbd{\v}           @tab @code{0B}, @code{VT}
-@end multitable
-
-@noindent
-Note that the @kbd{\e} and @kbd{\E} escapes are GNU extensions, not
-part of the C standard.
-
 @node Registers
 @section Register Usage
 @cindex register usage
===================================================================
Index: libcpp/charset.c
--- libcpp/charset.c	18 Sep 2004 00:56:19 -0000	1.3
+++ libcpp/charset.c	15 Feb 2005 02:07:51 -0000
@@ -81,8 +81,10 @@ Foundation, 59 Temple Place - Suite 330,
 
 #if HOST_CHARSET == HOST_CHARSET_ASCII
 #define SOURCE_CHARSET "UTF-8"
+#define LAST_POSSIBLY_BASIC_SOURCE_CHAR 0x7e
 #elif HOST_CHARSET == HOST_CHARSET_EBCDIC
 #define SOURCE_CHARSET "UTF-EBCDIC"
+#define LAST_POSSIBLY_BASIC_SOURCE_CHAR 0xFF
 #else
 #error "Unrecognized basic host character set"
 #endif
@@ -714,6 +716,63 @@ _cpp_destroy_iconv (cpp_reader *pfile)
     }
 }
 
+/* Utility routine for use by a full compiler.  C is a character taken
+   from the *basic* source character set, encoded in the host's
+   execution encoding.  Convert it to (the target's) execution
+   encoding, and return that value.
+
+   Issues an internal error if C's representation in the narrow
+   execution character set fails to be a single-byte value (C99
+   5.2.1p3: "The representation of each member of the source and
+   execution character sets shall fit in a byte.")  May also issue an
+   internal error if C fails to be a member of the basic source
+   character set (testing this exactly is too hard, especially when
+   the host character set is EBCDIC).  */
+cppchar_t
+cpp_host_to_exec_charset (cpp_reader *pfile, cppchar_t c)
+{
+  uchar sbuf[1];
+  struct _cpp_strbuf tbuf;
+
+  /* This test is merely an approximation, but it suffices to catch
+     the most important thing, which is that we don't get handed a
+     character outside the unibyte range of the host character set.  */
+  if (c > LAST_POSSIBLY_BASIC_SOURCE_CHAR)
+    {
+      cpp_error (pfile, CPP_DL_ICE,
+		 "character 0x%lx is not in the basic source character set",
+		 (unsigned long)c);
+      return 0;
+    }
+
+  /* Being a character in the unibyte range of the host character set,
+     we can safely splat it into a one-byte buffer and trust that that
+     is a well-formed string.  */
+  sbuf[0] = c;
+
+  /* This should never need to reallocate, but just in case... */
+  tbuf.asize = 1;
+  tbuf.text = xmalloc (tbuf.asize);
+  tbuf.len = 0;
+
+  if (!APPLY_CONVERSION (pfile->narrow_cset_desc, sbuf, 1, &tbuf))
+    {
+      cpp_errno (pfile, CPP_DL_ICE, "converting to execution character set");
+      return 0;
+    }
+  if (tbuf.len != 1)
+    {
+      cpp_error (pfile, CPP_DL_ICE,
+		 "character 0x%lx is not unibyte in execution character set",
+		 (unsigned long)c);
+      return 0;
+    }
+  c = tbuf.text[0];
+  free (tbuf.text);
+  return c;
+}
+
+
 
 /* Utility routine that computes a mask of the form 0000...111... with
    WIDTH 1-bits.  */
@@ -727,8 +786,6 @@ width_to_mask (size_t width)
     return ((size_t) 1 << width) - 1;
 }
 
-
-
 /* Returns 1 if C is valid in an identifier, 2 if C is valid except at
    the start of an identifier, and 0 if C is not valid in an
    identifier.  We assume C has already gone through the checks of
===================================================================
Index: libcpp/include/cpplib.h
--- libcpp/include/cpplib.h	11 Jan 2005 18:24:12 -0000	1.8
+++ libcpp/include/cpplib.h	15 Feb 2005 02:08:02 -0000
@@ -659,6 +659,9 @@ extern bool cpp_interpret_string_notrans
 					      const cpp_string *, size_t,
 					      cpp_string *, bool);
 
+/* Convert a host character constant to the execution character set.  */
+extern cppchar_t cpp_host_to_exec_charset (cpp_reader *, cppchar_t);
+
 /* Used to register macros and assertions, perhaps from the command line.
    The text is the same as the command line argument.  */
 extern void cpp_define (cpp_reader *, const char *);
@@ -743,12 +746,6 @@ cpp_num cpp_num_sign_extend (cpp_num, si
 #define CPP_DL_WARNING_P(l)	(CPP_DL_EXTRACT (l) >= CPP_DL_WARNING \
 				 && CPP_DL_EXTRACT (l) <= CPP_DL_PEDWARN)
 
-/* N.B. The error-message-printer prototypes have not been nicely
-   formatted because exgettext needs to see 'msgid' on the same line
-   as the name of the function in order to work properly.  Only the
-   string argument gets a name in an effort to keep the lines from
-   getting ridiculously oversized.  */
-
 /* Output a diagnostic of some kind.  */
 extern void cpp_error (cpp_reader *, int, const char *msgid, ...)
   ATTRIBUTE_PRINTF_3;

