This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



[PATCH] Revision 3: utf-16 and utf-32 support in C and C++


Oracle has a full copyright assignment in place with the FSF.

Please refer to the following message in the archives for the original
posting of this patch:

	http://gcc.gnu.org/ml/gcc-patches/2008-03/msg00827.html

and the previous revisions in:

	http://gcc.gnu.org/ml/gcc-patches/2008-03/msg01474.html
	http://gcc.gnu.org/ml/gcc-patches/2008-03/msg02025.html

This third revision addresses further feedback provided on this list.  The
patch is not incremental: it replaces the previous posting, and the ChangeLog
entries in this message likewise replace the original entries.  The changes
relative to the previous revision are described below.

This patch does not yet contain documentation (for the extensions section),
because feedback may still cause changes to be made.  I'll be working on
the documentation in the next few days, so that the code and the
documentation can be finalised together.

- #if 0 constructs that were left in the code accidentally have been removed.
- Mangling no longer depends on C++0x mode.  The compiler will simply not
  use those types unless C++0x is enabled, so there is no need to make this
  conditional.
- The types are created regardless of the compiler mode, yet they are not
  registered as builtin types unless C++0x mode is enabled.  Disabling the
  recognition of the char16_t/char32_t keywords in non-C++0x mode was not
  sufficient to ensure the compiler would not use these types.
- The parsing of [uU]["']...["'] literals is now controlled by a new flag,
  uliterals, in struct lang_flags and struct cpp_options, and the flag is
  set for the modes in which these literals are valid.
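
As an illustration only (not part of the patch), here is a minimal sketch of
what the new literals look like from user code.  It assumes a compiler that
already accepts u"..." and U"..." and encodes them as UTF-16 and UTF-32.  The
names unit16, unit32, len16, len32, encode_utf16, sample16 and sample32 are
hypothetical; the typedefs rely on C11 defining char16_t and char32_t as the
same types as uint_least16_t and uint_least32_t.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical aliases for the element types of u"..." and U"...".  */
typedef uint_least16_t unit16;
typedef uint_least32_t unit32;

/* Number of 16-bit code units before the terminating zero.  */
size_t len16 (const unit16 *s)
{
    size_t n = 0;
    assert (s != NULL);
    while (s[n] != 0)
        n++;
    return n;
}

/* Number of 32-bit code units before the terminating zero.  */
size_t len32 (const unit32 *s)
{
    size_t n = 0;
    assert (s != NULL);
    while (s[n] != 0)
        n++;
    return n;
}

/* Split a code point into UTF-16 code units and return how many were
   written (1 or 2).  Assumes CP is a valid Unicode scalar value.  */
int encode_utf16 (unit32 cp, unit16 out[2])
{
    if (cp < 0x10000)
    {
        out[0] = (unit16) cp;
        return 1;
    }
    cp -= 0x10000;
    out[0] = (unit16) (0xD800 + (cp >> 10));    /* high surrogate */
    out[1] = (unit16) (0xDC00 + (cp & 0x3FF));  /* low surrogate */
    return 2;
}

/* The same text written with each prefix.  */
const unit16 *sample16 (void) { return u"a\U00064321"; }
const unit32 *sample32 (void) { return U"a\U00064321"; }
```

The code point U+64321 takes one element in the U"..." string and two
(0xD950, 0xDF21) in the u"..." string, which is why the utf16 tests compare
against a surrogate pair while the utf32 tests compare against a single
value.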

ChangeLog entries:
------------------
libcpp/ChangeLog:
2008-04-14  Kris Van Hees <kris.van.hees@oracle.com>

	* include/cpp-id-data.h (UC): Was U, conflicts with U"..." literal.
	* include/cpplib.h (CHAR16, CHAR32, STRING16, STRING32): New tokens.
	(struct cpp_options): Added uliterals.
	(cpp_interpret_string): Update prototype.
	(cpp_interpret_string_notranslate): Idem.
	* charset.c (init_iconv_desc): New width member in cset_converter.
	(cpp_init_iconv): Add support for char{16,32}_cset_desc.
	(convert_ucn): Idem.
	(emit_numeric_escape): Idem.
	(convert_hex): Idem.
	(convert_oct): Idem.
	(convert_escape): Idem.
	(converter_for_type): New function.
	(cpp_interpret_string): Use converter_for_type, support u and U prefix.
	(cpp_interpret_string_notranslate): Match changed prototype.
	(wide_str_to_charconst): Use converter_for_type.
	(cpp_interpret_charconst): Add support for CPP_CHAR{16,32}.
	* directives.c (linemarker_dir): Macro U changed to UC.
	(parse_include): Idem.
	(register_pragma_1): Idem.
	(restore_registered_pragmas): Idem.
	(get__Pragma_string): Support CPP_STRING{16,32}.
	* expr.c (eval_token): Support CPP_CHAR{16,32}.
	* init.c (struct lang_flags): Added uliterals.
	(lang_defaults): Idem.
	* internal.h (struct cset_converter) <width>: New field.
	(struct cpp_reader) <char16_cset_desc>: Idem.
	(struct cpp_reader) <char32_cset_desc>: Idem.
	* lex.c (digraph_spellings): Macro U changed to UC.
	(OP, TK): Idem.
	(lex_string): Add support for u'...', U'...', u"..." and U"...".
	(_cpp_lex_direct): Idem.
	* macro.c (_cpp_builtin_macro_text): Macro U changed to UC.
	(stringify_arg): Support CPP_CHAR{16,32} and CPP_STRING{16,32}.

gcc/ChangeLog:
2008-04-14  Kris Van Hees <kris.van.hees@oracle.com>
	  
	* c-common.c (CHAR16_TYPE, CHAR32_TYPE): New macros.
	(fname_as_string): Match updated cpp_interpret_string prototype.
	(fix_string_type): Support char16_t* and char32_t*.
	(c_common_nodes_and_builtins): Add char16_t and char32_t (and
	derivative) nodes.  Register as builtin if C++0x.
	(c_parse_error): Support CPP_CHAR{16,32}.
	* c-common.h (RID_CHAR16, RID_CHAR32): New elements. 
	(enum c_tree_index) <CTI_CHAR16_TYPE, CTI_SIGNED_CHAR16_TYPE,
	CTI_UNSIGNED_CHAR16_TYPE, CTI_CHAR32_TYPE, CTI_SIGNED_CHAR32_TYPE,
	CTI_UNSIGNED_CHAR32_TYPE, CTI_CHAR16_ARRAY_TYPE,
	CTI_CHAR32_ARRAY_TYPE>: New elements.
	(char16_type_node, signed_char16_type_node, unsigned_char16_type_node,
	char32_type_node, signed_char32_type_node, char16_array_type_node,
	char32_array_type_node): New defines.
	* c-lex.c (cb_ident): Match updated cpp_interpret_string prototype.
	(c_lex_with_flags): Support CPP_CHAR{16,32} and CPP_STRING{16,32}.
	(lex_string): Support CPP_STRING{16,32}, match updated
	cpp_interpret_string and cpp_interpret_string_notranslate prototypes.
	(lex_charconst): Support CPP_CHAR{16,32}.
	* c-parser.c (c_parser_postfix_expression): Support CPP_CHAR{16,32}
	and CPP_STRING{16,32}.

gcc/cp/ChangeLog:
2008-04-14  Kris Van Hees <kris.van.hees@oracle.com>

	* cvt.c (type_promotes_to): Support char16_t and char32_t.
	* decl.c (grokdeclarator): Disallow signed/unsigned/short/long on
	char16_t and char32_t.
	* lex.c (reswords): Add char16_t and char32_t (for c++0x).
	* mangle.c (write_builtin_type): Mangle char16_t/char32_t as vendor
	extended builtin type "u8char{16,32}_t".
	* parser.c (cp_lexer_next_token_is_decl_specifier_keyword): Support
	RID_CHAR{16,32}.
	(cp_lexer_print_token): Support CPP_STRING{16,32}.
	(cp_parser_is_string_literal): Idem.
	(cp_parser_string_literal): Idem.
	(cp_parser_primary_expression): Support CPP_CHAR{16,32} and
	CPP_STRING{16,32}.
	(cp_parser_simple_type_specifier): Support RID_CHAR{16,32}. 
	* tree.c (char_type_p): Support char16_t and char32_t as char types.
	* typeck.c (string_conv_p): Support char16_t and char32_t.

gcc/testsuite/ChangeLog:
2008-04-14  Kris Van Hees <kris.van.hees@oracle.com>

	Tests for char16_t and char32_t support.
	* g++.dg/ext/utf-badconcat.C: New
	* g++.dg/ext/utf-cvt.C: New
	* g++.dg/ext/utf-cxx0x.C: New
	* g++.dg/ext/utf-cxx98.C: New
	* g++.dg/ext/utf-dflt.C: New
	* g++.dg/ext/utf-gnuxx0x.C: New
	* g++.dg/ext/utf-gnuxx98.C: New
	* g++.dg/ext/utf-mangle.C: New
	* g++.dg/ext/utf-typedef-cxx0x.C: New
	* g++.dg/ext/utf-typedef-cxx98.C: New
	* g++.dg/ext/utf-typespec.C: New
	* g++.dg/ext/utf16-1.C: New
	* g++.dg/ext/utf16-2.C: New
	* g++.dg/ext/utf16-3.C: New
	* g++.dg/ext/utf16-4.C: New
	* g++.dg/ext/utf32-1.C: New
	* g++.dg/ext/utf32-2.C: New
	* g++.dg/ext/utf32-3.C: New
	* g++.dg/ext/utf32-4.C: New
	* gcc.dg/utf-badconcat.c: New
	* gcc.dg/utf-cvt.c: New
	* gcc.dg/utf-dflt.c: New
	* gcc.dg/utf16-1.c: New
	* gcc.dg/utf16-2.c: New
	* gcc.dg/utf16-3.c: New
	* gcc.dg/utf16-4.c: New
	* gcc.dg/utf32-1.c: New
	* gcc.dg/utf32-2.c: New
	* gcc.dg/utf32-3.c: New
	* gcc.dg/utf32-4.c: New

libiberty/ChangeLog:
2008-04-14  Kris Van Hees <kris.van.hees@oracle.com>

	* testsuite/demangle-expected: Added tests for char16_t and char32_t.
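
For reference, the "u8char{16,32}_t" strings in the mangle.c entry above
follow the vendor extended builtin type form: the letter 'u', then the
decimal length of the type name, then the name itself.  A minimal sketch of
that encoding follows; vendor_type_mangle is a hypothetical helper written
for this note and is not the code in mangle.c.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Write 'u' <length of NAME> NAME into BUF.  Returns the number of
   characters written (excluding the terminating NUL), or -1 if BUF
   is too small.  */
int vendor_type_mangle (const char *name, char *buf, size_t bufsize)
{
    int n;

    assert (name != NULL && buf != NULL);
    n = snprintf (buf, bufsize, "u%zu%s", strlen (name), name);
    if (n < 0 || (size_t) n >= bufsize)
        return -1;
    return n;
}
```

Calling it with "char16_t" or "char32_t" gives "u8char16_t" and
"u8char32_t", since both names are eight characters long.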

Bootstrapping and testing:
--------------------------
The source tree was built on the following platforms (target == host):

	i686-linux
	x86_64-linux
	ppc64-linux

Builds were done for both the unpatched tree and the patched tree, and
testsuite (make -k check) summary results were verified to be identical,
except for the added tests in the patched tree.  This was done to ensure
that the patch does not introduce regressions.
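
The lex_string hunk in gcc/c-lex.c replaces the old "wide" flag with the
token type of the result, and rejects concatenations that mix two different
prefixes.  The rule can be sketched in isolation as follows.  This is only a
model of the logic in the diff: enum str_kind and merge_kind are hypothetical
names, not libcpp or front-end definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for CPP_STRING, CPP_WSTRING, CPP_STRING16 and
   CPP_STRING32; these are not the libcpp definitions.  */
enum str_kind { STR_NARROW, STR_WIDE, STR_16, STR_32 };

/* Fold the kind of the next literal into the running kind *ACC.
   Returns false for a mix that the patch reports as an error.  */
bool merge_kind (enum str_kind *acc, enum str_kind next)
{
    assert (acc != NULL);
    if (next == STR_NARROW || next == *acc)
        return true;            /* unprefixed or same prefix: no change */
    if (*acc == STR_NARROW)
    {
        *acc = next;            /* first prefixed literal sets the type */
        return true;
    }
    return false;               /* e.g. u"a" U"b" or L"a" u"b" */
}
```

So u"a" "b" and "a" u"b" both yield a char16_t string, while u"a" U"b" is
diagnosed, which is what the utf-badconcat tests check.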

Index: gcc/c-lex.c
===================================================================
--- gcc/c-lex.c	(revision 134262)
+++ gcc/c-lex.c	(working copy)
@@ -174,7 +174,7 @@ cb_ident (cpp_reader * ARG_UNUSED (pfile
     {
       /* Convert escapes in the string.  */
       cpp_string cstr = { 0, 0 };
-      if (cpp_interpret_string (pfile, str, 1, &cstr, false))
+      if (cpp_interpret_string (pfile, str, 1, &cstr, CPP_STRING))
 	{
 	  ASM_OUTPUT_IDENT (asm_out_file, (const char *) cstr.text);
 	  free (CONST_CAST (unsigned char *, cstr.text));
@@ -361,6 +361,8 @@ c_lex_with_flags (tree *value, location_
 
 	    case CPP_STRING:
 	    case CPP_WSTRING:
+	    case CPP_STRING16:
+	    case CPP_STRING32:
 	      type = lex_string (tok, value, true, true);
 	      break;
 
@@ -410,11 +412,15 @@ c_lex_with_flags (tree *value, location_
 
     case CPP_CHAR:
     case CPP_WCHAR:
+    case CPP_CHAR16:
+    case CPP_CHAR32:
       *value = lex_charconst (tok);
       break;
 
     case CPP_STRING:
     case CPP_WSTRING:
+    case CPP_STRING16:
+    case CPP_STRING32:
       if ((lex_flags & C_LEX_RAW_STRINGS) == 0)
 	{
 	  type = lex_string (tok, value, false,
@@ -822,12 +828,12 @@ interpret_fixed (const cpp_token *token,
   return value;
 }
 
-/* Convert a series of STRING and/or WSTRING tokens into a tree,
-   performing string constant concatenation.  TOK is the first of
-   these.  VALP is the location to write the string into.  OBJC_STRING
-   indicates whether an '@' token preceded the incoming token.
+/* Convert a series of STRING, WSTRING, STRING16 and/or STRING32 tokens
+   into a tree, performing string constant concatenation.  TOK is the
+   first of these.  VALP is the location to write the string into.
+   OBJC_STRING indicates whether an '@' token preceded the incoming token.
    Returns the CPP token type of the result (CPP_STRING, CPP_WSTRING,
-   or CPP_OBJC_STRING).
+   CPP_STRING32, CPP_STRING16, or CPP_OBJC_STRING).
 
    This is unfortunately more work than it should be.  If any of the
    strings in the series has an L prefix, the result is a wide string
@@ -842,19 +848,16 @@ static enum cpp_ttype
 lex_string (const cpp_token *tok, tree *valp, bool objc_string, bool translate)
 {
   tree value;
-  bool wide = false;
   size_t concats = 0;
   struct obstack str_ob;
   cpp_string istr;
+  enum cpp_ttype type = tok->type;
 
   /* Try to avoid the overhead of creating and destroying an obstack
      for the common case of just one string.  */
   cpp_string str = tok->val.str;
   cpp_string *strs = &str;
 
-  if (tok->type == CPP_WSTRING)
-    wide = true;
-
  retry:
   tok = cpp_get_token (parse_in);
   switch (tok->type)
@@ -873,8 +876,15 @@ lex_string (const cpp_token *tok, tree *
       break;
 
     case CPP_WSTRING:
-      wide = true;
-      /* FALLTHROUGH */
+    case CPP_STRING16:
+    case CPP_STRING32:
+      if (type != tok->type)
+	{
+	  if (type == CPP_STRING)
+	    type = tok->type;
+	  else
+	    error ("unsupported non-standard concatenation of string literals");
+	}
 
     case CPP_STRING:
       if (!concats)
@@ -899,7 +909,7 @@ lex_string (const cpp_token *tok, tree *
 
   if ((translate
        ? cpp_interpret_string : cpp_interpret_string_notranslate)
-      (parse_in, strs, concats + 1, &istr, wide))
+      (parse_in, strs, concats + 1, &istr, type))
     {
       value = build_string (istr.len, (const char *) istr.text);
       free (CONST_CAST (unsigned char *, istr.text));
@@ -909,22 +919,52 @@ lex_string (const cpp_token *tok, tree *
       /* Callers cannot generally handle error_mark_node in this context,
 	 so return the empty string instead.  cpp_interpret_string has
 	 issued an error.  */
-      if (wide)
-	value = build_string (TYPE_PRECISION (wchar_type_node)
-			      / TYPE_PRECISION (char_type_node),
-			      "\0\0\0");  /* widest supported wchar_t
-					     is 32 bits */
-      else
-	value = build_string (1, "");
+      switch (type)
+	{
+	default:
+	case CPP_STRING:
+	  value = build_string (1, "");
+	  break;
+	case CPP_STRING16:
+	  value = build_string (TYPE_PRECISION (char16_type_node)
+				/ TYPE_PRECISION (char_type_node),
+				"\0");  /* char16_t is 16 bits */
+	  break;
+	case CPP_STRING32:
+	  value = build_string (TYPE_PRECISION (char32_type_node)
+				/ TYPE_PRECISION (char_type_node),
+				"\0\0\0");  /* char32_t is 32 bits */
+	  break;
+	case CPP_WSTRING:
+	  value = build_string (TYPE_PRECISION (wchar_type_node)
+				/ TYPE_PRECISION (char_type_node),
+				"\0\0\0");  /* widest supported wchar_t
+					       is 32 bits */
+	  break;
+        }
     }
 
-  TREE_TYPE (value) = wide ? wchar_array_type_node : char_array_type_node;
+  switch (type)
+    {
+    default:
+    case CPP_STRING:
+      TREE_TYPE (value) = char_array_type_node;
+      break;
+    case CPP_STRING16:
+      TREE_TYPE (value) = char16_array_type_node;
+      break;
+    case CPP_STRING32:
+      TREE_TYPE (value) = char32_array_type_node;
+      break;
+    case CPP_WSTRING:
+      TREE_TYPE (value) = wchar_array_type_node;
+    }
   *valp = fix_string_type (value);
 
   if (concats)
     obstack_free (&str_ob, 0);
 
-  return objc_string ? CPP_OBJC_STRING : wide ? CPP_WSTRING : CPP_STRING;
+  return objc_string ? CPP_OBJC_STRING : type;
 }
 
 /* Converts a (possibly wide) character constant token into a tree.  */
@@ -941,6 +981,10 @@ lex_charconst (const cpp_token *token)
 
   if (token->type == CPP_WCHAR)
     type = wchar_type_node;
+  else if (token->type == CPP_CHAR32)
+    type = char32_type_node;
+  else if (token->type == CPP_CHAR16)
+    type = char16_type_node;
   /* In C, a character constant has type 'int'.
      In C++ 'char', but multi-char charconsts have type 'int'.  */
   else if (!c_dialect_cxx () || chars_seen > 1)
Index: gcc/testsuite/gcc.dg/utf32-2.c
===================================================================
--- gcc/testsuite/gcc.dg/utf32-2.c	(revision 0)
+++ gcc/testsuite/gcc.dg/utf32-2.c	(revision 0)
@@ -0,0 +1,31 @@
+/* Contributed by Kris Van Hees <kris.van.hees@oracle.com> */
+/* Test the support for char32_t* string constants. */
+/* { dg-do run } */
+/* { dg-options "-std=gnu99 -Wall -Werror" } */
+
+typedef unsigned int char32_t;
+
+extern void abort (void);
+
+char32_t	*s0 = U"ab";
+char32_t	*s1 = U"a\u0024";
+char32_t	*s2 = U"a\u2029";
+char32_t	*s3 = U"a\U00064321";
+
+#define A	0x00000061
+#define B	0x00000062
+#define D	0x00000024
+#define X	0x00002029
+#define Y	0x00064321
+
+int main ()
+{
+    if (s0[0] != A || s0[1] != B || s0[2] != 0x00000000)
+	abort ();
+    if (s1[0] != A || s1[1] != D || s1[2] != 0x00000000)
+	abort ();
+    if (s2[0] != A || s2[1] != X || s2[2] != 0x00000000)
+	abort ();
+    if (s3[0] != A || s3[1] != Y || s3[2] != 0x00000000)
+	abort ();
+}
Index: gcc/testsuite/gcc.dg/utf32-4.c
===================================================================
--- gcc/testsuite/gcc.dg/utf32-4.c	(revision 0)
+++ gcc/testsuite/gcc.dg/utf32-4.c	(revision 0)
@@ -0,0 +1,20 @@
+/* Contributed by Kris Van Hees <kris.van.hees@oracle.com> */
+/* Expected errors for char32_t character constants. */
+/* { dg-do compile } */
+/* { dg-options "-std=gnu99" } */
+
+typedef unsigned int char32_t;
+
+char32_t	c0 = U'';		/* { dg-error "empty character" } */
+char32_t	c1 = U'ab';		/* { dg-warning "constant too long" } */
+char32_t	c2 = U'\U00064321';
+
+char32_t	c3 = 'a';
+char32_t	c4 = u'a';
+char32_t	c5 = u'\u2029';
+char32_t	c6 = u'\U00064321';	/* { dg-warning "constant too long" } */
+char32_t	c7 = L'a';
+char32_t	c8 = L'\u2029';
+char32_t	c9 = L'\U00064321';
+
+int main () {}
Index: gcc/testsuite/gcc.dg/utf16-2.c
===================================================================
--- gcc/testsuite/gcc.dg/utf16-2.c	(revision 0)
+++ gcc/testsuite/gcc.dg/utf16-2.c	(revision 0)
@@ -0,0 +1,32 @@
+/* Contributed by Kris Van Hees <kris.van.hees@oracle.com> */
+/* Test the support for char16_t* string literals. */
+/* { dg-do run } */
+/* { dg-options "-std=gnu99 -Wall -Werror" } */
+
+typedef short unsigned int char16_t;
+
+extern void abort (void);
+
+char16_t	*s0 = u"ab";
+char16_t	*s1 = u"a\u0024";
+char16_t	*s2 = u"a\u2029";
+char16_t	*s3 = u"a\U00064321";
+
+#define A	0x0061
+#define B	0x0062
+#define D	0x0024
+#define X	0x2029
+#define Y1	0xD950
+#define Y2	0xDF21
+
+int main ()
+{
+    if (s0[0] != A || s0[1] != B || s0[2] != 0x0000)
+	abort ();
+    if (s1[0] != A || s1[1] != D || s1[2] != 0x0000)
+	abort ();
+    if (s2[0] != A || s2[1] != X || s2[2] != 0x0000)
+	abort ();
+    if (s3[0] != A || s3[1] != Y1 || s3[2] != Y2 || s3[3] != 0x0000)
+	abort ();
+}
Index: gcc/testsuite/gcc.dg/utf16-4.c
===================================================================
--- gcc/testsuite/gcc.dg/utf16-4.c	(revision 0)
+++ gcc/testsuite/gcc.dg/utf16-4.c	(revision 0)
@@ -0,0 +1,20 @@
+/* Contributed by Kris Van Hees <kris.van.hees@oracle.com> */
+/* Expected errors for char16_t character constants. */
+/* { dg-do compile } */
+/* { dg-options "-std=gnu99" } */
+
+typedef short unsigned int char16_t;
+
+char16_t	c0 = u'';		/* { dg-error "empty character" } */
+char16_t	c1 = u'ab';		/* { dg-warning "constant too long" } */
+char16_t	c2 = u'\U00064321';	/* { dg-warning "constant too long" } */
+
+char16_t	c3 = 'a';
+char16_t	c4 = U'a';
+char16_t	c5 = U'\u2029';
+char16_t	c6 = U'\U00064321';	/* { dg-warning "implicitly truncated" } */
+char16_t	c7 = L'a';
+char16_t	c8 = L'\u2029';
+char16_t	c9 = L'\U00064321';	/* { dg-warning "implicitly truncated" } */
+
+int main () {}
Index: gcc/testsuite/gcc.dg/utf-badconcat.c
===================================================================
--- gcc/testsuite/gcc.dg/utf-badconcat.c	(revision 0)
+++ gcc/testsuite/gcc.dg/utf-badconcat.c	(revision 0)
@@ -0,0 +1,22 @@
+/* Contributed by Kris Van Hees <kris.van.hees@oracle.com> */
+/* Test unsupported concatenation of char16_t/char32_t* string literals. */
+/* { dg-do compile } */
+/* { dg-options "-std=gnu99" } */
+
+void	*s0	= u"a"  "b";
+void	*s1	=  "a" u"b";
+void	*s2	= u"a" U"b";	/* { dg-error "non-standard concatenation" } */
+void	*s3	= U"a" u"b";	/* { dg-error "non-standard concatenation" } */
+void	*s4	= u"a" L"b";	/* { dg-error "non-standard concatenation" } */
+void	*s5	= L"a" u"b";	/* { dg-error "non-standard concatenation" } */
+void	*s6	= u"a" u"b";
+void	*s7	= U"a"  "b";
+void	*s8	=  "a" U"b";
+void	*s9	= U"a" L"b";	/* { dg-error "non-standard concatenation" } */
+void	*sa	= L"a" U"b";	/* { dg-error "non-standard concatenation" } */
+void	*sb	= U"a" U"b";
+void	*sc	= L"a"  "b";
+void	*sd	=  "a" L"b";
+void	*se	= L"a" L"b";
+
+int main () {}
Index: gcc/testsuite/gcc.dg/utf32-1.c
===================================================================
--- gcc/testsuite/gcc.dg/utf32-1.c	(revision 0)
+++ gcc/testsuite/gcc.dg/utf32-1.c	(revision 0)
@@ -0,0 +1,44 @@
+/* Contributed by Kris Van Hees <kris.van.hees@oracle.com> */
+/* Test the support for char32_t character constants. */
+/* { dg-do run } */
+/* { dg-options "-std=gnu99 -Wall -Werror" } */
+
+typedef unsigned int char32_t;
+
+extern void abort (void);
+
+char32_t	c0 = U'a';
+char32_t	c1 = U'\0';
+char32_t	c2 = U'\u0024';
+char32_t	c3 = U'\u2029';
+char32_t	c4 = U'\U00064321';
+
+#define A	0x00000061
+#define D	0x00000024
+#define X	0x00002029
+#define Y	0x00064321
+
+int main ()
+{
+    if (sizeof (U'a') != sizeof (char32_t))
+	abort ();
+    if (sizeof (U'\0') != sizeof (char32_t))
+	abort ();
+    if (sizeof (U'\u0024') != sizeof (char32_t))
+	abort ();
+    if (sizeof (U'\u2029') != sizeof (char32_t))
+	abort ();
+    if (sizeof (U'\U00064321') != sizeof (char32_t))
+	abort ();
+
+    if (c0 != A)
+	abort ();
+    if (c1 != 0x0000)
+	abort ();
+    if (c2 != D)
+	abort ();
+    if (c3 != X)
+	abort ();
+    if (c4 != Y)
+	abort ();
+}
Index: gcc/testsuite/gcc.dg/utf32-3.c
===================================================================
--- gcc/testsuite/gcc.dg/utf32-3.c	(revision 0)
+++ gcc/testsuite/gcc.dg/utf32-3.c	(revision 0)
@@ -0,0 +1,48 @@
+/* Contributed by Kris Van Hees <kris.van.hees@oracle.com> */
+/* Test concatenation of char32_t* string literals. */
+/* { dg-do run } */
+/* { dg-options "-std=gnu99 -Wall -Werror" } */
+
+typedef unsigned int char32_t;
+
+extern void abort (void);
+
+char32_t	*s0 = U"a" U"b";
+
+char32_t	*s1 = U"a" "b";
+char32_t	*s2 = "a" U"b";
+char32_t	*s3 = U"a" "\u2029";
+char32_t	*s4 = "\u2029" U"b";
+char32_t	*s5 = U"a" "\U00064321";
+char32_t	*s6 = "\U00064321" U"b";
+
+#define A	0x00000061
+#define B	0x00000062
+#define X	0x00002029
+#define Y	0x00064321
+
+int main ()
+{
+    if (sizeof ((U"a" U"b")[0]) != sizeof (char32_t))
+	abort ();
+    if (sizeof ((U"a"  "b")[0]) != sizeof (char32_t))
+	abort ();
+    if (sizeof (( "a" U"b")[0]) != sizeof (char32_t))
+	abort ();
+
+    if (s0[0] != A || s0[1] != B || s0[2] != 0x00000000)
+	abort ();
+
+    if (s1[0] != A || s1[1] != B || s1[2] != 0x00000000)
+	abort ();
+    if (s2[0] != A || s2[1] != B || s2[2] != 0x00000000)
+	abort ();
+    if (s3[0] != A || s3[1] != X || s3[2] != 0x00000000)
+	abort ();
+    if (s4[0] != X || s4[1] != B || s4[2] != 0x00000000)
+	abort ();
+    if (s5[0] != A || s5[1] != Y || s5[2] != 0x00000000)
+	abort ();
+    if (s6[0] != Y || s6[1] != B || s6[2] != 0x00000000)
+	abort ();
+}
Index: gcc/testsuite/gcc.dg/utf16-1.c
===================================================================
--- gcc/testsuite/gcc.dg/utf16-1.c	(revision 0)
+++ gcc/testsuite/gcc.dg/utf16-1.c	(revision 0)
@@ -0,0 +1,67 @@
+/* Contributed by Kris Van Hees <kris.van.hees@oracle.com> */
+/* Test the support for char16_t character constants. */
+/* { dg-do run } */
+/* { dg-options "-std=gnu99 -Wall -Werror" } */
+
+typedef short unsigned int char16_t;
+
+extern void abort (void);
+
+char16_t	c0 = u'a';
+char16_t	c1 = u'\0';
+char16_t	c2 = u'\u0024';
+char16_t	c3 = u'\u2029';
+char16_t	c4 = u'\u8010';
+
+char16_t	c5 = 'a';
+char16_t	c6 = U'a';
+char16_t	c7 = U'\u2029';
+char16_t	c8 = U'\u8010';
+char16_t	c9 = L'a';
+char16_t	ca = L'\u2029';
+char16_t	cb = L'\u8010';
+
+#define A	0x0061
+#define D	0x0024
+#define X	0x2029
+#define Y	0x8010
+
+int main ()
+{
+    if (sizeof (u'a') != sizeof (char16_t))
+	abort ();
+    if (sizeof (u'\0') != sizeof (char16_t))
+	abort ();
+    if (sizeof (u'\u0024') != sizeof (char16_t))
+	abort ();
+    if (sizeof (u'\u2029') != sizeof (char16_t))
+	abort ();
+    if (sizeof (u'\u8010') != sizeof (char16_t))
+	abort ();
+
+    if (c0 != A)
+	abort ();
+    if (c1 != 0x0000)
+	abort ();
+    if (c2 != D)
+	abort ();
+    if (c3 != X)
+	abort ();
+    if (c4 != Y)
+	abort ();
+
+    if (c5 != A)
+	abort ();
+    if (c6 != A)
+	abort ();
+    if (c7 != X)
+	abort ();
+    if (c8 != Y)
+	abort ();
+    if (c9 != A)
+	abort ();
+    if (ca != X)
+	abort ();
+    if (cb != Y)
+	abort ();
+}
Index: gcc/testsuite/gcc.dg/utf16-3.c
===================================================================
--- gcc/testsuite/gcc.dg/utf16-3.c	(revision 0)
+++ gcc/testsuite/gcc.dg/utf16-3.c	(revision 0)
@@ -0,0 +1,49 @@
+/* Contributed by Kris Van Hees <kris.van.hees@oracle.com> */
+/* Test concatenation of char16_t* string literals. */
+/* { dg-do run } */
+/* { dg-options "-std=gnu99 -Wall -Werror" } */
+
+typedef short unsigned int char16_t;
+
+extern void abort (void);
+
+char16_t	*s0 = u"a" u"b";
+
+char16_t	*s1 = u"a" "b";
+char16_t	*s2 = "a" u"b";
+char16_t	*s3 = u"a" "\u2029";
+char16_t	*s4 = "\u2029" u"b";
+char16_t	*s5 = u"a" "\U00064321";
+char16_t	*s6 = "\U00064321" u"b";
+
+#define A	0x0061
+#define B	0x0062
+#define X	0x2029
+#define Y1	0xD950
+#define Y2	0xDF21
+
+int main ()
+{
+    if (sizeof ((u"a" u"b")[0]) != sizeof (char16_t))
+	abort ();
+    if (sizeof ((u"a"  "b")[0]) != sizeof (char16_t))
+	abort ();
+    if (sizeof (( "a" u"b")[0]) != sizeof (char16_t))
+	abort ();
+
+    if (s0[0] != A || s0[1] != B || s0[2] != 0x0000)
+	abort ();
+
+    if (s1[0] != A || s1[1] != B || s1[2] != 0x0000)
+	abort ();
+    if (s2[0] != A || s2[1] != B || s2[2] != 0x0000)
+	abort ();
+    if (s3[0] != A || s3[1] != X || s3[2] != 0x0000)
+	abort ();
+    if (s4[0] != X || s4[1] != B || s4[2] != 0x0000)
+	abort ();
+    if (s5[0] != A || s5[1] != Y1 || s5[2] != Y2 || s5[3] != 0x0000)
+	abort ();
+    if (s6[0] != Y1 || s6[1] != Y2 || s6[2] != B || s6[3] != 0x0000)
+	abort ();
+}
Index: gcc/testsuite/gcc.dg/utf-cvt.c
===================================================================
--- gcc/testsuite/gcc.dg/utf-cvt.c	(revision 0)
+++ gcc/testsuite/gcc.dg/utf-cvt.c	(revision 0)
@@ -0,0 +1,49 @@
+/* Contributed by Kris Van Hees <kris.van.hees@oracle.com> */
+/* Test the char16_t and char32_t promotion rules. */
+/* { dg-do compile } */
+/* { dg-options "-std=gnu99 -Wall -Wconversion -Wsign-conversion" } */
+
+typedef unsigned short	char16_t;
+typedef unsigned int	char32_t;
+
+extern void f_c (char);
+extern void fsc (signed char);
+extern void fuc (unsigned char);
+extern void f_s (short);
+extern void fss (signed short);
+extern void fus (unsigned short);
+extern void f_i (int);
+extern void fsi (signed int);
+extern void fui (unsigned int);
+extern void f_l (long);
+extern void fsl (signed long);
+extern void ful (unsigned long);
+
+void m (char16_t c0, char32_t c1)
+{
+    f_c (c0);				/* { dg-warning "alter its value" } */
+    fsc (c0);				/* { dg-warning "alter its value" } */
+    fuc (c0);				/* { dg-warning "alter its value" } */
+    f_s (c0);				/* { dg-warning "change the sign" } */
+    fss (c0);				/* { dg-warning "change the sign" } */
+    fus (c0);
+    f_i (c0);
+    fsi (c0);
+    fui (c0);
+    f_l (c0);
+    fsl (c0);
+    ful (c0);
+
+    f_c (c1);				/* { dg-warning "alter its value" } */
+    fsc (c1);				/* { dg-warning "alter its value" } */
+    fuc (c1);				/* { dg-warning "alter its value" } */
+    f_s (c1);				/* { dg-warning "alter its value" } */
+    fss (c1);				/* { dg-warning "alter its value" } */
+    fus (c1);				/* { dg-warning "alter its value" } */
+    f_i (c1);				/* { dg-warning "change the sign" } */
+    fsi (c1);				/* { dg-warning "change the sign" } */
+    fui (c1);
+    f_l (c1);				/* { dg-warning "change the sign" } */
+    fsl (c1);				/* { dg-warning "change the sign" } */
+    ful (c1);
+}
Index: gcc/testsuite/gcc.dg/utf-dflt.c
===================================================================
--- gcc/testsuite/gcc.dg/utf-dflt.c	(revision 0)
+++ gcc/testsuite/gcc.dg/utf-dflt.c	(revision 0)
@@ -0,0 +1,25 @@
+/* Contributed by Kris Van Hees <kris.van.hees@oracle.com> */
+/* If not gnu99, the u and U prefixes should be parsed as separate tokens. */
+/* { dg-do compile } */
+/* { dg-options "" } */
+
+const unsigned short	c0	= u'a';		/* { dg-error "undeclared" } */
+		/* { dg-error "expected ',' or ';'" "" { target *-*-* } 6 } */
+const unsigned long	c1	= U'a';		/* { dg-error "undeclared" } */
+		/* { dg-error "expected ',' or ';'" "" { target *-*-* } 8 } */
+
+#define u	1 +
+#define U	2 +
+
+const unsigned short	c2	= u'a';
+const unsigned long	c3	= U'a';
+
+#undef u
+#undef U
+#define u	"a"
+#define U	"b"
+
+const void		*s0	= u"a";
+const void		*s1	= U"a";
+
+int main () {}
Index: gcc/testsuite/g++.dg/ext/utf16-1.C
===================================================================
--- gcc/testsuite/g++.dg/ext/utf16-1.C	(revision 0)
+++ gcc/testsuite/g++.dg/ext/utf16-1.C	(revision 0)
@@ -0,0 +1,65 @@
+/* Contributed by Kris Van Hees <kris.van.hees@oracle.com> */
+/* Test the support for char16_t character constants. */
+/* { dg-do run } */
+/* { dg-options "-std=c++0x -Wall -Werror" } */
+
+extern "C" void abort (void);
+
+const static char16_t	c0 = u'a';
+const static char16_t	c1 = u'\0';
+const static char16_t	c2 = u'\u0024';
+const static char16_t	c3 = u'\u2029';
+const static char16_t	c4 = u'\u8010';
+
+const static char16_t	c5 = 'a';
+const static char16_t	c6 = U'a';
+const static char16_t	c7 = U'\u2029';
+const static char16_t	c8 = U'\u8010';
+const static char16_t	c9 = L'a';
+const static char16_t	ca = L'\u2029';
+const static char16_t	cb = L'\u8010';
+
+#define A	0x0061
+#define D	0x0024
+#define X	0x2029
+#define Y	0x8010
+
+int main ()
+{
+    if (sizeof (u'a') != sizeof (char16_t))
+	abort ();
+    if (sizeof (u'\0') != sizeof (char16_t))
+	abort ();
+    if (sizeof (u'\u0024') != sizeof (char16_t))
+	abort ();
+    if (sizeof (u'\u2029') != sizeof (char16_t))
+	abort ();
+    if (sizeof (u'\u8010') != sizeof (char16_t))
+	abort ();
+
+    if (c0 != A)
+	abort ();
+    if (c1 != 0x0000)
+	abort ();
+    if (c2 != D)
+	abort ();
+    if (c3 != X)
+	abort ();
+    if (c4 != Y)
+	abort ();
+
+    if (c5 != A)
+	abort ();
+    if (c6 != A)
+	abort ();
+    if (c7 != X)
+	abort ();
+    if (c8 != Y)
+	abort ();
+    if (c9 != A)
+	abort ();
+    if (ca != X)
+	abort ();
+    if (cb != Y)
+	abort ();
+}
Index: gcc/testsuite/g++.dg/ext/utf32-4.C
===================================================================
--- gcc/testsuite/g++.dg/ext/utf32-4.C	(revision 0)
+++ gcc/testsuite/g++.dg/ext/utf32-4.C	(revision 0)
@@ -0,0 +1,18 @@
+/* Contributed by Kris Van Hees <kris.van.hees@oracle.com> */
+/* Expected errors for char32_t character constants. */
+/* { dg-do compile } */
+/* { dg-options "-std=c++0x" } */
+
+const static char32_t	c0 = U'';		/* { dg-error "empty character" } */
+const static char32_t	c1 = U'ab';		/* { dg-warning "constant too long" } */
+const static char32_t	c2 = U'\U00064321';
+
+const static char32_t	c3 = 'a';
+const static char32_t	c4 = u'a';
+const static char32_t	c5 = u'\u2029';
+const static char32_t	c6 = u'\U00064321';	/* { dg-warning "constant too long" } */
+const static char32_t	c7 = L'a';
+const static char32_t	c8 = L'\u2029';
+const static char32_t	c9 = L'\U00064321';
+
+int main () {}
Index: gcc/testsuite/g++.dg/ext/utf-cxx98.C
===================================================================
--- gcc/testsuite/g++.dg/ext/utf-cxx98.C	(revision 0)
+++ gcc/testsuite/g++.dg/ext/utf-cxx98.C	(revision 0)
@@ -0,0 +1,29 @@
+/* Contributed by Kris Van Hees <kris.van.hees@oracle.com> */
+/* Expected errors for char16_t/char32_t in c++98. */
+/* Ensure u and U prefixes are parsed as separate tokens in c++98. */
+/* { dg-do compile } */
+/* { dg-options "-std=c++98" } */
+
+const static char16_t	c0	= 'a';	/* { dg-error "not name a type" } */
+const static char32_t	c1	= 'a';	/* { dg-error "not name a type" } */
+
+const unsigned short	c2	= u'a';	/* { dg-error "not declared" } */
+	/* { dg-error "expected ',' or ';'" "" { target *-*-* } 10 } */
+const unsigned long	c3	= U'a';	/* { dg-error "not declared" } */
+	/* { dg-error "expected ',' or ';'" "" { target *-*-* } 12 } */
+
+#define u	1 +
+#define U	2 +
+
+const unsigned short	c5	= u'a';
+const unsigned long	c6	= U'a';
+
+#undef u
+#undef U
+#define u	"a"
+#define U	"b"
+
+const void		*s0	= u"a";
+const void		*s1	= U"a";
+
+int main () {}
Index: gcc/testsuite/g++.dg/ext/utf16-2.C
===================================================================
--- gcc/testsuite/g++.dg/ext/utf16-2.C	(revision 0)
+++ gcc/testsuite/g++.dg/ext/utf16-2.C	(revision 0)
@@ -0,0 +1,30 @@
+/* Contributed by Kris Van Hees <kris.van.hees@oracle.com> */
+/* Test the support for char16_t* string literals. */
+/* { dg-do run } */
+/* { dg-options "-std=c++0x -Wall -Werror" } */
+
+extern "C" void abort (void);
+
+const static char16_t	*s0 = u"ab";
+const static char16_t	*s1 = u"a\u0024";
+const static char16_t	*s2 = u"a\u2029";
+const static char16_t	*s3 = u"a\U00064321";
+
+#define A	0x0061
+#define B	0x0062
+#define D	0x0024
+#define X	0x2029
+#define Y1	0xD950
+#define Y2	0xDF21
+
+int main ()
+{
+    if (s0[0] != A || s0[1] != B || s0[2] != 0x0000)
+	abort ();
+    if (s1[0] != A || s1[1] != D || s1[2] != 0x0000)
+	abort ();
+    if (s2[0] != A || s2[1] != X || s2[2] != 0x0000)
+	abort ();
+    if (s3[0] != A || s3[1] != Y1 || s3[2] != Y2 || s3[3] != 0x0000)
+	abort ();
+}
Index: gcc/testsuite/g++.dg/ext/utf16-3.C
===================================================================
--- gcc/testsuite/g++.dg/ext/utf16-3.C	(revision 0)
+++ gcc/testsuite/g++.dg/ext/utf16-3.C	(revision 0)
@@ -0,0 +1,47 @@
+/* Contributed by Kris Van Hees <kris.van.hees@oracle.com> */
+/* Test concatenation of char16_t* string literals. */
+/* { dg-do run } */
+/* { dg-options "-std=c++0x -Wall -Werror" } */
+
+extern "C" void abort (void);
+
+const static char16_t	*s0 = u"a" u"b";
+
+const static char16_t	*s1 = u"a" "b";
+const static char16_t	*s2 = "a" u"b";
+const static char16_t	*s3 = u"a" "\u2029";
+const static char16_t	*s4 = "\u2029" u"b";
+const static char16_t	*s5 = u"a" "\U00064321";
+const static char16_t	*s6 = "\U00064321" u"b";
+
+#define A	0x0061
+#define B	0x0062
+#define X	0x2029
+#define Y1	0xD950
+#define Y2	0xDF21
+
+int main ()
+{
+    if (sizeof ((u"a" u"b")[0]) != sizeof (char16_t))
+	abort ();
+    if (sizeof ((u"a"  "b")[0]) != sizeof (char16_t))
+	abort ();
+    if (sizeof (( "a" u"b")[0]) != sizeof (char16_t))
+	abort ();
+
+    if (s0[0] != A || s0[1] != B || s0[2] != 0x0000)
+	abort ();
+
+    if (s1[0] != A || s1[1] != B || s1[2] != 0x0000)
+	abort ();
+    if (s2[0] != A || s2[1] != B || s2[2] != 0x0000)
+	abort ();
+    if (s3[0] != A || s3[1] != X || s3[2] != 0x0000)
+	abort ();
+    if (s4[0] != X || s4[1] != B || s4[2] != 0x0000)
+	abort ();
+    if (s5[0] != A || s5[1] != Y1 || s5[2] != Y2 || s5[3] != 0x0000)
+	abort ();
+    if (s6[0] != Y1 || s6[1] != Y2 || s6[2] != B || s6[3] != 0x0000)
+	abort ();
+}
Index: gcc/testsuite/g++.dg/ext/utf-badconcat.C
===================================================================
--- gcc/testsuite/g++.dg/ext/utf-badconcat.C	(revision 0)
+++ gcc/testsuite/g++.dg/ext/utf-badconcat.C	(revision 0)
@@ -0,0 +1,22 @@
+/* Contributed by Kris Van Hees <kris.van.hees@oracle.com> */
+/* Test unsupported concatenation of char16_t/char32_t* string literals. */
+/* { dg-do compile } */
+/* { dg-options "-std=c++0x" } */
+
+const void *s0	= u"a"  "b";
+const void *s1	=  "a" u"b";
+const void *s2	= u"a" U"b";	/* { dg-error "non-standard concatenation" } */
+const void *s3	= U"a" u"b";	/* { dg-error "non-standard concatenation" } */
+const void *s4	= u"a" L"b";	/* { dg-error "non-standard concatenation" } */
+const void *s5	= L"a" u"b";	/* { dg-error "non-standard concatenation" } */
+const void *s6	= u"a" u"b";
+const void *s7	= U"a"  "b";
+const void *s8	=  "a" U"b";
+const void *s9	= U"a" L"b";	/* { dg-error "non-standard concatenation" } */
+const void *sa	= L"a" U"b";	/* { dg-error "non-standard concatenation" } */
+const void *sb	= U"a" U"b";
+const void *sc	= L"a"  "b";
+const void *sd	=  "a" L"b";
+const void *se	= L"a" L"b";
+
+int main () {}
Index: gcc/testsuite/g++.dg/ext/utf-typedef-cxx0x.C
===================================================================
--- gcc/testsuite/g++.dg/ext/utf-typedef-cxx0x.C	(revision 0)
+++ gcc/testsuite/g++.dg/ext/utf-typedef-cxx0x.C	(revision 0)
@@ -0,0 +1,7 @@
+/* Contributed by Kris Van Hees <kris.van.hees@oracle.com> */
+/* Ensure that a typedef to char16_t/char32_t issues a warning in c++0x. */
+/* { dg-do compile } */
+/* { dg-options "-std=c++0x" } */
+
+typedef short unsigned int	char16_t; /* { dg-warning "redeclaration" } */
+typedef unsigned int		char32_t; /* { dg-warning "redeclaration" } */
Index: gcc/testsuite/g++.dg/ext/utf16-4.C
===================================================================
--- gcc/testsuite/g++.dg/ext/utf16-4.C	(revision 0)
+++ gcc/testsuite/g++.dg/ext/utf16-4.C	(revision 0)
@@ -0,0 +1,18 @@
+/* Contributed by Kris Van Hees <kris.van.hees@oracle.com> */
+/* Expected errors for char16_t character constants. */
+/* { dg-do compile } */
+/* { dg-options "-std=c++0x" } */
+
+const static char16_t	c0 = u'';		/* { dg-error "empty character" } */
+const static char16_t	c1 = u'ab';		/* { dg-warning "constant too long" } */
+const static char16_t	c2 = u'\U00064321';	/* { dg-warning "constant too long" } */
+
+const static char16_t	c3 = 'a';
+const static char16_t	c4 = U'a';
+const static char16_t	c5 = U'\u2029';
+const static char16_t	c6 = U'\U00064321';	/* { dg-warning "implicitly truncated" } */
+const static char16_t	c7 = L'a';
+const static char16_t	c8 = L'\u2029';
+const static char16_t	c9 = L'\U00064321';	/* { dg-warning "implicitly truncated" } */
+
+int main () {}
Index: gcc/testsuite/g++.dg/ext/utf-gnuxx98.C
===================================================================
--- gcc/testsuite/g++.dg/ext/utf-gnuxx98.C	(revision 0)
+++ gcc/testsuite/g++.dg/ext/utf-gnuxx98.C	(revision 0)
@@ -0,0 +1,29 @@
+/* Contributed by Kris Van Hees <kris.van.hees@oracle.com> */
+/* Expected errors for char16_t/char32_t in gnu++98. */
+/* Ensure u and U prefixes are parsed as separate tokens in gnu++98. */
+/* { dg-do compile } */
+/* { dg-options "-std=gnu++98" } */
+
+const static char16_t	c0	= 'a';	/* { dg-error "not name a type" } */
+const static char32_t	c1	= 'a';	/* { dg-error "not name a type" } */
+
+const unsigned short	c2	= u'a';	/* { dg-error "not declared" } */
+	/* { dg-error "expected ',' or ';'" "" { target *-*-* } 10 } */
+const unsigned long	c3	= U'a';	/* { dg-error "not declared" } */
+	/* { dg-error "expected ',' or ';'" "" { target *-*-* } 12 } */
+
+#define u	1 +
+#define U	2 +
+
+const unsigned short	c5	= u'a';
+const unsigned long	c6	= U'a';
+
+#undef u
+#undef U
+#define u	"a"
+#define U	"b"
+
+const void		*s0	= u"a";
+const void		*s1	= U"a";
+
+int main () {}
Index: gcc/testsuite/g++.dg/ext/utf-cvt.C
===================================================================
--- gcc/testsuite/g++.dg/ext/utf-cvt.C	(revision 0)
+++ gcc/testsuite/g++.dg/ext/utf-cvt.C	(revision 0)
@@ -0,0 +1,46 @@
+/* Contributed by Kris Van Hees <kris.van.hees@oracle.com> */
+/* Test the char16_t and char32_t promotion rules. */
+/* { dg-do compile } */
+/* { dg-options "-std=c++0x -Wall -Wconversion -Wsign-conversion -Wsign-promo" } */
+
+extern void f_c (char);
+extern void fsc (signed char);
+extern void fuc (unsigned char);
+extern void f_s (short);
+extern void fss (signed short);
+extern void fus (unsigned short);
+extern void f_i (int);
+extern void fsi (signed int);
+extern void fui (unsigned int);
+extern void f_l (long);
+extern void fsl (signed long);
+extern void ful (unsigned long);
+
+void m(char16_t c0, char32_t c1)
+{
+    f_c (c0);			/* { dg-warning "alter its value" } */
+    fsc (c0);			/* { dg-warning "alter its value" } */
+    fuc (c0);			/* { dg-warning "alter its value" } */
+    f_s (c0);			/* { dg-warning "change the sign" } */
+    fss (c0);			/* { dg-warning "change the sign" } */
+    fus (c0);
+    f_i (c0);
+    fsi (c0);
+    fui (c0);
+    f_l (c0);
+    fsl (c0);
+    ful (c0);
+
+    f_c (c1);			/* { dg-warning "alter its value" } */
+    fsc (c1);			/* { dg-warning "alter its value" } */
+    fuc (c1);			/* { dg-warning "alter its value" } */
+    f_s (c1);			/* { dg-warning "alter its value" } */
+    fss (c1);			/* { dg-warning "alter its value" } */
+    fus (c1);			/* { dg-warning "alter its value" } */
+    f_i (c1);			/* { dg-warning "change the sign" } */
+    fsi (c1);			/* { dg-warning "change the sign" } */
+    fui (c1);
+    f_l (c1);			/* { dg-warning "change the sign" } */
+    fsl (c1);			/* { dg-warning "change the sign" } */
+    ful (c1);
+}
Index: gcc/testsuite/g++.dg/ext/utf-dflt.C
===================================================================
--- gcc/testsuite/g++.dg/ext/utf-dflt.C	(revision 0)
+++ gcc/testsuite/g++.dg/ext/utf-dflt.C	(revision 0)
@@ -0,0 +1,29 @@
+/* Contributed by Kris Van Hees <kris.van.hees@oracle.com> */
+/* Expected errors for char16_t/char32_t in default std. */
+/* Ensure u and U prefixes are parsed as separate tokens in default std. */
+/* { dg-do compile } */
+/* { dg-options "" } */
+
+const static char16_t	c0	= 'a';	/* { dg-error "not name a type" } */
+const static char32_t	c1	= 'a';	/* { dg-error "not name a type" } */
+
+const unsigned short	c2	= u'a';	/* { dg-error "not declared" } */
+	/* { dg-error "expected ',' or ';'" "" { target *-*-* } 10 } */
+const unsigned long	c3	= U'a';	/* { dg-error "not declared" } */
+	/* { dg-error "expected ',' or ';'" "" { target *-*-* } 12 } */
+
+#define u	1 +
+#define U	2 +
+
+const unsigned short	c4	= u'a';
+const unsigned long	c5	= U'a';
+
+#undef u
+#undef U
+#define u	"a"
+#define U	"b"
+
+const void		*s0	= u"a";
+const void		*s1	= U"a";
+
+int main () {}
Index: gcc/testsuite/g++.dg/ext/utf-cxx0x.C
===================================================================
--- gcc/testsuite/g++.dg/ext/utf-cxx0x.C	(revision 0)
+++ gcc/testsuite/g++.dg/ext/utf-cxx0x.C	(revision 0)
@@ -0,0 +1,14 @@
+/* Contributed by Kris Van Hees <kris.van.hees@oracle.com> */
+/* Test parsing of u and U prefixes when also used as macros. */
+/* { dg-do compile } */
+/* { dg-options "-std=c++0x" } */
+
+#define u	L
+#define U	L
+
+const unsigned short	c2	= u'a';
+const unsigned long	c3	= U'a';
+const void		*s0	= u"a";
+const void		*s1	= U"a";
+
+int main () {}
Index: gcc/testsuite/g++.dg/ext/utf32-1.C
===================================================================
--- gcc/testsuite/g++.dg/ext/utf32-1.C	(revision 0)
+++ gcc/testsuite/g++.dg/ext/utf32-1.C	(revision 0)
@@ -0,0 +1,42 @@
+/* Contributed by Kris Van Hees <kris.van.hees@oracle.com> */
+/* Test the support for char32_t character constants. */
+/* { dg-do run } */
+/* { dg-options "-std=c++0x -Wall -Werror" } */
+
+extern "C" void abort (void);
+
+const static char32_t	c0 = U'a';
+const static char32_t	c1 = U'\0';
+const static char32_t	c2 = U'\u0024';
+const static char32_t	c3 = U'\u2029';
+const static char32_t	c4 = U'\U00064321';
+
+#define A	0x00000061
+#define D	0x00000024
+#define X	0x00002029
+#define Y	0x00064321
+
+int main ()
+{
+    if (sizeof (U'a') != sizeof (char32_t))
+	abort ();
+    if (sizeof (U'\0') != sizeof (char32_t))
+	abort ();
+    if (sizeof (U'\u0024') != sizeof (char32_t))
+	abort ();
+    if (sizeof (U'\u2029') != sizeof (char32_t))
+	abort ();
+    if (sizeof (U'\U00064321') != sizeof (char32_t))
+	abort ();
+
+    if (c0 != A)
+	abort ();
+    if (c1 != 0x0000)
+	abort ();
+    if (c2 != D)
+	abort ();
+    if (c3 != X)
+	abort ();
+    if (c4 != Y)
+	abort ();
+}
Index: gcc/testsuite/g++.dg/ext/utf-typespec.C
===================================================================
--- gcc/testsuite/g++.dg/ext/utf-typespec.C	(revision 0)
+++ gcc/testsuite/g++.dg/ext/utf-typespec.C	(revision 0)
@@ -0,0 +1,25 @@
+/* Contributed by Kris Van Hees <kris.van.hees@oracle.com> */
+/* Ensure that type specifiers are not allowed for char16_t/char32_t. */
+/* { dg-do compile } */
+/* { dg-options "-std=c++0x" } */
+
+signed char16_t		c0;		/* { dg-error "signed" } */
+signed char32_t		c1;		/* { dg-error "signed" } */
+unsigned char16_t	c2;		/* { dg-error "unsigned" } */
+unsigned char32_t	c3;		/* { dg-error "unsigned" } */
+
+short char16_t		c4;		/* { dg-error "short" } */
+long char16_t		c5;		/* { dg-error "long" } */
+short char32_t		c6;		/* { dg-error "short" } */
+long char32_t		c7;		/* { dg-error "long" } */
+
+signed short char16_t	c8;		/* { dg-error "signed" } */
+signed short char32_t	c9;		/* { dg-error "signed" } */
+signed long char16_t	ca;		/* { dg-error "signed" } */
+signed long char32_t	cb;		/* { dg-error "signed" } */
+unsigned short char16_t	cc;		/* { dg-error "unsigned" } */
+unsigned short char32_t	cd;		/* { dg-error "unsigned" } */
+unsigned long char16_t	ce;		/* { dg-error "unsigned" } */
+unsigned long char32_t	cf;		/* { dg-error "unsigned" } */
+
+int main () {}
Index: gcc/testsuite/g++.dg/ext/utf32-2.C
===================================================================
--- gcc/testsuite/g++.dg/ext/utf32-2.C	(revision 0)
+++ gcc/testsuite/g++.dg/ext/utf32-2.C	(revision 0)
@@ -0,0 +1,29 @@
+/* Contributed by Kris Van Hees <kris.van.hees@oracle.com> */
+/* Test the support for char32_t* string constants. */
+/* { dg-do run } */
+/* { dg-options "-std=c++0x -Wall -Werror" } */
+
+extern "C" void abort (void);
+
+const static char32_t	*s0 = U"ab";
+const static char32_t	*s1 = U"a\u0024";
+const static char32_t	*s2 = U"a\u2029";
+const static char32_t	*s3 = U"a\U00064321";
+
+#define A	0x00000061
+#define B	0x00000062
+#define D	0x00000024
+#define X	0x00002029
+#define Y	0x00064321
+
+int main ()
+{
+    if (s0[0] != A || s0[1] != B || s0[2] != 0x00000000)
+	abort ();
+    if (s1[0] != A || s1[1] != D || s1[2] != 0x00000000)
+	abort ();
+    if (s2[0] != A || s2[1] != X || s2[2] != 0x00000000)
+	abort ();
+    if (s3[0] != A || s3[1] != Y || s3[2] != 0x00000000)
+	abort ();
+}
Index: gcc/testsuite/g++.dg/ext/utf-gnuxx0x.C
===================================================================
--- gcc/testsuite/g++.dg/ext/utf-gnuxx0x.C	(revision 0)
+++ gcc/testsuite/g++.dg/ext/utf-gnuxx0x.C	(revision 0)
@@ -0,0 +1,14 @@
+/* Contributed by Kris Van Hees <kris.van.hees@oracle.com> */
+/* Test parsing of u and U prefixes when also used as macros. */
+/* { dg-do compile } */
+/* { dg-options "-std=gnu++0x" } */
+
+#define u	L
+#define U	L
+
+const unsigned short	c2	= u'a';
+const unsigned long	c3	= U'a';
+const void		*s0	= u"a";
+const void		*s1	= U"a";
+
+int main () {}
Index: gcc/testsuite/g++.dg/ext/utf32-3.C
===================================================================
--- gcc/testsuite/g++.dg/ext/utf32-3.C	(revision 0)
+++ gcc/testsuite/g++.dg/ext/utf32-3.C	(revision 0)
@@ -0,0 +1,46 @@
+/* Contributed by Kris Van Hees <kris.van.hees@oracle.com> */
+/* Test concatenation of char32_t* string literals. */
+/* { dg-do run } */
+/* { dg-options "-std=c++0x -Wall -Werror" } */
+
+extern "C" void abort (void);
+
+const static char32_t	*s0 = U"a" U"b";
+
+const static char32_t	*s1 = U"a" "b";
+const static char32_t	*s2 = "a" U"b";
+const static char32_t	*s3 = U"a" "\u2029";
+const static char32_t	*s4 = "\u2029" U"b";
+const static char32_t	*s5 = U"a" "\U00064321";
+const static char32_t	*s6 = "\U00064321" U"b";
+
+#define A	0x00000061
+#define B	0x00000062
+#define X	0x00002029
+#define Y	0x00064321
+
+int main ()
+{
+    if (sizeof ((U"a" U"b")[0]) != sizeof (char32_t))
+	abort ();
+    if (sizeof ((U"a"  "b")[0]) != sizeof (char32_t))
+	abort ();
+    if (sizeof (( "a" U"b")[0]) != sizeof (char32_t))
+	abort ();
+
+    if (s0[0] != A || s0[1] != B || s0[2] != 0x00000000)
+	abort ();
+
+    if (s1[0] != A || s1[1] != B || s1[2] != 0x00000000)
+	abort ();
+    if (s2[0] != A || s2[1] != B || s2[2] != 0x00000000)
+	abort ();
+    if (s3[0] != A || s3[1] != X || s3[2] != 0x00000000)
+	abort ();
+    if (s4[0] != X || s4[1] != B || s4[2] != 0x00000000)
+	abort ();
+    if (s5[0] != A || s5[1] != Y || s5[2] != 0x00000000)
+	abort ();
+    if (s6[0] != Y || s6[1] != B || s6[2] != 0x00000000)
+	abort ();
+}
Index: gcc/testsuite/g++.dg/ext/utf-typedef-cxx98.C
===================================================================
--- gcc/testsuite/g++.dg/ext/utf-typedef-cxx98.C	(revision 0)
+++ gcc/testsuite/g++.dg/ext/utf-typedef-cxx98.C	(revision 0)
@@ -0,0 +1,7 @@
+/* Contributed by Kris Van Hees <kris.van.hees@oracle.com> */
+/* Ensure that a typedef to char16_t/char32_t is fine in c++98. */
+/* { dg-do compile } */
+/* { dg-options "-std=c++98" } */
+
+typedef short unsigned int	char16_t;
+typedef unsigned int		char32_t;
Index: gcc/testsuite/g++.dg/ext/utf-mangle.C
===================================================================
--- gcc/testsuite/g++.dg/ext/utf-mangle.C	(revision 0)
+++ gcc/testsuite/g++.dg/ext/utf-mangle.C	(revision 0)
@@ -0,0 +1,14 @@
+// Contributed by Kris Van Hees <kris.van.hees@oracle.com>
+// Test the mangling of char16_t and char32_t.
+// { dg-do compile }
+// { dg-options "-std=c++0x" }
+
+void f0 (char16_t c) {}
+void f1 (char32_t c) {}
+void f2 (char16_t *s) {}
+void f3 (char32_t *s) {}
+
+// { dg-final { scan-assembler "_Z2f0u8char16_t:" } }
+// { dg-final { scan-assembler "_Z2f1u8char32_t:" } }
+// { dg-final { scan-assembler "_Z2f2Pu8char16_t:" } }
+// { dg-final { scan-assembler "_Z2f3Pu8char32_t:" } }
Index: gcc/cp/typeck.c
===================================================================
--- gcc/cp/typeck.c	(revision 134262)
+++ gcc/cp/typeck.c	(working copy)
@@ -1722,12 +1722,14 @@ string_conv_p (const_tree totype, const_
 
   t = TREE_TYPE (totype);
   if (!same_type_p (t, char_type_node)
+      && !same_type_p (t, char16_type_node)
+      && !same_type_p (t, char32_type_node)
       && !same_type_p (t, wchar_type_node))
     return 0;
 
   if (TREE_CODE (exp) == STRING_CST)
     {
-      /* Make sure that we don't try to convert between char and wchar_t.  */
+      /* Make sure that we don't try to convert between char and wide chars.  */
       if (!same_type_p (TYPE_MAIN_VARIANT (TREE_TYPE (TREE_TYPE (exp))), t))
 	return 0;
     }
Index: gcc/cp/decl.c
===================================================================
--- gcc/cp/decl.c	(revision 134262)
+++ gcc/cp/decl.c	(working copy)
@@ -7729,6 +7729,13 @@ grokdeclarator (const cp_declarator *dec
 	error ("%<long%> or %<short%> specified with char for %qs", name);
       else if (long_p && short_p)
 	error ("%<long%> and %<short%> specified together for %qs", name);
+      else if (type == char16_type_node || type == char32_type_node)
+	{
+	  if (signed_p || unsigned_p)
+	    error ("%<signed%> or %<unsigned%> invalid for %qs", name);
+	  else if (short_p || long_p)
+	    error ("%<short%> or %<long%> invalid for %qs", name);
+	}
       else
 	{
 	  ok = 1;
Index: gcc/cp/cvt.c
===================================================================
--- gcc/cp/cvt.c	(revision 134262)
+++ gcc/cp/cvt.c	(working copy)
@@ -1219,6 +1219,8 @@ type_promotes_to (tree type)
   /* Normally convert enums to int, but convert wide enums to something
      wider.  */
   else if (TREE_CODE (type) == ENUMERAL_TYPE
+	   || type == char16_type_node
+	   || type == char32_type_node
 	   || type == wchar_type_node)
     {
       int precision = MAX (TYPE_PRECISION (type),
Index: gcc/cp/tree.c
===================================================================
--- gcc/cp/tree.c	(revision 134262)
+++ gcc/cp/tree.c	(working copy)
@@ -2474,6 +2474,8 @@ char_type_p (tree type)
   return (same_type_p (type, char_type_node)
 	  || same_type_p (type, unsigned_char_type_node)
 	  || same_type_p (type, signed_char_type_node)
+	  || same_type_p (type, char16_type_node)
+	  || same_type_p (type, char32_type_node)
 	  || same_type_p (type, wchar_type_node));
 }
 
Index: gcc/cp/mangle.c
===================================================================
--- gcc/cp/mangle.c	(revision 134262)
+++ gcc/cp/mangle.c	(working copy)
@@ -1782,10 +1782,14 @@ write_builtin_type (tree type)
       break;
 
     case INTEGER_TYPE:
-      /* TYPE may still be wchar_t, since that isn't in
-	 integer_type_nodes.  */
+      /* TYPE may still be wchar_t, char16_t, or char32_t, since those
+	 aren't in integer_type_nodes.  */
       if (type == wchar_type_node)
 	write_char ('w');
+      else if (type == char16_type_node)
+	write_string ("u8char16_t");
+      else if (type == char32_type_node)
+	write_string ("u8char32_t");
       else if (TYPE_FOR_JAVA (type))
 	write_java_integer_type_codes (type);
       else
Index: gcc/cp/lex.c
===================================================================
--- gcc/cp/lex.c	(revision 134262)
+++ gcc/cp/lex.c	(working copy)
@@ -241,6 +241,8 @@ static const struct resword reswords[] =
   { "case",		RID_CASE,	0 },
   { "catch",		RID_CATCH,	0 },
   { "char",		RID_CHAR,	0 },
+  { "char16_t",		RID_CHAR16,	D_CXX0X },
+  { "char32_t",		RID_CHAR32,	D_CXX0X },
   { "class",		RID_CLASS,	0 },
   { "const",		RID_CONST,	0 },
   { "const_cast",	RID_CONSTCAST,	0 },
Index: gcc/cp/parser.c
===================================================================
--- gcc/cp/parser.c	(revision 134262)
+++ gcc/cp/parser.c	(working copy)
@@ -556,6 +556,8 @@ cp_lexer_next_token_is_decl_specifier_ke
     case RID_TYPENAME:
       /* Simple type specifiers.  */
     case RID_CHAR:
+    case RID_CHAR16:
+    case RID_CHAR32:
     case RID_WCHAR:
     case RID_BOOL:
     case RID_SHORT:
@@ -789,6 +791,8 @@ cp_lexer_print_token (FILE * stream, cp_
       break;
 
     case CPP_STRING:
+    case CPP_STRING16:
+    case CPP_STRING32:
     case CPP_WSTRING:
       fprintf (stream, " \"%s\"", TREE_STRING_POINTER (token->u.value));
       break;
@@ -2033,7 +2037,10 @@ cp_parser_parsing_tentatively (cp_parser
 static bool
 cp_parser_is_string_literal (cp_token* token)
 {
-  return (token->type == CPP_STRING || token->type == CPP_WSTRING);
+  return (token->type == CPP_STRING
+	  || token->type == CPP_STRING16
+	  || token->type == CPP_STRING32
+	  || token->type == CPP_WSTRING);
 }
 
 /* Returns nonzero if TOKEN is the indicated KEYWORD.  */
@@ -2867,11 +2874,11 @@ static tree
 cp_parser_string_literal (cp_parser *parser, bool translate, bool wide_ok)
 {
   tree value;
-  bool wide = false;
   size_t count;
   struct obstack str_ob;
   cpp_string str, istr, *strs;
   cp_token *tok;
+  enum cpp_ttype type;
 
   tok = cp_lexer_peek_token (parser->lexer);
   if (!cp_parser_is_string_literal (tok))
@@ -2880,6 +2887,8 @@ cp_parser_string_literal (cp_parser *par
       return error_mark_node;
     }
 
+  type = tok->type;
+
   /* Try to avoid the overhead of creating and destroying an obstack
      for the common case of just one string.  */
   if (!cp_parser_is_string_literal
@@ -2890,8 +2899,6 @@ cp_parser_string_literal (cp_parser *par
       str.text = (const unsigned char *)TREE_STRING_POINTER (tok->u.value);
       str.len = TREE_STRING_LENGTH (tok->u.value);
       count = 1;
-      if (tok->type == CPP_WSTRING)
-	wide = true;
 
       strs = &str;
     }
@@ -2906,8 +2913,14 @@ cp_parser_string_literal (cp_parser *par
 	  count++;
 	  str.text = (const unsigned char *)TREE_STRING_POINTER (tok->u.value);
 	  str.len = TREE_STRING_LENGTH (tok->u.value);
-	  if (tok->type == CPP_WSTRING)
-	    wide = true;
+
+	  if (type != tok->type)
+	    {
+	      if (type == CPP_STRING)
+		type = tok->type;
+	      else if (tok->type != CPP_STRING)
+		error ("unsupported non-standard concatenation of string literals");
+	    }
 
 	  obstack_grow (&str_ob, &str, sizeof (cpp_string));
 
@@ -2918,19 +2931,35 @@ cp_parser_string_literal (cp_parser *par
       strs = (cpp_string *) obstack_finish (&str_ob);
     }
 
-  if (wide && !wide_ok)
+  if (type != CPP_STRING && !wide_ok)
     {
       cp_parser_error (parser, "a wide string is invalid in this context");
-      wide = false;
+      type = CPP_STRING;
     }
 
   if ((translate ? cpp_interpret_string : cpp_interpret_string_notranslate)
-      (parse_in, strs, count, &istr, wide))
+      (parse_in, strs, count, &istr, type))
     {
       value = build_string (istr.len, (const char *)istr.text);
       free (CONST_CAST (unsigned char *, istr.text));
 
-      TREE_TYPE (value) = wide ? wchar_array_type_node : char_array_type_node;
+      switch (type)
+	{
+	default:
+	case CPP_STRING:
+	  TREE_TYPE (value) = char_array_type_node;
+	  break;
+	case CPP_STRING16:
+	  TREE_TYPE (value) = char16_array_type_node;
+	  break;
+	case CPP_STRING32:
+	  TREE_TYPE (value) = char32_array_type_node;
+	  break;
+	case CPP_WSTRING:
+	  TREE_TYPE (value) = wchar_array_type_node;
+	  break;
+	}
+
       value = fix_string_type (value);
     }
   else
@@ -3085,6 +3114,8 @@ cp_parser_primary_expression (cp_parser 
 	   string-literal
 	   boolean-literal  */
     case CPP_CHAR:
+    case CPP_CHAR16:
+    case CPP_CHAR32:
     case CPP_WCHAR:
     case CPP_NUMBER:
       token = cp_lexer_consume_token (parser->lexer);
@@ -3136,6 +3167,8 @@ cp_parser_primary_expression (cp_parser 
       return token->u.value;
 
     case CPP_STRING:
+    case CPP_STRING16:
+    case CPP_STRING32:
     case CPP_WSTRING:
       /* ??? Should wide strings be allowed when parser->translate_strings_p
 	 is false (i.e. in attributes)?  If not, we can kill the third
@@ -10762,6 +10795,8 @@ cp_parser_type_specifier (cp_parser* par
    simple-type-specifier:
      auto
      decltype ( expression )   
+     char16_t
+     char32_t
 
    GNU Extension:
 
@@ -10791,6 +10826,12 @@ cp_parser_simple_type_specifier (cp_pars
 	decl_specs->explicit_char_p = true;
       type = char_type_node;
       break;
+    case RID_CHAR16:
+      type = char16_type_node;
+      break;
+    case RID_CHAR32:
+      type = char32_type_node;
+      break;
     case RID_WCHAR:
       type = wchar_type_node;
       break;
@@ -17754,13 +17795,16 @@ cp_parser_set_decl_spec_type (cp_decl_sp
 {
   decl_specs->any_specifiers_p = true;
 
-  /* If the user tries to redeclare bool or wchar_t (with, for
-     example, in "typedef int wchar_t;") we remember that this is what
-     happened.  In system headers, we ignore these declarations so
-     that G++ can work with system headers that are not C++-safe.  */
+  /* If the user tries to redeclare bool, char16_t, char32_t, or wchar_t
+     (as in, for example, "typedef int wchar_t;") we remember that
+     this is what happened.  In system headers, we ignore these
+     declarations so that G++ can work with system headers that are not
+     C++-safe.  */
   if (decl_specs->specs[(int) ds_typedef]
       && !user_defined_p
       && (type_spec == boolean_type_node
+	  || type_spec == char16_type_node
+	  || type_spec == char32_type_node
 	  || type_spec == wchar_type_node)
       && (decl_specs->type
 	  || decl_specs->specs[(int) ds_long]
Index: gcc/c-common.c
===================================================================
--- gcc/c-common.c	(revision 134262)
+++ gcc/c-common.c	(working copy)
@@ -66,6 +66,14 @@ cpp_reader *parse_in;		/* Declared in c-
 #define PID_TYPE "int"
 #endif
 
+#ifndef CHAR16_TYPE
+#define CHAR16_TYPE "short unsigned int"
+#endif
+
+#ifndef CHAR32_TYPE
+#define CHAR32_TYPE "unsigned int"
+#endif
+
 #ifndef WCHAR_TYPE
 #define WCHAR_TYPE "int"
 #endif
@@ -123,6 +131,9 @@ cpp_reader *parse_in;		/* Declared in c-
 	tree signed_wchar_type_node;
 	tree unsigned_wchar_type_node;
 
+	tree char16_type_node;
+	tree char32_type_node;
+
 	tree float_type_node;
 	tree double_type_node;
 	tree long_double_type_node;
@@ -174,6 +185,16 @@ cpp_reader *parse_in;		/* Declared in c-
 
 	tree wchar_array_type_node;
 
+   Type `char16_t[SOMENUMBER]' or something like it.
+   Used when a UTF-16 string literal is created.
+
+	tree char16_array_type_node;
+
+   Type `char32_t[SOMENUMBER]' or something like it.
+   Used when a UTF-32 string literal is created.
+
+	tree char32_array_type_node;
+
    Type `int ()' -- used for implicit declaration of functions.
 
 	tree default_function_type;
@@ -777,7 +798,7 @@ fname_as_string (int pretty_p)
   strname.text = (unsigned char *) namep;
   strname.len = len - 1;
 
-  if (cpp_interpret_string (parse_in, &strname, 1, &cstr, false))
+  if (cpp_interpret_string (parse_in, &strname, 1, &cstr, CPP_STRING))
     {
       XDELETEVEC (namep);
       return (const char *) cstr.text;
@@ -857,14 +878,31 @@ fname_decl (unsigned int rid, tree id)
 tree
 fix_string_type (tree value)
 {
-  const int wchar_bytes = TYPE_PRECISION (wchar_type_node) / BITS_PER_UNIT;
-  const int wide_flag = TREE_TYPE (value) == wchar_array_type_node;
   int length = TREE_STRING_LENGTH (value);
   int nchars;
   tree e_type, i_type, a_type;
 
   /* Compute the number of elements, for the array type.  */
-  nchars = wide_flag ? length / wchar_bytes : length;
+  if (TREE_TYPE (value) == char_array_type_node || !TREE_TYPE (value))
+    {
+      nchars = length;
+      e_type = char_type_node;
+    }
+  else if (TREE_TYPE (value) == char16_array_type_node)
+    {
+      nchars = length / (TYPE_PRECISION (char16_type_node) / BITS_PER_UNIT);
+      e_type = char16_type_node;
+    }
+  else if (TREE_TYPE (value) == char32_array_type_node)
+    {
+      nchars = length / (TYPE_PRECISION (char32_type_node) / BITS_PER_UNIT);
+      e_type = char32_type_node;
+    }
+  else
+    {
+      nchars = length / (TYPE_PRECISION (wchar_type_node) / BITS_PER_UNIT);
+      e_type = wchar_type_node;
+    }
 
   /* C89 2.2.4.1, C99 5.2.4.1 (Translation limits).  The analogous
      limit in C++98 Annex B is very large (65536) and is not normative,
@@ -899,7 +937,6 @@ fix_string_type (tree value)
      construct the matching unqualified array type first.  The C front
      end does not require this, but it does no harm, so we do it
      unconditionally.  */
-  e_type = wide_flag ? wchar_type_node : char_type_node;
   i_type = build_index_type (build_int_cst (NULL_TREE, nchars - 1));
   a_type = build_array_type (e_type, i_type);
   if (c_dialect_cxx() || warn_write_strings)
@@ -3629,6 +3666,8 @@ c_define_builtins (tree va_list_ref_type
 void
 c_common_nodes_and_builtins (void)
 {
+  int char16_type_size;
+  int char32_type_size;
   int wchar_type_size;
   tree array_domain_type;
   tree va_list_ref_type_node;
@@ -3878,6 +3917,38 @@ c_common_nodes_and_builtins (void)
   wchar_array_type_node
     = build_array_type (wchar_type_node, array_domain_type);
 
+  /* Define 'char16_t'.  */
+  char16_type_node = get_identifier (CHAR16_TYPE);
+  char16_type_node = TREE_TYPE (identifier_global_value (char16_type_node));
+  char16_type_size = TYPE_PRECISION (char16_type_node);
+  if (c_dialect_cxx ())
+    {
+      char16_type_node = make_unsigned_type (char16_type_size);
+
+      if (cxx_dialect == cxx0x)
+	record_builtin_type (RID_CHAR16, "char16_t", char16_type_node);
+    }
+
+  /* This is for UTF-16 string constants.  */
+  char16_array_type_node
+    = build_array_type (char16_type_node, array_domain_type);
+
+  /* Define 'char32_t'.  */
+  char32_type_node = get_identifier (CHAR32_TYPE);
+  char32_type_node = TREE_TYPE (identifier_global_value (char32_type_node));
+  char32_type_size = TYPE_PRECISION (char32_type_node);
+  if (c_dialect_cxx ())
+    {
+      char32_type_node = make_unsigned_type (char32_type_size);
+
+      if (cxx_dialect == cxx0x)
+	record_builtin_type (RID_CHAR32, "char32_t", char32_type_node);
+    }
+
+  /* This is for UTF-32 string constants.  */
+  char32_array_type_node
+    = build_array_type (char32_type_node, array_domain_type);
+
   wint_type_node =
     TREE_TYPE (identifier_global_value (get_identifier (WINT_TYPE)));
 
@@ -6661,20 +6732,39 @@ c_parse_error (const char *gmsgid, enum 
 
   if (token == CPP_EOF)
     message = catenate_messages (gmsgid, " at end of input");
-  else if (token == CPP_CHAR || token == CPP_WCHAR)
+  else if (token == CPP_CHAR || token == CPP_WCHAR || token == CPP_CHAR16
+	   || token == CPP_CHAR32)
     {
       unsigned int val = TREE_INT_CST_LOW (value);
-      const char *const ell = (token == CPP_CHAR) ? "" : "L";
+      const char *prefix;
+
+      switch (token)
+	{
+	default:
+	  prefix = "";
+	  break;
+	case CPP_WCHAR:
+	  prefix = "L";
+	  break;
+	case CPP_CHAR16:
+	  prefix = "u";
+	  break;
+	case CPP_CHAR32:
+	  prefix = "U";
+	  break;
+	}
+
       if (val <= UCHAR_MAX && ISGRAPH (val))
 	message = catenate_messages (gmsgid, " before %s'%c'");
       else
 	message = catenate_messages (gmsgid, " before %s'\\x%x'");
 
-      error (message, ell, val);
+      error (message, prefix, val);
       free (message);
       message = NULL;
     }
-  else if (token == CPP_STRING || token == CPP_WSTRING)
+  else if (token == CPP_STRING || token == CPP_WSTRING || token == CPP_STRING16
+	   || token == CPP_STRING32)
     message = catenate_messages (gmsgid, " before string constant");
   else if (token == CPP_NUMBER)
     message = catenate_messages (gmsgid, " before numeric constant");
Index: gcc/c-common.h
===================================================================
--- gcc/c-common.h	(revision 134262)
+++ gcc/c-common.h	(working copy)
@@ -85,7 +85,7 @@ enum rid
   RID_NEW,      RID_OFFSETOF, RID_OPERATOR,
   RID_THIS,     RID_THROW,    RID_TRUE,
   RID_TRY,      RID_TYPENAME, RID_TYPEID,
-  RID_USING,
+  RID_USING,    RID_CHAR16,   RID_CHAR32,
 
   /* casts */
   RID_CONSTCAST, RID_DYNCAST, RID_REINTCAST, RID_STATCAST,
@@ -143,6 +143,8 @@ extern GTY ((length ("(int) RID_MAX"))) 
 
 enum c_tree_index
 {
+    CTI_CHAR16_TYPE,
+    CTI_CHAR32_TYPE,
     CTI_WCHAR_TYPE,
     CTI_SIGNED_WCHAR_TYPE,
     CTI_UNSIGNED_WCHAR_TYPE,
@@ -155,6 +157,8 @@ enum c_tree_index
     CTI_WIDEST_UINT_LIT_TYPE,
 
     CTI_CHAR_ARRAY_TYPE,
+    CTI_CHAR16_ARRAY_TYPE,
+    CTI_CHAR32_ARRAY_TYPE,
     CTI_WCHAR_ARRAY_TYPE,
     CTI_INT_ARRAY_TYPE,
     CTI_STRING_TYPE,
@@ -190,6 +194,8 @@ struct c_common_identifier GTY(())
   struct cpp_hashnode node;
 };
 
+#define char16_type_node		c_global_trees[CTI_CHAR16_TYPE]
+#define char32_type_node		c_global_trees[CTI_CHAR32_TYPE]
 #define wchar_type_node			c_global_trees[CTI_WCHAR_TYPE]
 #define signed_wchar_type_node		c_global_trees[CTI_SIGNED_WCHAR_TYPE]
 #define unsigned_wchar_type_node	c_global_trees[CTI_UNSIGNED_WCHAR_TYPE]
@@ -206,6 +212,8 @@ struct c_common_identifier GTY(())
 #define truthvalue_false_node		c_global_trees[CTI_TRUTHVALUE_FALSE]
 
 #define char_array_type_node		c_global_trees[CTI_CHAR_ARRAY_TYPE]
+#define char16_array_type_node		c_global_trees[CTI_CHAR16_ARRAY_TYPE]
+#define char32_array_type_node		c_global_trees[CTI_CHAR32_ARRAY_TYPE]
 #define wchar_array_type_node		c_global_trees[CTI_WCHAR_ARRAY_TYPE]
 #define int_array_type_node		c_global_trees[CTI_INT_ARRAY_TYPE]
 #define string_type_node		c_global_trees[CTI_STRING_TYPE]
Index: gcc/c-parser.c
===================================================================
--- gcc/c-parser.c	(revision 134262)
+++ gcc/c-parser.c	(working copy)
@@ -5163,12 +5163,16 @@ c_parser_postfix_expression (c_parser *p
     {
     case CPP_NUMBER:
     case CPP_CHAR:
+    case CPP_CHAR16:
+    case CPP_CHAR32:
     case CPP_WCHAR:
       expr.value = c_parser_peek_token (parser)->value;
       expr.original_code = ERROR_MARK;
       c_parser_consume_token (parser);
       break;
     case CPP_STRING:
+    case CPP_STRING16:
+    case CPP_STRING32:
     case CPP_WSTRING:
       expr.value = c_parser_peek_token (parser)->value;
       expr.original_code = STRING_CST;
Index: libiberty/testsuite/demangle-expected
===================================================================
--- libiberty/testsuite/demangle-expected	(revision 134262)
+++ libiberty/testsuite/demangle-expected	(working copy)
@@ -3399,6 +3399,26 @@ foo(char)
 foo
 #
 --format=gnu-v3 --no-params
+_Z2f0u8char16_t
+f0(char16_t)
+f0
+#
+--format=gnu-v3 --no-params
+_Z2f0Pu8char16_t
+f0(char16_t*)
+f0
+#
+--format=gnu-v3 --no-params
+_Z2f0u8char32_t
+f0(char32_t)
+f0
+#
+--format=gnu-v3 --no-params
+_Z2f0Pu8char32_t
+f0(char32_t*)
+f0
+#
+--format=gnu-v3 --no-params
 2CBIL_Z3foocEE
 CB<foo(char)>
 CB<foo(char)>
Index: libcpp/macro.c
===================================================================
--- libcpp/macro.c	(revision 134262)
+++ libcpp/macro.c	(working copy)
@@ -158,7 +158,7 @@ _cpp_builtin_macro_text (cpp_reader *pfi
 		  {
 		    cpp_errno (pfile, CPP_DL_WARNING,
 			"could not determine file timestamp");
-		    pbuffer->timestamp = U"\"??? ??? ?? ??:??:?? ????\"";
+		    pbuffer->timestamp = UC"\"??? ??? ?? ??:??:?? ????\"";
 		  }
 	      }
 	  }
@@ -256,8 +256,8 @@ _cpp_builtin_macro_text (cpp_reader *pfi
 	      cpp_errno (pfile, CPP_DL_WARNING,
 			 "could not determine date and time");
 		
-	      pfile->date = U"\"??? ?? ????\"";
-	      pfile->time = U"\"??:??:??\"";
+	      pfile->date = UC"\"??? ?? ????\"";
+	      pfile->time = UC"\"??:??:??\"";
 	    }
 	}
 
@@ -375,8 +375,10 @@ stringify_arg (cpp_reader *pfile, macro_
 	  continue;
 	}
 
-      escape_it = (token->type == CPP_STRING || token->type == CPP_WSTRING
-		   || token->type == CPP_CHAR || token->type == CPP_WCHAR);
+      escape_it = (token->type == CPP_STRING || token->type == CPP_CHAR
+		   || token->type == CPP_WSTRING || token->type == CPP_WCHAR
+		   || token->type == CPP_STRING32 || token->type == CPP_CHAR32
+		   || token->type == CPP_STRING16 || token->type == CPP_CHAR16);
 
       /* Room for each char being written in octal, initial space and
 	 final quote and NUL.  */
Index: libcpp/directives.c
===================================================================
--- libcpp/directives.c	(revision 134262)
+++ libcpp/directives.c	(working copy)
@@ -188,7 +188,7 @@ DIRECTIVE_TABLE
    did use this notation in its preprocessed output.  */
 static const directive linemarker_dir =
 {
-  do_linemarker, U"#", 1, KANDR, IN_I
+  do_linemarker, UC"#", 1, KANDR, IN_I
 };
 
 #define SEEN_EOL() (pfile->cur_token[-1].type == CPP_EOF)
@@ -697,7 +697,7 @@ parse_include (cpp_reader *pfile, int *p
       const unsigned char *dir;
 
       if (pfile->directive == &dtable[T_PRAGMA])
-	dir = U"pragma dependency";
+	dir = UC"pragma dependency";
       else
 	dir = pfile->directive->name;
       cpp_error (pfile, CPP_DL_ERROR, "#%s expects \"FILENAME\" or <FILENAME>",
@@ -1085,7 +1085,7 @@ register_pragma_1 (cpp_reader *pfile, co
 
   if (space)
     {
-      node = cpp_lookup (pfile, U space, strlen (space));
+      node = cpp_lookup (pfile, UC space, strlen (space));
       entry = lookup_pragma_entry (*chain, node);
       if (!entry)
 	{
@@ -1114,7 +1114,7 @@ register_pragma_1 (cpp_reader *pfile, co
     }
 
   /* Check for duplicates.  */
-  node = cpp_lookup (pfile, U name, strlen (name));
+  node = cpp_lookup (pfile, UC name, strlen (name));
   entry = lookup_pragma_entry (*chain, node);
   if (entry == NULL)
     {
@@ -1262,7 +1262,7 @@ restore_registered_pragmas (cpp_reader *
     {
       if (pe->is_nspace)
 	sd = restore_registered_pragmas (pfile, pe->u.space, sd);
-      pe->pragma = cpp_lookup (pfile, U *sd, strlen (*sd));
+      pe->pragma = cpp_lookup (pfile, UC *sd, strlen (*sd));
       free (*sd);
       sd++;
     }
@@ -1491,7 +1491,8 @@ get__Pragma_string (cpp_reader *pfile)
   string = get_token_no_padding (pfile);
   if (string->type == CPP_EOF)
     _cpp_backup_tokens (pfile, 1);
-  if (string->type != CPP_STRING && string->type != CPP_WSTRING)
+  if (string->type != CPP_STRING && string->type != CPP_WSTRING
+      && string->type != CPP_STRING32 && string->type != CPP_STRING16)
     return NULL;
 
   paren = get_token_no_padding (pfile);
Index: libcpp/include/cpplib.h
===================================================================
--- libcpp/include/cpplib.h	(revision 134262)
+++ libcpp/include/cpplib.h	(working copy)
@@ -123,10 +123,14 @@ struct _cpp_file;
 									\
   TK(CHAR,		LITERAL) /* 'char' */				\
   TK(WCHAR,		LITERAL) /* L'char' */				\
+  TK(CHAR16,		LITERAL) /* u'char' */				\
+  TK(CHAR32,		LITERAL) /* U'char' */				\
   TK(OTHER,		LITERAL) /* stray punctuation */		\
 									\
   TK(STRING,		LITERAL) /* "string" */				\
   TK(WSTRING,		LITERAL) /* L"string" */			\
+  TK(STRING16,		LITERAL) /* u"string" */			\
+  TK(STRING32,		LITERAL) /* U"string" */			\
   TK(OBJC_STRING,	LITERAL) /* @"string" - Objective-C */		\
   TK(HEADER_NAME,	LITERAL) /* <stdio.h> in #include */		\
 									\
@@ -291,6 +295,9 @@ struct cpp_options
   /* Nonzero means to allow hexadecimal floats and LL suffixes.  */
   unsigned char extended_numbers;
 
+  /* Nonzero means process u/U prefix literals (UTF-16/32).  */
+  unsigned char uliterals;
+
   /* Nonzero means print names of header files (-H).  */
   unsigned char print_include_names;
 
@@ -712,10 +719,10 @@ extern cppchar_t cpp_interpret_charconst
 /* Evaluate a vector of CPP_STRING or CPP_WSTRING tokens.  */
 extern bool cpp_interpret_string (cpp_reader *,
 				  const cpp_string *, size_t,
-				  cpp_string *, bool);
+				  cpp_string *, enum cpp_ttype);
 extern bool cpp_interpret_string_notranslate (cpp_reader *,
 					      const cpp_string *, size_t,
-					      cpp_string *, bool);
+					      cpp_string *, enum cpp_ttype);
 
 /* Convert a host character constant to the execution character set.  */
 extern cppchar_t cpp_host_to_exec_charset (cpp_reader *, cppchar_t);
Index: libcpp/include/cpp-id-data.h
===================================================================
--- libcpp/include/cpp-id-data.h	(revision 134262)
+++ libcpp/include/cpp-id-data.h	(working copy)
@@ -22,7 +22,7 @@ Foundation, 51 Franklin Street, Fifth Fl
 typedef unsigned char uchar;
 #endif
 
-#define U (const unsigned char *)  /* Intended use: U"string" */
+#define UC (const unsigned char *)  /* Intended use: UC"string" */
 
 /* Chained list of answers to an assertion.  */
 struct answer GTY(())
Index: libcpp/init.c
===================================================================
--- libcpp/init.c	(revision 134262)
+++ libcpp/init.c	(working copy)
@@ -76,20 +76,21 @@ struct lang_flags
   char std;
   char cplusplus_comments;
   char digraphs;
+  char uliterals;
 };
 
 static const struct lang_flags lang_defaults[] =
-{ /*              c99 c++ xnum xid std  //   digr  */
-  /* GNUC89   */  { 0,  0,  1,   0,  0,   1,   1     },
-  /* GNUC99   */  { 1,  0,  1,   0,  0,   1,   1     },
-  /* STDC89   */  { 0,  0,  0,   0,  1,   0,   0     },
-  /* STDC94   */  { 0,  0,  0,   0,  1,   0,   1     },
-  /* STDC99   */  { 1,  0,  1,   0,  1,   1,   1     },
-  /* GNUCXX   */  { 0,  1,  1,   0,  0,   1,   1     },
-  /* CXX98    */  { 0,  1,  1,   0,  1,   1,   1     },
-  /* GNUCXX0X */  { 1,  1,  1,   0,  0,   1,   1     },
-  /* CXX0X    */  { 1,  1,  1,   0,  1,   1,   1     },
-  /* ASM      */  { 0,  0,  1,   0,  0,   1,   0     }
+{ /*              c99 c++ xnum xid std  //   digr ulit */
+  /* GNUC89   */  { 0,  0,  1,   0,  0,   1,   1,   0 },
+  /* GNUC99   */  { 1,  0,  1,   0,  0,   1,   1,   1 },
+  /* STDC89   */  { 0,  0,  0,   0,  1,   0,   0,   0 },
+  /* STDC94   */  { 0,  0,  0,   0,  1,   0,   1,   0 },
+  /* STDC99   */  { 1,  0,  1,   0,  1,   1,   1,   0 },
+  /* GNUCXX   */  { 0,  1,  1,   0,  0,   1,   1,   0 },
+  /* CXX98    */  { 0,  1,  1,   0,  1,   1,   1,   0 },
+  /* GNUCXX0X */  { 1,  1,  1,   0,  0,   1,   1,   1 },
+  /* CXX0X    */  { 1,  1,  1,   0,  1,   1,   1,   1 },
+  /* ASM      */  { 0,  0,  1,   0,  0,   1,   0,   0 }
   /* xid should be 1 for GNUC99, STDC99, GNUCXX, CXX98, GNUCXX0X, and
      CXX0X when no longer experimental (when all uses of identifiers
      in the compiler have been audited for correct handling of
@@ -112,6 +113,7 @@ cpp_set_lang (cpp_reader *pfile, enum c_
   CPP_OPTION (pfile, trigraphs)			 = l->std;
   CPP_OPTION (pfile, cplusplus_comments)	 = l->cplusplus_comments;
   CPP_OPTION (pfile, digraphs)			 = l->digraphs;
+  CPP_OPTION (pfile, uliterals)			 = l->uliterals;
 }
 
 /* Initialize library global state.  */
Index: libcpp/expr.c
===================================================================
--- libcpp/expr.c	(revision 134262)
+++ libcpp/expr.c	(working copy)
@@ -705,6 +705,8 @@ eval_token (cpp_reader *pfile, const cpp
 
     case CPP_WCHAR:
     case CPP_CHAR:
+    case CPP_CHAR16:
+    case CPP_CHAR32:
       {
 	cppchar_t cc = cpp_interpret_charconst (pfile, token,
 						&temp, &unsignedp);
@@ -863,6 +865,8 @@ _cpp_parse_expr (cpp_reader *pfile)
 	case CPP_NUMBER:
 	case CPP_CHAR:
 	case CPP_WCHAR:
+	case CPP_CHAR16:
+	case CPP_CHAR32:
 	case CPP_NAME:
 	case CPP_HASH:
 	  if (!want_value)
Index: libcpp/internal.h
===================================================================
--- libcpp/internal.h	(revision 134262)
+++ libcpp/internal.h	(working copy)
@@ -48,6 +48,7 @@ struct cset_converter
 {
   convert_f func;
   iconv_t cd;
+  int width;
 };
 
 #define BITS_PER_CPPCHAR_T (CHAR_BIT * sizeof (cppchar_t))
@@ -399,6 +400,14 @@ struct cpp_reader
   struct cset_converter narrow_cset_desc;
 
   /* Descriptor for converting from the source character set to the
+     UTF-16 execution character set.  */
+  struct cset_converter char16_cset_desc;
+
+  /* Descriptor for converting from the source character set to the
+     UTF-32 execution character set.  */
+  struct cset_converter char32_cset_desc;
+
+  /* Descriptor for converting from the source character set to the
      wide execution character set.  */
   struct cset_converter wide_cset_desc;
 
Index: libcpp/lex.c
===================================================================
--- libcpp/lex.c	(revision 134262)
+++ libcpp/lex.c	(working copy)
@@ -39,10 +39,10 @@ struct token_spelling
 };
 
 static const unsigned char *const digraph_spellings[] =
-{ U"%:", U"%:%:", U"<:", U":>", U"<%", U"%>" };
+{ UC"%:", UC"%:%:", UC"<:", UC":>", UC"<%", UC"%>" };
 
-#define OP(e, s) { SPELL_OPERATOR, U s  },
-#define TK(e, s) { SPELL_ ## s,    U #e },
+#define OP(e, s) { SPELL_OPERATOR, UC s  },
+#define TK(e, s) { SPELL_ ## s,    UC #e },
 static const struct token_spelling token_spellings[N_TTYPES] = { TTYPE_TABLE };
 #undef OP
 #undef TK
@@ -611,8 +611,8 @@ create_literal (cpp_reader *pfile, cpp_t
 
 /* Lexes a string, character constant, or angle-bracketed header file
    name.  The stored string contains the spelling, including opening
-   quote and leading any leading 'L'.  It returns the type of the
-   literal, or CPP_OTHER if it was not properly terminated.
+   quote and any leading 'L', 'u' or 'U'.  It returns the type of the
+   literal, or CPP_OTHER if it was not properly terminated.
 
    The spelling is NUL-terminated, but it is not guaranteed that this
    is the first NUL since embedded NULs are preserved.  */
@@ -626,12 +626,16 @@ lex_string (cpp_reader *pfile, cpp_token
 
   cur = base;
   terminator = *cur++;
-  if (terminator == 'L')
+  if (terminator == 'L' || terminator == 'u' || terminator == 'U')
     terminator = *cur++;
   if (terminator == '\"')
-    type = *base == 'L' ? CPP_WSTRING: CPP_STRING;
+    type = (*base == 'L' ? CPP_WSTRING :
+	    *base == 'U' ? CPP_STRING32 :
+	    *base == 'u' ? CPP_STRING16 : CPP_STRING);
   else if (terminator == '\'')
-    type = *base == 'L' ? CPP_WCHAR: CPP_CHAR;
+    type = (*base == 'L' ? CPP_WCHAR :
+	    *base == 'U' ? CPP_CHAR32 :
+	    *base == 'u' ? CPP_CHAR16 : CPP_CHAR);
   else
     terminator = '>', type = CPP_HEADER_NAME;
 
@@ -965,11 +969,16 @@ _cpp_lex_direct (cpp_reader *pfile)
       }
 
     case 'L':
-      /* 'L' may introduce wide characters or strings.  */
-      if (*buffer->cur == '\'' || *buffer->cur == '"')
+    case 'u':
+    case 'U':
+      /* 'L', 'u' or 'U' may introduce wide characters or strings.  */
+      if (c == 'L' || CPP_OPTION (pfile, uliterals))
 	{
-	  lex_string (pfile, result, buffer->cur - 1);
-	  break;
+	  if (*buffer->cur == '\'' || *buffer->cur == '"')
+	    {
+	      lex_string (pfile, result, buffer->cur - 1);
+	      break;
+	    }
 	}
       /* Fall through.  */
 
@@ -977,12 +986,12 @@ _cpp_lex_direct (cpp_reader *pfile)
     case 'a': case 'b': case 'c': case 'd': case 'e': case 'f':
     case 'g': case 'h': case 'i': case 'j': case 'k': case 'l':
     case 'm': case 'n': case 'o': case 'p': case 'q': case 'r':
-    case 's': case 't': case 'u': case 'v': case 'w': case 'x':
+    case 's': case 't':           case 'v': case 'w': case 'x':
     case 'y': case 'z':
     case 'A': case 'B': case 'C': case 'D': case 'E': case 'F':
     case 'G': case 'H': case 'I': case 'J': case 'K':
     case 'M': case 'N': case 'O': case 'P': case 'Q': case 'R':
-    case 'S': case 'T': case 'U': case 'V': case 'W': case 'X':
+    case 'S': case 'T':           case 'V': case 'W': case 'X':
     case 'Y': case 'Z':
       result->type = CPP_NAME;
       {
Index: libcpp/charset.c
===================================================================
--- libcpp/charset.c	(revision 134262)
+++ libcpp/charset.c	(working copy)
@@ -642,6 +642,7 @@ init_iconv_desc (cpp_reader *pfile, cons
     {
       ret.func = convert_no_conversion;
       ret.cd = (iconv_t) -1;
+      ret.width = -1;
       return ret;
     }
 
@@ -655,6 +656,7 @@ init_iconv_desc (cpp_reader *pfile, cons
       {
 	ret.func = conversion_tab[i].func;
 	ret.cd = conversion_tab[i].fake_cd;
+	ret.width = -1;
 	return ret;
       }
 
@@ -663,6 +665,7 @@ init_iconv_desc (cpp_reader *pfile, cons
     {
       ret.func = convert_using_iconv;
       ret.cd = iconv_open (to, from);
+      ret.width = -1;
 
       if (ret.cd == (iconv_t) -1)
 	{
@@ -683,6 +686,7 @@ init_iconv_desc (cpp_reader *pfile, cons
 		 from, to);
       ret.func = convert_no_conversion;
       ret.cd = (iconv_t) -1;
+      ret.width = -1;
     }
   return ret;
 }
@@ -716,7 +720,17 @@ cpp_init_iconv (cpp_reader *pfile)
     wcset = default_wcset;
 
   pfile->narrow_cset_desc = init_iconv_desc (pfile, ncset, SOURCE_CHARSET);
+  pfile->narrow_cset_desc.width = CPP_OPTION (pfile, char_precision);
+  pfile->char16_cset_desc = init_iconv_desc (pfile,
+					     be ? "UTF-16BE" : "UTF-16LE",
+					     SOURCE_CHARSET);
+  pfile->char16_cset_desc.width = 16;
+  pfile->char32_cset_desc = init_iconv_desc (pfile,
+					     be ? "UTF-32BE" : "UTF-32LE",
+					     SOURCE_CHARSET);
+  pfile->char32_cset_desc.width = 32;
   pfile->wide_cset_desc = init_iconv_desc (pfile, wcset, SOURCE_CHARSET);
+  pfile->wide_cset_desc.width = CPP_OPTION (pfile, wchar_precision);
 }
 
 /* Destroy iconv(3) descriptors set up by cpp_init_iconv, if necessary.  */
@@ -1051,15 +1065,13 @@ _cpp_valid_ucn (cpp_reader *pfile, const
    An advanced pointer is returned.  Issues all relevant diagnostics.  */
 static const uchar *
 convert_ucn (cpp_reader *pfile, const uchar *from, const uchar *limit,
-	     struct _cpp_strbuf *tbuf, bool wide)
+	     struct _cpp_strbuf *tbuf, struct cset_converter cvt)
 {
   cppchar_t ucn;
   uchar buf[6];
   uchar *bufp = buf;
   size_t bytesleft = 6;
   int rval;
-  struct cset_converter cvt
-    = wide ? pfile->wide_cset_desc : pfile->narrow_cset_desc;
   struct normalize_state nst = INITIAL_NORMALIZE_STATE;
 
   from++;  /* Skip u/U.  */
@@ -1086,14 +1098,15 @@ convert_ucn (cpp_reader *pfile, const uc
    function issues no diagnostics and never fails.  */
 static void
 emit_numeric_escape (cpp_reader *pfile, cppchar_t n,
-		     struct _cpp_strbuf *tbuf, bool wide)
+		     struct _cpp_strbuf *tbuf, struct cset_converter cvt)
 {
-  if (wide)
+  size_t width = cvt.width;
+
+  if (width != CPP_OPTION (pfile, char_precision))
     {
       /* We have to render this into the target byte order, which may not
 	 be our byte order.  */
       bool bigend = CPP_OPTION (pfile, bytes_big_endian);
-      size_t width = CPP_OPTION (pfile, wchar_precision);
       size_t cwidth = CPP_OPTION (pfile, char_precision);
       size_t cmask = width_to_mask (cwidth);
       size_t nbwc = width / cwidth;
@@ -1136,12 +1149,11 @@ emit_numeric_escape (cpp_reader *pfile, 
    number.  You can, e.g. generate surrogate pairs this way.  */
 static const uchar *
 convert_hex (cpp_reader *pfile, const uchar *from, const uchar *limit,
-	     struct _cpp_strbuf *tbuf, bool wide)
+	     struct _cpp_strbuf *tbuf, struct cset_converter cvt)
 {
   cppchar_t c, n = 0, overflow = 0;
   int digits_found = 0;
-  size_t width = (wide ? CPP_OPTION (pfile, wchar_precision)
-		  : CPP_OPTION (pfile, char_precision));
+  size_t width = cvt.width;
   size_t mask = width_to_mask (width);
 
   if (CPP_WTRADITIONAL (pfile))
@@ -1174,7 +1186,7 @@ convert_hex (cpp_reader *pfile, const uc
       n &= mask;
     }
 
-  emit_numeric_escape (pfile, n, tbuf, wide);
+  emit_numeric_escape (pfile, n, tbuf, cvt);
 
   return from;
 }
@@ -1187,12 +1199,11 @@ convert_hex (cpp_reader *pfile, const uc
    number.  */
 static const uchar *
 convert_oct (cpp_reader *pfile, const uchar *from, const uchar *limit,
-	     struct _cpp_strbuf *tbuf, bool wide)
+	     struct _cpp_strbuf *tbuf, struct cset_converter cvt)
 {
   size_t count = 0;
   cppchar_t c, n = 0;
-  size_t width = (wide ? CPP_OPTION (pfile, wchar_precision)
-		  : CPP_OPTION (pfile, char_precision));
+  size_t width = cvt.width;
   size_t mask = width_to_mask (width);
   bool overflow = false;
 
@@ -1213,7 +1224,7 @@ convert_oct (cpp_reader *pfile, const uc
       n &= mask;
     }
 
-  emit_numeric_escape (pfile, n, tbuf, wide);
+  emit_numeric_escape (pfile, n, tbuf, cvt);
 
   return from;
 }
@@ -1224,7 +1235,7 @@ convert_oct (cpp_reader *pfile, const uc
    pointer.  Handles all relevant diagnostics.  */
 static const uchar *
 convert_escape (cpp_reader *pfile, const uchar *from, const uchar *limit,
-		struct _cpp_strbuf *tbuf, bool wide)
+		struct _cpp_strbuf *tbuf, struct cset_converter cvt)
 {
   /* Values of \a \b \e \f \n \r \t \v respectively.  */
 #if HOST_CHARSET == HOST_CHARSET_ASCII
@@ -1236,23 +1247,21 @@ convert_escape (cpp_reader *pfile, const
 #endif
 
   uchar c;
-  struct cset_converter cvt
-    = wide ? pfile->wide_cset_desc : pfile->narrow_cset_desc;
 
   c = *from;
   switch (c)
     {
       /* UCNs, hex escapes, and octal escapes are processed separately.  */
     case 'u': case 'U':
-      return convert_ucn (pfile, from, limit, tbuf, wide);
+      return convert_ucn (pfile, from, limit, tbuf, cvt);
 
     case 'x':
-      return convert_hex (pfile, from, limit, tbuf, wide);
+      return convert_hex (pfile, from, limit, tbuf, cvt);
       break;
 
     case '0':  case '1':  case '2':  case '3':
     case '4':  case '5':  case '6':  case '7':
-      return convert_oct (pfile, from, limit, tbuf, wide);
+      return convert_oct (pfile, from, limit, tbuf, cvt);
 
       /* Various letter escapes.  Get the appropriate host-charset
 	 value into C.  */
@@ -1312,6 +1321,27 @@ convert_escape (cpp_reader *pfile, const
   return from + 1;
 }
 
+/* TYPE is a token type.  The return value is the conversion needed to
+   convert from source to execution character set for the given type.  */
+static struct cset_converter
+converter_for_type (cpp_reader *pfile, enum cpp_ttype type)
+{
+  switch (type)
+    {
+    default:
+	return pfile->narrow_cset_desc;
+    case CPP_CHAR16:
+    case CPP_STRING16:
+	return pfile->char16_cset_desc;
+    case CPP_CHAR32:
+    case CPP_STRING32:
+	return pfile->char32_cset_desc;
+    case CPP_WCHAR:
+    case CPP_WSTRING:
+	return pfile->wide_cset_desc;
+    }
+}
+
 /* FROM is an array of cpp_string structures of length COUNT.  These
    are to be converted from the source to the execution character set,
    escape sequences translated, and finally all are to be
@@ -1320,13 +1350,12 @@ convert_escape (cpp_reader *pfile, const
    false for failure.  */
 bool
 cpp_interpret_string (cpp_reader *pfile, const cpp_string *from, size_t count,
-		      cpp_string *to, bool wide)
+		      cpp_string *to, enum cpp_ttype type)
 {
   struct _cpp_strbuf tbuf;
   const uchar *p, *base, *limit;
   size_t i;
-  struct cset_converter cvt
-    = wide ? pfile->wide_cset_desc : pfile->narrow_cset_desc;
+  struct cset_converter cvt = converter_for_type (pfile, type);
 
   tbuf.asize = MAX (OUTBUF_BLOCK_SIZE, from->len);
   tbuf.text = XNEWVEC (uchar, tbuf.asize);
@@ -1335,7 +1364,7 @@ cpp_interpret_string (cpp_reader *pfile,
   for (i = 0; i < count; i++)
     {
       p = from[i].text;
-      if (*p == 'L') p++;
+      if (*p == 'L' || *p == 'u' || *p == 'U') p++;
       p++; /* Skip leading quote.  */
       limit = from[i].text + from[i].len - 1; /* Skip trailing quote.  */
 
@@ -1354,12 +1383,12 @@ cpp_interpret_string (cpp_reader *pfile,
 	  if (p == limit)
 	    break;
 
-	  p = convert_escape (pfile, p + 1, limit, &tbuf, wide);
+	  p = convert_escape (pfile, p + 1, limit, &tbuf, cvt);
 	}
     }
   /* NUL-terminate the 'to' buffer and translate it to a cpp_string
      structure.  */
-  emit_numeric_escape (pfile, 0, &tbuf, wide);
+  emit_numeric_escape (pfile, 0, &tbuf, cvt);
   tbuf.text = XRESIZEVEC (uchar, tbuf.text, tbuf.len);
   to->text = tbuf.text;
   to->len = tbuf.len;
@@ -1375,7 +1404,8 @@ cpp_interpret_string (cpp_reader *pfile,
    in a string, but do not perform character set conversion.  */
 bool
 cpp_interpret_string_notranslate (cpp_reader *pfile, const cpp_string *from,
-				  size_t count,	cpp_string *to, bool wide)
+				  size_t count,	cpp_string *to,
+				  enum cpp_ttype type ATTRIBUTE_UNUSED)
 {
   struct cset_converter save_narrow_cset_desc = pfile->narrow_cset_desc;
   bool retval;
@@ -1383,7 +1413,7 @@ cpp_interpret_string_notranslate (cpp_re
   pfile->narrow_cset_desc.func = convert_no_conversion;
   pfile->narrow_cset_desc.cd = (iconv_t) -1;
 
-  retval = cpp_interpret_string (pfile, from, count, to, wide);
+  retval = cpp_interpret_string (pfile, from, count, to, CPP_STRING);
 
   pfile->narrow_cset_desc = save_narrow_cset_desc;
   return retval;
@@ -1462,13 +1492,14 @@ narrow_str_to_charconst (cpp_reader *pfi
 /* Subroutine of cpp_interpret_charconst which performs the conversion
    to a number, for wide strings.  STR is the string structure returned
    by cpp_interpret_string.  PCHARS_SEEN and UNSIGNEDP are as for
-   cpp_interpret_charconst.  */
+   cpp_interpret_charconst.  TYPE is the token type.  */
 static cppchar_t
 wide_str_to_charconst (cpp_reader *pfile, cpp_string str,
-		       unsigned int *pchars_seen, int *unsignedp)
+		       unsigned int *pchars_seen, int *unsignedp,
+		       enum cpp_ttype type)
 {
   bool bigend = CPP_OPTION (pfile, bytes_big_endian);
-  size_t width = CPP_OPTION (pfile, wchar_precision);
+  size_t width = converter_for_type (pfile, type).width;
   size_t cwidth = CPP_OPTION (pfile, char_precision);
   size_t mask = width_to_mask (width);
   size_t cmask = width_to_mask (cwidth);
@@ -1490,7 +1521,7 @@ wide_str_to_charconst (cpp_reader *pfile
   /* Wide character constants have type wchar_t, and a single
      character exactly fills a wchar_t, so a multi-character wide
      character constant is guaranteed to overflow.  */
-  if (off > 0)
+  if (str.len > nbwc * 2)
     cpp_error (pfile, CPP_DL_WARNING,
 	       "character constant too long for its type");
 
@@ -1498,13 +1529,20 @@ wide_str_to_charconst (cpp_reader *pfile
      sign- or zero-extend to the full width of cppchar_t.  */
   if (width < BITS_PER_CPPCHAR_T)
     {
-      if (CPP_OPTION (pfile, unsigned_wchar) || !(result & (1 << (width - 1))))
+      if (type == CPP_CHAR16 || type == CPP_CHAR32
+	  || CPP_OPTION (pfile, unsigned_wchar)
+	  || !(result & (1 << (width - 1))))
 	result &= mask;
       else
 	result |= ~mask;
     }
 
-  *unsignedp = CPP_OPTION (pfile, unsigned_wchar);
+  if (type == CPP_CHAR16 || type == CPP_CHAR32
+      || CPP_OPTION (pfile, unsigned_wchar))
+    *unsignedp = 1;
+  else
+    *unsignedp = 0;
+
   *pchars_seen = 1;
   return result;
 }
@@ -1518,20 +1556,21 @@ cpp_interpret_charconst (cpp_reader *pfi
 			 unsigned int *pchars_seen, int *unsignedp)
 {
   cpp_string str = { 0, 0 };
-  bool wide = (token->type == CPP_WCHAR);
+  bool wide = (token->type != CPP_CHAR);
   cppchar_t result;
 
-  /* an empty constant will appear as L'' or '' */
+  /* an empty constant will appear as L'', u'', U'' or '' */
   if (token->val.str.len == (size_t) (2 + wide))
     {
       cpp_error (pfile, CPP_DL_ERROR, "empty character constant");
       return 0;
     }
-  else if (!cpp_interpret_string (pfile, &token->val.str, 1, &str, wide))
+  else if (!cpp_interpret_string (pfile, &token->val.str, 1, &str, token->type))
     return 0;
 
   if (wide)
-    result = wide_str_to_charconst (pfile, str, pchars_seen, unsignedp);
+    result = wide_str_to_charconst (pfile, str, pchars_seen, unsignedp,
+				    token->type);
   else
     result = narrow_str_to_charconst (pfile, str, pchars_seen, unsignedp);
 
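
For reviewers who want to try the prefix handling outside of libcpp, here
is a minimal standalone sketch of the classification that the lex_string and
_cpp_lex_direct hunks above perform together.  It is not part of the patch.
The enum lit_type, its LIT_* values and the function classify_literal are
hypothetical names invented for this illustration; only the decision logic
('L' is always a prefix, 'u' and 'U' only when the uliterals flag is set) is
taken from the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the cpplib token types touched by the
   patch (CPP_CHAR, CPP_WCHAR, CPP_CHAR16, ...).  */
enum lit_type
{
  LIT_CHAR, LIT_WCHAR, LIT_CHAR16, LIT_CHAR32,
  LIT_STRING, LIT_WSTRING, LIT_STRING16, LIT_STRING32,
  LIT_NOT_A_LITERAL
};

/* Classify SPELLING, which starts at the optional prefix character.
   ULITERALS mirrors the new cpp_options flag: when it is zero, 'u'
   and 'U' are not treated as literal prefixes, so the text would be
   lexed as an identifier instead.  */
static enum lit_type
classify_literal (const char *spelling, int uliterals)
{
  char prefix = 0;
  char quote;

  assert (spelling != NULL);

  quote = spelling[0];
  if (quote == 'L' || (uliterals && (quote == 'u' || quote == 'U')))
    {
      prefix = quote;
      quote = spelling[1];
    }

  if (quote == '"')
    return (prefix == 'L' ? LIT_WSTRING
	    : prefix == 'U' ? LIT_STRING32
	    : prefix == 'u' ? LIT_STRING16 : LIT_STRING);
  if (quote == '\'')
    return (prefix == 'L' ? LIT_WCHAR
	    : prefix == 'U' ? LIT_CHAR32
	    : prefix == 'u' ? LIT_CHAR16 : LIT_CHAR);
  return LIT_NOT_A_LITERAL;
}
```

With the flag clear, classify_literal ("u\"abc\"", 0) reports that the text
is not a literal, which corresponds to the fall-through to CPP_NAME in
_cpp_lex_direct for the language modes whose ulit column is 0 in
lang_defaults.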

