This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH 2/9]: C++ P0482R5 char8_t: Core language support


On 12/5/18 11:16 AM, Jason Merrill wrote:
On 12/5/18 2:09 AM, Tom Honermann wrote:
On 12/3/18 5:01 PM, Jason Merrill wrote:
On 12/3/18 4:51 PM, Jason Merrill wrote:
On 11/5/18 2:39 PM, Tom Honermann wrote:
This patch adds support for the P0482R5 core language changes. This includes:
- The -fchar8_t and -fno_char8_t command line options.
- char8_t as a keyword.
- The char8_t builtin type as a non-aliasing unsigned integral
   character type of size 1.
- Use of char8_t as a simple type specifier.
- u8 character literals with type char8_t.
- u8 string literals with type array of const char8_t.
- User defined literal operators that accept char8_1 and char8_t pointer
   types.
- New __cpp_char8_t predefined feature test macro.
- New __CHAR8_TYPE__ and __GCC_ATOMIC_CHAR8_T_LOCK_FREE predefined
   macros .
- Name mangling and demangling for char8_t (using Du).

gcc/ChangeLog:

2018-11-04  Tom Honermann  <tom@honermann.net>

      * defaults.h: Define CHAR8_TYPE.

gcc/c-family/ChangeLog:

2018-11-04  Tom Honermann  <tom@honermann.net>
      * c-family/c-common.c (c_common_reswords): Add char8_t.
      (fix_string_type): Use char8_t for the type of u8 string literals.
      (c_common_get_alias_set): char8_t doesn't alias.
      (c_common_nodes_and_builtins): Define char8_t as a builtin type in
      C++.
      (c_stddef_cpp_builtins): Add __CHAR8_TYPE__.
      (keyword_begins_type_specifier): Add RID_CHAR8.
      * gcc/c-family/c-common.h (rid): Add RID_CHAR8.
      (c_tree_index): Add CTI_CHAR8_TYPE and CTI_CHAR8_ARRAY_TYPE.
      Define D_CXX_CHAR8_T and D_CXX_CHAR8_T_FLAGS.
      Define char8_type_node and char8_array_type_node.
      * c-family/c-cppbuiltin.c (cpp_atomic_builtins): Predefine
      __GCC_ATOMIC_CHAR8_T_LOCK_FREE.
      (c_cpp_builtins): Predefine __cpp_char8_t.
      * c-family/c-lex.c (lex_string): Use char8_array_type_node as the
      type of CPP_UTF8STRING.
      (lex_charconst): Use char8_type_node as the type of CPP_UTF8CHAR.
      * c-family/c.opt: Add the -fchar8_t command line option.

gcc/c/ChangeLog:

2018-11-04  Tom Honermann  <tom@honermann.net>

      * c/c-typeck.c (char_type_p): Add char8_type_node.
      (digest_init): Handle initialization by a u8 string literal of
      char8_t type.

gcc/cp/ChangeLog:

2018-11-04  Tom Honermann  <tom@honermann.net>

      * cp/cvt.c (type_promotes_to): Handle char8_t promotion.
      * cp/decl.c (grokdeclarator): Handle invalid type specifier
      combinations involving char8_t.
      * cp/lex.c (init_reswords): Add char8_t as a reserved word.
      * cp/mangle.c (write_builtin_type): Add name mangling for char8_t
      (Du).
      * cp/parser.c (cp_keyword_starts_decl_specifier_p,
      cp_parser_simple_type_specifier): Recognize char8_t as a simple
      type specifier.
      (cp_parser_string_literal): Use char8_array_type_node for the type
      of CPP_UTF8STRING.
      (cp_parser_set_decl_spec_type): Tolerate char8_t typedefs in system
      headers.
      * cp/rtti.c (emit_support_tinfos): type_info support for char8_t.       * cp/tree.c (char_type_p): Recognize char8_t as a character type.
      * cp/typeck.c (string_conv_p): Handle conversions of u8 string
      literals of char8_t type.
      (check_literal_operator_args): Handle UDLs with u8 string literals
      of char8_t type.
      * cp/typeck2.c (digest_init_r): Disallow initializing a char array
      with a u8 string literal.

libiberty/ChangeLog:

2018-10-31  Tom Honermann  <tom@honermann.net>
      * cp-demangle.c (cplus_demangle_builtin_types,
      cplus_demangle_type): Add name demangling for char8_t (Du).
      * cp-demangle.h: Increase D_BUILTIN_TYPE_COUNT to accommodate the
      new char8_t type.

@@ -3543,6 +3556,10 @@ c_common_get_alias_set (tree t)
   if (!TYPE_P (t))
     return -1;

+  /* Unlike char, char8_t doesn't alias. */
+  if (flag_char8_t && t == char8_type_node)
+    return -1;

This seems unnecessary; doesn't the existing code have the same effect? I think we could do with just an adjustment to the existing comment.
I'm not sure.  I had concerns about unintended matching due to char8_t having an underlying type of unsigned char.

That shouldn't be a problem: if char8_t is a distinct type, it won't match unsigned char, and if it's the same as unsigned char, flag_char8_t will be false.

+  else if (flag_char8_t && TREE_TYPE (value) == char8_array_type_node)
+      || (flag_char8_t && type == char8_type_node)
+      bool char8_array = (flag_char8_t && !!comptypes (typ1, char8_type_node));
+       || (flag_char8_t && type == char8_type_node
In many places you check the flag and then for one of the char8 types. Since the types won't be used without the flag, checking the flag seems redundant?

This was again protection against unintended matching of the underlying unsigned char type, particularly when compiling as C. char8_type_node is constructed (in c_common_nodes_and_builtins) following the pattern in place for char16_t and char32_t with the following code:

+  char8_type_node = get_identifier (CHAR8_TYPE);
+  char8_type_node = TREE_TYPE (identifier_global_value (char8_type_node));
+  char8_type_size = TYPE_PRECISION (char8_type_node);
+  if (c_dialect_cxx ())
+    {
+      char8_type_node = make_unsigned_type (char8_type_size);
+
+      if (flag_char8_t)
+        record_builtin_type (RID_CHAR8, "char8_t", char8_type_node);
+    }

My knowledge of gcc internals is weak, but I understand this to be, effectively, defining a type alias (of unsigned char) and then, if compiling as C++, re-defining it as a strong unsigned type.

I don't recall the details now, but at one point, I was missing a check for flag_char8_t in some location and I encountered some test failures as a result.

Since char8_type_node is always a distinct type in C++, we shouldn't need these checks in the C++ front end.  And since it's never a distinct type in C, the C front end (c/*) doesn't need to change at all.

-      if (TYPE_PRECISION (typ1) == BITS_PER_UNIT)
+      if (TYPE_PRECISION (typ1) == BITS_PER_UNIT
+          && (typ1 == char_type_node || !flag_char8_t))

This looks wrong, or at least incomplete; we want to complain about mismatched types here even with -fchar8-t.  Perhaps we should replace all of this if/else with simply comparing typ1 and char_type, and complaining if they're different.  Talking about wide and non-wide isn't as useful as the actual types would be.

Well, I suppose it isn't quite that simple, since we still need to treat the ordinary character types as interchangeable.

We do need to complain about mismatched types and test gcc/testsuite/g++.dg/ext/char8_t-init-2.C was added to ensure that happens.

You don't seem to test initializing an array of signed/unsigned char, which I think will be broken by the change only considering char_type_node.

I think we want to specifically allow conversion from array of one ordinary character type to another, and otherwise complain about the types being different with a message like "cannot initialize array of %qT from array of %qT" rather than mess with terms like int-array and (non-)wide string.

Ping. Will you have a chance to update the patch soon, or would you like me to make these last changes myself?

Jason


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]