[PATCH 2/9]: C++ P0482R5 char8_t: Core language support

Tom Honermann tom@honermann.net
Mon Dec 17 21:47:00 GMT 2018


On 12/17/18 4:02 PM, Jason Merrill wrote:
> On 12/5/18 11:16 AM, Jason Merrill wrote:
>> On 12/5/18 2:09 AM, Tom Honermann wrote:
>>> On 12/3/18 5:01 PM, Jason Merrill wrote:
>>>> On 12/3/18 4:51 PM, Jason Merrill wrote:
>>>>> On 11/5/18 2:39 PM, Tom Honermann wrote:
>>>>>> This patch adds support for the P0482R5 core language changes.  
>>>>>> This includes:
>>>>>> - The -fchar8_t and -fno_char8_t command line options.
>>>>>> - char8_t as a keyword.
>>>>>> - The char8_t builtin type as a non-aliasing unsigned integral
>>>>>>    character type of size 1.
>>>>>> - Use of char8_t as a simple type specifier.
>>>>>> - u8 character literals with type char8_t.
>>>>>> - u8 string literals with type array of const char8_t.
>>>>>> - User defined literal operators that accept char8_1 and char8_t 
>>>>>> pointer
>>>>>>    types.
>>>>>> - New __cpp_char8_t predefined feature test macro.
>>>>>> - New __CHAR8_TYPE__ and __GCC_ATOMIC_CHAR8_T_LOCK_FREE predefined
>>>>>>    macros .
>>>>>> - Name mangling and demangling for char8_t (using Du).
>>>>>>
>>>>>> gcc/ChangeLog:
>>>>>>
>>>>>> 2018-11-04  Tom Honermann  <tom@honermann.net>
>>>>>>
>>>>>>       * defaults.h: Define CHAR8_TYPE.
>>>>>>
>>>>>> gcc/c-family/ChangeLog:
>>>>>>
>>>>>> 2018-11-04  Tom Honermann  <tom@honermann.net>
>>>>>>       * c-family/c-common.c (c_common_reswords): Add char8_t.
>>>>>>       (fix_string_type): Use char8_t for the type of u8 string 
>>>>>> literals.
>>>>>>       (c_common_get_alias_set): char8_t doesn't alias.
>>>>>>       (c_common_nodes_and_builtins): Define char8_t as a builtin 
>>>>>> type in
>>>>>>       C++.
>>>>>>       (c_stddef_cpp_builtins): Add __CHAR8_TYPE__.
>>>>>>       (keyword_begins_type_specifier): Add RID_CHAR8.
>>>>>>       * gcc/c-family/c-common.h (rid): Add RID_CHAR8.
>>>>>>       (c_tree_index): Add CTI_CHAR8_TYPE and CTI_CHAR8_ARRAY_TYPE.
>>>>>>       Define D_CXX_CHAR8_T and D_CXX_CHAR8_T_FLAGS.
>>>>>>       Define char8_type_node and char8_array_type_node.
>>>>>>       * c-family/c-cppbuiltin.c (cpp_atomic_builtins): Predefine
>>>>>>       __GCC_ATOMIC_CHAR8_T_LOCK_FREE.
>>>>>>       (c_cpp_builtins): Predefine __cpp_char8_t.
>>>>>>       * c-family/c-lex.c (lex_string): Use char8_array_type_node 
>>>>>> as the
>>>>>>       type of CPP_UTF8STRING.
>>>>>>       (lex_charconst): Use char8_type_node as the type of 
>>>>>> CPP_UTF8CHAR.
>>>>>>       * c-family/c.opt: Add the -fchar8_t command line option.
>>>>>>
>>>>>> gcc/c/ChangeLog:
>>>>>>
>>>>>> 2018-11-04  Tom Honermann  <tom@honermann.net>
>>>>>>
>>>>>>       * c/c-typeck.c (char_type_p): Add char8_type_node.
>>>>>>       (digest_init): Handle initialization by a u8 string literal of
>>>>>>       char8_t type.
>>>>>>
>>>>>> gcc/cp/ChangeLog:
>>>>>>
>>>>>> 2018-11-04  Tom Honermann  <tom@honermann.net>
>>>>>>
>>>>>>       * cp/cvt.c (type_promotes_to): Handle char8_t promotion.
>>>>>>       * cp/decl.c (grokdeclarator): Handle invalid type specifier
>>>>>>       combinations involving char8_t.
>>>>>>       * cp/lex.c (init_reswords): Add char8_t as a reserved word.
>>>>>>       * cp/mangle.c (write_builtin_type): Add name mangling for 
>>>>>> char8_t
>>>>>>       (Du).
>>>>>>       * cp/parser.c (cp_keyword_starts_decl_specifier_p,
>>>>>>       cp_parser_simple_type_specifier): Recognize char8_t as a 
>>>>>> simple
>>>>>>       type specifier.
>>>>>>       (cp_parser_string_literal): Use char8_array_type_node for 
>>>>>> the type
>>>>>>       of CPP_UTF8STRING.
>>>>>>       (cp_parser_set_decl_spec_type): Tolerate char8_t typedefs 
>>>>>> in system
>>>>>>       headers.
>>>>>>       * cp/rtti.c (emit_support_tinfos): type_info support for 
>>>>>> char8_t.
>>>>>>       * cp/tree.c (char_type_p): Recognize char8_t as a character 
>>>>>> type.
>>>>>>       * cp/typeck.c (string_conv_p): Handle conversions of u8 string
>>>>>>       literals of char8_t type.
>>>>>>       (check_literal_operator_args): Handle UDLs with u8 string 
>>>>>> literals
>>>>>>       of char8_t type.
>>>>>>       * cp/typeck2.c (digest_init_r): Disallow initializing a 
>>>>>> char array
>>>>>>       with a u8 string literal.
>>>>>>
>>>>>> libiberty/ChangeLog:
>>>>>>
>>>>>> 2018-10-31  Tom Honermann  <tom@honermann.net>
>>>>>>       * cp-demangle.c (cplus_demangle_builtin_types,
>>>>>>       cplus_demangle_type): Add name demangling for char8_t (Du).
>>>>>>       * cp-demangle.h: Increase D_BUILTIN_TYPE_COUNT to 
>>>>>> accommodate the
>>>>>>       new char8_t type.
>>>>>
>>>>>> @@ -3543,6 +3556,10 @@ c_common_get_alias_set (tree t)
>>>>>>    if (!TYPE_P (t))
>>>>>>      return -1;
>>>>>
>>>>>> +  /* Unlike char, char8_t doesn't alias. */
>>>>>> +  if (flag_char8_t && t == char8_type_node)
>>>>>> +    return -1;
>>>>>
>>>>> This seems unnecessary; doesn't the existing code have the same 
>>>>> effect? I think we could do with just an adjustment to the 
>>>>> existing comment.
>>> I'm not sure.  I had concerns about unintended matching due to 
>>> char8_t having an underlying type of unsigned char.
>>
>> That shouldn't be a problem: if char8_t is a distinct type, it won't 
>> match unsigned char, and if it's the same as unsigned char, 
>> flag_char8_t will be false.
>>
>>>>>> +  else if (flag_char8_t && TREE_TYPE (value) == 
>>>>>> char8_array_type_node)
>>>>>> +      || (flag_char8_t && type == char8_type_node)
>>>>>> +      bool char8_array = (flag_char8_t && !!comptypes (typ1, 
>>>>>> char8_type_node));
>>>>>> +       || (flag_char8_t && type == char8_type_node
>>>>> In many places you check the flag and then for one of the char8 
>>>>> types. Since the types won't be used without the flag, checking 
>>>>> the flag seems redundant?
>>>
>>> This was again protection against unintended matching of the 
>>> underlying unsigned char type, particularly when compiling as C. 
>>> char8_type_node is constructed (in c_common_nodes_and_builtins) 
>>> following the pattern in place for char16_t and char32_t with the 
>>> following code:
>>>
>>> +  char8_type_node = get_identifier (CHAR8_TYPE);
>>> +  char8_type_node = TREE_TYPE (identifier_global_value 
>>> (char8_type_node));
>>> +  char8_type_size = TYPE_PRECISION (char8_type_node);
>>> +  if (c_dialect_cxx ())
>>> +    {
>>> +      char8_type_node = make_unsigned_type (char8_type_size);
>>> +
>>> +      if (flag_char8_t)
>>> +        record_builtin_type (RID_CHAR8, "char8_t", char8_type_node);
>>> +    }
>>>
>>> My knowledge of gcc internals is weak, but I understand this to be, 
>>> effectively, defining a type alias (of unsigned char) and then, if 
>>> compiling as C++, re-defining it as a strong unsigned type.
>>>
>>> I don't recall the details now, but at one point, I was missing a 
>>> check for flag_char8_t in some location and I encountered some test 
>>> failures as a result.
>>
>> Since char8_type_node is always a distinct type in C++, we shouldn't 
>> need these checks in the C++ front end.  And since it's never a 
>> distinct type in C, the C front end (c/*) doesn't need to change at all.
>>
>>>>>> -      if (TYPE_PRECISION (typ1) == BITS_PER_UNIT)
>>>>>> +      if (TYPE_PRECISION (typ1) == BITS_PER_UNIT
>>>>>> +          && (typ1 == char_type_node || !flag_char8_t))
>>>>>
>>>>> This looks wrong, or at least incomplete; we want to complain 
>>>>> about mismatched types here even with -fchar8-t. Perhaps we should 
>>>>> replace all of this if/else with simply comparing typ1 and 
>>>>> char_type, and complaining if they're different.  Talking about 
>>>>> wide and non-wide isn't as useful as the actual types would be.
>>>>
>>>> Well, I suppose it isn't quite that simple, since we still need to 
>>>> treat the ordinary character types as interchangeable.
>>>
>>> We do need to complain about mismatched types and test 
>>> gcc/testsuite/g++.dg/ext/char8_t-init-2.C was added to ensure that 
>>> happens.
>>
>> You don't seem to test initializing an array of signed/unsigned char, 
>> which I think will be broken by the change only considering 
>> char_type_node.
>>
>> I think we want to specifically allow conversion from array of one 
>> ordinary character type to another, and otherwise complain about the 
>> types being different with a message like "cannot initialize array of 
>> %qT from array of %qT" rather than mess with terms like int-array and 
>> (non-)wide string.
>
> Ping.  Will you have a chance to update the patch soon, or would you 
> like me to make these last changes myself?

Hi, Jason.  Updating the patch has remained high on my todo list, but 
I've been under water elsewhere.  I've been hoping to get to it in the 
next week or so.  If you have time available to help complete changes, 
sure, that would be great.

Tom.

>
> Jason




More information about the Gcc-patches mailing list