This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH] Revision 2: utf-16 and utf-32 support in C and C++


On Fri, Apr 04, 2008 at 04:39:16PM -0400, Jason Merrill wrote:
> Kris Van Hees wrote:
>
> Looking pretty good now, just a few nits:

First off, thanks for the review.

>> +#if 0
>
> My preference would be to omit the #if 0 blocks.  If you really want to 
> leave them in, please add a comment explaining why.

You are very correct.  I'm taking out those blocks (they were temporary changes
and shouldn't have been in the patch).
>
>> +      else if (type == char16_type_node && cxx_dialect == cxx0x)
>> +	write_string ("u8char16_t");
>> +      else if (type == char32_type_node && cxx_dialect == cxx0x)
>> +	write_string ("u8char32_t");
>
> The mangling doesn't need to be conditional on C++0x mode (but see below).

I agree (and continued below as well).

>> +	  if (type != tok->type)
>> +	    {
>> +	      if (type == CPP_STRING)
>> +		type = tok->type;
>> +	      else if (tok->type != CPP_STRING)
>
> The "if" here is redundant.

Actually, no.  Note that the two tests operate on different entities (type in
the first inner if, tok->type in the second).  This logic basically ensures
that string concatenation between different token types (for string literals)
is only allowed between CPP_STRING and any other literal type.  So, when the
types between the current concatenation result and the next token is different,
the concatenation is only allowed if either the current concatenation result is
of type CPP_STRING, or if the next token is of type CPP_STRING.

I'll add a comment to the code to explain this logic.

>> +#ifndef CHAR16_TYPE
>> +#define CHAR16_TYPE "short unsigned int"
>> +#endif
>> +
>> +#ifndef CHAR32_TYPE
>> +#define CHAR32_TYPE "unsigned int"
>> +#endif
>
> I'm not sure we can always rely on these being at least 16/32 bits, but I 
> suppose any targets for which they aren't can override the macros.  So this 
> bit is OK.
>
>> -  const int wide_flag = TREE_TYPE (value) == wchar_array_type_node;
>> +  const bool wide = TREE_TYPE (value)
>> +		    && TREE_TYPE (value) != char_array_type_node;
>
> The "wide" variable seems unnecessary: it's only used once, and it would be 
> clearer just to test for char_array_type_node before the others.

Agreed.

>> +  /* Define 'char16_t'.  */
>> +  char16_type_node = get_identifier (CHAR16_TYPE);
>> +  char16_type_node = TREE_TYPE (identifier_global_value 
>> (char16_type_node));
>> +  char16_type_size = TYPE_PRECISION (char16_type_node);
>> +  if (c_dialect_cxx () && cxx_dialect == cxx0x)
>> +    {
>> +      char16_type_node = make_unsigned_type (char16_type_size);
>> +      record_builtin_type (RID_CHAR16, "char16_t", char16_type_node);
>> +    }
>
> Hmm, now I see why you made char16_t mangling conditional on C++0x mode.  
> I'm very unhappy with changing the ABI of extended characters depending on 
> C++98 or 0x mode; if char16_type_node can come up at all in C++98 mode, it 
> must have the same meaning as in C++0x mode.  I'd much rather make the type 
> creation unconditional, and only have the compiler mode affect parsing.

Agreed, that is cleaner.  And in fact it makes for easier to understand logic.
In an earlier version of the patch I ran into trouble due to a partial type
creation.  And indeed, since the compiler mode already controls whether
char16_t/char32_t is parsed as a fundamental type or not, those checks are not
really needed.  Incidentally, various other places in my patch already used
that logic rather than explicitly testing.

>> Index: gcc/testsuite/gcc.dg/utf-dflt.c
>> Index: gcc/testsuite/g++.dg/ext/utf-gnuxx98.C
>
> I don't think these tests are testing what you intended; the u/U uses are 
> well-formed whether or not the macros are expanded.

Hm, good point.  I'll fix that as well.

>> +      /* 'L', 'u' or 'U' may introduce wide characters or strings.  */
>> +      if (c == 'L'
>> +	  || (CPP_OPTION (pfile, c99)
>> +	      && (CPP_OPTION (pfile, cplusplus) || !CPP_OPTION (pfile, std))))
>
> This test seems wrong.

I do believe it is correct, though perhaps there is a better way to encode the
same effect in a less confusing way.  It is correct given the following table
of language flags (from libcpp/init.c):

static const struct lang_flags lang_defaults[] =
{ /*              c99 c++ xnum xid std  //   digr  */
  /* GNUC89   */  { 0,  0,  1,   0,  0,   1,   1     },
  /* GNUC99   */  { 1,  0,  1,   0,  0,   1,   1     },
  /* STDC89   */  { 0,  0,  0,   0,  1,   0,   0     },
  /* STDC94   */  { 0,  0,  0,   0,  1,   0,   1     },
  /* STDC99   */  { 1,  0,  1,   0,  1,   1,   1     },
  /* GNUCXX   */  { 0,  1,  1,   0,  0,   1,   1     },
  /* CXX98    */  { 0,  1,  1,   0,  1,   1,   1     },
  /* GNUCXX0X */  { 1,  1,  1,   0,  0,   1,   1     },
  /* CXX0X    */  { 1,  1,  1,   0,  1,   1,   1     },
  /* ASM      */  { 0,  0,  1,   0,  0,   1,   0     }

The logic (c99 && (c++ || !std)) is only true for: GNUC99, GNUCXX0X, CXX0X.

Would it be better to use something like:

	if (c == 'L' ||
	    CPP_OPTION (pfile, lang) == CLK_GNUC99 ||
	    CPP_OPTION (pfile, lang) == CLK_GNUCXX0X ||
	    CPP_OPTION (pfile, lang) == CLK_CXX0X)

	Cheers,
	Kris


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]