[C++/preprocessor Patch] PR c++/53690

Paolo Carlini paolo.carlini@oracle.com
Wed Jul 1 20:30:00 GMT 2015


Hi,

this issue isn't a regression but it's getting a lot of attention and 
duplicates. Essentially, due to the lines at the very end of _cpp_valid_ucn:

   if (result == 0)
     result = 1;

   return result;
}

we can't possibly get the encoding of \U00000000 right in C++, that is 
zero, even after Jason's r152614 which added the following special case 
for C++ earlier in the function:

   else if ((result < 0xa0
         && !CPP_OPTION (pfile, cplusplus)
         && (result != 0x24 && result != 0x40 && result != 0x60))
        || (result & 0x80000000)
        || (result >= 0xD800 && result <= 0xDFFF))
     {
       cpp_error (pfile, CPP_DL_ERROR,
          "%.*s is not a valid universal character",
          (int) (str - base), base);
       result = 1;
     }

thus letting result == 0 thru until the very end of the function. Now, 
fixing the problem seems easy: just change the return type of the 
function to int, thus return zero for success; add a cppchar_t* 
parameter cp where *cp is computed exactly like the current return value 
*without* the final reset to one. Note that this is supposed to have 
*no* effects whatsoever outside C++ because currently, for all those 
languages, the final reset never triggers: in the second conditional 
above, a result == 0 triggers a cpp_error and result becomes == 1, thus 
the final conditional never sees result == 0.

Thus I'm finishing testing the below...

Thanks!
Paolo.

////////////////////////
-------------- next part --------------
/libcpp
2015-07-01  Paolo Carlini  <paolo.carlini@oracle.com>

	PR c++/53690
	* charset.c (_cpp_valid_ucn): Add cppchar_t * parameter and change
	return type to int.  Fix encoding of \u0000 and \U00000000 in C++.
	(convert_ucn): Adjust call.
	* lex.c (forms_identifier_p): Likewise.
	* internal.h (_cpp_valid_ucn): Adjust declaration.

/gcc/testsuite
2015-07-01  Paolo Carlini  <paolo.carlini@oracle.com>

	PR c++/53690
	* g++.dg/cpp/pr53690.C: New.
-------------- next part --------------
Index: gcc/testsuite/g++.dg/cpp/pr53690.C
===================================================================
--- gcc/testsuite/g++.dg/cpp/pr53690.C	(revision 0)
+++ gcc/testsuite/g++.dg/cpp/pr53690.C	(working copy)
@@ -0,0 +1,7 @@
+// PR c++/53690
+// { dg-do compile { target c++11 } }
+
+int array1[U'\U00000000' == 0 ? 1 : -1];
+int array2[U'\u0000' == 0 ? 1 : -1];
+int array3[u'\U00000000' == 0 ? 1 : -1];
+int array4[u'\u0000' == 0 ? 1 : -1];
Index: libcpp/charset.c
===================================================================
--- libcpp/charset.c	(revision 225265)
+++ libcpp/charset.c	(working copy)
@@ -972,21 +972,20 @@ ucn_valid_in_identifier (cpp_reader *pfile, cppcha
    or 0060 (`), nor one in the range D800 through DFFF inclusive.
 
    *PSTR must be preceded by "\u" or "\U"; it is assumed that the
-   buffer end is delimited by a non-hex digit.  Returns zero if the
-   UCN has not been consumed.
+   buffer end is delimited by a non-hex digit.  Returns one if the
+   UCN has not been consumed, zero otherwise.
 
-   Otherwise the nonzero value of the UCN, whether valid or invalid,
-   is returned.  Diagnostics are emitted for invalid values.  PSTR
-   is updated to point one beyond the UCN, or to the syntactically
-   invalid character.
+   The value of the UCN, whether valid or invalid, is returned in *CP.
+   Diagnostics are emitted for invalid values.  PSTR is updated to point
+   one beyond the UCN, or to the syntactically invalid character.
 
    IDENTIFIER_POS is 0 when not in an identifier, 1 for the start of
    an identifier, or 2 otherwise.  */
 
-cppchar_t
+int
 _cpp_valid_ucn (cpp_reader *pfile, const uchar **pstr,
 		const uchar *limit, int identifier_pos,
-		struct normalize_state *nst)
+		struct normalize_state *nst, cppchar_t *cp)
 {
   cppchar_t result, c;
   unsigned int length;
@@ -1030,8 +1029,11 @@ _cpp_valid_ucn (cpp_reader *pfile, const uchar **p
      multiple tokens in identifiers, so we can't give a helpful
      error message in that case.  */
   if (length && identifier_pos)
-    return 0;
-  
+    {
+      *cp = 0;
+      return 1;
+    }
+
   *pstr = str;
   if (length)
     {
@@ -1079,10 +1081,8 @@ _cpp_valid_ucn (cpp_reader *pfile, const uchar **p
 		   (int) (str - base), base);
     }
 
-  if (result == 0)
-    result = 1;
-
-  return result;
+  *cp = result;
+  return 0;
 }
 
 /* Convert an UCN, pointed to by FROM, to UTF-8 encoding, then translate
@@ -1100,7 +1100,7 @@ convert_ucn (cpp_reader *pfile, const uchar *from,
   struct normalize_state nst = INITIAL_NORMALIZE_STATE;
 
   from++;  /* Skip u/U.  */
-  ucn = _cpp_valid_ucn (pfile, &from, limit, 0, &nst);
+  _cpp_valid_ucn (pfile, &from, limit, 0, &nst, &ucn);
 
   rval = one_cppchar_to_utf8 (ucn, &bufp, &bytesleft);
   if (rval)
Index: libcpp/internal.h
===================================================================
--- libcpp/internal.h	(revision 225265)
+++ libcpp/internal.h	(working copy)
@@ -744,9 +744,10 @@ struct normalize_state
 #define NORMALIZE_STATE_UPDATE_IDNUM(st, c)	\
   ((st)->previous = (c), (st)->prev_class = 0)
 
-extern cppchar_t _cpp_valid_ucn (cpp_reader *, const unsigned char **,
-				 const unsigned char *, int,
-				 struct normalize_state *state);
+extern int _cpp_valid_ucn (cpp_reader *, const unsigned char **,
+			   const unsigned char *, int,
+			   struct normalize_state *state,
+			   cppchar_t *);
 extern void _cpp_destroy_iconv (cpp_reader *);
 extern unsigned char *_cpp_convert_input (cpp_reader *, const char *,
 					  unsigned char *, size_t, size_t,
Index: libcpp/lex.c
===================================================================
--- libcpp/lex.c	(revision 225265)
+++ libcpp/lex.c	(working copy)
@@ -1244,9 +1244,10 @@ forms_identifier_p (cpp_reader *pfile, int first,
       && *buffer->cur == '\\'
       && (buffer->cur[1] == 'u' || buffer->cur[1] == 'U'))
     {
+      cppchar_t s;
       buffer->cur += 2;
-      if (_cpp_valid_ucn (pfile, &buffer->cur, buffer->rlimit, 1 + !first,
-			  state))
+      if (!_cpp_valid_ucn (pfile, &buffer->cur, buffer->rlimit, 1 + !first,
+			   state, &s))
 	return true;
       buffer->cur -= 2;
     }


More information about the Gcc-patches mailing list