Fix C99 checks for UCN digits at start of identifiers

Joseph S. Myers joseph@codesourcery.com
Fri Nov 15 09:09:00 GMT 2013


C99, but not C11, C++98, C++03 or C++11, disallows universal character
names for digits starting identifiers.  The cpplib logic for this gets
the "digit" property from Unicode data, but that data disagrees with
C99 Annex D, which considers Roman numerals (2160-2182), IDEOGRAPHIC
NUMBER ZERO (3007) and Suzhou numerals (3021-3029) to be special
characters instead of digits.
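
As a rough illustration of the C99 definition, here is a minimal
standalone sketch, not cpplib code.  The ranges are copied from the
[C99DIG] section of ucnid.tab shown in the patch below; the names
ucn_range, c99_digit_ranges and is_c99_digit are hypothetical and
exist only for this example.

```c
#include <stddef.h>

/* Inclusive range of code points.  Hypothetical type for this sketch.  */
struct ucn_range { unsigned int first, last; };

/* Digit ranges as listed under [C99DIG] in ucnid.tab in this patch.  */
static const struct ucn_range c99_digit_ranges[] = {
  { 0x0660, 0x0669 }, { 0x06f0, 0x06f9 }, { 0x0966, 0x096f },
  { 0x09e6, 0x09ef }, { 0x0a66, 0x0a6f }, { 0x0ae6, 0x0aef },
  { 0x0b66, 0x0b6f }, { 0x0be7, 0x0bef }, { 0x0c66, 0x0c6f },
  { 0x0ce6, 0x0cef }, { 0x0d66, 0x0d6f }, { 0x0e50, 0x0e59 },
  { 0x0ed0, 0x0ed9 }, { 0x0f20, 0x0f33 }
};

/* Return 1 if CP falls in one of the ranges above, 0 otherwise.
   Hypothetical helper; cpplib uses the generated ucnid.h table.  */
int
is_c99_digit (unsigned int cp)
{
  size_t i;
  size_t n = sizeof c99_digit_ranges / sizeof c99_digit_ranges[0];

  for (i = 0; i < n; i++)
    if (cp >= c99_digit_ranges[i].first && cp <= c99_digit_ranges[i].last)
      return 1;
  return 0;
}
```

With this table, ARABIC-INDIC DIGIT ZERO (0660) counts as a digit,
while 2160, 3007 and 3021 do not, matching the treatment of Roman
numerals, IDEOGRAPHIC NUMBER ZERO and Suzhou numerals as special
characters.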

This patch fixes cpplib to follow C99's definition of digit.
C++98/C++03 have no restrictions on initial characters.  C11 and C++11
share identical lists of permitted characters and of forbidden initial
characters, both different from the lists in C99 and C++98/C++03; this
patch is preliminary to implementing support for the C11/C++11 lists.
In those lists, the forbidden initial characters appear to be
combining characters instead of digits.  (So I'll probably change the
C99, DIG, CXX flags in the followup to C99, N99 (meaning non-initial
character in C99), CXX (i.e. C++98/C++03), C11, N11.)
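
The rules described above, for the current C99/DIG/CXX flags only, can
be sketched roughly as follows.  This is an assumption-laden
illustration rather than the cpplib implementation: the F_* bit
values, the lang enum and ucn_allowed are all hypothetical names
invented for the example.

```c
/* Hypothetical flag bits, named after the C99, DIG and CXX columns
   in ucnid.h.  The real values in cpplib may differ.  */
enum { F_C99 = 1, F_DIG = 2, F_CXX = 4 };

/* Hypothetical language selector covering only the two older lists.  */
enum lang { LANG_C99, LANG_CXX98 };

/* Return 1 if a UCN with the given FLAGS may appear in an identifier
   for LANG; AT_START is nonzero for the identifier's first character.  */
int
ucn_allowed (unsigned int flags, enum lang lang, int at_start)
{
  if (lang == LANG_CXX98)
    /* C++98/C++03: no restriction on initial characters.  */
    return (flags & F_CXX) != 0;

  /* C99: the character must be in the C99 list...  */
  if (!(flags & F_C99))
    return 0;
  /* ...and a digit may not start an identifier.  */
  if (at_start && (flags & F_DIG))
    return 0;
  return 1;
}
```

Under this sketch, a character flagged C99 without DIG (such as 2160
after this patch) is accepted at the start of a C99 identifier, while
one flagged C99|DIG is accepted only in non-initial position.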

The new lists generally include large ranges of characters, not all of
which may be allocated in a particular Unicode version.  So it will be
necessary to update the character composition information for
-Wnormalized= from Unicode data from time to time, which hasn't
mattered so much with the old, smaller lists of characters.

Bootstrapped with no regressions on x86_64-unknown-linux-gnu.  Applied
to mainline.

gcc/testsuite:
2013-11-15  Joseph Myers  <joseph@codesourcery.com>

	* gcc.dg/cpp/ucnid-9.c: New test.

libcpp:
2013-11-15  Joseph Myers  <joseph@codesourcery.com>

	* ucnid.tab: Mark C99 digits as [C99DIG].
	* makeucnid.c (read_ucnid): Handle [C99DIG].
	(read_table): Don't check for digit characters.
	* ucnid.h: Regenerate.

Index: libcpp/makeucnid.c
===================================================================
--- libcpp/makeucnid.c	(revision 204827)
+++ libcpp/makeucnid.c	(working copy)
@@ -66,6 +66,8 @@ read_ucnid (const char *fname)
 	break;
       if (strcmp (line, "[C99]\n") == 0)
 	fl = C99;
+      else if (strcmp (line, "[C99DIG]\n") == 0)
+	fl = C99|digit;
       else if (strcmp (line, "[CXX]\n") == 0)
 	fl = CXX;
       else if (isxdigit (line[0]))
@@ -104,10 +106,10 @@ read_ucnid (const char *fname)
   fclose (f);
 }
 
-/* Read UnicodeData.txt and set the 'digit' flag, and
-   also fill in the 'decomp' table to be the decompositions of
-   characters for which both the character decomposed and all the code
-   points in the decomposition are either C99 or CXX.  */
+/* Read UnicodeData.txt and fill in the 'decomp' table to be the
+   decompositions of characters for which both the character
+   decomposed and all the code points in the decomposition are either
+   C99 or CXX.  */
 
 static void
 read_table (char *fname)
@@ -135,11 +137,7 @@ read_table (char *fname)
       do {
 	l++;
       } while (*l != ';');
-      /* Category value; things starting with 'N' are numbers of some
-	 kind.  */
-      if (*++l == 'N')
-	flags[codepoint] |= digit;
-
+      /* Category value.  */
       do {
 	l++;
       } while (*l != ';');
Index: libcpp/ucnid.h
===================================================================
--- libcpp/ucnid.h	(revision 204827)
+++ libcpp/ucnid.h	(working copy)
@@ -714,13 +714,12 @@
 {   0|  0|  0|CID|NFC|NKC|  0,   0, 0x2132 },
 { C99|  0|  0|CID|NFC|  0|  0,   0, 0x2138 },
 {   0|  0|  0|CID|NFC|  0|  0,   0, 0x215f },
-{ C99|DIG|  0|CID|NFC|  0|  0,   0, 0x217f },
-{ C99|DIG|  0|CID|NFC|NKC|  0,   0, 0x2182 },
+{ C99|  0|  0|CID|NFC|  0|  0,   0, 0x217f },
+{ C99|  0|  0|CID|NFC|NKC|  0,   0, 0x2182 },
 {   0|  0|  0|CID|NFC|NKC|  0,   0, 0x3004 },
-{ C99|  0|  0|CID|NFC|NKC|  0,   0, 0x3006 },
-{ C99|DIG|  0|CID|NFC|NKC|  0,   0, 0x3007 },
+{ C99|  0|  0|CID|NFC|NKC|  0,   0, 0x3007 },
 {   0|  0|  0|CID|NFC|NKC|  0,   0, 0x3020 },
-{ C99|DIG|  0|CID|NFC|NKC|  0,   0, 0x3029 },
+{ C99|  0|  0|CID|NFC|NKC|  0,   0, 0x3029 },
 {   0|  0|  0|CID|NFC|NKC|  0,   0, 0x3040 },
 { C99|  0|CXX|CID|NFC|NKC|  0,   0, 0x3093 },
 {   0|  0|CXX|CID|NFC|NKC|  0,   0, 0x3094 },
Index: libcpp/ucnid.tab
===================================================================
--- libcpp/ucnid.tab	(revision 204827)
+++ libcpp/ucnid.tab	(working copy)
@@ -119,7 +119,7 @@ ac00-d7a3
 0b3d 1fbe 203f-2040 2102 2107 210a-2113 2115 2118-211d 2124 2126 2128
 212a-2131 2133-2138 2160-2182 3005-3007 3021-3029
 
-; Digits
+[C99DIG]
 0660-0669 06f0-06f9 0966-096f 09e6-09ef 0a66-0a6f 0ae6-0aef 0b66-0b6f
 0be7-0bef 0c66-0c6f 0ce6-0cef 0d66-0d6f 0e50-0e59 0ed0-0ed9 0f20-0f33
 
Index: gcc/testsuite/gcc.dg/cpp/ucnid-9.c
===================================================================
--- gcc/testsuite/gcc.dg/cpp/ucnid-9.c	(revision 0)
+++ gcc/testsuite/gcc.dg/cpp/ucnid-9.c	(revision 0)
@@ -0,0 +1,8 @@
+/* { dg-do preprocess } */
+/* { dg-options "-std=c99 -pedantic -fextended-identifiers" } */
+
+\u2160
+\u2182
+\u3007
+\u3021
+\u3029

-- 
Joseph S. Myers
joseph@codesourcery.com


