[PATCH] libcpp, contrib: Update to Unicode 15.1

Jakub Jelinek jakub@redhat.com
Tue Nov 14 07:33:54 GMT 2023


Hi!

On Tue, Nov 14, 2023 at 08:23:27AM +0100, Jakub Jelinek wrote:
> The following patch (in plaintext just a pseudo-patch where I've replaced
> the overly large parts of the wget-downloaded or regenerated files with
> ...; the full patch is attached compressed) updates from Unicode 15.0,
> which we had last year, to 15.1.  Apparently Unicode forgot to add a new
> range to Table 4-8, which we are using, but from the other files it is
> clear what should have been added; I've filed a bug report against Unicode.

Reposted, because the attachment was still too large even after compression.
This compressed patch leaves out the uname2c.h changes; I will post those in
a separate mail.

2023-11-14  Jakub Jelinek  <jakub@redhat.com>

contrib/
	* unicode/README: Adjust glibc git commit hash, number of Unicode
	data files to be updated and latest Unicode version.
	* unicode/from_glibc/utf8_gen.py: Update from glibc.
	* unicode/UnicodeData.txt: Update from Unicode 15.1.
	* unicode/EastAsianWidth.txt: Likewise.
	* unicode/DerivedNormalizationProps.txt: Likewise.
	* unicode/NameAliases.txt: Likewise.
	* unicode/DerivedCoreProperties.txt: Likewise.
	* unicode/PropList.txt: Likewise.
libcpp/
	* makeucnid.cc (write_copyright): Update copyright year.
	* makeuname2c.cc (write_copyright): Likewise.
	(struct generated): Update latest Unicode version.
	(generated_ranges): Add 2ebf0-2ee5d CJK UNIFIED IDEOGRAPH
	range, which was omitted from Table 4-8 but is clearly
	expected to be there given the 15.1 additions.
	* ucnid.h: Regenerated.
	* uname2c.h: Regenerated.
	* generated_cpp_wcwidth.h: Regenerated.
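
As a rough cross-check of the generated_ranges entry, the sketch below
(Python, not part of the patch) computes the size of the added range and
maps a derived name back to its code point.  Only the bounds 2ebf0-2ee5d,
the "CJK UNIFIED IDEOGRAPH-" prefix and the 622 count come from the patch;
the helper names are hypothetical and the lookup is only an approximation
of what makeuname2c.cc encodes, not its actual implementation.

```python
# Hypothetical helpers; only the bounds and the prefix come from the patch.
CJK_PREFIX = "CJK UNIFIED IDEOGRAPH-"

# Inclusive (first, last) bounds of the range added to generated_ranges.
EXT_I = (0x2EBF0, 0x2EE5D)


def range_size(first: int, last: int) -> int:
    """Number of code points in an inclusive range."""
    return last - first + 1


def generated_name(cp: int) -> str:
    """Derived name: the prefix followed by the code point in uppercase hex."""
    return f"{CJK_PREFIX}{cp:04X}"


def lookup_generated(name: str, ranges=(EXT_I,)):
    """Map a derived name back to its code point, or None if it is not valid."""
    if not name.startswith(CJK_PREFIX):
        return None
    try:
        cp = int(name[len(CJK_PREFIX):], 16)
    except ValueError:
        return None
    # Reject non-canonical spellings (lowercase hex, padding, signs, "0x").
    if generated_name(cp) != name:
        return None
    return cp if any(lo <= cp <= hi for lo, hi in ranges) else None


if __name__ == "__main__":
    print(range_size(*EXT_I))
    print(lookup_generated("CJK UNIFIED IDEOGRAPH-2EBF0") == 0x2EBF0)
    print(lookup_generated("CJK UNIFIED IDEOGRAPH-2EBE1"))
```

The size agrees with the [622] annotation on the new DerivedCoreProperties.txt
lines and with the Alphabetic total rising from 137765 to 138387.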

--- contrib/unicode/README.jj	2023-03-16 10:28:18.226187960 +0100
+++ contrib/unicode/README	2023-11-13 13:53:22.777991374 +0100
@@ -30,7 +30,7 @@ localedata/unicode-gen/unicode_utils.py
 localedata/unicode-gen/utf8_gen.py
 
 And the most recent versions added to GCC are from glibc git commit:
-4c721f24fc190d1dc935eb0bab283de7cf13182e
+71de3aead9fffe89556e80ebc94aa918d8ee7bca
 
 The script gen_wcwidth.py found here contains the GCC-specific code to
 map glibc's output to the lookup tables we require.  This script should not need
@@ -40,14 +40,14 @@ produce ucnid.h.
 
 The procedure to update GCC's Unicode support is the following:
 
-1.  Update the five Unicode data files from the above URLs.
+1.  Update the six Unicode data files from the above URLs.
 
 2.  Update the two glibc files in from_glibc/ from glibc's git.  Update
     the commit number above in this README.
 
 3.  Run ./gen_wcwidth.py X.Y > ../../libcpp/generated_cpp_wcwidth.h
     (where X.Y is the version of the Unicode standard corresponding to the
-    Unicode data files being used, most recently, 15.0.0).
+    Unicode data files being used, most recently, 15.1.0).
 
 4.  Update Unicode Copyright years in libcpp/makeucnid.cc and in
     libcpp/makeuname2c.cc up to the year in which the Unicode
--- contrib/unicode/from_glibc/utf8_gen.py.jj	2023-01-16 11:52:15.879737071 +0100
+++ contrib/unicode/from_glibc/utf8_gen.py	2023-10-12 09:42:01.018694503 +0200
@@ -350,7 +350,7 @@ if __name__ == "__main__":
             # the EastAsianWidth.txt file.
             if re.match(r'.*<reserved-.+>\.\.<reserved-.+>.*', LINE):
                 continue
-            if re.match(r'^[^;]*;[WF]', LINE):
+            if re.match(r'^[^;]*;\s*[WF]\s*', LINE):
                 EAST_ASIAN_WIDTH_LINES.append(LINE.strip())
     with open(ARGS.prop_list_file, mode='r') as PROP_LIST_FILE:
         PROP_LIST_LINES = []
--- contrib/unicode/UnicodeData.txt.jj	2023-03-14 12:24:55.545729148 +0100
+++ contrib/unicode/UnicodeData.txt	2023-08-28 18:08:58.000000000 +0200
@@ -11231,6 +11231,10 @@
 2FF9;IDEOGRAPHIC DESCRIPTION CHARACTER SURROUND FROM UPPER RIGHT;So;0;ON;;;;;N;;;;;
 2FFA;IDEOGRAPHIC DESCRIPTION CHARACTER SURROUND FROM LOWER LEFT;So;0;ON;;;;;N;;;;;
 2FFB;IDEOGRAPHIC DESCRIPTION CHARACTER OVERLAID;So;0;ON;;;;;N;;;;;
+2FFC;IDEOGRAPHIC DESCRIPTION CHARACTER SURROUND FROM RIGHT;So;0;ON;;;;;N;;;;;
+2FFD;IDEOGRAPHIC DESCRIPTION CHARACTER SURROUND FROM LOWER RIGHT;So;0;ON;;;;;N;;;;;
+2FFE;IDEOGRAPHIC DESCRIPTION CHARACTER HORIZONTAL REFLECTION;So;0;ON;;;;;N;;;;;
+2FFF;IDEOGRAPHIC DESCRIPTION CHARACTER ROTATION;So;0;ON;;;;;N;;;;;
 3000;IDEOGRAPHIC SPACE;Zs;0;WS;<wide> 0020;;;;N;;;;;
 3001;IDEOGRAPHIC COMMA;Po;0;ON;;;;;N;;;;;
 3002;IDEOGRAPHIC FULL STOP;Po;0;ON;;;;;N;IDEOGRAPHIC PERIOD;;;;
@@ -11705,6 +11709,7 @@
 31E1;CJK STROKE HZZZG;So;0;ON;;;;;N;;;;;
 31E2;CJK STROKE PG;So;0;ON;;;;;N;;;;;
 31E3;CJK STROKE Q;So;0;ON;;;;;N;;;;;
+31EF;IDEOGRAPHIC DESCRIPTION CHARACTER SUBTRACTION;So;0;ON;;;;;N;;;;;
 31F0;KATAKANA LETTER SMALL KU;Lo;0;L;;;;;N;;;;;
 31F1;KATAKANA LETTER SMALL SI;Lo;0;L;;;;;N;;;;;
 31F2;KATAKANA LETTER SMALL SU;Lo;0;L;;;;;N;;;;;
@@ -34035,6 +34040,8 @@ FFFD;REPLACEMENT CHARACTER;So;0;ON;;;;;N
 2CEA1;<CJK Ideograph Extension E, Last>;Lo;0;L;;;;;N;;;;;
 2CEB0;<CJK Ideograph Extension F, First>;Lo;0;L;;;;;N;;;;;
 2EBE0;<CJK Ideograph Extension F, Last>;Lo;0;L;;;;;N;;;;;
+2EBF0;<CJK Ideograph Extension I, First>;Lo;0;L;;;;;N;;;;;
+2EE5D;<CJK Ideograph Extension I, Last>;Lo;0;L;;;;;N;;;;;
 2F800;CJK COMPATIBILITY IDEOGRAPH-2F800;Lo;0;L;4E3D;;;;N;;;;;
 2F801;CJK COMPATIBILITY IDEOGRAPH-2F801;Lo;0;L;4E38;;;;N;;;;;
 2F802;CJK COMPATIBILITY IDEOGRAPH-2F802;Lo;0;L;4E41;;;;N;;;;;
--- contrib/unicode/EastAsianWidth.txt.jj	2023-03-14 12:24:55.496729855 +0100
+++ contrib/unicode/EastAsianWidth.txt	2023-08-28 18:08:56.000000000 +0200
@@ -1,11 +1,11 @@
-# EastAsianWidth-15.0.0.txt
-# Date: 2022-05-24, 17:40:20 GMT [KW, LI]
-# © 2022 Unicode®, Inc.
+# EastAsianWidth-15.1.0.txt
+# Date: 2023-07-28, 23:34:08 GMT
+# © 2023 Unicode®, Inc.
 # Unicode and the Unicode Logo are registered trademarks of Unicode, Inc. in the U.S. and other countries.
 # For terms of use, see https://www.unicode.org/terms_of_use.html
 #
 # Unicode Character Database
-# For documentation, see https://www.unicode.org/reports/tr44/
+#   For documentation, see https://www.unicode.org/reports/tr44/
 #
 # East_Asian_Width Property
 #
...
--- contrib/unicode/DerivedNormalizationProps.txt.jj	2023-03-14 12:24:55.480730086 +0100
+++ contrib/unicode/DerivedNormalizationProps.txt	2023-08-28 18:08:56.000000000 +0200
@@ -1,6 +1,6 @@
-# DerivedNormalizationProps-15.0.0.txt
-# Date: 2022-04-02, 01:29:03 GMT
-# © 2022 Unicode®, Inc.
+# DerivedNormalizationProps-15.1.0.txt
+# Date: 2023-05-02, 13:20:58 GMT
+# © 2023 Unicode®, Inc.
 # Unicode and the Unicode Logo are registered trademarks of Unicode, Inc. in the U.S. and other countries.
 # For terms of use, see https://www.unicode.org/terms_of_use.html
 #
...
--- contrib/unicode/NameAliases.txt.jj	2023-03-16 10:28:18.226187960 +0100
+++ contrib/unicode/NameAliases.txt	2023-08-28 18:08:56.000000000 +0200
@@ -1,6 +1,6 @@
-# NameAliases-15.0.0.txt
-# Date: 2022-07-26, 20:13:00 GMT [KW]
-# © 2022 Unicode®, Inc.
+# NameAliases-15.1.0.txt
+# Date: 2023-01-05
+# © 2023 Unicode®, Inc.
 # For terms of use, see https://www.unicode.org/terms_of_use.html
 #
 # Unicode Character Database
--- contrib/unicode/DerivedCoreProperties.txt.jj	2023-03-14 12:24:55.468730260 +0100
+++ contrib/unicode/DerivedCoreProperties.txt	2023-08-28 18:08:56.000000000 +0200
@@ -1,6 +1,6 @@
-# DerivedCoreProperties-15.0.0.txt
-# Date: 2022-08-05, 22:17:05 GMT
-# © 2022 Unicode®, Inc.
+# DerivedCoreProperties-15.1.0.txt
+# Date: 2023-08-07, 15:21:24 GMT
+# © 2023 Unicode®, Inc.
 # Unicode and the Unicode Logo are registered trademarks of Unicode, Inc. in the U.S. and other countries.
 # For terms of use, see https://www.unicode.org/terms_of_use.html
 #
@@ -1397,11 +1397,12 @@ FFDA..FFDC    ; Alphabetic # Lo   [3] HA
 2B740..2B81D  ; Alphabetic # Lo [222] CJK UNIFIED IDEOGRAPH-2B740..CJK UNIFIED IDEOGRAPH-2B81D
 2B820..2CEA1  ; Alphabetic # Lo [5762] CJK UNIFIED IDEOGRAPH-2B820..CJK UNIFIED IDEOGRAPH-2CEA1
 2CEB0..2EBE0  ; Alphabetic # Lo [7473] CJK UNIFIED IDEOGRAPH-2CEB0..CJK UNIFIED IDEOGRAPH-2EBE0
+2EBF0..2EE5D  ; Alphabetic # Lo [622] CJK UNIFIED IDEOGRAPH-2EBF0..CJK UNIFIED IDEOGRAPH-2EE5D
 2F800..2FA1D  ; Alphabetic # Lo [542] CJK COMPATIBILITY IDEOGRAPH-2F800..CJK COMPATIBILITY IDEOGRAPH-2FA1D
 30000..3134A  ; Alphabetic # Lo [4939] CJK UNIFIED IDEOGRAPH-30000..CJK UNIFIED IDEOGRAPH-3134A
 31350..323AF  ; Alphabetic # Lo [4192] CJK UNIFIED IDEOGRAPH-31350..CJK UNIFIED IDEOGRAPH-323AF
 
-# Total code points: 137765
+# Total code points: 138387
 
 # ================================================
 
@@ -6853,11 +6854,12 @@ FFDA..FFDC    ; ID_Start # Lo   [3] HALF
 2B740..2B81D  ; ID_Start # Lo [222] CJK UNIFIED IDEOGRAPH-2B740..CJK UNIFIED IDEOGRAPH-2B81D
 2B820..2CEA1  ; ID_Start # Lo [5762] CJK UNIFIED IDEOGRAPH-2B820..CJK UNIFIED IDEOGRAPH-2CEA1
 2CEB0..2EBE0  ; ID_Start # Lo [7473] CJK UNIFIED IDEOGRAPH-2CEB0..CJK UNIFIED IDEOGRAPH-2EBE0
+2EBF0..2EE5D  ; ID_Start # Lo [622] CJK UNIFIED IDEOGRAPH-2EBF0..CJK UNIFIED IDEOGRAPH-2EE5D
 2F800..2FA1D  ; ID_Start # Lo [542] CJK COMPATIBILITY IDEOGRAPH-2F800..CJK COMPATIBILITY IDEOGRAPH-2FA1D
 30000..3134A  ; ID_Start # Lo [4939] CJK UNIFIED IDEOGRAPH-30000..CJK UNIFIED IDEOGRAPH-3134A
 31350..323AF  ; ID_Start # Lo [4192] CJK UNIFIED IDEOGRAPH-31350..CJK UNIFIED IDEOGRAPH-323AF
 
-# Total code points: 136345
+# Total code points: 136967
 
 # ================================================
 
@@ -7438,6 +7440,7 @@ FFDA..FFDC    ; ID_Start # Lo   [3] HALF
 1FE0..1FEC    ; ID_Continue # L&  [13] GREEK SMALL LETTER UPSILON WITH VRACHY..GREEK CAPITAL LETTER RHO WITH DASIA
 1FF2..1FF4    ; ID_Continue # L&   [3] GREEK SMALL LETTER OMEGA WITH VARIA AND YPOGEGRAMMENI..GREEK SMALL LETTER OMEGA WITH OXIA AND YPOGEGRAMMENI
 1FF6..1FFC    ; ID_Continue # L&   [7] GREEK SMALL LETTER OMEGA WITH PERISPOMENI..GREEK CAPITAL LETTER OMEGA WITH PROSGEGRAMMENI
+200C..200D    ; ID_Continue # Cf   [2] ZERO WIDTH NON-JOINER..ZERO WIDTH JOINER
 203F..2040    ; ID_Continue # Pc   [2] UNDERTIE..CHARACTER TIE
 2054          ; ID_Continue # Pc       INVERTED UNDERTIE
 2071          ; ID_Continue # Lm       SUPERSCRIPT LATIN SMALL LETTER I
@@ -7504,6 +7507,7 @@ FFDA..FFDC    ; ID_Start # Lo   [3] HALF
 309D..309E    ; ID_Continue # Lm   [2] HIRAGANA ITERATION MARK..HIRAGANA VOICED ITERATION MARK
 309F          ; ID_Continue # Lo       HIRAGANA DIGRAPH YORI
 30A1..30FA    ; ID_Continue # Lo  [90] KATAKANA LETTER SMALL A..KATAKANA LETTER VO
+30FB          ; ID_Continue # Po       KATAKANA MIDDLE DOT
 30FC..30FE    ; ID_Continue # Lm   [3] KATAKANA-HIRAGANA PROLONGED SOUND MARK..KATAKANA VOICED ITERATION MARK
 30FF          ; ID_Continue # Lo       KATAKANA DIGRAPH KOTO
 3105..312F    ; ID_Continue # Lo  [43] BOPOMOFO LETTER B..BOPOMOFO LETTER NN
@@ -7683,6 +7687,7 @@ FF10..FF19    ; ID_Continue # Nd  [10] F
 FF21..FF3A    ; ID_Continue # L&  [26] FULLWIDTH LATIN CAPITAL LETTER A..FULLWIDTH LATIN CAPITAL LETTER Z
 FF3F          ; ID_Continue # Pc       FULLWIDTH LOW LINE
 FF41..FF5A    ; ID_Continue # L&  [26] FULLWIDTH LATIN SMALL LETTER A..FULLWIDTH LATIN SMALL LETTER Z
+FF65          ; ID_Continue # Po       HALFWIDTH KATAKANA MIDDLE DOT
 FF66..FF6F    ; ID_Continue # Lo  [10] HALFWIDTH KATAKANA LETTER WO..HALFWIDTH KATAKANA LETTER SMALL TU
 FF70          ; ID_Continue # Lm       HALFWIDTH KATAKANA-HIRAGANA PROLONGED SOUND MARK
 FF71..FF9D    ; ID_Continue # Lo  [45] HALFWIDTH KATAKANA LETTER A..HALFWIDTH KATAKANA LETTER N
@@ -8207,12 +8212,13 @@ FFDA..FFDC    ; ID_Continue # Lo   [3] H
 2B740..2B81D  ; ID_Continue # Lo [222] CJK UNIFIED IDEOGRAPH-2B740..CJK UNIFIED IDEOGRAPH-2B81D
 2B820..2CEA1  ; ID_Continue # Lo [5762] CJK UNIFIED IDEOGRAPH-2B820..CJK UNIFIED IDEOGRAPH-2CEA1
 2CEB0..2EBE0  ; ID_Continue # Lo [7473] CJK UNIFIED IDEOGRAPH-2CEB0..CJK UNIFIED IDEOGRAPH-2EBE0
+2EBF0..2EE5D  ; ID_Continue # Lo [622] CJK UNIFIED IDEOGRAPH-2EBF0..CJK UNIFIED IDEOGRAPH-2EE5D
 2F800..2FA1D  ; ID_Continue # Lo [542] CJK COMPATIBILITY IDEOGRAPH-2F800..CJK COMPATIBILITY IDEOGRAPH-2FA1D
 30000..3134A  ; ID_Continue # Lo [4939] CJK UNIFIED IDEOGRAPH-30000..CJK UNIFIED IDEOGRAPH-3134A
 31350..323AF  ; ID_Continue # Lo [4192] CJK UNIFIED IDEOGRAPH-31350..CJK UNIFIED IDEOGRAPH-323AF
 E0100..E01EF  ; ID_Continue # Mn [240] VARIATION SELECTOR-17..VARIATION SELECTOR-256
 
-# Total code points: 139482
+# Total code points: 140108
 
 # ================================================
 
@@ -8962,11 +8968,12 @@ FFDA..FFDC    ; XID_Start # Lo   [3] HAL
 2B740..2B81D  ; XID_Start # Lo [222] CJK UNIFIED IDEOGRAPH-2B740..CJK UNIFIED IDEOGRAPH-2B81D
 2B820..2CEA1  ; XID_Start # Lo [5762] CJK UNIFIED IDEOGRAPH-2B820..CJK UNIFIED IDEOGRAPH-2CEA1
 2CEB0..2EBE0  ; XID_Start # Lo [7473] CJK UNIFIED IDEOGRAPH-2CEB0..CJK UNIFIED IDEOGRAPH-2EBE0
+2EBF0..2EE5D  ; XID_Start # Lo [622] CJK UNIFIED IDEOGRAPH-2EBF0..CJK UNIFIED IDEOGRAPH-2EE5D
 2F800..2FA1D  ; XID_Start # Lo [542] CJK COMPATIBILITY IDEOGRAPH-2F800..CJK COMPATIBILITY IDEOGRAPH-2FA1D
 30000..3134A  ; XID_Start # Lo [4939] CJK UNIFIED IDEOGRAPH-30000..CJK UNIFIED IDEOGRAPH-3134A
 31350..323AF  ; XID_Start # Lo [4192] CJK UNIFIED IDEOGRAPH-31350..CJK UNIFIED IDEOGRAPH-323AF
 
-# Total code points: 136322
+# Total code points: 136944
 
 # ================================================
 
@@ -9543,6 +9550,7 @@ FFDA..FFDC    ; XID_Start # Lo   [3] HAL
 1FE0..1FEC    ; XID_Continue # L&  [13] GREEK SMALL LETTER UPSILON WITH VRACHY..GREEK CAPITAL LETTER RHO WITH DASIA
 1FF2..1FF4    ; XID_Continue # L&   [3] GREEK SMALL LETTER OMEGA WITH VARIA AND YPOGEGRAMMENI..GREEK SMALL LETTER OMEGA WITH OXIA AND YPOGEGRAMMENI
 1FF6..1FFC    ; XID_Continue # L&   [7] GREEK SMALL LETTER OMEGA WITH PERISPOMENI..GREEK CAPITAL LETTER OMEGA WITH PROSGEGRAMMENI
+200C..200D    ; XID_Continue # Cf   [2] ZERO WIDTH NON-JOINER..ZERO WIDTH JOINER
 203F..2040    ; XID_Continue # Pc   [2] UNDERTIE..CHARACTER TIE
 2054          ; XID_Continue # Pc       INVERTED UNDERTIE
 2071          ; XID_Continue # Lm       SUPERSCRIPT LATIN SMALL LETTER I
@@ -9608,6 +9616,7 @@ FFDA..FFDC    ; XID_Start # Lo   [3] HAL
 309D..309E    ; XID_Continue # Lm   [2] HIRAGANA ITERATION MARK..HIRAGANA VOICED ITERATION MARK
 309F          ; XID_Continue # Lo       HIRAGANA DIGRAPH YORI
 30A1..30FA    ; XID_Continue # Lo  [90] KATAKANA LETTER SMALL A..KATAKANA LETTER VO
+30FB          ; XID_Continue # Po       KATAKANA MIDDLE DOT
 30FC..30FE    ; XID_Continue # Lm   [3] KATAKANA-HIRAGANA PROLONGED SOUND MARK..KATAKANA VOICED ITERATION MARK
 30FF          ; XID_Continue # Lo       KATAKANA DIGRAPH KOTO
 3105..312F    ; XID_Continue # Lo  [43] BOPOMOFO LETTER B..BOPOMOFO LETTER NN
@@ -9793,6 +9802,7 @@ FF10..FF19    ; XID_Continue # Nd  [10]
 FF21..FF3A    ; XID_Continue # L&  [26] FULLWIDTH LATIN CAPITAL LETTER A..FULLWIDTH LATIN CAPITAL LETTER Z
 FF3F          ; XID_Continue # Pc       FULLWIDTH LOW LINE
 FF41..FF5A    ; XID_Continue # L&  [26] FULLWIDTH LATIN SMALL LETTER A..FULLWIDTH LATIN SMALL LETTER Z
+FF65          ; XID_Continue # Po       HALFWIDTH KATAKANA MIDDLE DOT
 FF66..FF6F    ; XID_Continue # Lo  [10] HALFWIDTH KATAKANA LETTER WO..HALFWIDTH KATAKANA LETTER SMALL TU
 FF70          ; XID_Continue # Lm       HALFWIDTH KATAKANA-HIRAGANA PROLONGED SOUND MARK
 FF71..FF9D    ; XID_Continue # Lo  [45] HALFWIDTH KATAKANA LETTER A..HALFWIDTH KATAKANA LETTER N
@@ -10317,12 +10327,13 @@ FFDA..FFDC    ; XID_Continue # Lo   [3]
 2B740..2B81D  ; XID_Continue # Lo [222] CJK UNIFIED IDEOGRAPH-2B740..CJK UNIFIED IDEOGRAPH-2B81D
 2B820..2CEA1  ; XID_Continue # Lo [5762] CJK UNIFIED IDEOGRAPH-2B820..CJK UNIFIED IDEOGRAPH-2CEA1
 2CEB0..2EBE0  ; XID_Continue # Lo [7473] CJK UNIFIED IDEOGRAPH-2CEB0..CJK UNIFIED IDEOGRAPH-2EBE0
+2EBF0..2EE5D  ; XID_Continue # Lo [622] CJK UNIFIED IDEOGRAPH-2EBF0..CJK UNIFIED IDEOGRAPH-2EE5D
 2F800..2FA1D  ; XID_Continue # Lo [542] CJK COMPATIBILITY IDEOGRAPH-2F800..CJK COMPATIBILITY IDEOGRAPH-2FA1D
 30000..3134A  ; XID_Continue # Lo [4939] CJK UNIFIED IDEOGRAPH-30000..CJK UNIFIED IDEOGRAPH-3134A
 31350..323AF  ; XID_Continue # Lo [4192] CJK UNIFIED IDEOGRAPH-31350..CJK UNIFIED IDEOGRAPH-323AF
 E0100..E01EF  ; XID_Continue # Mn [240] VARIATION SELECTOR-17..VARIATION SELECTOR-256
 
-# Total code points: 139463
+# Total code points: 140089
 
 # ================================================
 
@@ -10335,6 +10346,15 @@ E0100..E01EF  ; XID_Continue # Mn [240]
 #  - FFF9..FFFB (Interlinear annotation format characters)
 #  - 13430..13440 (Egyptian hieroglyph format characters)
 #  - Prepended_Concatenation_Mark (Exceptional format characters that should be visible)
+#
+# There are currently no stability guarantees for DICP. However, the
+# values of DICP interact with the derivation of XID_Continue
+# and NFKC_CF, for which there are stability guarantees.
+# Maintainers of this property should note that in the
+# unlikely case that the DICP value changes for an existing character
+# which is also XID_Continue=Yes, then exceptions must be put
+# in place to ensure that the NFKC_CF mapping value for that
+# existing character does not change.
 
 00AD          ; Default_Ignorable_Code_Point # Cf       SOFT HYPHEN
 034F          ; Default_Ignorable_Code_Point # Mn       COMBINING GRAPHEME JOINER
@@ -11602,7 +11622,7 @@ E0100..E01EF  ; Grapheme_Extend # Mn [24
 2E80..2E99    ; Grapheme_Base # So  [26] CJK RADICAL REPEAT..CJK RADICAL RAP
 2E9B..2EF3    ; Grapheme_Base # So  [89] CJK RADICAL CHOKE..CJK RADICAL C-SIMPLIFIED TURTLE
 2F00..2FD5    ; Grapheme_Base # So [214] KANGXI RADICAL ONE..KANGXI RADICAL FLUTE
-2FF0..2FFB    ; Grapheme_Base # So  [12] IDEOGRAPHIC DESCRIPTION CHARACTER LEFT TO RIGHT..IDEOGRAPHIC DESCRIPTION CHARACTER OVERLAID
+2FF0..2FFF    ; Grapheme_Base # So  [16] IDEOGRAPHIC DESCRIPTION CHARACTER LEFT TO RIGHT..IDEOGRAPHIC DESCRIPTION CHARACTER ROTATION
 3000          ; Grapheme_Base # Zs       IDEOGRAPHIC SPACE
 3001..3003    ; Grapheme_Base # Po   [3] IDEOGRAPHIC COMMA..DITTO MARK
 3004          ; Grapheme_Base # So       JAPANESE INDUSTRIAL STANDARD SYMBOL
@@ -11657,6 +11677,7 @@ E0100..E01EF  ; Grapheme_Extend # Mn [24
 3196..319F    ; Grapheme_Base # So  [10] IDEOGRAPHIC ANNOTATION TOP MARK..IDEOGRAPHIC ANNOTATION MAN MARK
 31A0..31BF    ; Grapheme_Base # Lo  [32] BOPOMOFO LETTER BU..BOPOMOFO LETTER AH
 31C0..31E3    ; Grapheme_Base # So  [36] CJK STROKE T..CJK STROKE Q
+31EF          ; Grapheme_Base # So       IDEOGRAPHIC DESCRIPTION CHARACTER SUBTRACTION
 31F0..31FF    ; Grapheme_Base # Lo  [16] KATAKANA LETTER SMALL KU..KATAKANA LETTER SMALL RO
 3200..321E    ; Grapheme_Base # So  [31] PARENTHESIZED HANGUL KIYEOK..PARENTHESIZED KOREAN CHARACTER O HU
 3220..3229    ; Grapheme_Base # No  [10] PARENTHESIZED IDEOGRAPH ONE..PARENTHESIZED IDEOGRAPH TEN
@@ -12497,11 +12518,12 @@ FFFC..FFFD    ; Grapheme_Base # So   [2]
 2B740..2B81D  ; Grapheme_Base # Lo [222] CJK UNIFIED IDEOGRAPH-2B740..CJK UNIFIED IDEOGRAPH-2B81D
 2B820..2CEA1  ; Grapheme_Base # Lo [5762] CJK UNIFIED IDEOGRAPH-2B820..CJK UNIFIED IDEOGRAPH-2CEA1
 2CEB0..2EBE0  ; Grapheme_Base # Lo [7473] CJK UNIFIED IDEOGRAPH-2CEB0..CJK UNIFIED IDEOGRAPH-2EBE0
+2EBF0..2EE5D  ; Grapheme_Base # Lo [622] CJK UNIFIED IDEOGRAPH-2EBF0..CJK UNIFIED IDEOGRAPH-2EE5D
 2F800..2FA1D  ; Grapheme_Base # Lo [542] CJK COMPATIBILITY IDEOGRAPH-2F800..CJK COMPATIBILITY IDEOGRAPH-2FA1D
 30000..3134A  ; Grapheme_Base # Lo [4939] CJK UNIFIED IDEOGRAPH-30000..CJK UNIFIED IDEOGRAPH-3134A
 31350..323AF  ; Grapheme_Base # Lo [4192] CJK UNIFIED IDEOGRAPH-31350..CJK UNIFIED IDEOGRAPH-323AF
 
-# Total code points: 146986
+# Total code points: 147613
 
 # ================================================

... 
--- contrib/unicode/PropList.txt.jj	2023-03-14 12:24:55.497729841 +0100
+++ contrib/unicode/PropList.txt	2023-08-28 18:08:56.000000000 +0200
@@ -1,6 +1,6 @@
-# PropList-15.0.0.txt
-# Date: 2022-08-05, 22:17:16 GMT
-# © 2022 Unicode®, Inc.
+# PropList-15.1.0.txt
+# Date: 2023-08-01, 21:56:53 GMT
+# © 2023 Unicode®, Inc.
 # Unicode and the Unicode Logo are registered trademarks of Unicode, Inc. in the U.S. and other countries.
 # For terms of use, see https://www.unicode.org/terms_of_use.html
 #
...
--- libcpp/makeuname2c.cc.jj	2023-03-16 10:19:01.734373423 +0100
+++ libcpp/makeuname2c.cc	2023-11-13 13:42:08.912442830 +0100
@@ -69,7 +69,7 @@ struct entry { const char *name; unsigne
 static struct entry *entries;
 static unsigned long num_allocated, num_entries;
 
-/* Unicode 15 Table 4-8.  */
+/* Unicode 15.1 Table 4-8.  */
 struct generated {
   const char *prefix;
   /* max_high is a workaround for UnicodeData.txt inconsistencies
@@ -87,6 +87,7 @@ static struct generated generated_ranges
   { "CJK UNIFIED IDEOGRAPH-", 0x2b740, 0x2b81d, 0, 1, 0 },
   { "CJK UNIFIED IDEOGRAPH-", 0x2b820, 0x2cea1, 0, 1, 0 },
   { "CJK UNIFIED IDEOGRAPH-", 0x2ceb0, 0x2ebe0, 0, 1, 0 },
+  { "CJK UNIFIED IDEOGRAPH-", 0x2ebf0, 0x2ee5d, 0, 1, 0 },
   { "CJK UNIFIED IDEOGRAPH-", 0x30000, 0x3134a, 0, 1, 0 },
   { "CJK UNIFIED IDEOGRAPH-", 0x31350, 0x323af, 0, 1, 0 },
   { "TANGUT IDEOGRAPH-", 0x17000, 0x187f7, 0, 2, 0 },
@@ -669,7 +670,7 @@ write_copyright (void)
    <http://www.gnu.org/licenses/>.\n\
 \n\
 \n\
-   Copyright (C) 1991-2022 Unicode, Inc.  All rights reserved.\n\
+   Copyright (C) 1991-2023 Unicode, Inc.  All rights reserved.\n\
    Distributed under the Terms of Use in\n\
    http://www.unicode.org/copyright.html.\n\
 \n\
--- libcpp/makeucnid.cc.jj	2023-03-16 10:19:01.722373601 +0100
+++ libcpp/makeucnid.cc	2023-11-13 13:42:21.728263043 +0100
@@ -467,7 +467,7 @@ write_copyright (void)
    <http://www.gnu.org/licenses/>.\n\
 \n\
 \n\
-   Copyright (C) 1991-2022 Unicode, Inc.  All rights reserved.\n\
+   Copyright (C) 1991-2023 Unicode, Inc.  All rights reserved.\n\
    Distributed under the Terms of Use in\n\
    http://www.unicode.org/copyright.html.\n\
 \n\
--- libcpp/ucnid.h.jj	2023-03-16 10:19:01.735373409 +0100
+++ libcpp/ucnid.h	2023-11-13 13:42:50.819854928 +0100
@@ -16,7 +16,7 @@
    <http://www.gnu.org/licenses/>.
 
 
-   Copyright (C) 1991-2022 Unicode, Inc.  All rights reserved.
+   Copyright (C) 1991-2023 Unicode, Inc.  All rights reserved.
    Distributed under the Terms of Use in
    http://www.unicode.org/copyright.html.
 
@@ -1379,7 +1379,8 @@ static const struct ucnrange ucnranges[]
 {   0|  0|  0|C11|  0|    0|    0|CID|NFC|  0|  0,   0, 0x1ffe },
 {   0|  0|  0|C11|  0|    0|    0|CID|NFC|NKC|  0,   0, 0x1fff },
 {   0|  0|  0|  0|  0|    0|    0|CID|  0|  0|  0,   0, 0x200a },
-{   0|  0|  0|C11|  0|    0|    0|CID|NFC|NKC|  0,   0, 0x200d },
+{   0|  0|  0|C11|  0|    0|    0|CID|NFC|NKC|  0,   0, 0x200b },
+{   0|  0|  0|C11|  0|CXX23|NXX23|CID|NFC|NKC|  0,   0, 0x200d },
 {   0|  0|  0|  0|  0|    0|    0|CID|NFC|NKC|  0,   0, 0x2029 },
 {   0|  0|  0|C11|  0|    0|    0|CID|NFC|NKC|  0,   0, 0x202e },
 {   0|  0|  0|  0|  0|    0|    0|CID|NFC|  0|  0,   0, 0x203e },
@@ -1625,7 +1626,7 @@ static const struct ucnrange ucnranges[]
 { C99|  0|CXX|C11|  0|CXX23|    0|  0|NFC|NKC|  0,   0, 0x30f4 },
 { C99|  0|CXX|C11|  0|CXX23|    0|CID|NFC|NKC|  0,   0, 0x30f6 },
 {   0|  0|CXX|C11|  0|CXX23|    0|  0|NFC|NKC|  0,   0, 0x30fa },
-{ C99|  0|CXX|C11|  0|    0|    0|CID|NFC|NKC|  0,   0, 0x30fb },
+{ C99|  0|CXX|C11|  0|CXX23|NXX23|CID|NFC|NKC|  0,   0, 0x30fb },
 { C99|  0|CXX|C11|  0|CXX23|    0|CID|NFC|NKC|  0,   0, 0x30fc },
 {   0|  0|CXX|C11|  0|CXX23|    0|CID|NFC|NKC|  0,   0, 0x30fd },
 {   0|  0|CXX|C11|  0|CXX23|    0|  0|NFC|NKC|  0,   0, 0x30fe },
@@ -1906,7 +1907,8 @@ static const struct ucnrange ucnranges[]
 {   0|  0|  0|C11|  0|CXX23|NXX23|CID|NFC|  0|  0,   0, 0xff3f },
 {   0|  0|  0|C11|  0|    0|    0|CID|NFC|  0|  0,   0, 0xff40 },
 {   0|  0|CXX|C11|  0|CXX23|    0|CID|NFC|  0|  0,   0, 0xff5a },
-{   0|  0|  0|C11|  0|    0|    0|CID|NFC|  0|  0,   0, 0xff65 },
+{   0|  0|  0|C11|  0|    0|    0|CID|NFC|  0|  0,   0, 0xff64 },
+{   0|  0|  0|C11|  0|CXX23|NXX23|CID|NFC|  0|  0,   0, 0xff65 },
 {   0|  0|CXX|C11|  0|CXX23|    0|CID|NFC|  0|  0,   0, 0xff9d },
 {   0|  0|CXX|C11|  0|CXX23|NXX23|CID|NFC|  0|  0,   0, 0xff9f },
 {   0|  0|CXX|C11|  0|CXX23|    0|CID|NFC|  0|  0,   0, 0xffbe },
@@ -2786,6 +2788,8 @@ static const struct ucnrange ucnranges[]
 {   0|  0|  0|C11|  0|CXX23|    0|CID|NFC|NKC|  0,   0, 0x2cea1 },
 {   0|  0|  0|C11|  0|    0|    0|CID|NFC|NKC|  0,   0, 0x2ceaf },
 {   0|  0|  0|C11|  0|CXX23|    0|CID|NFC|NKC|  0,   0, 0x2ebe0 },
+{   0|  0|  0|C11|  0|    0|    0|CID|NFC|NKC|  0,   0, 0x2ebef },
+{   0|  0|  0|C11|  0|CXX23|    0|CID|NFC|NKC|  0,   0, 0x2ee5d },
 {   0|  0|  0|C11|  0|    0|    0|CID|NFC|NKC|  0,   0, 0x2f7ff },
 {   0|  0|  0|C11|  0|CXX23|    0|  0|  0|  0|  0,   0, 0x2fa1d },
 {   0|  0|  0|C11|  0|    0|    0|CID|NFC|NKC|  0,   0, 0x2fffd },
--- libcpp/uname2c.h.jj	2023-03-16 10:19:01.739373350 +0100
+++ libcpp/uname2c.h	2023-11-13 13:42:43.912951822 +0100
@@ -16,7 +16,7 @@
    <http://www.gnu.org/licenses/>.
 
 
-   Copyright (C) 1991-2022 Unicode, Inc.  All rights reserved.
+   Copyright (C) 1991-2023 Unicode, Inc.  All rights reserved.
    Distributed under the Terms of Use in
    http://www.unicode.org/copyright.html.
 
@@ -52,7 +52,7 @@
    use or other dealings in these Data Files or Software without prior
    written authorization of the copyright holder.  */
 
-static const char uname2c_dict[59891] =
+static const char uname2c_dict[59919] =
 "DIVIDED BY HORIZONTAL BAR AND TOP HALF DIVIDED BY VERTICAL BARUIGHUR KIRGHIZ "
 "YEH WITH HAMZA ABOVE WITH ALEF MAKSURA LANTED EQUAL ABOVE GREATER-THAN ABOVE "
 "SLANTED EQUAL WITH EXCLAMATION MARK WITH LEFT RIGHT ARROW ABOVELANTED EQUAL A"
...
--- libcpp/generated_cpp_wcwidth.h.jj	2023-03-14 12:24:55.976722924 +0100
+++ libcpp/generated_cpp_wcwidth.h	2023-11-13 13:54:30.472042026 +0100
@@ -1,5 +1,5 @@
 /*  Generated by contrib/unicode/gen_wcwidth.py, with the help of glibc's
-    utf8_gen.py, using version 15.0.0 of the Unicode standard.  */
+    utf8_gen.py, using version 15.1.0 of the Unicode standard.  */
 
 static const cppchar_t wcwidth_range_ends[] = {
   0x2ff, 0x36f, 0x482, 0x489, 0x590, 0x5bd, 0x5be, 0x5bf,
...
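
For reference, the utf8_gen.py hunk above loosens the East_Asian_Width
match so that whitespace around the field separator is tolerated,
presumably because the 15.1 EastAsianWidth.txt pads the separator with
spaces (that is my assumption, the patch does not say so).  A minimal
Python sketch of the difference follows; the two regular expressions are
copied from the hunk, while the sample lines and the helper name are made
up for illustration and are not taken from the data files.

```python
import re

# Patterns copied from the utf8_gen.py hunk (old and new).
OLD = re.compile(r'^[^;]*;[WF]')
NEW = re.compile(r'^[^;]*;\s*[WF]\s*')

# Illustrative lines only: a compact layout and a padded layout.
COMPACT = "3000;F           # Zs         IDEOGRAPHIC SPACE"
SPACED = "3000           ; F  # Zs         IDEOGRAPHIC SPACE"
NARROW = "0020           ; Na # Zs         SPACE"


def selected(pattern, line):
    """True if the line would be collected as a wide/fullwidth entry."""
    return pattern.match(line) is not None


if __name__ == "__main__":
    for line in (COMPACT, SPACED, NARROW):
        print(selected(OLD, line), selected(NEW, line), line)
```

With the old pattern the padded line is silently skipped, so without the
glibc update the wide and fullwidth entries would go missing from the
regenerated generated_cpp_wcwidth.h if the file is laid out that way.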

	Jakub
-------------- next part --------------
A non-text attachment was scrubbed...
Name: gcc14-unicode-15.1-update-1.patch.xz
Type: application/x-xz
Size: 95116 bytes
Desc: not available
URL: <https://gcc.gnu.org/pipermail/gcc-patches/attachments/20231114/63c7122f/attachment-0001.xz>

