GNAT allows wide character codes to appear in character and string literals, and also optionally in identifiers, by means of the following possible encoding schemes:
In hex coding, a wide character is represented by the following five-character sequence:
ESC a b c d
where a, b, c, d are the four hexadecimal characters (using uppercase letters) of the wide character code. For example, ESC A345 is used to represent the wide character with code 16#A345#.
This scheme is compatible with use of the full Wide_Character set.
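The five-character form described above can be sketched in Python as follows. This is only an illustration, not GNAT's own implementation: the names `hex_encode` and `hex_decode` are hypothetical, and ESC is assumed to be the ASCII escape character 16#1B#.

```python
ESC = "\x1b"  # assumption: ESC is the ASCII escape character, 16#1B#

def hex_encode(code: int) -> str:
    """Return ESC followed by the four uppercase hex digits of a 16-bit code."""
    if not 0 <= code <= 0xFFFF:
        raise ValueError("code must fit in 16 bits")
    return ESC + format(code, "04X")

def hex_decode(seq: str) -> int:
    """Decode a five-character ESC a b c d sequence back to its code."""
    if len(seq) != 5 or seq[0] != ESC:
        raise ValueError("expected ESC followed by four hex digits")
    digits = seq[1:]
    # Only uppercase hexadecimal digits are accepted, as the text requires.
    if any(ch not in "0123456789ABCDEF" for ch in digits):
        raise ValueError("digits must be uppercase hexadecimal")
    return int(digits, 16)
```

For instance, `hex_encode(0xA345)` yields ESC followed by the characters `A345`, and `hex_decode` reverses it.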
In upper-half coding, the wide character with encoding 16#abcd#, where the upper bit is on (in other words, ‘a’ is in the range 8-F), is represented as two bytes, 16#ab# and 16#cd#. The second byte cannot be a format control character, but is not required to be in the upper half. This method can also be used for shift-JIS or EUC, where the internal coding matches the external coding.
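A minimal Python sketch of this two-byte split is shown below. The function names are hypothetical, and the sketch does not check the format control character restriction on the second byte, because the text does not list which characters are excluded.

```python
def upper_half_encode(code: int) -> bytes:
    """Split a 16-bit code whose top bit is set into two bytes, ab and cd."""
    if not 0x8000 <= code <= 0xFFFF:
        raise ValueError("upper bit must be set (first hex digit in 8-F)")
    # Note: the format control character restriction is NOT checked here.
    return bytes([code >> 8, code & 0xFF])

def upper_half_decode(pair: bytes) -> int:
    """Recombine the two bytes; the first must be in the upper half."""
    if len(pair) != 2 or pair[0] < 0x80:
        raise ValueError("expected two bytes, the first in the upper half")
    return (pair[0] << 8) | pair[1]
```

With this sketch, the code 16#A345# becomes the bytes 16#A3# and 16#45#; the second byte is in the lower half, which the scheme permits.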
In Shift-JIS coding, a wide character is represented by a two-character sequence, 16#ab# and 16#cd#, with the restrictions described above for upper-half encoding. The internal character code is the corresponding JIS character according to the standard algorithm for Shift-JIS conversion. You can only use characters defined in the JIS code set table with this encoding method.
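The text does not spell out the conversion algorithm. The sketch below uses the widely published Shift-JIS to JIS X 0208 arithmetic; that GNAT's runtime follows exactly these steps is an assumption, and the name `sjis_to_jis` is hypothetical. It performs no validation of the input bytes.

```python
def sjis_to_jis(s1: int, s2: int) -> int:
    """Convert a Shift-JIS byte pair to a 16-bit JIS X 0208 code.

    Commonly published arithmetic; assumed, not confirmed, to match GNAT.
    """
    # Lead bytes fall in two ranges, each with its own row offset.
    row_offset = 0x70 if s1 < 0xA0 else 0xB0
    if s2 < 0x9F:
        # Trail bytes below 16#9F# map to the odd JIS row of the pair.
        j1 = ((s1 - row_offset) << 1) - 1
        j2 = s2 - (0x20 if s2 > 0x7F else 0x1F)
    else:
        # Trail bytes from 16#9F# upward map to the even JIS row.
        j1 = (s1 - row_offset) << 1
        j2 = s2 - 0x7E
    return (j1 << 8) | j2
```

For example, the Shift-JIS pair 16#88# 16#9F# maps to the JIS code 16#3021# under this arithmetic.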
In EUC coding, a wide character is represented by a two-character sequence, 16#ab# and 16#cd#, with both characters being in the upper half. The internal character code is the corresponding JIS character according to the EUC encoding algorithm. You can only use characters defined in the JIS code set table with this encoding method.
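For two-byte EUC sequences, the usual mapping to a JIS code clears the high bit of each byte. The sketch below assumes that mapping and ignores the EUC single-shift prefixes (16#8E# and 16#8F#); the name `euc_to_jis` is hypothetical and the code is not taken from GNAT.

```python
def euc_to_jis(e1: int, e2: int) -> int:
    """Convert a two-byte EUC pair to a 16-bit JIS code by clearing bit 7.

    Assumes the plain two-byte form; single-shift sequences are not handled.
    """
    if e1 < 0x80 or e2 < 0x80:
        raise ValueError("both bytes must be in the upper half")
    return ((e1 & 0x7F) << 8) | (e2 & 0x7F)
```

For example, the EUC pair 16#B0# 16#A1# corresponds to the JIS code 16#3021#.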
A wide character is represented using UCS Transformation Format 8 (UTF-8) as defined in Annex R of ISO 10646-1/Am.2. Depending on the character value, the representation is a one, two, or three byte sequence:
16#0000#-16#007f#: 2#0xxxxxxx#
16#0080#-16#07ff#: 2#110xxxxx# 2#10xxxxxx#
16#0800#-16#ffff#: 2#1110xxxx# 2#10xxxxxx# 2#10xxxxxx#
where the xxx bits correspond to the left-padded bits of the 16-bit character value. Note that all lower half ASCII characters are represented as ASCII bytes and all upper half characters and other wide characters are represented as sequences of upper-half characters. (The full UTF-8 scheme allows for encoding 31-bit characters as 6-byte sequences; the use of these sequences is documented in the following section on wide wide characters.)
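The three bit patterns above can be applied by hand, as in the following Python sketch. The function name `utf8_encode_16` is hypothetical; it covers only 16-bit values, matching the one-, two-, and three-byte forms described here.

```python
def utf8_encode_16(code: int) -> bytes:
    """Encode a 16-bit character value as a 1-, 2-, or 3-byte UTF-8 sequence."""
    if not 0 <= code <= 0xFFFF:
        raise ValueError("code must fit in 16 bits")
    if code <= 0x7F:
        # 0xxxxxxx: ASCII is represented as itself.
        return bytes([code])
    if code <= 0x7FF:
        # 110xxxxx 10xxxxxx
        return bytes([0xC0 | (code >> 6), 0x80 | (code & 0x3F)])
    # 1110xxxx 10xxxxxx 10xxxxxx
    return bytes([
        0xE0 | (code >> 12),
        0x80 | ((code >> 6) & 0x3F),
        0x80 | (code & 0x3F),
    ])
```

Every byte of a multi-byte result has its top bit set, which is why such sequences consist entirely of upper-half characters.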
In brackets coding, a wide character is represented by the following eight-character sequence:
[ " a b c d " ]
where a, b, c, d are the four hexadecimal characters (using uppercase letters) of the wide character code. For example, ["A345"] is used to represent the wide character with code 16#A345#. You can also (though you are not required to) use the Brackets coding for upper half characters. For example, you can represent the code 16#A3# as ["A3"].
This scheme is compatible with use of the full Wide_Character set, and is also the method used for wide character encoding in some standard ACATS (Ada Conformity Assessment Test Suite) test suite distributions.
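As an illustration of the brackets form, the Python sketch below produces and parses both the eight-character sequence and the shorter two-digit form for upper half characters. The names `brackets_encode` and `brackets_decode` and the `short` parameter are hypothetical and not part of GNAT.

```python
import re

# Matches ["hh"] or ["hhhh"] with uppercase hexadecimal digits only.
_BRACKETS = re.compile(r'\["([0-9A-F]{4}|[0-9A-F]{2})"\]')

def brackets_encode(code: int, short: bool = False) -> str:
    """Return the brackets form of a code; short=True gives the 2-digit form."""
    if short:
        if not 0x80 <= code <= 0xFF:
            raise ValueError("short form is for upper half characters only")
        return '["{:02X}"]'.format(code)
    if not 0 <= code <= 0xFFFF:
        raise ValueError("code must fit in 16 bits")
    return '["{:04X}"]'.format(code)

def brackets_decode(seq: str) -> int:
    """Parse a brackets sequence back to its character code."""
    match = _BRACKETS.fullmatch(seq)
    if match is None:
        raise ValueError("not a valid brackets sequence")
    return int(match.group(1), 16)
```

Because the result uses only ASCII characters, a source file written this way can be read regardless of the encoding in effect, which may be one reason the form suits test suite distributions.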