Normally GNAT recognizes the Latin-1 character set in source program
identifiers, as described in the Ada Reference Manual.
The -gnatic switch causes GNAT to recognize alternate character sets
in identifiers. c is a single character indicating the character set,
as follows:
|‘1’||ISO 8859-1 (Latin-1) identifiers|
|‘2’||ISO 8859-2 (Latin-2) letters allowed in identifiers|
|‘3’||ISO 8859-3 (Latin-3) letters allowed in identifiers|
|‘4’||ISO 8859-4 (Latin-4) letters allowed in identifiers|
|‘5’||ISO 8859-5 (Cyrillic) letters allowed in identifiers|
|‘9’||ISO 8859-15 (Latin-9) letters allowed in identifiers|
|‘p’||IBM PC letters (code page 437) allowed in identifiers|
|‘8’||IBM PC letters (code page 850) allowed in identifiers|
|‘f’||Full upper-half codes allowed in identifiers|
|‘n’||No upper-half codes allowed in identifiers|
|‘w’||Wide-character codes (that is, codes greater than 255) allowed in identifiers|
See Foreign Language Representation for full details on the implementation of these character sets.
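To see why the choice of identifier character set matters, the following Python sketch decodes the same upper-half byte under several of the sets listed above. The mapping from selector character to Python codec name and the helper `decode_byte` are assumptions made for illustration; they are not part of GNAT, and Python's codec tables are only a stand-in for GNAT's own tables. The selectors ‘f’, ‘n’ and ‘w’ do not name a single 8-bit code page, so they are left out.

```python
# Hypothetical mapping from the selector character to a Python codec name.
# This is an illustration only; GNAT does not use Python codecs.
SELECTOR_TO_CODEC = {
    "1": "latin_1",     # ISO 8859-1
    "2": "iso8859_2",   # ISO 8859-2
    "3": "iso8859_3",   # ISO 8859-3
    "4": "iso8859_4",   # ISO 8859-4
    "5": "iso8859_5",   # ISO 8859-5 (Cyrillic)
    "9": "iso8859_15",  # ISO 8859-15
    "p": "cp437",       # IBM PC code page 437
    "8": "cp850",       # IBM PC code page 850
}


def decode_byte(selector: str, byte: int) -> str:
    """Return the character that one source byte denotes under a selector."""
    return bytes([byte]).decode(SELECTOR_TO_CODEC[selector])


if __name__ == "__main__":
    # The same byte is a different character in each set.
    for selector in ("1", "2", "5"):
        ch = decode_byte(selector, 0xE6)
        print(selector, f"U+{ord(ch):04X}", ch)
```

A source file whose identifiers use upper-half letters is therefore only meaningful together with the character set it was written in.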
The -gnatWe switch specifies the method of encoding for wide characters.
e is one of the following:
|‘h’||Hex encoding (brackets encoding also recognized)|
|‘u’||Upper half encoding (brackets encoding also recognized)|
|‘s’||Shift/JIS encoding (brackets encoding also recognized)|
|‘e’||EUC encoding (brackets encoding also recognized)|
|‘8’||UTF-8 encoding (brackets encoding also recognized)|
|‘b’||Brackets encoding only (default value)|
For full details on these encoding methods, see Wide_Character Encodings.
Note that brackets encoding is always accepted, even if one of the other
options is specified, so for example
-gnatW8 specifies that both
brackets and UTF-8 encodings will be recognized. The units that are
with’ed directly or indirectly will be scanned using the specified
representation scheme, so if one of the non-brackets schemes is
used, it must be used consistently throughout the program. However,
since brackets encoding is always recognized, it may be conveniently
used in standard libraries, allowing these libraries to be used with
any of the available coding schemes.
Note that brackets encoding only applies to program text. Within comments, brackets are considered to be normal graphic characters, and bracket sequences are never recognized as wide characters.
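The distinction between program text and comments can be sketched in Python as follows. The sketch assumes the four-hex-digit form of brackets notation, written `["xxxx"]`; the exact notation is defined in Wide_Character Encodings, not here, so treat that form as an assumption. The function `decode_brackets_line` is hypothetical, and it finds the comment naively by looking for the first `--`, ignoring the case of `--` inside a string literal.

```python
import re

# Assumed form: [ " four hex digits " ], for example ["03C0"].
_BRACKETS = re.compile(r'\["([0-9A-Fa-f]{4})"\]')


def decode_brackets_line(line: str) -> str:
    """Decode bracket sequences in program text, leaving comments untouched."""
    # Naive comment detection: everything from the first "--" onward.
    code, sep, comment = line.partition("--")
    decoded = _BRACKETS.sub(lambda m: chr(int(m.group(1), 16)), code)
    return decoded + sep + comment


if __name__ == "__main__":
    src = 'Pi : Wide_Character := \'["03C0"]\'; -- ["03C0"] is left alone'
    print(decode_brackets_line(src))
```

Only the sequence before the comment is turned into a wide character; the one inside the comment stays as ordinary graphic characters.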
If no -gnatW? parameter is present, then the default
representation is normally Brackets encoding only. However, if the
first three characters of the file are 16#EF# 16#BB# 16#BF# (the standard
byte order mark or BOM for UTF-8), then these three characters are
skipped and the default representation for the file is set to UTF-8.
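The BOM rule amounts to a check on the first three bytes of the file. A minimal Python sketch, in which the function name `default_wide_encoding` and the labels `"utf-8"` and `"brackets"` are illustrative choices rather than anything GNAT exposes:

```python
import codecs
from typing import Tuple


def default_wide_encoding(source: bytes) -> Tuple[str, bytes]:
    """Pick the default representation when no -gnatW? switch is given.

    Returns the chosen representation and the source with any BOM skipped.
    """
    # codecs.BOM_UTF8 is the byte sequence EF BB BF.
    if source.startswith(codecs.BOM_UTF8):
        return "utf-8", source[len(codecs.BOM_UTF8):]
    return "brackets", source


if __name__ == "__main__":
    print(default_wide_encoding(b"\xef\xbb\xbfprocedure Main is"))
    print(default_wide_encoding(b"procedure Main is"))
```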
Note that the wide character representation that is specified (explicitly or by default) for the main program also acts as the default encoding used for Wide_Text_IO files if not specifically overridden by a WCEM form parameter.
When no -gnatW? is specified, characters (other than wide
characters represented using brackets notation) are treated as 8-bit
Latin-1 codes. The codes recognized are the Latin-1 graphic characters
and the ASCII format effectors (CR, LF, HT, VT). Other lower half control
characters in the range 16#00#..16#1F# are not accepted in program text
or in comments. Upper half control characters (16#80#..16#9F#) are rejected
in program text, but allowed and ignored in comments. Note in particular
that the Next Line (NEL) character, whose encoding is 16#85#, is not recognized
as an end of line in this default mode. If your source program contains
instances of the NEL character used as a line terminator,
you must use UTF-8 encoding for the whole
source program. In default mode, all lines must be ended by a standard
end of line sequence (CR, CR/LF, or LF).
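The default-mode rules can be restated as a small byte classifier. This Python sketch follows the ranges given above; the function name `accepted_in_default_mode` is hypothetical, and the treatment of DEL (16#7F#), which the text does not mention, is an assumption.

```python
# CR, LF, HT, VT: the ASCII format effectors named in the text.
FORMAT_EFFECTORS = {0x0D, 0x0A, 0x09, 0x0B}


def accepted_in_default_mode(byte: int, in_comment: bool) -> bool:
    """Say whether one source byte is accepted when no -gnatW? is given."""
    if byte in FORMAT_EFFECTORS:
        return True
    if byte <= 0x1F:
        # Other lower half controls: rejected everywhere.
        return False
    if 0x80 <= byte <= 0x9F:
        # Upper half controls (including NEL, 16#85#): comments only.
        return in_comment
    if byte == 0x7F:
        # Assumption: DEL is not a graphic character, so reject it.
        return False
    # Remaining codes are Latin-1 graphic characters.
    return True


if __name__ == "__main__":
    for b in (0x41, 0x09, 0x00, 0x85, 0xE9):
        print(hex(b),
              accepted_in_default_mode(b, in_comment=False),
              accepted_in_default_mode(b, in_comment=True))
```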
Note that the convention of simply accepting all upper half characters in comments means that programs that use standard ASCII for program text but UTF-8 encoding for comments are accepted in default mode, provided that the comments are ended by an appropriate (CR, CR/LF, or LF) line terminator. This is a common mode for many programs with foreign language comments.
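The reason this works is that UTF-8 encodes every non-ASCII character using only bytes in the upper half (16#80# and above), so such a comment never introduces a lower half control character. The Python sketch below checks that property for a comment's text; `comment_bytes_accepted` is a hypothetical helper, and it inspects the text of the comment without its line terminator.

```python
def comment_bytes_accepted(comment_text: str) -> bool:
    """Check a UTF-8 encoded comment against the default-mode comment rules."""
    for b in comment_text.encode("utf-8"):
        if b >= 0x80:
            continue  # upper half: accepted in comments
        if 0x20 <= b <= 0x7E or b in (0x09, 0x0B):
            continue  # ASCII graphic character, HT or VT
        return False  # other lower half control (or DEL, by assumption)
    return True


if __name__ == "__main__":
    print(comment_bytes_accepted("-- Größe in Metern"))
    print(comment_bytes_accepted("-- bad \x00 byte"))
```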