[libcpp] Issue a pedantic warning for UCNs outside UCS codespace
Joseph Myers
joseph@codesourcery.com
Wed Sep 25 23:34:00 GMT 2019
On Tue, 24 Sep 2019, Eric Botcazou wrote:
> Hi,
>
> the Universal Character Names accepted by the C family of compilers are mapped
> to those of ISO/IEC 10646, which defines the Universal Character Set codespace
> as the range 0-0x10FFFF inclusive. The upper bound is already enforced for
> identifiers but not for literals, so the following code is accepted in C99:
>
> #include <stddef.h>
>
> wchar_t a = L'\U00110000';
>
> whereas it is rejected with an error by other compilers (Clang, MSVC).
>
> I'm not sure whether the compiler is really required to issue a diagnostic in
> this case. Moreover a few tests in the testsuite manipulate UCNs outside the
> UCS codespace. That's why I suggest issuing a pedantic warning.

For C, I think such UCNs violate the Semantics but not the Constraints on
UCNs, so no diagnostic is actually required in C, although it is permitted
as a pedwarn / error.

However, while C++ doesn't have that Semantics / Constraints division,
it's also the case that before C++2a, C++ only has a dated normative
reference to ISO/IEC 10646-1:1993 (C++2a adds an undated reference and
says the dated one is only for deprecated features, as well as explicitly
making such UCNs outside the ISO 10646 code point range ill-formed). So I
think that for C++, this is only correct as an error / pedwarn in the
C++2a case.
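
To make the distinction above concrete, here is a minimal sketch in C of the
range check and of when a pedwarn / error would be consistent with that
reasoning. It is not the libcpp implementation: the names UCS_LIMIT,
lang_mode, ucn_outside_ucs and may_diagnose_ucn are hypothetical and exist
only for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Upper bound of the UCS codespace (ISO/IEC 10646): 0..0x10FFFF inclusive. */
#define UCS_LIMIT 0x10FFFFu

/* Hypothetical language modes for this sketch; not libcpp's own enum. */
enum lang_mode { MODE_C, MODE_CXX_PRE_2A, MODE_CXX_2A };

/* True if the value named by a UCN lies outside the UCS codespace. */
static bool ucn_outside_ucs(uint32_t value)
{
    return value > UCS_LIMIT;
}

/* True if a pedwarn / error for VALUE would be consistent with the
   reasoning above: permitted (though not required) in C, and correct
   in C++ only for C++2a, which makes such UCNs ill-formed.  */
static bool may_diagnose_ucn(enum lang_mode mode, uint32_t value)
{
    if (!ucn_outside_ucs(value))
        return false;
    return mode == MODE_C || mode == MODE_CXX_2A;
}
```

With these assumptions, may_diagnose_ucn (MODE_CXX_PRE_2A, 0x110000) is
false, while the same value in MODE_C or MODE_CXX_2A yields true.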
--
Joseph S. Myers
joseph@codesourcery.com