[libcpp] Issue a pedantic warning for UCNs outside UCS codespace

Joseph Myers joseph@codesourcery.com
Wed Sep 25 23:34:00 GMT 2019


On Tue, 24 Sep 2019, Eric Botcazou wrote:

> Hi,
> 
> the Universal Character Names accepted by the C family of compilers are mapped 
> to those of ISO/IEC 10646, which defines the Universal Character Set codespace 
> as the range 0-0x10FFFF inclusive.  The upper bound is already enforced for 
> identifiers but not for literals, so the following code is accepted in C99:
> 
> #include <stddef.h>
> 
> wchar_t a = L'\U00110000';
> 
> whereas it is rejected with an error by other compilers (Clang, MSVC).
> 
> I'm not sure whether the compiler is really required to issue a diagnostic in 
> this case.  Moreover a few tests in the testsuite manipulate UCNs outside the 
> UCS codespace.  That's why I suggest issuing a pedantic warning.

For C, I think such UCNs violate the Semantics but not the Constraints on 
UCNs, so no diagnostic is actually required in C, although it is permitted 
as a pedwarn / error.

However, while C++ doesn't have that Semantics / Constraints division, 
it's also the case that before C++2a, C++ only has a dated normative 
reference to ISO/IEC 10646-1:1993 (C++2a adds an undated reference and 
says the dated one is only for deprecated features, as well as explicitly 
making such UCNs outside the ISO 10646 code point range ill-formed).  So I 
think that for C++, this is only correct as an error / pedwarn in the 
C++2a case.
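
For illustration only, the range check under discussion could be sketched 
as below.  The helper name and macro are hypothetical and this is not 
libcpp's actual code; it just encodes the 0-0x10FFFF bound mentioned above, 
and whether a failing value gets a pedwarn or an error would still depend 
on the language and standard version as described.

```c
#include <stdbool.h>
#include <stdint.h>

/* Upper bound of the UCS codespace (0-0x10FFFF inclusive).  */
#define UCS_LIMIT 0x10FFFFu

/* Hypothetical helper, not part of libcpp: return true if the value
   of a UCN lies inside the UCS codespace.  */
static bool
ucn_in_ucs_codespace (uint32_t value)
{
  return value <= UCS_LIMIT;
}
```

With this sketch, the value from the quoted example, 0x110000, is outside 
the codespace, while 0x10FFFF is the last value inside it.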

-- 
Joseph S. Myers
joseph@codesourcery.com


