This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Re: [PATCH] PR libstdc++/80624 satisfy invariant for char_traits<char16_t>::eof()


On 05/08/2017 12:24 PM, Jonathan Wakely wrote:
> On 08/05/17 11:53 +0200, Florian Weimer via libstdc++ wrote:
>> On 05/05/2017 07:05 PM, Jonathan Wakely wrote:
>>> As discussed at http://stackoverflow.com/q/43769773/981959 (and kinda
>>> hinted at by http://wg21.link/lwg1200) there's a problem with
>>> char_traits<char16_t>::eof() because it returns int_type(-1) which is
>>> the same value as u'\uFFFF', a valid UTF-16 code point.
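
To make the collision concrete, here is a minimal sketch. It assumes an
implementation where uint_least16_t (the int_type used for
char_traits<char16_t>) is a 16-bit type; on an implementation with the
behaviour described above, both values print as 0xffff.

#include <cstdio>
#include <string>   // std::char_traits

int main()
{
    using traits = std::char_traits<char16_t>;

    traits::int_type e = traits::eof();                  // int_type(-1)
    traits::int_type c = traits::to_int_type(u'\uFFFF'); // a valid code point

    std::printf("eof()                  = %#x\n", (unsigned) e);
    std::printf("to_int_type(u'\\uFFFF') = %#x\n", (unsigned) c);

    // If this prints "true", eof() compares equal to the int_type of a
    // valid UTF-16 code point, breaking the usual eof()/not_eof() invariant.
    std::printf("collision: %s\n",
                traits::eq_int_type(e, c) ? "true" : "false");
}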

>> I think the real bug is that char_traits<char16_t>::int_type is just
>> plain wrong. It has to be a signed integer,

> Why does it have to be signed?

Hmm. Maybe it's not strictly required. int_type(-1) as a distinct value is likely sufficient.
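
For comparison, that distinct-value property is exactly what char_traits<char>
already has: its int_type is int, to_int_type() maps every char through
unsigned char into [0, UCHAR_MAX], and eof() returns EOF, a negative value
outside that range. A small illustrative check, assuming nothing beyond that:

#include <climits>
#include <cstdio>
#include <string>

int main()
{
    using traits = std::char_traits<char>;

    // Every character value maps into [0, UCHAR_MAX]; eof() is negative,
    // so none of them can compare equal to it.
    for (int i = 0; i <= UCHAR_MAX; ++i) {
        char ch = static_cast<char>(i);
        if (traits::eq_int_type(traits::to_int_type(ch), traits::eof())) {
            std::printf("collision at %d\n", i);
            return 1;
        }
    }
    std::printf("no character collides with eof()\n");
}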

>> and capable of representing values in the range 0 .. 65535.
>> char_traits<char32_t> has a similar problem. char_traits<wchar_t>
>> should be fine on glibc because WEOF is reserved, something that is
>> probably not the case for char32_t.

> I think there are 32-bit values which are not valid UTF-32 code
> points, including char32_t(-1) which we use for EOF.

I'm not sure if char32_t is restricted to UTF-32 code points (the standard does not say, I think). But even UCS-4 is 31-bit only, so maybe the problem does not arise there.
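
A quick sketch of that observation, assuming uint_least32_t is 32 bits so
that char32_t(-1) is 0xFFFFFFFF, far above U+10FFFF, the largest Unicode
code point:

#include <cstdio>
#include <string>

int main()
{
    using traits = std::char_traits<char32_t>;

    traits::int_type e = traits::eof();   // int_type(-1), typically 0xFFFFFFFF

    std::printf("eof()              = %#lx\n", (unsigned long) e);
    std::printf("largest code point = %#lx\n", 0x10FFFFul);

    // eof() lies well outside the range of valid code points, so no
    // valid char32_t code point can collide with it.
    std::printf("eof() is a valid code point: %s\n",
                e <= 0x10FFFF ? "yes" : "no");
}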

Thanks,
Florian

