This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
Re: [PATCH] PR libstdc++/80624 satisfy invariant for char_traits<char16_t>::eof()
- From: "Jonathan Wakely via gcc-patches" <gcc-patches at gcc dot gnu dot org>
- To: Florian Weimer <fweimer at redhat dot com>
- Cc: libstdc++ at gcc dot gnu dot org, gcc-patches at gcc dot gnu dot org
- Date: Mon, 8 May 2017 12:01:21 +0100
- Subject: Re: [PATCH] PR libstdc++/80624 satisfy invariant for char_traits<char16_t>::eof()
- References: <20170505170552.GD5109@redhat.com> <a62fa2bc-0425-68de-35d5-f4a6fdb16364@redhat.com> <20170508102403.GG5109@redhat.com> <32f21519-6180-7fc1-3e24-3c64b3e9eaf7@redhat.com>
- Reply-to: Jonathan Wakely <jwakely at redhat dot com>
On 08/05/17 12:52 +0200, Florian Weimer via libstdc++ wrote:
> On 05/08/2017 12:24 PM, Jonathan Wakely wrote:
>> On 08/05/17 11:53 +0200, Florian Weimer via libstdc++ wrote:
>>> On 05/05/2017 07:05 PM, Jonathan Wakely wrote:
>>>> As discussed at http://stackoverflow.com/q/43769773/981959 (and kinda
>>>> hinted at by http://wg21.link/lwg1200) there's a problem with
>>>> char_traits<char16_t>::eof() because it returns int_type(-1) which is
>>>> the same value as u'\uFFFF', a valid UTF-16 code point.
>>>
>>> I think the real bug is that char_traits<char16_t>::int_type is
>>> just plain wrong. It has to be a signed integer,
>>
>> Why does it have to be signed?
>
> Hmm. Maybe it's not strictly required. int_type(-1) as a distinct
> value is likely sufficient.

Agreed.

>>> and capable of representing values in the range 0 .. 65535.
>>> char_traits<char32_t> has a similar problem. char_traits<wchar_t>
>>> should be fine on glibc because WEOF is reserved, something that
>>> is probably not the case for char32_t.
>>
>> I think there are 32-bit values which are not valid UTF-32 code
>> points, including char32_t(-1) which we use for EOF.
>
> I'm not sure if char32_t is restricted to UTF-32 codepoints (the
> standard does not say, I think). But even UCS-4 is 31-bit only, so
> maybe the problem does not arise there.

It's not really clear what the encoding of char32_t is (see
http://talesofcpp.fusionfenix.com/post-10/episode-seven-one-char-to-rule-them-all
for a good analysis) but whether it's UCS-4 or UTF-32, U+FFFFFFFF is
not in the universal character set, so we can use 0xFFFFFFFF for
char_traits<char32_t>::eof().

So I think only char_traits<char16_t> has this problem.