This is the mail archive of the
mailing list for the GCC project.
Re: [PATCH] PR libstdc++/80624 satisfy invariant for char_traits<char16_t>::eof()
- From: Jonathan Wakely <jwakely at redhat dot com>
- To: libstdc++ at gcc dot gnu dot org, gcc-patches at gcc dot gnu dot org
- Date: Fri, 2 Jun 2017 19:35:52 +0100
- Subject: Re: [PATCH] PR libstdc++/80624 satisfy invariant for char_traits<char16_t>::eof()
- References: <20170505170552.GD5109@redhat.com>
On 05/05/17 18:05 +0100, Jonathan Wakely wrote:
As discussed at http://stackoverflow.com/q/43769773/981959 (and kinda
hinted at by http://wg21.link/lwg1200) there's a problem with
char_traits<char16_t>::eof() because it returns int_type(-1) which is
the same value as u'\uFFFF', a valid UTF-16 code point.
That is, because every value of int_type is also a valid value of char_type,
we cannot meet the requirement that:
"The member eof() shall return an implementation-defined constant
that cannot appear as a valid UTF-16 code unit."
I've reported this as a defect, suggesting that the wording above
needs to change.
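
To make the collision concrete, here is a minimal sketch. The type old_traits16 and the function eof_collides() are hypothetical stand-ins written for this illustration, not the library's code; they model only the three members involved, and the sketch assumes uint_least16_t is exactly 16 bits wide.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the pre-patch char_traits<char16_t>.
// Only the members that matter for the collision are modelled.
struct old_traits16
{
  typedef char16_t            char_type;
  typedef std::uint_least16_t int_type;

  static constexpr int_type eof() noexcept
  { return static_cast<int_type>(-1); }

  static constexpr int_type to_int_type(char_type c) noexcept
  { return int_type(c); }

  static constexpr bool eq_int_type(int_type a, int_type b) noexcept
  { return a == b; }
};

// True when eof() cannot be told apart from a converted U+FFFF.
// Assumption: uint_least16_t is exactly 16 bits wide on this target.
constexpr bool eof_collides()
{
  return old_traits16::eq_int_type(old_traits16::eof(),
                                   old_traits16::to_int_type(u'\uFFFF'));
}

static_assert(eof_collides(), "eof() collides with U+FFFF");
```

With an int_type no wider than char_type there is no spare value left over for eof(), so any choice of constant would collide with some code unit.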
One consequence is that basic_streambuf<char16_t>::sputc(u'\uFFFF')
always returns the same value, whether it succeeds or not. On success
it returns to_int_type(u'\uFFFF') and on failure it returns eof(),
which is the same value. I think that can be solved with the attached
change, which preserves the invariant in [char.traits.require] that
"a value e such that X::eq_int_type(e,X::to_int_type(c)) is false for
all values c."
This can be true if we ensure that to_int_type never returns the eof()
value. http://www.unicode.org/faq/private_use.html#nonchar10 suggests
doing something like this.
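
The idea can be sketched roughly as follows. The names patched_traits16 and invariant_holds() are hypothetical and exist only for this illustration; the sketch again assumes a 16-bit uint_least16_t and is not meant to reproduce the attached patch line for line.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of the patched traits; only three members are shown.
struct patched_traits16
{
  typedef char16_t            char_type;
  typedef std::uint_least16_t int_type;

  static constexpr int_type eof() noexcept
  { return static_cast<int_type>(-1); }

  // Never hand back the eof() value: U+FFFF maps to U+FFFD instead.
  static constexpr int_type to_int_type(char_type c) noexcept
  { return int_type(c) == eof() ? int_type(0xFFFD) : int_type(c); }

  static constexpr bool eq_int_type(int_type a, int_type b) noexcept
  { return a == b; }
};

// Checks the [char.traits.require] invariant for every char16_t value.
inline bool invariant_holds()
{
  for (unsigned long v = 0; v <= 0xFFFFul; ++v)
    {
      const char16_t c = static_cast<char16_t>(v);
      if (patched_traits16::eq_int_type(patched_traits16::eof(),
                                        patched_traits16::to_int_type(c)))
        return false;  // some character still converts to eof()
    }
  return true;
}

static_assert(patched_traits16::to_int_type(u'\uFFFF') == 0xFFFD,
              "U+FFFF is reported as the replacement character");
static_assert(patched_traits16::to_int_type(u'A') == 0x41,
              "other characters convert unchanged");
```

The cost of this choice is that to_int_type is no longer injective: U+FFFF and U+FFFD both convert to 0xFFFD.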
It means that when writing u'\uFFFF' to a streambuf we write that
character successfully, but return u'\uFFFD' instead; and when reading
u'\uFFFF' from a streambuf we return u'\uFFFD' instead. This is
asymmetrical, as we can write that character but not read it back. It
might be better to refuse to write u'\uFFFF' and write it as the
replacement character instead, but I think I prefer to write the right
character when possible. That choice also doesn't require any extra changes.
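
The asymmetry can be observed with a small fixed-buffer streambuf. Everything named here (sketch_traits16, array_buf16, roundtrip_result, write_then_read_ffff) is hypothetical. The traits class derives from std::char_traits<char16_t> and hides to_int_type, so the outcome should not depend on whether the standard library in use already has this change. The sketch relies on sputc and sgetc returning traits::to_int_type of the character whenever a put or get position is available.

```cpp
#include <cassert>
#include <cstdint>
#include <streambuf>
#include <string>

// Hypothetical traits: standard behaviour except for to_int_type.
struct sketch_traits16 : std::char_traits<char16_t>
{
  static constexpr int_type to_int_type(const char_type& c) noexcept
  { return c == char_type(0xFFFF) ? int_type(0xFFFD) : int_type(c); }
};

// Hypothetical streambuf over a small array used for reading and writing.
struct array_buf16 : std::basic_streambuf<char16_t, sketch_traits16>
{
  char16_t data[4] = {};

  array_buf16()
  {
    setp(data, data + 4);        // put area covers the whole array
    setg(data, data, data + 4);  // get area covers the same storage
  }
};

struct roundtrip_result
{
  sketch_traits16::int_type put;     // value returned by sputc
  char16_t                  stored;  // character actually in the buffer
  sketch_traits16::int_type got;     // value returned by sgetc
};

// Writes U+FFFF, inspects the buffer, then reads the character back.
inline roundtrip_result write_then_read_ffff()
{
  array_buf16 buf;
  const sketch_traits16::int_type put = buf.sputc(u'\uFFFF');
  const char16_t stored = buf.data[0];
  const sketch_traits16::int_type got = buf.sgetc();
  return {put, stored, got};
}
```

In this sketch the buffer holds U+FFFF after the write, while both sputc and sgetc report 0xFFFD. Note that the put area is never exhausted here, so the overflow() path is not exercised.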
All tests pass with this. Does anybody see any problems with this approach?
Author: Jonathan Wakely <email@example.com>
Date: Fri May 5 16:57:07 2017 +0100
PR libstdc++/80624 satisfy invariant for char_traits<char16_t>::eof()
* doc/xml/manual/status_cxx2011.xml: Document to_int_type behaviour.
* include/bits/char_traits.h (char_traits<char16_t>::to_int_type):
Transform eof value to U+FFFD.
* testsuite/21_strings/char_traits/requirements/char16_t/eof.cc: New.
* testsuite/27_io/basic_streambuf/sgetc/char16_t/80624.cc: New.
* testsuite/27_io/basic_streambuf/sputc/char16_t/80624.cc: New.
I've committed this now. I'll work with WG21 to resolve
https://wg21.link/lwg2959, and if a better solution is found we can do
that instead. Until then, getting some implementation and usage
experience with this solution seems valuable.