This is the mail archive of the libstdc++@gcc.gnu.org mailing list for the libstdc++ project.



Re: [PATCH] PR libstdc++/80624 satisfy invariant for char_traits<char16_t>::eof()


On 05/05/17 18:05 +0100, Jonathan Wakely wrote:
As discussed at http://stackoverflow.com/q/43769773/981959 (and kinda
hinted at by http://wg21.link/lwg1200) there's a problem with
char_traits<char16_t>::eof() because it returns int_type(-1) which is
the same value as u'\uFFFF', a valid UTF-16 code point.

i.e. because all values of int_type are also valid values of char_type
we cannot meet the requirement that:

"The member eof() shall return an implementation-defined constant
that cannot appear as a valid UTF-16 code unit."

I've reported this as a defect, suggesting that the wording above
needs to change.

One consequence is that basic_streambuf<char16_t>::sputc(u'\uFFFF')
always returns the same value, whether it succeeds or not. On success
it returns to_int_type(u'\uFFFF') and on failure it returns eof(),
which is the same value. I think that can be solved with the attached
change, which preserves the invariant in [char.traits.require] that
eof() returns:

"a value e such that X::eq_int_type(e,X::to_int_type(c)) is false for
all values c."

This can be true if we ensure that to_int_type never returns the eof()
value. http://www.unicode.org/faq/private_use.html#nonchar10 suggests
doing something like this.

It means that when writing u'\uFFFF' to a streambuf we write that
character successfully, but return u'\uFFFD' instead; and when reading
u'\uFFFF' from a streambuf we return u'\uFFFD' instead. This is
asymmetrical, as we can write that character but not read it back.  It
might be better to refuse to write u'\uFFFF' and write it as the
replacement character instead, but I think I prefer to write the right
character when possible. It also doesn't require any extra changes.

All tests pass with this. Does anybody see any problems with this
approach?



commit 8ab705e4920e933d3b0e90fd004b93d89aab8619
Author: Jonathan Wakely <jwakely@redhat.com>
Date:   Fri May 5 16:57:07 2017 +0100

   PR libstdc++/80624 satisfy invariant for char_traits<char16_t>::eof()

   	PR libstdc++/80624
   	* doc/xml/manual/status_cxx2011.xml: Document to_int_type behaviour.
   	* include/bits/char_traits.h (char_traits<char16_t>::to_int_type):
   	Transform eof value to U+FFFD.
   	* testsuite/21_strings/char_traits/requirements/char16_t/eof.cc: New.
   	* testsuite/27_io/basic_streambuf/sgetc/char16_t/80624.cc: New.
   	* testsuite/27_io/basic_streambuf/sputc/char16_t/80624.cc: New.

I've committed this now. I'll work with WG21 to resolve
https://wg21.link/lwg2959 and if a better solution is found, we can do
that instead. Until then getting some implementation and usage
experience of this solution seems valuable.


