This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

[PATCH] PR libstdc++/80624 satisfy invariant for char_traits<char16_t>::eof()


As discussed at http://stackoverflow.com/q/43769773/981959 (and kinda
hinted at by http://wg21.link/lwg1200) there's a problem with
char_traits<char16_t>::eof() because it returns int_type(-1) which is
the same value as u'\uFFFF', a valid UTF-16 code point.

i.e. because all values of int_type are also valid values of char_type
we cannot meet the requirement that:

"The member eof() shall return an implementation-defined constant
that cannot appear as a valid UTF-16 code unit."

I've reported this as a defect, suggesting that the wording above
needs to change.

One consequence is that basic_streambuf<char16_t>::sputc(u'\uFFFF')
always returns the same value, whether it succeeds or not. On success
it returns to_int_type(u'\uFFFF') and on failure it returns eof(),
which is the same value. I think that can be solved with the attached
change, which preserves the invariant in [char.traits.require] that
eof() returns:

"a value e such that X::eq_int_type(e,X::to_int_type(c)) is false for
all values c."

This can be true if we ensure that to_int_type never returns the eof()
value. http://www.unicode.org/faq/private_use.html#nonchar10 suggests
doing something like this.

It means that when writing u'\uFFFF' to a streambuf we write that
character successfully, but return u'\uFFFD' instead; and when reading
u'\uFFFF' from a streambuf we return u'\uFFFD' instead. This is
asymmetrical, as we can write that character but not read it back.  It
might be better to refuse to write u'\uFFFF' and write it as the
replacement character instead, but I think I prefer to write the right
character when possible. It also doesn't require any extra changes.

All tests pass with this, does anybody see any problems with this
approach?


commit 8ab705e4920e933d3b0e90fd004b93d89aab8619
Author: Jonathan Wakely <jwakely@redhat.com>
Date:   Fri May 5 16:57:07 2017 +0100

    PR libstdc++/80624 satisfy invariant for char_traits<char16_t>::eof()
    
    	PR libstdc++/80624
    	* doc/xml/manual/status_cxx2011.xml: Document to_int_type behaviour.
    	* include/bits/char_traits.h (char_traits<char16_t>::to_int_type):
    	Transform eof value to U+FFFD.
    	* testsuite/21_strings/char_traits/requirements/char16_t/eof.cc: New.
    	* testsuite/27_io/basic_streambuf/sgetc/char16_t/80624.cc: New.
    	* testsuite/27_io/basic_streambuf/sputc/char16_t/80624.cc: New.

diff --git a/libstdc++-v3/doc/xml/manual/status_cxx2011.xml b/libstdc++-v3/doc/xml/manual/status_cxx2011.xml
index 705f2ee..0fa4bc0 100644
--- a/libstdc++-v3/doc/xml/manual/status_cxx2011.xml
+++ b/libstdc++-v3/doc/xml/manual/status_cxx2011.xml
@@ -2630,6 +2630,10 @@ particular release.
       <classname>u32streampos</classname> are both synonyms for
       <classname>fpos&lt;mbstate_t&gt;</classname>.
       The function <function>eof</function> returns <code>int_type(-1)</code>.
+      <function>char_traits&lt;char16_t&gt;::to_int_type</function> will
+      transform the "noncharacter" U+FFFF to U+FFFD (REPLACEMENT CHARACTER).
+      This is done to ensure that <function>to_int_type</function> never
+      returns the same value as <function>eof</function>, which is U+FFFF.
    </para>
 
    <para>
diff --git a/libstdc++-v3/include/bits/char_traits.h b/libstdc++-v3/include/bits/char_traits.h
index 75db5b8..f19120b 100644
--- a/libstdc++-v3/include/bits/char_traits.h
+++ b/libstdc++-v3/include/bits/char_traits.h
@@ -507,7 +507,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
       static constexpr int_type
       to_int_type(const char_type& __c) noexcept
-      { return int_type(__c); }
+      { return __c == eof() ? int_type(0xfffd) : int_type(__c); }
 
       static constexpr bool
       eq_int_type(const int_type& __c1, const int_type& __c2) noexcept
diff --git a/libstdc++-v3/testsuite/21_strings/char_traits/requirements/char16_t/eof.cc b/libstdc++-v3/testsuite/21_strings/char_traits/requirements/char16_t/eof.cc
new file mode 100644
index 0000000..05def7f
--- /dev/null
+++ b/libstdc++-v3/testsuite/21_strings/char_traits/requirements/char16_t/eof.cc
@@ -0,0 +1,31 @@
+// Copyright (C) 2017 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// You should have received a copy of the GNU General Public License along
+// with this library; see the file COPYING3.  If not see
+// <http://www.gnu.org/licenses/>.
+
+// { dg-do compile { target c++11 } }
+
+#include <string>
+
+
+constexpr bool not_equal_to_eof(char16_t c)
+{
+  using T = std::char_traits<char16_t>;
+  return T::eq_int_type(T::eof(), T::to_int_type(c)) == false;
+}
+
+// Last two code points of the BMP are noncharacters:
+static_assert(not_equal_to_eof(u'\uFFFE'), "U+FFFE compares unequal to eof");
+static_assert(not_equal_to_eof(u'\uFFFF'), "U+FFFF compares unequal to eof");
diff --git a/libstdc++-v3/testsuite/27_io/basic_streambuf/sgetc/char16_t/80624.cc b/libstdc++-v3/testsuite/27_io/basic_streambuf/sgetc/char16_t/80624.cc
new file mode 100644
index 0000000..c08c227
--- /dev/null
+++ b/libstdc++-v3/testsuite/27_io/basic_streambuf/sgetc/char16_t/80624.cc
@@ -0,0 +1,56 @@
+// Copyright (C) 2017 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// You should have received a copy of the GNU General Public License along
+// with this library; see the file COPYING3.  If not see
+// <http://www.gnu.org/licenses/>.
+
+// { dg-do run { target c++11 } }
+
+#include <streambuf>
+#include <testsuite_hooks.h>
+
+struct streambuf : std::basic_streambuf<char16_t>
+{
+  basic_streambuf* setbuf(char_type* s, std::streamsize n) override
+  {
+    setp(s, s + n);
+    setg(s, s, s + n);
+    return this;
+  }
+};
+
+void
+test01()
+{
+  using traits = streambuf::traits_type;
+
+  char16_t buf[2] = { streambuf::char_type(-1), streambuf::char_type(-2) };
+  streambuf sb;
+  sb.pubsetbuf(buf, 2);
+
+  streambuf::int_type res;
+
+  res = sb.sbumpc();
+  VERIFY( traits::eq_int_type(res, traits::eof()) == false );
+  res = sb.sbumpc();
+  VERIFY( traits::eq_int_type(res, traits::eof()) == false );
+  res = sb.sbumpc();
+  VERIFY( traits::eq_int_type(res, traits::eof()) == true );
+}
+
+int
+main()
+{
+  test01();
+}
diff --git a/libstdc++-v3/testsuite/27_io/basic_streambuf/sputc/char16_t/80624.cc b/libstdc++-v3/testsuite/27_io/basic_streambuf/sputc/char16_t/80624.cc
new file mode 100644
index 0000000..a2a1917
--- /dev/null
+++ b/libstdc++-v3/testsuite/27_io/basic_streambuf/sputc/char16_t/80624.cc
@@ -0,0 +1,59 @@
+// Copyright (C) 2017 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// You should have received a copy of the GNU General Public License along
+// with this library; see the file COPYING3.  If not see
+// <http://www.gnu.org/licenses/>.
+
+// { dg-do run { target c++11 } }
+
+#include <streambuf>
+#include <testsuite_hooks.h>
+
+struct streambuf : std::basic_streambuf<char16_t>
+{
+  basic_streambuf* setbuf(char_type* s, std::streamsize n) override
+  {
+    setp(s, s + n);
+    setg(s, s, s + n);
+    return this;
+  }
+};
+
+void
+test01()
+{
+  using traits = streambuf::traits_type;
+
+  char16_t buf[2] = { };
+  streambuf sb;
+  sb.pubsetbuf(buf, 2);
+
+  streambuf::int_type res;
+
+  res = sb.sputc(streambuf::char_type(-1));
+  VERIFY( traits::eq_int_type(res, traits::eof()) == false );
+  res = sb.sputc(streambuf::char_type(-2));
+  VERIFY( traits::eq_int_type(res, traits::eof()) == false );
+  res = sb.sputc(streambuf::char_type(1));
+  VERIFY( traits::eq_int_type(res, traits::eof()) == true );
+
+  VERIFY( buf[0] == streambuf::char_type(-1) );
+  VERIFY( buf[1] == streambuf::char_type(-2) );
+}
+
+int
+main()
+{
+  test01();
+}

Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]