[PATCH] Implement P0966 std::string::reserve should not shrink

Andrew Luo andrewluotechnologies@outlook.com
Wed Jan 23 03:41:00 GMT 2019

Actually, reserve() doesn't currently work as reserve(0) previously, I missed a line there (should have shrink_to_fit() in the body...)

Also, how do you run the tests for the COW string?  I ran make check in the libstdc++-v3 folder and everything passes...



-----Original Message-----
From: libstdc++-owner@gcc.gnu.org <libstdc++-owner@gcc.gnu.org> On Behalf Of Andrew Luo
Sent: Tuesday, January 22, 2019 4:45 PM
To: Jonathan Wakely <jwakely@redhat.com>
Cc: libstdc++@gcc.gnu.org
Subject: RE: [PATCH] Implement P0966 std::string::reserve should not shrink

Thanks for the input.  I will make the suggested changes.

I didn't add a changelog entry yet (which I was planning to do - just wanted to send this out before to get some feedback on the code first).  I will also take care of the copyright assignment (should I send an email to gcc@gcc.gnu.org or are you the maintainer that is taking care of this contribution?)... Next patch I will send to both libstdc++ and gcc-patches as well.  Somehow I missed that.

One possibility to avoid changing the behavior for pre-C++2a code is to put the check in a "#if __cplusplus > 201703L" block and force reserve(size_type) to be inline (if we want to go this route, I think the helper function _M_request_capacity would have to stay, as reserve could shrink in C++17 mode or earlier).  Other than that, with my current patch, reserve() still works as previously, however reserve(0) has changed (I don't know how common it was to use reserve(0) instead of simply reserve())...

As for the split of reserve - wouldn't this kind of code break in previous -std modes:

auto f(&::std::string::reserve); // Ambiguous post C++17

As for deprecation, I will add it to the no-arg overload.  Regarding giving the deprecation notice for all dialects, technically speaking, it was only deprecated in C++2a so should someone receive that deprecation message if they are compiling with -std=c++98, for example?  I don't know what the general policies around this are for GCC/libstdc++.



-----Original Message-----
From: Jonathan Wakely <jwakely@redhat.com>
Sent: Tuesday, January 22, 2019 6:46 AM
To: Andrew Luo <andrewluotechnologies@outlook.com>
Cc: libstdc++@gcc.gnu.org
Subject: Re: [PATCH] Implement P0966 std::string::reserve should not shrink

On 22/01/19 08:34 +0000, Andrew Luo wrote:
>Please see the attached patch.  Still a work in progress, but mostly done.
>For reference:

Thanks for the patch, Andrew. We'd need a changelog entry and copyright assignment to use this, see https://gcc.gnu.org/contribute.html

All patches requesting review need to be CC'd to the gcc-patches list (and libstdc++ ones need to be sent to the libstdc++ list too).

I've added some comments inline below, and attached a proof-of-concept showing how I'd rather implement this change.

>Index: libstdc++-v3/config/abi/pre/gnu.ver
>--- libstdc++-v3/config/abi/pre/gnu.ver	(revision 268105)
>+++ libstdc++-v3/config/abi/pre/gnu.ver	(working copy)
>@@ -2238,6 +2238,14 @@
>     # _Sp_make_shared_tag::_S_eq
>     _ZNSt19_Sp_make_shared_tag5_S_eqERKSt9type_info;
>+    # basic_string::_M_request_capacity
>+ _ZNSt7__cxx1112basic_stringI[cw]St11char_traitsI[cw]ESaI[cw]EE19_M_re
>+ quest_capacityEm;

You'd need to export the Copy-On-Write versions of _M_request_capacity too. It looks like the tests haven't been run for the COW string, as I'd expect them to fail with an undefined reference to _M_request_capacity, and because char/18654.cc relies on the shrink behaviour to pass the tests. That's easy to fix though, just note the actual capacity before the shrink request, and verify it hasn't

--- a/libstdc++-v3/testsuite/21_strings/basic_string/capacity/char/18654.cc
+++ b/libstdc++-v3/testsuite/21_strings/basic_string/capacity/char/18654
+++ .cc
@@ -47,12 +47,14 @@ void test01()
       string str(i, 'x');
       str.reserve(3 * i);
+      const size_type cap = str.capacity();
+      VERIFY( cap >= 3 * i );
       str.reserve(2 * i);
-      VERIFY( str.capacity() == 2 * i );
+      VERIFY( str.capacity() == cap );
-      VERIFY( str.capacity() == i );
+      VERIFY( str.capacity() == cap );

>+    # basic_string::reserve() (C++2a)
>+    _ZNSs7reserveEv;
>+    _ZNSbIwSt11char_traitsIwESaIwEE7reserveEv;
>+ _ZNSt7__cxx1112basic_stringI[cw]St11char_traitsI[cw]ESaI[cw]EE7reserv
>+ eEv;
> } GLIBCXX_3.4.25;

There are existing patterns which already match these new reserve() overloads, so they would need to be altered to not match the new ones:

@@ -224,7 +224,10 @@ GLIBCXX_3.4 {
-    _ZNSs[67][j-z]*E[PRcjmvy]*;
+    _ZNSs6rbeginEv;
+    _ZNSs6resizeE[jmy]*;
+    _ZNSs7replaceE[jmy]*;
+    _ZNSs7reserveE[jmy];
@@ -1740,7 +1743,7 @@ GLIBCXX_3.4.21 {
-    _ZNSt7__cxx1112basic_stringI[cw]St11char_traitsI[cw]ESaI[cw]EE7reserve*;
+ _ZNSt7__cxx1112basic_stringI[cw]St11char_traitsI[cw]ESaI[cw]EE7reserve
+ E[jmy];

> # Symbols in the support library (libsupc++) have their own tag.
>Index: libstdc++-v3/include/bits/basic_string.h
>--- libstdc++-v3/include/bits/basic_string.h	(revision 268105)
>+++ libstdc++-v3/include/bits/basic_string.h	(working copy)
>@@ -420,6 +420,9 @@
>       void
>       _M_erase(size_type __pos, size_type __n);
>+      void
>+      _M_request_capacity(size_type __requested_capacity);

I think it would be better to not add this new function. Currently we have a single reserve(size_type) function which handles the growing case, and the shrinking case. If the post-P0966 world means you have to call separate functions for those cases (reserve to grow, and shrink_to_fit to shrink) then it's suboptimal to use a single implementation. No callers of reserve want the shrinking path, and no callers of shrink_to_fit want the growing path.

See the attached patch for my preferred solution. It makes the definition of reserve(size_type) smaller, and hopefully faster, and more likely to be inlined.

This has another minor advantage, which is that shrink_to_fit() can now work when -fno-exceptions is used, if the string fits in the SSO buffer (previously it always did nothing for -fno-exceptions just because it was simplest to implement as a call to reserve(0)).

>     public:
>       // Construct/copy/destroy:
>       // NB: We overload ctors in some cases instead of using default 
>@@ -1014,8 +1017,21 @@
>        *  data.
>        */
>       void
>+#if __cplusplus > 201703L
>+      reserve(size_type __res_arg);
>       reserve(size_type __res_arg = 0);
>+#if __cplusplus > 201703L
>+	  /**
>+	   *  Deprecated function, added for compatibility
>+	   */

We want to use the [[deprecated]] attribute, not a comment. And there doesn't seem to be any point in only conditionally making the split from one reserve function into two overloaded functions. Since you've changed the behaviour of reserve(0) unconditionally, why not just declare the reserve() member unconditionally too, instead of limiting it to C++17? That means we can issue a deprecation warning for all modes.

>+      void
>+      reserve()
>+      { }

It's also unfortunate to have to add new symbols to the shared library for a deprecated function that will be removed at some point.

I'm not sure what to do about that. For now my patch doesn't remove the default argument, but we could do this:

@@ -1014,7 +1032,11 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11
        *  data.
-      reserve(size_type __res_arg = 0);
+      reserve(size_type __res_arg);
+      __attribute__((__always_inline__))
+      __attribute__((__deprecated__("use shrink_to_fit() or reserve(0)")))
+      void reserve() { }

        *  Erases the string, making it empty.
@@ -3979,7 +4006,11 @@ _GLIBCXX_END_NAMESPACE_CXX11
        *  data.
-      reserve(size_type __res_arg = 0);
+      reserve(size_type __res_arg);
+      __attribute__((__always_inline__))
+      __attribute__((__deprecated__("use shrink_to_fit() or reserve(0)")))
+      void reserve() { this->reserve(0); }
        *  Erases the string, making it empty.

The changes to the gnu.ver linker script shown above are still needed here, to prevent the new overloads being exported with the wrong symbol versions. But the always_inline attributes mean we don't need to export new symbols.

For the COW string it still calls reserve(0) to un-share the string.

But there's a high level problem with this change: it makes it awkward to do shrink_to_fit in C++98 mode. There's no shrink_to_fit() member for C++98, and you're removing the reserve(0) behaviour. Usually we'd implement resolutions for LWG issues in all relevant -std modes, but in this case I'm not sure we want to do that. However, because std::string and std::wstring are explicitly instantiated in the library, it's not easy to provide different behaviour for -std=c++98 and other modes.

Maybe the answer is for the new zero-argument reserve() overload to do shrink-to-fit in C++98 mode. By marking it always_inline we can give it different behaviour in C++98 and later modes, because the explicit instantiation in the library won't be used for that function.

>@@ -951,16 +963,11 @@
>     basic_string<_CharT, _Traits, _Alloc>::
>     reserve(size_type __res)
>     {
>-      if (__res != this->capacity() || _M_rep()->_M_is_shared())
>-        {
>-	  // Make sure we don't shrink below the current size
>-	  if (__res < this->size())
>-	    __res = this->size();
>-	  const allocator_type __a = get_allocator();
>-	  _CharT* __tmp = _M_rep()->_M_clone(__a, __res - this->size());
>-	  _M_rep()->_M_dispose(__a);
>-	  _M_data(__tmp);
>-        }
>+      // P0966 reserve should not shrink
>+      if (__res <= capacity())
>+	return;

You're removing the property that calling reserve on a COW string guarantees it is no longer shared. I think we need to keep that (as my patch does).

Given that this change (either your version, or mine) alters the semantics of std::string in all -std modes, I don't think we can make this change now. It will have to wait for GCC 10, so that the changes have more time on trunk.

More information about the Libstdc++ mailing list