Bug 66414 - string::find ten times slower than strstr
Summary: string::find ten times slower than strstr
Status: NEW
Alias: None
Product: gcc
Classification: Unclassified
Component: libstdc++ (show other bugs)
Version: 4.9.0
: P3 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2015-06-04 07:59 UTC by Ondrej Bilka
Modified: 2017-01-09 13:06 UTC (History)
4 users (show)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2016-12-05 00:00:00


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Ondrej Bilka 2015-06-04 07:59:40 UTC
Hi, as I seen bug with string::== being slower than using strcmp I decided to check other functions for regressions. Here string::find doesn't simply call optimized strstr and as result its ten times slower in following benchmark on my sandy bridge.


#include <cstring>
#include <cstdlib>
#include <string>
using namespace std;

int
main()
{
  int i;
  char s[10000];
  for (i = 0; i < 10000; i++)
    s[i] = ((unsigned) rand() % 128) + 1;
  s[9999] = 0;
  int sum = 0;
  std::string foo = s;
  std::string bar;
  char *needle = strdup("needle");

  for (i = 0; i < 100000; i++)
    {
      needle[0] = ((unsigned) rand() % 128) + 1;
#ifdef STRSTR
      sum += (long) strstr(s, needle);
#else
      sum += foo.find(needle);
#endif
    }
  return sum;
}
Comment 1 ycn65201 2016-11-30 10:35:43 UTC
Hello

Also found this bug, along with other article in which it is explained why it is slow:

http://0x80.pl/notesen/2016-10-08-slow-std-string-find.html


Please fix it
Comment 2 Jonathan Wakely 2016-11-30 12:35:44 UTC
Ondrej, which version is this a regression against?
Comment 3 Jonathan Wakely 2016-12-05 10:41:16 UTC
I think we can simply hoist the code out into a helper function and then overload that for the common char_traits<char> and char_traits<wchar_t> cases, where we know the char_traits doesn't do anything funky and we can forward to optimized libc calls. The generic version can't do that.
Comment 4 Jonathan Wakely 2016-12-05 13:01:19 UTC
Although we can only do that for the overload that assumes null-terminated input, not all cases that basic_string::find supports.

We can use the GNU memmem extension for the general case, but that's not standard.
Comment 5 Jonathan Wakely 2016-12-05 15:17:06 UTC
This patch would implement the requested change:

--- a/libstdc++-v3/include/bits/basic_string.h
+++ b/libstdc++-v3/include/bits/basic_string.h
@@ -2291,9 +2291,28 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11
        *  it begins.  If not found, returns npos.
       */
       size_type
-      find(const _CharT* __s, size_type __pos = 0) const
+      find(const _CharT* __s, size_type __pos = 0) const _GLIBCXX_NOEXCEPT
       {
        __glibcxx_requires_string(__s);
+       if (__are_same<_Traits, char_traits<_CharT>>::__value)
+         {
+           if (__are_same<_CharT, char>::__value)
+             {
+               const char* __data = (const char*)data();
+               const char* __p
+                 = __builtin_strstr(__data + __pos, (const char*)__s);
+               return __p ? (__p - __data) : npos;
+             }
+#if _GLIBCXX_USE_WCHAR_T
+           else if (__are_same<_CharT, wchar_t>::__value)
+             {
+               const wchar_t* __data = (const wchar_t*)data();
+               const wchar_t* __p
+                 = __builtin_wcsstr(__data + __pos, (const wchar_t*)__s);
+               return __p ? (__p - __data) : npos;
+             }
+#endif
+         }
        return this->find(__s, __pos, traits_type::length(__s));
       }
 

But it causes a huge regression in this example, where currently it is hundreds of times faster than strstr:

#include <string>
#include <cstring>
#include <chrono>
#include <iostream>
#include <cassert>

int main()
{
  std::string haystack(5000, 'a');
  std::string needle = haystack;
  needle.back() = 'b';
  std::size_t pos[1000];

  auto start = std::chrono::high_resolution_clock::now();
  for (auto& p : pos)
    p = haystack.find(needle.c_str());
  auto stop = std::chrono::high_resolution_clock::now();
  std::cout << (stop - start).count() << std::endl;
  for (auto p : pos)
    assert(p == std::string::npos);

  start = std::chrono::high_resolution_clock::now();
  for (auto& p : pos)
  {
    auto ptr = std::strstr(haystack.c_str(), needle.c_str());
    p = ptr ? ptr - haystack.c_str() : std::string::npos;
  }
  stop = std::chrono::high_resolution_clock::now();
  std::cout << (stop - start).count() << std::endl;
  for (auto p : pos)
    assert(p == std::string::npos);
}

This is because the current code uses char_traits::length() (i.e. strlen) first, so if the needle is longer than the haystack we don't bother looking for it.

I don't think it's clear that using strstr is an improvement.

The C++17 library provides Boyer-Moore and Boyer-Moore-Horspool searchers, so that already offers you more control than basic_string::find or strstr.
Comment 6 AK 2016-12-07 17:50:46 UTC
I have posted a patch up for review for string::find which might help as well.

https://gcc.gnu.org/ml/libstdc++/2016-12/msg00051.html

Please give feedback for improvement.
-Aditya
Comment 7 Jonathan Wakely 2017-01-09 13:06:30 UTC
Author: redi
Date: Mon Jan  9 13:05:58 2017
New Revision: 244225

URL: https://gcc.gnu.org/viewcvs?rev=244225&root=gcc&view=rev
Log:
PR66414 optimize std::string::find

2017-01-09  Jonathan Wakely  <jwakely@redhat.com>
	    Aditya Kumar  <hiraditya@msn.com>

	PR libstdc++/66414
	* include/bits/basic_string.tcc
	(basic_string::find(const CharT*, size_type, size_type)): Optimize.

Modified:
    trunk/libstdc++-v3/ChangeLog
    trunk/libstdc++-v3/include/bits/basic_string.tcc