Bug 59048 - operator== between std::string and const char* slower than strcmp
Summary: operator== between std::string and const char* slower than strcmp
Status: NEW
Alias: None
Product: gcc
Classification: Unclassified
Component: libstdc++ (show other bugs)
Version: 4.9.0
: P3 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords: missed-optimization
Depends on:
Blocks:
 
Reported: 2013-11-08 10:59 UTC by Luca Stoppa
Modified: 2019-07-28 07:20 UTC (History)
5 users (show)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2017-05-19 00:00:00


Attachments
testcase (518 bytes, text/plain)
2013-11-08 11:34 UTC, Luca Stoppa
Details
not inlined memcmp used. (599 bytes, text/plain)
2013-11-08 13:47 UTC, Luca Stoppa
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Luca Stoppa 2013-11-08 10:59:48 UTC
Template functions
bool operator==( const char*, const std::string& ) and
bool operator==( const std::string&, const char* )
creates unecessary temporary std::string object. I'm using mainly GCC 4.4.7, but I have tested GCC 4.8.3 and the behavior is exactly the same.

Look at this simple example:
1) here we call operator==(std::string&, const char*):
size_t f(const std::string &str)
{
    size_t result = 0;
    size_t len = str.size();
    for (size_t i=0; i<len; ++i)
        if (str == "ST")
            result += i;
    return result;
}

2) here we call operator==(const char*, const std::string&)
size_t h(const std::string &str)
{
    size_t result = 0;
    size_t len = str.size();
    for (size_t i=0; i<len; ++i)
        if ("ST" == str)
            result += i;
    return result;
}

3) here a basic const char* version
size_t ii(const char *str)
{
    size_t result = 0;
    size_t len = strlen(str);
    for (size_t i=0; i<len; ++i)
        if (0 == strcmp(str,"ST"))
            result += i;
    return result;
}

4) here a mixed version: std::string compared with strcmp().
size_t g(const std::string &str)
{
    size_t result = 0;
    size_t len = str.size();
    for (size_t i=0; i<len; ++i)
        if (0 == strcmp(str.c_str(),"ST"))
            result += i;
    return result;
}

This is the main I used to test these functions:
int main(int argc, char **argv )
{
    long how_many_times=atol( argv[1] );
    std::string events[]={ "CASH", "EQ", "FI", "FT", "FWD", "OP", "ST" };
    size_t result=0;
    for (size_t i=0; i<how_many_times; ++i)
        for (size_t j=0; j<elements(events); ++j)
            result += f(events[j]);

    std::cout <<result <<std::endl;
    return 0;
}

Few things to notice: running time ./a.out
f() will produce:
bash-4.1$ time ./a.out 10000000
10000000
real    0m4.222s

g() will produce:
bash-4.1$ time ./a.out 10000000
10000000
real    0m1.036s

h() will produce:
bash-4.1$ time ./a.out 10000000
10000000
real    0m4.223s

ii() (if we change in main() std::string events[]={...} into const char* events[]={...}) will produce:
bash-4.1$ time ./a.out 10000000
10000000
real    0m1.266s
if I remove the call to strlen() will be basically 0seconds.

Which is the problem?
The problem here is that: why f()/h() are taking basically 4times more then g()? The only different is how we compare strings. It seems like that f() and h() are creating a temporary std::string object. Shouldn't we have the same performance? It seems like the compare() method of the char_trait<char> could be better implemented.

As a final notice: I have compiled all examples with g++ -O3 (Linux 64 and MacOS Snow Lion).
Comment 1 Jonathan Wakely 2013-11-08 11:12:10 UTC
4.4.7 has been unmaintained for years.

Please provide a proper testcase, not several incomplete snippets that force people to recreate your tests.
Comment 2 Jonathan Wakely 2013-11-08 11:18:16 UTC
(In reply to Luca Stoppa from comment #0)
> Template functions
> bool operator==( const char*, const std::string& ) and
> bool operator==( const std::string&, const char* )
> creates unecessary temporary std::string object.

Where?

> It seems like that f()
> and h() are creating a temporary std::string object.

If you look at the code or use a debugger you can see there is no temporary created, so you'll need to explain why you think that.
Comment 3 Luca Stoppa 2013-11-08 11:34:48 UTC
Created attachment 31181 [details]
testcase

g++ -O3 mapping.cpp

time ./a.out 10000000 f
time ./a.out 10000000 g
time ./a.out 10000000 h
time ./a.out 10000000 i

will execute the 4 different paths in the code.
Comment 4 Paolo Carlini 2013-11-08 11:37:14 UTC
Indeed, I had a look to f and the correct operator== and compare are called, no temporaries. By the way, type_traits<char> uses memcmp not strcmp.
Comment 5 Luca Stoppa 2013-11-08 11:41:55 UTC
Thanks a lot,
so if memcmp() is called, how can the difference in performance be explained?
In short:

std::string s="something";

if (s == "something") { ... }

and

if (0 == strcmp(s.c_str(),"something")) { ... }
? Probably I'm doing something wrong, or my testcase is invalid?
Comment 6 Paolo Carlini 2013-11-08 11:46:03 UTC
An important detail is that the compare functions aren't inline, and are exported for basic_string<char>. Thus, for a fair comparison, the strcmp should be in an attribute noinline helper (to be 100% correct, should be a memcmp). I'm pretty sure this is all there is to it.
Comment 7 Jonathan Wakely 2013-11-08 11:46:45 UTC
The only major difference I see is that in the operator== case you make a call to a PIC function in libstdc++.so, whereas strcmp can be inlined.

There's no temporary created though.
Comment 8 Paolo Carlini 2013-11-08 11:50:40 UTC
Right, it's also PIC.
Comment 9 Luca Stoppa 2013-11-08 13:47:58 UTC
Created attachment 31182 [details]
not inlined memcmp used.
Comment 10 Luca Stoppa 2013-11-08 14:01:52 UTC
Hi,
honestly I don't know what PIC means, but I did like you suggested. I have added a wrapper to memcmp() that is not inlined.
__attribute__((noinline)) int memcmp_not_inlined
(const char *s1, const char *s2, size_t bytes) { return memcmp(s1,s2,bytes); }

Basically I got exactly the same results.
Comment 11 Marc Glisse 2013-11-08 19:23:47 UTC
memcmp is pure and gcc manages to pull it out of the loop (see the file created by -fdump-tree-optimized). It is funny that -fno-builtin-strcmp makes the code more than 2 times faster (the main difference I can see is using "repz cmpsb" instead of a call to the libc function strcmp).
Comment 12 Marc Glisse 2014-01-09 17:21:34 UTC
Configuring libstdc++ with --disable-extern-template "fixes" the issue (I actually only edited bits/c++config.h).

I don't think we can mark the functions with attribute pure, and even if we do it doesn't seem sufficient (not sure what else gcc wants).
Comment 13 Ondrej Bilka 2015-06-02 22:48:59 UTC
Yes, this is duplicate of following.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=43052
Until thats fixed you should compile everything with -fno-builtin-strcmp -fno-builtin-memcmp
Comment 14 Manuel López-Ibáñez 2015-06-03 06:34:01 UTC
Let's mark it as a duplicate then.

*** This bug has been marked as a duplicate of bug 43052 ***
Comment 15 Marc Glisse 2015-06-03 07:54:32 UTC
(In reply to Ondrej Bilka from comment #13)
> Yes, this is duplicate of following.
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=43052

No it isn't. I made a comment related to that other PR, but IIRC the main point here is that the function is not pulled out of the loop, which is independent of how memcmp gets expanded.
Comment 16 Martin Sebor 2017-05-19 02:09:02 UTC
Confirmed.

I don't think the char* overload can ever match the string overload because the former has to compute the length of the char* argument and (possibly also) compare the strings, while the latter can avoid the first step because the lengths of both arguments are precomputed.

But the char* overload can be sped up by avoiding string::compare which always traverses the strings, when operator== only has to do that if their lengths are the same.

It also doesn't help that string::compare is not expanded inline (because std::string is declared extern).

In my tests, the following definition speeds the operator up by a factor of 5, making it about 20% faster than strcmp:

  template<typename _CharT>
    inline bool
    operator==(const basic_string<_CharT>& __lhs,
        const _CharT* __rhs)
  {
    return !__lhs.compare (__rhs);

    typedef typename basic_string<_CharT>::size_type size_type;
    const size_type __len1 = __lhs.length ();
    const size_type __len2 = __builtin_strlen (__rhs);

    return __len1 == __len2
      && !char_traits<_CharT>::compare (__lhs.data (), __rhs,
					__len1 < __len2 ? __len1 : __len2);
  }