[PATCH] improve string find algorithm

Jonathan Wakely jwakely@redhat.com
Fri Jan 6 15:34:00 GMT 2017


On 06/01/17 16:26 +0100, Jakub Jelinek wrote:
>On Fri, Jan 06, 2017 at 03:16:12PM +0000, Jonathan Wakely wrote:
>> On 06/01/17 16:09 +0100, Jakub Jelinek wrote:
>> > On Fri, Jan 06, 2017 at 08:42:15AM -0600, Aditya Kumar wrote:
>> > > Yes, we do.
>> > > Sorry for the mistake, it happened because I first wrote this for
>> > > libcxx (https://reviews.llvm.org/D27068) and while porting that line
>> > > got missed.
>> >
>> > Shouldn't find at least in the case where it is narrow char string
>> > just use C library memmem?  That implements a Two-Way searching algorithm
>> > with some improvements from Boyer-Moore.
>> > Otherwise, consider what even your modified version will do for
>> >
>> > #include <string>
>> > int main() {
>> > (std::string(10000000, 'a')+"b").find(std::string(1000000, 'a')+"b");
>> > }
>>
>> Yes, I have an incomplete local patch that checks for memmem and uses
>> it where possible. This change would still be valuable for non-GNU
>> targets though.
>
>The description of Two-Way algorithm is in
>http://www-igm.univ-mlv.fr/~lecroq/string/node26.html#SECTION00260
>Boyer-Moore in
>http://en.wikipedia.org/wiki/Boyer-Moore_string_search_algorithm

And in std::boyer_moore_searcher in <functional> :-)
We also have std::boyer_moore_horspool_searcher in <functional>.

We could try allocating memory in std::string::find() and fall back to
the dumb algorithm if it fails, but we don't want to throw bad_alloc.




More information about the Gcc-patches mailing list