Search algorithms in __gnu_cxx::

Dhruv Matani
Wed Sep 13 13:56:00 GMT 2006

On 9/13/06, Paolo Carlini <> wrote:
> Dhruv Matani wrote:
> >  I have with me, presently a working implementation of Boyer Moore.
> > However, the only problem is that the lookup using hash_map is very
> > slow, since it needs to be done at every stage. If I use std::map,
> > it's even slower. However, while testing with strings(char arrays), I
> > tried using a static array of 256 entries(one for each char), and that
> > made whole process faster by about 70%, which is a LOT!!!! So, I'm
> > contemplating providing specialization for char, unsigned char, and
> > string for which this algo will probably be used the most, and have
> > that implementation use the array logic, while the other
> > implementations can use __gnu_cxx::hash_map<>. Again, I hope no one
> > uses this algo for anything other than string searching, else they
> > will be in for a big performance hit.

> A couple of quick comments.
> 1/ Please, dont'use the legacy __gnu_cxx stuff for new projects. That
> code has small and large bugs which we don't want to spend time on and
> now we have got much better and soon-to-be-standardized replacement in
> tr1 (also performing quite a bit better in practice, as far as I know).

Ok, no problem.

> 2/ If we are really going for benchmarks, I'd like to see something
> more, for instance also the case of a string which actually does occur
> and, in case of strings, a comparison to the current string::find, which
> is very simple, still often so much better than the previous best
> relying on std::search.

Here are numbers for strings that do occur in the text:
Boyer Moore             std::search
  8.485s                          13.257s

That said, since I don't have the patched library with the find()
member implemented NOT as using std::search, I'll get down to testing
that as soon as I get it.

> That said, I think we should concentrate first
> on a solid, neat, generic implementation, only at a later stage provide
> specializations.

What I meant was not to provide a specialization of the algorithm
proper, but of the __bm_struct<> implementation when instantiated with
char, string, etc.... so that instead of using the hash(associative)
containers, we can use an array.

> For sure, that kind of algorithm is not going to beat
> in every possible circumstance the simplistic ones (we knew that at the
> outset) but we are anyway confident that a good implementation can be
> useful to the users, that's why there is that much research going on
> pattern matching, BM variants and so on and so on...
> Paolo.


   -Dhruv Matani.

"Life is wasted on the living"
  - Zaphod Beeblebrox the Fourth.

More information about the Libstdc++ mailing list