This is the mail archive of the libstdc++@gcc.gnu.org mailing list for the libstdc++ project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: Search algorithms in __gnu_cxx::


On 9/13/06, Paolo Carlini <pcarlini@suse.de> wrote:
Dhruv Matani wrote:
>  I have with me, presently a working implementation of Boyer Moore.
> However, the only problem is that the lookup using hash_map is very
> slow, since it needs to be done at every stage. If I use std::map,
> it's even slower. However, while testing with strings(char arrays), I
> tried using a static array of 256 entries(one for each char), and that
> made whole process faster by about 70%, which is a LOT!!!! So, I'm
> contemplating providing specialization for char, unsigned char, and
> string for which this algo will probably be used the most, and have
> that implementation use the array logic, while the other
> implementations can use __gnu_cxx::hash_map<>. Again, I hope no one
> uses this algo for anything other than string searching, else they
> will be in for a big performance hit.


A couple of quick comments.
1/ Please, dont'use the legacy __gnu_cxx stuff for new projects. That
code has small and large bugs which we don't want to spend time on and
now we have got much better and soon-to-be-standardized replacement in
tr1 (also performing quite a bit better in practice, as far as I know).

Ok, no problem.


2/ If we are really going for benchmarks, I'd like to see something
more, for instance also the case of a string which actually does occur
and, in case of strings, a comparison to the current string::find, which
is very simple, still often so much better than the previous best
relying on std::search.

Here are numbers for strings that do occur in the text: Boyer Moore std::search 8.485s 13.257s

That said, since I don't have the patched library with the find()
member implemented NOT as using std::search, I'll get down to testing
that as soon as I get it.



That said, I think we should concentrate first
on a solid, neat, generic implementation, only at a later stage provide
specializations.

What I meant was not to provide a specialization of the algorithm proper, but of the __bm_struct<> implementation when instantiated with char, string, etc.... so that instead of using the hash(associative) containers, we can use an array.

For sure, that kind of algorithm is not going to beat
in every possible circumstance the simplistic ones (we knew that at the
outset) but we are anyway confident that a good implementation can be
useful to the users, that's why there is that much research going on
pattern matching, BM variants and so on and so on...

Paolo.



Regards,
-Dhruv.



--
  -Dhruv Matani.
http://www.geocities.com/dhruvbird/

"Life is wasted on the living"
 - Zaphod Beeblebrox the Fourth.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]