This is the mail archive of the libstdc++@gcc.gnu.org mailing list for the libstdc++ project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: Search algorithms in __gnu_cxx::


On 9/15/06, Paolo Carlini <pcarlini@suse.de> wrote:
Dhruv Matani wrote:
> On this page:
> http://www.movsd.com/bm.htm
> you can find a complete description of the 3 different kinds of tables
> which _may_ be used in any Boyer Moore algorithm implementation. I am
> using the Good Character & Bad Character shift tables ONLY. As you can
> see from the description, that's perfectly fine to do.

To be clear, it's not about *me*, it's about implementing something well
known and of known properties. That means, the pointer above must be
part of the documentation, as a comment in the code, at least. Likewise
for any other extension we may add in the future.

Ok sure thing. I'll make sure to comment these things in as well :-)


>> Also, I think we have to do something for that unordered_map, I don't
>> think covering the general case via lookups in a map leads to something
>> close to the original spirit, in any possible sense...
> Nope, it doesn't because the algorithm wasn't meant for any search,
> but was designed for string searching in particular. Hence the use of
> the static tables, etc.... However, using the unordered_map<> as an
> approximation to preserve generality will only hurt the performance
> and not the correctness. Again, since this is an extension, people
> will use it only for strings if we document it accordingly.

No, we don't want to do that. Because using a map like that means that
all the complexity figures of the Boyer-Moore algorithm are not valid
anymore.

Yes, you are kind of right, but not entirely. The complexity remains the same. It's just that the constants in the big oh notation get blown out of proportion because of the lookup times in the hash maps.

Therefore, either we find a better way to deal with that
problem (I suggest we don't give up so quickly, no reason to rush) or we
have to deliver the algorithm only for chars.

I'm not ruling out any possibilities, but in all fairness, I think the Boyer-Moore(and other string matching) algorithms rely on the fact that these tables could be created and looked up fairly quickly(comparable to character compare time)


Regards, -Dhruv.




-- -Dhruv Matani. http://www.geocities.com/dhruvbird/

"Life is wasted on the living"
 - Zaphod Beeblebrox the Fourth.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]