This is the mail archive of the libstdc++@gcc.gnu.org mailing list for the libstdc++ project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Fwd: Re: Search algorithms in __gnu_cxx::


On 11/16/06, Chris Jefferson <chris@bubblescope.net> wrote:
On 15/11/06, Dhruv Matani <dhruvbird@gmail.com> wrote:
>
> Here are the numbers. Attached is the file used.
>
> +-------------------+---------+
> | algo              | time    |
> +-------------------+---------+
> | Boyer Moore       | 9.475s  |
> | std::search       | 12.283s |
> | strstr            | 13.184s |
> | std::string::find | 20.166s |
> +-------------------+---------+

Just out of interest, as these "best case" numbers? What happens in
the worst cases you can construct for your algorithm? Obvious examples
are searching for a very short string.

Yes, that is an interesting observation. I have instrumented the test case to account for 5 types of strings to be searched. However, the input remains a large file.

The different types of input strings are:
/*
 #define INPUT_DATA 1 // Search for strings that occur in the input
with a decent frequency.
 #define INPUT_DATA 2 // Search for strings that don't occur in the input data.
 #define INPUT_DATA 3 // High frequency short strings.
 #define INPUT_DATA 4 // Low frequence short strings.
 #define INPUT_DATA 5 // Low frequency long strings.
 *
 */

Here are the timings:

+-------------+-------+---------+
| algo        | input | time    |
+-------------+-------+---------+
| Boyer Moore |     1 | 9.414s  |
| std::search |     1 | 12.088s |
| Boyer Moore |     2 | 6.893s  |
| std::search |     2 | 8.573s  |
| Boyer Moore |     3 | 12.746s |
| std::search |     3 | 15.638s |
| Boyer Moore |     4 | 10.002s |
| std::search |     4 | 14.723s |
| Boyer Moore |     5 | 1.895s  |
| std::search |     5 | 6.490s  |
+-------------+-------+---------+

As you can see, the boyer moore implementation is always faster than
std::search.

Attached is the input file, and test program if you wish to verify the
timings locally.
run as: time ./a.out rtest.html

CPU: Intel(R) Pentium(R) 4 CPU 2.66GHz
cache size      : 1024 KB



Regards,
-Dhruv.



--
  -Dhruv Matani.
http://www.geocities.com/dhruvbird/

"Be sure brain is in gear before engaging mouth"
     -- Anonymous




-- -Dhruv Matani. http://www.geocities.com/dhruvbird/

"Be sure brain is in gear before engaging mouth"
     -- Anonymous

Attachment: bm_test.tar.bz2
Description: BZip2 compressed data


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]