Tree-Based and Trie-Based Text find Find Timing Test

Description

This test inserts a number of values with keys from an arbitrary text ([wickland96thirty]) into a container, then performs a series of finds using find. It measures the average time for find as a function of the number of values inserted.

(The test was executed with text_find_timing_test thirty_years_among_the_dead_preproc.txt 200 200 2100)

Purpose

The test checks the effect of different underlying data structures.

Results

Figures NTTG, NTTM, and NTTL show the results for the native, tree-based, and trie-based types in g++, local, and local, respectively.

no image
NTTG: Native, tree-based, random int find timing test using find - g++

In the above figure, the names in the legends have the following meaning:

  1. splay_tree_map- tree with Tag = splay_tree_tag , and Node_Update = null_tree_node_update
  2. ov_tree_map- tree with Tag = ov_tree_tag , and Node_Update = null_tree_node_update
  3. n_map- std::map
  4. rb_tree_map- tree with Tag = rb_tree_tag , and Node_Update = null_tree_node_update
no image
NTTM: Native, tree-based, random int find timing test using find - msvc++

In the above figure, the names in the legends have the following meaning:

  1. splay_tree_map- tree with Tag = splay_tree_tag , and Node_Update = null_tree_node_update
  2. ov_tree_map- tree with Tag = ov_tree_tag , and Node_Update = null_tree_node_update
  3. n_map- std::map
  4. rb_tree_map- tree with Tag = rb_tree_tag , and Node_Update = null_tree_node_update
no image
NTTL: Native, tree-based, random int find timing test using find - local

Observations

For this setting, a splay tree (tree with Tag = splay_tree_tag) does not do well. This is possibly due to two reasons:

  1. A splay tree is not guaranteed to be balanced [motwani95random]. If a splay tree contains n nodes, its average root-leaf path can be m >> log(n).
  2. Assume a specific root-leaf search path has length m, and the search-target node has distance m' from the root. A red-black tree will require m + 1 comparisons to find the required node; a splay tree will require 2 m' comparisons. A splay tree, consequently, can perform many more comparisons than a red-black tree.

An ordered-vector tree (tree with Tag = ov_tree_tag), a red-black tree (tree with Tag = rb_tree_tag), and the native red-black tree all share approximately the same performance.

An ordered-vector tree is slightly slower than red-black trees, since it requires, in order to find a key, more math operations than they do. Conversely, an ordered-vector tree requires far lower space than the others. ([austern00noset], however, seems to have an implementation that is also faster than a red-black tree).

A PATRICIA trie (trie with Tag = pat_trie_tag) has good look-up performance, due to its large fan-out in this case. In this setting, a PATRICIA trie has lookup performance comparable to a hash table (see Hash-Based Text find Find Timing Test), but it is order preserving. This is not that surprising, since a large fan-out PATRICIA trie works like a hash table with collisions resolved by a sub-trie. A large fan-out PATRICIA trie does not do well on modifications (see Tree-Based and Trie-Based Text Insert Timing Test). It is possibly beneficial to semi-static settings, therefore.

Observations::Tree-Like-Based Container Types summarizes some observations on tree-based and trie-based containers.