Hash-Based Text find Find Timing Test

Description

This test inserts a number of values with keys from an arbitrary text ([ wickland96thirty ]) into a container, then performs a series of finds using find . It measures the average time for find as a function of the number of values inserted.

(The test was executed with text_find_timing_test thirty_years_among_the_dead_preproc.txt 200 200 2100)

Purpose

The test checks the effect of different range-hashing functions, trigger policies, and cache-hashing policies (see Design::Associative Containers::Associative Containers::Hash-Based Containers::Hash Policies and Design::Associative Containers::Hash-Based Containers::Resize Policies ).

Results

Figures NCCG, NCCM and NCCL show the results for the native and collision-chaining types in g++, msvc++, and local, respetively.

no image
NCCG: Native and collision-chaining hash text find timing test using find - g++

In the above figure, the names in the legends have the following meaning:

  1. n_hash_map_ncah- std::tr1::unordered_map with cache_hash_code = false
  2. cc_hash_mod_prime_nea_lc_1div8_1div1_nsth_map- cc_hash_table with Comb_Hash_Fn = direct_mod_range_hashing , and Resize_Policy = hash_standard_resize_policy with Size_Policy = hash_prime_size_policy , and Trigger_Policy = hash_load_check_resize_trigger with αmin = 1/8 and αmax = 1/1
  3. cc_hash_mask_exp_nea_lc_1div8_1div2_sth_map- cc_hash_table with Comb_Hash_Fn = direct_mask_range_hashing , and Resize_Policy = hash_standard_resize_policy with Size_Policy = hash_exponential_size_policy , and Trigger_Policy = hash_load_check_resize_trigger with αmin = 1/8 and αmax = 1/2
  4. cc_hash_mask_exp_nea_lc_1div8_1div1_nsth_map- cc_hash_table with Comb_Hash_Fn = direct_mask_range_hashing , and Resize_Policy = hash_standard_resize_policy with Size_Policy = hash_exponential_size_policy , and Trigger_Policy = hash_load_check_resize_trigger with αmin = 1/8 and αmax = 1/1
  5. cc_hash_mask_exp_nea_lc_1div8_1div2_nsth_map- cc_hash_table with Comb_Hash_Fn = direct_mask_range_hashing , and Resize_Policy = hash_standard_resize_policy with Size_Policy = hash_exponential_size_policy , and Trigger_Policy = hash_load_check_resize_trigger with αmin = 1/8 and αmax = 1/2
no image
NCCM: Native and collision-chaining hash text find timing test using find - msvc++

In the above figure, the names in the legends have the following meaning:

  1. n_hash_map_ncah- stdext::hash_map
  2. cc_hash_mod_prime_nea_lc_1div8_1div1_nsth_map- cc_hash_table with Comb_Hash_Fn = direct_mod_range_hashing , and Resize_Policy = hash_standard_resize_policy with Size_Policy = hash_prime_size_policy , and Trigger_Policy = hash_load_check_resize_trigger with αmin = 1/8 and αmax = 1/1
  3. cc_hash_mask_exp_nea_lc_1div8_1div1_nsth_map- cc_hash_table with Comb_Hash_Fn = direct_mask_range_hashing , and Resize_Policy = hash_standard_resize_policy with Size_Policy = hash_exponential_size_policy , and Trigger_Policy = hash_load_check_resize_trigger with αmin = 1/8 and αmax = 1/1
  4. cc_hash_mask_exp_nea_lc_1div8_1div2_nsth_map- cc_hash_table with Comb_Hash_Fn = direct_mask_range_hashing , and Resize_Policy = hash_standard_resize_policy with Size_Policy = hash_exponential_size_policy , and Trigger_Policy = hash_load_check_resize_trigger with αmin = 1/8 and αmax = 1/2
  5. cc_hash_mask_exp_nea_lc_1div8_1div2_sth_map- cc_hash_table with Comb_Hash_Fn = direct_mask_range_hashing , and Resize_Policy = hash_standard_resize_policy with Size_Policy = hash_exponential_size_policy , and Trigger_Policy = hash_load_check_resize_trigger with αmin = 1/8 and αmax = 1/2
no image
NCCL: Native and collision-chaining hash text find timing test using find - local

Observations

In this setting, the range-hashing scheme (See Design::Associative Containers::Hash-Based Containers::Hash Policies ) affects performance more than other policies. As Figure NCCG shows, containers using mod-based range-hashing (including the native hash-based container, which is currently hard-wired to this scheme) have lower performance than those using mask-based range-hashing. A modulo-based range-hashing scheme's main benefit is that it takes into account all hash-value bits. Standard string hash-functions are designed to create hash values that are nearly-uniform as is [ knuth98sorting ].

Trigger policies (see Design::Associative Containers::Hash-Based Containers::Resize Policies ), i.e. the load-checks constants, affect performance to a lesser extent.

Perhaps surprisingly, storing the hash value alongside each entry affects performance only marginally, at least in pb_ds 's implementation. (Unfortunately, it was not possible to run the tests with std::tr1::unordered_map 's cache_hash_code = true , as it appeared to malfuntion.)

Observations::Hash-Based Containers' Policies summarizes some observations on hash-based containers' policies.