This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

[00/13] Share hash traits between hash_table and hash_set


When reviewing Mikhail's patch, I objected to another local "traits for
hashing a string" class.  But when I looked into it, I realised it really
is quite hard to unify the current hash traits.

At the moment we need separate traits classes for hash_table, hash_set
and hash_map.  I think that's a sign that we don't have the right
abstraction.

The aim of this series is to unify the traits for hash_table and hash_set.
If that's OK, I'll look at adding default hash_map traits for the common
case in which the deleted/empty information is stored in the key rather
than the value and where hash_table traits could be used.

There were various things I needed to change along the way:

- Our one generic hasher, pointer_hash, inherits from typed_noop_remove.
  In other words, it's specifically for cases where the pointer isn't
  freed when an entry is deleted.  That seems a bit arbitrary.  I think
  we should separate out the hashing and the removing aspects.

  Also, typed_noop_remove and typed_free_remove don't provide hashing typedefs,
  but the ggc equivalents ggc_hasher and ggc_cache_hasher do.  If we separate
  the hashing and removing aspects, the typedefs belong to the hashing side.

  This gives:

  "Hashers"
    - the only generic one is pointer_hash, but I'll follow up with a string
      one if the series is OK

  "Removers"
    - typed_noop_remove (as it exists now)
    - typed_free_remove (as it exists now)
    - ggc_remove (renamed from ggc_hasher, but without the typedefs)
    - ggc_cache_remove (renamed from ggc_cache_hasher, but without the typedefs)

  These are then combined to give:
    - nofree_ptr_hash
    - free_ptr_hash
    - ggc_ptr_hash
    - ggc_cache_ptr_hash

  For this particular case, it might be cleaner to use a vec<>-style
  parameter for the allocator (really just freer), e.g.

    hash_table<foo, da_gc>,
    hash_table<foo, da_heap>

  I did try that, but it felt unnatural in the cases where the remove()
  function frees some subobject as well.  Either you have a separate
  "allocator" that frees both the subdata and the entry itself, or you
  have a separate (hasher?) function for freeing subdata.

  Much of the awkwardness here is due to not having "proper" object
  construction and destruction.  If we had that, the memory management
  would be an integral part of the element type rather than being part
  of the traits.  So I don't think adding a template parameter for the
  element remover would necessarily be forward progress.

  Although we end up with four classes for pointers, and could have at
  least three for strings, most other types have only one.

- The traits have optional is_deleted, is_empty, mark_deleted and
  mark_empty methods, with a default behaviour suitable for pointer elements.
  The methods were probably optional because many pointer traits classes
  were implemented independently, but after the change above, they can
  instead inherit from pointer_hash (or from an allocator-specific
  derivation).  It then makes sense to put the handling of empty and
  deleted pointer entries in pointer_hash and remove the complicated
  code for handling defaults.

- handle_cache_entry is too tied to the internals.  The generic
  implementation is:

    static void
    handle_cache_entry (T &e)
    {
      if (e != HTAB_EMPTY_ENTRY && e != HTAB_DELETED_ENTRY && !ggc_marked_p (e))
	e = static_cast<T> (HTAB_DELETED_ENTRY);
    }

  and a typical override looks like:

    void
    pad_type_hasher::handle_cache_entry (pad_type_hash *&t)
    {
      extern void gt_ggc_mx (pad_type_hash *&);
      if (t == HTAB_EMPTY_ENTRY || t == HTAB_DELETED_ENTRY)
	return;
      else if (ggc_marked_p (t->type))
	gt_ggc_mx (t);
      else
	t = static_cast<pad_type_hash *> (HTAB_DELETED_ENTRY);
    }

  In particular, this bypasses the normal remove() function, so we never
  update m_n_deleted.

  ISTM the (single) caller should be checking for empty and deleted items
  first and handling any deletion itself.  All we want to know from the
  traits is which of the following applies:

  - the entry should be deleted
  - the entry should be kept and needs to be marked
  - the entry should be kept and is already marked (an optimisation of
    the previous case).

  The patch adds a keep_cache_entry function to provide this information.

Series bootstrapped & regression-tested on x86_64-linux-gnu.  Also tested
with config-list.mk.

Thanks,
Richard


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]