[PATCH] Re: Canonical type nodes, or, comptypes considered harmful

Doug Gregor doug.gregor@gmail.com
Mon Nov 20 10:18:00 GMT 2006


On 09 Nov 2006 23:09:07 -0800, Ian Lance Taylor <iant@google.com> wrote:
> I meant something very simple: for every type, there is a
> TYPE_CANONICAL field.  This is how you tell whether two types are
> equivalent:
>     TYPE_CANONICAL (a) == TYPE_CANONICAL (b)
> That is what I mean when I saw one memory dereference and one pointer
> comparison.

I've gone through and implemented this in the C++ front end. The
general idea is to add two new fields to every _TYPE in GCC:

TYPE_CANONICAL(t) refers to the "canonical" type associated with the
type t, which might be t itself. The "canonical" type is defined by
the language's type equivalence rules, so, e.g., "int" and "typedef
int foo" would both refer to the canonical type "int".

TYPE_STRUCTURAL_EQUALITY(t) is a flag that is set when the type t
needs to use structural equality checks. This happens when we can't
trust TYPE_CANONICAL to tell us for certain when two types are
equivalent, e.g., because we weren't able to maintain canonical types.

The basic approach to the attached patch is simple: find every place
in the C++ front end (and the parts of the C front end that it relies
on) where one duplicates a type node. For each of those places,
propagate canonical type and structural equality information.

Hwo do we find these places? Grep finds most of them (calls to
build_variant_type_copy are the main culprits). To find the rest, the
patch has a canonical types verification mode (lamely enabled/disabled
by a #define in cp/typeck.c) where it performs the structural check
and then checks if using TYPE_CANONICAL would come to the same
conclusion. If not, it errors out. By running the test suites under
this mode, I was able to find the places where we duplicate type nodes
and fix the TYPE_CANONICAL field. No types have actually been
eliminated, so error messages haven't changed, and we probably haven't
saved any memory... but type comparison is very fast, and the cost of
maintaining these new fields doesn't seem to be too ridiculous.

Some brief performance numbers when running "make check-g++" on GCC
with --enable-checking=tree:
  + Without my patch: 5m57.451s
  + Canonical types verification mode: 5m54.356s
  + Canonical types (no verification): 5m36.943s

I can't quite explain why the verification mode is 3s faster than the
unpatched GCC, but it's less than a percent and I'm on a noisy system.
Still, canonical types give us an 18-second improvement (about 5%),
which is promising. Note that I haven't canonicalized function types
yet (they're always structural), which might help improve performance
further; see the latter part of this message for more information
about "troublemaker" type nodes in GCC.

Some brief performance numbers when running "make check" on
libstdc++-v3 testsuite, again with --enable-checking=tree:
  + Without my patch: 17m41.156s
  + Canonical types verification mode: 17m26.116s
  + Canonical types (no verification): 17m12.235s

Only about a 3% speedup here. If I try out some more template-heavy
code (e.g., some of the tests in the Boost Graph Library), I've seen
some speedups up in the 9.6% range. I imagine I could do better with
more sophisticated template libraries.

The attached patch introduces no new regressions on i686-pc-linux-gnu,
in either the canonical types verification mode or the "full-speed"
mode. ChangeLogs for the C and C++ parts follow, although only the C++
front end uses the canonical types right now.

I won't even ask about committing this patch, because it is a straw
man. Despite my testing, this is a risky patch... if we missed a case
where we need to canonicalize types, we will break code until we fix
that case. Most of them are easy, some are not quite as easy. At a
minimum, I'd want to bootstrap/test the C, C++, Objective C++, and
Java front ends, try some template-heavy code in the C++ compiler
(e.g., Boost), and test more platforms before even considering this
patch. That said, I think this is a promising direction. We don't seem
to lose out on performance, and we can see some wins... but it's a big
change.

So, are we interested in pursing this further? If so, several other
questions pop up:

  1) What about the C, Objective-C, Objective-C++ front ends?
  2) Should the canonical types verification mode be a configure-time
switch (--enable-checking=canonical-types)? Or a compiler flag
(-fcheck-canonical-types)? The latter might be a greater help to us,
because we can ask bug submitters to add the flag and provide the
resulting output.
  3) Anyone interested in helping?

  Cheers,
  Doug Gregor
  Open Systems Lab @ Indiana University


      FUNCTION_TYPE, METHOD_TYPE:
        - Default arguments should not be a part of the type
        - Throw specifiers should not be part of the type
        - Many attributes should not be part of the type
        - We can still canonicalize these.

      TYPENAME_TYPE:
        - This node poses some interesting problems, because we can't
      	  result TYPENAME_TYPEs that occur in the return type or type
      	  of out-of-line member definitions until we've seen that they
      	  are, in fact, member definitions. Worse, we might end up
      	  comparing TYPENAME_TYPEs in different contexts, where the
      	  currently open classes differ. Thus, the "canonical" type
      	  for a TYPENAME_TYPE is dependent on the context, so we must
      	  always perform structural comparisons.

      TEMPLATE_TEMPLATE_PARM:
        - When we reduce the level of a template template parameter
          node, we don't also reduce the level of its template
          parameters. This makes it impossible to match the new,
          level-reduced template template parameter to its canonical
          template template parameter type, because the canonical
          template template parameter will have its template
          parameters at a different level. So, when we reduce the
          level of a TEMPLATE_TEMPLATE_PARM, we mark is as requiring
          structural equality.

      UNBOUND_CLASS_TEMPLATE:
        - These nodes always require structural equality checks,
          because they are not hashed. We could add a hash table for
          these nodes.

      INTEGER_TYPE:
        - We force this node to use structural equality when building
          the index type for an array type with value-dependent bounds
          inside a template, because we don't have an effective way to
          hash on expressions. See compute_array_index_type.

      ARRAY_TYPE:
        - We try to layout array types when we build them, even if
          they refer to incomplete types. In this case, we compute the
          alignments incorrectly, creating duplicate ARRAY_TYPE nodes
          that are not easily placed together again. Ideally, we would
          have some kind of "no computed yet" alignment value, that
          gets filled in when necessary.



2006-11-20	Douglas Gregor  <doug.gregor@gmail.com>

	* builtins.c (std_gimplify_va_arg_expr): Keep track of the
	canonical type when building a variant with a different alignment.
	* c-common.c (c_common_nodes_and_builtins): Since variants of
	void_type_node get built before it is given a name, we need to
	give those variants the name, too.
	(handle_packed_attribute): When building the type variant, set the
	canonical type appropriately.
	(handle_unused_attribute): When building the type variant, set the
	canonical type appropriately.
	(handle_aligned_attribute): Ditto.
	(handle_deprecated_attribute): Ditto.
	(complete_array_type): We need to work with the canonical main
	type of the array, from which we will build the qualified version.
	* stor-layout.c (layout_type): When we don't know the
	alignment of a type for which we're building an array, we end up
	guessing wrong, so make the type require structural equality.
	* tree.c (make_node_stat): When we build a new type, it is its
	own canonical type.
	(build_type_attribute_qual_variant): When building an attribute
	variant, its canonical type is the non-attribute variant.
	(build_qualified_type): Ditto.
	(build_distinct_type_copy): When building a distinct type from
	another type, the new type is its own canonical type.
	(build_pointer_type_for_mode): When building a pointer type, also
	build a canonical type pointer.
	(build_reference_type_for_mode): When building a reference type,
	also build a canonical type reference.
	(build_index_type): When we can't hash an index type (e.g.,
	because its maximum value is negative), the index type requires
	structural equality tests.
	(build_array_type): Build the canonical form of an array type.
	(build_function_type): Function types require structural equality,
	because they contain default arguments, attributes, etc.
	(build_method_type_directly): Ditto for method types.
	(build_offset_type): Build the canonical offset type.
	(build_complex_type): Build the canonical vector type.
	* tree.h (TYPE_CANONICAL): New.
	(TYPE_STRUCTURAL_EQUALITY): New.
	(struct tree_type): Added structural_equality, unused_bits,
	canonical fields.

2006-11-20	Douglas Gregor  <doug.gregor@gmail.com>

	* typeck.c (structural_comptypes): Was "comptypes".
	(comptypes): Use canonical type information to perform fast type
	comparison. When VERIFY_CANONICAL_TYPES, verify that the canonical
	type comparison returns the same results as we would see from the
	current, structural check. Support COMPARE_STRUCTURAL when we need
	structural checks.
	* decl.c (typename_compare): Fix comment.
	(build_typename_type): TYPENAME_TYPE nodes require structural
	equality checks, because they resolve different based on the
	current class type.
	(make_unbound_class_template): UNBOUND_CLASS_TEMPLATE nodes
	require structural equality checks (for now).
	(build_ptrmemfunc_type): Build the canonical pointer to member
	function type.
	(compute_array_index_type): Whenever we build a new index type
	to represent the size of an array in a template, we need to mark
	this index type as requiring structural equality. This goes for
	arrays with value-dependent sizes with the current ABI, or all
	arrays with ABI-1.
	* tree.c (cplus_array_hash): New.
	(struct cplus_array_info): New.
	(cplus_array_compare): New.
	(cplus_array_htab): New.
	(build_cplus_array_type_1): Use a hash table to cache the array
	types we build. Build the canonical array type for each array
	type.
	(cp_build_qualified_type_real): When building a cv-qualified array
	type, use the hash table of array types and build canonical array
	types as necessary.
	(bind_template_template_parm): BOUND_TEMPLATE_TEMPLATE_PARM nodes
	use structural equality (for now).
	* cp-tree.h (COMPARE_STRUCTURAL): New.
	* pt.c (canonical_template_parms): New.
	(canonical_type_parameter): New.
	(process_template_parm): Find the canonical type parameter.
	(lookup_template_class): When we have named the primary template
	type, set the canonical type for our template class to the primary
	template type. If any of the template arguments need structural
	equality checks, the template class needs structural equality
	checks.
	(tsubst): When reducing the level of a template template
	parameter, we require structural equality tests for the resulting
	parameter because its template parameters have not had their types
	canonicalized. When reducing a template type parameter, find the
	canonical reduced type parameter.
	(any_template_arguments_need_structural_equality_p): New.
	* name-lookup.c (pushdecl_maybe_friend): When creating a new type
	for a TYPE_DECL, make its canonical type the original type.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: canonical-types.patch
Type: text/x-patch
Size: 30493 bytes
Desc: not available
URL: <http://gcc.gnu.org/pipermail/gcc-patches/attachments/20061120/a0a0a870/attachment.bin>


More information about the Gcc-patches mailing list