This is the mail archive of the mailing list for the GCC project.

Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH] Add gimple subclasses for every gimple code (was Re: [PATCH 0/6] Conversion of gimple types to C++ inheritance (v3))

On Thu, 2013-11-14 at 00:13 -0700, Jeff Law wrote:
> On 11/08/13 12:02, David Malcolm wrote:
> >> I wouldn't mind seeing a small example proof of concept posted to help
> >> those who don't see where this is going understand the goal.  I would
> >> recommend against posting another large patch for inclusion at this time.
> > Attached is a proof-of-concept patch which uses the
> > gimple_statement_switch subclass (as a "gimple_switch" typedef).  This
> > is one of the subclasses that the earlier patch added, which has no new
> > fields, but which carries the invariant that, if non-NULL,
> >     gimple_code (gs) == GIMPLE_SWITCH.
> [ ... ]
> Thanks.  It's pretty much what I expected.  Obviously for other codes 
> there may be a lot more changes that you have to slog through, but I 
> think this example shows the main concepts.
> Presumably in this new world order, the various gimple statement types 
> will continue to inherit from a base class.  That seems somewhat 
> inevitable and implies a certain amount of downcasting (via whatever 
> means we agree upon).  The worry, in my mind is how pervasive the 
> downcasting will be and how much of it we can get rid of over time.
> I may be wrong, but ISTM some of the downcasting is a result of not 
> providing certain capabilities via (pure?) virtual methods.  For 
> example, expand_gimple_stmt_1 seems ripe for implementing as virtual 
> methods.   ISTM you could also have virtuals to build the statements, 
> dump/pretty-print them, verify them, branch/edge redirection, 
> estimations for inlining, etc.  ISTM that would eliminate a good chunk 
> of the downcasting.

FWIW, I prefer the downcasts to adding virtual functions; what I've
tried to do is create a very direct mapping from the status quo,
introducing inheritance to gain the benefits listed earlier in the
thread, whilst only changing "surface syntax".

It seems to me that we're considering the general problem of type-safe
code dispatch: given a type hierarchy, and various sites that want to
vary behavior based on what types they see, how best to invoke the
appropriate code, ensuring that the code that's called "knows" that its
dealing with the appropriate subclass i.e. in a typesafe manner.

There are various idioms for doing this kind of dispatch in C++, a
non-exhaustive list is:

   (a) switches and if/then tests on the GIMPLE_CODE (stmt) - the status
quo, and what my proposed patch continues to do, albeit gaining some
compile-time typechecking using as_a<> for the switch and dyn_cast<> for
the if/then.  This is changing some surface syntax without making major
changes, and gains us compile-time typesafety and IMHO more readable
code, though clearly opinions vary here.   In my (brief) testing, (a)
has no significant effect on compiler performance.

   (b) adding virtual functions to gimple would be another way to handle
type-safe dispatch, but they carry costs:
      (i) they would implicitly add a vtable ptr to the top of every
gimple statement, increasing the memory consumption of the process
      (ii) it's my belief that a virtual function call is more expensive
than the kinds of switch/if+then branching that we're currently doing on
the code - though I don't have measurements to back this up
      (iii) what I call "link-time granularity". A vtable references
every method within it.  I'd love to have a, but to do so,
every vtable would need to be populated with a particular set of
operations at link time - where do we draw the line for "core" gimple
operations, the dispatches performed by every core pass?   The set of
operations will never be complete: some plugin may want to add a new set
of per-gimple-subclass behaviors for some custom gimple pass.

  (c) the "Visitor" design pattern [1] - rather than adding virtual
functions to gimple, instead add them to a visitor class e.g.:

     class gimple_visitor
	/* Visit a statement.  This will dispatch to the appropriate
	   handler below, based on GIMPLE_CODE (stmt), encapsulating
	   the appropriate downcast within a big switch statement.  */
	void visit_stmt (gimple stmt);

	/* Each gimple code gets its own handler.  This class
	   provides an empty implementation of each.  If we want
	   to force overrides, we could have an abstract_gimple_visitor
	   base class above this one that has all of these be pure
	   virtual.  */
	virtual void visit_cond (gimple_cond stmt) {}
	virtual void visit_switch (gimple_switch stmt) {}
	virtual void visit_assign (gimple_assign stmt) {}
	virtual void visit_phi (gimple_phi phi) {}
	/* etc */

   Example of a subclass:

     class gimple_pretty_printer : public gimple_visitor
	/* Each of these implements the subclass-specific
           pretty-printing logic.  */
	void visit_cond (gimple_cond stmt);
	void visit_switch (gimple_switch stmt);
	void visit_assign (gimple_assign stmt);
	void visit_phi (gimple_phi phi);
	/* etc */

   so that the dispatch can be written like this:
     gimple_pretty_printer pp;
     pp.visit_stmt (stmt);

   (the above isn't *exactly* the Visitor pattern from the Gang of Four
book, I'm doing things in the visitor in order avoiding adding vfuncs to

   This approach avoids adding an implicit vtable field to the top of
gimple [(i) above], and keeps the vtables with the code using them
[(iii) above].

  However it still would mean (ii) changing from switch/if-then control
flow to vfunc calls, with unknown impact on performance.  I'd be nervous
about adding virtual functions anywhere where we're not already jumping
though function ptrs.

  [Aside: this approach also gives us a natural place to store state
relating to the dispatch (as fields of the visitor subclass), which may
help in removing global state from the compiler.  (Doesn't help with GTY
state though)]

  (d) The status quo, with the drawback of doing the type-checking
either at run-time (checked build) or not at all (release build).

We have 41 gimple codes (and adding new ones is a relatively rare
event), but many more dispatch sites, I think - a first estimate might
  [gcc] $ grep -nH -e "gimple_code (" *.c | wc -l

So we have (I think) a large number of sites that dispatch to code based
on a (relatively) small fixed set of types - this suggests to me the use
of the Visitor pattern: (c) above, but I think (a) is the conservative
approach that gives many benefits for relatively low risk, and gives us
an easy transition path to (c) if measurement establishes the lack of
significant performance impact.  Such a transition could be done

> Just to be clear, I'm not asking you to make those changes, just for 
> your thoughts on approaches to eliminate the downcasting based on what 
> we've seen so far.

(nods).   Note that I don't regard the downcasting as inherently bad,
just one approach to the generic issue of typesafe dynamic code
dispatch.   Yes, in many OO textbooks, it's regarded as a code smell,
but then again "goto" has its uses :)

> Thanks for pulling this together to help illustrate how some of this 
> might look in practice.  I hope others take the time to look closely as 
> this example and think about what it means in terms of how we would be 
> writing code 6 months from now.

On the subject of "6 months from now", my dream is that we can have a and a (.a in the regular builds).   I've love to
build a pluggable static analyzer on top of the gimple code - so I'm
trying to think of how we can structure the gimple code in such a way
that it's reusable in a modular way.

FWIW, I don't like the implicit "pointerness" of types within my
proposed patch, I'd rather have, say, the "gimple_switch" type be the
underlying _struct_, and spell out "gimple_switch *" to make it clear
that the latter is a pointer.

Doing so would make the is-a.h API somewhat less clunky, so that one
could write:

   if (gimple_switch *swtch = dyn_cast<gimple_switch> (stmt))
	// do switchy stuff on swtch.

rather than:

   if (gimple_switch swtch = dyn_cast<gimple_statement_switch> (stmt))
	// do switchy stuff on swtch.

Another tweak to the patch series could be to only introduce the leaf
subclasses one at a time, each time making use of the new subclass.  For
example, the "gimple_switch" subclass would be added at the same time as
a patch that makes use of it to improve compile-time typesafety, so each
patch would contain its own justification, as it were (applying the
YAGNI principle).

Sorry for the long email; thanks for reading to the end.

[1] see the Gang of Four "Design Pattens" book, or e.g.

Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]