function-at-a-time processing in C

Thu Jun 15 11:31:00 GMT 2000

Mark Mitchell <mark@codesourcery.com> writes:

> First, I think we want language-specific trees.  Then, we want
> language-independent trees.  Finally, we want RTL, or some other
> low-level representation. 

In principle, I have no objection.  But the closer the language-specific
tree is to the language-independent tree, the easier it is to convert
the former to the latter, the less memory and CPU time is wasted in the
process, and the easier it is to write and maintain the tools
that work on the language-specific trees.

> Scheme has all kinds of constructs that make no sense in C:
> continuations, for example.

There are those who argue that all compilers can benefit from
using continuation-passing style.  Conversely, you can view
call-with-current-continuation as a pure run-time issue that
does not affect the compiler.

> In Lisp, s-expressions are the basic entity; it would make sense to
> have a representation that looked like s-expressions.

Yes and no.  You first read S-expressions, and macro-expand them.
At that point you can generate an internal representation that
is closer to gcc tree nodes than to s-expressions.  (In fact,
this is what Kawa does:  The internal format is a tree of instances
of classes that inherit from the Expression class.)

> First, we want a common representation for C and C++.  So, you have to
> disagree for C++, too. :-)

I do - but I was afraid there might be some subtle gotcha I couldn't
think of.

> I don't fancy trying to deal with C++ templates by pulling out the
> middle of an EXIT_EXPR, turning it into a declaration, and then trying
> to stuff that back into the for loops.

I agree it may be reasonable to have a FOR_LOOP_EXPR ...  I.e. if for
some statement/expression the language-independent representation
would require non-trivial rewriting, then it is
better to start out with a language-specific tree type.  A for loop
is one example: it may need a LOOP_EXPR containing a LABELED_BLOCK_EXPR
(to handle continue), surrounded by another LABELED_BLOCK_EXPR
(to handle break).  While the Java front-end does that, a sensible
alternative is to just use a FOR_LOOP_EXPR.  Then when the front-end is
done, it calls a "tree normalizer", which converts the FOR_LOOP_EXPR
to the language-independent LOOP_EXPR/LABELED_BLOCK_EXPR form.
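
To make the normalizer idea concrete, here is a rough sketch in C.  It
is only an illustration: the node codes reuse the names above, but the
struct layout, the operand order of FOR_LOOP_EXPR (init, cond, step,
body), and the helpers mk() and normalize() are all made up for this
example and are not gcc's actual tree interface.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy node codes.  The names echo the ones used in this thread, but the
   node layout is hypothetical and is not GCC's real tree structure.  */
enum code { FOR_LOOP_EXPR, LOOP_EXPR, LABELED_BLOCK_EXPR, EXIT_BLOCK_EXPR,
            COMPOUND_EXPR, COND_EXPR, OTHER_EXPR };

struct node {
  enum code code;
  struct node *op[4];           /* operands; unused slots are NULL */
};

/* Hypothetical constructor: allocate a node with up to four operands.  */
struct node *
mk (enum code code, struct node *a, struct node *b,
    struct node *c, struct node *d)
{
  struct node *n = calloc (1, sizeof *n);
  assert (n != NULL);
  n->code = code;
  n->op[0] = a; n->op[1] = b; n->op[2] = c; n->op[3] = d;
  return n;
}

/* Rewrite every FOR_LOOP_EXPR (init, cond, step, body) in T, bottom-up.  */
struct node *
normalize (struct node *t)
{
  if (t == NULL || t->code == EXIT_BLOCK_EXPR)
    return t;                   /* an exit's operand is a back-reference */
  for (int i = 0; i < 4; i++)
    t->op[i] = normalize (t->op[i]);
  if (t->code != FOR_LOOP_EXPR)
    return t;

  /* Outer block is the target of break, inner block the target of continue.  */
  struct node *brk = mk (LABELED_BLOCK_EXPR, NULL, NULL, NULL, NULL);
  struct node *cont = mk (LABELED_BLOCK_EXPR, t->op[3], NULL, NULL, NULL);
  /* cond ? (nothing) : exit the break block */
  struct node *test = mk (COND_EXPR, t->op[1], NULL,
                          mk (EXIT_BLOCK_EXPR, brk, NULL, NULL, NULL), NULL);
  /* One iteration: test; body (inside the continue block); step.  */
  struct node *iter = mk (COMPOUND_EXPR, test,
                          mk (COMPOUND_EXPR, cont, t->op[2], NULL, NULL),
                          NULL, NULL);
  brk->op[0] = mk (LOOP_EXPR, iter, NULL, NULL, NULL);
  return mk (COMPOUND_EXPR, t->op[0], brk, NULL, NULL);  /* init; loop */
}
```

The sketch leaves out two things a real normalizer would need: break and
continue statements inside the body would have to be turned into exits
aimed at the two labeled blocks, and the old FOR_LOOP_EXPR node is
simply dropped rather than freed.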

However, for constructs that can be represented directly in
non-language-specific tree nodes (such as if statements), that is
preferable.  (If you want to be able to distinguish if statements
from conditional expressions, use a flag bit for that.)

> Finally, there's a bigger goal here: improving the C front-end, and
> reducing the amount of code in the compiler.  If, once that's done, we
> decide to combine all three kinds of loops, so be it.  But, let's
> cross one bridge at a time.  We've got working C++ code; we can share
> it with the C code and reap some benefits.  We can then go back and
> change things around some and possibly reap more benefits.  This is
> not an irreversible situation.  If we have to redesign the C++
> representation first, that will be a lot more work, and frankly, it
> just won't get done any time soon.

Ok.

> Let's leave this debate.  I think we're going to have to agree to
> disagree on this point.  I know I'm starting to repeat myself, and
> that's usually the sign I'm out of persuasive arguments.  But, I'm by
> no means convinced by your arguments either -- leaving us at an
> impasse.

I think it makes sense to do what you're currently doing.
I would be happier with a consensus on the *goal*, even if
that isn't an immediate priority.

As a way to approach that goal incrementally, we can use EXPR_STMT as
much as possible.  For example, drop IF_STMT, and use a COND_EXPR wrapped
in an EXPR_STMT.  Use a flag bit (in the COND_EXPR) to indicate that this
is really an if-statement, for those places that care (such as error
messages).
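
Roughly, and only as a sketch, the COND_EXPR-plus-flag idea could look
like this in C.  The struct layout, the flag name if_stmt_p, and the
helpers mk(), build_if_stmt() and construct_name() are invented for
illustration; they are not existing gcc fields or functions.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy node codes; the layout below is hypothetical, not GCC's.  */
enum code { EXPR_STMT, COND_EXPR, OTHER_EXPR };

struct node {
  enum code code;
  unsigned if_stmt_p : 1;       /* hypothetical flag, used on COND_EXPR only */
  struct node *op[3];           /* operands; unused slots are NULL */
};

/* Hypothetical constructor: allocate a node with up to three operands.  */
struct node *
mk (enum code code, struct node *a, struct node *b, struct node *c)
{
  struct node *n = calloc (1, sizeof *n);
  assert (n != NULL);
  n->code = code;
  n->op[0] = a; n->op[1] = b; n->op[2] = c;
  return n;
}

/* Represent "if (test) then_part else else_part" without an IF_STMT:
   a flagged COND_EXPR wrapped in an EXPR_STMT.  */
struct node *
build_if_stmt (struct node *test, struct node *then_part,
               struct node *else_part)
{
  struct node *cond = mk (COND_EXPR, test, then_part, else_part);
  cond->if_stmt_p = 1;
  return mk (EXPR_STMT, cond, NULL, NULL);
}

/* What a diagnostic might call the construct; only here is the flag read.  */
const char *
construct_name (const struct node *t)
{
  if (t->code == EXPR_STMT && t->op[0] != NULL)
    t = t->op[0];               /* look through the statement wrapper */
  if (t->code == COND_EXPR)
    return t->if_stmt_p ? "if statement" : "conditional expression";
  return "expression";
}
```

Code that only cares about the semantics treats both forms as a plain
COND_EXPR; only code such as construct_name() that reports back to the
user needs to look at the flag.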

> It's not going to be any harder to get to a common representation
> later than it is now; there's not going to be new code depending on
> the C++ representation.  Instead, some of the C++ code will be shared
> with the C representation.  When and if we do change things around
> we'll automatically be changing both front-ends at once.  In other
> words, we're going to a place that is monotonically better than where
> we are, even if it's not what you think is the global optimum.

As long as what you're basically doing is moving existing C++-specific
code so it can be shared with C, I have no objection.  We can't fix
everything at once.
-- 
	--Per Bothner
per@bothner.com   http://www.bothner.com/~per/