This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]

Higher level RTL issues



I know folks have heard me mention the idea of a higher level RTL from
time to time.  I must apologize for not posting this earlier -- oh well,
life goes on.  I must also apologize as this isn't 100% complete, but
given that I haven't posted anything about it already and the fact that
I'm going on vacation tomorrow, I figured I'd at least get this info out
and let folks chew on it while I'm gone.

I don't think there's any significant disagreement that a higher level
IL would be useful.  What is a subject of debate is whether we have
a lower tree form, higher RTL form or some other form.  I think if we
look at the set of problems we want to solve we can get a good feel for
which of the 3 choices makes the most sense.


I'm looking at the new IL as a way to enable an SSA optimization path.  On
that optimization path I'm initially looking at the following optimizations:

	* Sparse conditional constant propagation (ssa-ccp.c)

	* Traditional dead code elimination (ssa-dce.c)

	* Aggressive dead code elimination (ssa-dce.c)

	* Register coalescing and renaming (ssa.c)

	* Dominator based value numbering (will eventually be in ssa.c)

Certainly there will be more optimizations in the future, but I think this
set gives us a good idea of the kinds of optimization problems I want to 
solve.

Let's first consider Tree SSA work Diego's doing vs our existing SSA
translator.  The SSA naming discipline in Diego's work is merely a
conceptual model -- we do not actually rewrite the tree structures to
conform to SSA naming rules.  The nice thing about this model is you
don't actually have to convert the IL into and out of SSA form.

That model works fine for a class of optimization problems, but breaks down
with any optimization which potentially has two SSA names for the same 
variable live at the same point in the program.  GVN and dominator based
value numbering can create situations where we have two SSA names live at
the same time.  It's probably true that other interesting optimizations
could create similar situations.  I believe this argues that either the
tree SSA needs to do rewriting or that tree SSA is not suitable for the
class of optimizations I'm trying to implement.

Now let's consider writing a new IL to sit between trees and RTL.  It can
certainly be done, no doubt about that.  However, that means we have to
come up with a new class of routines to analyze and manipulate this new IL
as well as translators in/out of this new IL.  It means we as developers have
to familiarize ourselves with a whole new IL.  It means that we can't use any
of the existing work in the SSA translator and optimization passes.  In short
I just don't think it makes a lot of sense.

So that leaves the 3rd alternative and my preferred solution -- a higher level
RTL.  The biggest benefits of this direction are that we already have code
which can analyze and manipulate RTL which would likely only need minor
extensions.  Similarly I believe the learning curve for us as developers is
a whole lot smaller since we're still working with RTL.


So, given the set of optimization problems I want to solve and the possible
solutions using a higher level RTL seems like the natural choice to me.

--

If we now switch gears and assume that we're going to have a higher level
RTL representation we have the interesting problem of defining the higher
level RTL representation and means by which we lower from the higher level
RTL representation into RTL as we know it today.

Given that I'm looking to have this new RTL layer be suitable for a class
of SSA optimizers, I've largely let the needs of those optimizers define
the properties of the new RTL layer.

As a mental model, we have a set of standard patterns that are defined
unconditionally for all targets.   Those patterns handle standard
operations -- data movement, arithmetic, logical, flow control, etc. 
The operands within those patterns have the property that registers and
constants are interchangable -- ie, if we have a register, we can 
replace it with a constant and the result is a valid insn that does not
need re-recognition.  Furthermore, we have a set of standard addressing
modes (TBD, but I'm considering reg, reg + disp, constant).  MEMs can
be replaced by registers and thus by constants under the same rules.
[ It's not clear where we're going to allow MEMs, but anywhere we allow
  a MEM will also allow a REG or constant. ]

There are no libcalls, subregs, cc0, etc which cause us so much trouble
in SSA form.


If we have this higher level RTL we have the interesting problem of 
lowering from the high level RTL to RTL as we know it today.  Clearly
we want to avoid having to rewrite each backend.  In an ideal world
we wouldn't have to twiddle backends at all, but I'm not sure if
that's feasible in practice.  So it is a goal to not have to twiddle
backends, but it is not an absolute requirement.

I believe we want to structure the lowering phase in the following
manner (many thanks to Richard for help in this space):

  a. First try to match the high level RTL to a pattern in the
     target's backend.  If we get a match, then no lowering was
     needed.

  b. See if the RTL matches a define_split in the backend, if so
     call the target backend's splitter to perform the lowering.
     The insns resulting from the split must match patterns in the
     target's backend.

  c. Extract the operator and its operands and call back into the
     expand_foo routines we currently have (expr.c optabs.c).  Those
     will use the target's define_expands and the generic lowering code we've
     already got in expr.c, optabs.c and friends to generate target
     specific RTL.

Once the lowering phase is started, the patterns for the higher level
RTL are no longer allowed to match.  


--
 


This design has come from my experiences in trying to build out the
fledgling SSA optimization path mentioned at the start of this message.

I've actually implemented this stuff by hand on one embedded target using
state variables and define_splits for lowering as a proof of concept.  It
worked well enough to build libgcc, newlib, libstdc++ and run the various
testsuites (including the proprietary testsuites Red Hat uses internally).

I've also had this up and running on IA-64 linux -- which included successful
bootstrapping, spec runs, etc.


Anyway, chew on it for a while and I'll join in the discussion when I return
from vacation.

jeff


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]