This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



Re: AST optimizer in C++?


On Sat, 24 Aug 2002, Dave Hudson wrote:

[ CC'ing the list again, to hopefully get some discussion going... ]

> Chris Lattner wrote:
> >
> > Now this is a "sorta" because we don't have statements and expressions.
> > :)  LLVM is a low level three address code type representation, in SSA
> > form, with _type information_ for all operands.  One of our research
> > results has been to show that most of the optimizations traditionally done
> > on AST's can be done just as well on low level representations, *as long
> > as you have rich type information*.  In the context of LLVM then, this
> > means that statements and expressions are both instructions, it's just
> > that "statements" generally return void, ie they don't produce a value.
>
> I find this really interesting because I've been toying with some ideas
> about regenerating type information from very low level code
> representations.  In particular I've been considering how easily one
> might be able to apply higher-level optimizations to essentially object
> code and also how well one might be able to retarget such code to other
> processors.  Having spent a great deal of time inventing a few new
> low-level optimization passes for 8-bit CPUs recently I can see a lot of
> possibilities in this area.  The thing that occurs to me is that if I
> can regenerate something like an SSA form then there's quite a lot of
> scope to do things in this area.

It is certainly possible.  We are able to recapture the LLVM typesystem
(http://llvm.cs.uiuc.edu/docs/LangRef.html#typesystem) from RTL and a
little bit of debugging information (we use type information from function
prototypes, which really helps things.  This only works because GCC never
changes the prototype for a function).  RTL is a bit nicer than machine
code though, because we are able to control the "flavor" of RTL generated.
This means that we don't have to do as much reconstruction as you would
have to do from machine code (merging hi/low parts of constants, for
example).

One of our future research problems is to see how well the LLVM "level
raising" pass can process machine code, which has very little inherent
type information available at all: there you don't even get function
prototypes (but you can get the number of arguments).  If you want me to
explain a bit more about how and what the level raising does, just let me
know.

> > 2. We have disabled almost all of the optimizations in GCC (basically all
> >    the RTL optimizations, keeping the language specific optimizations,
>
> I assume then that you're doing things like register allocations within
> your own backend?  For many of the processors I'm interested in there
> are some very good reasons to wish to use stack slots at times while in
> others I find passes like gcc's combiner can really pessimize things.
> In general many of these processors either really want to exploit
> memory-to-memory operations or are so starved of real registers that gcc
> ends up having to work with simulated registers and thus really makes a
> mess of things.

Yes, we are.  Our one "real" backend is for the Sparc, which uses a
considerably different approach than the GCC backend (optimal tree
instruction selection, for example).  Our backend is not nearly as
retargetable though, which is why I would be crazy to propose not using
the GCC backend moving forward!

> > 5. LLVM allows a full suite of optimizations on global objects as well as
> >    the bodies of functions.  This means it can completely eliminate unused
> >    global variables, functions, etc, without problem.  I believe that GCC
> >    cannot currently do this in the backend (as part of generic
> >    optimization) because it treats global variables, for example, as
> >    immutable places in memory accessed by symbol_refs.
>
> You can get around much of this problem by using -ffunction-sections and
> -fdata-sections with --gc-sections passed to ld.  Without these much of
> the code I work with would be pretty much impossible.  Of course what
> this can't do is partially specialize a function (by virtue of all
> callers within a particular application only invoking it in one
> particular way) so that say some global data references become dead.

Sorta.  Here you can eliminate global variables, but can you change their
type, data layout, or other properties?  Can you delete fields of global
structures that are never used?

> >  * constantmerge       - Merge global variable constants (basically
> >                          .rodata) with identical contents together.

> This I could do with :-)

:)  This actually is more or less possible with a smart binary linker
though, isn't it?

> >  * deadinst elimination - Trivial dead code elimination
> >  * raise allocations    - Convert malloc/free calls to malloc/free instructions
>
> I'm intrigued if this means that you're essentially limiting your style
> of memory management to malloc/free?  I guess reference counting would
> correctly be managed by equivalencing it to wrappers around malloc and free?

LLVM has builtin intrinsics for malloc and free.  This pass just converts
calls in the source code from malloc/free calls to malloc/free intrinsics.
More than anything, this is a way of extending the type system to be more
accurate [The intrinsics return typed pointers, the function calls
obviously return void*].  At link time, these are lowered back to
malloc/free calls.

Having these as primitives that are well typed allows all kinds of
interesting memory optimizations.  One of the areas we are researching is
"data structure" based optimizations.  For a flavor of the kind of thing
we are working on, check out the "pointer compression" and
"automatic pool allocation" sections of this
paper: http://llvm.cs.uiuc.edu/pubs/MSP2002-FinalSubmission.pdf

> >  * Level raise         - Recover type information
>
> Do you have any papers on what you've done here?  With many (not all) of
> the targets I'm interested in - particularly the IP2022 this could be
> rather tricky.  The IP2k is an 8-bit single-accumulator-based CPU (it's
> not the most obvious choice for a gcc target, although the code we now
> get is extremely good aside from having no inter-procedural analysis
> :-)) and thus when we're using DImode 3-operand instructions we might
> have 24 opcodes forming a single RTL expression (at one point this used
> to be up to 31 opcodes!).

Unfortunately, there are no papers on this transformation yet.  Perhaps in
4-6 months though.  Basically we fix the RTL generated by the front end so
the output RTL is "simple".  At link time, this LLVM code is then
converted into a machine specific representation, which is where the DI
mode expansion would come into effect.  Reusing the GCC backend would
allow all of the builtin optimizations that GCC already performs on the
RTL to be used...

> >  * promote mem2reg     - Promote stack values to SSA registers.
>
> This I would dearly like to see :-)  Again with 8-bit targets (or I
> guess 16-bit targets that use 32-bit ints) gcc really badly pessimizes
> things because it doesn't undo some really bad effects of
> integer-promotion.  I've written some pretty extensive code to revert
> this in the machine-dependent-reorg pass (for a number of reasons this
> has to happen after register allocation unfortunately) but this renders
> stack-slot information invisible to much of what I'd like to do because
> gcc doesn't flow analyse stack-slots :-(  (at least not on the
> slight-precursor the gcc-3.0 that we currently use - we can't move to
> 3.1 because the new jump threading does terrible things to our code sizes).

One of the interesting aspects of LLVM is that the concept of stack
slots/stack allocation is unified with the concept of the C builtin
'alloca'.  Effectively our pass promotes things like:

X = alloca int
... load X
store ... , X

into usage of an actual 32 bit register.  This is an important pass
after interprocedural inlining, because many variables only had their
address taken in order to pass them into a function; once the call is
inlined, they don't need an address anymore.  When no address is
necessary, the value can be promoted back to a register...  The
backend converts alloca instructions in the entry block of the
function [with constant size] into stack slots.

By making the stack allocation an instruction, all of the optimizers can
correctly eliminate stack slots by simply deleting the alloca
instructions.  This is one of the many benefits of abstracting this out.
Another is that LLVM doesn't have an "address of" operator.

> >  * licm                - Loop invariant code motion
>
> Do you have an example of this pass in operation?  This is one area that
> I'd really like to resolve because loop handling just doesn't seem to
> work well for either the IP2k or IP4k ports (IP4k being a 32-bit core).

What kind of example are you interested in?  LICM is a pretty well studied
problem, and is actually a subset of a technique referred to as Partial
Redundancy Elimination.  GCC actually has optimizers that can do this,
it's just that it doesn't have the alias analysis infrastructure to
disambiguate loads/stores very well.  RTL based optimizers also have
problems inherent in working on a machine-specific representation, but
that is pretty much common to the whole backend (and is why
transformations are being moved to the tree-ssa representation).

> > Anyway, I'd love to discuss various IR design issues.  If anyone would
> > like to talk about the Pros/cons of optimizing on a tree vs a low-level
> > IR with type information, I can certainly talk about the experience I have
> with it.  Hopefully I can convince you that trees are not really
> necessary at all.  :) :)
>
> I don't really think I need convincing here but I'm interested in your
> thoughts about this - for the last few months I have been left wondering
> if I've really been missing some understanding of the benefits of trees
> vs something at a lower level.

I have been reading (mostly lurking) the GCC lists for the last few years,
and I started to get really excited about the tree-ssa work.  I think it's
a good start in the right direction: I just think that it's a mistake to
couple it to the tree representation.  It is certainly a huge step in the
right direction for GCC though.  Obviously my opinion doesn't matter much
here.  :)

-Chris

http://llvm.cs.uiuc.edu/
http://www.nondot.org/~sabre/Projects/

