This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: Jump Bypass Optimization


>  In message <Pine.LNX.4.33.0205180746060.24370-100000@www.eyesopen.com>, Roger 
> Sayle writes:
> 
>  > [1] JUMP BYPASSING
>  > The current constrant propagation pass of GCC is able to convert
>  > conditional jumps into unconditional jumps using the available
>  > expressions calculated using standard PRE.  With this methodology
>  > a conditional jump is optimized if it depends upon a value that
>  > has a known value at the point the jump is evaluated.
>  > 
>  > Consider the common case of a basic block that starts with a
>  > conditional jump, that has multiple incoming edges.  The current
>  > cprop transformation only applies if the value is known and has
>  > the same value on all incoming branches.  The "jump bypass"
>  > optimization considers the situation where the value is known
>  > over some subset of the incoming edges.  If the condition is
>  > known to be satisfied the edge is redirected to the appropriate
>  > target.
> Right.  In theory all you have to do is get the available expression
> from each incoming edge, see if it allows for optimization of the 
> conditional and if it does, clean up the incoming edge to bypass
> the conditional.
> 
> Note this is not the same as sparse conditional constant propagation,
> though there is some overlap between the two.
> 
>  > where immediately after "done = true" the edge can be redirected out
>  > of the loop.  Hopefully, if "done" isn't used elsewhere live analysis
>  > will clean up the assignment, effectively turning it into a break.
> Correct.  If "done" isn't a memory location life analysis will zap it;
> if "done" is in a memory location, then it's much hard to eliminate.
> 
>  > [2] GCSE PSEUDO-ASSIGNMENT
>  > Common subexpression elimination is handled by two separate passes
>  > within GCC, CSE and GCSE.  In theory, the constant propagation
>  > transformations available to both should be the same, but with
>  > different scopes.  CSE should clean-up basic blocks and extended
>  > basic-blocks, whilst GCSE cprop should apply interblock.
> IMHO, CSE must go.  It's a freaking mess and the slowest of our
> RTL optimizers.  Long term it's easily replaced with dominator based

100% agreed. However we seems to be not getting near to this stage.
I am making slow progress on the midRTL. Right now I do have exams, then
I plan to concentrate on merging the existing patches into mainline
and just after that I will create the official tree.

Concerning the progress I've made so far is that I do have working prototype in
cfg branch for code lowering, minus side cases, like calls.  It appears to work
now reliably (there has been half a dozen of issues, suprisingly) I've put most
of code into the mainline in force_operand. It is not very long, just it has
been tricky to get it working right.  Still it does not bootstrap with lowering
on RISCy targets due to libcalls, however I've tried i386/m68k sucesfully.

I do have the code for multiple machine descriptions working, but I didn't
installed it into the branch yet.  It is roughly tested so next step I would
like to put the both together and then the fun begins - we need to handle the
side cases.  I have no idea how much work that can be and whether I can do it
for 3.3, however I would like to put there all important stuff - the
virtualization of interfaces and code to re-emit simple RTL expressions.  I
plan to create branch soon after finals, so in June.

Still midlevel RTL does not solve it. I was trying to play around the SSA code
but it appears to have problems as done currently, as commonly we create dead
blocks and we can't remove them because of hardwired basic block indexes in the
SSA representation.  THis makes impossible for me to get -fssa-dce nor
-fssa-ccp working, as it eventually dies by trying to do dataflow on
disconnected graph.  Zdenek's bb-renumbering killer patch solve this particular
problem and even when it has shown to be dificult to integrate to mainline, hope
it will find it's way in next few weeks.  However I am affraid that this is just
top of the iceberg with the SSA impleemtnation.

Still we will need lowlevel non-SSA CSE pass.  So question is,  how should the
CSE replacement pass look like, and what needs to be implemented for it.  I
can't voluntee muself, as I would like to prioritize higly the midRTL work
after 3.2 stage2, but I guess if there is some outline, someone else will try
to do the important task

> optimization on an ssa form, but, well, that work has stalled.  In
> the short term we might consider doing CSE on blocks in the dominator
> tree to at least remove its insane compile time complexity on complex
> flow graphs.

How this should work?

Honza


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]