This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: Replace rename_ssa_into_ssa with update_ssa


On Sat, Apr 23, 2005 at 03:52:35AM +0200, Zdenek Dvorak wrote:
> Hello,
> 
> > This patch brings over most of the changes in tree-cleanup-branch.  
> > I think that all that's left is the re-ordering of tree passes.
> > 
> > The main change in this patch is the replacement of rewrite_ssa_into_ssa
> > with update_ssa (the incremental SSA updating mechanism).  As a
> > side-effect of this change, it is no longer necessary to keep
> > virtual operands in loop-closed SSA form.  This provides some
> > nice compile time speedups in the loop passes.
> 
> hmm... I wonder whether this is really a good idea.  This means that
> operations like loop unrolling will leave the ssa form for virtual
> operands broken, making it impossible to use it until update_ssa is
> called.
>
Well, yes.  But this is true of any other updating scheme, after
you unroll the loop you have to update the SSA form before you
can do anything with the code.  I think I'm missing your point.
Could you show me a test case with broken code?

> How much do we really gain by it?  I.e. do you have some
> comparison between the compile time difference between your patch, and
> your patch that leaves the virtual operands in loop closed ssa form?
> 
Having virtual operands in loop-closed form will obviously yield
higher compile times.  There's simply more IL to grok through.
Or perhaps I'm not understanding your question correctly?

Making the obvious change to tree-ssa-loop-manip.c gives the
following for cc1-i-files:

'Before' is with virtuals NOT in loop-closed form.
'After' is with virtuals in loop-closed form.

Phase                           Before  After   % change
tree copy propagation            7.73    8.98     16.2%
tree SSA rewrite                 7.22    9.30     28.8%
tree SSA incremental            22.11   30.32     37.1%
dominator optimization          35.31   36.50      3.4%
tree aggressive DCE              1.39    2.06     48.2%
loop invariant motion            0.90    2.66    195.6%
tree iv optimization             3.83   14.48    278.1%
tree loop init                   1.28   14.13   1003.9%
tree copy headers                2.26   11.05    388.9%
tree SSA to normal               4.43    4.97     12.2%
TOTAL                          760.88  823.29      8.2%

> If I understand your patch correctly, you have
> removed the specialized versions of ssa form updating from
> tree_duplicate_loop_to_header_edge and tree_duplicate_sese_region.
> Why?  They should be more efficient than calling the generic updating.
> 
Ah, yes, I forgot to mention this.  Thanks for the reminder.  The
main motivation was code factoring, but in this case, the
existing code updated 2 regions of code, the original region and
the copy.  The call to update_ssa only updates the copied region,
so it's faster.

That's part of the reason that we get speedups in the loop code.
In fact, one of the experiments I made included several loop
optimizers and the vectorizer.  The speedups in the loop passes
are more noticeable.  For cc1-i-files, I get:

'Before' is using the specialized SSA form updating.
'After' is using the generic incremental SSA updating.

Phase                           Before  After   % change
tree PHI insertion               2.46    1.36    -44.7%
tree SSA rewrite                 8.38    7.03    -16.1%
tree SSA other                   8.36    5.07    -39.4%
tree SSA incremental            13.07   19.09     46.1%
loop invariant motion            2.67    1.01    -62.2%
tree loop init                   2.48    1.22    -50.8%
tree copy headers                2.74    2.21    -19.3%
TOTAL                          812.95  807.96     -0.6%

I was going to experiment with removing the specialized updating
done in the vectorizer.  It's my impression that it goes to a lot
of effort for not much gain, but I have not found the time to do
this yet.


> Could you please remove my name from the changelog entry?  I have
> nothing to do with this patch
>
This is an odd request.  Credit where credit's due.  You
implemented the code to replace calls to rewrite_ssa_into_ssa
with another scheme.  I kept those hunks from your original
patch.

> (even if you were inspired by some of my patches, it was very,
> very losely and somewhat in contradiction of my ideas).
> 
Inspired?  No, I just took the code.  The main problem I saw with
your original patch was the concept of local dominance.  The code
went through heroic efforts to keep this information up-to-date
and the end result was a more complex code base that was also
slower.   At the time, I compared both incremental methods and
got the following timings for cc1-i-files (as of 2005-03-17):

'Before' is the patch using local dominance.
'After' is the patch using an earlier version of update_ssa.

Phase                           Before  After   % change
 tree VRP                        5.62    4.20    -25.3%
 tree copy propagation           8.03    7.04    -12.3%
 tree store copy propagation     2.82    2.27    -19.5%
 tree SSA rewrite                4.50    8.26     83.6%
 tree SSA other                 23.04   20.83     -9.6%
 tree SSA incremental           11.21   13.30     18.6%
 dominator optimization         20.38   18.37     -9.9%
 tree FRE                       13.61   12.83     -5.7%
 tree PRE                        8.10    7.42     -8.4%
 tree conservative DCE           7.63    5.26    -31.1%
 tree SSA to normal              6.35    5.54    -12.8%
 TOTAL                         786.43  761.35     -3.2%


One of the main things I'm after is maintenance.  If we can use a
generic approach and not be slower, then we gain from a
maintenance point of view.  If the generic code is actually
faster than the specialized version, even better.

So, I would try to encourage new passes to try and use update_ssa
whenever possible and only hand-code the SSA updating when the
generic updating is slower and can't be easily sped up.

The code in update_ssa tries to minimize unnecessary traversals
and visits as few blocks as possible.  I expect that we will find
more opportunities for speeding up the code even further.  The
code should not be hard to modify and is encapsulated.

If a pass absolutely wants to update the SSA form on its own,
then it can use the interface into the updater as the vectorizer
does.  We export things like register_new_name_mapping,
create_new_def_for, ssa_names_to_replace, get_current_def and
set_current_def.


Diego.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]