This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: Replace rename_ssa_into_ssa with update_ssa
On Sat, Apr 23, 2005 at 03:52:35AM +0200, Zdenek Dvorak wrote:
> Hello,
>
> > This patch brings over most of the changes in tree-cleanup-branch.
> > I think that all that's left is the re-ordering of tree passes.
> >
> > The main change in this patch is the replacement of rewrite_ssa_into_ssa
> > with update_ssa (the incremental SSA updating mechanism). As a
> > side-effect of this change, it is no longer necessary to keep
> > virtual operands in loop-closed SSA form. This provides some
> > nice compile time speedups in the loop passes.
>
> hmm... I wonder whether this is really a good idea. This means that
> operations like loop unrolling will leave the ssa form for virtual
> operands broken, making it impossible to use it until update_ssa is
> called.
>
Well, yes. But this is true of any other updating scheme, after
you unroll the loop you have to update the SSA form before you
can do anything with the code. I think I'm missing your point.
Could you show me a test case with broken code?
> How much do we really gain by it? I.e. do you have some
> comparison between the compile time difference between your patch, and
> your patch that leaves the virtual operands in loop closed ssa form?
>
Having virtual operands in loop-closed form will obviously yield
higher compile times. There's simply more IL to grok through.
Or perhaps I'm not understanding your question correctly?
Making the obvious change to tree-ssa-loop-manip.c gives the
following for cc1-i-files:
'Before' is with virtuals NOT in loop-closed form.
'After' is with virtuals in loop-closed form.
Phase Before After % change
tree copy propagation 7.73 8.98 16.2%
tree SSA rewrite 7.22 9.30 28.8%
tree SSA incremental 22.11 30.32 37.1%
dominator optimization 35.31 36.50 3.4%
tree aggressive DCE 1.39 2.06 48.2%
loop invariant motion 0.90 2.66 195.6%
tree iv optimization 3.83 14.48 278.1%
tree loop init 1.28 14.13 1003.9%
tree copy headers 2.26 11.05 388.9%
tree SSA to normal 4.43 4.97 12.2%
TOTAL 760.88 823.29 8.2%
> If I understand your patch correctly, you have
> removed the specialized versions of ssa form updating from
> tree_duplicate_loop_to_header_edge and tree_duplicate_sese_region.
> Why? They should be more efficient than calling the generic updating.
>
Ah, yes, I forgot to mention this. Thanks for the reminder. The
main motivation was code factoring, but in this case, the
existing code updated 2 regions of code, the original region and
the copy. The call to update_ssa only updates the copied region,
so it's faster.
That's part of the reason that we get speedups in the loop code.
In fact, one of the experiments I made included several loop
optimizers and the vectorizer. The speedups in the loop passes
are more noticeable. For cc1-i-files, I get:
'Before' is using the specialized SSA form updating.
'After' is using the generic incremental SSA updating.
Phase Before After % change
tree PHI insertion 2.46 1.36 -44.7%
tree SSA rewrite 8.38 7.03 -16.1%
tree SSA other 8.36 5.07 -39.4%
tree SSA incremental 13.07 19.09 46.1%
loop invariant motion 2.67 1.01 -62.2%
tree loop init 2.48 1.22 -50.8%
tree copy headers 2.74 2.21 -19.3%
TOTAL 812.95 807.96 -0.6%
I was going to experiment with removing the specialized updating
done in the vectorizer. It's my impression that it goes to a lot
of effort for not much gain, but I have not found the time to do
this yet.
> Could you please remove my name from the changelog entry? I have
> nothing to do with this patch
>
This is an odd request. Credit where credit's due. You
implemented the code to replace calls to rewrite_ssa_into_ssa
with another scheme. I kept those hunks from your original
patch.
> (even if you were inspired by some of my patches, it was very,
> very losely and somewhat in contradiction of my ideas).
>
Inspired? No, I just took the code. The main problem I saw with
your original patch was the concept of local dominance. The code
went through heroic efforts to keep this information up-to-date
and the end result was a more complex code base that was also
slower. At the time, I compared both incremental methods and
got the following timings for cc1-i-files (as of 2005-03-17):
'Before' is the patch using local dominance.
'After' is the patch using an earlier version of update_ssa.
Phase Before After % change
tree VRP 5.62 4.20 -25.3%
tree copy propagation 8.03 7.04 -12.3%
tree store copy propagation 2.82 2.27 -19.5%
tree SSA rewrite 4.50 8.26 83.6%
tree SSA other 23.04 20.83 -9.6%
tree SSA incremental 11.21 13.30 18.6%
dominator optimization 20.38 18.37 -9.9%
tree FRE 13.61 12.83 -5.7%
tree PRE 8.10 7.42 -8.4%
tree conservative DCE 7.63 5.26 -31.1%
tree SSA to normal 6.35 5.54 -12.8%
TOTAL 786.43 761.35 -3.2%
One of the main things I'm after is maintenance. If we can use a
generic approach and not be slower, then we gain from a
maintenance point of view. If the generic code is actually
faster than the specialized version, even better.
So, I would try to encourage new passes to try and use update_ssa
whenever possible and only hand-code the SSA updating when the
generic updating is slower and can't be easily sped up.
The code in update_ssa tries to minimize unnecessary traversals
and visits as few blocks as possible. I expect that we will find
more opportunities for speeding up the code even further. The
code should not be hard to modify and is encapsulated.
If a pass absolutely wants to update the SSA form on its own,
then it can use the interface into the updater as the vectorizer
does. We export things like register_new_name_mapping,
create_new_def_for, ssa_names_to_replace, get_current_def and
set_current_def.
Diego.