This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



Re: tree-ssa status (was: Re: Dropping of old loop optimizer)


In message <1046379746 dot 3100 dot 23 dot camel at shadowfax>, Diego Novillo writes:
 >On Thu, 2003-02-27 at 15:31, Steven Bosscher wrote:
 >
 >> It would be interesting to know the total time spent in the RTL
 >> optimizers for mainline and branch...
 >> 
 >Yes, it would.  This is something that Jeff Law has started to do
 >recently.  Most of the recent compile time improvements come from
 >profiling the implementation and finding obvious hot spots.  We will be
 >doing lots of that in the coming weeks/months.
To be more precise, I've been measuring time in the SSA path independently
from the rest of the compiler.  This makes it a lot easier to see where
the SSA path is wasting time (i.e., it isn't hidden by something like cse
or gcse going crazy).

I haven't tried to measure and compare the time the two branches spend
in the RTL optimizers.  We'll probably get to that point in the
not-too-distant future.

 >> The worst slowdown is obviously in 179.art.  What's so special about it
 >> that makes the branch twice as slow?
 >> 
 >No idea yet.  This is part of what still needs to be done.
Right.  Our focus so far has been on getting the underlying infrastructure
in place and, more recently, on making sure that infrastructure runs reasonably
quickly.  We haven't started looking at the generated code yet.  It won't
make a lot of sense to do that until we have a real translator out of SSA.

 >> Define "unnecessary"...
 >> 
 >If we can make simplifying assumptions in the RTL optimizers, we could
 >make them run faster.  This of course would need to be predicated.  As
 >you point out, not all the front ends go through tree-ssa.  It is still
 >unclear to me whether this is impossible or merely difficult.
One canonical example I use is null pointer check elimination, which
can be completely subsumed by a tree-ssa version.  We already know that
the RTL null pointer check elimination is relatively slow and memory
intensive (that's why it processes the bitvectors in blocks rather than
doing everything in parallel).

The other canonical example is all the path following performed by cse1.
I believe there is an excellent chance we'll be able to have cse1 do only a
block-local CSE once the tree-ssa code is doing some basic value numbering
and copy prop on the dominator tree.  The path-following code is also known
to be a major CPU hog.
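A similarly hedged sketch of value numbering with copy propagation over the dominator tree, again on a hypothetical toy IR (`insn`, `bb`, `dom_value_number` and the opcodes are made up for illustration; this is neither cse1 nor any actual tree-ssa pass).  Each SSA name maps to a canonical name that holds its value; a copy simply forwards that mapping, and each `(op, a, b)` expression is recorded in a scoped table so it is reused only in blocks dominated by the one that computed it.  A linear-scan stack stands in for the hash table a real implementation would use.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy IR in SSA form: dst = op(src1, src2) or dst = src1. */
enum opcode { OP_COPY, OP_ADD, OP_SUB, OP_MUL };

struct insn {
    enum opcode op;
    int dst, src1, src2; /* SSA names; src2 is ignored for OP_COPY */
    bool redundant;      /* set when dst recomputes an available value */
};

struct bb {
    struct insn *insns;
    int n_insns;
    struct bb **dom_children; /* children in the dominator tree */
    int n_children;
};

#define MAX_NAMES 64
#define MAX_EXPRS 256

struct avail { enum opcode op; int a, b, name; };

struct vn_state {
    int value[MAX_NAMES];          /* canonical name holding each name's value */
    struct avail stack[MAX_EXPRS]; /* scoped table of available expressions */
    int top;
};

static int lookup(const struct vn_state *st, enum opcode op, int a, int b)
{
    for (int i = st->top - 1; i >= 0; i--)
        if (st->stack[i].op == op && st->stack[i].a == a && st->stack[i].b == b)
            return st->stack[i].name;
    return -1;
}

static int vn_block(struct bb *block, struct vn_state *st)
{
    int saved_top = st->top; /* entries pushed below are scoped to this subtree */
    int redundant = 0;
    for (int i = 0; i < block->n_insns; i++) {
        struct insn *in = &block->insns[i];
        assert(in->dst >= 0 && in->dst < MAX_NAMES);
        assert(in->src1 >= 0 && in->src1 < MAX_NAMES);
        int a = st->value[in->src1];
        if (in->op == OP_COPY) {
            st->value[in->dst] = a; /* copy propagation: dst just forwards src1 */
            continue;
        }
        assert(in->src2 >= 0 && in->src2 < MAX_NAMES);
        int b = st->value[in->src2];
        if ((in->op == OP_ADD || in->op == OP_MUL) && a > b) {
            int t = a; a = b; b = t; /* canonical operand order for commutative ops */
        }
        int prev = lookup(st, in->op, a, b);
        if (prev >= 0) {
            st->value[in->dst] = prev; /* reuse the dominating computation */
            in->redundant = true;
            redundant++;
        } else {
            st->value[in->dst] = in->dst;
            assert(st->top < MAX_EXPRS);
            st->stack[st->top++] = (struct avail){in->op, a, b, in->dst};
        }
    }
    for (int c = 0; c < block->n_children; c++)
        redundant += vn_block(block->dom_children[c], st);
    st->top = saved_top; /* pop this block's expressions: only its dominated blocks see them */
    return redundant;
}

/* Returns the number of instructions found to be redundant. */
int dom_value_number(struct bb *entry)
{
    struct vn_state st;
    st.top = 0;
    for (int v = 0; v < MAX_NAMES; v++)
        st.value[v] = v; /* each name initially stands for its own value */
    return vn_block(entry, &st);
}
```

Restoring the stack height on exit from a block is what keeps the table scoped to the dominator subtree, so no path following across the CFG is needed.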

In both cases the SSA equivalent optimizations can be made bloody fast
and memory efficient.


 >> > So, it's a lot of work.  Will it be ready for 3.5's stage1?  I
 >> > don't know.  Particularly if the list of requirements grows
 >> > bigger.  The integration work will also be interesting.  I diff'd
 >> > mainline and the branch a few days ago:
 >> > 
 >> >  307 files changed, 80994 insertions(+), 4342 deletions(-)
 >> 
 >> How much of those insertions are in new files?
 >> 
 >Good point.  About 70000.
Also keep in mind that the change to carry file/line information in a common
location touched a lot of code in basically mindless ways.

Jeff

