This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: [PATCH] Fix code quality regression on UltraSPARC


On Sun, 12 Dec 2004, Eric Botcazou wrote:
> Unless we always scan all the edges until we find the FALLTHRU.
> I originally didn't want to, so the proposed heuristics is pretty
> limited.

I'm happy with your current solution, such that if there are two
loop entries, and A follows B, then it's better to place the new
preheader after A.  For the gzip regression this is clearly sufficient.


I was just thinking out loud about further refinements (you did ask
in your original post).  For example, I started off by thinking that
you could refine your patch from "A follows B", to "A is dominated
by B", or "A is immediately dominated by B".  However, I believe that
the correct placement is determined purely by the "fallthru" edges.
Graph-theoretically, this is the only property affected by your
proposed change.

[I apologise for any confusion with the terminology below.  When I
talk about the fallthru edges into the loop header, I'm talking about
the placement of edges *before* we create the preheader.  After we
insert the preheader there should only be one entry edge into the
loop header, which is from the new preheader.]


The reason this code is here at all, and we don't simply always place
the preheader immediately before the loop header, is to avoid badly
placing the preheader "in the loop", i.e. when the latch edge is a
fallthru edge, such as in a rotated "while (..) {}" loop.  In this case,
placing the preheader there changes loop edges from fallthru to
unconditional jumps, which obviously hurts performance.  Instead,
the preheader should be placed after an arbitrary loop entry edge.
In all other cases (the latch edge isn't a fallthru edge) it's best
to place the preheader immediately before the loop header.  If there
are no fallthru edges into the loop header, the preheader can
alternatively be placed after an arbitrary predecessor block with no
adverse change in the number of unconditional jumps.
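
As a rough illustration of the placement rule described above, here is
a minimal standalone sketch.  It is not the actual create_preheader
code: the types `struct in_edge` and `struct decision` and the function
`choose_preheader_placement` are invented for this example, and a real
CFG carries far more state than the two flags modelled here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified model of one edge into the loop header,
   as it exists *before* the preheader is created.  */
struct in_edge {
    int src_block;     /* index of the predecessor block */
    bool is_latch;     /* edge comes from inside the loop */
    bool is_fallthru;  /* source block falls through into the header */
};

enum placement {
    PLACE_BEFORE_HEADER,     /* immediately before the loop header */
    PLACE_AFTER_ENTRY_BLOCK  /* after the source of a loop entry edge */
};

struct decision {
    enum placement where;
    int after_block;   /* meaningful only for PLACE_AFTER_ENTRY_BLOCK */
};

/* If the latch edge is a fallthru edge, keep the preheader out of the
   loop body by placing it after an (arbitrary) entry edge's source.
   Otherwise place it immediately before the loop header.  */
static struct decision
choose_preheader_placement(const struct in_edge *edges, size_t n)
{
    struct decision d = { PLACE_BEFORE_HEADER, -1 };
    bool latch_falls_through = false;
    int first_entry = -1;

    assert(edges != NULL && n > 0);

    for (size_t i = 0; i < n; i++) {
        if (edges[i].is_latch) {
            if (edges[i].is_fallthru)
                latch_falls_through = true;
        } else if (first_entry < 0) {
            /* Any entry edge will do; take the first one seen.  */
            first_entry = edges[i].src_block;
        }
    }

    if (latch_falls_through && first_entry >= 0) {
        d.where = PLACE_AFTER_ENTRY_BLOCK;
        d.after_block = first_entry;
    }
    return d;
}
```

Taking the first entry edge is only one way to pick the "arbitrary"
entry; a "follows"-style heuristic would refine that choice.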

Your "follows" heuristic captures some aspects of this logic.


I also want to thank Kazu for pointing out on IRC that we already
loop over all incoming edges at the top of create_preheader.  This
means that we can put some more effort into placing the preheader
intelligently without affecting the big-O of this routine.  I believe
that just inspecting the first N edges for a fallthru edge, where
N is four or five, should be sufficient to capture most of the benefit
without any severe performance impact.
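
The bounded scan suggested above could look something like the sketch
below.  Again this is only an illustration under stated assumptions:
`struct pred_edge`, `find_fallthru_bounded` and the constant
`MAX_EDGES_TO_SCAN` are hypothetical names, and the limit of 4 is just
one of the "four or five" values mentioned.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed scan limit; the text suggests four or five.  */
#define MAX_EDGES_TO_SCAN 4

/* Hypothetical, simplified predecessor edge of the loop header.  */
struct pred_edge {
    int src_block;     /* index of the predecessor block */
    bool is_fallthru;  /* edge is a fallthru edge into the header */
};

/* Inspect at most LIMIT of the N incoming edges and return the index
   of the first fallthru edge found, or -1 if none is seen within the
   limit.  The cost is bounded by LIMIT regardless of N.  */
static int
find_fallthru_bounded(const struct pred_edge *edges, size_t n, size_t limit)
{
    size_t stop = n < limit ? n : limit;

    assert(edges != NULL || n == 0);

    for (size_t i = 0; i < stop; i++)
        if (edges[i].is_fallthru)
            return (int) i;
    return -1;
}
```

A fallthru edge that sits beyond the limit is simply missed, which is
the trade-off being accepted in exchange for the bounded cost.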


I hope this makes my comments somewhat clearer.  If not, post the
updated patch that you have (which I'm happy to approve) to address
the urgent performance regression, and refinements can be investigated
as follow-up patches.

Roger
--

