This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: RFA: improvement to bt-load.c


> On Fri, Jan 23, 2004 at 02:35:05PM +0000, Joern Rennecke wrote:
> > -   If we have a profiling count available, we could use it here.  */
> > +   When we have a profiling count available, we get it as bb->frequency.
> > +   However, if not compiling with -fbranch-probabilities, or no valid
> > +   profile data is available, all the execution counts are zero, so
> > +   augment these counts with a heuristic based on loop depth.  */
> >  static int
> >  basic_block_freq (basic_block bb)
> >  {
> > -  return bb->frequency;
> > +  int loop_depth = MIN (10, (bb->loop_depth));
> > +
> > +  return bb->frequency + (1 << (loop_depth * 3));

rth:
> This shouldn't be needed?  See estimate_bb_frequencies.

Funny, the ChangeLog shows this was introduced in 2001, and it was
definitely not working in a snapshot from the 29th May 2001 mainline.
But is did a few simple tests in our current mainline, and it appears
to work now.  So the only thing remaining is to remove the outdated
piece of comment:

@@ -168,8 +173,7 @@ static int first_btr, last_btr;
 
 
 
-/* Return an estimate of the frequency of execution of block bb.
-   If we have a profiling count available, we could use it here.  */
+/* Return an estimate of the frequency of execution of block bb.  */
 static int
 basic_block_freq (basic_block bb)
 {



As this is just changing (or rather not changing) a heuristic, and
apparently not any material way, I don't thing a full regression
test is required for this variation of the patch - I think it is
an obvious one, after verifying that bb->frequency now has meaningful
values.  I did check that the compiler can be build and perform the
optimization, of course.

We have still a remaining issue with this, though:  when compiling
with -fbranch-probabilities and no data file is found, you get
this message:

l2.c: In function `f':
l2.c:8: note: file l2.gcda not found, execution counts assumed to be zero

and that is exactly what is does!
Thus, when you build with an automated process and have profile information
only for some of the files, or only some of the time, or with some
profile information not recognized as in the correct format, failure to
find valid profile information results in all execution counts to be
assumed to be zero.  This is particularily unexpected if you also
use the -fguess-branch-probability flag explicitly.
I am going to submit a separate patch to fall back to estimated
execution counts.

rth:
> The rest of it looks good.  I'm not sure we need to keep the old
> algorithm around though.  At the moment, SH is the only user of
> this code, so if you get better results there, great.  And, really,
> it makes sense that you would; I can't think of any reason the new
> would work worse than the old.

Is that an approval, or a request to discuss/remove the -fbtr-bb-exclusive
option first?

The problem is that the rest of the code was apparently designed
with the assumption that multiple uses of the same target register
in the same basic block are an exception.  This entire business with
the other_btr_uses* flags can prevent us re-using the register fully if
we hoist out all uses.  It would probably make sense to have something
more sophisticated to keep track of these lifeness issues, but so far
our performance analyses haven't turned up a case where this was
important.  It would not only need some work, but also a testcase to
verify that the optimization does the expected transformations to
properly implement such better tracking.
Since for the time being the lifeness tracking is still mostly based on
basic blocks, we might fare better with the old algorithm if there are
enough target registers to do two levels of hoisting branch target loads
out of inner loops, and a third level would be beneficial when the
inner loops have low or zero iteration count, or where we can safe
on call overhead for often-called functions that return quickly most
of the time.

The main benefit of the option is that it allows you to quickly verify
if a performance regression is due to the new algorithm.  When such
a regression is verified, I can check if the new algorith can be improved
to match the old one on the testcase.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]