This is the mail archive of the mailing list for the GCC project.


RE: Patch to Avoid Bad Prefetching

Please see my responses below.

-----Original Message-----
From: Zdenek Dvorak
Sent: Wednesday, April 15, 2009 6:03 PM
To: Shobaki, Ghassan
Subject: Re: Patch to Avoid Bad Prefetching


> An estimate of 10 iterations can lead to a serious degradation in
> performance if the actual trip count turns out to be less than 10 and
> less than the ahead distance.

so just require that the number of iterations is at least 3*ahead (I
think prefetching would not be useful for such loops with just a few
iterations, anyway), 
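
To make the suggested rule concrete, here is a minimal sketch in C. It is not GCC's actual code: the function name, the use of a negative value to mean "no estimate", and the types are all assumptions made for illustration. Only the 3*ahead threshold comes from the text above.

```c
#include <stdbool.h>

/* Hypothetical helper (not from GCC): returns true when the estimated
   trip count is at least three times the prefetch ahead distance.
   A negative est_niter is an assumed convention for "no estimate".  */
static bool
trip_count_justifies_prefetch (long est_niter, unsigned ahead)
{
  if (est_niter < 0)
    return false;               /* no estimate: do not prefetch */

  return est_niter >= 3L * (long) ahead;
}
```

With an ahead distance of 19, this check accepts a loop only when the estimate is 57 iterations or more, so an assumed estimate of 10 would reject it.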

[Ghassan] Fixing the unknown trip count at 10 will completely close the
door on all loops with small bodies (and consequently large ahead
distances). I have seen many such loops that benefit greatly from
prefetching. For example, the libquantum benchmark has a loop with a
small body and an ahead distance of 19, where one critical prefetch
improves performance by about 20%. We don't want to miss such
opportunities, right? So, I am proposing to be conservative by default
but still leave the door open for overriding the conservative behavior
upon the user's request. 
Another reasonable solution that is adopted by other compilers (such as
Pathscale, PGI and Sun Studio) is providing multiple levels of
prefetching: a conservative level of prefetching where many prefetching
opportunities may be missed and an aggressive level (or levels), where
more prefetching is issued at the risk of possibly degrading
performance. Then it will be up to the user to try the two or three
levels of prefetching and pick the one that works best for his
application. I think this is an option that is worth considering for

or possibly disable the prefetching completely if
we do not have a reasonable estimate (i.e., if
estimate_number_of_iterations fails).

[Ghassan] This will essentially mean disabling prefetching unless
profile feedback is used, which deprives most users of the
benefit of prefetching.

> However, the user can always override that conservative estimate by
> setting the prefetch-unknown-trip-count parameter that I am proposing.
> It seems to me that setting this parameter provides the programmer
> an easy way to give a hint about the trip count if he wants. 
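
The override described above could look roughly like the following sketch. Everything here is an assumption for illustration: the struct, the NITER_UNKNOWN sentinel, and the comparison against the ahead distance are hypothetical, and the field merely mirrors the proposed prefetch-unknown-trip-count parameter rather than any code in the patch.

```c
#include <stdbool.h>

/* Assumed sentinel meaning "the compiler has no trip-count estimate".  */
#define NITER_UNKNOWN (-1L)

/* Hypothetical policy object; unknown_trip_count stands in for the
   proposed prefetch-unknown-trip-count parameter.  */
struct prefetch_policy
{
  long unknown_trip_count;      /* trip count assumed when none is known */
};

/* Substitute the user-tunable value when no estimate exists, then
   refuse to prefetch if the loop is expected to finish before the
   first prefetched data could be used (an assumed criterion).  */
static bool
should_prefetch (const struct prefetch_policy *p, long est_niter,
                 unsigned ahead)
{
  long niter = (est_niter == NITER_UNKNOWN) ? p->unknown_trip_count
                                            : est_niter;
  return niter >= (long) ahead;
}
```

Under these assumptions, a small default value keeps prefetching off for unknown loops with a large ahead distance, while a user who knows the loop runs long can raise the value to turn it back on. A known estimate is never overridden.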

I think programmers are even less likely to play with the hundreds
of options gcc provides than to use profile feedback,

[Ghassan] I totally agree that programmers will not play with hundreds
of parameters trying to figure out what value to use for each parameter.
However, if a programmer is advised by a compiler developer or a
performance analyst to use a command-line option to improve the
performance of his application, I think he will most likely use it. If
we have a customer who is interested in optimizing the performance of
his application, recommending a couple of command-line options to him
will be much easier than asking him to compile with profile feedback.
Many people prefer profiling and tuning their application once (with the
right command-line options) to enabling profile feedback every time they
compile their application.

