This is the mail archive of the mailing list for the GCC project.
RE: Patch to Avoid Bad Prefetching
Please see my responses below.
From: Richard Guenther [mailto:firstname.lastname@example.org]
Sent: Wednesday, April 15, 2009 2:17 AM
To: Zdenek Dvorak
Cc: Shobaki, Ghassan; email@example.com
Subject: Re: Patch to Avoid Bad Prefetching
On Wed, Apr 15, 2009 at 7:50 AM, Zdenek Dvorak <firstname.lastname@example.org> wrote:
> thanks for looking into this issue. A changelog + bootstrap and
> regtesting information is missing in your submission.
>> Index: params.h
>> --- params.h  (revision 145634)
>> +++ params.h  (working copy)
>> @@ -172,4 +172,8 @@ typedef enum compiler_param
>>     PARAM_VALUE (PARAM_SWITCH_CONVERSION_BRANCH_RATIO)
>>  #define LOOP_INVARIANT_MAX_BBS_IN_LOOP \
>>     PARAM_VALUE (PARAM_LOOP_INVARIANT_MAX_BBS_IN_LOOP)
>> +#define PREFETCH_UNKNOWN_TRIP_COUNT \
>> +  PARAM_VALUE (PARAM_PREFETCH_UNKNOWN_TRIP_COUNT)
>> +#define PREFETCH_MAX_BLOCKS_TO_UNROLL \
>> +  PARAM_VALUE (PARAM_PREFETCH_MAX_BLOCKS_TO_UNROLL)
>>  #endif /* ! GCC_PARAMS_H */
> New params need to be documented (in doc/invoke.texi).
>> -  if (loop->num_nodes > 2)
>> +  if ((int)loop->num_nodes > PREFETCH_MAX_BLOCKS_TO_UNROLL+1)
>>      return false;
> Add space after (int), as well as before and after +.
I wonder why this test exists at all - shouldn't the size/time
estimates and the unrolling limits already cover the code growth?
[Ghassan] I am not quite sure why this test exists, but there is a comment there saying that loops with multiple basic blocks have been found not to benefit from unrolling in the prefetching pass. My experimentation has shown otherwise: unrolling loops with multiple basic blocks in the prefetcher can give a performance gain on some applications (for example, libquantum improved by about 2%). Note that this change just generalizes what's already there and has no effect by default.
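To make the discussion concrete, here is a hypothetical example (not taken from the patch or the benchmarks): a loop whose body contains a conditional has more than the two basic blocks the old hard-coded `loop->num_nodes > 2` test allowed, so the prefetching pass would previously refuse to unroll it regardless of any potential benefit.

```c
#include <stddef.h>

/* Hypothetical illustration: the conditional inside the loop body
   gives the loop more than two basic blocks, so the old hard-coded
   test excluded it from unrolling in the prefetching pass.  With the
   patch, the prefetch-max-blocks-to-unroll param makes that limit
   tunable.  */
long
sum_positive (const long *a, size_t n)
{
  long sum = 0;
  for (size_t i = 0; i < n; i++)
    if (a[i] > 0)   /* branch: the loop body is no longer one block */
      sum += a[i];
  return sum;
}
```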
>> +  if (is_loop_prefetching_profitable (ahead, est_niter) == false)
>> +    goto fail;
> This should be if (!is_loop_prefetching_profitable (ahead, est_niter))
>> +DEFPARAM (PARAM_PREFETCH_UNKNOWN_TRIP_COUNT,
>> +          "prefetch-unknown-trip-count",
>> +          "A guess for unknown trip counts to be used in prefetching profitability",
>> +          4, 0, 0)
> I think it would be better to use expected_loop_iterations if the
> estimate is not available, rather than introducing a new param
> (while it is likely not more precise, we have way too many params
> as it is).
I agree. IMHO for prefetching we should recommend profile-feedback.
[Ghassan] The point is that in the prefetching profitability analysis, we must use a very conservative estimate for unknown trip counts, or we may get a big performance degradation, as shown in the benchmark numbers I included. By conservative, I mean something like 3 or 4 iterations only.
In other words, if we don't know the trip count, we should assume the worst in this particular analysis to avoid seriously degrading performance. I wonder how the estimate of expected_loop_iterations is computed and whether it gives us the conservative estimate that we need here?
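As an illustration of this argument only, here is a simplified sketch of how a conservative fallback trip count changes the decision when the real count is unknown. This is not the GCC implementation: the function name merely echoes is_loop_prefetching_profitable, the cost test is invented, and only the fallback value 4 mirrors the proposed prefetch-unknown-trip-count param.

```c
#include <stdbool.h>

#define PREFETCH_UNKNOWN_TRIP_COUNT 4   /* conservative guess (assumed) */

/* Sketch of a profitability check: prefetching 'ahead' iterations in
   advance only pays off if the loop runs well past the prefetch
   distance.  When the trip count is unknown (est_niter < 0), fall
   back to a deliberately small guess so short loops are rejected.  */
static bool
loop_prefetching_profitable_p (long ahead, long est_niter)
{
  if (est_niter < 0)
    est_niter = PREFETCH_UNKNOWN_TRIP_COUNT;
  return est_niter > ahead;
}
```

With this shape, an unknown trip count behaves like a 4-iteration loop, so prefetching several iterations ahead is rejected rather than risked.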
As for profile feedback, we believe that the vast majority of real users do not use it. So, we want that majority to benefit from prefetching. After working on prefetching for quite a few months now, I can say that prefetching can give significant performance improvements on many applications without profile feedback. So, why should we make it necessary to use profile feedback? Do we have any statistics on what percentage of users actually use profile feedback?