This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: [Patch PR 44576]: Reduce the computation cost in compute_miss_rate for prefetching loop arrays
- From: Sebastian Pop <sebpop at gmail dot com>
- To: Richard Guenther <richard dot guenther at gmail dot com>
- Cc: "Fang, Changpeng" <Changpeng dot Fang at amd dot com>, Zdenek Dvorak <rakdver at kam dot mff dot cuni dot cz>, Christian Borntraeger <borntraeger at de dot ibm dot com>, "gcc-patches at gcc dot gnu dot org" <gcc-patches at gcc dot gnu dot org>, "uweigand at de dot ibm dot com" <uweigand at de dot ibm dot com>
- Date: Fri, 2 Jul 2010 11:41:38 -0500
- Subject: Re: [Patch PR 44576]: Reduce the computation cost in compute_miss_rate for prefetching loop arrays
- References: <D4C76825A6780047854A11E93CDE84D02F7759@SAUSEXMBP01.amd.com> <AANLkTimL0po55_PnkbcCe2lbJNlaeWmcLeRqy48kTBen@mail.gmail.com>
On Wed, Jun 30, 2010 at 03:52, Richard Guenther
<richard.guenther@gmail.com> wrote:
> On Wed, Jun 30, 2010 at 2:06 AM, Fang, Changpeng <Changpeng.Fang@amd.com> wrote:
>> Hi,
>>
>> Attached is the patch that partially fixes bug 44576: ?testsuite/gfortran.dg/zero_sized_1.f90 with huge compile
>> time on prefetching + peeling.
>>
>> The cost of compute_miss_rate in tree-ssa-loop-prefetch.c is found to be very high, and it results in an exploding
>> compilation time when there are thousands of memory references in the loop.
>>
>> compute_miss_rate is to compute the possibility (miss_rate) that two memory references fall on difference cache
>> lines. We will insert prefetches for both references if the computed miss rate is greater than the experimentally
>> determined ACCEPTABLE_MISS_RATE.
>>
>> In this patch, compute_miss_rate is renamed as is_miss_rate_acceptable, and does an early return if the computed
>> miss rate is already greater than the ACCEPTABLE_MISS_RATE, and thus significantly reduces the cost for such
>> case. Note that the worst case cost is not affected by this patch.
>>
>> This patch reduces the compile time of the test case from 1m20'' to 1m03'' on an amd-linux64 system.
>> Note that without -fprefetching-loop-arrays, the compile time on the same system is 0m30''. The extra 33'' is mostly
>> spent in "compute_all_dependences (datarefs, &dependences, vloops, true)". ?I am not sure whether the cost in
>> ?dependence analysis for loop should be that high (with several thousands memory references in loop) or there
>> is a bug in dependence computation.
>
> Well, it seems that you compute the same information over-and-over
> by doing
>
> unsigned int
> tree_ssa_prefetch_arrays (void)
> {
> ...
> ?FOR_EACH_LOOP (li, loop, LI_FROM_INNERMOST)
> ? ?{
> ? ? ?if (dump_file && (dump_flags & TDF_DETAILS))
> ? ? ? ?fprintf (dump_file, "Processing loop %d:\n", loop->num);
>
> ? ? ?unrolled |= loop_prefetch_arrays (loop);
>
> static bool
> loop_prefetch_arrays (struct loop *loop)
> {
> ...
> ?determine_loop_nest_reuse (loop, refs, no_other_refs);
>
> static void
> determine_loop_nest_reuse (struct loop *loop, struct mem_ref_group *refs,
> ? ? ? ? ? ? ? ? ? ? ? ? ? bool no_other_refs)
> {
> ...
> ?if (loop->inner)
> ? ?return;
> ...
> ?/* Find the outermost loop of the loop nest of loop (we require that
> ? ? there are no sibling loops inside the nest). ?*/
> ?nest = loop;
> ?while (1)
> ? ?{
> ? ? ?aloop = loop_outer (nest);
>
> ? ? ?if (aloop == current_loops->tree_root
> ? ? ? ? ?|| aloop->inner->next)
> ? ? ? ?break;
>
> ? ? ?nest = aloop;
> ? ?}
> ...
> ?compute_all_dependences (datarefs, &dependences, vloops, true);
>
>
> In fact you should iterate _only_ over innermost loops like
>
> ?FOR_EACH_LOOP (li, loop, LI_ONLY_INNERMOST)
>
> does that make a difference?
>
>>
>> The patch passed Bootstrapping and regression tests.
>>
>> Is this patch OK to commit?
>
> Ok.
>
Committed r161728