


RE: [patch] Improve prefetch heuristics


________________________________________
From: Zdenek Dvorak [rakdver@kam.mff.cuni.cz]
Sent: Thursday, May 06, 2010 2:13 AM
To: Fang, Changpeng
Cc: gcc-patches@gcc.gnu.org; rguenther@suse.de; sebpop@gmail.com
Subject: Re: [patch] Improve prefetch heuristics

Hi,

>  I attach the original patch here for further discussion.
>
> >> Patch5: 0005-Also-apply-the-insn-to-prefetch-ratio-heuristic-to-l.patch
> >> This patch applies the instruction to prefetch ratio heuristic also to loops with
> >> known loop trip count.  It improves 416.gamess by 3~4% and 445.gobmk by 3%.
>
> >I think it would be better to find out why the prefetching for the loops in
> >these examples is currently performed (or why it causes the degradation).  If
> >the instruction count is too small, AHEAD should be big enough to prevent the
> >prefetching.  Currently we just test whether est_niter <= ahead, which is
> >rather aggressive. We should only emit the prefetches if # of iterations is
> >significantly bigger than ahead, so that could be the place that needs to be
> >fixed.  Which ...
>
> I found that whenever the instruction to prefetch ratio is too small, there is a performance
> degradation if we insert the prefetches.
>
> Here is an example dump from engine/dragon.c in 445.gobmk benchmark:
> Ahead 1, unroll factor 1, trip count 379, insn count 351, mem ref count 95, prefetch count 85
>
> And another example from dftint.fppized.f in 416.gamess benchmark:
> Ahead 4, unroll factor 1, trip count 216, insn count 52, mem ref count 15, prefetch count 15
>
> You may see that too many prefetches are generated in the loop.  One possible reason
> is that the memory references are not grouped appropriately, so a prefetch is generated for
> each memory reference (there should be one per group).  However, even if everything is
> computed correctly, the question remains: will this many prefetches help performance?
>
> My experiments say no because we see degradations in these two benchmarks.
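
Just to make the two checks concrete, here is a rough sketch of how they could be
combined.  The helper name, the constants and their values are placeholders for
illustration, not the actual names or defaults used in tree-ssa-loop-prefetch.c:

  #include <stdbool.h>

  #define TRIP_COUNT_TO_AHEAD_RATIO   4    /* placeholder value */
  #define MIN_INSN_TO_PREFETCH_RATIO  10   /* placeholder value */

  /* Rough sketch: decide whether issuing prefetches in a loop is likely to
     pay off, based on the estimated trip count and the amount of other work
     per prefetch.  est_niter < 0 means the trip count is unknown.  */
  static bool
  loop_prefetching_profitable_p (unsigned ahead, long est_niter,
                                 unsigned insn_count, unsigned prefetch_count)
  {
    /* Require the trip count to be significantly larger than the prefetch
       distance, not merely larger (est_niter <= ahead is too aggressive).  */
    if (est_niter >= 0
        && est_niter < (long) (TRIP_COUNT_TO_AHEAD_RATIO * ahead))
      return false;

    /* Require enough non-prefetch instructions per prefetch.  The gobmk loop
       above (351 insns, 85 prefetches) and the gamess loop (52 insns, 15
       prefetches) both fail a check like this.  */
    if (insn_count < prefetch_count * MIN_INSN_TO_PREFETCH_RATIO)
      return false;

    return true;
  }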

>in principle, I agree that the patch might be reasonable.  But, I would like to see
>the comparison with the results that you get by decreasing SIMULTANEOUS_PREFETCHES to
>some reasonable value (say 10),

Yes, SIMULTANEOUS_PREFETCHES is a good parameter to tune if we observe too many
prefetches.  By the way, 10 is too small for the amd64 architecture.  I am going to run more
experiments on this (the current default value of 100 seems too large).
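
(For a quick experiment, assuming SIMULTANEOUS_PREFETCHES maps to the usual
simultaneous-prefetches --param, the value can also be overridden per compilation
without touching the target definition, e.g.

  gcc -O3 -fprefetch-loop-arrays --param simultaneous-prefetches=10 -c engine/dragon.c

where the file is just the gobmk example from the dump above.)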

Thanks,

Changpeng






