This is the mail archive of the
mailing list for the GCC project.
RE: Patch to Avoid Bad Prefetching
- From: "Shobaki, Ghassan" <Ghassan dot Shobaki at amd dot com>
- To: "Richard Guenther" <richard dot guenther at gmail dot com>, "Zdenek Dvorak" <rakdver at kam dot mff dot cuni dot cz>
- Cc: <gcc-patches at gcc dot gnu dot org>
- Date: Thu, 4 Jun 2009 12:24:18 -0700
- Subject: RE: Patch to Avoid Bad Prefetching
- References: <D60441ACF713E84E9952ADA42F2E08B716728E@ssvlexmb2.amd.com> <D60441ACF713E84E9952ADA42F2E08B70149981A@ssvlexmb2.amd.com> <20090416010249.GB21250@kam.mff.cuni.cz> <D60441ACF713E84E9952ADA42F2E08B70149988B@ssvlexmb2.amd.com> <20090416052522.GB3021@kam.mff.cuni.cz> <D60441ACF713E84E9952ADA42F2E08B701499986@ssvlexmb2.amd.com> <20090416172534.GB19702@kam.mff.cuni.cz> <firstname.lastname@example.org> <912DA18E911D8B418824641EF1541F3C417E84@ssvlexmb2.amd.com> <20090603125414.GA16367@kam.mff.cuni.cz> <email@example.com>
I agree with Zdenek that the first heuristic is imprecise and needs improvement. At least it has to distinguish between simple integer arithmetic ops and expensive FP ops. I am planning on doing this very soon. As I explained in the previous email, that will require computing an instruction-by-instruction time estimate using the estimates that GCC already computes but adapting them to properly account for the costs of cache missing memory ops.
I ran stream and it turns out that Zdenek's guess was right. The patched compiler caused regressions on two out of four scores (Add and Triad). However, the regressions went away when I set the prefetch_min_insn_to_mem_ratio parameter to 3 instead of the current default value of 4. Here are the scores I got. The numbers below are improvements in avg. time relative to no prefetching:
Copy Scale Add Triad
Current Prefetcher 28% 29% 31% 31%
Patch with param=4 27% 28% 1% 1%
Patch with param=3 28% 29% 31% 31%
Since a parameter value of 3 worked well on stream, I tested CPU2006 with a parameter value of 3 instead of 4. The results were the same on INT2006 but the improvement was slightly less on FP2006. Nevertheless, the patch still gave a significant improvement relative to the current GCC prefercher with no significant regressions. Here are the improvement numbers relative to the current GCC prefetcher:
Patch with param = 4: INT2006 3.62% FP2006 8.72%
Patch with param = 3: INT2006 3.66% FP2006 7.59%
Here are the benchmarks that are significantly affected by this parameter. The numbers below are percentage improvements relative to no prefetching:
Benchmark current_prefetch patch_param=4 patch_param=3
Hmmer -29% 0% 2%
Zeusmp -10% -2% -5%
Leslie -14% 0% -6%
Calculix -11% -1% -4%
Tonto -9% -1% -7%
All other benchmarks had the same improvements relative to the current prefetcher with both param=3 and param=4. So, the results for those are the same as those reported in my first email.
In conclusion, the patch with param=3 gives good results on both benchmarks and does cause any regressions relative to what is currently in trunk. So, I suggest that I check in this patch with a default value of 3 for the parameter prefetch_min_insn_to_mem_ratio and then work on enhancing the first heuristic in a subsequent patch.
I have fixed the formatting error caught by Zdeneatk and added entries for the two parameters in doc/invoke.texi. The patch bootstrapped successfully using the latest trunk (checked out last night). I am currently running "make check" with the patch. I will send the slightly revised patch after it passes.
From: Richard Guenther [mailto:firstname.lastname@example.org]
Sent: Wednesday, June 03, 2009 7:20 AM
To: Zdenek Dvorak
Cc: Shobaki, Ghassan; email@example.com
Subject: Re: Patch to Avoid Bad Prefetching
On Wed, Jun 3, 2009 at 2:54 PM, Zdenek Dvorak <firstname.lastname@example.org> wrote:
> could you please test the patch on the stream benchmark? ?It exhibits
> several opportunities for prefetching/movnt instructions that we would
> not like to miss, and I somewhat suspect that the first heuristic could
> cause us to. ?The patch is OK assuming the results are fine. ?It would
> be nice to improve the heuristics later, especially the first one looks
> a bit too imprecise to me.
>> + ? ? ?if(insn_to_prefetch_ratio < MIN_INSN_TO_PREFETCH_RATIO)
>> + ? ? return false;
>> + ? ? ?return true;
> space after if; or better, return insn_to_prefetch_ratio >= MIN_INSN_TO_PREFETCH_RATIO;
>> Index: params.def
>> --- params.def ? ? ? ?(revision 145634)
>> +++ params.def ? ? ? ?(working copy)
>> @@ -761,6 +761,17 @@ DEFPARAM (PARAM_LOOP_INVARIANT_MAX_BBS_I
>> ? ? ? ? "max basic blocks number in loop for loop invariant motion",
>> ? ? ? ? 10000, 0, 0)
>> +DEFPARAM (PARAM_MIN_INSN_TO_PREFETCH_RATIO,
>> + ? ? ? "min-insn-to-prefetch-ratio",
>> + ? ? ? "min. ratio of insns to prefetches to enable prefetching for "
>> + ? ? ? ? ?"a loop with an unknown trip count",
>> + ? ? ? 10, 0, 0)
>> +DEFPARAM (PARAM_PREFETCH_MIN_INSN_TO_MEM_RATIO,
>> + ? ? ? "prefetch-min-insn-to-mem-ratio",
>> + ? ? ? "min. ratio of insns to mem ops to enable prefetching in a loop",
>> + ? ? ? 4, 0, 0)
> You also need to document the params in doc/invoke.texi.
Indeed. The patch also needs to pass bootstrap & regtest on the