This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: [patch] Improve prefetch heuristics
- From: Sebastian Pop <sebpop at gmail dot com>
- To: "Fang, Changpeng" <Changpeng dot Fang at amd dot com>
- Cc: Zdenek Dvorak <rakdver at kam dot mff dot cuni dot cz>, GCC Patches <gcc-patches at gcc dot gnu dot org>, "changpeng dot fang at gmail dot com" <changpeng dot fang at gmail dot com>
- Date: Fri, 7 May 2010 12:27:28 -0500
- Subject: Re: [patch] Improve prefetch heuristics
- References: <20100430010543.GA30055@kam.mff.cuni.cz> <1C13CD442679CE45A2E80AE9251D7EF921803A3C@SAUSEXMBP01.amd.com> <20100507095121.GA27016@kam.mff.cuni.cz> <u2pcb9d34b21005070917xea0bb06eh4c5e77606af52044@mail.gmail.com> <r2xcb9d34b21005070921paaf8df6era7fb1f385bda1ecd@mail.gmail.com> <1C13CD442679CE45A2E80AE9251D7EF921803A3E@SAUSEXMBP01.amd.com>
Committed 0004 as of revision r159163.
Sebastian Pop
--
AMD / Open Source Compiler Engineering / GNU Tools
On Fri, May 7, 2010 at 11:58, Fang, Changpeng <Changpeng.Fang@amd.com> wrote:
> Hi, Sebastian:
>
> I updated patch 0004 as suggested by Zdenek. I am still working on 0001.
>
> Thanks,
>
> Changpeng
>
>
> ________________________________________
> From: Sebastian Pop [sebpop@gmail.com]
> Sent: Friday, May 07, 2010 11:21 AM
> To: Zdenek Dvorak; GCC Patches
> Cc: Fang, Changpeng; changpeng.fang@gmail.com
> Subject: Re: [patch] Improve prefetch heuristics
>
> Somehow gcc-patches got dropped from my CC.
> I'm sending it back to the list.
>
> Sebastian
>
> On Fri, May 7, 2010 at 11:17, Sebastian Pop <sebpop@gmail.com> wrote:
>> Changpeng,
>>
>> I committed 0002 and 0003 to trunk as revisions r159161 and r159162.
>> Could you please send the corrected version of 0001 and add the
>> suggested comment to 0004.
>>
>> Thanks,
>> Sebastian
>>
>> On Fri, May 7, 2010 at 04:51, Zdenek Dvorak <rakdver@kam.mff.cuni.cz> wrote:
>>> Hi,
>>>
>>>> 0001-Reduce-useless-and-redundant-prefetches.patch
>>>> ==========================================
>>>> This patch has two parts:
>>>> The first one is in schedule_prefetches. If the actual unroll_factor is far less than what is
>>>> required by the prefetch (i.e. the loop is not sufficiently unrolled), redundant prefetches
>>>> will be introduced. For example, if the prefetch requires unrolling 16 times and the actual
>>>> unroll factor is 1 (not unrolled), 15 out of 16 iterations of the loop will issue redundant
>>>> prefetches (16 prefetches fall on the same cache line).
>>>> We add the following lines in schedule_prefetches to disable prefetches for such cases:
>>>> if (prefetch_mod / unroll_factor > 8)
>>>>   continue;
>>>
>>> this part is OK.
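A minimal sketch of the check quoted above, pulled out into a standalone predicate so the arithmetic is easy to try. The function name and the macro name are hypothetical; only the expression prefetch_mod / unroll_factor > 8 comes from the quoted patch description, and the sketch assumes unroll_factor is at least 1.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical name for the constant 8 that appears in the quoted check.  */
#define MAX_MOD_TO_UNROLL_RATIO 8

/* Return true when a prefetch should be skipped because the loop is
   unrolled far less than the prefetch requires, so that most iterations
   would prefetch a cache line that has already been requested.
   Assumption: unroll_factor >= 1 (1 means "not unrolled").  */
bool
skip_redundant_prefetch (unsigned prefetch_mod, unsigned unroll_factor)
{
  assert (unroll_factor > 0);
  return prefetch_mod / unroll_factor > MAX_MOD_TO_UNROLL_RATIO;
}
```

With the numbers from the example (prefetch_mod of 16, unroll factor of 1) the quotient is 16, so the prefetch is skipped; unrolling twice brings the quotient down to 8 and the prefetch is kept.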
>>>
>>>> The second part is in issue_prefetch_refs. If, for some reason, the computed "ahead"
>>>> is too small, and the prefetch would most likely fall on the same cache line as the existing
>>>> memory reference, the prefetch is considered useless. This patch avoids such useless
>>>> prefetches.
>>>>
>>>> Prefetch distance is how far ahead we should issue the prefetch. The ideal prefetch distance
>>>> is the prefetch latency. We should not schedule the prefetch too early or too late. In loop prefetching,
>>>> we compute ahead, which is how many iterations ahead we should issue the prefetch. Essentially,
>>>> ahead = prefetch_latency/loop_body_size. If the loop body is too big, ahead is very small (possibly
>>>> less than 1 -- we round it to 1), and thus the address difference from the memory reference is too
>>>> small.
>>>
>>> This part is not. issue_prefetch_refs is the wrong place for this decision; the decision which prefetches
>>> will be issued should be done in schedule_prefetches. Anyway, if I follow your reasoning, one
>>> would conclude that we need to disable the prefetching for loops with PREFETCH_LATENCY < time
>>> (in loop_prefetch_arrays) completely. This IMHO makes no sense, as one would expect the
>>> prefetching to be profitable exactly for these loops, that do enough extra work to make it
>>> possible to hide memory latency through prefetching.
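To make the "ahead" arithmetic in this exchange concrete, here is a small sketch under stated assumptions. The function name is hypothetical, and rounding up is an assumption chosen to match "less than 1 -- we round it to 1"; the latency and loop-time values in the note below are made up for illustration.

```c
#include <assert.h>

/* Number of iterations ahead at which to issue a prefetch, computed as
   prefetch_latency / loop_time rounded up.  Rounding up is an assumption
   consistent with "we round it to 1" in the quoted description.
   Both arguments are in the same (arbitrary) time units.  */
unsigned
compute_ahead (unsigned prefetch_latency, unsigned loop_time)
{
  assert (loop_time > 0);
  return (prefetch_latency + loop_time - 1) / loop_time;
}
```

For a latency of 200 units, a loop body taking 50 units gives ahead = 4, while a body taking 300 units gives ahead = 1. The second case is the PREFETCH_LATENCY < time situation discussed above: a single iteration already covers the whole latency.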
>>>
>>>> 0002-Dump-a-diagnostic-info-when-the-insn-to-mem-ratio-is.patch
>>>> =======================================================================
>>>> This patch adds diagnostic statements if the instruction to memory ratio is too small and the prefetch is
>>>> not generated. This helps us find the reason why a prefetch is not generated in a loop.
>>>> =======================================================================
>>>
>>> OK.
>>>
>>>> 0003-Account-for-loop-unrolling-in-the-insn-to-prefetch-r.patch
>>>> =================================================================
>>>> This patch accounts for loop unrolling when applying the instruction to prefetch heuristic for loops with unknown
>>>> trip count. (unroll_factor * ninsns) is used to estimate the number of instructions in a loop. This approach is
>>>> too simple, and is aggressive in generating prefetches. We add comments from Zdenek with suggestions
>>>> for further improvements.
>>>> ==================================================================
>>>
>>> OK.
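A rough sketch of the estimate described for 0003, assuming the heuristic compares instructions per prefetch against a minimum ratio. The function name, the macro name, and the threshold value of 10 are illustrative assumptions; only the (unroll_factor * ninsns) estimate comes from the description.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative threshold; the quoted description does not state a value.  */
#define SKETCH_MIN_INSN_TO_PREFETCH_RATIO 10

/* Return true when the unrolled loop body is estimated to contain enough
   instructions per prefetch.  The body size after unrolling is estimated
   as unroll_factor * ninsns, as in the description of 0003.
   Assumption: prefetch_count >= 1.  */
bool
enough_insns_per_prefetch (unsigned ninsns, unsigned unroll_factor,
                           unsigned prefetch_count)
{
  assert (prefetch_count > 0);
  unsigned est_insns = unroll_factor * ninsns;
  return est_insns / prefetch_count >= SKETCH_MIN_INSN_TO_PREFETCH_RATIO;
}
```

With 20 instructions and 4 prefetches, an unrolled-by-1 loop has a ratio of 5 and fails the check, while unrolling by 4 raises the estimate to 80 instructions (ratio 20) and passes. This shows why the description calls the estimate aggressive: it scales linearly with the unroll factor.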
>>>
>>>> 0004-Define-the-TRIP_COUNT_TO_AHEAD_RATIO-heuristic.patch
>>>> ====================================================
>>>> This patch defines the trip count to ahead ratio heuristic in the cost
>>>> model: don't generate prefetches for loops where the trip count is
>>>> less than TRIP_COUNT_TO_AHEAD_RATIO times the ahead iterations.
>>>> ====================================================
>>>
>>> OK; but I would suggest to update the comment explaining TRIP_COUNT_TO_AHEAD_RATIO
>>> ("For example, in a loop with a prefetch ahead distance of 10, supposing that
>>> TRIP_COUNT_TO_AHEAD_RATIO is equal to ...") for its current value of 4.
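The heuristic in 0004 can be sketched as follows, using the value of 4 mentioned above. The function name is hypothetical, and treating a negative trip count as "unknown, do not reject" is an assumption of this sketch rather than something stated in the thread.

```c
#include <stdbool.h>

/* Value of 4 as mentioned in the review comment.  */
#define TRIP_COUNT_TO_AHEAD_RATIO 4

/* Return true when the loop runs long enough for prefetching to pay off:
   the estimated trip count must be at least TRIP_COUNT_TO_AHEAD_RATIO
   times the prefetch-ahead distance.
   Assumption: a negative est_niter means the trip count is unknown, in
   which case this heuristic does not reject the loop.  */
bool
trip_count_allows_prefetch (long long est_niter, unsigned ahead)
{
  if (est_niter < 0)
    return true;
  return est_niter >= (long long) TRIP_COUNT_TO_AHEAD_RATIO * ahead;
}
```

For a prefetch ahead distance of 10 and a ratio of 4, loops estimated to run fewer than 40 iterations get no prefetches.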
>>>
>>> Zdenek
>>>
>>
>
>