


Re: [patch] Improve prefetch heuristics


Committed 0004 as revision r159163.

Sebastian Pop
--
AMD / Open Source Compiler Engineering / GNU Tools

On Fri, May 7, 2010 at 11:58, Fang, Changpeng <Changpeng.Fang@amd.com> wrote:
> Hi, Sebastian:
>
> I updated patch 0004 as suggested by Zdenek. I am still working on 0001.
>
> Thanks,
>
> Changpeng
>
>
> ________________________________________
> From: Sebastian Pop [sebpop@gmail.com]
> Sent: Friday, May 07, 2010 11:21 AM
> To: Zdenek Dvorak; GCC Patches
> Cc: Fang, Changpeng; changpeng.fang@gmail.com
> Subject: Re: [patch] Improve prefetch heuristics
>
> Somehow gcc-patches got dropped from my CC.
> I'm sending it back to the list.
>
> Sebastian
>
> On Fri, May 7, 2010 at 11:17, Sebastian Pop <sebpop@gmail.com> wrote:
>> Changpeng,
>>
>> I committed 0002 and 0003 to trunk as revisions r159161 and r159162.
>> Could you please send the corrected version of 0001 and add the
>> suggested comment to 0004?
>>
>> Thanks,
>> Sebastian
>>
>> On Fri, May 7, 2010 at 04:51, Zdenek Dvorak <rakdver@kam.mff.cuni.cz> wrote:
>>> Hi,
>>>
>>>> 0001-Reduce-useless-and-redundant-prefetches.patch
>>>> ==========================================
>>>> This patch has two parts:
>>>> The first part is in schedule_prefetches.  If the actual unroll_factor is far less than what is
>>>> required by the prefetch (i.e. the loop is not sufficiently unrolled), redundant prefetches
>>>> will be introduced. For example, if the prefetch requires unrolling 16 times and the actual
>>>> unroll factor is 1 (not unrolled), 15 out of 16 iterations of the loop will issue redundant
>>>> prefetches (16 prefetches fall on the same cache line).
>>>> We add the following lines in schedule_prefetches to disable prefetches in such cases:
>>>> if (prefetch_mod / unroll_factor > 8)
>>>>   continue;
>>>
>>> this part is OK.
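
A minimal standalone sketch of this cutoff follows; only the prefetch_mod / unroll_factor > 8
test is taken from the patch, the helper name and the numbers are made up for the example.

/* Standalone illustration of the redundant-prefetch cutoff; only the
   prefetch_mod / unroll_factor > 8 test is taken from the patch.  */
#include <stdio.h>

/* A reference moves to a new cache line only every PREFETCH_MOD
   iterations.  When the loop is unrolled UNROLL_FACTOR times and the
   prefetch is emitted once per unrolled body, roughly
   PREFETCH_MOD / UNROLL_FACTOR prefetches land on each cache line.  */
static int
keep_prefetch_p (unsigned prefetch_mod, unsigned unroll_factor)
{
  /* Too many prefetches per cache line: most are redundant, skip.  */
  return prefetch_mod / unroll_factor <= 8;
}

int
main (void)
{
  /* The example from the mail: a prefetch is needed every 16 iterations
     but the loop is not unrolled, so 15 of every 16 prefetches are redundant.  */
  printf ("mod 16, unroll 1: keep = %d\n", keep_prefetch_p (16, 1));
  /* Unrolling 4 times brings the ratio down to 4 <= 8, so keep it.  */
  printf ("mod 16, unroll 4: keep = %d\n", keep_prefetch_p (16, 4));
  return 0;
}
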
>>>
>>>> The second part is in issue_prefetch_refs. If, for some reason, the computed "ahead"
>>>> is too small, the prefetch would most likely fall on the same cache line as the existing
>>>> memory reference, and the prefetch is considered useless. This patch avoids such useless
>>>> prefetches.
>>>>
>>>> The prefetch distance is how far ahead we should issue the prefetch. The ideal prefetch distance
>>>> is the prefetch latency; we should not schedule the prefetch too early or too late. In loop prefetching,
>>>> we compute "ahead", which is how many iterations ahead we should issue the prefetch. Essentially,
>>>> ahead = prefetch_latency / loop_body_size. If the loop body is too big, ahead is very small (possibly
>>>> less than 1 -- we round it up to 1), and thus the address difference from the memory reference is too
>>>> small.
>>>
>>> This part is not.  issue_prefetch_refs is the wrong place for this decision; the decision about which
>>> prefetches will be issued should be made in schedule_prefetches.  Anyway, if I follow your reasoning, one
>>> would conclude that we need to disable prefetching completely for loops with PREFETCH_LATENCY < time
>>> (in loop_prefetch_arrays).  This IMHO makes no sense, as one would expect
>>> prefetching to be profitable exactly for those loops, which do enough extra work to make it
>>> possible to hide the memory latency through prefetching.
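
To make the "ahead" arithmetic above concrete, a small standalone sketch follows; the
latency, loop body cost, step and cache line size are assumed numbers for illustration,
not values taken from the GCC cost model.

/* Sketch of the "ahead" computation described in the mail, with
   made-up cost numbers.  */
#include <stdio.h>

int
main (void)
{
  unsigned latency = 200;   /* assumed prefetch latency, in cost units     */
  unsigned body_cost = 300; /* assumed cost of one loop iteration          */
  unsigned step = 8;        /* bytes the reference advances per iteration  */
  unsigned line_size = 64;  /* cache line size in bytes                    */

  /* ahead = prefetch_latency / loop_body_size, rounded up to at least 1.  */
  unsigned ahead = latency / body_cost;
  if (ahead == 0)
    ahead = 1;

  /* With a large loop body, ahead collapses to 1; the prefetched address
     is then only ahead * step bytes past the actual access.  If that is
     less than a cache line, the prefetch hits the same line as the load
     itself -- the case the second part of patch 0001 tries to filter.  */
  unsigned distance = ahead * step;
  printf ("ahead = %u iterations, distance = %u bytes -> %s\n",
          ahead, distance,
          distance < line_size ? "same cache line (useless)" : "useful");
  return 0;
}
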
>>>
>>>> 0002-Dump-a-diagnostic-info-when-the-insn-to-mem-ratio-is.patch
>>>> =======================================================================
>>>> This patch adds a diagnostic statement when the instruction-to-memory ratio is too small and prefetches are
>>>> not generated. This helps us find the reason why no prefetch is generated in a loop.
>>>> =======================================================================
>>>
>>> OK.
>>>
>>>> 0003-Account-for-loop-unrolling-in-the-insn-to-prefetch-r.patch
>>>> =================================================================
>>>> This patch accounts for loop unrolling when applying the instruction-to-prefetch ratio heuristic to loops with an unknown
>>>> trip count. (unroll_factor * ninsns) is used to estimate the number of instructions in the unrolled loop. This approach is
>>>> too simple and is aggressive in generating prefetches. We add comments from Zdenek with suggestions
>>>> for further improvements.
>>>> ==================================================================
>>>
>>> OK.
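
A standalone sketch of the estimate described for 0003; only the unroll_factor * ninsns
estimate comes from the patch, while the helper name and the minimum ratio of 10 are
assumptions for the example.

/* Sketch of the insn-to-prefetch ratio check for a loop with unknown
   trip count; the unrolled body size is estimated as
   unroll_factor * ninsns, and the threshold is assumed.  */
#include <stdio.h>

static int
insn_to_prefetch_ratio_ok_p (unsigned ninsns, unsigned unroll_factor,
                             unsigned prefetch_count, unsigned min_ratio)
{
  /* Estimated instructions executed in the unrolled body per prefetch
     issued; too few means the prefetch overhead dominates.  */
  return (unroll_factor * ninsns) / prefetch_count >= min_ratio;
}

int
main (void)
{
  /* 20 insns unrolled 4x with 2 prefetches and an assumed minimum
     ratio of 10: 80 / 2 = 40 >= 10, so prefetching is allowed.  */
  printf ("%d\n", insn_to_prefetch_ratio_ok_p (20, 4, 2, 10));
  /* A tiny body with the same 2 prefetches fails the check.  */
  printf ("%d\n", insn_to_prefetch_ratio_ok_p (4, 1, 2, 10));
  return 0;
}
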
>>>
>>>> 0004-Define-the-TRIP_COUNT_TO_AHEAD_RATIO-heuristic.patch
>>>> ====================================================
>>>> This patch defines the trip count to ahead ratio heuristic in the cost
>>>> model: don't generate prefetches for loops where the trip count is
>>>> less than TRIP_COUNT_TO_AHEAD_RATIO times the ahead iterations.
>>>> ====================================================
>>>
>>> OK; but I would suggest updating the comment explaining TRIP_COUNT_TO_AHEAD_RATIO
>>> ("For example, in a loop with a prefetch ahead distance of 10, supposing that
>>> TRIP_COUNT_TO_AHEAD_RATIO is equal to ...") for its current value of 4.
>>>
>>> Zdenek
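
A standalone sketch of the 0004 heuristic; the helper name and the negative-means-unknown
trip count convention are illustrative, while the ratio of 4 matches the current value of
TRIP_COUNT_TO_AHEAD_RATIO mentioned above.

/* Sketch of the trip-count-to-ahead-ratio cutoff from patch 0004.  */
#include <stdio.h>

#define TRIP_COUNT_TO_AHEAD_RATIO 4

static int
should_prefetch_p (long est_niter, unsigned ahead)
{
  /* Trip count unknown: this cutoff does not apply in this sketch.  */
  if (est_niter < 0)
    return 1;
  /* Skip loops whose trip count is less than
     TRIP_COUNT_TO_AHEAD_RATIO times the ahead distance.  */
  return est_niter >= (long) (TRIP_COUNT_TO_AHEAD_RATIO * ahead);
}

int
main (void)
{
  /* With ahead = 10: 25 iterations is below 4 * 10, so no prefetching;
     100 iterations passes the check.  */
  printf ("%d %d\n", should_prefetch_p (25, 10), should_prefetch_p (100, 10));
  return 0;
}
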
>>>
>>
>
>

