[PATCH 1/2] Introduce prefetch-minimum stride option

Luis Machado luis.machado@linaro.org
Mon May 7 15:51:00 GMT 2018



On 05/07/2018 12:15 PM, H.J. Lu wrote:
> On Mon, May 7, 2018 at 7:09 AM, Luis Machado <luis.machado@linaro.org> wrote:
>>
>>
>> On 05/01/2018 03:30 PM, Jeff Law wrote:
>>>
>>> On 01/22/2018 06:46 AM, Luis Machado wrote:
>>>>
>>>> This patch adds a new option to control the minimum stride of a memory
>>>> reference at or above which the loop prefetch pass may issue software
>>>> prefetch hints. There are two motivations:
>>>>
>>>> * Make the pass less aggressive, only issuing prefetch hints for bigger
>>>> strides that are more likely to benefit from prefetching. I've noticed a
>>>> case in cpu2017 where we were issuing thousands of hints, for example.
>>>>
>>>> * For processors that have a hardware prefetcher, like Falkor, it allows
>>>> the loop prefetch pass to defer prefetching of smaller (less than the
>>>> threshold) strides to the hardware prefetcher instead. This prevents
>>>> conflicts between the software prefetcher and the hardware prefetcher.
>>>>
>>>> I've noticed a considerable reduction in the number of prefetch hints and
>>>> slightly positive performance numbers. This aligns GCC and LLVM in terms
>>>> of prefetch behavior for Falkor.
>>>>
>>>> The default settings should guarantee no changes for existing targets.
>>>> Those are free to tweak the settings as necessary.
>>>>
>>>> No regressions in the testsuite and bootstrapped ok on aarch64-linux.
>>>>
>>>> Ok?
>>>>
>>>> 2018-01-22  Luis Machado  <luis.machado@linaro.org>
>>>>
>>>>          Introduce option to limit software prefetching to known constant
>>>>          strides above a specific threshold with the goal of preventing
>>>>          conflicts with a hardware prefetcher.
>>>>
>>>>          gcc/
>>>>          * config/aarch64/aarch64-protos.h (cpu_prefetch_tune)
>>>>          <minimum_stride>: New const int field.
>>>>          * config/aarch64/aarch64.c (generic_prefetch_tune): Update to
>>>>          include minimum_stride field.
>>>>          (exynosm1_prefetch_tune): Likewise.
>>>>          (thunderxt88_prefetch_tune): Likewise.
>>>>          (thunderx_prefetch_tune): Likewise.
>>>>          (thunderx2t99_prefetch_tune): Likewise.
>>>>          (qdf24xx_prefetch_tune): Likewise. Set minimum_stride to 2048.
>>>>          (aarch64_override_options_internal): Update to set
>>>>          PARAM_PREFETCH_MINIMUM_STRIDE.
>>>>          * doc/invoke.texi (prefetch-minimum-stride): Document new option.
>>>>          * params.def (PARAM_PREFETCH_MINIMUM_STRIDE): New.
>>>>          * params.h (PARAM_PREFETCH_MINIMUM_STRIDE): Define.
>>>>          * tree-ssa-loop-prefetch.c (should_issue_prefetch_p): Return
>>>>          false if stride is constant and is below the minimum stride
>>>>          threshold.
>>>
>>> OK for the trunk.
>>> jeff
>>>
>>
>> Thanks. Committed as revision 259995 now.
> 
> This breaks bootstrap on x86:
> 
> ../../src-trunk/gcc/tree-ssa-loop-prefetch.c: In function ‘bool
> should_issue_prefetch_p(mem_ref*)’:
> ../../src-trunk/gcc/tree-ssa-loop-prefetch.c:1010:54: error:
> comparison of integer expressions of different signedness: ‘long long
> unsigned int’ and ‘int’ [-Werror=sign-compare]
>         && absu_hwi (int_cst_value (ref->group->step)) < PREFETCH_MINIMUM_STRIDE)
> ../../src-trunk/gcc/tree-ssa-loop-prefetch.c:1014:4: error: format
> ‘%d’ expects argument of type ‘int’, but argument 5 has type ‘long
> long int’ [-Werror=format=]
>      "Step for reference %u:%u (%d) is less than the mininum "
>      ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>      "required stride of %d\n",
>      ~~~~~~~~~~~~~~~~~~~~~~~~~
>      ref->group->uid, ref->uid, int_cst_value (ref->group->step),
>                                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> 

I've reverted this for now while I address the bootstrap problem.
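
For reference, the two diagnostics quoted above come down to a signed/unsigned
comparison and a format specifier that does not match a wide integer argument.
The following is only a standalone sketch of one way to keep both sides
consistent. It is not the actual fix: the typedefs and the function names
step_below_minimum_stride and describe_skip are hypothetical stand-ins, and
treating a non-positive threshold as "check disabled" is an assumption made
for this example.

```cpp
#include <cstdio>
#include <string>

// Hypothetical stand-ins for GCC's wide integer types; NOT the real
// HOST_WIDE_INT definitions.
typedef long long wide_int_t;
typedef unsigned long long uwide_int_t;

// Absolute value returned as unsigned, similar in spirit to absu_hwi.
// Subtracting from 0 in the unsigned type avoids overflow on the most
// negative input.
static uwide_int_t abs_unsigned (wide_int_t x)
{
  return x >= 0 ? (uwide_int_t) x : 0 - (uwide_int_t) x;
}

// Return true if a known constant step is below the minimum stride.
// ASSUMPTION: a non-positive threshold disables the check. Without the
// guard, converting a negative int to unsigned would yield a huge value.
static bool step_below_minimum_stride (wide_int_t step, int minimum_stride)
{
  if (minimum_stride <= 0)
    return false;
  // Both operands are unsigned here, so -Wsign-compare has nothing to flag.
  return abs_unsigned (step) < (uwide_int_t) minimum_stride;
}

// Build the dump message with a specifier that matches the argument type:
// %lld for the long long step, %d for the int threshold.
static std::string describe_skip (unsigned group_uid, unsigned ref_uid,
                                  wide_int_t step, int minimum_stride)
{
  char buf[160];
  std::snprintf (buf, sizeof buf,
                 "Step for reference %u:%u (%lld) is less than the minimum "
                 "required stride of %d\n",
                 group_uid, ref_uid, step, minimum_stride);
  return std::string (buf);
}
```

With a threshold of 2048, a step of 64 or -64 would be skipped, while a step
of 2048 or 4096 would still be considered for prefetching.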


