This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: [PATCH/AARCH64] Enable software prefetching (-fprefetch-loop-arrays) for ThunderX 88xxx
- From: Richard Biener <richard dot guenther at gmail dot com>
- To: Andrew Pinski <apinski at cavium dot com>
- Cc: GCC Patches <gcc-patches at gcc dot gnu dot org>
- Date: Fri, 27 Jan 2017 13:11:58 +0100
- Subject: Re: [PATCH/AARCH64] Enable software prefetching (-fprefetch-loop-arrays) for ThunderX 88xxx
- References: <CA+=Sn1kk9gtpVAuqE-RcAyps=6HzX+-4Kj_+QPZC+oOda6GtDg@mail.gmail.com> <CAFiYyc0KnZwFrrambGN6eRjO+yc_d0AGKGEYXT7uVAqwwJsi4A@mail.gmail.com>
On Fri, Jan 27, 2017 at 1:10 PM, Richard Biener
<richard.guenther@gmail.com> wrote:
> On Thu, Jan 26, 2017 at 9:56 PM, Andrew Pinski <apinski@cavium.com> wrote:
>> Hi,
>> This patch enables -fprefetch-loop-arrays for -mcpu=thunderxt88 and
>> -mcpu=thunderxt88p1. I filled out the tuning structures for both
>> thunderx and thunderx2t99. No other core currently enables software
>> prefetching, so I set their fields to 0, which leaves the default
>> parameters unchanged.
>>
>> OK? Bootstrapped and tested on both ThunderX2 CN99xx and ThunderX
>> CN88xx with no regressions. I got a 2x improvement for 462.libquantum
>> on CN88xx, overall a 10% improvement on SPEC INT on CN88xx at -Ofast.
>> CN99xx's SPEC did not change.
>
> Heh, quite impressive for this kind of bit-rotten (and broken?) pass ;)
And I wonder if most of the benefit comes from the unrolling the pass
might do rather than from the prefetches...
Richard.
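
One way to separate the two effects is to time hand-written variants of the same loop. The following is only a sketch: the function names and PREFETCH_AHEAD are made up for illustration, the prefetch distance is not tuned for any core, and none of it is taken from the pass itself. It relies only on GCC's __builtin_prefetch.

```c
#include <stddef.h>

/* Hypothetical prefetch distance in elements; a real value depends on
   the target's latency and cache line size.  */
#define PREFETCH_AHEAD 64

/* Baseline: no unrolling, no prefetch.  */
long sum_plain(const long *a, size_t n)
{
    long s = 0;
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* Unrolled by 4, no prefetch.  */
long sum_unrolled(const long *a, size_t n)
{
    long s = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4)
        s += a[i] + a[i + 1] + a[i + 2] + a[i + 3];
    for (; i < n; i++)          /* remainder iterations */
        s += a[i];
    return s;
}

/* Prefetch only, no unrolling.  The guard keeps the address inside
   the array so the pointer arithmetic stays well defined.  */
long sum_prefetch(const long *a, size_t n)
{
    long s = 0;
    for (size_t i = 0; i < n; i++) {
        if (i + PREFETCH_AHEAD < n)
            __builtin_prefetch(&a[i + PREFETCH_AHEAD], 0, 3);
        s += a[i];
    }
    return s;
}
```

All three return the same sum, so any timing difference on a large array would come from the loop shape or the prefetches alone. The compiler should be kept from unrolling or prefetching on its own (for example with -fno-unroll-loops -fno-prefetch-loop-arrays) when comparing them.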
> Richard.
>
>> Thanks,
>> Andrew Pinski
>>
>> ChangeLog:
>> * config/aarch64/aarch64-protos.h (struct tune_params): Add
>> prefetch_latency, simultaneous_prefetches, l1_cache_size, and
>> l2_cache_size fields.
>> (enum aarch64_autoprefetch_model): Add AUTOPREFETCHER_SW.
>> * config/aarch64/aarch64.c (generic_tunings): Set the new
>> prefetch_latency, simultaneous_prefetches, l1_cache_size, and
>> l2_cache_size fields to 0.
>> (cortexa35_tunings): Likewise.
>> (cortexa53_tunings): Likewise.
>> (cortexa57_tunings): Likewise.
>> (cortexa72_tunings): Likewise.
>> (cortexa73_tunings): Likewise.
>> (exynosm1_tunings): Likewise.
>> (thunderx_tunings): Fill out some of the new fields.
>> (thunderxt88_tunings): New variable.
>> (xgene1_tunings): Set the new prefetch_latency,
>> simultaneous_prefetches, l1_cache_size, and l2_cache_size fields to 0.
>> (qdf24xx_tunings): Likewise.
>> (thunderx2t99_tunings): Fill out some of the new fields.
>> (aarch64_override_options_internal): Treat AUTOPREFETCHER_SW like
>> AUTOPREFETCHER_OFF.
>> Set the param values if the fields are non-zero. Turn on
>> prefetch-loop-arrays if the model is AUTOPREFETCHER_SW and the
>> optimization level is at least 3 or profile feedback is in use.
>> * config/aarch64/aarch64-cores.def (thunderxt88p1): Use thunderxt88 tuning.
>> (thunderxt88): Likewise.
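
The aarch64_override_options_internal entry above can be read roughly as the following standalone model. This is a sketch of the described behaviour, not the patch: struct tune_prefetch, struct params and apply_prefetch_tuning are hypothetical names, the enum only mirrors the models named in the ChangeLog, and the condition is assumed to group as "SW and (O3 or profile use)". Handling of options the user set explicitly is left out.

```c
#include <stdbool.h>

/* Simplified stand-in for the autoprefetch model enum.  */
enum autoprefetch_model {
    AUTOPREFETCHER_OFF,
    AUTOPREFETCHER_WEAK,
    AUTOPREFETCHER_STRONG,
    AUTOPREFETCHER_SW
};

/* Hypothetical subset of the tuning structure: only the new fields.  */
struct tune_prefetch {
    enum autoprefetch_model model;
    int prefetch_latency;
    int simultaneous_prefetches;
    int l1_cache_size;
    int l2_cache_size;
};

/* Hypothetical stand-in for the compiler's param and flag state.  */
struct params {
    int prefetch_latency;
    int simultaneous_prefetches;
    int l1_cache_size;
    int l2_cache_size;
    bool prefetch_loop_arrays;
};

void apply_prefetch_tuning(const struct tune_prefetch *t, int optimize,
                           bool profile_use, struct params *p)
{
    /* A zero field means "keep the default param value".  */
    if (t->prefetch_latency != 0)
        p->prefetch_latency = t->prefetch_latency;
    if (t->simultaneous_prefetches != 0)
        p->simultaneous_prefetches = t->simultaneous_prefetches;
    if (t->l1_cache_size != 0)
        p->l1_cache_size = t->l1_cache_size;
    if (t->l2_cache_size != 0)
        p->l2_cache_size = t->l2_cache_size;

    /* Assumed grouping: software prefetch model, and either -O3 or
       higher, or profile feedback in use.  */
    if (t->model == AUTOPREFETCHER_SW && (optimize >= 3 || profile_use))
        p->prefetch_loop_arrays = true;
}
```

With all four fields at 0 and a model other than AUTOPREFETCHER_SW, the function changes nothing, which matches the statement that cores without software prefetching keep the default parameters.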