[PATCH/AARCH64] Enable software prefetching (-fprefetch-loop-arrays) for ThunderX 88xxx

Maxim Kuvyrkov maxim.kuvyrkov@linaro.org
Fri Feb 3 12:00:00 GMT 2017


Hi Andrew,

I took the liberty of rebasing your patch on top of my patchset.  Does it look correct?

I think I have addressed all of your review comments and posted updated patches.

--
Maxim Kuvyrkov
www.linaro.org



> On Jan 30, 2017, at 7:25 PM, Andrew Pinski <apinski@cavium.com> wrote:
> 
> On Mon, Jan 30, 2017 at 6:49 AM, Maxim Kuvyrkov
> <maxim.kuvyrkov@linaro.org> wrote:
>>> On Jan 27, 2017, at 6:59 PM, Andrew Pinski <apinski@cavium.com> wrote:
>>> 
>>> On Fri, Jan 27, 2017 at 4:11 AM, Richard Biener
>>> <richard.guenther@gmail.com> wrote:
>>>> On Fri, Jan 27, 2017 at 1:10 PM, Richard Biener
>>>> <richard.guenther@gmail.com> wrote:
>>>>> On Thu, Jan 26, 2017 at 9:56 PM, Andrew Pinski <apinski@cavium.com> wrote:
>>>>>> Hi,
>>>>>> This patch enables -fprefetch-loop-arrays for -mcpu=thunderxt88 and
>>>>>> -mcpu=thunderxt88p1.  I filled out the tuning structures for both
>>>>>> thunderx and thunderx2t99.  No other core currently enables software
>>>>>> prefetching, so I set their values to 0, which leaves the default
>>>>>> parameters unchanged.
>>>>>> 
>>>>>> OK?  Bootstrapped and tested on both ThunderX2 CN99xx and ThunderX
>>>>>> CN88xx with no regressions.  I got a 2x improvement for 462.libquantum
>>>>>> on CN88xx, overall a 10% improvement on SPEC INT on CN88xx at -Ofast.
>>>>>> CN99xx's SPEC did not change.
>>>>> 
>>>>> Heh, quite impressive for this kind of bit-rotten (and broken?) pass ;)
>>>> 
>>>> And I wonder if most of the benefit comes from the unrolling the pass
>>>> might do rather than from the prefetches...
>>> 
>>> Not in this case.  The main reason I know is that the number of
>>> L1 and L2 misses drops significantly.
>> 
>> I can confirm this.  In my experiments loop unrolling hurts several tests.
> 
> Not on the cores I tried it on.  I tried it on both ThunderX CN88xx and
> ThunderX2 CN99xx; I did not get any regressions due to unrolling.
> 
> Thanks,
> Andrew
> 
>> 
>> The prefetching approach I'm testing for -O2 includes disabling loop unrolling to prevent code bloat.
>> 
>> --
>> Maxim Kuvyrkov
>> www.linaro.org

-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0007-Prefetch-tuning-for-ThunderX.patch
Type: application/octet-stream
Size: 3867 bytes
Desc: not available
URL: <http://gcc.gnu.org/pipermail/gcc-patches/attachments/20170203/7d8a77d9/attachment.obj>


More information about the Gcc-patches mailing list