This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PING][PATCH, AArch64] Disable reg offset in quad-word store for Falkor


Siddhesh Poyarekar wrote: 
On Wednesday 17 January 2018 08:31 PM, Wilco Dijkstra wrote:
> Why is that a bad thing? With the patch as is, the testcase generates:
> 
> .L4:
>        ldr     q0, [x2, x3]
>        add     x5, x1, x3
>        add     x3, x3, 16
>        cmp     x3, x4
>        str     q0, [x5]
>        bne     .L4
> 
> With a change in address cost (for loads and stores) we would get:
> 
> .L4:
>        ldr     q0, [x3], 16
>        str     q0, [x4], 16
>        cmp     x3, x5
>        bne     .L4

> This is great for the load because of the way the falkor prefetcher
> works, but it is terrible for the store because of the way the pipeline
> works.  The only performant store for falkor is an indirect load with a
> constant or zero offset.  Everything else has hidden costs.

Are you saying the same issue exists for all stores with writeback? If so then
your patch would need to address that too.

> I briefly looked at the possibility of splitting the register_offset
> cost into load and store, but I realized that I'd have to modify the
> target hook for it to be useful, which is way too much work for this
> single quirk.

It seems way more fundamental if it affects anything that isn't a simple
immediate offset. Again I suggest using the existing cost infrastructure
to find a setting that improves performance. If discouraging pre/post increment
helps Falkor then that's better than doing nothing.

>> I think a special case for Falkor in aarch64_address_cost would be acceptable
>> in GCC8 - that would be much smaller and cleaner than the current patch. 
>> If required we could improve upon this in GCC9 and add a way to differentiate
>> between loads and stores.
>
> I can't do this in address_cost since I can't determine whether the
> address is a load or a store location.  The most minimal way seems to be
> using the patterns in the md file.

Well I don't think the approach of blocking specific patterns is a good solution to
this problem and may not be accepted by AArch64 maintainers. Try your example
with -funroll-loops and compare with my suggestion (with or without extra code to
increase cost of writeback too). It seems to me adjusting costs is always going to
result in better overall code quality, even if it also applies to loads for the time being.

Wilco

Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]