This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: [PATCH] [ARM] Post-indexed addressing for NEON memory access
- From: Ramana Radhakrishnan <ramana dot gcc at googlemail dot com>
- To: Charles Baylis <charles dot baylis at linaro dot org>
- Cc: GCC Patches <gcc-patches at gcc dot gnu dot org>, Richard Earnshaw <Richard dot Earnshaw at arm dot com>, Ramana Radhakrishnan <Ramana dot Radhakrishnan at arm dot com>
- Date: Wed, 18 Jun 2014 11:06:20 +0100
- Subject: Re: [PATCH] [ARM] Post-indexed addressing for NEON memory access
- References: <CADnVucAaG=uAZyxQGvyf5bqrmW8JfhfjCp84uCpVnf+=Tois5w at mail dot gmail dot com> <CAJA7tRbHVGuBJFB0cqXfZeuvJj6WO0BJrCtTOmQqi8tt7ZffPQ at mail dot gmail dot com> <CADnVucC_LXyyY1WrxbzdkmTYo=8Ur1Nvr09auZk3rNM7MnsqSg at mail dot gmail dot com>
- Reply-to: ramrad01 at arm dot com
On Tue, Jun 17, 2014 at 4:03 PM, Charles Baylis
<charles.baylis@linaro.org> wrote:
> On 5 June 2014 07:27, Ramana Radhakrishnan <ramana.gcc@googlemail.com> wrote:
>> On Mon, Jun 2, 2014 at 5:47 PM, Charles Baylis
>> <charles.baylis@linaro.org> wrote:
>>> This patch adds support for post-indexed addressing for NEON structure
>>> memory accesses.
>>>
>>> For example VLD1.8 {d0}, [r0], r1
>>>
>>>
>>> Bootstrapped and checked on arm-unknown-gnueabihf using Qemu.
>>>
>>> Ok for trunk?
>>
>> This looks like a reasonable start but this work doesn't look complete
>> to me yet.
>>
>> Can you also look at the impact on performance of a range of
>> benchmarks especially a popular embedded one to see how this behaves
>> unless you have already done so?
>
> I ran a popular suite of embedded benchmarks, and there is no impact
> at all on Chromebook (including with the additional attached patch).
Thanks for the due diligence.
>
> The patch was developed to address a performance issue with a new
> version of libvpx which uses intrinsics instead of NEON assembler. The
> patch results in a 3% improvement for VP8 decode.
Good - 3% is not to be sneezed at.
>
>> POST_INC, POST_MODIFY usually have a funny way of biting you with
>> either ivopts or the way in which address costs work. I think there
>> may be further tweaks needed but for a first step I'd like to know what
>> the performance impact is.
>
>> I would also suggest running this through clyon's neon intrinsics
>> testsuite to see if that catches any issues especially with the large
>> vector modes.
Thanks.
>
> No issues found in clyon's tests.
Please keep an eye out for any regressions.
>
> Your mention of larger vector modes prompted me to check that the
> patch has the desired result with them. In fact, the costs are
> estimated incorrectly which means the post_modify pattern is not used.
> The attached patch fixes that. (used in combination with my original
> patch)
>
>
> 2014-06-15 Charles Baylis <charles.baylis@linaro.org>
>
> * config/arm/arm.c (arm_new_rtx_costs): Reduce cost for mem with
> embedded side effects.
I'm not too thrilled with putting in more special cases that are not
table driven in there. Can you file a PR with some testcases that show
this so that we don't forget and CC me on it please?
Ramana
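
For reference, the kind of source loop discussed in this thread can be sketched as below. This is only an illustration, not code from the patch or from libvpx: the helper name gather_rows8 is hypothetical, and whether GCC actually selects the post-indexed form (e.g. VLD1.8 {d0}, [r0], r1) for the load depends on the compiler version, the options used and the address cost model. A scalar fallback is included so the sketch also builds on non-ARM hosts.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Hypothetical helper (not from the patch): copy the first 8 bytes of each
   row of a strided buffer into a packed destination buffer.  */
void gather_rows8(uint8_t *dst, const uint8_t *src, ptrdiff_t stride,
                  size_t rows)
{
    for (size_t i = 0; i < rows; i++) {
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
        /* A 64-bit NEON load followed by "src += stride" is the pattern
           that a post-indexed VLD1.8 {dN}, [rA], rB could cover.  */
        uint8x8_t v = vld1_u8(src);
        vst1_u8(dst, v);
#else
        /* Scalar fallback with the same behaviour.  */
        memcpy(dst, src, 8);
#endif
        src += stride;  /* register-offset step between rows */
        dst += 8;       /* packed output, fixed step of 8 bytes */
    }
}
```

Compiling such a function with -O2 for a NEON-capable target and inspecting the assembly is one way to check whether the load and the pointer increment have been combined.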