This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
Re: [AArch64] Implement ADD in vector registers for 32-bit scalar values.
- From: pinskia at gmail dot com
- To: James Greenhalgh <james dot greenhalgh at arm dot com>
- Cc: "gcc-patches at gcc dot gnu dot org" <gcc-patches at gcc dot gnu dot org>, Marcus Shawcroft <Marcus dot Shawcroft at arm dot com>, "richard dot earnshaw at arm dot com" <richard dot earnshaw at arm dot com>
- Date: Fri, 28 Mar 2014 08:09:22 -0700
- Subject: Re: [AArch64] Implement ADD in vector registers for 32-bit scalar values.
- References: <1395997970-27335-1-git-send-email-james dot greenhalgh at arm dot com> <CBECD840-2CDE-4DFE-B917-5B46B7897A99 at gmail dot com> <20140328144805 dot GA31228 at arm dot com>
> On Mar 28, 2014, at 7:48 AM, James Greenhalgh <james.greenhalgh@arm.com> wrote:
>
> On Fri, Mar 28, 2014 at 11:11:58AM +0000, pinskia@gmail.com wrote:
>>> On Mar 28, 2014, at 2:12 AM, James Greenhalgh <james.greenhalgh@arm.com> wrote:
>>> Hi,
>>>
>>> There is no way to perform scalar addition in the vector register file,
>>> but with the RTX costs in place we start rewriting (x << 1) to (x + x)
>>> on almost all cores. The code which makes this decision has no idea that we
>>> will end up doing this (it happens well before reload), so we end up with
>>> very ugly code generation when addition is selected but we are operating
>>> in vector registers.
>>>
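To illustrate the problem described above (a hypothetical sketch, not code taken from the patch or from actual GCC output), a scalar add on a value that lives in a SIMD register forces moves between the two register files:

```asm
// Hypothetical AArch64 sequence for (x + x) when x is allocated to a
// vector register but the add is only available on general registers:
    fmov    w0, s0          // move the 32-bit value out of the vector file
    add     w0, w0, w0      // scalar addition in general registers
    fmov    s0, w0          // move the result back into the vector file
```

The two `fmov` round trips are the "very ugly code generation" the patch aims to avoid.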
>>> This patch relies on the same gimmick we are already using to allow
>>> shifts on 32-bit scalars in the vector register file: use a vector 32x2
>>> operation instead, knowing that we can safely ignore the top bits.
>>>
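As a hedged illustration of that 32x2 gimmick (again a sketch, not output from the patch): the addition is performed across both 32-bit lanes of a 64-bit vector, and only lane 0 carries the scalar result:

```asm
// Hypothetical AArch64 sequence: scalar 32-bit add kept entirely in
// the vector register file by using a two-lane SIMD add. Lane 1
// computes junk, which is safely ignored:
    add     v0.2s, v0.2s, v1.2s   // lane 0 holds the scalar sum
```

This keeps the value in the vector file and removes the cross-file moves entirely.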
>>> This restores some normality to scalar_shift_1.c; however, the test
>>> that we generate a left shift by one is clearly bogus, so remove it.
>>>
>>> This patch is pretty ugly, but it does generate superficially better
>>> looking code for this testcase.
>>>
>>> Tested on aarch64-none-elf with no issues.
>>>
>>> OK for stage 1?
>>
>> It seems we should also discourage the neon alternatives as there might be
>> extra movement between the two register sets which we don't want.
>
> I see your point, but we've tried to avoid doing that elsewhere in the
> AArch64 backend. Our argument has been that, strictly speaking, it isn't the
> alternative that is expensive; it is the movement between the register sets. We
> do model that elsewhere, and the register allocator should already be trying to
> avoid unnecessary moves between register classes.
>
What about a specific core where that alternative is expensive, that is, where the vector instructions are worse than the scalar ones? How are we going to handle that case?
Thanks,
Andrew
> If those mechanisms are broken, we should fix them - in that case fixing
> this by discouraging valid alternatives would seem to be gaffer-taping over the
> real problem.
>
> Thanks,
> James
>
>>
>> Thanks,
>> Andrew
>>
>>>
>>> Thanks,
>>> James
>>>
>>> ---
>>> gcc/
>>>
>>> 2014-03-27 James Greenhalgh <james.greenhalgh@arm.com>
>>>
>>> * config/aarch64/aarch64.md (*addsi3_aarch64): Add alternative in
>>> vector registers.
>>>
>>> gcc/testsuite/
>>> 2014-03-27 James Greenhalgh <james.greenhalgh@arm.com>
>>>
>>> * gcc.target/aarch64/scalar_shift_1.c: Fix expected assembler.
>>> <0001-AArch64-Implement-ADD-in-vector-registers-for-32-bit.patch>
>>