This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: [PATCH, ARM]: rewrite NEON vector operation intrinsics without UNSPECs
- From: Sandra Loosemore <sandra at codesourcery dot com>
- To: Ramana Radhakrishnan <ramana dot gcc at googlemail dot com>
- Cc: gcc-patches at gcc dot gnu dot org
- Date: Tue, 01 Jun 2010 16:13:38 -0400
- Subject: Re: [PATCH, ARM]: rewrite NEON vector operation intrinsics without UNSPECs
- References: <4C000223.3020401@codesourcery.com> <AANLkTiminPJfsuZ9n75z6JoiWRBDns1osBY80eWDcakP@mail.gmail.com>
Ramana Radhakrishnan wrote:
On Fri, May 28, 2010 at 6:49 PM, Sandra Loosemore
<sandra@codesourcery.com> wrote:
The problem this is trying to solve is that, for some of the NEON
instructions, there is both a pattern used by the corresponding intrinsic
that uses an UNSPEC, and a separate canonical RTL pattern for the same
(logically equivalent) operation. Having two copies is a bit of a
maintenance headache; the current patch fixes some minor differences in the
generated code and one outright bug. Other operations currently have only
the intrinsic pattern with an UNSPEC, which means that GCC will never emit
those instructions any other way.
Could you mention what bug this patch particularly solves and if this
occurs in any of the FSF release branches or on trunk ? Is there a
possibility of splitting out the bug fix into a separate patch that
could potentially be backported to other release branches ?
Sure. It's this hunk for neon.md:
(vec_extractv2di): Correct error in register numbering.
Here's the problem, with a little more context.
In vec_extractv2di, we have:
int regno = REGNO (operands[1]) + INTVAL (operands[2]);
operands[1] = gen_rtx_REG (DImode, regno);
return "vmov%?.64\t%Q0, %R0, %P1";
Compare with neon_vget_lanev2di, which is essentially the same operation:
unsigned int regno = REGNO (operands[1]);
unsigned int elt = INTVAL (operands[2]);
ops[1] = gen_rtx_REG (DImode, regno + 2 * elt);
output_asm_insn ("vmov%?\t%Q0, %R0, %P1 @ v2di", ops);
Note that neon_vget_lanev2di is multiplying the element index by 2, but
vec_extractv2di is not. On further inspection of the generated code and the
code in arm.c that handles the "P" specifier, it was apparent that the
neon_vget_lanev2di version was correct. So I fixed vec_extractv2di as part of
merging the two patterns:
@@ -802,11 +802,11 @@
(parallel [(match_operand:SI 2 "immediate_operand" "i")])))]
"TARGET_NEON"
{
- int regno = REGNO (operands[1]) + INTVAL (operands[2]);
+ int regno = REGNO (operands[1]) + 2 * INTVAL (operands[2]);
operands[1] = gen_rtx_REG (DImode, regno);
- return "vmov%?.64\t%Q0, %R0, %P1";
+ return "vmov%?\t%Q0, %R0, %P1 @ v2di";
}
[(set_attr "predicable" "yes")
(set_attr "neon_type" "neon_int_1")]
I can definitely apply this hunk separately and put it on other active branches.
I haven't had the time to go through the patch in its entirety yet,
but from a cursory glance at the it I see endian-dependent changes in
the sources. Do you think it's worthwhile to run the testsuite in big
endian mode for this with Neon ?
We have definitely tested the original 4.4-based version of this set of changes
in big-endian mode; I just double-checked the saved test results to verify that
they are there and there were not any FAILs. The version I posted is a
straightforward rebasing of the older patch against mainline HEAD, so I'm pretty
confident that I didn't introduce any new bugs. OTOH, I suppose it is possible
there are endianness-related bugs in the code not being caught by the test
suite.... :-(
-Sandra