This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH, ARM]: rewrite NEON vector operation intrinsics without UNSPECs


Ramana Radhakrishnan wrote:
On Fri, May 28, 2010 at 6:49 PM, Sandra Loosemore
<sandra@codesourcery.com> wrote:
The problem this is trying to solve is that, for some of the NEON
instructions, there is both a pattern used by the corresponding intrinsic
that uses an UNSPEC, and a separate canonical RTL pattern for the same
(logically equivalent) operation.  Having two copies is a bit of a
maintenance headache; the current patch fixes some minor differences in the
generated code and one outright bug. Other operations currently have only
the intrinsic pattern with an UNSPEC, which means that GCC will never emit
those instructions any other way.

Could you mention what bug this patch particularly solves and if this occurs in any of the FSF release branches or on trunk ? Is there a possibility of splitting out the bug fix into a separate patch that could potentially be backported to other release branches ?

Sure. It's this hunk for neon.md:


(vec_extractv2di): Correct error in register numbering.

Here's the problem, with a little more context.

In vec_extractv2di, we have:

  int regno = REGNO (operands[1]) + INTVAL (operands[2]);
  operands[1] = gen_rtx_REG (DImode, regno);
  return "vmov%?.64\t%Q0, %R0, %P1";

Compare with neon_vget_lanev2di, which is essentially the same operation:

  unsigned int regno = REGNO (operands[1]);
  unsigned int elt = INTVAL (operands[2]);
  ops[1] = gen_rtx_REG (DImode, regno + 2 * elt);
  output_asm_insn ("vmov%?\t%Q0, %R0, %P1  @ v2di", ops);

Note that neon_vget_lanev2di is multiplying the element index by 2, but vec_extractv2di is not. On further inspection of the generated code and the code in arm.c that handles the "P" specifier, it was apparent that the neon_vget_lanev2di version was correct. So I fixed vec_extractv2di as part of merging the two patterns:

@@ -802,11 +802,11 @@
           (parallel [(match_operand:SI 2 "immediate_operand" "i")])))]
   "TARGET_NEON"
 {
-  int regno = REGNO (operands[1]) + INTVAL (operands[2]);
+  int regno = REGNO (operands[1]) + 2 * INTVAL (operands[2]);

operands[1] = gen_rtx_REG (DImode, regno);

-  return "vmov%?.64\t%Q0, %R0, %P1";
+  return "vmov%?\t%Q0, %R0, %P1  @ v2di";
 }
   [(set_attr "predicable" "yes")
    (set_attr "neon_type" "neon_int_1")]

I can definitely apply this hunk separately and put it on other active branches.

I haven't had the time to go through the patch in its entirety yet,
but from a cursory glance at the it I see endian-dependent changes in
the sources. Do you think it's worthwhile to run the testsuite in big
endian mode for this with Neon ?

We have definitely tested the original 4.4-based version of this set of changes in big-endian mode; I just double-checked the saved test results to verify that they are there and there were not any FAILs. The version I posted is a straightforward rebasing of the older patch against mainline HEAD, so I'm pretty confident that I didn't introduce any new bugs. OTOH, I suppose it is possible there are endianness-related bugs in the code not being caught by the test suite.... :-(


-Sandra


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]