This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [AArch64][PR65375] Fix RTX cost for vector SET



On 15/04/15 10:25, James Greenhalgh wrote:
On Tue, Apr 14, 2015 at 11:08:55PM +0100, Kugan wrote:
Now that Stage1 is open, is this OK for trunk.
Hi Kugan,

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index cba3c1a..d6ad0af 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -5544,10 +5544,17 @@ aarch64_rtx_costs (rtx x, int code, int outer ATTRIBUTE_UNUSED,
/* Fall through. */
  	case REG:
+	  if (VECTOR_MODE_P (GET_MODE (op0)) && REG_P (op1))
+	    {
+              /* The cost is 1 per vector-register copied.  */
+              int n_minus_1 = (GET_MODE_SIZE (GET_MODE (op0)) - 1)
+			      / GET_MODE_SIZE (V4SImode);
+              *cost = COSTS_N_INSNS (n_minus_1 + 1);
+	    }
  	  /* const0_rtx is in general free, but we will use an
  	     instruction to set a register to 0.  */
-          if (REG_P (op1) || op1 == const0_rtx)
-            {
+	  else if (REG_P (op1) || op1 == const0_rtx)
+	    {
                /* The cost is 1 per register copied.  */
                int n_minus_1 = (GET_MODE_SIZE (GET_MODE (op0)) - 1)
  			      / UNITS_PER_WORD;
I would not have expected control flow to reach this point, as we have:

  /* TODO: The cost infrastructure currently does not handle
     vector operations.  Assume that all vector operations
     are equally expensive.  */
  if (VECTOR_MODE_P (mode))
    {
      if (speed)
	*cost += extra_cost->vect.alu;
      return true;
    }
But, I see that this check is broken for a set RTX (which has no mode).
So, your patch works, but only due to a bug in my original implementation.
This leaves the code with quite a messy design.

There are two ways I see that we could clean things up, both of which
require some reworking of your patch.

Either we remove my check above and teach the RTX costs how to properly
cost vector operations, or we fix my check to catch all vector RTX
and add the special cases for the small subset of things we understand
up there.

The correct approach in the long term is to fix the RTX costs to correctly
understand vector operations, so I'd much prefer to see a patch along
these lines, though I appreciate that is a substantially more invasive
piece of work.


Would we want to catch all vector RTXes in that check at the top
and have special vector rtx handling there? (Perhaps even in a function
of its own like aarch64_vector_rtx_costs?). Or do you think it would
be cleaner to handle them in the aarch64_rtx_costs giant switch?
Vector-specific RTX codes like vec_concat, vec_select would integrate
cleanly, but handling other common rtxen could potentially be messy?

Kyrill


Thanks,
James



Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]