This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

[PATCH][i386] Fix vec_construct cost, remove unused ix86_vec_cost arg


The following fixes vec_construct cost calculation to properly consider
that the inserts will happen to SSE regs thus forgo the multiplication
done in ix86_vec_cost which is passed the wrong mode.  This gets rid of
the only call passing false to ix86_vec_cost (so consider the patch
amended to remove the arg if approved).

Bootstrapped and tested on x86_64-unknown-linux-gnu.

OK for trunk?

I am considering to make the factor we apply in ix86_vec_cost
which currently depends on X86_TUNE_AVX128_OPTIMAL and
X86_TUNE_SSE_SPLIT_REGS part of the actual cost tables since
the reason we apply them are underlying CPU architecture details.
Was the original reason of doing the multiplication based on
those tunings to be able to "share" the same basic cost table
across architectures that differ in this important detail?
I see X86_TUNE_SSE_SPLIT_REGS is only used for m_ATHLON_K8
and X86_TUNE_AVX128_OPTIMAL is used for m_BDVER, m_BTVER2
and m_ZNVER1.  Those all have (multiple) exclusive processor_cost_table
entries.

As a first step I'd like to remove the use of ix86_vec_cost for
the entries that already have entries for multiple modes
(loads and stores) and apply the factor there.  For example
Zen can do two 128bit loads per cycle but only one 128bit store.
With multiplying AVX256 costs by two we seem to cost sth like
# instructions to dispatch * instruction latency which is an
odd thing.  I'd have expected # instructions to dispatch / instruction 
throughput * instruction latency - so a AVX256 add would cost
the same as a AVX128 add, likewise for loads but stores would be
more expensive because of the throughput issue.  This all
ignores resource utilization across multiple insns but that's
how the cost model works ...

Thanks,
Richard.

2018-10-11  Richard Biener  <rguenther@suse.de>

	* config/i386/i386.c (ix86_vec_cost): Remove !parallel path
	and argument.
	(ix86_builtin_vectorization_cost): For vec_construct properly
	cost insertion into SSE regs.
	(...): Adjust calls to ix86_vec_cost.

Index: gcc/config/i386/i386.c
===================================================================
--- gcc/config/i386/i386.c	(revision 265022)
+++ gcc/config/i386/i386.c	(working copy)
@@ -39846,11 +39846,10 @@ ix86_set_reg_reg_cost (machine_mode mode
 static int
 ix86_vec_cost (machine_mode mode, int cost, bool parallel)
 {
+  gcc_assert (parallel);
   if (!VECTOR_MODE_P (mode))
     return cost;
- 
-  if (!parallel)
-    return cost * GET_MODE_NUNITS (mode);
+
   if (GET_MODE_BITSIZE (mode) == 128
       && TARGET_SSE_SPLIT_REGS)
     return cost * 2;
@@ -45190,8 +45189,9 @@ ix86_builtin_vectorization_cost (enum ve
 
       case vec_construct:
 	{
-	  /* N element inserts.  */
-	  int cost = ix86_vec_cost (mode, ix86_cost->sse_op, false);
+	  gcc_assert (VECTOR_MODE_P (mode));
+	  /* N element inserts into SSE vectors.  */
+	  int cost = GET_MODE_NUNITS (mode) * ix86_cost->sse_op;
 	  /* One vinserti128 for combining two SSE vectors for AVX256.  */
 	  if (GET_MODE_BITSIZE (mode) == 256)
 	    cost += ix86_vec_cost (mode, ix86_cost->addss, true);


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]