This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
[RFC] non-unit stride loads for size power of 2.
- From: "Kumar, Venkataramanan" <Venkataramanan dot Kumar at amd dot com>
- To: "gcc-patches at gcc dot gnu dot org" <gcc-patches at gcc dot gnu dot org>
- Cc: "Richard Beiner (richard dot guenther at gmail dot com)" <richard dot guenther at gmail dot com>, "Uros Bizjak (ubizjak at gmail dot com)" <ubizjak at gmail dot com>
- Date: Tue, 12 Jan 2016 14:51:22 +0000
- Subject: [RFC] non-unit stride loads for size power of 2.
Hi
From the code below, it looks like we always call "vect_permute_load_chain" to load non-unit strides whose size is a power of 2.
(---snip---)
  /* If reassociation width for vector type is 2 or greater target machine can
     execute 2 or more vector instructions in parallel.  Otherwise try to
     get chain for loads group using vect_shift_permute_load_chain.  */
  mode = TYPE_MODE (STMT_VINFO_VECTYPE (vinfo_for_stmt (stmt)));
  if (targetm.sched.reassociation_width (VEC_PERM_EXPR, mode) > 1
      || exact_log2 (size) != -1
      || !vect_shift_permute_load_chain (dr_chain, size, stmt,
                                         gsi, &result_chain))
    vect_permute_load_chain (dr_chain, size, stmt, gsi, &result_chain);
static bool
vect_shift_permute_load_chain (vec<tree> dr_chain,
                               unsigned int length,
                               gimple *stmt,
                               gimple_stmt_iterator *gsi,
                               vec<tree> *result_chain)
{
  ...
  ...
  if (exact_log2 (length) != -1 && LOOP_VINFO_VECT_FACTOR (loop_vinfo) > 4)   <-- This is not used.
    {
      unsigned int j, log_length = exact_log2 (length);
      for (i = 0; i < nelt / 2; ++i)
        sel[i] = i * 2;
      for (i = 0; i < nelt / 2; ++i)
        sel[nelt / 2 + i] = i * 2 + 1;
(---snip------)
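To make the two selector loops quoted above concrete, here is a minimal standalone sketch of what they compute. The function name build_even_odd_selector and the plain unsigned char array are assumptions made for illustration only; they are not part of the GCC sources, and nelt is assumed to be even.

```c
/* Standalone illustration, not GCC code: fill sel[0..nelt-1] with the
   even lane indices followed by the odd lane indices, mirroring the
   two loops quoted above.  nelt is assumed to be even.  */
void
build_even_odd_selector (unsigned char *sel, unsigned int nelt)
{
  unsigned int i;

  /* First half of the selector: lanes 0, 2, 4, ...  */
  for (i = 0; i < nelt / 2; ++i)
    sel[i] = i * 2;

  /* Second half of the selector: lanes 1, 3, 5, ...  */
  for (i = 0; i < nelt / 2; ++i)
    sel[nelt / 2 + i] = i * 2 + 1;
}
```

For nelt = 8 this fills sel with {0, 2, 4, 6, 1, 3, 5, 7}, i.e. a permutation that gathers the even elements into the low half and the odd elements into the high half.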
Is there any reason to do so?
I have not done any benchmarking, but I tried simple test cases on -mavx targets with sizes 2 and 4 and VF > 4 (short/char types).
For those cases, vect_shift_permute_load_chain looks like the better choice.
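As an illustration of the kind of loop meant here, a stride-2 grouped load on a short type could look like the sketch below. This is a hypothetical example written for this note, not one of the test cases actually tried; the function name deinterleave2 and its parameters are made up.

```c
/* Hypothetical test case (illustration only): a group of two
   interleaved loads with stride 2 on a 16-bit element type.  */
void
deinterleave2 (const short *restrict in, short *restrict even,
               short *restrict odd, int n)
{
  for (int i = 0; i < n; i++)
    {
      even[i] = in[2 * i];      /* elements 0, 2, 4, ... */
      odd[i] = in[2 * i + 1];   /* elements 1, 3, 5, ... */
    }
}
```

Compiling something like this with -O3 -mavx -fdump-tree-vect-details should show which permutations the vectorizer generates for the load group.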
Should we change it to something like this?
diff --git a/gcc/tree-vect-data-refs.c b/gcc/tree-vect-data-refs.c
index d0e20da..b0f0a02 100644
--- a/gcc/tree-vect-data-refs.c
+++ b/gcc/tree-vect-data-refs.c
@@ -5733,9 +5733,9 @@ vect_transform_grouped_load (gimple *stmt, vec<tree> dr_chain, int size,
      get chain for loads group using vect_shift_permute_load_chain.  */
   mode = TYPE_MODE (STMT_VINFO_VECTYPE (vinfo_for_stmt (stmt)));
   if (targetm.sched.reassociation_width (VEC_PERM_EXPR, mode) > 1
-      || exact_log2 (size) != -1
-      || !vect_shift_permute_load_chain (dr_chain, size, stmt,
-                                        gsi, &result_chain))
+      || (!vect_shift_permute_load_chain (dr_chain, size, stmt,
+                                         gsi, &result_chain)
+         && exact_log2 (size) != -1))
     vect_permute_load_chain (dr_chain, size, stmt, gsi, &result_chain);
   vect_record_grouped_load_vectors (stmt, result_chain);
   result_chain.release ();
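Which routine ends up building the chain before and after this change can be sketched with a small standalone model. This is a simplification, not GCC code: width stands for the value returned by targetm.sched.reassociation_width, shift_ok for the return value of vect_shift_permute_load_chain, and the enum and function names are made up for illustration.

```c
#include <stdbool.h>

/* Illustration only: which routine produces result_chain.  */
enum chain_choice { CHAIN_NONE, CHAIN_PERMUTE, CHAIN_SHIFT };

/* Stand-in for exact_log2 (size) != -1.  */
static bool
is_pow2 (unsigned int size)
{
  return size != 0 && (size & (size - 1)) == 0;
}

/* Condition before the change: for power-of-2 sizes the shift variant
   is short-circuited away and never tried.  */
enum chain_choice
choose_before (int width, unsigned int size, bool shift_ok)
{
  if (width > 1 || is_pow2 (size))
    return CHAIN_PERMUTE;
  return shift_ok ? CHAIN_SHIFT : CHAIN_PERMUTE;
}

/* Condition after the change: the shift variant is tried first whenever
   width <= 1; the permute variant is the fallback only for power-of-2
   sizes.  */
enum chain_choice
choose_after (int width, unsigned int size, bool shift_ok)
{
  if (width > 1)
    return CHAIN_PERMUTE;
  if (shift_ok)
    return CHAIN_SHIFT;
  return is_pow2 (size) ? CHAIN_PERMUTE : CHAIN_NONE;
}
```

Under this model, size 4 with width 1 moves from the permute variant to the shift variant whenever the latter succeeds. The model also suggests that a non-power-of-2 size for which the shift variant fails would select neither routine after the change, so that case may need a second look.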
regards,
Venkat.