This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
[PATCH][simplify-rtx] Simplify vec_merge of vec_duplicates into vec_concat
- From: Kyrill Tkachov <kyrylo dot tkachov at foss dot arm dot com>
- To: GCC Patches <gcc-patches at gcc dot gnu dot org>
- Date: Tue, 06 Jun 2017 09:38:02 +0100
- Subject: [PATCH][simplify-rtx] Simplify vec_merge of vec_duplicates into vec_concat
Hi all,
Another vec_merge simplification that's missing from simplify-rtx.c is transforming
a vec_merge of two vec_duplicates into a vec_concat. For example:
(set (reg:V2DF 80)
     (vec_merge:V2DF (vec_duplicate:V2DF (reg:DF 84))
                     (vec_duplicate:V2DF (reg:DF 81))
                     (const_int 2)))
Can be transformed into the simpler:
(set (reg:V2DF 80)
     (vec_concat:V2DF (reg:DF 81)
                      (reg:DF 84)))
I believe this should always be beneficial.
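
To make the lane selection concrete, here is a minimal C++ sketch of the semantics involved. It is not GCC code: the type V2 and the functions vec_duplicate, vec_concat, vec_merge and merge_of_dups_as_concat are hypothetical stand-ins that model two-lane RTL values as plain arrays. It assumes the usual vec_merge rule that bit i of the selector picks lane i from the first operand when set and from the second operand otherwise.

```cpp
#include <array>
#include <cassert>
#include <utility>

// Hypothetical model of a two-lane vector such as V2DF.
using V2 = std::array<double, 2>;

// Models (vec_duplicate x): both lanes hold x.
V2 vec_duplicate (double x) { return {x, x}; }

// Models (vec_concat lo hi): lane 0 is lo, lane 1 is hi.
V2 vec_concat (double lo, double hi) { return {lo, hi}; }

// Models (vec_merge a b sel): lane i is taken from a when bit i of
// sel is set, and from b otherwise.
V2 vec_merge (const V2 &a, const V2 &b, unsigned sel)
{
  V2 result{};
  for (unsigned i = 0; i < 2; i++)
    result[i] = (sel & (1u << i)) ? a[i] : b[i];
  return result;
}

// Mirrors the shape of the proposed rewrite for sel == 1 or sel == 2:
// (vec_merge (vec_duplicate x) (vec_duplicate y) sel) becomes
// (vec_concat x y) for sel == 1 and (vec_concat y x) for sel == 2.
V2 merge_of_dups_as_concat (double x, double y, unsigned sel)
{
  assert (sel == 1 || sel == 2);
  if (sel == 2)
    std::swap (x, y);
  return vec_concat (x, y);
}
```

With the values from the example above standing in for the registers, vec_merge (vec_duplicate (84.0), vec_duplicate (81.0), 2) and vec_concat (81.0, 84.0) compare equal in this model. Selectors 0 and 3 take both lanes from a single operand, so they are not vec_concat cases and are excluded.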
I'm still looking for a small testcase demonstrating this, but on aarch64 SPEC
I've seen this eliminate some really bizarre codegen where GCC was generating nonsense like:
        ldr     q18, [sp, 448]
        ins     v18.d[0], v23.d[0]
        ins     v18.d[1], v22.d[0]
With q18 being pushed onto and popped off the stack in the prologue and epilogue of the function!
These are large files from SPEC, and I haven't yet been able to work out why GCC even attempts
that, but with this patch it no longer loads a register only to overwrite all of its lanes.
This patch shaves off about 5k of code size from zeusmp on aarch64 at -O3, so I believe it's a good
thing to do.
Ok?
Thanks,
Kyrill
2017-06-06  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>

	* simplify-rtx.c (simplify_ternary_operation): Simplify vec_merge
	of two vec_duplicates into a vec_concat.
diff --git a/gcc/simplify-rtx.c b/gcc/simplify-rtx.c
index 0727ca690e9d7f2c14907e3888e67da31ecb1ed6..ac7c4131c2ffef44e66cdc95f09b7bf4d4ce5192 100644
--- a/gcc/simplify-rtx.c
+++ b/gcc/simplify-rtx.c
@@ -5760,6 +5760,24 @@ simplify_ternary_operation (enum rtx_code code, machine_mode mode,
               if (!side_effects_p (otherop))
                 return simplify_gen_binary (VEC_CONCAT, mode, newop0, newop1);
             }
+
+          /* Replace (vec_merge (vec_duplicate x) (vec_duplicate y)
+                                (const_int n))
+             with (vec_concat x y) or (vec_concat y x) depending on value
+             of N.  */
+          if (GET_CODE (op0) == VEC_DUPLICATE
+              && GET_CODE (op1) == VEC_DUPLICATE
+              && GET_MODE_NUNITS (GET_MODE (op0)) == 2
+              && GET_MODE_NUNITS (GET_MODE (op1)) == 2
+              && IN_RANGE (sel, 1, 2))
+            {
+              rtx newop0 = XEXP (op0, 0);
+              rtx newop1 = XEXP (op1, 0);
+              if (sel == 2)
+                std::swap (newop0, newop1);
+
+              return simplify_gen_binary (VEC_CONCAT, mode, newop0, newop1);
+            }
         }

       if (rtx_equal_p (op0, op1)