This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
[PATCH, i386] V4DF __builtin_shuffle
- From: Marc Glisse <marc dot glisse at inria dot fr>
- To: gcc-patches at gcc dot gnu dot org
- Date: Tue, 17 Apr 2012 20:03:59 +0200 (CEST)
- Subject: [PATCH, i386] V4DF __builtin_shuffle
Hello,
this patch expands __builtin_shuffle for V4DF mode in at most 3 insns. It
is simple and works really well, often generating only 2 insns. It is not
very generic, because other modes don't have an instruction equivalent to
vshufpd. For V8SF (and likely V4DI and V8SI with AVX2, but I still need to
do that), my "default case" patch in PR 52607 seems more interesting.
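The 3-insn bound can be illustrated with a small scalar model. This is only a sketch of one plausible reading of the approach (two vperm2f128 followed by one vshufpd), not the code from the patch; the names v4df, model_vperm2f128, model_vshufpd, shuffle_3insn and count_mismatches are hypothetical, and the instruction models are simplified (the zeroing bits of vperm2f128 are ignored).

```c
#include <assert.h>

/* Scalar stand-in for a 256-bit vector of four doubles (V4DF). */
typedef struct { double e[4]; } v4df;

/* Simplified model of vperm2f128: each 128-bit half of the result is one of
   the four halves of (a, b).  Selector 0 = a low, 1 = a high, 2 = b low,
   3 = b high.  The zeroing bits of the real instruction are not modelled. */
static v4df model_vperm2f128(v4df a, v4df b, int sel_lo, int sel_hi)
{
    const double *half[4] = { &a.e[0], &a.e[2], &b.e[0], &b.e[2] };
    v4df r;
    r.e[0] = half[sel_lo][0];
    r.e[1] = half[sel_lo][1];
    r.e[2] = half[sel_hi][0];
    r.e[3] = half[sel_hi][1];
    return r;
}

/* Model of 256-bit vshufpd: even result elements come from a, odd ones from
   b, each picked inside its own 128-bit lane by one bit of imm. */
static v4df model_vshufpd(v4df a, v4df b, int imm)
{
    v4df r;
    r.e[0] = a.e[0 + ((imm >> 0) & 1)];
    r.e[1] = b.e[0 + ((imm >> 1) & 1)];
    r.e[2] = a.e[2 + ((imm >> 2) & 1)];
    r.e[3] = b.e[2 + ((imm >> 3) & 1)];
    return r;
}

/* perm[i] in 0..7 indexes the concatenation of a and b. */
static v4df shuffle_3insn(v4df a, v4df b, const int perm[4])
{
    /* perm[i] >> 1 is the 128-bit half (0..3) that holds the wanted element.
       t1 feeds result elements 0 and 2, t2 feeds elements 1 and 3. */
    v4df t1 = model_vperm2f128(a, b, perm[0] >> 1, perm[2] >> 1);
    v4df t2 = model_vperm2f128(a, b, perm[1] >> 1, perm[3] >> 1);
    /* The low bit of each index selects the element inside the half. */
    int imm = (perm[0] & 1) | ((perm[1] & 1) << 1)
            | ((perm[2] & 1) << 2) | ((perm[3] & 1) << 3);
    return model_vshufpd(t1, t2, imm);
}

/* Run all 8^4 = 4096 index combinations through the model and count the
   elements that differ from direct indexing. */
static int count_mismatches(void)
{
    v4df a = { { 0.0, 1.0, 2.0, 3.0 } };
    v4df b = { { 4.0, 5.0, 6.0, 7.0 } };
    int bad = 0;
    for (int n = 0; n < 4096; n++) {
        int perm[4] = { n & 7, (n >> 3) & 7, (n >> 6) & 7, (n >> 9) & 7 };
        v4df r = shuffle_3insn(a, b, perm);
        for (int i = 0; i < 4; i++)
            if (r.e[i] != (double)perm[i])
                bad++;
    }
    return bad;
}
```

In this model, when t1 or t2 would simply be a copy of one input, the corresponding vperm2f128 is presumably unnecessary, which is consistent with many permutations needing only 2 insns.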
I tried calling this new function after expand_vec_perm_vperm2f128_vblend
(instead of before as in the patch), but it generated more instructions
for some permutations, and never fewer. That function is still useful for
V8SF though.
I bootstrapped gcc on a non-avx platform, compiled a program that tests
all 4096 shuffles with -mavx/-mavx2, and ran the result using Intel's
emulator (SDE).
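The test program is not part of this message. As a rough sketch of what one check per constant mask could look like with GCC's vector extensions (the macro DEFINE_CHECK and the generated function names are hypothetical, and only three of the 4096 masks are shown):

```c
/* GCC vector extensions: four doubles, and the integer mask type of the
   same element count and width that __builtin_shuffle expects. */
typedef double v4df __attribute__((vector_size(32)));
typedef long long v4di __attribute__((vector_size(32)));

/* Define a function that shuffles a = {0,1,2,3} and b = {4,5,6,7} with the
   constant mask {m0,m1,m2,m3}.  Because element k of the concatenation of
   a and b holds the value k, the result must equal the mask itself.
   Returns 1 on success, 0 on failure. */
#define DEFINE_CHECK(name, m0, m1, m2, m3)                            \
    static int name(void)                                             \
    {                                                                 \
        v4df a = { 0.0, 1.0, 2.0, 3.0 };                              \
        v4df b = { 4.0, 5.0, 6.0, 7.0 };                              \
        v4di mask = { m0, m1, m2, m3 };                               \
        v4df r = __builtin_shuffle(a, b, mask);                       \
        return r[0] == (m0) && r[1] == (m1)                           \
            && r[2] == (m2) && r[3] == (m3);                          \
    }

DEFINE_CHECK(check_0213, 0, 2, 1, 3)
DEFINE_CHECK(check_7531, 7, 5, 3, 1)
DEFINE_CHECK(check_0415, 0, 4, 1, 5)
```

With constant inputs like these the compiler may fold the whole shuffle at compile time, so a test aimed at the generated insns would likely need inputs the optimizer cannot see through (for example, values read through volatile objects).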
There are still a few V4DF permutations that don't generate an optimal
sequence (3 insns instead of 2), but not that many, I think. Of course, I am
assuming a constant cost of 1 per insn, which is completely false, but
seems like a sensible first approximation.
(note that I can't commit)
2012-04-17 Marc Glisse <marc.glisse@inria.fr>
PR target/52607
* config/i386/i386.c (ix86_expand_vec_perm_const): Move code to ...
(canonicalize_perm): ... new function.
(expand_vec_perm_2vperm2f128_vshuf): New function.
(ix86_expand_vec_perm_const_1): Call it.
--
Marc Glisse
Attachment:
p3
Description: Text document