This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: Ответ: [PATCH 1/2, x86] Add palignr support for AVX2.
- From: Jakub Jelinek <jakub at redhat dot com>
- To: Evgeny <evstupac at gmail dot com>
- Cc: Uros Bizjak <ubizjak at gmail dot com>, "H. J. Lu" <hjl dot tools at gmail dot com>, Richard Henderson <rth at redhat dot com>, GCC Patches <gcc-patches at gcc dot gnu dot org>, Richard Biener <rguenther at suse dot de>
- Date: Thu, 2 Oct 2014 09:50:45 +0200
- Subject: Re: Ответ: [PATCH 1/2, x86] Add palignr support for AVX2.
- Authentication-results: sourceware.org; auth=none
- References: <uky3h7i9cc0xquh4vqdlf05c dot 1412234110405 at email dot android dot com>
- Reply-to: Jakub Jelinek <jakub at redhat dot com>
On Thu, Oct 02, 2014 at 11:15:10AM +0400, Evgeny wrote:
> Hold on. The patch has a conflict with previously approved here.

Which?

In any case, I've already committed the patch.

That said, if what you care about now is pr52252-atom.c with -O2
-ftree-vectorize -mavx2, let's look at all the permutations it needs
(listed below). The patch I've committed improved f3 from
3x vpshufb + 2x vpermq + 2x vpor to vperm2i128 $33 + vpalignr $11,
and left the other permutations as is.
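
To illustrate why the two-instruction sequence matches the f3 selector, here is a minimal scalar sketch that models the two instructions on byte arrays. It is not GCC code and not taken from the patch: the names model_vperm2i128, model_vpalignr and check_f3_sequence are hypothetical, and the operand order follows the Intel (destination-first) convention rather than the AT&T order of the assembly output.

```c
#include <assert.h>
#include <string.h>

typedef unsigned char byte;

/* Scalar model of VPERM2I128 dst, src1, src2, imm8 (Intel operand order).
   Bits 1:0 of IMM pick the low 128-bit half of DST, bits 5:4 the high half:
   0 = src1 low, 1 = src1 high, 2 = src2 low, 3 = src2 high.
   Bits 3 and 7 zero the corresponding half instead.  */
static void
model_vperm2i128 (byte dst[32], const byte src1[32], const byte src2[32],
                  unsigned imm)
{
  byte tmp[32];
  assert (imm <= 0xff);   /* imm8 is an 8-bit immediate.  */
  for (int half = 0; half < 2; half++)
    {
      unsigned ctl = (imm >> (4 * half)) & 0xf;
      const byte *src = (ctl & 2) ? src2 : src1;
      if (ctl & 8)
        memset (tmp + 16 * half, 0, 16);
      else
        memcpy (tmp + 16 * half, src + 16 * (ctl & 1), 16);
    }
  memcpy (dst, tmp, 32);
}

/* Scalar model of the 256-bit VPALIGNR dst, hi, lo, imm8: independently in
   each 128-bit lane, concatenate hi:lo, shift right by IMM bytes and keep
   the low 16 bytes.  */
static void
model_vpalignr (byte dst[32], const byte hi[32], const byte lo[32],
                unsigned imm)
{
  byte tmp[32];
  assert (imm <= 0xff);
  for (int lane = 0; lane < 2; lane++)
    for (int i = 0; i < 16; i++)
      {
        unsigned pos = i + imm;
        byte v = 0;
        if (pos < 16)
          v = lo[16 * lane + pos];
        else if (pos < 32)
          v = hi[16 * lane + pos - 16];
        tmp[16 * lane + i] = v;
      }
  memcpy (dst, tmp, 32);
}

/* Return 1 if the modelled sequence reproduces the f3 selector,
   i.e. result[i] == concat (a, b)[11 + i] for all i.  */
int
check_f3_sequence (void)
{
  byte a[32], b[32], t[32], r[32];
  for (int i = 0; i < 32; i++)
    {
      a[i] = i;        /* Each byte holds its own selector index.  */
      b[i] = 32 + i;
    }
  model_vperm2i128 (t, a, b, 33);   /* t = { a high half, b low half }  */
  model_vpalignr (r, t, a, 11);     /* per lane: (t:a) >> 11 bytes  */
  for (int i = 0; i < 32; i++)
    if (r[i] != 11 + i)
      return 0;
  return 1;
}
```

The cross-lane vperm2i128 is needed because the 256-bit vpalignr only shifts within each 128-bit lane; the temporary supplies, in each lane, the 16 bytes that follow that lane of a in the a:b concatenation.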

So, do you have suggestions on what insns we should use for the other
permutations? Then we can figure out how best to achieve that.
Note that with dozens of permutation instructions in the ISA, finding
the optimal sequence might be computationally too expensive.

typedef unsigned char V __attribute__ ((vector_size (32)));
V a, b, c;

void
f1 (void)
{
  c = __builtin_shuffle (a, b, (V) { 0, 32, 1, 33, 2, 34, 3, 35, 4, 36, 5, 37, 6, 38, 7, 39, 8, 40, 9, 41, 10, 42, 11, 43, 12, 44, 13, 45, 14, 46, 15, 47 });
}

void
f2 (void)
{
  c = __builtin_shuffle (a, b, (V) { 0, 3, 6, 9, 12, 15, 18, 21, 24, 27, 30, 1, 4, 7, 10, 13, 16, 19, 22, 25, 28, 31, 2, 5, 8, 11, 14, 17, 20, 23, 26, 29 });
}

void
f3 (void)
{
  c = __builtin_shuffle (a, b, (V) { 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42 });
}

void
f4 (void)
{
  c = __builtin_shuffle (a, b, (V) { 16, 48, 17, 49, 18, 50, 19, 51, 20, 52, 21, 53, 22, 54, 23, 55, 24, 56, 25, 57, 26, 58, 27, 59, 28, 60, 29, 61, 30, 62, 31, 63 });
}

void
f5 (void)
{
  c = __builtin_shuffle (a, b, (V) { 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52 });
}

void
f6 (void)
{
  c = __builtin_shuffle (a, b, (V) { 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53 });
}
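
For reference, f3, f5 and f6 all select 32 consecutive bytes from the 64-byte concatenation of a and b and differ only in the starting offset, whereas f1 is a byte interleave. The sketch below makes that classification explicit; window_start and check_windows are hypothetical helpers for illustration only, not part of GCC or of the patch, and the selector arrays are copied from the functions above.

```c
#include <assert.h>

/* Hypothetical helper: if SEL picks 32 consecutive bytes of the 64-byte
   concatenation a:b, return the starting offset, otherwise return -1.  */
int
window_start (const unsigned char sel[32])
{
  for (int i = 1; i < 32; i++)
    if (sel[i] != sel[0] + i)
      return -1;
  return sel[0];
}

/* Selectors copied from f1, f3, f5 and f6.  */
const unsigned char f1_sel[32] = { 0, 32, 1, 33, 2, 34, 3, 35, 4, 36, 5, 37, 6, 38, 7, 39, 8, 40, 9, 41, 10, 42, 11, 43, 12, 44, 13, 45, 14, 46, 15, 47 };
const unsigned char f3_sel[32] = { 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42 };
const unsigned char f5_sel[32] = { 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52 };
const unsigned char f6_sel[32] = { 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53 };

/* f3, f5 and f6 are consecutive windows; f1 is not.  */
void
check_windows (void)
{
  assert (window_start (f3_sel) == 11);
  assert (window_start (f5_sel) == 21);
  assert (window_start (f6_sel) == 22);
  assert (window_start (f1_sel) == -1);
}
```

This only describes the shape of the selectors; it says nothing about which instruction sequence the compiler picks for f5 and f6.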
Jakub