This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
Re: [PATCH 1/2, x86] Add palignr support for AVX2.
- From: Uros Bizjak <ubizjak at gmail dot com>
- To: Jakub Jelinek <jakub at redhat dot com>
- Cc: Evgeny Stupachenko <evstupac at gmail dot com>, "H.J. Lu" <hjl dot tools at gmail dot com>, Richard Henderson <rth at redhat dot com>, GCC Patches <gcc-patches at gcc dot gnu dot org>, Richard Biener <rguenther at suse dot de>
- Date: Wed, 1 Oct 2014 14:25:01 +0200
- Subject: Re: [PATCH 1/2, x86] Add palignr support for AVX2.
On Wed, Oct 1, 2014 at 2:17 PM, Jakub Jelinek <jakub@redhat.com> wrote:
> On Wed, Oct 01, 2014 at 01:45:54PM +0200, Uros Bizjak wrote:
>> OK.
>
> Thanks. The second step is a tiny optimization, shown on the
> simplified vshuf-v4di.c testcase 122 (now 24):
> typedef unsigned long long V __attribute__ ((vector_size (32)));
> V a, b, c, d;
>
> int
> main ()
> {
>   int i;
>   for (i = 0; i < 4; ++i)
>     {
>       a[i] = i + 2;
>       b[i] = 4 + i + 2;
>     }
>   asm volatile ("" : : : "memory");
>   c = __builtin_shuffle (a, b, (V) { 2, 5, 6, 3 });
>   d = __builtin_shuffle ((V) { 2, 3, 4, 5 }, (V) { 6, 7, 8, 9 }, (V) { 2, 5, 6, 3 });
>   if (__builtin_memcmp (&c, &d, sizeof (c)))
>     __builtin_abort ();
>   return 0;
> }
>
> This patch allows better code to be generated:
> -        vmovdqa b(%rip), %ymm0
> +        vpermq $238, a(%rip), %ymm1
>          movl $32, %edx
> -        movl $d, %esi
> -        vmovdqa a(%rip), %ymm1
> +        vmovdqa b(%rip), %ymm0
> +        movl $d, %esi
>          movl $c, %edi
> -        vperm2i128 $17, %ymm0, %ymm1, %ymm1
>          vpblendd $195, %ymm1, %ymm0, %ymm0
>          vmovdqa %ymm0, c(%rip)
>
> That is because vperm2i128 $17 unnecessarily uses two operands even
> though all the data it selects come from a single one (both 128-bit
> lanes of the result are the high lane of a). By canonicalizing the
> permutation we can emit the single-operand vpermq $238 instead.
> Perhaps more places could benefit from extra canonicalize_perm calls
> (besides the call at the expansion/testing entry point, two other
> spots already use it).
>
> Tested again with
> GCC_TEST_RUN_EXPENSIVE=1 make check-gcc \
> RUNTESTFLAGS='--target_board=unix/-mavx2 dg-torture.exp=vshuf*.c'
> on x86_64-linux. Ok for trunk?
>
> 2014-10-01  Jakub Jelinek  <jakub@redhat.com>
>
> 	* config/i386/i386.c (expand_vec_perm_vperm2f128): Canonicalize
> 	dfirst permutation.
OK.
Thanks,
Uros.