This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: [PATCH 1/2, x86] Add palignr support for AVX2.
- From: Uros Bizjak <ubizjak at gmail dot com>
- To: Jakub Jelinek <jakub at redhat dot com>
- Cc: Evgeny Stupachenko <evstupac at gmail dot com>, "H.J. Lu" <hjl dot tools at gmail dot com>, Richard Henderson <rth at redhat dot com>, GCC Patches <gcc-patches at gcc dot gnu dot org>, Richard Biener <rguenther at suse dot de>
- Date: Wed, 1 Oct 2014 21:03:17 +0200
- Subject: Re: [PATCH 1/2, x86] Add palignr support for AVX2.
On Wed, Oct 1, 2014 at 2:56 PM, Jakub Jelinek <jakub@redhat.com> wrote:
> On Wed, Oct 01, 2014 at 02:25:01PM +0200, Uros Bizjak wrote:
>> OK.
>
> And now the expand_vec_perm_palignr improvement, tested
> with GCC_TEST_RUN_EXPENSIVE=1 make check-gcc \
> RUNTESTFLAGS='--target_board=unix/-mavx2 dg-torture.exp=vshuf*.c'
> E.g.
> typedef unsigned long long V __attribute__ ((vector_size (32)));
> extern void abort (void);
> V a, b, c, d;
> void test_14 (void)
> {
>   V mask = { 6, 1, 3, 4 };
>   int i;
>   c = __builtin_shuffle (a, mask);
>   d = __builtin_shuffle (a, b, mask);
> }
> (distilled from test 15 in vshuf-v4di.c) results in:
> -     vmovdqa a(%rip), %ymm0
> -     vpermq $54, %ymm0, %ymm1
> -     vpshufb .LC1(%rip), %ymm0, %ymm0
> -     vmovdqa %ymm1, c(%rip)
> -     vmovdqa b(%rip), %ymm1
> -     vpshufb .LC0(%rip), %ymm1, %ymm1
> -     vpermq $78, %ymm1, %ymm1
> -     vpor %ymm1, %ymm0, %ymm0
> +     vmovdqa a(%rip), %ymm1
> +     vpermq $54, %ymm1, %ymm0
> +     vmovdqa %ymm0, c(%rip)
> +     vmovdqa b(%rip), %ymm0
> +     vpalignr $8, %ymm1, %ymm0, %ymm0
> +     vpermq $99, %ymm0, %ymm0
>       vmovdqa %ymm0, d(%rip)
>       vzeroupper
>       ret
> change (and two fewer .rodata constants).
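For anyone who wants to convince themselves that the new vpalignr + vpermq
pair really implements the { 6, 1, 3, 4 } two-operand shuffle, here is a
rough scalar sketch in plain C. It is not taken from the patch or from the
testsuite: the model_* helpers and the sample element values are made up for
illustration, and the models only cover the cases used in the sequence quoted
above (a byte shift of 8 for vpalignr, 64-bit elements).

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar model of AVX2 vpalignr with an immediate of 8 on
   4 x 64-bit elements.  Within each 128-bit lane the instruction forms
   hi:lo, shifts right by 8 bytes and keeps the low 16 bytes, so each
   lane becomes { lo[odd], hi[even] }.  */
static void model_vpalignr8(const uint64_t hi[4], const uint64_t lo[4],
                            uint64_t out[4])
{
    for (int lane = 0; lane < 2; lane++) {
        out[2 * lane + 0] = lo[2 * lane + 1];
        out[2 * lane + 1] = hi[2 * lane + 0];
    }
}

/* Hypothetical scalar model of vpermq: result element i is picked by
   bits [2i+1:2i] of the immediate.  */
static void model_vpermq(const uint64_t src[4], unsigned imm, uint64_t out[4])
{
    for (int i = 0; i < 4; i++)
        out[i] = src[(imm >> (2 * i)) & 3];
}

/* Reference semantics of the two-operand __builtin_shuffle for four
   elements: indices 0..3 select from a, 4..7 select from b.  */
static void model_shuffle2(const uint64_t a[4], const uint64_t b[4],
                           const uint64_t mask[4], uint64_t out[4])
{
    for (int i = 0; i < 4; i++) {
        unsigned idx = (unsigned) (mask[i] & 7);
        out[i] = idx < 4 ? a[idx] : b[idx - 4];
    }
}

/* Returns 1 if vpalignr $8 followed by vpermq $99 matches the shuffle
   of a and b with mask { 6, 1, 3, 4 } on arbitrary sample data.  */
int palignr_sequence_matches(void)
{
    const uint64_t a[4] = { 10, 11, 12, 13 };   /* sample values only */
    const uint64_t b[4] = { 20, 21, 22, 23 };
    const uint64_t mask[4] = { 6, 1, 3, 4 };
    uint64_t want[4], tmp[4], got[4];

    model_shuffle2(a, b, mask, want);
    /* AT&T "vpalignr $8, %ymm1, %ymm0, %ymm0" with ymm1 = a, ymm0 = b:
       b supplies the high half of each lane's concatenation.  */
    model_vpalignr8(b, a, tmp);
    model_vpermq(tmp, 99, got);
    return memcmp(want, got, sizeof want) == 0;
}

/* Returns 1 if vpermq $54 matches the one-operand shuffle, where the
   mask indices are taken modulo 4, i.e. { 2, 1, 3, 0 }.  */
int vpermq54_matches(void)
{
    const uint64_t a[4] = { 10, 11, 12, 13 };   /* sample values only */
    const uint64_t want[4] = { a[2], a[1], a[3], a[0] };
    uint64_t got[4];

    model_vpermq(a, 54, got);
    return memcmp(want, got, sizeof want) == 0;
}
```

Both helpers should return 1. The intermediate vpalignr result in this model
is { a[1], b[0], a[3], b[2] }, which vpermq $99 then reorders into
{ b[2], a[1], a[3], b[0] }.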
On a related note, I would like to point out that
gcc.target/i386/pr61403.c also fails to generate a blend insn with
-mavx2. The new insn sequence includes lots of new vpshufb insns with
memory access.
Uros.