[PATCH] i?86 unaligned/aligned load improvement for AVX512F

Uros Bizjak <ubizjak@gmail.com>
Sat Jan 4 08:46:00 GMT 2014


On Fri, Jan 3, 2014 at 9:59 AM, Jakub Jelinek <jakub@redhat.com> wrote:

> This is an attempt to port my recent
> http://gcc.gnu.org/viewcvs?rev=204219&root=gcc&view=rev
> http://gcc.gnu.org/viewcvs?rev=205663&root=gcc&view=rev
> http://gcc.gnu.org/viewcvs?rev=206090&root=gcc&view=rev
> changes also to AVX512F.  The motivation is to get:
>
> #include <immintrin.h>
>
> __m512i
> foo (void *x, void *y)
> {
>   __m512i a = _mm512_loadu_si512 (x);
>   __m512i b = _mm512_loadu_si512 (y);
>   return _mm512_add_epi32 (a, b);
> }
>
> use one of the unaligned memory operands directly as an operand to the
> vpaddd instruction.  The first hunk is needed so that we don't regress
> on, say:
>
> #include <immintrin.h>
>
> __m512i z;
>
> __m512i
> foo (void *x, void *y, int k)
> {
>   __m512i a = _mm512_mask_loadu_epi32 (z, k, x);
>   __m512i b = _mm512_mask_loadu_epi32 (z, k, y);
>   return _mm512_add_epi32 (a, b);
> }
>
> __m512i
> bar (void *x, void *y, int k)
> {
>   __m512i a = _mm512_maskz_loadu_epi32 (k, x);
>   __m512i b = _mm512_maskz_loadu_epi32 (k, y);
>   return _mm512_add_epi32 (a, b);
> }
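>
> With the patch, foo above should then assemble to something along
> these lines (illustrative only; the exact instruction selection and
> register allocation may of course differ):
>
>         vmovdqu64       (%rdi), %zmm0
>         vpaddd  (%rsi), %zmm0, %zmm0
>         ret
>
> while bar keeps its masked unaligned loads as separate instructions
> rather than folding them into vpaddd, e.g.:
>
>         kmovw   %edx, %k1
>         vmovdqu32       (%rdi), %zmm0{%k1}{z}
>         vmovdqu32       (%rsi), %zmm1{%k1}{z}
>         vpaddd  %zmm1, %zmm0, %zmm0
>         ret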
>
> Does it matter for performance which of vmovdqu32 vs. vmovdqu64 is used
> when no masking/zeroing is performed (i.e. vmovdqu32 (%rax), %zmm0 vs.
> vmovdqu64 (%rax), %zmm0)?  In other words, is there some
> reinterpretation penalty?
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>
> 2014-01-03  Jakub Jelinek  <jakub@redhat.com>
>
>         * config/i386/sse.md (avx512f_load<mode>_mask): Emit vmovup{s,d}
>         or vmovdqu* for misaligned_operand.
>         (<sse>_loadu<ssemodesuffix><avxsizesuffix><mask_name>,
>         <sse2_avx_avx512f>_loaddqu<mode><mask_name>): Handle <mask_applied>.
>         * config/i386/i386.c (ix86_expand_special_args_builtin): Set
>         aligned_mem for AVX512F masked aligned load and store builtins and for
>         non-temporal moves.
>
>         * gcc.target/i386/avx512f-vmovdqu32-1.c: Allow vmovdqu64 instead of
>         vmovdqu32.

Taking into account Kirill's comment, the patch is OK, although I find
it a bit strange in [1] that

void
f2 (int *__restrict e, int *__restrict f)
{
  int i;
  for (i = 0; i < 1024; i++)
    e[i] = f[i];
}

results in

        vmovdqu64       (%rsi,%rax), %zmm0
        vmovdqu32       %zmm0, (%rdi,%rax)

Shouldn't these two move insns be the same?
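
That is, since no masking is involved here, I would expect the same
mnemonic for both moves, e.g. (purely illustrative):

        vmovdqu64       (%rsi,%rax), %zmm0
        vmovdqu64       %zmm0, (%rdi,%rax)

or the vmovdqu32 form for both.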

[1] http://gcc.gnu.org/ml/gcc/2014-01/msg00015.html

Thanks,
Uros.


