This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
Re: [AVX] PATCH: Add vzeroall/vzeroupper patterns
On Sat, Apr 12, 2008 at 8:24 AM, Uros Bizjak <ubizjak@gmail.com> wrote:
> On Sat, Apr 12, 2008 at 5:19 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
> > vzeroall isn't used for correctness. It is used for performance when an
> > AVX function is called from an SSE function. We cannot optimize out
> > vzeroall if any SSE registers are used.
>
> Can you perhaps post an illustrative example for this?
>
----
bash-3.2$ cat avx.c
#include <gmmintrin.h>
extern __m128i bar (__m128i, __m256i);
__m128i
foo2 (__m128i x)
{
  __m256i y = { 0 };
  _mm256_zeroall ();
  return bar (x, y);
}
----
When foo2 is called from a function containing SSE instructions, the upper
128 bits of all AVX registers are undefined, which incurs a performance
penalty. So we call _mm256_zeroall to clear all the AVX registers, improving
performance whenever the upper 128 bits of the AVX registers may be undefined.
With my implementation, I got:
bash-3.2$ /export/build/gnu/gcc-avx/build-x86_64-linux/gcc/xgcc
-B/export/build/gnu/gcc-avx/build-x86_64-linux/gcc/ -mavx -Wall -S
avx.c -O2
bash-3.2$ cat avx.s
        .file   "avx.c"
        .text
        .p2align 4,,15
.globl foo2
        .type   foo2, @function
foo2:
.LFB686:
        movdqa  %xmm0, -24(%rsp)
        vzeroall
        vpxor   %xmm1, %xmm1, %xmm1
        movdqa  -24(%rsp), %xmm0
        jmp     bar
.LFE686:
        .size   foo2, .-foo2
That is, we spill the argument 'x'. But I couldn't optimize out the
vpxor %xmm1, %xmm1, %xmm1
when I tried an approach similar to yours; instead, vzeroall itself was
optimized out.
BTW, you can play with the AVX branch. It is as stable as trunk, and I am
keeping it as close to trunk as possible. The current Linux binutils
2.18.50.0.6 supports AVX.
Thanks.
H.J.