This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
Re: [AVX] PATCH: Add vzeroall/vzeroupper patterns
On Sat, Apr 12, 2008 at 8:24 AM, Uros Bizjak <ubizjak@gmail.com> wrote:
> On Sat, Apr 12, 2008 at 5:19 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
> > vzeroall isn't used for correctness. It is used for performance when an
> > AVX function is called from an SSE function. We cannot optimize out
> > vzeroall if any SSE registers are used.
>
> Can you perhaps post an illustrative example for this?
>
----
bash-3.2$ cat avx.c
#include <gmmintrin.h>
extern __m128i bar (__m128i, __m256i);
__m128i
foo2 (__m128i x)
{
  __m256i y = { 0 };
  _mm256_zeroall ();
  return bar (x, y);
}
----
When foo2 is called from a function containing SSE instructions, the upper
128 bits of all AVX registers are undefined, which incurs a performance
penalty. So we call _mm256_zeroall to clear all the AVX registers, improving
performance whenever the upper 128 bits of the AVX registers may be undefined.
With my implementation, I got:
bash-3.2$ /export/build/gnu/gcc-avx/build-x86_64-linux/gcc/xgcc
-B/export/build/gnu/gcc-avx/build-x86_64-linux/gcc/ -mavx -Wall -S
avx.c -O2
bash-3.2$ cat avx.s
        .file   "avx.c"
        .text
        .p2align 4,,15
.globl foo2
        .type   foo2, @function
foo2:
.LFB686:
        movdqa  %xmm0, -24(%rsp)
        vzeroall
        vpxor   %xmm1, %xmm1, %xmm1
        movdqa  -24(%rsp), %xmm0
        jmp     bar
.LFE686:
        .size   foo2, .-foo2
That is, we spill the argument 'x'. But I couldn't optimize out the
vpxor %xmm1, %xmm1, %xmm1
when I tried an approach similar to yours; instead, vzeroall itself was
optimized out.
BTW, you can play with the AVX branch. It is as stable as trunk, and I am
keeping it as close to trunk as possible. The current Linux binutils
2.18.50.0.6 supports AVX.
Thanks.
H.J.