This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [AVX] PATCH: Add vzeroall/vzeroupper patterns


vzeroall isn't used for correctness. It is used for performance when an
AVX function is called from an SSE function. We can not optimize out
vzeroall if any SSE registers are used.


H.J.
On Sat, Apr 12, 2008 at 7:49 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
>
> On Sat, Apr 12, 2008 at 02:53:22PM +0200, Uros Bizjak wrote:
>  > Hello!
>  >
>  >> +(define_insn "avx_vzeroall"
>  >> +  [(unspec_volatile [(const_int 0)] UNSPECV_VZEROALL)
>  >> +   (clobber (reg:V8SI XMM0_REG))
>  >> +   (clobber (reg:V8SI XMM1_REG))
>  >> +   (clobber (reg:V8SI XMM2_REG))
>  >> +   (clobber (reg:V8SI XMM3_REG))
>  >> +   (clobber (reg:V8SI XMM4_REG))
>  >> +   (clobber (reg:V8SI XMM5_REG))
>  >> +   (clobber (reg:V8SI XMM6_REG))
>  >> +   (clobber (reg:V8SI XMM7_REG))]
>  >
>  > This is not good approach, since by using unspecs, you are hiding from the
>  > compiler what the pattern actually does. I would recommend using parallel
>  > of (set (reg:V8SI XMMx_REG) (const_vector: V8SI [(const_int:SI 0)...]))),
>  > something like attached patch (modeled as V4SI version, so I was able to
>  > "test" it on SSE2):
>  >
>  > Also, by exactly defining true operation of vzeroall, IMO it does not need
>  > to be declared as volatile.
>  >
>  > Using attached patch (please note that these patterns handle REX SSE
>  > registers as well), I was able to generate vzeroall by following test:
>  >
>  > --cut here--
>  > typedef float __m128f __attribute__ ((__vector_size__ (16)));
>  >
>  > __m128f _mm_vsetzero (int i)
>  > {
>  >  register __m128f x __asm__ ("xmm5");
>  >
>  >  __builtin_ia32_vzeroall ();
>  >
>  >  //  x = (__m128f) { 0.0f, 0.0f, 0.0f, 0.0f };
>  >  return __builtin_ia32_cvtsi2ss (x, i);
>  > }
>  > --cut here--
>  >
>  > gcc -O2:
>  >
>  > .LFB2:
>  >        vzeroall
>  >        movaps  %xmm5, %xmm0
>  >        cvtsi2ss        %edi, %xmm0
>  >        ret
>  >
>  > when the third line was uncommented, gcc figured out that the dependency
>  > chain between vzeroall and cvtsi2ss was broken and since all other xmm
>  > registers were unused, it correctly removed vzeroall.
>  >
>  > Uros.
>
>  I tried it. It doesn't work as expected:
>
>  bash-3.2$ cat bad.c
>  #include <gmmintrin.h>
>
>  extern __m256i bar (__m256i);
>
>  extern __m128i bar (__m128i);
>
>  __m128i
>  foo2 (__m128i x, __m256i y)
>  {
>   _mm256_zeroall ();
>   return bar (x);
>  }
>  bash-3.2$ /export/build/gnu/gcc-avx/build-x86_64-linux/gcc/xgcc
>  -B/export/build/gnu/gcc-avx/build-x86_64-linux/gcc/ -mavx -Wall  -S
>  bad.c -O2
>  bash-3.2$ cat bad.s
>         .file   "bad.c"
>         .text
>         .p2align 4,,15
>  .globl foo2
>         .type   foo2, @function
>  foo2:
>  .LFB686:
>         jmp     bar
>  .LFE686:
>         .size   foo2, .-foo2
>
>  We don't save and restore x. Since we generate
>
>  (set (reg/v:V4SI 60 [ x ]) (reg:V4SI 21 xmm0 [ x ]))
>  (parallel [
>         (set (reg:V8SI 21 xmm0)
>  ...
>  set (reg:V4SI 21 xmm0) (reg/v:V4SI 60 [ x ]))
>
>  We optimize out vzeroall. Do you have any suggestions?
>
>  Thanks.
>
>
>  H.J.
>


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]