This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [AVX] PATCH: Add vzeroall/vzeroupper patterns


On Sat, Apr 12, 2008 at 02:53:22PM +0200, Uros Bizjak wrote:
> Hello!
>
>> +(define_insn "avx_vzeroall"
>> +  [(unspec_volatile [(const_int 0)] UNSPECV_VZEROALL)
>> +   (clobber (reg:V8SI XMM0_REG))
>> +   (clobber (reg:V8SI XMM1_REG))
>> +   (clobber (reg:V8SI XMM2_REG))
>> +   (clobber (reg:V8SI XMM3_REG))
>> +   (clobber (reg:V8SI XMM4_REG))
>> +   (clobber (reg:V8SI XMM5_REG))
>> +   (clobber (reg:V8SI XMM6_REG))
>> +   (clobber (reg:V8SI XMM7_REG))]
>
> This is not good approach, since by using unspecs, you are hiding from the 
> compiler what the pattern actually does. I would recommend using parallel 
> of (set (reg:V8SI XMMx_REG) (const_vector: V8SI [(const_int:SI 0)...]))),
> something like attached patch (modeled as V4SI version, so I was able to 
> "test" it on SSE2):
>
> Also, by exactly defining true operation of vzeroall, IMO it does not need 
> to be declared as volatile.
>
> Using attached patch (please note that these patterns handle REX SSE 
> registers as well), I was able to generate vzeroall by following test:
>
> --cut here--
> typedef float __m128f __attribute__ ((__vector_size__ (16)));
>
> __m128f _mm_vsetzero (int i)
> {
>  register __m128f x __asm__ ("xmm5");
>
>  __builtin_ia32_vzeroall ();
>
>  //  x = (__m128f) { 0.0f, 0.0f, 0.0f, 0.0f };
>  return __builtin_ia32_cvtsi2ss (x, i);
> }
> --cut here--
>
> gcc -O2:
>
> .LFB2:
>        vzeroall
>        movaps  %xmm5, %xmm0
>        cvtsi2ss        %edi, %xmm0
>        ret
>
> when the third line was uncommented, gcc figured out that the dependency 
> chain between vzeroall and cvtsi2ss was broken and since all other xmm 
> registers were unused, it correctly removed vzeroall.
>
> Uros.

I tried it. It doesn't work as expected:

bash-3.2$ cat bad.c
#include <gmmintrin.h>

extern __m256i bar (__m256i);

extern __m128i bar (__m128i);

__m128i
foo2 (__m128i x, __m256i y)
{
  _mm256_zeroall ();
  return bar (x);
}
bash-3.2$ /export/build/gnu/gcc-avx/build-x86_64-linux/gcc/xgcc
-B/export/build/gnu/gcc-avx/build-x86_64-linux/gcc/ -mavx -Wall  -S
bad.c -O2
bash-3.2$ cat bad.s
        .file   "bad.c"
        .text
        .p2align 4,,15
.globl foo2
        .type   foo2, @function
foo2:
.LFB686:
        jmp     bar
.LFE686:
        .size   foo2, .-foo2

We don't save and restore x. Since we generate

(set (reg/v:V4SI 60 [ x ]) (reg:V4SI 21 xmm0 [ x ]))
(parallel [
	(set (reg:V8SI 21 xmm0)
...
set (reg:V4SI 21 xmm0) (reg/v:V4SI 60 [ x ]))

We optimize out vzeroall. Do you have any suggestions?

Thanks.


H.J.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]