PPC64 libmvec implementation of sincos

Richard Biener richard.guenther@gmail.com
Fri Dec 6 17:43:00 GMT 2019


On December 6, 2019 5:50:25 PM GMT+01:00, GT <tnggil@protonmail.com> wrote:
>‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
>On Friday, December 6, 2019 6:38 AM, Richard Biener
><richard.guenther@gmail.com> wrote:
>
>> On Fri, Dec 6, 2019 at 12:15 PM Jakub Jelinek <jakub@redhat.com> wrote:
>>
>> > On Fri, Dec 06, 2019 at 11:48:03AM +0100, Richard Biener wrote:
>> >
>> > > So I used
>> > > void sincos(double x, double *sin, double *cos);
>> > > _Complex double __attribute__((simd("notinbranch")))
>> > > __builtin_cexpi (double);
>> >
>> > While Intel-ABI-Vector-Function-2015-v0.9.8.pdf talks about complex numbers,
>> > the reason we punt:
>> > unsupported return type ‘complex double’ for simd
>> > etc. is that we really don't support VECTOR_TYPE with COMPLEX_TYPE element
>> > type, I guess the vectorizer doesn't do anything with that either unless
>> > some earlier optimization was able to scalarize the complex halves.
>> > In theory we could represent the vector counterparts of complex types
>> > as just vectors of double width with element type of COMPLEX_TYPE element
>> > type, have a look at what exactly ICC does to find out if the vector
>> > ordering is real0 complex0 real1 complex1 ... or
>> > real0 real1 real2 ... complex0 complex1 complex2 ...
>> > and tweak everything that needs to cope.
>>
>> I hope real0 complex0, ...
>>
>> Anyway, the first step is to support vectorizing code where parts of it are
>> already vectors:
>>
>> typedef double v2df __attribute__((vector_size(16)));
>> #define N 1024
>> v2df a[N];
>> double b[N];
>> double c[N];
>> void foo()
>> {
>>   for (int i = 0; i < N; ++i)
>>     {
>>       v2df tem = a[i];
>>       b[i] = tem[0];
>>       c[i] = tem[1];
>>     }
>> }
>>
>> that can be "re-vectorized" for AVX for example. If you substitute
>> _Complex double for the vector type we only handle it during
>> vectorization because forwprop combines the load and the
>> __real/imag which helps.
>>
>
>Are we certain the change we want is to support _Complex double so that
>cexpi is auto-vectorized?
>Looking at the resulting executable of the code with sincos in the loop,
>the only function called is sincos. Not __builtin_cexpi or any variant of
>cexpi. File gcc/builtins.c expands calls to __builtin_cexpi to sincos!
>What is gained by the compiler going through the transformations
>sincos -> __builtin_cexpi -> sincos?

Yes, we want to support vectorizing cexpi because that is what the
compiler will lower sincos to. The sincos API is painful to deal with
due to the data dependences it introduces. Now, the vectorizer can of
course emit calls to a vectorized sincos; it just needs to be able to
deal with cexpi in its input IL.

Richard. 

>Bert.


