PPC64 libmvec implementation of sincos

GT tnggil@protonmail.com
Fri Dec 6 16:50:00 GMT 2019


‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On Friday, December 6, 2019 6:38 AM, Richard Biener <richard.guenther@gmail.com> wrote:

> On Fri, Dec 6, 2019 at 12:15 PM Jakub Jelinek <jakub@redhat.com> wrote:
>
> > On Fri, Dec 06, 2019 at 11:48:03AM +0100, Richard Biener wrote:
> >
> > > So I used
> > > void sincos(double x, double *sin, double *cos);
> > > _Complex double __attribute__((simd("notinbranch")))
> > > __builtin_cexpi (double);
> >
> > While Intel-ABI-Vector-Function-2015-v0.9.8.pdf talks about complex numbers,
> > the reason we punt:
> > unsupported return type ‘complex double’ for simd
> > etc. is that we really don't support VECTOR_TYPE with COMPLEX_TYPE element
> > type, I guess the vectorizer doesn't do anything with that either unless
> > some earlier optimization was able to scalarize the complex halves.
> > In theory we could represent the vector counterparts of complex types
> > as just vectors of double width with element type of COMPLEX_TYPE element
> > type, have a look at what exactly ICC does to find out if the vector
> > ordering is real0 complex0 real1 complex1 ... or
> > real0 real1 real2 ... complex0 complex1 complex2 ...
> > and tweak everything that needs to cope.
>
> I hope real0 complex0, ...
>
> Anyway, the first step is to support vectorizing code where parts of it are
> already vectors:
>
> typedef double v2df __attribute__((vector_size(16)));
> #define N 1024
> v2df a[N];
> double b[N];
> double c[N];
> void foo()
> {
>   for (int i = 0; i < N; ++i)
>     {
>       v2df tem = a[i];
>       b[i] = tem[0];
>       c[i] = tem[1];
>     }
> }
>
> that can be "re-vectorized" for AVX for example. If you substitute
> _Complex double for the vector type we only handle it during
> vectorization because forwprop combines the load and the
> __real/imag which helps.
>
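
To make the substitution Richard describes concrete, here is a minimal sketch of his loop with v2df replaced by _Complex double, assuming GCC (or a compiler accepting GCC's __real__/__imag__ extension). In C a _Complex double object has the same representation as double[2] with the real part first, so the array a is stored interleaved (real0 imag0 real1 imag1 ...), while b and c hold the split layout (real0 real1 ... and imag0 imag1 ...). Which of the two orderings ICC's vector ABI uses for vectors of complex values is the open question above; this sketch does not settle it.

```c
#define N 1024

/* Interleaved storage: re0 im0 re1 im1 ... (same representation as double[2] per element). */
_Complex double a[N];
double b[N]; /* split layout, real parts:      re0 re1 re2 ... */
double c[N]; /* split layout, imaginary parts: im0 im1 im2 ... */

/* Richard's loop with the vector type replaced by _Complex double:
   it de-interleaves a[] into the two split arrays b[] and c[]. */
void foo(void)
{
  for (int i = 0; i < N; ++i)
    {
      _Complex double tem = a[i];
      b[i] = __real__ tem; /* GCC extension; creal (tem) is the standard spelling */
      c[i] = __imag__ tem;
    }
}
```

Whether the vectorizer handles this version depends, as Richard notes, on forwprop combining the load with the __real/imag accesses.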

Are we certain that the change we want is to support _Complex double so that cexpi is auto-vectorized?
Looking at the executable built from the code with sincos in the loop, the only function called
is sincos, not __builtin_cexpi or any variant of cexpi. In fact, gcc/builtins.c expands calls to
__builtin_cexpi into calls to sincos! What does the compiler gain by going through the transformations
sincos -> __builtin_cexpi -> sincos?

Bert.


