The following should be vectorized with proper -mveclib: float xf[1024]; float sf[1024]; float cf[1024]; void foo (void) { int i; for (i = 0; i < 1024; ++i) { sf[i] = __builtin_sinf (xf[i]); cf[i] = __builtin_cosf (xf[i]); } } double xd[1024]; double sd[1024]; double cd[1024]; void bar (void) { int i; for (i = 0; i < 1024; ++i) { sd[i] = __builtin_sin (xd[i]); cd[i] = __builtin_cos (xd[i]); } }
The middle-end presents the vectorizer with <bb 3>: # i_13 = PHI <i_7(4), 0(2)> # ivtmp.26_8 = PHI <ivtmp.26_16(4), 1024(2)> D.1623_3 = xd[i_13]; sincostmp.21_1 = __builtin_cexpi (D.1623_3); D.1624_4 = IMAGPART_EXPR <sincostmp.21_1>; sd[i_13] = D.1624_4; D.1625_6 = REALPART_EXPR <sincostmp.21_1>; cd[i_13] = D.1625_6; i_7 = i_13 + 1; ivtmp.26_16 = ivtmp.26_8 - 1; if (ivtmp.26_16 != 0) goto <bb 4>; else goto <bb 5>; which has first of all complex types (they should be recognized as V2DF with vectorization factor 1, thus SLP-able). For the float case <bb 3>: # i_13 = PHI <i_7(4), 0(2)> # ivtmp.6_8 = PHI <ivtmp.6_16(4), 1024(2)> D.1610_3 = xf[i_13]; sincostmp.1_1 = __builtin_cexpif (D.1610_3); D.1611_4 = IMAGPART_EXPR <sincostmp.1_1>; sf[i_13] = D.1611_4; D.1612_6 = REALPART_EXPR <sincostmp.1_1>; cf[i_13] = D.1612_6; i_7 = i_13 + 1; ivtmp.6_16 = ivtmp.6_8 - 1; if (ivtmp.6_16 != 0) goto <bb 4>; else goto <bb 5>; they should be V2SF, thus use V4SF and vectorization factor 2. Still use SLP probably.
pr40770.c:20: note: ==> examining statement: sincostmp.21_1 = __builtin_cexpi (D.1625_3); pr40770.c:20: note: get vectype for scalar type: complex double pr40770.c:20: note: not vectorized: unsupported data-type complex double make_vector_type returns NULL for this type.
Subject: Re: Vectorization of complex types, vectorization of sincos missing On Thu, 16 Jul 2009, irar at il dot ibm dot com wrote: > ------- Comment #2 from irar at il dot ibm dot com 2009-07-16 12:29 ------- > pr40770.c:20: note: ==> examining statement: sincostmp.21_1 = __builtin_cexpi > (D.1625_3); > pr40770.c:20: note: get vectype for scalar type: complex double > pr40770.c:20: note: not vectorized: unsupported data-type complex double > > make_vector_type returns NULL for this type. Yes - there is no vector type for complex double. But the vectorizer could query for a vector type for the complex component type (double) and divide the vector element count by 2 (for complex) to get the vectorization factor which would be 1 here. Should SLP the be possible for that loop? Richard.
(In reply to comment #3) > Yes - there is no vector type for complex double. But the vectorizer > could query for a vector type for the complex component type (double) > and divide the vector element count by 2 (for complex) to get the > vectorization factor which would be 1 here. I do not know much about this, but wouldn't that fail if one wants to vectorize true complex functions such as ccosf (assuming that they are in principle vectorizable)?
Subject: Re: Vectorization of complex types, vectorization of sincos missing On Thu, 16 Jul 2009, burnus at gcc dot gnu dot org wrote: > ------- Comment #4 from burnus at gcc dot gnu dot org 2009-07-16 13:32 ------- > (In reply to comment #3) > > Yes - there is no vector type for complex double. But the vectorizer > > could query for a vector type for the complex component type (double) > > and divide the vector element count by 2 (for complex) to get the > > vectorization factor which would be 1 here. > > I do not know much about this, but wouldn't that fail if one wants to vectorize > true complex functions such as ccosf (assuming that they are in principle > vectorizable)? Well, for ccosf we would have a vectorization factor of 2 left for V4SF. Of course this assumes that we present the vectorizer with a vectorized ccosf with the signature v4sf (*)(v4sf). Or we would need to introduce complex vector modes - which I'd rather avoid. Richard.
(In reply to comment #3) > > make_vector_type returns NULL for this type. > Yes - there is no vector type for complex double. But the vectorizer > could query for a vector type for the complex component type (double) > and divide the vector element count by 2 (for complex) to get the > vectorization factor which would be 1 here. I see. > Should SLP the be possible > for that loop? Not with the current implementation - SLP needs strided stores to start. Here the stores are not even adjacent. I think, it would be better to vectorize this loop with regular loop-based vectorization to avoid permutations. I'll take a better look on Sunday. Ira > Richard.
AFAIU, querying for the component type of complex type is not difficult to implement. I think, that loop-based vectorization is preferable here, so we should stay with vectorization factor of 2 for doubles. The next problem is to vectorize D.1611_4 = IMAGPART_EXPR <sincostmp.1_1>; and D.1612_6 = REALPART_EXPR <sincostmp.1_1>; Currently, we support only loads and stores with IMAGPART/REALPART_EXPR, vectorizing them as strided accesses, with extract odd and even operations for loads. So, we will have to support interleaving of non-memory variables. Does __builtin_cexpi have a vector implementation? If so, does it return two vectors? If not, I guess, we need something like: sincostmp.1 = __builtin_cexpi (xd[i]); sincostmp.2 = __builtin_cexpi (xd[i+1]); v1 = VEC_EXTRACT_EVEN (sincostmp.1, sincostmp.2); v2 = VEC_EXTRACT_ODD (sincostmp.1, sincostmp.2); sf[i:i+1] = v1; cf[i:i+1] = v2; i = i + 2; Or we can use the two vectors from vectorized __builtin_cexpi as parameters of extract operations. Does that make sense? Ira
(In reply to comment #7) > Does __builtin_cexpi have a vector implementation? If so, does it return two > vectors? No, only vectorized sincos is implemented (see also links at PR40766, Comment #11).
Subject: Re: Vectorization of complex types, vectorization of sincos missing On Mon, 20 Jul 2009, irar at il dot ibm dot com wrote: > > > ------- Comment #7 from irar at il dot ibm dot com 2009-07-20 11:18 ------- > AFAIU, querying for the component type of complex type is not difficult to > implement. > I think, that loop-based vectorization is preferable here, so we should stay > with vectorization factor of 2 for doubles. > > The next problem is to vectorize > D.1611_4 = IMAGPART_EXPR <sincostmp.1_1>; > and > D.1612_6 = REALPART_EXPR <sincostmp.1_1>; > > Currently, we support only loads and stores with IMAGPART/REALPART_EXPR, > vectorizing them as strided accesses, with extract odd and even operations for > loads. So, we will have to support interleaving of non-memory variables. > > Does __builtin_cexpi have a vector implementation? If so, does it return two > vectors? No, currently cexpi doesn't have a vectorized version. We could add an internal builtin for that that takes a vector as argument and returns a vector with complex components. And lower this during expansion to a suitable available form (eventually just two calls). > If not, I guess, we need something like: > > sincostmp.1 = __builtin_cexpi (xd[i]); > sincostmp.2 = __builtin_cexpi (xd[i+1]); > v1 = VEC_EXTRACT_EVEN (sincostmp.1, sincostmp.2); > v2 = VEC_EXTRACT_ODD (sincostmp.1, sincostmp.2); > sf[i:i+1] = v1; > cf[i:i+1] = v2; > i = i + 2; Yes, that was my initial idea. > Or we can use the two vectors from vectorized __builtin_cexpi as parameters of > extract operations. > Does that make sense? Yes, I think so. With a vectorized builtin we'd have v0 = xd[i:i+1]; sincostmp.1 = __builtin_vect_cexpi (v0); v1 = VEC_EXTRACT_EVEN (sincostmp.1[0], sincostmp.1[1]); v2 = VEC_EXTRACT_ODD (sincostmp.1[0], sincostmp.1[1]); sf[i:i+1] = v1; cf[i:i+1] = v2; i = i + 2; where sincostmp.1[0] would select the lower half of a V4DF and sincostmp.1[1] the upper half of a V4DF. But that's probably more difficult as we'd have both V2DF and V4DF in the IL. Richard.
resurrecting this: just checked with gcc version 4.7.0 20110910 -mveclibabi=svml -LwhereverYouhaveIntelSoftware/linux/x86_64/Compiler/11.1/072/lib/intel64/ -lsvml -lirc and sincos does not vectorize nor emits svml symbols (sin, cos independently are ok)
*** Bug 50534 has been marked as a duplicate of this bug. ***
*** Bug 97149 has been marked as a duplicate of this bug. ***
(In reply to Richard Biener from comment #12) > *** Bug 97149 has been marked as a duplicate of this bug. *** Will you look at this?
On Tue, 22 Sep 2020, kangshan0910 at hotmail dot com wrote: > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=40770 > > --- Comment #13 from ? ? <kangshan0910 at hotmail dot com> --- > (In reply to Richard Biener from comment #12) > > *** Bug 97149 has been marked as a duplicate of this bug. *** > > Will you look at this? glibc needs to think of an ABI for OMP SIMD for this, then we can autovectorize. Alternatively we could of course use vector sin and vector cos separately which might be better than not vectorizing it at all?
OpenMP has a way to express this, #pragma omp declare simd notinbranch linear(y,z) double sincos (double x, double *y, double *z); As can be seen on -O2 -fopenmp-simd: #pragma omp declare simd notinbranch linear(y, z) double foo (double x, double *y, double *z); double a[1024], b[1024], c[1024]; void bar (void) { #pragma omp simd for (int i = 0; i < 1024; i++) foo (a[i], &b[i], &c[i]); } it works fine with that. But if #pragma omp simd is commented out, it is not, because the compiler has no assurance on what the function call behavior will be e.g. for data reference analysis (it could e.g. modify the global vars). Now, for sincos proper, the compiler knows what the function does if not -fno-builtin-sincos. Unfortunately, I think glibc currently defines sincos in math-vector.h the unuseful way with just simd ("notinbranch") attribute, which effectively means that the caller is supposed to pass vectors of pointers (well integers with those sizes) and so needs scatter under the hood. So the best thing would be to change glibc first.
(In reply to Jakub Jelinek from comment #15) > Unfortunately, I think glibc currently defines sincos in math-vector.h the > unuseful way with just simd ("notinbranch") attribute, which effectively > means > that the caller is supposed to pass vectors of pointers (well integers with > those sizes) and so needs scatter under the hood. > So the best thing would be to change glibc first. What exactly do we need to change? Factor out the declaration of sincos, so that we can apply “linear(__p1, __p2)” to it? Should glibc provide a cexpi function? The complex result would avoid having to go through memory at all.
IIRC cepxi has the complication that we have to define the mangling for vector _Complex double. But cexpi is what the vectorizer currently sees, of course (we can of course vectorize it as sincos with arguments in memory again). So yeah, glibc is lacking here. Can one actually define multiple OMP SIMD variants for a single function?
(In reply to Florian Weimer from comment #16) > (In reply to Jakub Jelinek from comment #15) > > Unfortunately, I think glibc currently defines sincos in math-vector.h the > > unuseful way with just simd ("notinbranch") attribute, which effectively > > means > > that the caller is supposed to pass vectors of pointers (well integers with > > those sizes) and so needs scatter under the hood. > > So the best thing would be to change glibc first. > > What exactly do we need to change? Factor out the declaration of sincos, so > that we can apply “linear(__p1, __p2)” to it? It could provide #pragma omp declare simd linear(__p1, __p2) double sincos (double __p0, double *__p1, double *__p2); prototypes and assembly definition for that function. Note, right now the simd attribute syntax doesn't allow specifying linear arguments, there have been talks about extending the syntax, but it hasn't been finished yet. Looking at sysdeps/x86_64/fpu/*sincos*.S I actually see the correct symbols there already, just they aren't exported from the library (the symbols with *l4l4*, *l8l8* etc. in there). So I guess we first need to extend the simd attribute syntax, then change glibc to use it and export those, and finally benefit. > Should glibc provide a cexpi function? The complex result would avoid having > to go through memory at all. Not sure if that would help, because vectorization of complex types doesn't really work well in GCC unless they are lowered to scalar ops before vectorization.
(In reply to Richard Biener from comment #17) > Can one actually define multiple OMP SIMD variants for a single function? Yes.
Created attachment 49255 [details] test source file
(In reply to Jakub Jelinek from comment #15) > OpenMP has a way to express this, > #pragma omp declare simd notinbranch linear(y,z) > double sincos (double x, double *y, double *z); > As can be seen on -O2 -fopenmp-simd: > #pragma omp declare simd notinbranch linear(y, z) > double foo (double x, double *y, double *z); > > double a[1024], b[1024], c[1024]; > > void > bar (void) > { > #pragma omp simd > for (int i = 0; i < 1024; i++) > foo (a[i], &b[i], &c[i]); > } > it works fine with that. But if #pragma omp simd is commented out, it is > not, because the compiler has no assurance on what the function call > behavior will be e.g. for data reference analysis (it could e.g. modify the > global vars). > Now, for sincos proper, the compiler knows what the function does if not > -fno-builtin-sincos. > > Unfortunately, I think glibc currently defines sincos in math-vector.h the > unuseful way with just simd ("notinbranch") attribute, which effectively > means > that the caller is supposed to pass vectors of pointers (well integers with > those sizes) and so needs scatter under the hood. > So the best thing would be to change glibc first. Thanks for your reply.(In reply to Jakub Jelinek from comment #15) > OpenMP has a way to express this, > #pragma omp declare simd notinbranch linear(y,z) > double sincos (double x, double *y, double *z); > As can be seen on -O2 -fopenmp-simd: > #pragma omp declare simd notinbranch linear(y, z) > double foo (double x, double *y, double *z); > > double a[1024], b[1024], c[1024]; > > void > bar (void) > { > #pragma omp simd > for (int i = 0; i < 1024; i++) > foo (a[i], &b[i], &c[i]); > } > it works fine with that. But if #pragma omp simd is commented out, it is > not, because the compiler has no assurance on what the function call > behavior will be e.g. for data reference analysis (it could e.g. modify the > global vars). > Now, for sincos proper, the compiler knows what the function does if not > -fno-builtin-sincos. > > Unfortunately, I think glibc currently defines sincos in math-vector.h the > unuseful way with just simd ("notinbranch") attribute, which effectively > means > that the caller is supposed to pass vectors of pointers (well integers with > those sizes) and so needs scatter under the hood. > So the best thing would be to change glibc first. Thanks for your reply. I tried your method. But it still doesn't work for sincos. #pragma omp declare simd notinbranch linear(y, z) void sincos(double x, double *y, double *z); for (unsigned int i = 0; i < loopCount; i++) { #pragma omp simd for (unsigned int j = i * dim; j < (i + 1) * dim; j++) { sincos(input_array[j], &result_array[j], &result_array1[j]); } } $ gcc -m64 -g -O2 -fopenmp-simd -fPIC -Wall test_sincos.c -o test_sincos -lm -march=skylake-avx512 -ffast-math -ftree-loop-vectorize $ ldd test_sincos linux-vdso.so.1 (0x00007ffd2393f000) libm.so.6 => /lib64/libm.so.6 (0x00007f574f49d000) libc.so.6 => /lib64/libc.so.6 (0x00007f574f2a2000) /lib64/ld-linux-x86-64.so.2 (0x00007f574f601000) My gcc version is "10.1.1 20200507". And I have uploaded my test source file.
(In reply to 康 珊 from comment #21) > $ gcc -m64 -g -O2 -fopenmp-simd -fPIC -Wall test_sincos.c -o test_sincos -lm > -march=skylake-avx512 -ffast-math -ftree-loop-vectorize Try adding -fno-builtin-sincos here > $ ldd test_sincos > linux-vdso.so.1 (0x00007ffd2393f000) > libm.so.6 => /lib64/libm.so.6 (0x00007f574f49d000) > libc.so.6 => /lib64/libc.so.6 (0x00007f574f2a2000) > /lib64/ld-linux-x86-64.so.2 (0x00007f574f601000) > My gcc version is "10.1.1 20200507". > And I have uploaded my test source file.
It seems that gcc will use "_ZGVeN8vl8l8_sincos" which doesn't exit in libmvec. /usr/bin/ld: /tmp/ccTYyEZM.o: in function `main': /home/pnp/ks/test_sincos.c:38: undefined reference to `_ZGVeN8vl8l8_sincos' /usr/bin/ld: /home/pnp/ks/test_sincos.c:38: undefined reference to `_ZGVeN8vl8l8_sincos' collect2: error: ld returned 1 exit status I'm using clear linux the glibc of which is 2.31 and I also tried Ubuntu 19.10 the glibc of which is 2.30.
On September 22, 2020 1:14:48 PM GMT+02:00, kangshan0910 at hotmail dot com <gcc-bugzilla@gcc.gnu.org> wrote: >https://gcc.gnu.org/bugzilla/show_bug.cgi?id=40770 > >--- Comment #23 from 康 珊 <kangshan0910 at hotmail dot com> --- >It seems that gcc will use "_ZGVeN8vl8l8_sincos" which doesn't exit in >libmvec. >/usr/bin/ld: /tmp/ccTYyEZM.o: in function `main': >/home/pnp/ks/test_sincos.c:38: undefined reference to >`_ZGVeN8vl8l8_sincos' >/usr/bin/ld: /home/pnp/ks/test_sincos.c:38: undefined reference to >`_ZGVeN8vl8l8_sincos' >collect2: error: ld returned 1 exit status That's expected and the thing that needs to be fixed. >I'm using clear linux the glibc of which is 2.31 and I also tried >Ubuntu 19.10 >the glibc of which is 2.30.
(In reply to rguenther@suse.de from comment #24) > On September 22, 2020 1:14:48 PM GMT+02:00, kangshan0910 at hotmail dot com > <gcc-bugzilla@gcc.gnu.org> wrote: > >https://gcc.gnu.org/bugzilla/show_bug.cgi?id=40770 > > > >--- Comment #23 from 康 珊 <kangshan0910 at hotmail dot com> --- > >It seems that gcc will use "_ZGVeN8vl8l8_sincos" which doesn't exit in > >libmvec. > >/usr/bin/ld: /tmp/ccTYyEZM.o: in function `main': > >/home/pnp/ks/test_sincos.c:38: undefined reference to > >`_ZGVeN8vl8l8_sincos' > >/usr/bin/ld: /home/pnp/ks/test_sincos.c:38: undefined reference to > >`_ZGVeN8vl8l8_sincos' > >collect2: error: ld returned 1 exit status > > That's expected and the thing that needs to be fixed. > > >I'm using clear linux the glibc of which is 2.31 and I also tried > >Ubuntu 19.10 > >the glibc of which is 2.30. But it has _ZGVeN8vvv_sincos(__m512d, __m512i, __m512i), why gcc not use this call?