This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
Re: [PATCH][RFC] Make the function vectorizer capable of doing type transformations
- From: Richard Guenther <rguenther at suse dot de>
- To: Dorit Nuzman <DORIT at il dot ibm dot com>
- Cc: gcc-patches at gcc dot gnu dot org
- Date: Wed, 31 Jan 2007 11:52:58 +0100 (CET)
- Subject: Re: [PATCH][RFC] Make the function vectorizer capable of doing type transformations
- References: <OFCE8BECCA.B4132B38-ONC2257274.0036A65B-C2257274.0039A111@il.ibm.com>
On Wed, 31 Jan 2007, Dorit Nuzman wrote:
> >
> > This enhances the function vectorizer to handle functions with
> > differing result and argument types.
> >
> > RFC because the code needs a cleanup.
> >
> > This enables us to vectorize
> >
> > int a[256];
> > float b[256];
> > long lrintf (float);
> > void foo(void)
> > {
> >   int i;
> >   for (i=0; i<256; ++i)
> >     {
> >       a[i] = lrintf (b[i]);
> >     }
> > }
> >
> > on 32bit SSE2.
> >
>
> looks good to me. and it also addresses the problem I mentioned here -
> http://gcc.gnu.org/ml/gcc-patches/2007-01/msg02088.html:
>
> "By the way - this (supporting the case that ncopies>1) is something that
> is also missing in vectorizable_call ... I'm travelling next week, but
> could provide a patch to add the required support in the following week."
Yes, indeed.
> So this patch takes care of that. You may want to add a testcase that checks
> that. E.g. something like:
>
> int a[256];
> float b[256];
> char char_arr[256];
> long lrintf (float);
> void foo(void)
> {
>   int i;
>   for (i=0; i<256; ++i)
>     {
>       a[i] = lrintf (b[i]);
>       char_arr[i] = 0;
>     }
> }
Sure, I have a bunch of testcases that I did not include in the patch yet.
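As a rough illustration of why the extra char store in the testcase forces ncopies > 1: the vectorization factor follows the smallest element type in the loop, and each statement is then emitted VF / nunits times. The sketch below assumes 16-byte SSE2 vectors; VECTOR_BYTES and the helper names are made up for illustration and are not vectorizer code.

```c
/* Hypothetical helpers, not GCC internals: model how many vector
   statements one scalar statement expands to in a vectorized loop.  */

#define VECTOR_BYTES 16   /* assumption: 128-bit SSE2 vectors */

/* Number of elements of the given size that fit in one vector.  */
static int
nunits (int elem_bytes)
{
  return VECTOR_BYTES / elem_bytes;
}

/* The vectorization factor is set by the smallest element in the loop.  */
static int
vect_factor (int smallest_elem_bytes)
{
  return nunits (smallest_elem_bytes);
}

/* Copies of a statement needed to cover one vector iteration.  */
static int
ncopies (int elem_bytes, int smallest_elem_bytes)
{
  return vect_factor (smallest_elem_bytes) / nunits (elem_bytes);
}
```

With only int and float in the loop, ncopies (4, 4) is 1. Adding the char store gives ncopies (4, 1) = 4, so the lrintf call has to be emitted four times per vector iteration.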
> A few small questions/comments:
>
> > +
> > + nargs++;
> > + if (nargs >= 2)
> > + return false;
> > + }
>
> Any inherent problem behind this check, or are you just restricting (FORNOW?)
> to the particular function calls you expect to see? (Which is fine, just
> wondering.)
It's laziness - but also I don't expect vectorizable calls with more
than 2 parameters (I'll add a comment clarifying that).
Note that all the analysis stuff needs to go to vectorizable_function ()
(or rather, I am going to merge vectorizable_function and
vectorizable_call).
> > + case BUILT_IN_LRINT:
> > + if (out_mode == SImode && out_n == 2
> > + && in_mode == DFmode && in_n == 2)
> > + return ix86_builtins[IX86_BUILTIN_CVTPD2PI];
> > + return NULL_TREE;
>
> (I assume you'll have a testcase for each of those?)
Only the BUILT_IN_LRINTF case on i?86 will ever trigger. With, for
example, cvtpd2pi, which converts to a sse1 register, the vectorizer
does not consider using the V2SI sse1 vector type for the result, so
we have the same problem as with cvtpd2dq. I'll leave the cases
that don't trigger out for now - they were in for testing, but I
didn't manage to get the vectorizer to use V2SI ;)
> > + /* Only handle the case of vectors with the same number of elements.
> > + FIXME: We need a way to handle for example the SSE2 cvtpd2dq
> > + instruction which converts V2DFmode to V4SImode but only
> > + using the lower half of the V4SImode result. */
> > + if (TYPE_VECTOR_SUBPARTS (vectype_in) != TYPE_VECTOR_SUBPARTS (vectype_out))
>
> Yes. This requires functionality similar to the one that vectorizes
> v2di->v4si in vectorizable_demotion, except we need a different idiom
> instead of the vec_pack/unpack to "convert-and-unpack" 4 doubles (organized
> in 2 regs) into 4 ints (some target hook maybe?).
We also need to somehow tell the vectorizer that the function call
we want to vectorize needs this. Or it might be able to tell by itself
seeing a V2DF -> V4SI conversion - I'll look into vec_pack/unpack to
see if I can teach the function vectorizer to do it magically.
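To make the missing idiom concrete, here is a minimal sketch written with SSE2 intrinsics rather than anything the vectorizer emits today. It converts four doubles held in two V2DF registers with cvtpd2dq and merges the two low halves into one V4SI result. The function name convert4_pd_to_si and the scalar fallback for targets without SSE2 are assumptions for illustration only.

```c
#if defined(__SSE2__)
#include <emmintrin.h>   /* SSE2 intrinsics */
#else
#include <math.h>        /* lrint, used only by the scalar fallback */
#endif

/* Hypothetical helper: convert in[0..3] to ints in out[0..3], rounding
   according to the current rounding mode.  */
void
convert4_pd_to_si (const double *in, int *out)
{
#if defined(__SSE2__)
  __m128d lo = _mm_loadu_pd (in);        /* in[0], in[1] */
  __m128d hi = _mm_loadu_pd (in + 2);    /* in[2], in[3] */

  /* cvtpd2dq: two ints land in the low 64 bits, the upper half is zero.  */
  __m128i lo_i = _mm_cvtpd_epi32 (lo);
  __m128i hi_i = _mm_cvtpd_epi32 (hi);

  /* Merge the two low halves into one full V4SI vector.  */
  __m128i packed = _mm_unpacklo_epi64 (lo_i, hi_i);
  _mm_storeu_si128 ((__m128i *) out, packed);
#else
  /* Assumed fallback so the sketch also builds without SSE2.  */
  int i;
  for (i = 0; i < 4; ++i)
    out[i] = (int) lrint (in[i]);
#endif
}
```

Each cvtpd2dq fills only the lower half of its V4SI result, which is why a second conversion and a merge step are needed before the vector matches the four-element layout the loop stores to a[].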
One other problem is that on x86_64 long is 64 bits, so the prototypes
for lrint would require a V2DF -> V2DI conversion, which is also not
available (there's only the scalar variant DF -> DI). But I guess
that's better handled by recognizing earlier that we have
(int)lrint(x) and converting this to an internal si_lrint(x) call.
> Could you please also add a testcase for this (with xfail?)
Yes, I'll do that.
Thanks,
Richard.