This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: [patch][vectorizer, SPU, PPC] Support load permutation in loop-aware SLP


Ira Rosen/Haifa/IBM wrote on 21/08/2008 14:21:41:

> Dorit Nuzman/Haifa/IBM wrote on 08/08/2008 18:06:46:
>
> > I have a problem with the fact that this specific permutation is so
> > hard-coded into the analysis. It's ok to support only one
> > permutation as a start, but the analysis itself should be general.
> > Hopefully this could be rewritten to identify more general patterns
> > during the analysis, represent the identified permutation somehow
> > (e.g. [3,2,1,0]), and then decide if we can proceed to vectorize
> > it or not.
>
> I changed the analysis part, so now during the SLP tree construction
> we only store the permutation, and check if the permutation is
> supported afterwards. I am attaching the updated (not fully tested)
> analysis part of the patch.
>

great, thanks! (when you check in this patch, maybe add a couple of testcases
for permutations that are not yet supported).
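
To make the "store the permutation first, check support afterwards" split concrete, here is a minimal sketch in C. It is not taken from the patch: the function names and the choice of supported permutations (identity and full reversal such as [3,2,1,0]) are assumptions made purely for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical limit on the group size, for this sketch only.  */
enum { SKETCH_MAX_GROUP = 64 };

/* Return true if PERM[0..N-1] contains each index 0..N-1 exactly once.
   Illustration only, not the GCC implementation.  */
static bool
sketch_is_valid_permutation (const unsigned *perm, size_t n)
{
  bool seen[SKETCH_MAX_GROUP] = { false };

  if (n > SKETCH_MAX_GROUP)
    return false;
  for (size_t i = 0; i < n; i++)
    {
      if (perm[i] >= n || seen[perm[i]])
        return false;
      seen[perm[i]] = true;
    }
  return true;
}

/* Decide, after the permutation has been recorded, whether it can be
   handled.  ASSUMPTION: only the identity and the full reversal are
   treated as supported here; the real set is up to the patch.  */
static bool
sketch_permutation_supported_p (const unsigned *perm, size_t n)
{
  bool identity = true, reversal = true;

  if (!sketch_is_valid_permutation (perm, n))
    return false;
  for (size_t i = 0; i < n; i++)
    {
      if (perm[i] != i)
        identity = false;
      if (perm[i] != n - 1 - i)
        reversal = false;
    }
  return identity || reversal;
}
```

With this split, a rotation such as [1,2,0] is still recorded as a valid permutation but rejected by the support check, which is the kind of not-yet-supported case a testcase could cover.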

(small question/request: can you please document the difference between
vect_supported_slp_permutation_p and vect_supported_load_permutation_p?)

...
> > I also have a problem with the transformation: it assumes a very
> > specific form of permute at the gimple level - a permute that takes
> > two vectors as input and a byte mask. I don't think this is a
> > general enough representation (I don't think that the SSE shuffles
> > take a byte mask for example?). We need to think of a more general
> > way to represent a permute at this level, and maybe have a target
> > specific builtin expand it using byte mask when appropriate.
>
> AFAIK, SSE5 permute does take two vectors as input and a byte mask.
> But the mask is not similar to altivec/spu mask.

Intel's SSE/AVX shuffle/permute insns (e.g. pshuf*, vpermil2*) have 8-bit
control fields per element (rather than per byte), and some of these insns
shuffle/permute elements only from a single input vector (rather than
two) ...

> Maybe I can create
> an element mask at the tree level and leave the correct mask
> creation to the target (builtin)?
>

yes, I think the mask creation should be done on a target specific basis;
the vectorizer could create a control mask given as a vector of indices per
element.
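
As a rough sketch of what the target side could do with such a mask (illustration only, with a hypothetical function name, not an existing GCC hook): a two-operand element mask can be expanded into a byte mask of the kind a byte-granular permute expects, assuming byte numbering within the concatenated inputs follows element numbering.

```c
#include <stddef.h>
#include <stdint.h>

/* Expand a per-element control mask into a per-byte control mask.

   ELT_MASK[i] selects an element from the concatenation of the two
   input vectors, i.e. a value in 0 .. 2*NELTS-1.  ELT_SIZE is the
   element size in bytes.  BYTE_MASK must have room for
   NELTS * ELT_SIZE entries.

   ASSUMPTION: bytes are numbered in the same order as elements, so
   element E occupies bytes E*ELT_SIZE .. E*ELT_SIZE + ELT_SIZE - 1.
   A target with a different control encoding would need its own
   expansion.  */
static void
sketch_expand_element_mask (const uint8_t *elt_mask, size_t nelts,
                            size_t elt_size, uint8_t *byte_mask)
{
  for (size_t i = 0; i < nelts; i++)
    for (size_t b = 0; b < elt_size; b++)
      byte_mask[i * elt_size + b] = (uint8_t) (elt_mask[i] * elt_size + b);
}
```

For four 4-byte elements, the element mask [3,2,1,0] expands to the byte mask 12,13,14,15, 8,9,10,11, 4,5,6,7, 0,1,2,3. A target whose permute takes per-element control fields could use the element mask more directly.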

I guess we can start by introducing a 2-operand permute (this is what the
vectorizer would currently know how to use), but it may be useful to
consider a single operand permute (+ control mask) later on.

> >
> > Actually, I think the particular testcase you are targeting could be
> > vectorized by preparing an appropriate vector of constants instead
> > of working so hard on permuting the loads. Maybe we can try
> > something like that for now (and potentially defer the decision on a
> > representation of permute to a separate patch (and testcase)?)
>
> I don't think this will work. If we only permute the constants, we
> can't get the multiples in the correct order and we will have to
> permute them anyway:
> yi = M00 * ri + M01 * gi + M02 * bi
> ui = M11 * gi + M12 * bi + M10 * ri
> vi = M22 * bi + M20 * ri + M21 * gi
> (we have gbr and brg in the second and third columns instead of rgb).
>
> In case that the number of the grouped statements is smaller than
> the vector size (as in the rgb conversion), we need to unroll the
> loop, and then such permutation will be done across several vectors
> and will be as painful as the load permutation.
>

ok. In a separate followup patch we could look into optimizing cases in
which the group size is equal to the vector size (like rgba).
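
For reference, the column orders in the quoted rgb example can be written out as index vectors. This is only a sketch with hypothetical names: it numbers the interleaved loads r = 0, g = 1, b = 2 and, for each operand column, collects which load feeds the y, u and v statements in turn.

```c
#include <stddef.h>

enum { SKETCH_GROUP_SIZE = 3 };   /* r, g, b */

/* sketch_operand_loads[s][k] is the index within the interleaved group
   (0 = r, 1 = g, 2 = b) of the load used by the k-th multiplication of
   statement s, following the order in which the statements are written
   in the quoted example.  */
static const unsigned
sketch_operand_loads[SKETCH_GROUP_SIZE][SKETCH_GROUP_SIZE] = {
  { 0, 1, 2 },   /* yi = M00 * ri + M01 * gi + M02 * bi */
  { 1, 2, 0 },   /* ui = M11 * gi + M12 * bi + M10 * ri */
  { 2, 0, 1 },   /* vi = M22 * bi + M20 * ri + M21 * gi */
};

/* Fill PERM with the load permutation of operand column K: the loads
   feeding the K-th multiplication of y, u and v, in that order.  */
static void
sketch_column_permutation (size_t k, unsigned perm[SKETCH_GROUP_SIZE])
{
  for (size_t s = 0; s < SKETCH_GROUP_SIZE; s++)
    perm[s] = sketch_operand_loads[s][k];
}
```

Column 0 gives [0,1,2] (rgb), column 1 gives [1,2,0] (gbr) and column 2 gives [2,0,1] (brg), matching the orders named in the quoted text.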

thanks,
dorit

> Thanks,
> Ira
>
> [attachment "slp-perm-updated.txt" deleted by Dorit Nuzman/Haifa/IBM]

