This is the mail archive of the
gcc@gcc.gnu.org
mailing list for the GCC project.
Re: [rfc] new tree-codes/optabs for vectorization of non-unit-stride accesses
- From: Dorit Naishlos <DORIT at il dot ibm dot com>
- To: Paul Brook <paul at codesourcery dot com>
- Cc: gcc at gcc dot gnu dot org, Ira Rosen <IRAR at il dot ibm dot com>, Richard Henderson <rth at redhat dot com>
- Date: Thu, 17 Nov 2005 17:56:29 +0200
- Subject: Re: [rfc] new tree-codes/optabs for vectorization of non-unit-stride accesses
Paul Brook <paul@codesourcery.com> wrote on 11/16/2005 05:03:47 PM:
> On Wednesday 16 November 2005 14:35, Dorit Naishlos wrote:
> > We're going to commit to autovect-branch vectorization support for
> > non-unit-stride accesses.
> > We'd like to suggest a few new tree-codes/optabs in order to express the
> > extraction and merging of elements from/to vectors.
>
> > Background:
> > The new functionality is going to allow us to vectorize computations
> > with strides that are a power-of-2, like in the example below, in which the
> > real and imaginary parts are interleaved, and therefore each of the
> > data-refs accesses data with stride 2:
> >
> >   for (i = 0; i < n; i++) {
> >      tmp_re = in[2*i] * coefs[2*i] - in[2*i+1] * coefs[2*i+1];
> >      tmp_im = in[2*i] * coefs[2*i+1] + in[2*i+1] * coefs[2*i];
> >      out[2*i] = tmp_re;
> >      out[2*i+1] = tmp_im;
> >   }
> >
> > What is generally going to happen is that, for a VF=4, we're going to:
> >
> > (1) load this data from memory:
> >       vec_in1 = [re0,im0,re1,im1] = vload &in
> >       vec_in2 = [re2,im2,re3,im3] = vload &in[VF]
> >       (and similarly for the coefs array)
> >
> > and then, because we're doing different operations on the odd and even
> > elements, we need to
> > (2) arrange them into separate vectors:
> >       vec_in_re = [re0,re1,re2,re3] = extract_even (vec_in1, vec_in2)
> >       vec_in_im = [im0,im1,im2,im3] = extract_odd (vec_in1, vec_in2)
> >       (and similarly for the coefs array)
>
> Have you considered targets that support interleaved load/store instructions?
> I'm not sure if this is supported by existing targets, but in the next year
> there will be targets that can perform steps 1+2 in a single load-interleaved
> instruction.
>
I don't know of existing targets that have this capability - it usually
requires explicit reordering.
Anyhow, when such a time comes, we can consider either adding a new
tree-code for that (but it sounds like we're running short of tree-codes...)
or detecting later on (in combine?) that a {load,load,extract_even,extract_odd}
sequence can be replaced by an "interleaved_load". (I assume this
specialized load exists only for stride 2?)
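
To make the equivalence concrete, here is a rough scalar model in C of the
two forms for VF=4. The names extract_even, extract_odd and interleaved_load
follow the discussion above, but the signatures, the float element type and
the check_fused_equals_split helper are assumptions made purely for
illustration; this is not the proposed tree-code or optab interface.

```c
#include <assert.h>

#define VF 4  /* vectorization factor from the example above */

/* Scalar model of extract_even: even-indexed elements of a, then of b.
   Signature is an assumption for illustration. */
static void extract_even(const float a[VF], const float b[VF], float out[VF])
{
    for (int i = 0; i < VF / 2; i++) {
        out[i] = a[2 * i];
        out[i + VF / 2] = b[2 * i];
    }
}

/* Scalar model of extract_odd: odd-indexed elements of a, then of b. */
static void extract_odd(const float a[VF], const float b[VF], float out[VF])
{
    for (int i = 0; i < VF / 2; i++) {
        out[i] = a[2 * i + 1];
        out[i + VF / 2] = b[2 * i + 1];
    }
}

/* Hypothetical fused form (stride 2 only): steps 1+2 done in one go,
   reading 2*VF consecutive elements starting at mem. */
static void interleaved_load(const float *mem, float even[VF], float odd[VF])
{
    for (int i = 0; i < VF; i++) {
        even[i] = mem[2 * i];
        odd[i] = mem[2 * i + 1];
    }
}

/* Checks that {load,load,extract_even,extract_odd} and the fused form
   give the same two vectors for the 2*VF elements at mem. */
static void check_fused_equals_split(const float *mem)
{
    float re[VF], im[VF], re_fused[VF], im_fused[VF];

    /* The two "vloads" are modelled as pointers to mem and mem + VF. */
    extract_even(mem, mem + VF, re);
    extract_odd(mem, mem + VF, im);
    interleaved_load(mem, re_fused, im_fused);

    for (int i = 0; i < VF; i++) {
        assert(re[i] == re_fused[i]);
        assert(im[i] == im_fused[i]);
    }
}
```

Calling check_fused_equals_split on a buffer laid out as
{re0,im0,re1,im1,re2,im2,re3,im3} exercises both paths. A later pass that
recognizes the four-operation sequence would be relying on exactly this
equivalence.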
thanks,
dorit
> Paul