This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



Re: [RFC][mid-end] Support vectorization of complex numbers using machine instructions.


Tamar Christina <Tamar.Christina@arm.com> writes:
> Hi All,
>
> I am trying to add support to the auto-vectorizer for complex operations
> that a target has instructions for.
>
> The instructions I have are only available as vector instructions. The operations
> are complex addition with a rotation or complex fmla with a rotation for
> half floats, floats and doubles.
>
> They expect the complex number to be broken down and stored in vectors as
> real/img parts.  GCC already does this first part when it lowers complex numbers
> very early on in tree, so that's good.
>
> As a simple example, I am trying to get GCC to emit an internal function
> .FCOMPLEX_ADD_ROT_90 (complex addition with a 90-degree rotation)
> when the target supports it.
>
> My C example is:
>
> void f90 (double complex a[N], double complex b[N], double complex c[N])
> {
>   for (int i=0; i < N; i++)
>       c[i] = a[i] + b[i] * I;
> }
>
> Which in tree looks like
>
> _3 = a_15(D) + _2;
> _12 = REALPART_EXPR <*_3>;
> _22 = IMAGPART_EXPR <*_3>;
> _5 = b_16(D) + _2;
> _6 = IMAGPART_EXPR <*_5>;
> _8 = REALPART_EXPR <*_5>;
> _10 = c_17(D) + _2;
> _4 = _12 - _6;
> _13 = _8 + _22;
> REALPART_EXPR <*_10> = _4;
> IMAGPART_EXPR <*_10> = _13;
> [...]
> 3) So I abandoned vec-patterns and instead tried to do it in
> tree-vect-slp.c in vect_analyze_slp_instance just after the SLP tree
> is created.  Matching the SLP tree is quite simple and getting it to
> emit the right SLP tree was simple enough, except that at this point
> all data references and loads have already been calculated.

(3) seems like the way to go.  Can you explain in more detail why it
didn't work?  The SLP tree after matching should look something like this:

  REALPART_EXPR <*_10> = _4;
  IMAGPART_EXPR <*_10> = _13;

  _4 = .COMPLEX_ADD_ROT_90 (_12, _8)
  _13 = .COMPLEX_ADD_ROT_90 (_22, _6)

  _12 = REALPART_EXPR <*_3>;
  _22 = IMAGPART_EXPR <*_3>;

  _8 = REALPART_EXPR <*_5>;
  _6 = IMAGPART_EXPR <*_5>;

The operands to the individual .COMPLEX_ADD_ROT_90s aren't the
operands that actually determine the associated scalar result, but
that's bound to be the case with something that includes an internal
permute.  All we're trying to describe is an operation that does the
right thing when vectorised.
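
To make the point about the internal permute concrete, here is a rough
scalar model of what one two-lane vector operation would compute.  This is
only a sketch: the names toy_complex_add_rot90 and toy_f90 are made up for
illustration and are not GCC functions, and it assumes lane 0 holds the
real part and lane 1 the imaginary part.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of one vector .COMPLEX_ADD_ROT_90 on a two-lane vector.
   Hypothetical helper, not GCC code.  Lane 0 holds the real part and
   lane 1 the imaginary part of one complex number.  */
void
toy_complex_add_rot90 (const double va[2], const double vb[2], double out[2])
{
  assert (va && vb && out);
  /* Compute both lanes before storing so that out may alias an input.  */
  double re = va[0] - vb[1];   /* real lane reads lane 1 of vb */
  double im = va[1] + vb[0];   /* imag lane reads lane 0 of vb */
  out[0] = re;
  out[1] = im;
}

/* Scalar reference for c[i] = a[i] + b[i] * I over n complex numbers
   stored as interleaved (real, imag) doubles.  */
void
toy_f90 (const double *a, const double *b, double *c, size_t n)
{
  for (size_t i = 0; i < n; i++)
    toy_complex_add_rot90 (&a[2 * i], &b[2 * i], &c[2 * i]);
}
```

Each output lane reads the opposite lane of the second operand, which is
why the per-lane operands listed in the SLP node (_12 with _8, _22 with _6)
are not the scalars that feed the corresponding scalar result.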

If you didn't have the .COMPLEX_ADD_ROT_90 and just fell back
on mixed two-operator SLP, the final node would be in the
opposite order:

  _6 = IMAGPART_EXPR <*_5>;
  _8 = REALPART_EXPR <*_5>;

So if you're doing the matching after building the initial tree,
you'd need to swap the statements in that node so that _8 comes
first and cancel the associated load permute.  If you're doing the
matching on the fly while building the SLP tree then the subnodes
should start out in the right order.
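
As a very rough illustration of the "swap and cancel the permute" step,
here is a toy model.  It is not the vectorizer's real data structure:
struct toy_load_node and both helpers are hypothetical, and it assumes a
two-lane load node whose permutation records, per lane, which element of
the (real, imag) load group that lane reads.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_LANES 2

/* Hypothetical stand-in for an SLP load node: the scalar statements in
   lane order, plus the index into the (real, imag) load group that each
   lane reads.  Not GCC's actual representation.  */
struct toy_load_node
{
  const char *stmts[TOY_LANES];
  unsigned perm[TOY_LANES];
};

/* Swap the two lanes, moving each statement together with its group
   index.  */
void
toy_swap_lanes (struct toy_load_node *node)
{
  assert (node);
  const char *stmt = node->stmts[0];
  unsigned idx = node->perm[0];
  node->stmts[0] = node->stmts[1];
  node->perm[0] = node->perm[1];
  node->stmts[1] = stmt;
  node->perm[1] = idx;
}

/* Return true if lane i reads group element i for every lane, i.e. the
   load needs no permute.  */
bool
toy_perm_is_identity (const struct toy_load_node *node)
{
  for (unsigned i = 0; i < TOY_LANES; i++)
    if (node->perm[i] != i)
      return false;
  return true;
}
```

Starting from the two-operator order { _6, _8 } with permutation { 1, 0 },
toy_swap_lanes leaves { _8, _6 } with permutation { 0, 1 }, so in this
model the load permute disappears once _8 comes first.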

Thanks,
Richard

