(R5900) Implementing Vector Support
Richard Henderson
rth@redhat.com
Mon Apr 4 17:53:00 GMT 2016
On 04/03/2016 09:12 PM, Woon yung Liu wrote:
> I can't figure out how to implement comparison operations (specifically,
> equals and the greater than operators). The GCC documentation mentions that
> the pattern for comparison (==) should be vec_cmp, but I don't understand
> why it has 4 operands and what they are used for.
The second operand is the comparison operator. So given
(set (reg:V4SI x) (eq:V4SI (reg:V4SI y) (reg:V4SI z)))
operand 0 is x,
operand 1 is the entire (eq ...) expression,
operand 2 is y,
operand 3 is z.
This is the same convention as the normal integer cstore<mode>4 pattern;
cbranch<mode>4 likewise takes the whole comparison expression as an operand.
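As a rough illustration of the result such a comparison is expected to produce, here is a minimal sketch using the generic GCC vector extensions in C. The type name v4si and the helper names are made up for this example, and whether a given port expands these through vec_cmp is an assumption, not something the sketch demonstrates.

```c
#include <stdint.h>

/* Four 32-bit lanes in one 128-bit vector (GCC/Clang vector extension). */
typedef int32_t v4si __attribute__((vector_size(16)));

/* Lane-wise equality: a result lane is -1 (all bits set) where the
   inputs match and 0 where they differ.  Corresponds to x in the RTL. */
static v4si lanes_equal(v4si y, v4si z)
{
    return y == z;
}

/* Lane-wise signed greater-than, using the same -1/0 mask convention. */
static v4si lanes_greater(v4si y, v4si z)
{
    return y > z;
}
```

Comparing { 1, 2, 3, 4 } with { 1, 0, 3, 9 } for equality gives the mask { -1, 0, -1, 0 }.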
> I've implemented it
> anyway, but GCC does not use it. I've taken a look at the rs6000 and
> Loongson ports, and they seem to be implementing their comparison operators
> with some non-standard pattern name and the pattern operations are different
> too (i.e. Loongson uses unspec, while rs6000 uses gt and eq).
rs6000 doesn't implement bare comparisons; it only implements the "vcond"
conditional move, which uses the comparison. Many of the other targets do
the same thing.
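To make the "conditional move that uses the comparison" idea concrete, here is a small sketch in C with the GCC vector extensions. It only models the semantics (compare, then pick per lane); the helper names select_gt and max_v4si are hypothetical, and the mask-and-blend form is just one way to write it, not what any particular port emits.

```c
#include <stdint.h>

typedef int32_t v4si __attribute__((vector_size(16)));

/* Per-lane select: where a > b take t, otherwise take f.
   The comparison yields a -1/0 mask that is then used to blend. */
static v4si select_gt(v4si a, v4si b, v4si t, v4si f)
{
    v4si mask = a > b;
    return (t & mask) | (f & ~mask);
}

/* Lane-wise signed maximum written as such a select. */
static v4si max_v4si(v4si a, v4si b)
{
    return select_gt(a, b, a, b);
}
```

A target that only has the combined compare-and-select form can still serve code like max_v4si, which is why many ports get by without a bare comparison pattern.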
> What happens when multiplication or division is performed on a vector? For
> example: c = a * b; Whereby a and b are both V4SI vectors. What vector type
> would C be? Would it become another V4SI (meaning that the multiplication
> result is truncated) or V4DI?
It would be the truncated V4SI mode.
There are other named patterns that implement widening multiply. Which you
choose depends on how the hardware selects which operands to include in the
multiply. Let { A, B, C, D } and { W, X, Y, Z } be V4SI inputs; then

    Optab                              Result
    vec_widen_{s,u}mult_hi_<mode>      { A * W, B * X }
    vec_widen_{s,u}mult_lo_<mode>      { C * Y, D * Z }
    vec_widen_{s,u}mult_even_<mode>    { A * W, C * Y }
    vec_widen_{s,u}mult_odd_<mode>     { B * X, D * Z }
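The difference between the truncating multiply and the widening forms can be modelled in plain C. This is only a scalar sketch of the unsigned even/odd variants from the table; the function names are invented for the example and arrays stand in for vector registers.

```c
#include <stdint.h>

/* Same-width multiply: each product is truncated to 32 bits.
   Unsigned arithmetic keeps the wraparound well defined in C. */
static void mul_v4si(const uint32_t a[4], const uint32_t b[4], uint32_t out[4])
{
    for (int i = 0; i < 4; i++)
        out[i] = a[i] * b[i];
}

/* Widening multiply of the even-numbered lanes: { A * W, C * Y }. */
static void widen_umult_even(const uint32_t a[4], const uint32_t b[4],
                             uint64_t out[2])
{
    for (int i = 0; i < 2; i++)
        out[i] = (uint64_t)a[2 * i] * b[2 * i];
}

/* Widening multiply of the odd-numbered lanes: { B * X, D * Z }. */
static void widen_umult_odd(const uint32_t a[4], const uint32_t b[4],
                            uint64_t out[2])
{
    for (int i = 0; i < 2; i++)
        out[i] = (uint64_t)a[2 * i + 1] * b[2 * i + 1];
}
```

With 0x10000 * 0x10000 in lane 0, mul_v4si stores 0 there, while widen_umult_even keeps the full 0x100000000.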
> I also would like to ask about implementing bitwise-shifting. The R5900's
> vector-shifting instructions are like the MIPS sll, srl and sra instructions,
> whereby they use an immediate to shift all elements within the vector. Based on
> the GCC documentation, a scalar can be used, but it will be first converted
> into a similarly-sized vector
There are three different types of shifting: by a scalar (all elements shifted
by the same amount), by a vector (every element receives its own shift amount),
and full vector (shifting is not restricted to the element boundaries).
Scalar shifts: ashr<mode>3, ashl<mode>3, lshr<mode>3.
Vector shifts: vashr<mode>3, vashl<mode>3, vlshr<mode>3.
Full shifts: vec_shl_<mode>, vec_shr_<mode>.
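A small C sketch of the three kinds of shift, for illustration only. The first two use the GCC vector extensions; the full-vector case is modelled on a 128-bit value split into two 64-bit halves (the u128 struct and all function names are hypothetical). How the halves map onto lane order for a particular target is deliberately left out.

```c
#include <stdint.h>

typedef uint32_t v4su __attribute__((vector_size(16)));

/* Shift by a scalar: the count is broadcast so every lane shifts
   by the same amount (the ashl<mode>3 style). */
static v4su shl_scalar(v4su v, uint32_t count)
{
    v4su counts = {count, count, count, count};
    return v << counts;
}

/* Shift by a vector: each lane has its own count (the vashl<mode>3 style). */
static v4su shl_vector(v4su v, v4su counts)
{
    return v << counts;
}

/* A 128-bit value as two 64-bit halves (hypothetical helper type). */
typedef struct { uint64_t lo, hi; } u128;

/* Full left shift by 0..63 bits: bits leaving the low half carry into
   the high half, so element boundaries are ignored. */
static u128 shl_full(u128 v, unsigned bits)
{
    u128 r = v;
    if (bits != 0) {
        r.hi = (v.hi << bits) | (v.lo >> (64 - bits));
        r.lo = v.lo << bits;
    }
    return r;
}
```

In the first two functions a bit shifted out of a lane is lost; in shl_full it shows up in the neighbouring half.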
> Finally, what should I be modifying, if I want to implement extraction and
> packing of the upper 64-bits of the 128-bit vector? Right now, GCC will just
> generate multiple shifts (i.e. dsll32) to access the upper 64-bits, which is
> not legal. This means that using any operation that requires unimplemented
> patterns will not work correctly.
You want to implement vec_init<mode>, vec_extract<mode>, and vec_set<mode>.
You also want to implement as many vec_perm_const<mode> patterns as you can.
The existing mips_expand_vec_perm_const_1 code for Loongson should be a good
starting point. The most important patterns that you'll want to be sure that
you can handle are interleave, even/odd, and broadcast. These are generated by
the vectorizer. You may wish to examine the aarch64 code for additional ideas;
it all depends on what sort of instructions you have available.
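For reference, the interleave, even/odd, and broadcast permutations can be written as constant selectors over two V4SI inputs. The sketch below is a plain C model, not GCC internals: permute2 and the SEL_* names are invented here, and it assumes the usual convention that selector values 0..3 pick from the first input and 4..7 from the second.

```c
#include <stdint.h>

/* Two-input permute: selector values 0..3 pick a lane of a,
   values 4..7 pick a lane of b. */
static void permute2(const int32_t a[4], const int32_t b[4],
                     const uint8_t sel[4], int32_t out[4])
{
    for (int i = 0; i < 4; i++) {
        uint8_t s = sel[i] & 7;   /* keep the index in range */
        out[i] = (s < 4) ? a[s] : b[s - 4];
    }
}

/* Constant selectors for the patterns named above. */
static const uint8_t SEL_INTERLEAVE_LO[4] = {0, 4, 1, 5};
static const uint8_t SEL_INTERLEAVE_HI[4] = {2, 6, 3, 7};
static const uint8_t SEL_EVEN[4]          = {0, 2, 4, 6};
static const uint8_t SEL_ODD[4]           = {1, 3, 5, 7};
static const uint8_t SEL_BROADCAST0[4]    = {0, 0, 0, 0};
```

A port would recognise selectors like these and map each one to whatever single instruction (or short sequence) the hardware offers.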
r~