This is the mail archive of the
gcc@gcc.gnu.org
mailing list for the GCC project.
Re: [RFC] optabs and tree-codes for vector operations
Richard Henderson wrote:
> Jim writes:
>(3) mips, apparently, results in a series of flags registers.
Yes. The vector condition move instructions use multiple flag
registers. Also, we have branches that can test multiple flag registers
in the MIPS-3D ASE.
> Jim writes:
>> misalignment can already be handled with shifts or rotate. It isn't
>> clear that we need a special operator for this.
> Actually, Jim, it can't, since at least x86 doesn't have a shift that
> applies to the entire 128-bit quantity.
I meant that the operation can be expressed as a shift in RTL. There
would be no need for a new RTL operator to express this. Likewise for
trees. It might be useful to have a new operator for other reasons, but
it isn't required to have one.
Whether the hardware has an instruction that does this is a different
question. I can see the value of having a new named pattern for this,
so that we can generate appropriate code. We might need a new tree
operation so we can call the named pattern, but we probably still don't
need a new RTL operation.
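
As a rough illustration of what "expressed as a shift" means here, the
following C sketch forms an unaligned 64-bit value from two adjacent
aligned words using only shifts and an OR. The function name realign_le
is hypothetical, the width is scaled down from 128 to 64 bits so that it
fits in plain C, and little-endian byte order is assumed. It is a sketch
of the idea, not what GCC emits for any target.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: return the 64-bit value that starts `byte_off`
   bytes into the aligned word `lo`, where `hi` is the next aligned word.
   Little-endian byte order is assumed. */
static uint64_t realign_le(uint64_t lo, uint64_t hi, unsigned byte_off)
{
    assert(byte_off < 8);
    if (byte_off == 0)
        return lo;              /* avoid an undefined shift by 64 bits */
    return (lo >> (8 * byte_off)) | (hi << (64 - 8 * byte_off));
}
```

The same expression on a double-width quantity is what a single shift
would describe; whether a target has one instruction for it is the
separate question.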
> I'm not sure where to find the MIPS SIMD stuff that's been talked
> about on the list recently, but I suspect that there are no new
> unaligned access instructions over the base architecture, and so
> an unaligned load would be
http://www.mips.com/content/Documentation/MIPSDocumentation/ProcessorArchitecture/doclibrary
The MIPS64 Architecture and Instruction Set manuals are really all you
need. There is also other vector support, and other kinds of vectors,
in the MIPS-3D ASE and the MDMX ASE. Although I see the MDMX ASE is
still not available there.
There are luxc1/suxc1 instructions (load/store doubleword indexed
unaligned to/from the floating-point registers). They take reg+reg
addresses, and ignore the low 3 bits of the resulting address.
There is also an alnv.ps instruction which takes two FP regs and an
address, and then shifts depending on the low-order 3 bits of the
address and the endianness. These bits must be either 0 or 4.
Taken together, these instructions allow one to vectorize code that uses
a float array with only 4-byte alignment.
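
The way these instructions combine can be modelled in C roughly as
follows. This is only a memory-order model under stated assumptions: the
array base is taken to be 8-byte aligned, elements are addressed by float
index rather than by byte address, and all type and function names
(pair_t, load_aligned_pair, align_pair, load_unaligned_pair) are made up
for illustration. It does not reproduce the register-level or
endian-specific behaviour of the real instructions.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { float e[2]; } pair_t;   /* two floats in memory order */

/* Model of luxc1: the byte address 4*i has its low 3 bits ignored, so
   the load returns the aligned pair that contains base[i]. */
static pair_t load_aligned_pair(const float *base, size_t i)
{
    size_t j = i & ~(size_t)1;
    pair_t r = { { base[j], base[j + 1] } };
    return r;
}

/* Model of alnv.ps: pick the result from two pairs according to the
   low 3 bits of the byte address, which must be 0 or 4. */
static pair_t align_pair(pair_t first, pair_t second, size_t byte_addr)
{
    size_t low = byte_addr & 7;
    assert(low == 0 || low == 4);
    if (low == 0)
        return first;
    pair_t r = { { first.e[1], second.e[0] } };
    return r;
}

/* Load base[i], base[i+1] when base[i] is only 4-byte aligned.
   Note: this reads up to base[i+3], so the array needs that much room. */
static pair_t load_unaligned_pair(const float *base, size_t i)
{
    pair_t first  = load_aligned_pair(base, i);
    pair_t second = load_aligned_pair(base, i + 2);
    return align_pair(first, second, 4 * i);
}
```

A vectorized loop would keep the second pair around as the first pair of
the next iteration, so each iteration needs only one new load.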
> Jim wrote:
>> I think VSEL_MULT_EXPR is an unnecessary complication. Similarly the
>> lo/hi/odd/even stuff. This can all be represented with a permutation
>> and a multiply.
> What do you mean? Certainly you could represent the operations
> at the tree level with nested variants of these tree codes. I
> think you'd be hard-pressed to generate the available hardware
> instructions if you represented the operations this way.
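
To make "a permutation and a multiply" concrete, here is a small C
sketch. It assumes that the "even" variant means a widening multiply of
the even-numbered elements, which is a guess at the proposal's semantics
and not taken from it. All names (NELTS, mult_even_direct, permute,
mult_widen, mult_even_composed) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { NELTS = 8 };

/* One combined operation: widening multiply of even-numbered elements. */
static void mult_even_direct(const int16_t a[NELTS], const int16_t b[NELTS],
                             int32_t out[NELTS / 2])
{
    for (size_t i = 0; i < NELTS / 2; i++)
        out[i] = (int32_t)a[2 * i] * b[2 * i];
}

/* Generic permutation: out[i] = in[sel[i]]. */
static void permute(const int16_t in[NELTS], const size_t sel[NELTS / 2],
                    int16_t out[NELTS / 2])
{
    for (size_t i = 0; i < NELTS / 2; i++) {
        assert(sel[i] < NELTS);
        out[i] = in[sel[i]];
    }
}

/* Generic widening multiply on half-length vectors. */
static void mult_widen(const int16_t a[NELTS / 2], const int16_t b[NELTS / 2],
                       int32_t out[NELTS / 2])
{
    for (size_t i = 0; i < NELTS / 2; i++)
        out[i] = (int32_t)a[i] * b[i];
}

/* The same result composed from a permutation and a multiply. */
static void mult_even_composed(const int16_t a[NELTS], const int16_t b[NELTS],
                               int32_t out[NELTS / 2])
{
    static const size_t even[NELTS / 2] = { 0, 2, 4, 6 };
    int16_t ea[NELTS / 2], eb[NELTS / 2];
    permute(a, even, ea);
    permute(b, even, eb);
    mult_widen(ea, eb, out);
}
```

The two forms compute the same values; the open question in this thread
is whether the composed form can still be matched to a single hardware
instruction.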
I have two main concerns.
One is canonicalization. At least at the RTL level, it is important to
be able to canonicalize an expression. If we have too many operations,
then we won't be able to determine what the canonical form is. We
already have this problem at the RTL level, and we only have 4 vector
operations. Anything that can be expressed with vec_concat/vec_select
can also be expressed with vec_merge/vec_duplicate, and vice versa.
Sometimes one is more convenient than the other, but other than that,
there is no good way to determine which is the canonical representation.
I ran into this problem defining the mips p[lu][lu] instructions which
perform a combined merge/shuffle operation. I wrote the patterns both
ways, and then picked the one that was four lines shorter (62 lines vs
66 lines). I am not convinced that I have a canonical result though.
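
A small C model shows the kind of overlap meant here, using "replace
element 0 of a two-element vector with a scalar" as the example. The
helper functions only mimic the RTL codes of the same names on a made-up
v2sf struct. The convention that mask bit i of vec_merge controls
element i is an assumption of this sketch, and the example is not taken
from the MIPS patterns mentioned above.

```c
#include <assert.h>

typedef struct { float e[2]; } v2sf;

static v2sf vec_concat(float x0, float x1)
{
    v2sf r = { { x0, x1 } };
    return r;
}

static float vec_select1(v2sf v, unsigned lane)
{
    assert(lane < 2);
    return v.e[lane];
}

static v2sf vec_duplicate(float x)
{
    v2sf r = { { x, x } };
    return r;
}

/* Bit i of mask set: element i comes from v1; clear: from v2. */
static v2sf vec_merge(v2sf v1, v2sf v2, unsigned mask)
{
    v2sf r;
    for (unsigned i = 0; i < 2; i++)
        r.e[i] = ((mask >> i) & 1) ? v1.e[i] : v2.e[i];
    return r;
}

/* Form A: (vec_merge (vec_duplicate x) v 1) */
static v2sf set_elt0_merge(v2sf v, float x)
{
    return vec_merge(vec_duplicate(x), v, 1u);
}

/* Form B: (vec_concat x (vec_select v [1])) */
static v2sf set_elt0_concat(v2sf v, float x)
{
    return vec_concat(x, vec_select1(v, 1));
}
```

Both functions return the same vector for every input, and nothing in
either expression marks it as the preferred form.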
There is also the concern of how we map these tree codes into RTL insns.
Currently, we don't try to match hardware instructions until we lower
into RTL. If we have tree codes for complicated vector operations, how
do we lower them to RTL if the hardware does not have a corresponding
instruction? Worst case, we end up with ugly code that the RTL
optimizer won't be able to fix. Even if we can avoid that, we still end
up with a lot of middle end code to convert all of the vector tree codes
into RTL for all combinations of target RTL instructions, which could
get ugly.
You are right though that there is the problem of how we emit the
complicated vector instructions that some targets like altivec have. I
would suggest some kind of combine pass for vector operations in RTL,
but I don't know if that is feasible.
--
Jim Wilson, GNU Tools Support, http://www.SpecifixInc.com