This is the mail archive of the
gcc@gcc.gnu.org
mailing list for the GCC project.
Re: [RFC] optabs and tree-codes for vector operations
- From: James E Wilson <wilson at specifixinc dot com>
- To: Dorit Naishlos <DORIT at il dot ibm dot com>
- Cc: Devang Patel <dpatel at apple dot com>, gcc at gcc dot gnu dot org
- Date: Thu, 12 Aug 2004 17:49:17 -0700
- Subject: Re: [RFC] optabs and tree-codes for vector operations
- References: <OFF27DB315.64DA68A4-ONC2256EEB.00746817-C2256EEB.0075C359@il.ibm.com>
Dorit Naishlos wrote:
> The attached html document contains the first major set of proposed
> optabs/tree-codes for vectorization.
I read through this. It seems like a reasonable start. I do have some
comments, but the way you have posted this makes it hard for me to quote
you. Hopefully my comments will make sense.
First, a general comment: we need better definitions of what the
existing vector operations do. They are ambiguous as to endianness;
some targets define them differently for different endiannesses, and some
don't. Also, there appears to be confusion about what vec_merge means,
as it means something different in the altivec.md file than it does in
all other md files. We need well-defined semantics used consistently
by all ports before we can have good optimization support for vectors.
Trees have type info, RTL does not. So we have two kinds of compare
optabs to represent signed/unsigned compares, and different rtl
operations for signed/unsigned compares, but we only need one tree
operator to represent both.
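As a rough illustration of the point above (not code from GCC): the same bit pattern compares differently depending on whether it is read as signed or unsigned. With type information the operand type picks the behaviour, while an untyped representation needs two distinct operations. The function names below are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Signed less-than: the operand type alone tells us this is a signed compare,
   which is how a typed tree operator can cover both cases. */
static bool lt_signed(int32_t a, int32_t b)
{
    return a < b;
}

/* Unsigned less-than on the same bits. Without type information this has to
   be a separate operation (RTL has separate codes such as lt and ltu). */
static bool lt_unsigned(int32_t a, int32_t b)
{
    return (uint32_t)a < (uint32_t)b;
}
```

For example, comparing -1 with 1 is true under the signed reading and false under the unsigned one, since -1 converts to 0xFFFFFFFF.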
I don't see any reason why we can't use the existing
cmp_optab/ucmp_optab for vectors.
The current RTL vec_select operator takes a parallel containing a list
of integers. You are defining the tree operator VSELECT_EXPR as taking
a vector of condition codes, produced by a vector compare. Also,
vec_select takes one vector operand, and VSELECT_EXPR takes two. These
are very different operations. This could be confusing. vec_select is
really more like a shuffle, whereas vec_merge is more like your
VSELECT_EXPR.
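To make the difference concrete, here is a scalar C model of the two existing RTL operations as the GCC internals documentation describes them: vec_select picks elements of one vector by a constant index list, and vec_merge chooses between two vectors element by element under a constant bit mask. This is only a sketch. The function names are hypothetical, and the assumption that bit i of the mask controls element i is exactly the kind of numbering detail that needs a precise definition.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Model of (vec_select v (parallel [i0 i1 ...])): one input vector and a
   list of constant element indices, so it behaves like a shuffle. */
static void model_vec_select(int32_t *out, const int32_t *v, size_t v_len,
                             const size_t *sel, size_t n_out)
{
    for (size_t i = 0; i < n_out; i++) {
        assert(sel[i] < v_len);
        out[i] = v[sel[i]];
    }
}

/* Model of (vec_merge a b mask): two input vectors and a constant bit mask.
   A set bit takes the element from a, a clear bit takes it from b.
   ASSUMPTION: bit i of the mask controls element i. */
static void model_vec_merge(int32_t *out, const int32_t *a, const int32_t *b,
                            unsigned mask, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = ((mask >> i) & 1u) ? a[i] : b[i];
}
```

Note that in this model vec_merge never moves an element to a different position, while vec_select can.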
The current RTL vec_select operator is really not much different than
the vec_merge operator. It just rearranges elements in a different way.
I have found that almost anything that can be expressed with one can
be expressed with the other. Maybe redefining the RTL vec_select
operator would make sense to eliminate the duplication.
This duplication at the RTL level could make optimization difficult.
All RTL needs to have a canonical representation for combine and other
optimizations to work, but with the current situation there is no
obvious way to pick whether vec_select or vec_merge is more canonical
than the other. The result is that targets may get stuck defining
multiple patterns to match every way to express an operation, which is
undesirable.
There is a general principle here: you should avoid defining new
RTL operations that can be easily represented by others, because that
prevents us from easily determining the canonical representations.
There can also be similar problems at the tree level, since optimizers
need to know how to simplify something, and this can be a problem if
there are multiple ways to represent something.
In MIPS, a vector compare does not return a vector mask of 0/1. It
returns a vector of condition code results. We have 8 condition code
registers, and if you compare an order 8 vector, it sets every condition
code register. Similarly, the conditional move instructions check
multiple condition code registers. The current RTL vec_merge/vec_select
cannot express this, because neither one can take a vector of condition
code registers as input. This should be considered in the design of the
vector compare support.
Your compare proposals don't seem to take the needs of FP into account.
We need additional comparison operators for FP, because of unordered
results. This also matters when inverting FP compares, as the opposite
of < is not >=.
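A small scalar C sketch of the unordered problem, assuming IEEE semantics (no -ffast-math or similar): when either operand is a NaN, both a < b and a >= b are false, so negating one does not give the other. The helper names are hypothetical; isunordered is the standard C99 macro from math.h.

```c
#include <math.h>
#include <stdbool.h>

/* Logical negation of "a < b". True when the operands are unordered. */
static bool not_less(double a, double b)
{
    return !(a < b);
}

/* Plain "a >= b". False when the operands are unordered. */
static bool greater_equal(double a, double b)
{
    return a >= b;
}

/* The correct inverse of "a < b": unordered OR greater-or-equal. */
static bool unordered_or_ge(double a, double b)
{
    return isunordered(a, b) || a >= b;
}
```

The third form is the kind of extra comparison operator FP needs: it agrees with not_less for every input, including NaNs, whereas greater_equal does not.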
The VPERM_EXPR you define is similar to the vec_select rtx operation
that already exists. It is confusing to have closely related tree and
rtl operators with different names.
Your VMERGE_EXPR proposal is confusing. You defined it with
permutation, but the existing vec_merge RTL operator does not do
permutation. Also, you didn't define how you can do the permutation if
you only have one index mask. However, I see that you defined the range
of a 4-element vector mask to be 0-7 which implies it can index any
element of either vector. That seems confusing to me. Why not just
have separate permute and merge operations? That is what we do now at
the RTL level with vec_select and vec_merge.
If a 4 element vector mask can go from 0-7, then what happens for
VPERM_EXPR which has only one input vector? You need a different kind
of vector mask here that only goes from 0 to 3.
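The index-range issue can be sketched in scalar C. This models one possible reading of the proposal, not its actual definition: a two-input permute whose indices address the concatenation of both vectors (0 to 2n-1), next to a one-input permute whose indices can only address its single operand (0 to n-1). The names permute2 and permute1 are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical two-input permute: idx[i] in [0, 2n) indexes the
   concatenation of a and b, so for n = 4 the valid range is 0..7. */
static void permute2(int32_t *out, const int32_t *a, const int32_t *b,
                     const unsigned *idx, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        assert(idx[i] < 2 * n);
        out[i] = idx[i] < n ? a[idx[i]] : b[idx[i] - n];
    }
}

/* Hypothetical one-input permute: idx[i] must stay in [0, n),
   so for n = 4 the valid range is only 0..3. */
static void permute1(int32_t *out, const int32_t *a,
                     const unsigned *idx, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        assert(idx[i] < n);
        out[i] = a[idx[i]];
    }
}
```

Under this model a pure merge is the special case of permute2 where idx[i] is either i or i + n, which is one way to see that the two concepts can be kept separate.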
Misalignment can already be handled with shifts or rotates. It isn't
clear that we need a special operator for this.
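A scalar analogue of handling misalignment with shifts, as a sketch only: two adjacent aligned 32-bit words are combined with a pair of shifts to produce the word that starts off bytes into the first one. The helper names are hypothetical, and the byte order is fixed to little-endian by building the words explicitly, since the shift directions depend on endianness.

```c
#include <assert.h>
#include <stdint.h>

/* Assemble a little-endian 32-bit word from four bytes, independent of the
   host byte order. */
static uint32_t load_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Given two adjacent aligned words lo and hi, return the word that starts
   off bytes (0..3) into lo, using only shifts and an OR. */
static uint32_t realign_le32(uint32_t lo, uint32_t hi, unsigned off)
{
    assert(off < 4);
    if (off == 0)
        return lo;  /* avoid a shift by 32, which is undefined in C */
    return (lo >> (8 * off)) | (hi << (32 - 8 * off));
}
```

A vector version would do the same thing with whole-register shifts over two aligned vector loads.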
I think VSEL_MULT_EXPR is an unnecessary complication. Similarly the
lo/hi/odd/even stuff. This can all be represented with a permutation
and a multiply.
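A scalar C sketch of that decomposition, under the assumption that an even/odd multiply means a widening multiply of the even-numbered (or odd-numbered) elements of two vectors; the proposal's exact definition may differ. All names here are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_LANES 16

/* Permutation step: gather every second element of v, starting at
   start (0 for the even lanes, 1 for the odd lanes). */
static void select_stride2(int16_t *out, const int16_t *v, size_t n,
                           size_t start)
{
    for (size_t i = 0; i < n / 2; i++)
        out[i] = v[start + 2 * i];
}

/* Multiply step: element-wise widening multiply, 16 x 16 -> 32 bits. */
static void widening_mult(int32_t *out, const int16_t *a, const int16_t *b,
                          size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = (int32_t)a[i] * (int32_t)b[i];
}

/* Even (start = 0) or odd (start = 1) multiply expressed as
   permutation followed by multiply. Writes n / 2 results. */
static void mult_stride2(int32_t *out, const int16_t *a, const int16_t *b,
                         size_t n, size_t start)
{
    int16_t ta[MAX_LANES / 2], tb[MAX_LANES / 2];
    assert(n <= MAX_LANES && n % 2 == 0 && start < 2);
    select_stride2(ta, a, n, start);
    select_stride2(tb, b, n, start);
    widening_mult(out, ta, tb, n / 2);
}
```

Whether a target's single instruction gets used then depends on something like combine recognizing the permute-plus-multiply pair.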
It isn't clear that we need MULT_HI_EXPR.
I suppose there is some trade off here about having tree operations that
can represent VMX instructions, so they can be directly generated,
versus having an optimization pass like combine create some of them after
the fact. It isn't obvious without implementation experience where to
draw the line.
These complicated tree operations will all need to be expanded into rtl
operations for targets that don't have appropriate instructions. So
there is also a trade off here about how much middle end code we need
for expanding tree operations.
The saturation, packing, unpacking stuff seems pretty reasonable.
--
Jim Wilson, GNU Tools Support, http://www.SpecifixInc.com