This is the mail archive of the mailing list for the GCC project.

Re: [RFC] optabs and tree-codes for vector operations

Dorit Naishlos wrote:
> The attached html document contains the first major set of proposed
> optabs/tree-codes for vectorization.

I read through this. It seems like a reasonable start. I do have some comments, but the way you have posted this makes it hard for me to quote you. Hopefully my comments will make sense.

First, a general comment: we need better definitions of what the existing vector operations do. They are ambiguous as to endianness; some targets define them differently for different endiannesses, and some don't. Also, there appears to be confusion about what vec_merge means, as it means something different in the file than it does in all other md files. We need well-defined semantics used consistently by all ports before we can have good optimization support for vectors.

Trees have type info; RTL does not. So we have two kinds of compare optabs to represent signed/unsigned compares, and different RTL operations for signed/unsigned compares, but we only need one tree operator to represent both.
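To illustrate why an untyped representation needs separate signed and unsigned compares, here is a small Python sketch. It is a toy model, not GCC code; the helper name `as_signed8` and the 8-bit element width are assumptions chosen for the example.

```python
# Toy model: the same 8-bit patterns compare differently depending on
# whether they are interpreted as signed or unsigned.

def as_signed8(x):
    """Reinterpret an 8-bit pattern (0..255) as a two's-complement value."""
    return x - 256 if x >= 128 else x

bits_a, bits_b = 0xFF, 0x01

# Unsigned interpretation: 255 < 1 is false.
unsigned_lt = bits_a < bits_b

# Signed interpretation: -1 < 1 is true.
signed_lt = as_signed8(bits_a) < as_signed8(bits_b)
```

With type information attached to the operands, a single "less than" operator is enough; without it, the operator itself has to say which interpretation is meant.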

I don't see any reason why we can't use the existing cmp_optab/ucmp_optab for vectors.

The current RTL vec_select operator takes a parallel containing a list of integers. You are defining the tree operator VSELECT_EXPR as taking a vector of condition codes, produced by a vector compare. Also, vec_select takes one vector operand, and VSELECT_EXPR takes two. These are very different operations. This could be confusing. vec_select is really more like a shuffle, whereas vec_merge is more like your VSELECT_EXPR.
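A rough Python model of the two existing RTL operators may make the difference concrete. This is only a sketch, not GCC code; in particular, mapping mask bit i to element i is an assumption, since that ordering is exactly the kind of detail that needs a precise definition.

```python
# Toy models of the RTL operators, for illustration only.

def vec_select(vec, selection):
    """One input vector; `selection` lists the element indices to extract."""
    return [vec[i] for i in selection]

def vec_merge(vec1, vec2, mask):
    """Two input vectors; a set bit takes the element from vec1, a clear
    bit takes it from vec2. Bit i <-> element i is an assumption here."""
    return [a if (mask >> i) & 1 else b
            for i, (a, b) in enumerate(zip(vec1, vec2))]

a = [10, 11, 12, 13]
b = [20, 21, 22, 23]

shuffled = vec_select(a, [3, 2, 1, 0])  # rearranges a single vector
merged = vec_merge(a, b, 0b0101)        # chooses per element between two
```

In this model `vec_select` behaves like a shuffle, while `vec_merge` is the one that chooses between two vectors element by element.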

The current RTL vec_select operator is really not much different than the vec_merge operator. It just rearranges elements in a different way. I have found that almost anything that can be expressed with one can be expressed with the other. Maybe redefining the RTL vec_select operator would make sense to eliminate the duplication.

This duplication at the RTL level could make optimization difficult. All RTL needs to have a canonical representation for combine and other optimizations to work, but with the current situation there is no obvious way to pick whether vec_select or vec_merge is more canonical than the other. The result is that targets may get stuck defining multiple patterns to match every way to express an operation, which is undesirable.
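The duplication can be sketched in Python: the same result is reachable through a merge or through a select applied to a concatenation. This is a toy model under assumed semantics (mask bit i controls element i), not GCC code.

```python
# Two different expressions for the same operation, in a toy model.

def vec_concat(v1, v2):
    """Concatenate two vectors into one of twice the length."""
    return v1 + v2

def vec_select(vec, selection):
    """Extract the elements of `vec` named by the index list `selection`."""
    return [vec[i] for i in selection]

def vec_merge(vec1, vec2, mask):
    """Set bit i takes element i from vec1, clear bit takes it from vec2
    (the bit-to-element mapping is an assumption)."""
    return [a if (mask >> i) & 1 else b
            for i, (a, b) in enumerate(zip(vec1, vec2))]

a = [10, 11, 12, 13]
b = [20, 21, 22, 23]

# Low half from a, high half from b, written two ways.
via_merge = vec_merge(a, b, 0b0011)
via_select = vec_select(vec_concat(a, b), [0, 1, 6, 7])
```

Both forms compute the same vector, so a pattern matcher that only knows one of them will miss the other unless a canonical form is agreed on.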

There is a general principle here: you should avoid defining new RTL operations that can be easily represented by others, because that duplication prevents us from easily determining the canonical representation. There can also be similar problems at the tree level, since optimizers need to know how to simplify something, and this can be a problem if there are multiple ways to represent something.

In MIPS, a vector compare does not return a vector mask of 0/1. It returns a vector of condition code results. We have 8 condition code registers, and if you compare an order 8 vector, it sets every condition code register. Similarly, the conditional move instructions check multiple condition code registers. The current RTL vec_merge/vec_select cannot express this, because neither one can take a vector of condition code registers as input. This should be considered in the design of the vector compare support.

Your compare proposals don't seem to take the needs of FP into account. We need additional comparison operators for FP, because of unordered results. This also matters when inverting FP compares, as the opposite of < is not >=.
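The inversion problem shows up as soon as a NaN is involved. The sketch below uses Python floats, which follow IEEE comparison rules; the element-wise helpers are hypothetical names for this example, with `unge` standing for "unordered or greater-or-equal".

```python
import math

def lt(a, b):
    return a < b

def ge(a, b):
    return a >= b

def unge(a, b):
    """Unordered or greater-or-equal: true if either operand is a NaN."""
    return math.isnan(a) or math.isnan(b) or a >= b

nan = float("nan")
xs = [1.0, 2.0, nan, 4.0]
ys = [2.0, 2.0, 3.0, nan]

not_lt = [not lt(x, y) for x, y in zip(xs, ys)]
ge_result = [ge(x, y) for x, y in zip(xs, ys)]
unge_result = [unge(x, y) for x, y in zip(xs, ys)]
```

In the two lanes that contain a NaN, `not_lt` is true but `ge_result` is false, so replacing "not <" with ">=" changes the answer. Only the unordered variant matches the inverted compare in every lane.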

The VPERM_EXPR you define is similar to the vec_select rtx operation that already exists. It is confusing to have closely related tree and rtl operators with different names.

Your VMERGE_EXPR proposal is confusing. You defined it with permutation, but the existing vec_merge RTL operator does not do permutation. Also, you didn't define how you can do the permutation if you only have one index mask. However, I see that you defined the range of a 4-element vector mask to be 0-7, which implies it can index any element of either vector. That seems confusing to me. Why not just have separate permute and merge operations? That is what we do now at the RTL level with vec_select and vec_merge.

If a 4-element vector mask can go from 0-7, then what happens for VPERM_EXPR, which has only one input vector? You need a different kind of vector mask here that only goes from 0 to 3.
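The index-range mismatch can be sketched with two hypothetical helpers, `permute1` and `permute2`. This is one reading of the proposal, assuming indices below the vector length pick from the first input and the rest pick from the second; it is not the proposal's own definition.

```python
def permute1(vec, idx):
    """One-input permute: every index must lie in 0..n-1."""
    n = len(vec)
    if any(not 0 <= i < n for i in idx):
        raise ValueError("index out of range for a one-input permute")
    return [vec[i] for i in idx]

def permute2(v1, v2, idx):
    """Two-input permute: indices 0..n-1 pick from v1, n..2n-1 from v2
    (this split is an assumption)."""
    n = len(v1)
    if len(v2) != n or any(not 0 <= i < 2 * n for i in idx):
        raise ValueError("bad operands for a two-input permute")
    both = v1 + v2
    return [both[i] for i in idx]

a = [10, 11, 12, 13]
b = [20, 21, 22, 23]

two_input = permute2(a, b, [0, 5, 2, 7])  # valid: mask range is 0-7
one_input = permute1(a, [3, 3, 0, 1])     # valid: mask range is 0-3
```

Passing the 0-7 mask `[0, 5, 2, 7]` to `permute1` raises `ValueError`, which is the point: the two operations need different mask domains.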

Misalignment can already be handled with shifts or rotates. It isn't clear that we need a special operator for this.
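As a sketch of the shift-based approach, the Python model below builds an unaligned load out of two aligned loads. List slicing stands in for whole-vector shifts; the names `aligned_load` and `realign_load` and the vector width of 4 are assumptions for the example.

```python
VEC = 4  # elements per vector (assumed width)

def aligned_load(mem, addr):
    """Load the aligned vector that contains element `addr`."""
    base = addr - addr % VEC
    return mem[base:base + VEC]

def realign_load(mem, addr):
    """Emulate an unaligned load: shift the low vector left by the
    misalignment and fill in from the start of the next aligned vector."""
    shift = addr % VEC
    lo = aligned_load(mem, addr)
    hi = aligned_load(mem, addr + VEC)
    return lo[shift:] + hi[:shift]

mem = list(range(16))
unaligned = realign_load(mem, 5)
```

Here `unaligned` holds the four elements starting at address 5, assembled only from aligned loads plus shifts.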

I think VSEL_MULT_EXPR is an unnecessary complication. Similarly the lo/hi/odd/even stuff. This can all be represented with a permutation and a multiply.
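A small Python sketch of the decomposition, assuming that an "even" or "odd" multiply means multiplying the even-indexed or odd-indexed elements of both operands. The exact semantics of VSEL_MULT_EXPR are in the proposal and are not reproduced here; `permute` and `vmul` are hypothetical helpers.

```python
def permute(vec, idx):
    """Select elements of `vec` by index list."""
    return [vec[i] for i in idx]

def vmul(a, b):
    """Element-wise multiply."""
    return [x * y for x, y in zip(a, b)]

a = [1, 2, 3, 4]
b = [5, 6, 7, 8]

even = [0, 2]
odd = [1, 3]

# "Multiply even" and "multiply odd" as a permutation followed by a multiply.
mul_even = vmul(permute(a, even), permute(b, even))
mul_odd = vmul(permute(a, odd), permute(b, odd))
```

The lo/hi variants follow the same shape with index lists `[0, 1]` and `[2, 3]`, so no dedicated operator is needed to express them.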

It isn't clear that we need MULT_HI_EXPR.

I suppose there is some trade-off here between having tree operations that can represent VMX instructions, so they can be directly generated, and having an optimization pass like combine create some of them after the fact. It isn't obvious without implementation experience where to draw the line.

These complicated tree operations will all need to be expanded into RTL operations for targets that don't have appropriate instructions. So there is also a trade-off here about how much middle-end code we need for expanding tree operations.

The saturation, packing, unpacking stuff seems pretty reasonable.

Jim Wilson, GNU Tools Support,
