This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



Re: [RFC] optabs and tree-codes for vector operations


> > <3> same as above, but the elements are extracted from two vectors:
> > vmt_a = vperm2 (vmt_x, vmt_y, vm_indx_mask)
> ...
> I would prefer 3 over 2/4.

Agreed.

> > <6> load from an unaligned address...
>
> If dannyb's alignment data can somehow make it through to the rtl
> level, we can use MODIFY_EXPR for this case too.

Agreed - there's no need for a separate tree-code if we can query the
alignment at the RTL level. We'll need to either apply Daniel's analysis
again after vectorization, or set the alignment information during
vectorization for the new pointers we create. The question is indeed how to
propagate this info to the RTL. I hope it would be OK to introduce a new
tree-code for this purpose (LOADU_EXPR?) temporarily until we figure out
how to do that.

> I wonder if a better solution is an 'r' class reference that
> indicates that low bits should be dropped.  So we build
>
>            MODIFY_EXPR<ALIGN_REF<obj[index]>, value>
>
> for a store, and the opposite for a load.  Alternately, something
> more restricted like ALIGN_INDIRECT_REF, which only takes a
> pointer, rather than an entire complex lvalue.

OK. I think I'll go with ALIGN_INDIRECT_REF, since the references we create
during vectorization are always of the same form (currently (*vp)[indx],
but this can be changed to (*vp)).

> > <8> load low/high (e.g. ldl/ldr (mips), ...
>
> I would not consider this a separate case

OK, dropped.

> > <9> compute from an address the input to a vperm instruction...
> ...
> > <10> compute from an address the input to a vor instruction...
> ...
> Back to the "we're only going to use these together" argument, I
> think perhaps a better representation might be
>
>            vectr = REALIGN_LOAD_EXPR < vect1, vect2, magic >

Allowing a third argument whose "shape" is "flexible" and that can be
generated by a builtin is a good idea. The only possible reservation here
is that it's very similar to the vperm_expr that we are going to use for
other purposes, which we are not trying to reuse here. When 'magic' is a
vector (of indices) then it has exactly the same functionality as the
VEC_PERM that we are going to introduce. Maybe if we gave it a more general
name (VEC_EXTRACT? VEC_PAIR_EXTRACT? VEC_PAIR_SELECT? VEC_JOIN?
VEC_SEL_JOIN?) we could reuse the same tree-code for the different
purposes, including unaligned stores and loads.
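
To illustrate the point that a realign load whose 'magic' is a vector of indices behaves like a two-input permute, here is a minimal scalar sketch in C. The names vec_perm2 and realign_mask, the 16-byte vector width, and the index convention (0..15 select from the first vector, 16..31 from the second) are assumptions made for illustration only; none of them is a GCC interface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { NELT = 16 };  /* assumed: 16 one-byte elements per vector */

/* Hypothetical scalar model of a two-input permute:
   out[i] = (x:y)[idx[i]], where x:y is the 32-element concatenation. */
static void vec_perm2(const uint8_t x[NELT], const uint8_t y[NELT],
                      const uint8_t idx[NELT], uint8_t out[NELT])
{
    for (size_t i = 0; i < NELT; i++) {
        unsigned k = idx[i] % (2 * NELT);
        out[i] = k < NELT ? x[k] : y[k - NELT];
    }
}

/* Hypothetical 'magic' for a realign load: consecutive indices that
   start at the misalignment of the requested address. */
static void realign_mask(unsigned mis, uint8_t idx[NELT])
{
    assert(mis < NELT);
    for (size_t i = 0; i < NELT; i++)
        idx[i] = (uint8_t)(mis + i);
}
```

With x holding bytes 0..15 and y holding bytes 16..31, the mask built by realign_mask(3, idx) makes vec_perm2 return bytes 3..18, which is what a realigning load at misalignment 3 would produce.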

> I'm also thinking that perhaps the "mask" versions should not use
> an optab at all, but rather a builtin function.  The reason here
> is that the form of the mask differs between systems.

Indeed. I hesitated to suggest a builtin due to past objections to using
target builtins, but I agree that it's appropriate here.

OK, so let's see what we have so far:

1. load from an aligned address ("LOADA")
   optab: mov_optab
   tree: MODIFY_EXPR (z,ref)

example: LOADA = ldqa[alpha]/ movdqa[sse]/ lvx[vmx]

2. load from a "floor aligned" address ("LOADFLOOR")
   optab (new): load_floor_optab
   tree (new): MODIFY_EXPR (z, ALIGN_INDIRECT_REF(ptr_ref))

example: LOADFLOOR = ldqu[alpha]/ lvx[vmx]

3. load from an arbitrary address ("LOADU")
   optab (new): loadu_optab
   tree: option1 (preferred): MODIFY_EXPR (z,ref),
                 in which ref has alignment info (a la dannyb)
         option2 (new tree, until above is supported): LOADU_EXPR (z,ref)

example: LOADU = ldl,ldr,or[mips]/ movdqu[sse]/
ldqu,ldqu,extql,extqh,or[alpha]/ lvx,lvx,lvsr,vperm[vmx]

4. join/merge the relevant data from two vectors, given the address of the
requested data ("JOIN")
   option1: optab (new): realign_load_optab
            tree (new): REALIGN_LOAD_EXPR (x,y,magic)
   option2: reuse vec_perm_optab, VEC_PERM_EXPR (x,y,magic) (or however
we're going to call it).

As in the case of permute, here I would also request to have two optabs -
for the case that the misalignment is unknown ("JOIN_ARBITRARY"), and for
the case that it is a constant ("JOIN_CST"). So we'll have:

4.1. "JOIN_CST": optab (new): realign_cst_load_optab?
example: extql,extqh,or[alpha]/ lvsl(addr),vperm[vmx]

4.2. "JOIN_ARBITRARY": optab (new): realign_load_optab?
example: lvsr(-addr),vperm[vmx]


I think the above answers the vectorizer's needs for handling unaligned
loads. The vectorizer will consider 2 possible vectorization schemes:

***** [scheme1]
loop start
  z = LOAD (addr)
  addr += vecsize
loop end

***** [scheme2] (software-pipelined)
x = LOADFLOOR (addr)
loop start
  y = LOADFLOOR (addr+d)
  z = JOIN (x, y, addr)
  addr += vecsize
  x = y
loop end

Notes:
1. d = (misalignment is constant) ? vecsize : vecsize - 1.
2. JOIN = (misalignment is constant) ? JOIN_CST : JOIN_ARBITRARY. (If
JOIN_ARBITRARY is not supported, we can't use scheme 2 when the alignment is
unknown).
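
Scheme 2 and the two notes above can be modelled in scalar C roughly as follows. This is a sketch under stated assumptions, not the vectorizer's output: addresses are modelled as byte offsets into a buffer whose start is treated as vector-aligned, VECSIZE is assumed to be 16, and the names load_floor, join and realign_load are hypothetical. In particular, having the arbitrary-misalignment JOIN take all of its data from y when the misalignment turns out to be 0 is one reading of the lvsr(-addr) example; it is what makes d = vecsize - 1 safe, because the second load then never touches a block past the requested data.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { VECSIZE = 16 };  /* assumed vector size in bytes */

typedef struct { uint8_t b[VECSIZE]; } vec_t;

/* LOADFLOOR: load a whole vector from 'off' rounded down to a
   multiple of VECSIZE (the low address bits are dropped). */
static vec_t load_floor(const uint8_t *base, size_t off)
{
    vec_t v;
    memcpy(v.b, base + (off & ~(size_t)(VECSIZE - 1)), VECSIZE);
    return v;
}

/* JOIN: pick VECSIZE consecutive bytes out of the concatenation x:y.
   arbitrary == 0 models JOIN_CST: start at the misalignment.
   arbitrary != 0 models JOIN_ARBITRARY: same, except that a
   misalignment of 0 selects all of y (assumption, see lead-in). */
static vec_t join(vec_t x, vec_t y, size_t off, int arbitrary)
{
    size_t mis = off & (size_t)(VECSIZE - 1);
    size_t start = (arbitrary && mis == 0) ? VECSIZE : mis;
    vec_t z;
    for (size_t i = 0; i < VECSIZE; i++) {
        size_t k = start + i;
        z.b[i] = k < VECSIZE ? x.b[k] : y.b[k - VECSIZE];
    }
    return z;
}

/* Scheme 2: software-pipelined loads of nvec vectors starting at 'off'. */
static void realign_load(const uint8_t *base, size_t off,
                         uint8_t *dst, size_t nvec, int arbitrary)
{
    assert(base != NULL && dst != NULL);
    const size_t d = arbitrary ? VECSIZE - 1 : VECSIZE;  /* note 1 */
    vec_t x = load_floor(base, off);
    for (size_t n = 0; n < nvec; n++) {
        vec_t y = load_floor(base, off + d);
        vec_t z = join(x, y, off, arbitrary);             /* note 2 */
        memcpy(dst + n * VECSIZE, z.b, VECSIZE);
        off += VECSIZE;
        x = y;
    }
}
```

Each iteration issues one aligned load and one join, and the vector loaded as y is reused as x in the next iteration.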

Choosing between the two schemes will be done as follows:

if (alignment == 0)
  {
     if (mov_optab)
       LOAD <-- MODIFY_EXPR
       do scheme1
     else
       can't vectorize
  }
else if (alignment is constant)
  {
     if (realign_cst_load_optab && load_floor_optab)
       LOADFLOOR <-- MODIFY_EXPR (ALIGN_INDIRECT_REF)
       JOIN <-- REALIGN_LOAD_EXPR
       do scheme2
     else if (loadu_optab)
       LOAD <-- LOADU_EXPR (or MODIFY_EXPR with alignment info)
       do scheme1
     else
       can't vectorize
  }
else /* alignment is unknown */
  {
     if (realign_load_optab && load_floor_optab)
       LOADFLOOR <-- MODIFY_EXPR (ALIGN_INDIRECT_REF)
       JOIN <-- REALIGN_LOAD_EXPR
       do scheme2
     else if (loadu_optab)
       LOAD <-- LOADU_EXPR (or MODIFY_EXPR with alignment info)
       do scheme1
     else
       can't vectorize
  }
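
The selection logic above can be written as a small C function. This is only a sketch: the enum and struct names are hypothetical, and the booleans stand in for "the target provides this optab", which in GCC would be a query against the optab tables rather than a flag in a struct.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical classification of what is known about the misalignment. */
enum misalign_kind { MISALIGN_ZERO, MISALIGN_CONST, MISALIGN_UNKNOWN };

/* Hypothetical outcomes of the decision. */
enum load_scheme {
    CANNOT_VECTORIZE,
    SCHEME1_ALIGNED,   /* scheme1 with LOAD = MODIFY_EXPR */
    SCHEME1_LOADU,     /* scheme1 with LOAD = LOADU_EXPR */
    SCHEME2_REALIGN    /* scheme2 with LOADFLOOR + JOIN */
};

/* Stand-in for optab availability on some target. */
struct target_optabs {
    bool mov, load_floor, loadu, realign_load, realign_cst_load;
};

static enum load_scheme
choose_load_scheme(enum misalign_kind kind, const struct target_optabs *t)
{
    bool can_join;

    assert(t != NULL);
    switch (kind) {
    case MISALIGN_ZERO:
        return t->mov ? SCHEME1_ALIGNED : CANNOT_VECTORIZE;
    case MISALIGN_CONST:
        can_join = t->realign_cst_load;
        break;
    default: /* MISALIGN_UNKNOWN */
        can_join = t->realign_load;
        break;
    }
    /* Prefer the software-pipelined scheme; fall back to LOADU. */
    if (can_join && t->load_floor)
        return SCHEME2_REALIGN;
    if (t->loadu)
        return SCHEME1_LOADU;
    return CANNOT_VECTORIZE;
}
```

The constant and unknown cases differ only in which realign optab they require, so the sketch folds their shared fallback chain into one tail.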

(The design above implies that the vectorizer is only aware of whether it
can software-pipeline the loads, or not. If it can't software-pipeline, it
would generate a "LOADU" expression, and leave it to the RTL expander to
generate the required sequence. I was wondering if it would make sense to
also have a scheme that breaks the "LOADU" into "LOAD,LOAD,JOIN" when
software pipelining cannot take place, but I don't see a benefit in this).


thanks

dorit

