[RFC] Vectorization of indexed elements

Vidya Praveen vidyapraveen@arm.com
Mon Sep 9 17:25:00 GMT 2013


Hello,

This post details some thoughts on an enhancement to the vectorizer that 
could take advantage of the SIMD instructions that allows indexed element
as an operand thus reducing the need for duplication and possibly improve
reuse of previously loaded data.

Appreciate your opinion on this. 

--- 

A phrase like this:

 for(i=0;i<4;i++)
   a[i] = b[i] <op> c[2];

is usually vectorized as:

  va:V4SI = a[0:3]
  vb:V4SI = b[0:3]
  t = c[2]
  vc:V4SI = { t, t, t, t } // typically expanded as vec_duplicate at vec_init
  ...
  va:V4SI = vb:V4SI <op> vc:V4SI

But this could be simplified further if a target has instructions that support
indexed element as a parameter. For example an instruction like this:

  mul v0.4s, v1.4s, v2.4s[2]

can perform multiplication of each element of v2.4s with the third element of
v2.4s (specified as v2.4s[2]) and store the results in the corresponding 
elements of v0.4s. 

For this to happen, vectorizer needs to understand this idiom and treat the 
operand c[2] specially (and by taking in to consideration if the machine 
supports indexed element as an operand for <op> through a target hook or macro)
and consider this as vectorizable statement without having to duplicate the 
elements explicitly. 

There are fews ways this could be represented at gimple:

  ...
  va:V4SI = vb:V4SI <op> VEC_DUPLICATE_EXPR (VEC_SELECT_EXPR (vc:V4SI 2))
  ...

or by allowing a vectorizer treat an indexed element as a valid operand in a 
vectorizable statement:

  ...
  va:V4SI = vb:V4SI <op> VEC_SELECT_EXPR (vc:V4SI 2)
  ...

For the sake of explanation, the above two representations assumes that 
c[0:3] is loaded in vc for some other use and reused here. But when c[2] is the
only use of 'c' then it may be safer to just load one element and use it like
this:

  vc:V4SI[0] = c[2]
  va:V4SI = vb:V4SI <op> VEC_SELECT_EXPR (vc:V4SI 0)

This could also mean that expressions involving scalar could be treated 
similarly. For example,

  for(i=0;i<4;i++)
    a[i] = b[i] <op> c

could be vectorized as:

  vc:V4SI[0] = c
  va:V4SI = vb:V4SI <op> VEC_SELECT_EXPR (vc:V4SI 0)
  
Such a change would also require new standard pattern names to be defined for
each <op>.

Alternatively, having something like this:

  ...
  vt:V4SI = VEC_DUPLICATE_EXPR (VEC_SELECT_EXPR (vc:V4SI 2))
  va:V4SI = vb:V4SI <op> vt:V4SI
  ...

would remove the need to introduce several new standard pattern names but have
just one to represent vec_duplicate(vec_select()) but ofcourse this will expect
the target to have combiner patterns.

This enhancement could possibly help further optimizing larger scenarios such 
as linear systems.

Regards
VP





More information about the Gcc mailing list