rfc and [autovect patch] supporting reduction patterns

Thu Apr 28 17:30:00 GMT 2005

Diego Novillo <dnovillo@redhat.com> wrote on 20/04/2005 22:25:08:
> On Tue, Apr 05, 2005 at 05:37:46PM +0300, Dorit Naishlos wrote:
>
> > - option 1.2: the type of the reduction variable is always X (some default
> > predefined by each target). e.g., always sum into 32bit accumulators (if
> > the target defines X to be 32). This may not be suitable for targets that
> > have multiple accumulation sizes, however, one could often support the
> > smaller-sized accumulations by truncating the final result produced by
> > wider-sized accumulations, so this could potentially suffice to cover all
> > reduction forms a target supports. If not, then we could resort to target
> > specific builtins for the cases we can't express with these optabs.
> >
> How often do you think we will have code that reduces into
> such different type sizes?

The question is not whether we have code that has different
reduction-type-sizes. I'm sure that across different applications we'll
find different reduction-type-sizes (for example, in communication apps I
expect we'll have summation/dot-product of shorts into int (i.e.,
reduction-type-size=32bit), in video apps we'll probably have
summation/dot-product of chars into shorts (i.e.,
reduction-type-size=16bit)). The question is what to do if we have code
whose reduction-type-size differs from the
target-predefined-reduction-type-size (if we use option 1.2).

Let me try to clarify what I mean:

Say we introduce widen_sum_optab to represent an operation that sums-up
elements of type x into a variable of some wider size y. I'm trying to
define exactly what should be the size of y, or in other words what is the
mode of the result generated by widen_sum_optab for mode 'mode' (what does
widen_sum_optab->handlers[mode].insn_code generate).

Option 1.1: widen_sum_optab->handlers[mode].insn_code represents an
operation that sums up elements of size 'mode' into a result of size
'double_mode'. i.e., we can directly express the following (va is the
reduction variable):
- sum chars into a short variable:
  va_v8hi = WIDEN_SUM <vb_v16qi, va_v8hi>,
  which is expanded using widen_sum_optab->handlers[v16qi].insn_code
  (see example below)
- sum shorts into an int variable:
  va_v4si = WIDEN_SUM <vb_v8hi, va_v4si>,
  which is expanded using widen_sum_optab->handlers[v8hi].insn_code
- sum ints into a long long variable:
  va_v2di = WIDEN_SUM <vb_v4si, va_v2di>,
  which is expanded using widen_sum_optab->handlers[v4si].insn_code

So, for example, the following loop:

short s = 0;
char a, *p;
for (i=0; i<n; i++){
  a = *p++;
  s += a;
}

if widen_sum_optab->handlers[v16qi].insn_code != CODE_FOR_nothing,
it can be directly vectorized into:

v8hi vs = {0,0,0,0,0,0,0,0};
v16qi va, *vp;
short s;
for (i=0; i<n/16; i++){
  va = *vp++;
  vs = WIDEN_SUM <va, vs>;
}
s = extract_field < reduc_sum <vs> >;
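
To make the option 1.1 loop above concrete, here is a rough scalar model of it
in plain C. This is only a sketch: the helper names (widen_sum_v16qi,
reduc_sum_v8hi, sum_chars_option_1_1) are made up for illustration, and the
choice that each short lane absorbs two adjacent char lanes is an assumption,
since the actual lane mapping would be up to the target.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define VB_LANES 16  /* chars per v16qi vector */
#define VS_LANES 8   /* shorts per v8hi vector */

/* Hypothetical scalar model of WIDEN_SUM <v16qi data, v8hi acc>.
   Assumption: accumulator lane j absorbs data lanes 2j and 2j+1.  */
static void widen_sum_v16qi(const int8_t data[VB_LANES], int16_t acc[VS_LANES])
{
    for (int j = 0; j < VS_LANES; j++)
        acc[j] = (int16_t)(acc[j] + data[2 * j] + data[2 * j + 1]);
}

/* Model of extract_field < reduc_sum <vs> >: fold all lanes into one short.  */
static int16_t reduc_sum_v8hi(const int16_t acc[VS_LANES])
{
    int16_t s = 0;
    for (int j = 0; j < VS_LANES; j++)
        s = (int16_t)(s + acc[j]);
    return s;
}

/* Sum n chars the way the vectorized loop would; n must be a multiple
   of 16 because this sketch has no epilogue for leftover elements.  */
int16_t sum_chars_option_1_1(const int8_t *p, size_t n)
{
    int16_t vs[VS_LANES] = {0};

    assert(n % VB_LANES == 0);
    for (size_t i = 0; i < n / VB_LANES; i++)
        widen_sum_v16qi(p + i * VB_LANES, vs);
    return reduc_sum_v8hi(vs);
}
```

Calling sum_chars_option_1_1 on a buffer holding the values 1..32 goes through
two "vector" iterations and should give the same result as the scalar loop.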

Option 1.2: A target defines that the size of widening-reduction operations
that it supports is 32bit - i.e., we introduce a "WIDEN_REDUC_BIT_SIZE"
parameter in defaults.h and the target sets it to 32. This means that
widen_sum_optab->handlers[mode].insn_code represents an operation that sums
up elements of size 'mode' into a result of size 'WIDEN_REDUC_BIT_SIZE',
regardless of what 'mode' is. So, we can express the following (again, va is
the reduction variable):
- sum chars into an int variable:
  va_v4si = WIDEN_SUM <vb_v16qi, va_v4si>,
  which is expanded using widen_sum_optab->handlers[v16qi].insn_code
- sum shorts into an int variable:
  va_v4si = WIDEN_SUM <vb_v8hi, va_v4si>,
  which is expanded using widen_sum_optab->handlers[v8hi].insn_code
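
The difference between the two options comes down to how the result element
size is derived from the input element size. A toy model of that rule, as a
sketch only: the function names are hypothetical, and VECTOR_BITS = 128 is an
assumption that happens to match the v16qi/v8hi/v4si/v2di modes used in the
examples.

```c
#include <assert.h>

#define VECTOR_BITS 128          /* assumed vector register width */
#define WIDEN_REDUC_BIT_SIZE 32  /* what the target would set under option 1.2 */

/* Option 1.1: the result element is always twice as wide as the input.  */
unsigned result_bits_option_1_1(unsigned elem_bits)
{
    return 2 * elem_bits;
}

/* Option 1.2: the result element width is fixed by the target,
   regardless of the input element width.  */
unsigned result_bits_option_1_2(unsigned elem_bits)
{
    (void)elem_bits;
    return WIDEN_REDUC_BIT_SIZE;
}

/* Number of accumulator lanes in one vector for a given element width.  */
unsigned result_lanes(unsigned result_bits)
{
    assert(result_bits != 0 && VECTOR_BITS % result_bits == 0);
    return VECTOR_BITS / result_bits;
}
```

For 8-bit inputs this gives 16-bit results in 8 lanes (v8hi) under option 1.1
and 32-bit results in 4 lanes (v4si) under option 1.2; for 16-bit inputs both
options agree on v4si.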

So, back to the example loop:

short s = 0;
char a, *p;
for (i=0; i<n; i++){
  a = *p++;
  s += a;
}

now, if widen_sum_optab->handlers[v16qi].insn_code != CODE_FOR_nothing,
we know that we can generate summation of chars into ints, so we'll do that
and then truncate the result to short:

v4si vs = {0,0,0,0};
v16qi va, *vp;
int tmp;
short s;
for (i=0; i<n/16; i++){
  va = *vp++;
  vs = WIDEN_SUM <va, vs>;
}
tmp = extract_field < reduc_sum <vs> >;
s = (short) tmp;
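
The reason the final truncation is safe for an ordinary (wrapping,
non-saturating) sum can be checked with a small scalar experiment. This is a
sketch under assumptions: the function names are invented, and it uses
unsigned types so that wraparound is well defined in C, whereas the loop
above uses signed short/char.

```c
#include <stddef.h>
#include <stdint.h>

/* Direct narrow accumulation: the result wraps modulo 2^16.  */
uint16_t sum_into_16(const uint8_t *p, size_t n)
{
    uint16_t s = 0;
    for (size_t i = 0; i < n; i++)
        s = (uint16_t)(s + p[i]);
    return s;
}

/* Option 1.2 style: accumulate into 32 bits, truncate once at the end
   (the analogue of "s = (short) tmp;").  */
uint16_t sum_into_32_then_truncate(const uint8_t *p, size_t n)
{
    uint32_t tmp = 0;
    for (size_t i = 0; i < n; i++)
        tmp += p[i];
    return (uint16_t)tmp;
}
```

Both functions return the same value for any input, because reducing modulo
2^16 at every step or only once at the end gives the same low 16 bits. That
would not hold for a saturating accumulation.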

Maybe this - http://gcc.gnu.org/ml/gcc-patches/2005-04/msg02226.html can
help clarify a bit too.

Is this making any sense?

thanks,

dorit