This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: rfc and [autovect patch] supporting reduction patterns
- From: Dorit Naishlos <DORIT at il dot ibm dot com>
- To: Richard Henderson <rth at redhat dot com>
- Cc: gcc-patches at gcc dot gnu dot org
- Date: Thu, 21 Apr 2005 01:39:55 +0300
- Subject: Re: rfc and [autovect patch] supporting reduction patterns
> On Tue, Apr 05, 2005 at 05:37:46PM +0300, Dorit Naishlos wrote:
> > I think option 1 (generic) is suitable here because the idioms in question
> > are general and pretty common, and this way we can avoid code duplication
> > between different targets.
>
> What idioms, precisely? Because I don't see anything of the kind for
> any of the Intel architectures. Before I accept the assertion that
> they are "pretty common" I would like to know that they actually occur
> on more than just Altivec.
Sure - the idioms I'm talking about are average, dot-product, and sum of
absolute differences - all of which are supported by MMX/SSE (pavgb/pavgw,
pmaddwd, psadbw) and generated by the icc compiler (also using idiom
recognition).
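
For concreteness, the three idioms correspond to scalar loops of roughly the
following shape. This is only a sketch: the function names and the choice of
element types are illustrative assumptions, not something taken from the
patch.

```c
#include <stdint.h>
#include <stdlib.h>
#include <stddef.h>

/* Average: rounded mean of two unsigned bytes, i.e. the per-element
   operation that pavgb performs.  (Hypothetical helper name.) */
void avg_u8(uint8_t *dst, const uint8_t *a, const uint8_t *b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = (uint8_t)((a[i] + b[i] + 1) >> 1);
}

/* Dot product: 16-bit inputs, products accumulated in a wider type. */
int32_t dot_s16(const int16_t *a, const int16_t *b, size_t n)
{
    int32_t sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += (int32_t)a[i] * b[i];
    return sum;
}

/* Sum of absolute differences: 8-bit inputs, wider accumulator. */
uint32_t sad_u8(const uint8_t *a, const uint8_t *b, size_t n)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += (uint32_t)abs(a[i] - b[i]);
    return sum;
}
```

In the last two loops the accumulator is wider than the elements being
combined, which is what makes them widening reductions.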
>
> > If we go with option 1 (generic), there are still a couple of alternatives
> > for how to define the semantics of the new optabs:
> >
> > - option 1.1: the type-size of the reduction variable (which is also the
> > result of the computation) is exactly double the type-size of the reduction
> > arguments. i.e., we can express summation of QI into HI, but we can't
> > express summation of QI into SI. We can solve this by either introducing an
> > additional tree-code&optab for "wider_widen_sum" (for which the type-size
> > of the reduction variable is 4 times the type-size of the other arguments),
> > or, leave the wider reduction forms for target specific builtins.
> >
> > - option 1.2: the type of the reduction variable is always X (some default
> > predefined by each target). e.g., always sum into 32bit accumulators (if
> > the target defines X to be 32). This may not be suitable for targets that
> > have multiple accumulation sizes, however, one could often support the
> > smaller-sized accumulations by truncating the final result produced by
> > wider-sized accumulations, so this could potentially suffice to cover all
> > reduction forms a target supports. If not, then we could resort to target
> > specific builtins for the cases we can't express with these optabs.
>
> I am aware of some targets that have, e.g. 40 bit accumulators. But
> these are scalars, not vectors, and so don't really apply in your case.
>
I'm talking about operations like Intel's psadbw, which operates on 8bit
elements producing a 16bit result, or pmaddwd, which operates on 16bit
elements producing 32bit results. These are examples of what I was
referring to as "widening reduction" forms. In these cases the size of the
result (which I was also referring to as the "accumulator" or the
"reduction variable") is exactly double the size of the reduction arguments
(e.g. 8bit to 16bit), but this is not always the case. You may have support
for a widening reduction that operates on 8bit elements and produces 32bit
results, e.g. Altivec's vmsumubm, which sums products of 8bit elements into
32bit results. It seems that option 1.1 above is more suitable for MMX/SSE,
whereas option 1.2 is more suitable for Altivec: Altivec seems to always
widen/sum into 32bit (whether the data is 8bit or 16bit), while SSE/MMX
seems to usually widen into double the size of the data, e.g. psadbw (8bit
to 16bit) and pmaddwd (16bit to 32bit).

This is, by the way, what I meant by "targets that have multiple
accumulation sizes": targets, like Intel, in which different summation-like
operations widen into different type-sizes. I realize now how unclear this
phrasing was.
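
To make the difference between the two options concrete, here is a scalar
model of a widening sum under each of them, plus the truncation trick
mentioned under option 1.2. It is a sketch only: the function names are
made up for illustration, the 32bit width stands in for the target-defined
X, and it assumes unsigned, wrapping (modulo) accumulation. With saturating
accumulation, truncating the wider result would not in general give the
same answer.

```c
#include <stdint.h>
#include <stddef.h>

/* Option 1.1 shape: the result is exactly twice as wide as the inputs
   (8-bit elements summed into a 16-bit accumulator, wrapping mod 2^16). */
uint16_t widen_sum_u8_to_u16(const uint8_t *x, size_t n)
{
    uint16_t acc = 0;
    for (size_t i = 0; i < n; i++)
        acc = (uint16_t)(acc + x[i]);
    return acc;
}

/* Option 1.2 shape: always accumulate into one target-chosen width
   (assumed here to be 32 bits). */
uint32_t widen_sum_u8_to_u32(const uint8_t *x, size_t n)
{
    uint32_t acc = 0;
    for (size_t i = 0; i < n; i++)
        acc += x[i];
    return acc;
}

/* The narrower accumulation recovered by truncating the wider result.
   Valid because unsigned sums wrap modulo a power of two. */
uint16_t widen_sum_u8_to_u16_via_u32(const uint8_t *x, size_t n)
{
    return (uint16_t)widen_sum_u8_to_u32(x, n);
}
```

For example, summing 300 elements that are all 255 gives 76500 in the 32bit
accumulator and 76500 mod 65536 = 10964 in the 16bit one, and truncating the
former yields the latter.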
thanks
dorit
>
> r~