This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: [PATCH] Introducing SAD (Sum of Absolute Differences) operation to GCC vectorizer.


On Fri, Nov 01, 2013 at 04:48:53PM +0000, Cong Hou wrote:
> diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
> index 2a5a2e1..8f5d39a 100644
> --- a/gcc/doc/md.texi
> +++ b/gcc/doc/md.texi
> @@ -4705,6 +4705,16 @@ wider mode, is computed and added to operand 3.
> Operand 3 is of a mode equal or
>  wider than the mode of the product. The result is placed in operand 0, which
>  is of the same mode as operand 3.
> 
> +@cindex @code{ssad@var{m}} instruction pattern
> +@item @samp{ssad@var{m}}
> +@cindex @code{usad@var{m}} instruction pattern
> +@item @samp{usad@var{m}}
> +Compute the sum of absolute differences of two signed/unsigned elements.
> +Operand 1 and operand 2 are of the same mode. Their absolute difference, which
> +is of a wider mode, is computed and added to operand 3. Operand 3 is of a mode
> +equal or wider than the mode of the absolute difference. The result is placed
> +in operand 0, which is of the same mode as operand 3.
> +
>  @cindex @code{ssum_widen@var{m3}} instruction pattern
>  @item @samp{ssum_widen@var{m3}}
>  @cindex @code{usum_widen@var{m3}} instruction pattern
> diff --git a/gcc/expr.c b/gcc/expr.c
> index 4975a64..1db8a49 100644

I'm not sure I follow, and if I do - I don't think it matches what
you have implemented for i386.

From your text description I would guess the series of operations to be:

  v1 = widen (operands[1])
  v2 = widen (operands[2])
  v3 = abs (v1 - v2)
  operands[0] = v3 + operands[3]

But if I understand the behaviour of PSADBW correctly, what you have
actually implemented is:

  v1 = widen (operands[1])
  v2 = widen (operands[2])
  v3 = abs (v1 - v2)
  v4 = reduce_plus (v3)
  operands[0] = v4 + operands[3]
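
The difference between the two readings above can be sketched as a scalar
model in C. This is only an illustration: the lane count `LANES` and the
function names are hypothetical, the accumulator widths are chosen for
convenience, and the second function folds everything into a single scalar,
whereas PSADBW itself produces one sum per 64-bit group.

```c
#include <stdint.h>
#include <stdlib.h>

#define LANES 8 /* hypothetical vector length, for illustration only */

/* First reading: widen, take the absolute difference, and accumulate
   lane by lane.  The result still has one element per lane.  */
static void sad_elementwise(uint16_t out[LANES], const uint8_t a[LANES],
                            const uint8_t b[LANES], const uint16_t acc[LANES])
{
    for (int i = 0; i < LANES; i++) {
        int diff = (int)a[i] - (int)b[i];      /* widen before subtracting */
        out[i] = (uint16_t)(acc[i] + abs(diff));
    }
}

/* Second reading: the absolute differences are also summed across lanes
   (the reduce_plus step) before being added to the accumulator.  */
static uint32_t sad_reduced(const uint8_t a[LANES], const uint8_t b[LANES],
                            uint32_t acc)
{
    uint32_t sum = 0;
    for (int i = 0; i < LANES; i++) {
        int diff = (int)a[i] - (int)b[i];
        sum += (uint32_t)abs(diff);            /* cross-lane reduction */
    }
    return acc + sum;
}
```

The first function keeps a vector-shaped accumulator; the second collapses it
on every call.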

To my mind, synthesizing the reduce_plus step will be wasteful for targets
that do not get it for free with their Absolute Difference step. Imagine a
simple loop where we have synthesized the reduce_plus: we compute a reduced
partial sum on each loop iteration, though it would be better to leave the
reduce_plus step until after the loop. "REDUC_PLUS_EXPR" would be the
appropriate Tree code for this.

I would prefer to see this Tree code not imply the reduce_plus.
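
A rough scalar sketch of the loop argument, assuming a hypothetical 8-lane
vector (`LANES`) and hypothetical function names: the first function keeps
per-lane partial sums inside the loop and reduces once afterwards, the second
reduces on every iteration. Both return the same total; they differ only in
where the cross-lane reduction happens.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define LANES 8 /* hypothetical vector length, for illustration only */

/* Accumulate per-lane partial sums in the loop; reduce once at the end.  */
static uint32_t sad_deferred_reduction(const uint8_t *a, const uint8_t *b,
                                       size_t n)
{
    uint32_t vacc[LANES] = {0};
    size_t i = 0;

    for (; i + LANES <= n; i += LANES)
        for (int l = 0; l < LANES; l++)
            vacc[l] += (uint32_t)abs((int)a[i + l] - (int)b[i + l]);

    uint32_t total = 0;
    for (int l = 0; l < LANES; l++)   /* the single reduce_plus step */
        total += vacc[l];

    for (; i < n; i++)                /* scalar tail */
        total += (uint32_t)abs((int)a[i] - (int)b[i]);
    return total;
}

/* Reduce across lanes inside every iteration, as an implied reduce_plus
   would require.  */
static uint32_t sad_reduce_each_iteration(const uint8_t *a, const uint8_t *b,
                                          size_t n)
{
    uint32_t total = 0;
    size_t i = 0;

    for (; i + LANES <= n; i += LANES) {
        uint32_t partial = 0;
        for (int l = 0; l < LANES; l++)
            partial += (uint32_t)abs((int)a[i + l] - (int)b[i + l]);
        total += partial;             /* reduction repeated per iteration */
    }

    for (; i < n; i++)                /* scalar tail */
        total += (uint32_t)abs((int)a[i] - (int)b[i]);
    return total;
}
```

On a target where the cross-lane sum costs extra instructions, the first
shape pays that cost once rather than once per iteration.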

Thanks,
James

