This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
Re: [PATCH] Introducing SAD (Sum of Absolute Differences) operation to GCC vectorizer.
- From: James Greenhalgh <james dot greenhalgh at arm dot com>
- To: Cong Hou <congh at google dot com>
- Cc: Uros Bizjak <ubizjak at gmail dot com>, "ramana dot gcc at googlemail dot com" <ramana dot gcc at googlemail dot com>, Richard Biener <rguenther at suse dot de>, "gcc-patches at gcc dot gnu dot org" <gcc-patches at gcc dot gnu dot org>
- Date: Mon, 4 Nov 2013 10:06:17 +0000
On Fri, Nov 01, 2013 at 04:48:53PM +0000, Cong Hou wrote:
> diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
> index 2a5a2e1..8f5d39a 100644
> --- a/gcc/doc/md.texi
> +++ b/gcc/doc/md.texi
> @@ -4705,6 +4705,16 @@ wider mode, is computed and added to operand 3.
> Operand 3 is of a mode equal or
> wider than the mode of the product. The result is placed in operand 0, which
> is of the same mode as operand 3.
>
> +@cindex @code{ssad@var{m}} instruction pattern
> +@item @samp{ssad@var{m}}
> +@cindex @code{usad@var{m}} instruction pattern
> +@item @samp{usad@var{m}}
> +Compute the sum of absolute differences of two signed/unsigned elements.
> +Operand 1 and operand 2 are of the same mode. Their absolute difference, which
> +is of a wider mode, is computed and added to operand 3. Operand 3 is of a mode
> +equal or wider than the mode of the absolute difference. The result is placed
> +in operand 0, which is of the same mode as operand 3.
> +
> @cindex @code{ssum_widen@var{m3}} instruction pattern
> @item @samp{ssum_widen@var{m3}}
> @cindex @code{usum_widen@var{m3}} instruction pattern
> diff --git a/gcc/expr.c b/gcc/expr.c
> index 4975a64..1db8a49 100644
I'm not sure I follow, and if I do, I don't think it matches what
you have implemented for i386.
From your text description, I would guess the series of operations to be:
  v1 = widen (operands[1])
  v2 = widen (operands[2])
  v3 = abs (v1 - v2)
  operands[0] = v3 + operands[3]
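As a concrete illustration of that first reading, the scalar C model below treats each vector as an array of lanes. It is a sketch under assumptions not stated in the patch: unsigned byte inputs, 16-bit accumulator lanes, and the hypothetical name usad_elementwise. What it shows is that no lane of the result depends on any other lane.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Element-wise reading of the proposed usad description (hypothetical
   model): unsigned byte inputs, 16-bit accumulator lanes.  There is no
   horizontal sum: lane i of the result depends only on lane i of each
   operand.  */
static void usad_elementwise(uint16_t *out, const uint8_t *op1,
                             const uint8_t *op2, const uint16_t *op3,
                             size_t nlanes)
{
  for (size_t i = 0; i < nlanes; i++) {
    int v1 = op1[i];           /* v1 = widen (operands[1]) */
    int v2 = op2[i];           /* v2 = widen (operands[2]) */
    int v3 = abs(v1 - v2);     /* v3 = abs (v1 - v2), at most 255 */
    /* This sketch asserts that no lane overflows rather than
       modelling wrap-around.  */
    assert(op3[i] <= UINT16_MAX - v3);
    out[i] = (uint16_t)(op3[i] + v3);  /* operands[0] = v3 + operands[3] */
  }
}
```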
But if I understand the behaviour of PSADBW correctly, what you have
actually implemented is:
  v1 = widen (operands[1])
  v2 = widen (operands[2])
  v3 = abs (v1 - v2)
  v4 = reduce_plus (v3)
  operands[0] = v4 + operands[3]
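For comparison, here is a scalar C sketch of the second sequence, modelling PSADBW, which sums the absolute differences of each group of eight unsigned bytes into a 64-bit lane. The names psadbw_model and usad_with_reduce are hypothetical, and giving operand 3 one accumulator lane per 8-byte group is an assumption; the i386 expander may lay out operand 3 differently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical scalar model of PSADBW on n bytes: each group of 8
   unsigned bytes is widened, the absolute differences are taken, and
   the 8 results are summed (the reduce_plus step) into one 64-bit lane.  */
static void psadbw_model(uint64_t *out, const uint8_t *a, const uint8_t *b,
                         size_t n)
{
  assert(n % 8 == 0);  /* PSADBW works on whole 8-byte groups.  */
  for (size_t g = 0; g < n / 8; g++) {
    uint64_t sum = 0;
    for (size_t j = 0; j < 8; j++) {
      int d = (int)a[8 * g + j] - (int)b[8 * g + j];
      sum += (uint64_t)abs(d);     /* v3 = abs (v1 - v2) */
    }
    out[g] = sum;                  /* v4 = reduce_plus (v3), per group */
  }
}

/* operands[0] = v4 + operands[3], assuming (hypothetically) that acc
   has one lane per 8-byte group.  */
static void usad_with_reduce(uint64_t *out, const uint8_t *a,
                             const uint8_t *b, const uint64_t *acc,
                             size_t n)
{
  psadbw_model(out, a, b, n);
  for (size_t g = 0; g < n / 8; g++)
    out[g] += acc[g];
}
```

Here each output lane mixes eight input lanes, which is exactly the reduce_plus that the element-wise reading does not have.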
To my mind, synthesizing the reduce_plus step will be wasteful for targets
that do not get it for free with their absolute-difference instruction.
Imagine a simple loop in which we have synthesized the reduce_plus: we
would compute a partial sum on every loop iteration, when we would be
better off leaving the reduce_plus step until after the loop.
REDUC_PLUS_EXPR would be the appropriate tree code for that final step.
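The loop argument can be illustrated with two scalar C models of a SAD reduction over n bytes. They assume a hypothetical vectorization factor VF of 16 byte lanes and that n is a multiple of VF (no epilogue); the function names are illustrative, and these are sketches of the idea rather than of GCC's actual vectorizer output. Both return the same total; the difference is where the horizontal reduce_plus happens.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define VF 16  /* hypothetical vectorization factor: 16 uint8_t lanes */

/* reduce_plus synthesized inside the SAD operation: every iteration
   performs a horizontal sum across all VF lanes before adding to the
   scalar accumulator.  */
static uint32_t sad_reduce_every_iter(const uint8_t *a, const uint8_t *b,
                                      size_t n)
{
  assert(n % VF == 0);
  uint32_t sum = 0;
  for (size_t i = 0; i < n; i += VF) {
    uint32_t partial = 0;
    for (size_t lane = 0; lane < VF; lane++)  /* reduce_plus each iteration */
      partial += (uint32_t)abs((int)a[i + lane] - (int)b[i + lane]);
    sum += partial;
  }
  return sum;
}

/* Reduction deferred: the loop only does lane-wise absolute difference
   and accumulate, and a single reduce_plus (what REDUC_PLUS_EXPR would
   express) runs once after the loop.  */
static uint32_t sad_reduce_after_loop(const uint8_t *a, const uint8_t *b,
                                      size_t n)
{
  assert(n % VF == 0);
  uint32_t acc[VF] = {0};  /* vector of per-lane partial sums */
  for (size_t i = 0; i < n; i += VF)
    for (size_t lane = 0; lane < VF; lane++)  /* lane-wise, no reduction */
      acc[lane] += (uint32_t)abs((int)a[i + lane] - (int)b[i + lane]);

  uint32_t sum = 0;
  for (size_t lane = 0; lane < VF; lane++)  /* one reduce_plus, after loop */
    sum += acc[lane];
  return sum;
}
```

In vector terms, the deferred form costs one lane-wise add per iteration plus a single VF-lane reduction at the end, whereas the first form pays for a horizontal reduction on every iteration; that per-iteration cost is what building the reduce_plus into the tree code would impose on targets whose absolute-difference instruction does not provide it.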
I would prefer that the new SAD tree code not imply the reduce_plus.
Thanks,
James