This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: [PATCH] Introducing SAD (Sum of Absolute Differences) operation to GCC vectorizer.


On Mon, Nov 4, 2013 at 2:06 AM, James Greenhalgh
<james.greenhalgh@arm.com> wrote:
> On Fri, Nov 01, 2013 at 04:48:53PM +0000, Cong Hou wrote:
>> diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
>> index 2a5a2e1..8f5d39a 100644
>> --- a/gcc/doc/md.texi
>> +++ b/gcc/doc/md.texi
>> @@ -4705,6 +4705,16 @@ wider mode, is computed and added to operand 3. Operand 3 is of a mode equal or
>>  wider than the mode of the product. The result is placed in operand 0, which
>>  is of the same mode as operand 3.
>>
>> +@cindex @code{ssad@var{m}} instruction pattern
>> +@item @samp{ssad@var{m}}
>> +@cindex @code{usad@var{m}} instruction pattern
>> +@item @samp{usad@var{m}}
>> +Compute the sum of absolute differences of two signed/unsigned elements.
>> +Operand 1 and operand 2 are of the same mode. Their absolute difference, which
>> +is of a wider mode, is computed and added to operand 3. Operand 3 is of a mode
>> +equal or wider than the mode of the absolute difference. The result is placed
>> +in operand 0, which is of the same mode as operand 3.
>> +
>>  @cindex @code{ssum_widen@var{m3}} instruction pattern
>>  @item @samp{ssum_widen@var{m3}}
>>  @cindex @code{usum_widen@var{m3}} instruction pattern
>> diff --git a/gcc/expr.c b/gcc/expr.c
>> index 4975a64..1db8a49 100644
>
> I'm not sure I follow, and if I do - I don't think it matches what
> you have implemented for i386.
>
> From your text description I would guess the series of operations to be:
>
>   v1 = widen (operands[1])
>   v2 = widen (operands[2])
>   v3 = abs (v1 - v2)
>   operands[0] = v3 + operands[3]
>
> But if I understand the behaviour of PSADBW correctly, what you have
> actually implemented is:
>
>   v1 = widen (operands[1])
>   v2 = widen (operands[2])
>   v3 = abs (v1 - v2)
>   v4 = reduce_plus (v3)
>   operands[0] = v4 + operands[3]
>
> To my mind, synthesizing the reduce_plus step will be wasteful for targets
> that do not get this for free with their Absolute Difference step. Imagine a
> simple loop where we have synthesized the reduce_plus: we compute partial
> sums each loop iteration, though we would be better off leaving the
> reduce_plus step until after the loop. "REDUC_PLUS_EXPR" would be the
> appropriate Tree code for this.
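
The two readings quoted above can be modelled in scalar C to make the
difference concrete. This is only an illustrative sketch, not code from the
patch: the function names, the `lanes` parameter, and the choice to deposit
the reduced sum in lane 0 of the accumulator are all assumptions made for
the example.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* First reading: each widened absolute difference is added to the
   matching lane of the accumulator (no reduce_plus inside the op). */
void sad_elementwise(int32_t *acc, const uint8_t *a, const uint8_t *b,
                     int lanes)
{
  for (int i = 0; i < lanes; i++)
    acc[i] += abs((int) a[i] - (int) b[i]);
}

/* Second reading: the absolute differences are summed first
   (reduce_plus), then the single sum is added to the accumulator.
   Placing it in lane 0 is an assumption of this sketch. */
void sad_reduced(int32_t *acc, const uint8_t *a, const uint8_t *b,
                 int lanes)
{
  assert(lanes > 0);  /* acc[0] must exist */
  int32_t sum = 0;
  for (int i = 0; i < lanes; i++)
    sum += abs((int) a[i] - (int) b[i]);
  acc[0] += sum;
}

/* Horizontal sum of an accumulator, as would run once after the loop. */
int32_t reduce_plus(const int32_t *v, int lanes)
{
  int32_t sum = 0;
  for (int i = 0; i < lanes; i++)
    sum += v[i];
  return sum;
}
```

Both models leave the same total once `reduce_plus` is applied to the
accumulator; they differ only in how much summing happens per iteration.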

What do you mean by "synthesizing" here? For each pattern, the only
synthesized operation is the one returned from the pattern recognizer,
which in this case is USAD_EXPR. Recognizing the reduction sum is
necessary because reductions need a corresponding prolog and epilog,
and that recognition is already done before pattern recognition. Note
that a reduction is not a pattern but a type of vector definition. A
vectorization pattern can still be a reduction operation as long as
the STMT_VINFO_RELATED_STMT of the pattern is a reduction operation.
For reference, see the other two reduction patterns: widen_sum_pattern
and dot_prod_pattern.
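
For readers following along, here is a rough scalar model of the kind of
reduction loop under discussion, and of the prolog / loop body / epilog
split a vectorized reduction uses. It is a sketch under stated assumptions,
not the vectorizer's output: `LANES`, both function names, and the
requirement that `n` be a multiple of `LANES` are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define LANES 4  /* hypothetical vector width, chosen for illustration */

/* The scalar sum-of-absolute-differences reduction loop. */
int sad_scalar(const uint8_t *a, const uint8_t *b, int n)
{
  int sum = 0;
  for (int i = 0; i < n; i++)
    sum += abs(a[i] - b[i]);  /* uint8_t operands promote to int */
  return sum;
}

/* Scalar model of a vectorized reduction over LANES partial sums. */
int sad_vector_model(const uint8_t *a, const uint8_t *b, int n)
{
  /* This sketch has no remainder loop, so n must divide evenly. */
  assert(n >= 0 && n % LANES == 0);

  /* Prolog: start every lane of the vector accumulator at zero. */
  int32_t acc[LANES] = {0};

  /* Loop body: each lane accumulates its own partial sum. */
  for (int i = 0; i < n; i += LANES)
    for (int l = 0; l < LANES; l++)
      acc[l] += abs(a[i + l] - b[i + l]);

  /* Epilog: reduce the partial sums to one scalar after the loop. */
  int sum = 0;
  for (int l = 0; l < LANES; l++)
    sum += acc[l];
  return sum;
}
```

For any input whose length is a multiple of `LANES`, the two functions
return the same value; only the order of the additions differs.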

Thank you for your comment!


Cong

>
> I would prefer to see this Tree code not imply the reduce_plus.
>
> Thanks,
> James
>