[PATCH] Introducing SAD (Sum of Absolute Differences) operation to GCC vectorizer.

Wed Nov 20 18:59:00 GMT 2013

Ping...

thanks,
Cong

On Fri, Nov 15, 2013 at 9:52 AM, Cong Hou <congh@google.com> wrote:
> Any more comments?
>
>
>
> thanks,
> Cong
>
>
> On Wed, Nov 13, 2013 at 6:06 PM, Cong Hou <congh@google.com> wrote:
>> Ping?
>>
>>
>> thanks,
>> Cong
>>
>>
>> On Mon, Nov 11, 2013 at 11:25 AM, Cong Hou <congh@google.com> wrote:
>>> Hi James
>>>
>>> Sorry for the late reply.
>>>
>>>
>>> On Fri, Nov 8, 2013 at 2:55 AM, James Greenhalgh
>>> <james.greenhalgh@arm.com> wrote:
>>>>> On Tue, Nov 5, 2013 at 9:58 AM, Cong Hou <congh@google.com> wrote:
>>>>> > Thank you for your detailed explanation.
>>>>> >
>>>>> > Once GCC detects a reduction operation, it will automatically
>>>>> > accumulate all elements in the vector after the loop. In the loop the
>>>>> > reduction variable is always a vector whose elements are reductions of
>>>>> > corresponding values from other vectors. Therefore in your case the
>>>>> > only instruction you need to generate is:
>>>>> >
>>>>> >     VABAL   ops[3], ops[1], ops[2]
>>>>> >
>>>>> > It is OK to accumulate the elements into a single element of the
>>>>> > vector inside the loop (if one instruction can do this), but you
>>>>> > have to make sure the other elements in the vector remain zero so
>>>>> > that the final result is correct.
>>>>> >
>>>>> > If you are confused about the documentation, check the one for
>>>>> > udot_prod (just above usad in md.texi), as its behavior is very
>>>>> > similar to that of usad. Actually I copied the text from there and
>>>>> > made some changes. As those two instruction patterns are both for
>>>>> > vectorization, their behavior should not be difficult to explain.
>>>>> >
>>>>> > If you have more questions or think that the documentation is still
>>>>> > improper please let me know.
>>>>
>>>> Hi Cong,
>>>>
>>>> Thanks for your reply.
>>>>
>>>> I've looked at Dorit's original patch adding WIDEN_SUM_EXPR and
>>>> DOT_PROD_EXPR and I see that the same ambiguity exists for
>>>> DOT_PROD_EXPR. Can you please add a note in your tree.def
>>>> that SAD_EXPR, like DOT_PROD_EXPR, can be expanded as either:
>>>>
>>>>   tmp = WIDEN_MINUS_EXPR (arg1, arg2)
>>>>   tmp2 = ABS_EXPR (tmp)
>>>>   arg3 = PLUS_EXPR (tmp2, arg3)
>>>>
>>>> or:
>>>>
>>>>   tmp = WIDEN_MINUS_EXPR (arg1, arg2)
>>>>   tmp2 = ABS_EXPR (tmp)
>>>>   arg3 = WIDEN_SUM_EXPR (tmp2, arg3)
>>>>
>>>> Where WIDEN_MINUS_EXPR is a signed MINUS_EXPR, returning a
>>>> value of the same (widened) type as arg3.
>>>>
>>>
>>>
>>> I have added it, although we currently don't have WIDEN_MINUS_EXPR (I
>>> mentioned it in tree.def).
>>>
>>>
>>>> Also, while looking for the history of DOT_PROD_EXPR I spotted this
>>>> patch:
>>>>
>>>>   [autovect] [patch] detect mult-hi and sad patterns
>>>>   http://gcc.gnu.org/ml/gcc-patches/2005-10/msg01394.html
>>>>
>>>> I wonder what the reason was for that patch to be dropped?
>>>>
>>>
>>> It has been 8 years, and I have no idea why that patch was never
>>> accepted. There is not even a reply in that thread. But I believe it
>>> is very important to recognize the SAD pattern. ARM also provides
>>> instructions for it.
>>>
>>>
>>> Thank you for your comment again!
>>>
>>>
>>> thanks,
>>> Cong
>>>
>>>
>>>
>>>> Thanks,
>>>> James
>>>>
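
For reference, the reduction behavior discussed in this thread can be
sketched in plain C. This is only an illustration, not code from the patch:
the function names sad_scalar and sad_lanes and the LANES width are
hypothetical. The first function is the kind of scalar loop a SAD pattern
recognizer would look for; the second emulates a vectorized loop that keeps
one partial sum per lane and adds the lanes together only after the loop,
which is why a per-lane accumulate instruction is enough inside the loop.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define LANES 4 /* hypothetical vector width, chosen only for illustration */

/* Scalar reference: widen each 8-bit input, subtract, take the absolute
   value, and accumulate into a wider sum. */
uint32_t sad_scalar(const uint8_t *a, const uint8_t *b, size_t n)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += (uint32_t)abs((int)a[i] - (int)b[i]);
    return sum;
}

/* Lane-wise emulation: inside the main loop each lane only accumulates
   its own elements; the lanes are summed once, after the loop. */
uint32_t sad_lanes(const uint8_t *a, const uint8_t *b, size_t n)
{
    uint32_t acc[LANES] = {0}; /* the "vector" reduction variable */
    size_t i = 0;

    for (; i + LANES <= n; i += LANES)
        for (size_t l = 0; l < LANES; l++)
            acc[l] += (uint32_t)abs((int)a[i + l] - (int)b[i + l]);

    /* Final reduction after the loop: add up all lanes. */
    uint32_t sum = 0;
    for (size_t l = 0; l < LANES; l++)
        sum += acc[l];

    /* Scalar tail for elements that do not fill a whole vector. */
    for (; i < n; i++)
        sum += (uint32_t)abs((int)a[i] - (int)b[i]);

    return sum;
}
```

Both functions return the same value for any input. A target that instead
folds several differences into a single lane per iteration would also be
correct, as long as the remaining lanes stay zero, since the final
reduction adds every lane.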