[PATCH] Introducing SAD (Sum of Absolute Differences) operation to GCC vectorizer.

Thu Oct 31 03:18:00 GMT 2013

On Thu, Oct 31, 2013 at 12:29 AM, Cong Hou <congh@google.com> wrote:
> On Tue, Oct 29, 2013 at 4:49 PM, Ramana Radhakrishnan
> <ramana.gcc@googlemail.com> wrote:
>> Cong,
>>
>> Please don't do the following.
>>
>>>+++ b/gcc/testsuite/gcc.dg/vect/vect-reduc-sad.c
>> @@ -0,0 +1,54 @@
>> +/* { dg-require-effective-target sse2 { target { i?86-*-* x86_64-*-* } } } */
>>
>> you are adding a test to gcc.dg/vect - It's a common directory
>> containing tests that need to run on multiple architectures and such
>> tests should be keyed by the feature they enable which can be turned
>> on for ports that have such an instruction.
>>
>> The correct way of doing this is to key this on the feature something
>> like dg-require-effective-target vect_sad_char . And define the
>> equivalent routine in testsuite/lib/target-supports.exp and enable it
>> for sse2 for the x86 port. If in doubt look at
>> check_effective_target_vect_int and a whole family of such functions
>> in testsuite/lib/target-supports.exp
>>
>> This makes life easy for other port maintainers who want to turn on
>> this support. And for bonus points please update the testcase writing
>> wiki page with this information if it isn't already there.
>>
>
> OK, I will likely move the test case to gcc.target/i386 as currently
> only SSE2 provides SAD instruction. But your suggestion also helps!

Sorry, no - I really don't like that approach. If the test remains in
the common directory, keyed off the feature as I suggested, it makes
life easier when turning this on in other ports: adding this pattern
to a port would take this test from UNSUPPORTED to PASS. It also keeps
gcc.dg/vect reasonably up to date with respect to testing the features
of the vectorizer, and in line with the way the tests in gcc.dg/vect
have been written to date.
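
As a rough sketch of what I mean, the head of the test could look like
the fragment below. The keyword vect_sad_char is only the placeholder
name suggested earlier in this thread; it is an assumption that a
matching check_effective_target_vect_sad_char procedure gets added to
testsuite/lib/target-supports.exp (not shown here, and not existing
today). The loop itself is the one from the posted test.

```c
/* { dg-require-effective-target vect_sad_char } */
/* Assumption: vect_sad_char is a hypothetical effective-target keyword.
   A port would enable it once it provides a SAD pattern, instead of the
   test naming sse2 or a target triplet directly.  */

#include <stdlib.h>

#define N 64

unsigned char X[N];
unsigned char Y[N];

/* Sum of absolute differences between two unsigned char arrays; this is
   the loop the vectorizer is expected to recognize as a SAD reduction.  */
__attribute__ ((noinline)) int
foo (int len)
{
  int i;
  int result = 0;

  for (i = 0; i < len; i++)
    result += abs (X[i] - Y[i]);

  return result;
}
```

With that directive, targets without the feature report the test as
UNSUPPORTED rather than failing it.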

I think Neon has an equivalent instruction called vaba but I will have
to check in the morning when I get back to my machine.

regards
Ramana

>
>>>      S6  abs_diff = ABS_EXPR <diff>;
>>>      [S7  abs_diff = (TYPE2) abs_diff;  #optional]
>>>      S8  sum_1 = abs_diff + sum_0;
>>>
>>>    where 'TYPE1' is at least double the size of type 'type', and 'TYPE2' is the
>>>    same size as 'TYPE1' or bigger. This is a special case of a reduction
>>>    computation.
>>>
>>> For SSE2, type is char, and TYPE1 and TYPE2 are int.
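
For readers following the pattern description, here is a scalar sketch
of the reduction with the statement labels (S1 to S8, as listed in the
vect_recog_sad_pattern comment in the patch) mapped onto plain C. It
assumes the SSE2 case described above, i.e. 'type' is unsigned char and
TYPE1 and TYPE2 are both int; the function name sad_reference is made up
for illustration and does not appear in the patch.

```c
#include <stdlib.h>

/* Scalar reference for the SAD reduction.  Assumption: type is unsigned
   char and TYPE1 == TYPE2 == int, so the optional conversion S7 is not
   needed.  sad_reference is a hypothetical name.  */
int
sad_reference (const unsigned char *x, const unsigned char *y, int len)
{
  int sum = 0;                      /* TYPE2 sum = init  */

  for (int i = 0; i < len; i++)
    {
      unsigned char x_t = x[i];     /* S1 */
      unsigned char y_t = y[i];     /* S2 */
      int x_T = (int) x_t;          /* S3: promote to TYPE1 */
      int y_T = (int) y_t;          /* S4: promote to TYPE1 */
      int diff = x_T - y_T;         /* S5 */
      int abs_diff = abs (diff);    /* S6 */
      sum = abs_diff + sum;         /* S8 */
    }

  return sum;
}
```

The promotion in S3 and S4 is what keeps the subtraction from wrapping:
with 8-bit unsigned inputs the difference ranges over -255..255, which
needs a signed type at least twice as wide.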
>>>
>>>
>>> In order to express this new operation, a new expression SAD_EXPR is
>>> introduced in tree.def, and the corresponding entry in optabs is
>>> added. The patch also added the "define_expand" for SSE2 and AVX2
>>> platforms for i386.
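
To illustrate what the SSE2 expander computes, below is a scalar model
in portable C. It is a sketch under stated assumptions, not the
expander itself: psadbw is modelled as producing one sum per 8-byte
half, and the V2DI to V4SI convert_move is assumed to be a plain
reinterpretation of the same 128 bits on little-endian x86, so the two
sums land in 32-bit lanes 0 and 2. The names psadbw_model and
sadv16qi_model are hypothetical.

```c
#include <stdint.h>

/* Scalar model of psadbw on 16 bytes: each 64-bit lane receives the sum
   of absolute differences of its own 8 byte pairs.  */
static void
psadbw_model (uint64_t out[2], const uint8_t a[16], const uint8_t b[16])
{
  for (int lane = 0; lane < 2; lane++)
    {
      uint64_t s = 0;
      for (int i = 0; i < 8; i++)
        {
          int d = (int) a[lane * 8 + i] - (int) b[lane * 8 + i];
          s += (uint64_t) (d < 0 ? -d : d);
        }
      out[lane] = s;
    }
}

/* Model of the accumulate step: acc_out = acc_in + (V4SI view of t1).
   Assumption: little-endian lane layout, so the low half of each 64-bit
   sum is an even 32-bit lane and the (zero) high half is an odd lane.  */
void
sadv16qi_model (uint32_t acc_out[4], const uint8_t a[16],
                const uint8_t b[16], const uint32_t acc_in[4])
{
  uint64_t t1[2];

  psadbw_model (t1, a, b);
  for (int lane = 0; lane < 2; lane++)
    {
      acc_out[2 * lane] = acc_in[2 * lane] + (uint32_t) t1[lane];
      acc_out[2 * lane + 1]
        = acc_in[2 * lane + 1] + (uint32_t) (t1[lane] >> 32);
    }
}
```

The scalar result of the loop is then the sum of all four accumulator
lanes, which the vectorizer's reduction epilogue takes care of.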
>>>
>>> The patch is pasted below and also attached as a text file (in which
>>> you can see tabs). Bootstrap and make check passed on x86. Please
>>> give me your comments.
>>>
>>>
>>>
>>> thanks,
>>> Cong
>>>
>>>
>>>
>>> diff --git a/gcc/ChangeLog b/gcc/ChangeLog
>>> index 8a38316..d528307 100644
>>> --- a/gcc/ChangeLog
>>> +++ b/gcc/ChangeLog
>>> @@ -1,3 +1,24 @@
>>> +2013-10-29  Cong Hou  <congh@google.com>
>>> +
>>> + * tree-vect-patterns.c (vect_recog_sad_pattern): New function for SAD
>>> + pattern recognition.
>>> + (type_conversion_p): PROMOTION is true if it's a type promotion
>>> + conversion, and false otherwise.  Return true if the given expression
>>> + is a type conversion one.
>>> + * tree-vectorizer.h: Adjust the number of patterns.
>>> + * tree.def: Add SAD_EXPR.
>>> + * optabs.def: Add sad_optab.
>>> + * cfgexpand.c (expand_debug_expr): Add SAD_EXPR case.
>>> + * expr.c (expand_expr_real_2): Likewise.
>>> + * gimple-pretty-print.c (dump_ternary_rhs): Likewise.
>>> + * gimple.c (get_gimple_rhs_num_ops): Likewise.
>>> + * optabs.c (optab_for_tree_code): Likewise.
>>> + * tree-cfg.c (verify_gimple_assign_ternary): Likewise.
>>> + * tree-inline.c (estimate_operator_cost): Likewise.
>>> + * tree-ssa-operands.c (get_expr_operands): Likewise.
>>> + * tree-vect-loop.c (get_initial_def_for_reduction): Likewise.
>>> + * config/i386/sse.md: Add SSE2 and AVX2 expand for SAD.
>>> +
>>>  2013-10-14  David Malcolm  <dmalcolm@redhat.com>
>>>
>>>   * dumpfile.h (gcc::dump_manager): New class, to hold state
>>> diff --git a/gcc/cfgexpand.c b/gcc/cfgexpand.c
>>> index 7ed29f5..9ec761a 100644
>>> --- a/gcc/cfgexpand.c
>>> +++ b/gcc/cfgexpand.c
>>> @@ -2730,6 +2730,7 @@ expand_debug_expr (tree exp)
>>>   {
>>>   case COND_EXPR:
>>>   case DOT_PROD_EXPR:
>>> + case SAD_EXPR:
>>>   case WIDEN_MULT_PLUS_EXPR:
>>>   case WIDEN_MULT_MINUS_EXPR:
>>>   case FMA_EXPR:
>>> diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
>>> index c3f6c94..ca1ab70 100644
>>> --- a/gcc/config/i386/sse.md
>>> +++ b/gcc/config/i386/sse.md
>>> @@ -6052,6 +6052,40 @@
>>>    DONE;
>>>  })
>>>
>>> +(define_expand "sadv16qi"
>>> +  [(match_operand:V4SI 0 "register_operand")
>>> +   (match_operand:V16QI 1 "register_operand")
>>> +   (match_operand:V16QI 2 "register_operand")
>>> +   (match_operand:V4SI 3 "register_operand")]
>>> +  "TARGET_SSE2"
>>> +{
>>> +  rtx t1 = gen_reg_rtx (V2DImode);
>>> +  rtx t2 = gen_reg_rtx (V4SImode);
>>> +  emit_insn (gen_sse2_psadbw (t1, operands[1], operands[2]));
>>> +  convert_move (t2, t1, 0);
>>> +  emit_insn (gen_rtx_SET (VOIDmode, operands[0],
>>> +  gen_rtx_PLUS (V4SImode,
>>> + operands[3], t2)));
>>> +  DONE;
>>> +})
>>> +
>>> +(define_expand "sadv32qi"
>>> +  [(match_operand:V8SI 0 "register_operand")
>>> +   (match_operand:V32QI 1 "register_operand")
>>> +   (match_operand:V32QI 2 "register_operand")
>>> +   (match_operand:V8SI 3 "register_operand")]
>>> +  "TARGET_AVX2"
>>> +{
>>> +  rtx t1 = gen_reg_rtx (V4DImode);
>>> +  rtx t2 = gen_reg_rtx (V8SImode);
>>> +  emit_insn (gen_avx2_psadbw (t1, operands[1], operands[2]));
>>> +  convert_move (t2, t1, 0);
>>> +  emit_insn (gen_rtx_SET (VOIDmode, operands[0],
>>> +  gen_rtx_PLUS (V8SImode,
>>> + operands[3], t2)));
>>> +  DONE;
>>> +})
>>> +
>>>  (define_insn "ashr<mode>3"
>>>    [(set (match_operand:VI24_AVX2 0 "register_operand" "=x,x")
>>>   (ashiftrt:VI24_AVX2
>>> diff --git a/gcc/expr.c b/gcc/expr.c
>>> index 4975a64..1db8a49 100644
>>> --- a/gcc/expr.c
>>> +++ b/gcc/expr.c
>>> @@ -9026,6 +9026,20 @@ expand_expr_real_2 (sepops ops, rtx target, enum machine_mode tmode,
>>>   return target;
>>>        }
>>>
>>> +      case SAD_EXPR:
>>> +      {
>>> + tree oprnd0 = treeop0;
>>> + tree oprnd1 = treeop1;
>>> + tree oprnd2 = treeop2;
>>> + rtx op2;
>>> +
>>> + expand_operands (oprnd0, oprnd1, NULL_RTX, &op0, &op1, EXPAND_NORMAL);
>>> + op2 = expand_normal (oprnd2);
>>> + target = expand_widen_pattern_expr (ops, op0, op1, op2,
>>> +    target, unsignedp);
>>> + return target;
>>> +      }
>>> +
>>>      case REALIGN_LOAD_EXPR:
>>>        {
>>>          tree oprnd0 = treeop0;
>>> diff --git a/gcc/gimple-pretty-print.c b/gcc/gimple-pretty-print.c
>>> index f0f8166..514ddd1 100644
>>> --- a/gcc/gimple-pretty-print.c
>>> +++ b/gcc/gimple-pretty-print.c
>>> @@ -425,6 +425,16 @@ dump_ternary_rhs (pretty_printer *buffer, gimple gs, int spc, int flags)
>>>        dump_generic_node (buffer, gimple_assign_rhs3 (gs), spc, flags, false);
>>>        pp_greater (buffer);
>>>        break;
>>> +
>>> +    case SAD_EXPR:
>>> +      pp_string (buffer, "SAD_EXPR <");
>>> +      dump_generic_node (buffer, gimple_assign_rhs1 (gs), spc, flags, false);
>>> +      pp_string (buffer, ", ");
>>> +      dump_generic_node (buffer, gimple_assign_rhs2 (gs), spc, flags, false);
>>> +      pp_string (buffer, ", ");
>>> +      dump_generic_node (buffer, gimple_assign_rhs3 (gs), spc, flags, false);
>>> +      pp_greater (buffer);
>>> +      break;
>>>
>>>      case VEC_PERM_EXPR:
>>>        pp_string (buffer, "VEC_PERM_EXPR <");
>>> diff --git a/gcc/gimple.c b/gcc/gimple.c
>>> index a12dd67..4975959 100644
>>> --- a/gcc/gimple.c
>>> +++ b/gcc/gimple.c
>>> @@ -2562,6 +2562,7 @@ get_gimple_rhs_num_ops (enum tree_code code)
>>>        || (SYM) == WIDEN_MULT_PLUS_EXPR    \
>>>        || (SYM) == WIDEN_MULT_MINUS_EXPR    \
>>>        || (SYM) == DOT_PROD_EXPR    \
>>> +      || (SYM) == SAD_EXPR    \
>>>        || (SYM) == REALIGN_LOAD_EXPR    \
>>>        || (SYM) == VEC_COND_EXPR    \
>>>        || (SYM) == VEC_PERM_EXPR                                             \
>>> diff --git a/gcc/optabs.c b/gcc/optabs.c
>>> index 06a626c..4ddd4d9 100644
>>> --- a/gcc/optabs.c
>>> +++ b/gcc/optabs.c
>>> @@ -462,6 +462,9 @@ optab_for_tree_code (enum tree_code code, const_tree type,
>>>      case DOT_PROD_EXPR:
>>>        return TYPE_UNSIGNED (type) ? udot_prod_optab : sdot_prod_optab;
>>>
>>> +    case SAD_EXPR:
>>> +      return sad_optab;
>>> +
>>>      case WIDEN_MULT_PLUS_EXPR:
>>>        return (TYPE_UNSIGNED (type)
>>>        ? (TYPE_SATURATING (type)
>>> diff --git a/gcc/optabs.def b/gcc/optabs.def
>>> index 6b924ac..e35d567 100644
>>> --- a/gcc/optabs.def
>>> +++ b/gcc/optabs.def
>>> @@ -248,6 +248,7 @@ OPTAB_D (sdot_prod_optab, "sdot_prod$I$a")
>>>  OPTAB_D (ssum_widen_optab, "widen_ssum$I$a3")
>>>  OPTAB_D (udot_prod_optab, "udot_prod$I$a")
>>>  OPTAB_D (usum_widen_optab, "widen_usum$I$a3")
>>> +OPTAB_D (sad_optab, "sad$I$a")
>>>  OPTAB_D (vec_extract_optab, "vec_extract$a")
>>>  OPTAB_D (vec_init_optab, "vec_init$a")
>>>  OPTAB_D (vec_pack_sfix_trunc_optab, "vec_pack_sfix_trunc_$a")
>>> diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
>>> index 075d071..226b8d5 100644
>>> --- a/gcc/testsuite/ChangeLog
>>> +++ b/gcc/testsuite/ChangeLog
>>> @@ -1,3 +1,7 @@
>>> +2013-10-29  Cong Hou  <congh@google.com>
>>> +
>>> + * gcc.dg/vect/vect-reduc-sad.c: New.
>>> +
>>>  2013-10-14  Tobias Burnus  <burnus@net-b.de>
>>>
>>>   PR fortran/58658
>>> diff --git a/gcc/testsuite/gcc.dg/vect/vect-reduc-sad.c b/gcc/testsuite/gcc.dg/vect/vect-reduc-sad.c
>>> new file mode 100644
>>> index 0000000..14ebb3b
>>> --- /dev/null
>>> +++ b/gcc/testsuite/gcc.dg/vect/vect-reduc-sad.c
>>> @@ -0,0 +1,54 @@
>>> +/* { dg-require-effective-target sse2 { target { i?86-*-* x86_64-*-* } } } */
>>> +
>>> +#include <stdarg.h>
>>> +#include "tree-vect.h"
>>> +
>>> +#define N 64
>>> +#define SAD (N*N/2)
>>> +
>>> +unsigned char X[N] __attribute__ ((__aligned__(__BIGGEST_ALIGNMENT__)));
>>> +unsigned char Y[N] __attribute__ ((__aligned__(__BIGGEST_ALIGNMENT__)));
>>> +
>>> +/* Sum of absolute differences between arrays of unsigned char types.
>>> +   Detected as a sad pattern.
>>> +   Vectorized on targets that support sad for unsigned chars.  */
>>> +
>>> +__attribute__ ((noinline)) int
>>> +foo (int len)
>>> +{
>>> +  int i;
>>> +  int result = 0;
>>> +
>>> +  for (i = 0; i < len; i++)
>>> +    result += abs (X[i] - Y[i]);
>>> +
>>> +  return result;
>>> +}
>>> +
>>> +
>>> +int
>>> +main (void)
>>> +{
>>> +  int i;
>>> +  int sad;
>>> +
>>> +  check_vect ();
>>> +
>>> +  for (i = 0; i < N; i++)
>>> +    {
>>> +      X[i] = i;
>>> +      Y[i] = N - i;
>>> +      __asm__ volatile ("");
>>> +    }
>>> +
>>> +  sad = foo (N);
>>> +  if (sad != SAD)
>>> +    abort ();
>>> +
>>> +  return 0;
>>> +}
>>> +
>>> +/* { dg-final { scan-tree-dump-times "vect_recog_sad_pattern: detected" 1 "vect" } } */
>>> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
>>> +/* { dg-final { cleanup-tree-dump "vect" } } */
>>> +
>>> diff --git a/gcc/tree-cfg.c b/gcc/tree-cfg.c
>>> index 8b66791..d689cac 100644
>>> --- a/gcc/tree-cfg.c
>>> +++ b/gcc/tree-cfg.c
>>> @@ -3797,6 +3797,7 @@ verify_gimple_assign_ternary (gimple stmt)
>>>        return false;
>>>
>>>      case DOT_PROD_EXPR:
>>> +    case SAD_EXPR:
>>>      case REALIGN_LOAD_EXPR:
>>>        /* FIXME.  */
>>>        return false;
>>> diff --git a/gcc/tree-inline.c b/gcc/tree-inline.c
>>> index 2221b9c..44261a3 100644
>>> --- a/gcc/tree-inline.c
>>> +++ b/gcc/tree-inline.c
>>> @@ -3601,6 +3601,7 @@ estimate_operator_cost (enum tree_code code, eni_weights *weights,
>>>      case WIDEN_SUM_EXPR:
>>>      case WIDEN_MULT_EXPR:
>>>      case DOT_PROD_EXPR:
>>> +    case SAD_EXPR:
>>>      case WIDEN_MULT_PLUS_EXPR:
>>>      case WIDEN_MULT_MINUS_EXPR:
>>>      case WIDEN_LSHIFT_EXPR:
>>> diff --git a/gcc/tree-ssa-operands.c b/gcc/tree-ssa-operands.c
>>> index 603f797..393efc3 100644
>>> --- a/gcc/tree-ssa-operands.c
>>> +++ b/gcc/tree-ssa-operands.c
>>> @@ -854,6 +854,7 @@ get_expr_operands (gimple stmt, tree *expr_p, int flags)
>>>        }
>>>
>>>      case DOT_PROD_EXPR:
>>> +    case SAD_EXPR:
>>>      case REALIGN_LOAD_EXPR:
>>>      case WIDEN_MULT_PLUS_EXPR:
>>>      case WIDEN_MULT_MINUS_EXPR:
>>> diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
>>> index 638b981..89aa8c7 100644
>>> --- a/gcc/tree-vect-loop.c
>>> +++ b/gcc/tree-vect-loop.c
>>> @@ -3607,6 +3607,7 @@ get_initial_def_for_reduction (gimple stmt, tree init_val,
>>>      {
>>>        case WIDEN_SUM_EXPR:
>>>        case DOT_PROD_EXPR:
>>> +      case SAD_EXPR:
>>>        case PLUS_EXPR:
>>>        case MINUS_EXPR:
>>>        case BIT_IOR_EXPR:
>>> diff --git a/gcc/tree-vect-patterns.c b/gcc/tree-vect-patterns.c
>>> index 0a4e812..7919449 100644
>>> --- a/gcc/tree-vect-patterns.c
>>> +++ b/gcc/tree-vect-patterns.c
>>> @@ -45,6 +45,8 @@ static gimple vect_recog_widen_mult_pattern (vec<gimple> *, tree *,
>>>       tree *);
>>>  static gimple vect_recog_dot_prod_pattern (vec<gimple> *, tree *,
>>>     tree *);
>>> +static gimple vect_recog_sad_pattern (vec<gimple> *, tree *,
>>> +      tree *);
>>>  static gimple vect_recog_pow_pattern (vec<gimple> *, tree *, tree *);
>>>  static gimple vect_recog_over_widening_pattern (vec<gimple> *, tree *,
>>>                                                   tree *);
>>> @@ -62,6 +64,7 @@ static vect_recog_func_ptr vect_vect_recog_func_ptrs[NUM_PATTERNS] = {
>>>   vect_recog_widen_mult_pattern,
>>>   vect_recog_widen_sum_pattern,
>>>   vect_recog_dot_prod_pattern,
>>> +        vect_recog_sad_pattern,
>>>   vect_recog_pow_pattern,
>>>   vect_recog_widen_shift_pattern,
>>>   vect_recog_over_widening_pattern,
>>> @@ -140,9 +143,8 @@ vect_single_imm_use (gimple def_stmt)
>>>  }
>>>
>>>  /* Check whether NAME, an ssa-name used in USE_STMT,
>>> -   is a result of a type promotion or demotion, such that:
>>> +   is a result of a type promotion, such that:
>>>       DEF_STMT: NAME = NOP (name0)
>>> -   where the type of name0 (ORIG_TYPE) is smaller/bigger than the type of NAME.
>>>     If CHECK_SIGN is TRUE, check that either both types are signed or both are
>>>     unsigned.  */
>>>
>>> @@ -189,10 +191,8 @@ type_conversion_p (tree name, gimple use_stmt, bool check_sign,
>>>
>>>    if (TYPE_PRECISION (type) >= (TYPE_PRECISION (*orig_type) * 2))
>>>      *promotion = true;
>>> -  else if (TYPE_PRECISION (*orig_type) >= (TYPE_PRECISION (type) * 2))
>>> -    *promotion = false;
>>>    else
>>> -    return false;
>>> +    *promotion = false;
>>>
>>>    if (!vect_is_simple_use (oprnd0, *def_stmt, loop_vinfo,
>>>     bb_vinfo, &dummy_gimple, &dummy, &dt))
>>> @@ -433,6 +433,242 @@ vect_recog_dot_prod_pattern (vec<gimple> *stmts, tree *type_in,
>>>  }
>>>
>>>
>>> +/* Function vect_recog_sad_pattern
>>> +
>>> +   Try to find the following Sum of Absolute Differences (SAD) pattern:
>>> +
>>> +     unsigned type x_t, y_t;
>>> +     signed TYPE1 diff, abs_diff;
>>> +     TYPE2 sum = init;
>>> +   loop:
>>> +     sum_0 = phi <init, sum_1>
>>> +     S1  x_t = ...
>>> +     S2  y_t = ...
>>> +     S3  x_T = (TYPE1) x_t;
>>> +     S4  y_T = (TYPE1) y_t;
>>> +     S5  diff = x_T - y_T;
>>> +     S6  abs_diff = ABS_EXPR <diff>;
>>> +     [S7  abs_diff = (TYPE2) abs_diff;  #optional]
>>> +     S8  sum_1 = abs_diff + sum_0;
>>> +
>>> +   where 'TYPE1' is at least double the size of type 'type', and 'TYPE2' is the
>>> +   same size as 'TYPE1' or bigger. This is a special case of a reduction
>>> +   computation.
>>> +
>>> +   Input:
>>> +
>>> +   * STMTS: Contains a stmt from which the pattern search begins.  In the
>>> +   example, when this function is called with S8, the pattern
>>> +   {S3,S4,S5,S6,S7,S8} will be detected.
>>> +
>>> +   Output:
>>> +
>>> +   * TYPE_IN: The type of the input arguments to the pattern.
>>> +
>>> +   * TYPE_OUT: The type of the output of this pattern.
>>> +
>>> +   * Return value: A new stmt that will be used to replace the sequence of
>>> +   stmts that constitute the pattern. In this case it will be:
>>> +        SAD_EXPR <x_t, y_t, sum_0>
>>> +  */
>>> +
>>> +static gimple
>>> +vect_recog_sad_pattern (vec<gimple> *stmts, tree *type_in,
>>> +     tree *type_out)
>>> +{
>>> +  gimple last_stmt = (*stmts)[0];
>>> +  tree sad_oprnd0, sad_oprnd1;
>>> +  stmt_vec_info stmt_vinfo = vinfo_for_stmt (last_stmt);
>>> +  tree half_type;
>>> +  loop_vec_info loop_info = STMT_VINFO_LOOP_VINFO (stmt_vinfo);
>>> +  struct loop *loop;
>>> +  bool promotion;
>>> +
>>> +  if (!loop_info)
>>> +    return NULL;
>>> +
>>> +  loop = LOOP_VINFO_LOOP (loop_info);
>>> +
>>> +  if (!is_gimple_assign (last_stmt))
>>> +    return NULL;
>>> +
>>> +  tree sum_type = gimple_expr_type (last_stmt);
>>> +
>>> +  /* Look for the following pattern
>>> +          DX = (TYPE1) X;
>>> +          DY = (TYPE1) Y;
>>> +          DDIFF = DX - DY;
>>> +          DAD = ABS_EXPR <DDIFF>;
>>> +          [DAD = (TYPE2) DAD;  #optional]
>>> +          sum_1 = DAD + sum_0;
>>> +     In which
>>> +     - DX is at least double the size of X
>>> +     - DY is at least double the size of Y
>>> +     - DX, DY, DDIFF, DAD all have the same type
>>> +     - sum is the same size as DAD or bigger
>>> +     - sum has been recognized as a reduction variable.
>>> +
>>> +     This is equivalent to:
>>> +       DDIFF = X w- Y;          #widen sub
>>> +       DAD = ABS_EXPR <DDIFF>;
>>> +       sum_1 = DAD w+ sum_0;    #widen summation
>>> +     or
>>> +       DDIFF = X w- Y;          #widen sub
>>> +       DAD = ABS_EXPR <DDIFF>;
>>> +       sum_1 = DAD + sum_0;     #summation
>>> +   */
>>> +
>>> +  /* Starting from LAST_STMT, follow the defs of its uses in search
>>> +     of the above pattern.  */
>>> +
>>> +  if (gimple_assign_rhs_code (last_stmt) != PLUS_EXPR)
>>> +    return NULL;
>>> +
>>> +  tree plus_oprnd0, plus_oprnd1;
>>> +
>>> +  if (STMT_VINFO_IN_PATTERN_P (stmt_vinfo))
>>> +    {
>>> +      /* Has been detected as widening-summation?  */
>>> +
>>> +      gimple stmt = STMT_VINFO_RELATED_STMT (stmt_vinfo);
>>> +      sum_type = gimple_expr_type (stmt);
>>> +      if (gimple_assign_rhs_code (stmt) != WIDEN_SUM_EXPR)
>>> +        return NULL;
>>> +      plus_oprnd0 = gimple_assign_rhs1 (stmt);
>>> +      plus_oprnd1 = gimple_assign_rhs2 (stmt);
>>> +      half_type = TREE_TYPE (plus_oprnd0);
>>> +    }
>>> +  else
>>> +    {
>>> +      gimple def_stmt;
>>> +
>>> +      if (STMT_VINFO_DEF_TYPE (stmt_vinfo) != vect_reduction_def)
>>> +        return NULL;
>>> +      plus_oprnd0 = gimple_assign_rhs1 (last_stmt);
>>> +      plus_oprnd1 = gimple_assign_rhs2 (last_stmt);
>>> +      if (!types_compatible_p (TREE_TYPE (plus_oprnd0), sum_type)
>>> +  || !types_compatible_p (TREE_TYPE (plus_oprnd1), sum_type))
>>> +        return NULL;
>>> +
>>> +      /* The type conversion could be promotion, demotion,
>>> +         or just signed -> unsigned.  */
>>> +      if (type_conversion_p (plus_oprnd0, last_stmt, false,
>>> +                             &half_type, &def_stmt, &promotion))
>>> +        plus_oprnd0 = gimple_assign_rhs1 (def_stmt);
>>> +      else
>>> +        half_type = sum_type;
>>> +    }
>>> +
>>> +  /* So far so good.  Since last_stmt was detected as a (summation) reduction,
>>> +     we know that plus_oprnd1 is the reduction variable (defined by a loop-header
>>> +     phi), and plus_oprnd0 is an ssa-name defined by a stmt in the loop body.
>>> +     Then check that plus_oprnd0 is defined by an abs_expr  */
>>> +
>>> +  if (TREE_CODE (plus_oprnd0) != SSA_NAME)
>>> +    return NULL;
>>> +
>>> +  tree abs_type = half_type;
>>> +  gimple abs_stmt = SSA_NAME_DEF_STMT (plus_oprnd0);
>>> +
>>> +  /* It could not be the sad pattern if the abs_stmt is outside the loop.  */
>>> +  if (!gimple_bb (abs_stmt) || !flow_bb_inside_loop_p (loop, gimple_bb (abs_stmt)))
>>> +    return NULL;
>>> +
>>> +  /* FORNOW.  Can continue analyzing the def-use chain when this stmt is a phi
>>> +     inside the loop (in case we are analyzing an outer-loop).  */
>>> +  if (!is_gimple_assign (abs_stmt))
>>> +    return NULL;
>>> +
>>> +  stmt_vec_info abs_stmt_vinfo = vinfo_for_stmt (abs_stmt);
>>> +  gcc_assert (abs_stmt_vinfo);
>>> +  if (STMT_VINFO_DEF_TYPE (abs_stmt_vinfo) != vect_internal_def)
>>> +    return NULL;
>>> +  if (gimple_assign_rhs_code (abs_stmt) != ABS_EXPR)
>>> +    return NULL;
>>> +
>>> +  tree abs_oprnd = gimple_assign_rhs1 (abs_stmt);
>>> +  if (!types_compatible_p (TREE_TYPE (abs_oprnd), abs_type))
>>> +    return NULL;
>>> +  if (TYPE_UNSIGNED (abs_type))
>>> +    return NULL;
>>> +
>>> +  /* We then detect if the operand of abs_expr is defined by a minus_expr.  */
>>> +
>>> +  if (TREE_CODE (abs_oprnd) != SSA_NAME)
>>> +    return NULL;
>>> +
>>> +  gimple diff_stmt = SSA_NAME_DEF_STMT (abs_oprnd);
>>> +
>>> +  /* It could not be the sad pattern if the diff_stmt is outside the loop.  */
>>> +  if (!gimple_bb (diff_stmt)
>>> +      || !flow_bb_inside_loop_p (loop, gimple_bb (diff_stmt)))
>>> +    return NULL;
>>> +
>>> +  /* FORNOW.  Can continue analyzing the def-use chain when this stmt is a phi
>>> +     inside the loop (in case we are analyzing an outer-loop).  */
>>> +  if (!is_gimple_assign (diff_stmt))
>>> +    return NULL;
>>> +
>>> +  stmt_vec_info diff_stmt_vinfo = vinfo_for_stmt (diff_stmt);
>>> +  gcc_assert (diff_stmt_vinfo);
>>> +  if (STMT_VINFO_DEF_TYPE (diff_stmt_vinfo) != vect_internal_def)
>>> +    return NULL;
>>> +  if (gimple_assign_rhs_code (diff_stmt) != MINUS_EXPR)
>>> +    return NULL;
>>> +
>>> +  tree half_type0, half_type1;
>>> +  gimple def_stmt;
>>> +
>>> +  tree minus_oprnd0 = gimple_assign_rhs1 (diff_stmt);
>>> +  tree minus_oprnd1 = gimple_assign_rhs2 (diff_stmt);
>>> +
>>> +  if (!types_compatible_p (TREE_TYPE (minus_oprnd0), abs_type)
>>> +      || !types_compatible_p (TREE_TYPE (minus_oprnd1), abs_type))
>>> +    return NULL;
>>> +  if (!type_conversion_p (minus_oprnd0, diff_stmt, false,
>>> +                          &half_type0, &def_stmt, &promotion)
>>> +      || !promotion)
>>> +    return NULL;
>>> +  sad_oprnd0 = gimple_assign_rhs1 (def_stmt);
>>> +
>>> +  if (!type_conversion_p (minus_oprnd1, diff_stmt, false,
>>> +                          &half_type1, &def_stmt, &promotion)
>>> +      || !promotion)
>>> +    return NULL;
>>> +  sad_oprnd1 = gimple_assign_rhs1 (def_stmt);
>>> +
>>> +  if (!types_compatible_p (half_type0, half_type1))
>>> +    return NULL;
>>> +  if (!TYPE_UNSIGNED (half_type0))
>>> +    return NULL;
>>> +  if (TYPE_PRECISION (abs_type) < TYPE_PRECISION (half_type0) * 2
>>> +      || TYPE_PRECISION (sum_type) < TYPE_PRECISION (half_type0) * 2)
>>> +    return NULL;
>>> +
>>> +  *type_in = TREE_TYPE (sad_oprnd0);
>>> +  *type_out = sum_type;
>>> +
>>> +  /* Pattern detected. Create a stmt to be used to replace the pattern: */
>>> +  tree var = vect_recog_temp_ssa_var (sum_type, NULL);
>>> +  gimple pattern_stmt = gimple_build_assign_with_ops
>>> +                          (SAD_EXPR, var, sad_oprnd0, sad_oprnd1, plus_oprnd1);
>>> +
>>> +  if (dump_enabled_p ())
>>> +    {
>>> +      dump_printf_loc (MSG_NOTE, vect_location,
>>> +                       "vect_recog_sad_pattern: detected: ");
>>> +      dump_gimple_stmt (MSG_NOTE, TDF_SLIM, pattern_stmt, 0);
>>> +      dump_printf (MSG_NOTE, "\n");
>>> +    }
>>> +
>>> +  /* We don't allow changing the order of the computation in the inner-loop
>>> +     when doing outer-loop vectorization.  */
>>> +  gcc_assert (!nested_in_vect_loop_p (loop, last_stmt));
>>> +
>>> +  return pattern_stmt;
>>> +}
>>> +
>>> +
>>>  /* Handle widening operation by a constant.  At the moment we support MULT_EXPR
>>>     and LSHIFT_EXPR.
>>>
>>> diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
>>> index 8b7b345..0aac75b 100644
>>> --- a/gcc/tree-vectorizer.h
>>> +++ b/gcc/tree-vectorizer.h
>>> @@ -1044,7 +1044,7 @@ extern void vect_slp_transform_bb (basic_block);
>>>     Additional pattern recognition functions can (and will) be added
>>>     in the future.  */
>>>  typedef gimple (* vect_recog_func_ptr) (vec<gimple> *, tree *, tree *);
>>> -#define NUM_PATTERNS 11
>>> +#define NUM_PATTERNS 12
>>>  void vect_pattern_recog (loop_vec_info, bb_vec_info);
>>>
>>>  /* In tree-vectorizer.c.  */
>>> diff --git a/gcc/tree.def b/gcc/tree.def
>>> index 88c850a..31a3b64 100644
>>> --- a/gcc/tree.def
>>> +++ b/gcc/tree.def
>>> @@ -1146,6 +1146,15 @@ DEFTREECODE (REDUC_PLUS_EXPR, "reduc_plus_expr", tcc_unary, 1)
>>>          arg3 = WIDEN_SUM_EXPR (tmp, arg3); */
>>>  DEFTREECODE (DOT_PROD_EXPR, "dot_prod_expr", tcc_expression, 3)
>>>
>>> +/* Widening sad (sum of absolute differences).
>>> +   The first two arguments are of type t1 which should be unsigned integer.
>>> +   The third argument and the result are of type t2, such that t2 is at least
>>> +   twice the size of t1. SAD_EXPR(arg1,arg2,arg3) is equivalent to:
>>> + tmp1 = WIDEN_MINUS_EXPR (arg1, arg2);
>>> + tmp2 = ABS_EXPR (tmp1);
>>> + arg3 = PLUS_EXPR (tmp2, arg3); */
>>> +DEFTREECODE (SAD_EXPR, "sad_expr", tcc_expression, 3)
>>> +
>>>  /* Widening summation.
>>>     The first argument is of type t1.
>>>     The second argument is of type t2, such that t2 is at least twice