[PATCH] Introducing SAD (Sum of Absolute Differences) operation to GCC vectorizer.
Ramana Radhakrishnan
ramana.gcc@googlemail.com
Thu Oct 31 03:18:00 GMT 2013
On Thu, Oct 31, 2013 at 12:29 AM, Cong Hou <congh@google.com> wrote:
> On Tue, Oct 29, 2013 at 4:49 PM, Ramana Radhakrishnan
> <ramana.gcc@googlemail.com> wrote:
>> Cong,
>>
>> Please don't do the following.
>>
>> +++ b/gcc/testsuite/gcc.dg/vect/vect-reduc-sad.c
>> @@ -0,0 +1,54 @@
>> +/* { dg-require-effective-target sse2 { target { i?86-*-* x86_64-*-* } } } */
>>
>> you are adding a test to gcc.dg/vect - it's a common directory
>> containing tests that need to run on multiple architectures and such
>> tests should be keyed by the feature they enable which can be turned
>> on for ports that have such an instruction.
>>
>> The correct way of doing this is to key this on the feature, something
>> like dg-require-effective-target vect_sad_char. And define the
>> equivalent routine in testsuite/lib/target-supports.exp and enable it
>> for sse2 for the x86 port. If in doubt look at
>> check_effective_target_vect_int and a whole family of such functions
>> in testsuite/lib/target-supports.exp
>>
>> This makes life easy for other port maintainers who want to turn on
>> this support. And for bonus points please update the testcase writing
>> wiki page with this information if it isn't already there.
>>
>
> OK, I will likely move the test case to gcc.target/i386, as currently
> only SSE2 provides a SAD instruction. But your suggestion also helps!
Sorry, no - I really don't like that approach. If the test remains in
the common directory, keyed off as I suggested, it makes life easier
when turning this on in other ports: adding this pattern in a port
would take this test from UNSUPPORTED->XPASS, and it keeps
gcc.dg/vect reasonably up to date with respect to testing the features
of the vectorizer and in touch with the way in which the tests in
gcc.dg/vect have been written to date.
I think Neon has an equivalent instruction called vaba, but I will
have to check in the morning when I get back to my machine.
regards
Ramana
>
>>> S6 abs_diff = ABS_EXPR <diff>;
>>> [S7 abs_diff = (TYPE2) abs_diff; #optional]
>>> S8 sum_1 = abs_diff + sum_0;
>>>
>>> where 'TYPE1' is at least double the size of type 'type', and 'TYPE2' is the
>>> same size as 'TYPE1' or bigger. This is a special case of a reduction
>>> computation.
>>>
>>> For SSE2, type is char, and TYPE1 and TYPE2 are int.
>>>
>>>
>>> In order to express this new operation, a new expression, SAD_EXPR, is
>>> introduced in tree.def, and a corresponding entry is added to
>>> optabs. The patch also adds "define_expand" patterns for SSE2 and
>>> AVX2 on i386.
>>>
>>> The patch is pasted below and also attached as a text file (in which
>>> you can see tabs). Bootstrap and make check passed on x86. Please
>>> give me your comments.
>>>
>>>
>>>
>>> thanks,
>>> Cong
>>>
>>>
>>>
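For reference, the statement sequence S1-S8 quoted above corresponds to the following scalar loop, instantiated for the SSE2 case (type = unsigned char, TYPE1 = TYPE2 = int, so the optional S7 conversion disappears); a sketch for illustration only:

```c
#include <stdlib.h>

/* SAD reduction as the pattern recognizer sees it: unsigned char
   inputs widened to int before the subtraction, absolute value,
   then accumulation into an int sum.  */
int
sad_scalar (const unsigned char *x, const unsigned char *y, int n)
{
  int sum = 0;                    /* TYPE2 sum = init;        */
  int i;

  for (i = 0; i < n; i++)
    {
      int x_T = (int) x[i];       /* S3: promote x_t to TYPE1 */
      int y_T = (int) y[i];       /* S4: promote y_t to TYPE1 */
      int diff = x_T - y_T;       /* S5                       */
      int abs_diff = abs (diff);  /* S6: ABS_EXPR             */
      sum = abs_diff + sum;       /* S8: reduction            */
    }

  return sum;
}
```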
>>> diff --git a/gcc/ChangeLog b/gcc/ChangeLog
>>> index 8a38316..d528307 100644
>>> --- a/gcc/ChangeLog
>>> +++ b/gcc/ChangeLog
>>> @@ -1,3 +1,23 @@
>>> +2013-10-29 Cong Hou <congh@google.com>
>>> +
>>> + * tree-vect-patterns.c (vect_recog_sad_pattern): New function for SAD
>>> + pattern recognition.
>>> + (type_conversion_p): Set PROMOTION to true if the conversion is a
>>> + type promotion, and to false otherwise. Return true if the given
>>> + expression is a type conversion.
>>> + * tree-vectorizer.h: Adjust the number of patterns.
>>> + * tree.def: Add SAD_EXPR.
>>> + * optabs.def: Add sad_optab.
>>> + * cfgexpand.c (expand_debug_expr): Add SAD_EXPR case.
>>> + * expr.c (expand_expr_real_2): Likewise.
>>> + * gimple-pretty-print.c (dump_ternary_rhs): Likewise.
>>> + * gimple.c (get_gimple_rhs_num_ops): Likewise.
>>> + * optabs.c (optab_for_tree_code): Likewise.
>>> + * tree-cfg.c (estimate_operator_cost): Likewise.
>>> + * tree-ssa-operands.c (get_expr_operands): Likewise.
>>> + * tree-vect-loop.c (get_initial_def_for_reduction): Likewise.
>>> + * config/i386/sse.md: Add SSE2 and AVX2 expand for SAD.
>>> +
>>> 2013-10-14 David Malcolm <dmalcolm@redhat.com>
>>>
>>> * dumpfile.h (gcc::dump_manager): New class, to hold state
>>> diff --git a/gcc/cfgexpand.c b/gcc/cfgexpand.c
>>> index 7ed29f5..9ec761a 100644
>>> --- a/gcc/cfgexpand.c
>>> +++ b/gcc/cfgexpand.c
>>> @@ -2730,6 +2730,7 @@ expand_debug_expr (tree exp)
>>> {
>>> case COND_EXPR:
>>> case DOT_PROD_EXPR:
>>> + case SAD_EXPR:
>>> case WIDEN_MULT_PLUS_EXPR:
>>> case WIDEN_MULT_MINUS_EXPR:
>>> case FMA_EXPR:
>>> diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
>>> index c3f6c94..ca1ab70 100644
>>> --- a/gcc/config/i386/sse.md
>>> +++ b/gcc/config/i386/sse.md
>>> @@ -6052,6 +6052,40 @@
>>> DONE;
>>> })
>>>
>>> +(define_expand "sadv16qi"
>>> + [(match_operand:V4SI 0 "register_operand")
>>> + (match_operand:V16QI 1 "register_operand")
>>> + (match_operand:V16QI 2 "register_operand")
>>> + (match_operand:V4SI 3 "register_operand")]
>>> + "TARGET_SSE2"
>>> +{
>>> + rtx t1 = gen_reg_rtx (V2DImode);
>>> + rtx t2 = gen_reg_rtx (V4SImode);
>>> + emit_insn (gen_sse2_psadbw (t1, operands[1], operands[2]));
>>> + convert_move (t2, t1, 0);
>>> + emit_insn (gen_rtx_SET (VOIDmode, operands[0],
>>> + gen_rtx_PLUS (V4SImode,
>>> + operands[3], t2)));
>>> + DONE;
>>> +})
>>> +
>>> +(define_expand "sadv32qi"
>>> + [(match_operand:V8SI 0 "register_operand")
>>> + (match_operand:V32QI 1 "register_operand")
>>> + (match_operand:V32QI 2 "register_operand")
>>> + (match_operand:V8SI 3 "register_operand")]
>>> + "TARGET_AVX2"
>>> +{
>>> + rtx t1 = gen_reg_rtx (V4DImode);
>>> + rtx t2 = gen_reg_rtx (V8SImode);
>>> + emit_insn (gen_avx2_psadbw (t1, operands[1], operands[2]));
>>> + convert_move (t2, t1, 0);
>>> + emit_insn (gen_rtx_SET (VOIDmode, operands[0],
>>> + gen_rtx_PLUS (V8SImode,
>>> + operands[3], t2)));
>>> + DONE;
>>> +})
>>> +
>>> (define_insn "ashr<mode>3"
>>> [(set (match_operand:VI24_AVX2 0 "register_operand" "=x,x")
>>> (ashiftrt:VI24_AVX2
>>> diff --git a/gcc/expr.c b/gcc/expr.c
>>> index 4975a64..1db8a49 100644
>>> --- a/gcc/expr.c
>>> +++ b/gcc/expr.c
>>> @@ -9026,6 +9026,20 @@ expand_expr_real_2 (sepops ops, rtx target, enum machine_mode tmode,
>>> return target;
>>> }
>>>
>>> + case SAD_EXPR:
>>> + {
>>> + tree oprnd0 = treeop0;
>>> + tree oprnd1 = treeop1;
>>> + tree oprnd2 = treeop2;
>>> + rtx op2;
>>> +
>>> + expand_operands (oprnd0, oprnd1, NULL_RTX, &op0, &op1, EXPAND_NORMAL);
>>> + op2 = expand_normal (oprnd2);
>>> + target = expand_widen_pattern_expr (ops, op0, op1, op2,
>>> + target, unsignedp);
>>> + return target;
>>> + }
>>> +
>>> case REALIGN_LOAD_EXPR:
>>> {
>>> tree oprnd0 = treeop0;
>>> diff --git a/gcc/gimple-pretty-print.c b/gcc/gimple-pretty-print.c
>>> index f0f8166..514ddd1 100644
>>> --- a/gcc/gimple-pretty-print.c
>>> +++ b/gcc/gimple-pretty-print.c
>>> @@ -425,6 +425,16 @@ dump_ternary_rhs (pretty_printer *buffer, gimple gs, int spc, int flags)
>>> dump_generic_node (buffer, gimple_assign_rhs3 (gs), spc, flags, false);
>>> pp_greater (buffer);
>>> break;
>>> +
>>> + case SAD_EXPR:
>>> + pp_string (buffer, "SAD_EXPR <");
>>> + dump_generic_node (buffer, gimple_assign_rhs1 (gs), spc, flags, false);
>>> + pp_string (buffer, ", ");
>>> + dump_generic_node (buffer, gimple_assign_rhs2 (gs), spc, flags, false);
>>> + pp_string (buffer, ", ");
>>> + dump_generic_node (buffer, gimple_assign_rhs3 (gs), spc, flags, false);
>>> + pp_greater (buffer);
>>> + break;
>>>
>>> case VEC_PERM_EXPR:
>>> pp_string (buffer, "VEC_PERM_EXPR <");
>>> diff --git a/gcc/gimple.c b/gcc/gimple.c
>>> index a12dd67..4975959 100644
>>> --- a/gcc/gimple.c
>>> +++ b/gcc/gimple.c
>>> @@ -2562,6 +2562,7 @@ get_gimple_rhs_num_ops (enum tree_code code)
>>> || (SYM) == WIDEN_MULT_PLUS_EXPR \
>>> || (SYM) == WIDEN_MULT_MINUS_EXPR \
>>> || (SYM) == DOT_PROD_EXPR \
>>> + || (SYM) == SAD_EXPR \
>>> || (SYM) == REALIGN_LOAD_EXPR \
>>> || (SYM) == VEC_COND_EXPR \
>>> || (SYM) == VEC_PERM_EXPR \
>>> diff --git a/gcc/optabs.c b/gcc/optabs.c
>>> index 06a626c..4ddd4d9 100644
>>> --- a/gcc/optabs.c
>>> +++ b/gcc/optabs.c
>>> @@ -462,6 +462,9 @@ optab_for_tree_code (enum tree_code code, const_tree type,
>>> case DOT_PROD_EXPR:
>>> return TYPE_UNSIGNED (type) ? udot_prod_optab : sdot_prod_optab;
>>>
>>> + case SAD_EXPR:
>>> + return sad_optab;
>>> +
>>> case WIDEN_MULT_PLUS_EXPR:
>>> return (TYPE_UNSIGNED (type)
>>> ? (TYPE_SATURATING (type)
>>> diff --git a/gcc/optabs.def b/gcc/optabs.def
>>> index 6b924ac..e35d567 100644
>>> --- a/gcc/optabs.def
>>> +++ b/gcc/optabs.def
>>> @@ -248,6 +248,7 @@ OPTAB_D (sdot_prod_optab, "sdot_prod$I$a")
>>> OPTAB_D (ssum_widen_optab, "widen_ssum$I$a3")
>>> OPTAB_D (udot_prod_optab, "udot_prod$I$a")
>>> OPTAB_D (usum_widen_optab, "widen_usum$I$a3")
>>> +OPTAB_D (sad_optab, "sad$I$a")
>>> OPTAB_D (vec_extract_optab, "vec_extract$a")
>>> OPTAB_D (vec_init_optab, "vec_init$a")
>>> OPTAB_D (vec_pack_sfix_trunc_optab, "vec_pack_sfix_trunc_$a")
>>> diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
>>> index 075d071..226b8d5 100644
>>> --- a/gcc/testsuite/ChangeLog
>>> +++ b/gcc/testsuite/ChangeLog
>>> @@ -1,3 +1,7 @@
>>> +2013-10-29 Cong Hou <congh@google.com>
>>> +
>>> + * gcc.dg/vect/vect-reduc-sad.c: New.
>>> +
>>> 2013-10-14 Tobias Burnus <burnus@net-b.de>
>>>
>>> PR fortran/58658
>>> diff --git a/gcc/testsuite/gcc.dg/vect/vect-reduc-sad.c b/gcc/testsuite/gcc.dg/vect/vect-reduc-sad.c
>>> new file mode 100644
>>> index 0000000..14ebb3b
>>> --- /dev/null
>>> +++ b/gcc/testsuite/gcc.dg/vect/vect-reduc-sad.c
>>> @@ -0,0 +1,54 @@
>>> +/* { dg-require-effective-target sse2 { target { i?86-*-* x86_64-*-* } } } */
>>> +
>>> +#include <stdarg.h>
>>> +#include "tree-vect.h"
>>> +
>>> +#define N 64
>>> +#define SAD N*N/2
>>> +
>>> +unsigned char X[N] __attribute__ ((__aligned__(__BIGGEST_ALIGNMENT__)));
>>> +unsigned char Y[N] __attribute__ ((__aligned__(__BIGGEST_ALIGNMENT__)));
>>> +
>>> +/* Sum of absolute differences between arrays of unsigned char types.
>>> + Detected as a sad pattern.
>>> + Vectorized on targets that support sad for unsigned chars. */
>>> +
>>> +__attribute__ ((noinline)) int
>>> +foo (int len)
>>> +{
>>> + int i;
>>> + int result = 0;
>>> +
>>> + for (i = 0; i < len; i++)
>>> + result += abs (X[i] - Y[i]);
>>> +
>>> + return result;
>>> +}
>>> +
>>> +
>>> +int
>>> +main (void)
>>> +{
>>> + int i;
>>> + int sad;
>>> +
>>> + check_vect ();
>>> +
>>> + for (i = 0; i < N; i++)
>>> + {
>>> + X[i] = i;
>>> + Y[i] = N - i;
>>> + __asm__ volatile ("");
>>> + }
>>> +
>>> + sad = foo (N);
>>> + if (sad != SAD)
>>> + abort ();
>>> +
>>> + return 0;
>>> +}
>>> +
>>> +/* { dg-final { scan-tree-dump-times "vect_recog_sad_pattern: detected" 1 "vect" } } */
>>> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
>>> +/* { dg-final { cleanup-tree-dump "vect" } } */
>>> +
>>> diff --git a/gcc/tree-cfg.c b/gcc/tree-cfg.c
>>> index 8b66791..d689cac 100644
>>> --- a/gcc/tree-cfg.c
>>> +++ b/gcc/tree-cfg.c
>>> @@ -3797,6 +3797,7 @@ verify_gimple_assign_ternary (gimple stmt)
>>> return false;
>>>
>>> case DOT_PROD_EXPR:
>>> + case SAD_EXPR:
>>> case REALIGN_LOAD_EXPR:
>>> /* FIXME. */
>>> return false;
>>> diff --git a/gcc/tree-inline.c b/gcc/tree-inline.c
>>> index 2221b9c..44261a3 100644
>>> --- a/gcc/tree-inline.c
>>> +++ b/gcc/tree-inline.c
>>> @@ -3601,6 +3601,7 @@ estimate_operator_cost (enum tree_code code, eni_weights *weights,
>>> case WIDEN_SUM_EXPR:
>>> case WIDEN_MULT_EXPR:
>>> case DOT_PROD_EXPR:
>>> + case SAD_EXPR:
>>> case WIDEN_MULT_PLUS_EXPR:
>>> case WIDEN_MULT_MINUS_EXPR:
>>> case WIDEN_LSHIFT_EXPR:
>>> diff --git a/gcc/tree-ssa-operands.c b/gcc/tree-ssa-operands.c
>>> index 603f797..393efc3 100644
>>> --- a/gcc/tree-ssa-operands.c
>>> +++ b/gcc/tree-ssa-operands.c
>>> @@ -854,6 +854,7 @@ get_expr_operands (gimple stmt, tree *expr_p, int flags)
>>> }
>>>
>>> case DOT_PROD_EXPR:
>>> + case SAD_EXPR:
>>> case REALIGN_LOAD_EXPR:
>>> case WIDEN_MULT_PLUS_EXPR:
>>> case WIDEN_MULT_MINUS_EXPR:
>>> diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
>>> index 638b981..89aa8c7 100644
>>> --- a/gcc/tree-vect-loop.c
>>> +++ b/gcc/tree-vect-loop.c
>>> @@ -3607,6 +3607,7 @@ get_initial_def_for_reduction (gimple stmt, tree init_val,
>>> {
>>> case WIDEN_SUM_EXPR:
>>> case DOT_PROD_EXPR:
>>> + case SAD_EXPR:
>>> case PLUS_EXPR:
>>> case MINUS_EXPR:
>>> case BIT_IOR_EXPR:
>>> diff --git a/gcc/tree-vect-patterns.c b/gcc/tree-vect-patterns.c
>>> index 0a4e812..7919449 100644
>>> --- a/gcc/tree-vect-patterns.c
>>> +++ b/gcc/tree-vect-patterns.c
>>> @@ -45,6 +45,8 @@ static gimple vect_recog_widen_mult_pattern (vec<gimple> *, tree *,
>>> tree *);
>>> static gimple vect_recog_dot_prod_pattern (vec<gimple> *, tree *,
>>> tree *);
>>> +static gimple vect_recog_sad_pattern (vec<gimple> *, tree *,
>>> + tree *);
>>> static gimple vect_recog_pow_pattern (vec<gimple> *, tree *, tree *);
>>> static gimple vect_recog_over_widening_pattern (vec<gimple> *, tree *,
>>> tree *);
>>> @@ -62,6 +64,7 @@ static vect_recog_func_ptr vect_vect_recog_func_ptrs[NUM_PATTERNS] = {
>>> vect_recog_widen_mult_pattern,
>>> vect_recog_widen_sum_pattern,
>>> vect_recog_dot_prod_pattern,
>>> + vect_recog_sad_pattern,
>>> vect_recog_pow_pattern,
>>> vect_recog_widen_shift_pattern,
>>> vect_recog_over_widening_pattern,
>>> @@ -140,9 +143,8 @@ vect_single_imm_use (gimple def_stmt)
>>> }
>>>
>>> /* Check whether NAME, an ssa-name used in USE_STMT,
>>> - is a result of a type promotion or demotion, such that:
>>> + is a result of a type promotion, such that:
>>> DEF_STMT: NAME = NOP (name0)
>>> - where the type of name0 (ORIG_TYPE) is smaller/bigger than the type of NAME.
>>> If CHECK_SIGN is TRUE, check that either both types are signed or both are
>>> unsigned. */
>>>
>>> @@ -189,10 +191,8 @@ type_conversion_p (tree name, gimple use_stmt,
>>> bool check_sign,
>>>
>>> if (TYPE_PRECISION (type) >= (TYPE_PRECISION (*orig_type) * 2))
>>> *promotion = true;
>>> - else if (TYPE_PRECISION (*orig_type) >= (TYPE_PRECISION (type) * 2))
>>> - *promotion = false;
>>> else
>>> - return false;
>>> + *promotion = false;
>>>
>>> if (!vect_is_simple_use (oprnd0, *def_stmt, loop_vinfo,
>>> bb_vinfo, &dummy_gimple, &dummy, &dt))
>>> @@ -433,6 +433,242 @@ vect_recog_dot_prod_pattern (vec<gimple> *stmts, tree *type_in,
>>> }
>>>
>>>
>>> +/* Function vect_recog_sad_pattern
>>> +
>>> + Try to find the following Sum of Absolute Difference (SAD) pattern:
>>> +
>>> + unsigned type x_t, y_t;
>>> + signed TYPE1 diff, abs_diff;
>>> + TYPE2 sum = init;
>>> + loop:
>>> + sum_0 = phi <init, sum_1>
>>> + S1 x_t = ...
>>> + S2 y_t = ...
>>> + S3 x_T = (TYPE1) x_t;
>>> + S4 y_T = (TYPE1) y_t;
>>> + S5 diff = x_T - y_T;
>>> + S6 abs_diff = ABS_EXPR <diff>;
>>> + [S7 abs_diff = (TYPE2) abs_diff; #optional]
>>> + S8 sum_1 = abs_diff + sum_0;
>>> +
>>> + where 'TYPE1' is at least double the size of type 'type', and 'TYPE2' is the
>>> + same size as 'TYPE1' or bigger. This is a special case of a reduction
>>> + computation.
>>> +
>>> + Input:
>>> +
>>> + * STMTS: Contains a stmt from which the pattern search begins. In the
>>> + example, when this function is called with S8, the pattern
>>> + {S3,S4,S5,S6,S7,S8} will be detected.
>>> +
>>> + Output:
>>> +
>>> + * TYPE_IN: The type of the input arguments to the pattern.
>>> +
>>> + * TYPE_OUT: The type of the output of this pattern.
>>> +
>>> + * Return value: A new stmt that will be used to replace the sequence of
>>> + stmts that constitute the pattern. In this case it will be:
>>> + SAD_EXPR <x_t, y_t, sum_0>
>>> + */
>>> +
>>> +static gimple
>>> +vect_recog_sad_pattern (vec<gimple> *stmts, tree *type_in,
>>> + tree *type_out)
>>> +{
>>> + gimple last_stmt = (*stmts)[0];
>>> + tree sad_oprnd0, sad_oprnd1;
>>> + stmt_vec_info stmt_vinfo = vinfo_for_stmt (last_stmt);
>>> + tree half_type;
>>> + loop_vec_info loop_info = STMT_VINFO_LOOP_VINFO (stmt_vinfo);
>>> + struct loop *loop;
>>> + bool promotion;
>>> +
>>> + if (!loop_info)
>>> + return NULL;
>>> +
>>> + loop = LOOP_VINFO_LOOP (loop_info);
>>> +
>>> + if (!is_gimple_assign (last_stmt))
>>> + return NULL;
>>> +
>>> + tree sum_type = gimple_expr_type (last_stmt);
>>> +
>>> + /* Look for the following pattern
>>> + DX = (TYPE1) X;
>>> + DY = (TYPE1) Y;
>>> + DDIFF = DX - DY;
>>> + DAD = ABS_EXPR <DDIFF>;
>>> + [DAD = (TYPE2) DAD; #optional]
>>> + sum_1 = DAD + sum_0;
>>> + In which
>>> + - DX is at least double the size of X
>>> + - DY is at least double the size of Y
>>> + - DX, DY, DDIFF, DAD all have the same type
>>> + - sum is the same size as DAD or bigger
>>> + - sum has been recognized as a reduction variable.
>>> +
>>> + This is equivalent to:
>>> + DDIFF = X w- Y; #widen sub
>>> + DAD = ABS_EXPR <DDIFF>;
>>> + sum_1 = DAD w+ sum_0; #widen summation
>>> + or
>>> + DDIFF = X w- Y; #widen sub
>>> + DAD = ABS_EXPR <DDIFF>;
>>> + sum_1 = DAD + sum_0; #summation
>>> + */
>>> +
>>> + /* Starting from LAST_STMT, follow the defs of its uses in search
>>> + of the above pattern. */
>>> +
>>> + if (gimple_assign_rhs_code (last_stmt) != PLUS_EXPR)
>>> + return NULL;
>>> +
>>> + tree plus_oprnd0, plus_oprnd1;
>>> +
>>> + if (STMT_VINFO_IN_PATTERN_P (stmt_vinfo))
>>> + {
>>> + /* Has been detected as widening-summation? */
>>> +
>>> + gimple stmt = STMT_VINFO_RELATED_STMT (stmt_vinfo);
>>> + sum_type = gimple_expr_type (stmt);
>>> + if (gimple_assign_rhs_code (stmt) != WIDEN_SUM_EXPR)
>>> + return NULL;
>>> + plus_oprnd0 = gimple_assign_rhs1 (stmt);
>>> + plus_oprnd1 = gimple_assign_rhs2 (stmt);
>>> + half_type = TREE_TYPE (plus_oprnd0);
>>> + }
>>> + else
>>> + {
>>> + gimple def_stmt;
>>> +
>>> + if (STMT_VINFO_DEF_TYPE (stmt_vinfo) != vect_reduction_def)
>>> + return NULL;
>>> + plus_oprnd0 = gimple_assign_rhs1 (last_stmt);
>>> + plus_oprnd1 = gimple_assign_rhs2 (last_stmt);
>>> + if (!types_compatible_p (TREE_TYPE (plus_oprnd0), sum_type)
>>> + || !types_compatible_p (TREE_TYPE (plus_oprnd1), sum_type))
>>> + return NULL;
>>> +
>>> + /* The type conversion could be promotion, demotion,
>>> + or just signed -> unsigned. */
>>> + if (type_conversion_p (plus_oprnd0, last_stmt, false,
>>> + &half_type, &def_stmt, &promotion))
>>> + plus_oprnd0 = gimple_assign_rhs1 (def_stmt);
>>> + else
>>> + half_type = sum_type;
>>> + }
>>> +
>>> + /* So far so good. Since last_stmt was detected as a (summation) reduction,
>>> + we know that plus_oprnd1 is the reduction variable (defined by a loop-header
>>> + phi), and plus_oprnd0 is an ssa-name defined by a stmt in the loop body.
>>> + Then check that plus_oprnd0 is defined by an abs_expr */
>>> +
>>> + if (TREE_CODE (plus_oprnd0) != SSA_NAME)
>>> + return NULL;
>>> +
>>> + tree abs_type = half_type;
>>> + gimple abs_stmt = SSA_NAME_DEF_STMT (plus_oprnd0);
>>> +
>>> + /* It cannot be the sad pattern if the abs_stmt is outside the loop. */
>>> + if (!gimple_bb (abs_stmt) || !flow_bb_inside_loop_p (loop, gimple_bb (abs_stmt)))
>>> + return NULL;
>>> +
>>> + /* FORNOW. Can continue analyzing the def-use chain when this stmt in a phi
>>> + inside the loop (in case we are analyzing an outer-loop). */
>>> + if (!is_gimple_assign (abs_stmt))
>>> + return NULL;
>>> +
>>> + stmt_vec_info abs_stmt_vinfo = vinfo_for_stmt (abs_stmt);
>>> + gcc_assert (abs_stmt_vinfo);
>>> + if (STMT_VINFO_DEF_TYPE (abs_stmt_vinfo) != vect_internal_def)
>>> + return NULL;
>>> + if (gimple_assign_rhs_code (abs_stmt) != ABS_EXPR)
>>> + return NULL;
>>> +
>>> + tree abs_oprnd = gimple_assign_rhs1 (abs_stmt);
>>> + if (!types_compatible_p (TREE_TYPE (abs_oprnd), abs_type))
>>> + return NULL;
>>> + if (TYPE_UNSIGNED (abs_type))
>>> + return NULL;
>>> +
>>> + /* We then detect if the operand of abs_expr is defined by a minus_expr. */
>>> +
>>> + if (TREE_CODE (abs_oprnd) != SSA_NAME)
>>> + return NULL;
>>> +
>>> + gimple diff_stmt = SSA_NAME_DEF_STMT (abs_oprnd);
>>> +
>>> + /* It cannot be the sad pattern if the diff_stmt is outside the loop. */
>>> + if (!gimple_bb (diff_stmt)
>>> + || !flow_bb_inside_loop_p (loop, gimple_bb (diff_stmt)))
>>> + return NULL;
>>> +
>>> + /* FORNOW. Can continue analyzing the def-use chain when this stmt in a phi
>>> + inside the loop (in case we are analyzing an outer-loop). */
>>> + if (!is_gimple_assign (diff_stmt))
>>> + return NULL;
>>> +
>>> + stmt_vec_info diff_stmt_vinfo = vinfo_for_stmt (diff_stmt);
>>> + gcc_assert (diff_stmt_vinfo);
>>> + if (STMT_VINFO_DEF_TYPE (diff_stmt_vinfo) != vect_internal_def)
>>> + return NULL;
>>> + if (gimple_assign_rhs_code (diff_stmt) != MINUS_EXPR)
>>> + return NULL;
>>> +
>>> + tree half_type0, half_type1;
>>> + gimple def_stmt;
>>> +
>>> + tree minus_oprnd0 = gimple_assign_rhs1 (diff_stmt);
>>> + tree minus_oprnd1 = gimple_assign_rhs2 (diff_stmt);
>>> +
>>> + if (!types_compatible_p (TREE_TYPE (minus_oprnd0), abs_type)
>>> + || !types_compatible_p (TREE_TYPE (minus_oprnd1), abs_type))
>>> + return NULL;
>>> + if (!type_conversion_p (minus_oprnd0, diff_stmt, false,
>>> + &half_type0, &def_stmt, &promotion)
>>> + || !promotion)
>>> + return NULL;
>>> + sad_oprnd0 = gimple_assign_rhs1 (def_stmt);
>>> +
>>> + if (!type_conversion_p (minus_oprnd1, diff_stmt, false,
>>> + &half_type1, &def_stmt, &promotion)
>>> + || !promotion)
>>> + return NULL;
>>> + sad_oprnd1 = gimple_assign_rhs1 (def_stmt);
>>> +
>>> + if (!types_compatible_p (half_type0, half_type1))
>>> + return NULL;
>>> + if (!TYPE_UNSIGNED (half_type0))
>>> + return NULL;
>>> + if (TYPE_PRECISION (abs_type) < TYPE_PRECISION (half_type0) * 2
>>> + || TYPE_PRECISION (sum_type) < TYPE_PRECISION (half_type0) * 2)
>>> + return NULL;
>>> +
>>> + *type_in = TREE_TYPE (sad_oprnd0);
>>> + *type_out = sum_type;
>>> +
>>> + /* Pattern detected. Create a stmt to be used to replace the pattern: */
>>> + tree var = vect_recog_temp_ssa_var (sum_type, NULL);
>>> + gimple pattern_stmt = gimple_build_assign_with_ops
>>> + (SAD_EXPR, var, sad_oprnd0, sad_oprnd1, plus_oprnd1);
>>> +
>>> + if (dump_enabled_p ())
>>> + {
>>> + dump_printf_loc (MSG_NOTE, vect_location,
>>> + "vect_recog_sad_pattern: detected: ");
>>> + dump_gimple_stmt (MSG_NOTE, TDF_SLIM, pattern_stmt, 0);
>>> + dump_printf (MSG_NOTE, "\n");
>>> + }
>>> +
>>> + /* We don't allow changing the order of the computation in the inner-loop
>>> + when doing outer-loop vectorization. */
>>> + gcc_assert (!nested_in_vect_loop_p (loop, last_stmt));
>>> +
>>> + return pattern_stmt;
>>> +}
>>> +
>>> +
>>> /* Handle widening operation by a constant. At the moment we support MULT_EXPR
>>> and LSHIFT_EXPR.
>>>
>>> diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
>>> index 8b7b345..0aac75b 100644
>>> --- a/gcc/tree-vectorizer.h
>>> +++ b/gcc/tree-vectorizer.h
>>> @@ -1044,7 +1044,7 @@ extern void vect_slp_transform_bb (basic_block);
>>> Additional pattern recognition functions can (and will) be added
>>> in the future. */
>>> typedef gimple (* vect_recog_func_ptr) (vec<gimple> *, tree *, tree *);
>>> -#define NUM_PATTERNS 11
>>> +#define NUM_PATTERNS 12
>>> void vect_pattern_recog (loop_vec_info, bb_vec_info);
>>>
>>> /* In tree-vectorizer.c. */
>>> diff --git a/gcc/tree.def b/gcc/tree.def
>>> index 88c850a..31a3b64 100644
>>> --- a/gcc/tree.def
>>> +++ b/gcc/tree.def
>>> @@ -1146,6 +1146,15 @@ DEFTREECODE (REDUC_PLUS_EXPR, "reduc_plus_expr", tcc_unary, 1)
>>> arg3 = WIDEN_SUM_EXPR (tmp, arg3); */
>>> DEFTREECODE (DOT_PROD_EXPR, "dot_prod_expr", tcc_expression, 3)
>>>
>>> +/* Widening sad (sum of absolute differences).
>>> + The first two arguments are of type t1, which should be an unsigned integer type.
>>> + The third argument and the result are of type t2, such that t2 is at least
>>> + twice the size of t1. SAD_EXPR(arg1,arg2,arg3) is equivalent to:
>>> + tmp1 = WIDEN_MINUS_EXPR (arg1, arg2);
>>> + tmp2 = ABS_EXPR (tmp1);
>>> + arg3 = PLUS_EXPR (tmp2, arg3); */
>>> +DEFTREECODE (SAD_EXPR, "sad_expr", tcc_expression, 3)
>>> +
>>> /* Widening summation.
>>> The first argument is of type t1.
>>> The second argument is of type t2, such that t2 is at least twice
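As a closing note on the i386 expanders quoted above: sadv16qi composes psadbw with a lane-widening move and a vector add. Its net effect can be emulated in scalar C as below. This is a sketch of the semantics under the assumption that the convert_move here amounts to a little-endian bit-reinterpretation of V2DI as V4SI (safe in practice because each psadbw sum fits in 16 bits, so the upper halves are zero); it is not the rtl expansion itself:

```c
#include <stdint.h>

/* Emulate SSE2 psadbw: each 8-byte half of the 16-byte inputs
   produces one 64-bit sum of absolute differences.  */
static void
psadbw_emul (const uint8_t a[16], const uint8_t b[16], uint64_t out[2])
{
  int half, i;

  for (half = 0; half < 2; half++)
    {
      uint64_t s = 0;

      for (i = 0; i < 8; i++)
        {
          int d = a[half * 8 + i] - b[half * 8 + i];
          s += d < 0 ? -d : d;
        }
      out[half] = s;
    }
}

/* sadv16qi semantics: acc (V4SI) += psadbw result viewed as V4SI.
   On little-endian, lanes 0 and 2 carry the two sums and lanes 1
   and 3 stay zero.  */
static void
sadv16qi_emul (uint32_t acc[4], const uint8_t a[16], const uint8_t b[16])
{
  uint64_t t1[2];

  psadbw_emul (a, b, t1);
  acc[0] += (uint32_t) t1[0];
  acc[2] += (uint32_t) t1[1];
}
```

A final horizontal sum of the four accumulator lanes then yields the scalar result, as with other vectorized reductions.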