[RFC, vectorizer] Allow single element vector types for vector reduction operations
Richard Biener
rguenther@suse.de
Tue Sep 5 10:41:00 GMT 2017
On Tue, 5 Sep 2017, Tamar Christina wrote:
> Hi Richard,
>
> That was an really interesting analysis, thanks for the details!
>
> Would you be submitting the patch you proposed at the end as a fix?
I'm testing it currently.
Richard.
> Thanks,
> Tamar
>
> > -----Original Message-----
> > From: Richard Biener [mailto:rguenther@suse.de]
> > Sent: 05 September 2017 10:38
> > To: Andrew Pinski
> > Cc: Tamar Christina; Andreas Schwab; Jon Beniston; gcc-
> > patches@gcc.gnu.org; nd
> > Subject: Re: [RFC, vectorizer] Allow single element vector types for vector
> > reduction operations
> >
> > On Mon, 4 Sep 2017, Andrew Pinski wrote:
> >
> > > On Mon, Sep 4, 2017 at 7:28 AM, Tamar Christina
> > <Tamar.Christina@arm.com> wrote:
> > > >> > vect__5.25_58 = VIEW_CONVERT_EXPR<vector(1) long unsigned
> > > >> intD.11>(vect__4.21_65);
> > > >> > vect__5.25_57 = VIEW_CONVERT_EXPR<vector(1) long unsigned
> > > >> intD.11>(vect__4.22_63);
> > > >> > vect__5.25_56 = VIEW_CONVERT_EXPR<vector(1) long unsigned
> > > >> intD.11>(vect__4.23_61);
> > > >> > vect__5.25_55 = VIEW_CONVERT_EXPR<vector(1) long unsigned
> > > >> > intD.11>(vect__4.24_59);
> > > >> >
> > > >> > I suspect this patch will be quite bad for us performance wise as
> > > >> > it thinks it's as cheap to do all our integer operations on the
> > > >> > vector side with
> > > >> vectors of 1 element. But I'm still waiting for the perf numbers to
> > confirm.
> > > >>
> > > >> Looks like the backend advertises that it can do POPCOUNT on V1DI.
> > > >> So SLP vectorization decides it can vectorize this without unrolling.
> > > >
> > > > We don't, POPCOUNT is only defined for vector modes V8QI and V16QI,
> > > > we also don't define support For V1DI anywhere in the backend, we do
> > > > however say we support V1DF, but removing That doesn't cause the ICE
> > to go away.
> > > >
> > > >> Vectorization with V2DI is rejected:
> > > >>
> > > >> /tmp/trunk/gcc/testsuite/gcc.dg/vect/pr68577.c:18:3: note: function
> > > >> is not vectorizable.
> > > >> /tmp/trunk/gcc/testsuite/gcc.dg/vect/pr68577.c:18:3: note: not
> > vectorized:
> > > >> relevant stmt not supported: _8 = __builtin_popcountl (_5); ...
> > > >> /tmp/trunk/gcc/testsuite/gcc.dg/vect/pr68577.c:18:3: note: *****
> > > >> Re-trying analysis with vector size 8
> > > >>
> > > >> and that now succeeds (it probably didn't succeed before the patch).
> > > >
> > > > In the .optimized file, I see it vectorised it to
> > > >
> > > > vect__5.25_58 = VIEW_CONVERT_EXPR<vector(1) long unsigned
> > intD.11>(vect__4.21_65);
> > > > vect__5.25_57 = VIEW_CONVERT_EXPR<vector(1) long unsigned
> > intD.11>(vect__4.22_63);
> > > > vect__5.25_56 = VIEW_CONVERT_EXPR<vector(1) long unsigned
> > intD.11>(vect__4.23_61);
> > > > vect__5.25_55 = VIEW_CONVERT_EXPR<vector(1) long unsigned
> > intD.11>(vect__4.24_59);
> > > > _54 = POPCOUNT (vect__5.25_58);
> > > > _53 = POPCOUNT (vect__5.25_57);
> > > >
> > > > Which is something we just don't have a pattern for. Before this patch, it
> > was rejecting "long unsigned int"
> > > > With this patch is somehow thinks we support an integer vector of 1
> > > > element, even though 1) we don't have an optab Defined for this
> > operation for POPCOUNT (or at all in aarch64 as far as I can tell), and 2) we
> > don't have it in our supported list of vector modes.
> > >
> > > Here are the two popcount optab aarch64 has:
> > > (define_mode_iterator GPI [SI DI])
> > > (define_expand "popcount<mode>2"
> > > [(match_operand:GPI 0 "register_operand")
> > > (match_operand:GPI 1 "register_operand")]
> > >
> > >
> > > (define_insn "popcount<mode>2"
> > > (define_mode_iterator VB [V8QI V16QI])
> > > [(set (match_operand:VB 0 "register_operand" "=w")
> > > (popcount:VB (match_operand:VB 1 "register_operand" "w")))]
> > >
> > > As you can see we only define popcount optab for SI, DI, V8QI and
> > > V16QI. (note SI and DI uses the V8QI and V16QI during the expansion
> > > but that is a different story).
> > >
> > > Maybe somehow the vectorizer is thinking V1DI and DI are
> > > interchangeable in some places.
> >
> > We ask
> >
> > Breakpoint 5, vectorizable_internal_function
> > (cfn=CFN_BUILT_IN_POPCOUNTL,
> > fndecl=<function_decl 0x7ffff694b500 __builtin_popcountl>,
> > vectype_out=<vector_type 0x7ffff69789d8>,
> > vectype_in=<vector_type 0x7ffff697a690>)
> > at /tmp/trunk/gcc/tree-vect-stmts.c:1666
> > 1666 if (internal_fn_p (cfn))
> > (gdb) p debug_generic_expr (vectype_out)
> > vector(2) int
> > $10 = void
> > (gdb) p debug_generic_expr (vectype_in)
> > vector(1) long unsigned int
> > $11 = void
> > (gdb) fin
> > Run till exit from #0 vectorizable_internal_function (
> > cfn=CFN_BUILT_IN_POPCOUNTL,
> > fndecl=<function_decl 0x7ffff694b500 __builtin_popcountl>,
> > vectype_out=<vector_type 0x7ffff69789d8>,
> > vectype_in=<vector_type 0x7ffff697a690>)
> > at /tmp/trunk/gcc/tree-vect-stmts.c:1666
> > 0x0000000001206afc in vectorizable_call (gs=<gimple_call 0x7ffff7fee7e0>,
> > gsi=0x0, vec_stmt=0x0, slp_node=0x2451290)
> > at /tmp/trunk/gcc/tree-vect-stmts.c:2762
> > 2762 ifn = vectorizable_internal_function (cfn, callee,
> > vectype_out,
> > Value returned is $12 = IFN_POPCOUNT
> >
> > so somehow direct_internal_fn_supported_p says true for a POPCOUNTL
> > V1DI -> V2SI. I'd argue the question is odd already but the ultimative answer
> > is cleary wrong ;)
> >
> > We have
> >
> > (gdb) p info
> > $22 = (const direct_internal_fn_info &) @0x19b8e0c: {type0 = 0, type1 = 0,
> > vectorizable = 1}
> >
> > so require vectype_in as vectype_out which means we ask for V1DI -> V1DI
> > (and the vectorizer compensates for the demotion). But the mode of V1DI is
> > actually DI (because the target doesn't support V1DI).
> > Thus we end up with asking for popcount with DImode which is available.
> >
> > Note that this V1DI vector type having DImode is desirable for Jons case as
> > far as I understand. DImode gets used because aarch64 advertises generic
> > vectorization support with word_mode.
> >
> > Things may get tricky later when we look for VEC_PACK_TRUNC of V1DI,
> > V1DI -> V2SI but the vec_pack_trunc_optab advertises DImode support via
> > the VDN iterator (maybe expecting this only in case no V2SI support is
> > available).
> >
> > So in the end the vectorizer works as expected ;)
> >
> > What we don't seem to handle is a single-element vector typed, DImode
> > constructor with a single DImode element.
> >
> > We call store_bit_field with fieldmode DI but a value with V2SI, I guess things
> > go downhill from there. The SSA exp we store was DImode. Ah.
> >
> > <ssa_name 0x7ffff66c0438
> > type <integer_type 0x7ffff68a07e0 long unsigned int public unsigned DI
> > size <integer_cst 0x7ffff6891a98 constant 64>
> > unit-size <integer_cst 0x7ffff6891ab0 constant 8>
> > align:64 warn_if_not_align:0 symtab:0 alias-set 3 canonical-type
> > 0x7ffff68a07e0 precision:64 min <integer_cst 0x7ffff6891d68 0> max
> > <integer_cst 0x7ffff6893540 18446744073709551615>
> > pointer_to_this <pointer_type 0x7ffff68b1c78>>
> > visited
> > def_stmt _62 = VEC_PACK_TRUNC_EXPR <_67, _64>;
> >
> > I suppose _that_'s already somewhat bogus. vector lowering creates it for
> > some reason:
> >
> > - vect__8.26_52 = VEC_PACK_TRUNC_EXPR <_54, _53>;
> > + _67 = BIT_FIELD_REF <_54, 64, 0>;
> > + _64 = BIT_FIELD_REF <_53, 64, 0>;
> > + _62 = VEC_PACK_TRUNC_EXPR <_67, _64>;
> > + _60 = {_62};
> > + _48 = VIEW_CONVERT_EXPR<vector(2) int>(_60);
> > + vect__8.26_52 = _48;
> >
> > which happens because
> >
> > static tree
> > get_compute_type (enum tree_code code, optab op, tree type) {
> > /* For very wide vectors, try using a smaller vector mode. */
> > tree compute_type = type;
> > if (op
> > && (!VECTOR_MODE_P (TYPE_MODE (type))
> > || optab_handler (op, TYPE_MODE (type)) == CODE_FOR_nothing))
> >
> > and/or
> >
> > if (compute_type == NULL_TREE)
> > compute_type = get_compute_type (code, op, type);
> > if (compute_type == type)
> > return;
> >
> > note that lowering sth like VEC_PACK_TRUNC_EXPR into scalars is pointless -
> > at least in the way the lowering code does.
> >
> > Index: gcc/tree-vect-generic.c
> > ==========================================================
> > =========
> > --- gcc/tree-vect-generic.c (revision 251642)
> > +++ gcc/tree-vect-generic.c (working copy)
> > @@ -1439,7 +1439,7 @@ get_compute_type (enum tree_code code, o
> > /* For very wide vectors, try using a smaller vector mode. */
> > tree compute_type = type;
> > if (op
> > - && (!VECTOR_MODE_P (TYPE_MODE (type))
> > + && (TYPE_MODE (type) == BLKmode
> > || optab_handler (op, TYPE_MODE (type)) == CODE_FOR_nothing))
> > {
> > tree vector_compute_type
> > @@ -1459,7 +1459,7 @@ get_compute_type (enum tree_code code, o
> > if (compute_type == type)
> > {
> > machine_mode compute_mode = TYPE_MODE (compute_type);
> > - if (VECTOR_MODE_P (compute_mode))
> > + if (compute_mode != BLKmode)
> > {
> > if (op && optab_handler (op, compute_mode) !=
> > CODE_FOR_nothing)
> > return compute_type;
> >
> > fixes this and we pass the testcase again. Note we vectorize that way even
> > with the cost model enabled so it might show us that the cost model fails to
> > account for some costs:
> >
> > /tmp/trunk/gcc/testsuite/gcc.dg/vect/pr68577.c:18:3: note: Cost model
> > analysis:
> > Vector inside of loop cost: 8
> > Vector prologue cost: 2
> > Vector epilogue cost: 0
> > Scalar iteration cost: 16
> > Scalar outside cost: 0
> > Vector outside cost: 2
> > prologue iterations: 0
> > epilogue iterations: 0
> > Calculated minimum iters for profitability: 1
> >
> > SLP costing seems to fail to account for promotion/demotion costs it seems
> > accounting for that still doesn't push it over the edge.
> >
> > Richard.
> >
> > > Thanks,
> > > Andrew
> > >
> > >
> > > >
> > > > So I don't quite understand how we end up with this expression.
> > > >
> > > > Regards,
> > > > Tamar
> > > >
> > > >>
> > > >> Richard.
> > > >>
> > > >> > ________________________________________
> > > >> > From: gcc-patches-owner@gcc.gnu.org
> > > >> > <gcc-patches-owner@gcc.gnu.org>
> > > >> on
> > > >> > behalf of Andreas Schwab <schwab@linux-m68k.org>
> > > >> > Sent: Saturday, September 2, 2017 10:58 PM
> > > >> > To: Jon Beniston
> > > >> > Cc: 'Richard Biener'; gcc-patches@gcc.gnu.org
> > > >> > Subject: Re: [RFC, vectorizer] Allow single element vector types
> > > >> > for vector reduction operations
> > > >> >
> > > >> > On Aug 30 2017, "Jon Beniston" <jon@beniston.com> wrote:
> > > >> >
> > > >> > > gcc/
> > > >> > > 2017-08-30 Jon Beniston <jon@beniston.com>
> > > >> > > Richard Biener <rguenther@suse.de>
> > > >> > >
> > > >> > > diff --git a/gcc/tree-vect-patterns.c
> > > >> > > b/gcc/tree-vect-patterns.c index cfdb72c..5ebeac2 100644
> > > >> > > --- a/gcc/tree-vect-patterns.c
> > > >> > > +++ b/gcc/tree-vect-patterns.c
> > > >> > > @@ -4150,7 +4150,7 @@ vect_pattern_recog_1 (vect_recog_func
> > > >> *recog_func,
> > > >> > > loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
> > > >> > >
> > > >> > > if (VECTOR_BOOLEAN_TYPE_P (type_in)
> > > >> > > - || VECTOR_MODE_P (TYPE_MODE (type_in)))
> > > >> > > + || VECTOR_TYPE_P (type_in))
> > > >> > > {
> > > >> > > /* No need to check target support (already checked by the
> > pattern
> > > >> > > recognition function). */ diff --git
> > > >> > > a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c index
> > > >> > > 013fb1f..fc62efb 100644
> > > >> > > --- a/gcc/tree-vect-stmts.c
> > > >> > > +++ b/gcc/tree-vect-stmts.c
> > > >> > > @@ -9098,7 +9098,8 @@ get_vectype_for_scalar_type_and_size
> > > >> > > (tree scalar_type, unsigned size)
> > > >> > > else
> > > >> > > simd_mode = mode_for_vector (inner_mode, size / nbytes);
> > > >> > > nunits = GET_MODE_SIZE (simd_mode) / nbytes;
> > > >> > > - if (nunits <= 1)
> > > >> > > + /* NOTE: nunits == 1 is allowed to support single element vector
> > types.
> > > >> > > */
> > > >> > > + if (nunits < 1)
> > > >> > > return NULL_TREE;
> > > >> > >
> > > >> > > vectype = build_vector_type (scalar_type, nunits);
> > > >> > >
> > > >> > >
> > > >> > >
> > > >> >
> > > >> > That breaks vect/pr68577.c on aarch64.
> > > >> >
> > > >> > during RTL pass: expand
> > > >> > /opt/gcc/gcc-20170902/gcc/testsuite/gcc.dg/vect/pr68577.c: In
> > > >> > function
> > > >> 'slp_test':
> > > >> > /opt/gcc/gcc-20170902/gcc/testsuite/gcc.dg/vect/pr68577.c:20:12:
> > > >> > internal compiler error: in simplify_subreg, at
> > > >> > simplify-rtx.c:6050
> > > >> > 0xb55983 simplify_subreg(machine_mode, rtx_def*, machine_mode,
> > > >> unsigned int)
> > > >> > ../../gcc/simplify-rtx.c:6049
> > > >> > 0xb595f7 simplify_gen_subreg(machine_mode, rtx_def*,
> > > >> > machine_mode,
> > > >> unsigned int)
> > > >> > ../../gcc/simplify-rtx.c:6278
> > > >> > 0x81d277 store_bit_field_1
> > > >> > ../../gcc/expmed.c:798
> > > >> > 0x81d55f store_bit_field(rtx_def*, unsigned long, unsigned long,
> > > >> > unsigned
> > > >> long, unsigned long, machine_mode, rtx_def*, bool)
> > > >> > ../../gcc/expmed.c:1133
> > > >> > 0x840bf7 store_field
> > > >> > ../../gcc/expr.c:6950
> > > >> > 0x84792f store_constructor_field
> > > >> > ../../gcc/expr.c:6142
> > > >> > 0x83edbf store_constructor
> > > >> > ../../gcc/expr.c:6726
> > > >> > 0x840443 expand_constructor
> > > >> > ../../gcc/expr.c:8027
> > > >> > 0x82d5bf expand_expr_real_1(tree_node*, rtx_def*,
> > machine_mode,
> > > >> expand_modifier, rtx_def**, bool)
> > > >> > ../../gcc/expr.c:10133
> > > >> > 0x82eaeb expand_expr_real_1(tree_node*, rtx_def*,
> > machine_mode,
> > > >> expand_modifier, rtx_def**, bool)
> > > >> > ../../gcc/expr.c:9819
> > > >> > 0x82dadf expand_expr_real_1(tree_node*, rtx_def*,
> > machine_mode,
> > > >> expand_modifier, rtx_def**, bool)
> > > >> > ../../gcc/expr.c:10942
> > > >> > 0x82eaeb expand_expr_real_1(tree_node*, rtx_def*,
> > machine_mode,
> > > >> expand_modifier, rtx_def**, bool)
> > > >> > ../../gcc/expr.c:9819
> > > >> > 0x83d197 expand_expr
> > > >> > ../../gcc/expr.h:276
> > > >> > 0x83d197 expand_assignment(tree_node*, tree_node*, bool)
> > > >> > ../../gcc/expr.c:4971
> > > >> > 0x71e2f3 expand_gimple_stmt_1
> > > >> > ../../gcc/cfgexpand.c:3653
> > > >> > 0x71e2f3 expand_gimple_stmt
> > > >> > ../../gcc/cfgexpand.c:3751 0x721cdb
> > > >> > expand_gimple_basic_block
> > > >> > ../../gcc/cfgexpand.c:5750
> > > >> > 0x726b07 execute
> > > >> > ../../gcc/cfgexpand.c:6357
> > > >> >
> > > >> > Andreas.
> > > >> >
> > > >> > --
> > > >> > Andreas Schwab, schwab@linux-m68k.org GPG Key fingerprint = 58CA
> > > >> > 54C7 6D53 942B 1756 01D3 44D5 214B 8276
> > > >> > 4ED5 "And now for something completely different."
> > > >> >
> > > >> >
> > > >>
> > > >> --
> > > >> Richard Biener <rguenther@suse.de>
> > > >> SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham
> > > >> Norton, HRB 21284 (AG Nuernberg)
> > >
> > >
> >
> > --
> > Richard Biener <rguenther@suse.de>
> > SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton,
> > HRB 21284 (AG Nuernberg)
>
>
--
Richard Biener <rguenther@suse.de>
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)
More information about the Gcc-patches
mailing list