[PATCH] PR90424 - lowpart vector set recognition
Richard Biener
rguenther@suse.de
Tue May 14 14:02:00 GMT 2019
On Tue, 14 May 2019, Richard Sandiford wrote:
> Richard Biener <rguenther@suse.de> writes:
> > The following makes SSA rewrite (update-address-taken) recognize
> > sets of aligned sub-vectors in aligned position
> > (v2qi into v16qi, but esp. v8qi into v16qi). It uses the
> > BIT_INSERT_EXPR support for this, enabling that for vector
> > typed values. This makes us turn for example
> >
> > typedef unsigned char v16qi __attribute__((vector_size(16)));
> > v16qi load (const void *p)
> > {
> > v16qi r;
> > __builtin_memcpy (&r, p, 8);
> > return r;
> > }
> >
> > into the following
> >
> > load (const void * p)
> > {
> > v16qi r;
> > long unsigned int _3;
> > v16qi _5;
> > vector(8) unsigned char _7;
> >
> > <bb 2> :
> > _3 = MEM[(char * {ref-all})p_2(D)];
> > _7 = VIEW_CONVERT_EXPR<vector(8) unsigned char>(_3);
> > r_9 = BIT_INSERT_EXPR <r_8(D), _7, 0 (64 bits)>;
> > _5 = r_9;
> > return _5;
> >
> > this isn't yet nicely expanded since the BIT_INSERT_EXPR
> > expansion simply goes through store_bit_field and there's
> > no vector-mode vec_set.
> >
> > Similar as to the single-element insert SSA rewrite already
> > handles the transform is conditional on the involved
> > vector types having non-BLKmode. This is somewhat bad
> > since the transform is supposed to enable SSA optimizations
> > by rewriting memory vectors into SSA form. Since splitting
> > of larger generic vectors happens very much later only
> > this pessimizes their use. But the BIT_INSERT_EXPR
> > expansion doesn't cope with BLKmode entities (source or
> > destination).
> >
> > Extending BIT_INSERT_EXPR this way seems natural given
> > the support of CONSTRUCTORs with smaller vectors.
> > BIT_FIELD_REF isn't particularly restricted so can be
> > used to extract sub-vectors as well.
> >
> > Code generation is as bad as before (RTL expansion eventually
> > spills) but SSA optimizations are enabled on less trivial
> > testcases.
> >
> > Boostrap / regtest running on x86_64-unknown-linux-gnu.
> >
> > Comments?
> >
> > Richard.
> >
> > 2019-05-14 Richard Biener <rguenther@suse.de>
> >
> > PR tree-optimization/90424
> > * tree-ssa.c (non_rewritable_lvalue_p): Handle inserts from
> > aligned subvectors.
> > (execute_update_addresses_taken): Likewise.
> > * tree-cfg.c (verify_gimple_assign_ternary): Likewise.
> >
> > * g++.target/i386/pr90424-1.C: New testcase.
> > * g++.target/i386/pr90424-2.C: Likewise.
> >
> > Index: gcc/tree-ssa.c
> > ===================================================================
> > --- gcc/tree-ssa.c (revision 271155)
> > +++ gcc/tree-ssa.c (working copy)
> > @@ -1521,14 +1521,28 @@ non_rewritable_lvalue_p (tree lhs)
> > if (DECL_P (decl)
> > && VECTOR_TYPE_P (TREE_TYPE (decl))
> > && TYPE_MODE (TREE_TYPE (decl)) != BLKmode
> > - && operand_equal_p (TYPE_SIZE_UNIT (TREE_TYPE (lhs)),
> > - TYPE_SIZE_UNIT (TREE_TYPE (TREE_TYPE (decl))), 0)
> > + && multiple_of_p (sizetype,
> > + TYPE_SIZE_UNIT (TREE_TYPE (decl)),
> > + TYPE_SIZE_UNIT (TREE_TYPE (lhs)))
> > && known_ge (mem_ref_offset (lhs), 0)
> > && known_gt (wi::to_poly_offset (TYPE_SIZE_UNIT (TREE_TYPE (decl))),
> > mem_ref_offset (lhs))
> > && multiple_of_p (sizetype, TREE_OPERAND (lhs, 1),
> > TYPE_SIZE_UNIT (TREE_TYPE (lhs))))
> > - return false;
> > + {
> > + if (operand_equal_p (TYPE_SIZE_UNIT (TREE_TYPE (lhs)),
> > + TYPE_SIZE_UNIT (TREE_TYPE (TREE_TYPE (decl))),
> > + 0))
> > + return false;
> > + /* For sub-vector inserts the insert vector mode has to be
> > + supported. */
> > + tree vtype = build_vector_type
> > + (TREE_TYPE (TREE_TYPE (decl)),
> > + tree_to_uhwi (TYPE_SIZE (TREE_TYPE (lhs)))
> > + / tree_to_uhwi (TYPE_SIZE (TREE_TYPE (TREE_TYPE (decl)))));
>
> AFAICT nothing guarantees tree_fits_uhwi_p for the lhs, so this isn't
> poly-int clean. Is there a guarantee that lhs is a multiple of the
> element size even for integers? Or are we just relying on a vector
> of 0 elements being rejected?
That it is a multiple of the element size is tested indirectly by
> > + && multiple_of_p (sizetype,
> > + TYPE_SIZE_UNIT (TREE_TYPE (decl)),
> > + TYPE_SIZE_UNIT (TREE_TYPE (lhs)))
given 'decl' has vector type. That also guarantees non-zero size?
But yes, I guess TYPE_SIZE_UNIT might be a poly-int.
> Maybe something like:
>
> tree elt_type = TREE_TYPE (TREE_TYPE (decl));
> unsigned int elt_bits = tree_to_uhwi (TYPE_SIZE (elt_type));
> poly_uint64 lhs_bits, nelts;
> if (poly_int_tree_p (TYPE_SIZE (TREE_TYPE (lhs)), &lhs_bits)
> && multiple_p (lhs_bits, elt_bits, &nelts))
> {
> tree vtype = build_vector_type (elt_type, nelts);
This should work.
Of course it even more so makes duplicating all the bits in
two places ugly :/
I guess I might take it as opportunity to refactor the pass
(as excuse to not work on that SLP thing...).
Thanks,
Richard.
More information about the Gcc-patches
mailing list