[PATCH] PR90424 - lowpart vector set recognition

Richard Biener rguenther@suse.de
Tue May 14 14:02:00 GMT 2019


On Tue, 14 May 2019, Richard Sandiford wrote:

> Richard Biener <rguenther@suse.de> writes:
> > The following makes SSA rewrite (update-address-taken) recognize
> > sets of aligned sub-vectors in aligned position
> > (v2qi into v16qi, but esp. v8qi into v16qi).  It uses the
> > BIT_INSERT_EXPR support for this, enabling that for vector
> > typed values.  This makes us turn for example
> >
> > typedef unsigned char v16qi __attribute__((vector_size(16)));
> > v16qi load (const void *p)
> > {
> >   v16qi r;
> >   __builtin_memcpy (&r, p, 8);
> >   return r;
> > }
> >
> > into the following
> >
> > load (const void * p)
> > {
> >   v16qi r;
> >   long unsigned int _3;
> >   v16qi _5;
> >   vector(8) unsigned char _7;
> >
> >   <bb 2> :
> >   _3 = MEM[(char * {ref-all})p_2(D)];
> >   _7 = VIEW_CONVERT_EXPR<vector(8) unsigned char>(_3);
> >   r_9 = BIT_INSERT_EXPR <r_8(D), _7, 0 (64 bits)>;
> >   _5 = r_9;
> >   return _5;
> >
> > this isn't yet nicely expanded since the BIT_INSERT_EXPR
> > expansion simply goes through store_bit_field and there's
> > no vector-mode vec_set.
> >
> > Similar as to the single-element insert SSA rewrite already
> > handles the transform is conditional on the involved
> > vector types having non-BLKmode.  This is somewhat bad
> > since the transform is supposed to enable SSA optimizations
> > by rewriting memory vectors into SSA form.  Since splitting
> > of larger generic vectors happens very much later only
> > this pessimizes their use.  But the BIT_INSERT_EXPR
> > expansion doesn't cope with BLKmode entities (source or
> > destination).
> >
> > Extending BIT_INSERT_EXPR this way seems natural given
> > the support of CONSTRUCTORs with smaller vectors.
> > BIT_FIELD_REF isn't particularly restricted so can be
> > used to extract sub-vectors as well.
> >
> > Code generation is as bad as before (RTL expansion eventually
> > spills) but SSA optimizations are enabled on less trivial
> > testcases.
> >
> > Boostrap / regtest running on x86_64-unknown-linux-gnu.
> >
> > Comments?
> >
> > Richard.
> >
> > 2019-05-14  Richard Biener  <rguenther@suse.de>
> >
> > 	PR tree-optimization/90424
> > 	* tree-ssa.c (non_rewritable_lvalue_p): Handle inserts from
> > 	aligned subvectors.
> > 	(execute_update_addresses_taken): Likewise.
> > 	* tree-cfg.c (verify_gimple_assign_ternary): Likewise.
> >
> > 	* g++.target/i386/pr90424-1.C: New testcase.
> > 	* g++.target/i386/pr90424-2.C: Likewise.
> >
> > Index: gcc/tree-ssa.c
> > ===================================================================
> > --- gcc/tree-ssa.c	(revision 271155)
> > +++ gcc/tree-ssa.c	(working copy)
> > @@ -1521,14 +1521,28 @@ non_rewritable_lvalue_p (tree lhs)
> >        if (DECL_P (decl)
> >  	  && VECTOR_TYPE_P (TREE_TYPE (decl))
> >  	  && TYPE_MODE (TREE_TYPE (decl)) != BLKmode
> > -	  && operand_equal_p (TYPE_SIZE_UNIT (TREE_TYPE (lhs)),
> > -			      TYPE_SIZE_UNIT (TREE_TYPE (TREE_TYPE (decl))), 0)
> > +	  && multiple_of_p (sizetype,
> > +			    TYPE_SIZE_UNIT (TREE_TYPE (decl)),
> > +			    TYPE_SIZE_UNIT (TREE_TYPE (lhs)))
> >  	  && known_ge (mem_ref_offset (lhs), 0)
> >  	  && known_gt (wi::to_poly_offset (TYPE_SIZE_UNIT (TREE_TYPE (decl))),
> >  		       mem_ref_offset (lhs))
> >  	  && multiple_of_p (sizetype, TREE_OPERAND (lhs, 1),
> >  			    TYPE_SIZE_UNIT (TREE_TYPE (lhs))))
> > -	return false;
> > +	{
> > +	  if (operand_equal_p (TYPE_SIZE_UNIT (TREE_TYPE (lhs)),
> > +			       TYPE_SIZE_UNIT (TREE_TYPE (TREE_TYPE (decl))),
> > +			       0))
> > +	    return false;
> > +	  /* For sub-vector inserts the insert vector mode has to be
> > +	     supported.  */
> > +	  tree vtype = build_vector_type
> > +	      (TREE_TYPE (TREE_TYPE (decl)),
> > +	       tree_to_uhwi (TYPE_SIZE (TREE_TYPE (lhs)))
> > +	       / tree_to_uhwi (TYPE_SIZE (TREE_TYPE (TREE_TYPE (decl)))));
> 
> AFAICT nothing guarantees tree_fits_uhwi_p for the lhs, so this isn't
> poly-int clean.  Is there a guarantee that lhs is a multiple of the
> element size even for integers?  Or are we just relying on a vector
> of 0 elements being rejected?

That it is a multiple of the element size is tested indirectly by

> > +     && multiple_of_p (sizetype, 
> > +                       TYPE_SIZE_UNIT (TREE_TYPE (decl)),
> > +                       TYPE_SIZE_UNIT (TREE_TYPE (lhs)))

given 'decl' has vector type.  That also guarantees non-zero size?

But yes, I guess TYPE_SIZE_UNIT might be a poly-int.

> Maybe something like:
> 
> 	  tree elt_type = TREE_TYPE (TREE_TYPE (decl));
> 	  unsigned int elt_bits = tree_to_uhwi (TYPE_SIZE (elt_type));
> 	  poly_uint64 lhs_bits, nelts;
> 	  if (poly_int_tree_p (TYPE_SIZE (TREE_TYPE (lhs)), &lhs_bits)
> 	      && multiple_p (lhs_bits, elt_bits, &nelts))
>             {
> 	      tree vtype = build_vector_type (elt_type, nelts);

This should work.

Of course it even more so makes duplicating all the bits in
two places ugly :/

I guess I might take it as opportunity to refactor the pass
(as excuse to not work on that SLP thing...).

Thanks,
Richard.



More information about the Gcc-patches mailing list