This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
Re: [PATCH] Masked load/store vectorization (take 6)
- From: Richard Biener <rguenther@suse.de>
- To: Jakub Jelinek <jakub@redhat.com>
- Cc: Sergey Ostanevich <sergos.gnu@gmail.com>, Richard Henderson <rth@redhat.com>, gcc-patches@gcc.gnu.org
- Date: Fri, 6 Dec 2013 14:32:06 +0100 (CET)
- Subject: Re: [PATCH] Masked load/store vectorization (take 6)
On Fri, 6 Dec 2013, Jakub Jelinek wrote:
> On Fri, Dec 06, 2013 at 01:49:50PM +0100, Richard Biener wrote:
> > Comments inline (this patch is scarily large for this stage ...)
>
> Thanks.
>
> > > +(define_expand "maskload<mode>"
> > > + [(set (match_operand:V48_AVX2 0 "register_operand")
> > > + (unspec:V48_AVX2
> > > + [(match_operand:<sseintvecmode> 2 "register_operand")
> > > + (match_operand:V48_AVX2 1 "memory_operand")]
> > > + UNSPEC_MASKMOV))]
> > > + "TARGET_AVX")
> > > +
> > > +(define_expand "maskstore<mode>"
> > > + [(set (match_operand:V48_AVX2 0 "memory_operand")
> > > + (unspec:V48_AVX2
> > > + [(match_operand:<sseintvecmode> 2 "register_operand")
> > > + (match_operand:V48_AVX2 1 "register_operand")
> > > + (match_dup 0)]
> > > + UNSPEC_MASKMOV))]
> > > + "TARGET_AVX")
> > > +
> > > (define_insn_and_split "avx_<castmode><avxsizesuffix>_<castmode>"
> > > [(set (match_operand:AVX256MODE2P 0 "nonimmediate_operand" "=x,m")
> > > (unspec:AVX256MODE2P
> >
> > x86 maintainers should comment here (ick - unspecs)
>
> Well, the unspecs are preexisting (right now used by intrinsics only), I'm
> just adding expanders that will expand to those instructions.
>
> > > @@ -4386,16 +4396,35 @@ get_references_in_stmt (gimple stmt, vec
> > > {
> > > unsigned i, n;
> > >
> > > - op0 = gimple_call_lhs_ptr (stmt);
> > > + ref.is_read = false;
> > > + if (gimple_call_internal_p (stmt))
> > > + switch (gimple_call_internal_fn (stmt))
> > > + {
> > > + case IFN_MASK_LOAD:
> > > + ref.is_read = true;
> > > + case IFN_MASK_STORE:
> > > + ref.ref = build2 (MEM_REF,
> > > + ref.is_read
> > > + ? TREE_TYPE (gimple_call_lhs (stmt))
> > > + : TREE_TYPE (gimple_call_arg (stmt, 3)),
> > > + gimple_call_arg (stmt, 0),
> > > + gimple_call_arg (stmt, 1));
> > > + references->safe_push (ref);
> >
> > This may not be a canonical MEM_REF AFAIK, so you should
> > use fold_build2 here (if the address is &a.b the .b needs folding
>
> Ok, will try that.
>
> > into the offset). I assume the 2nd arg is always constant and
> > thus doesn't change pointer-type during propagations?
>
> Yes, it is, always created by
> ptr = build_int_cst (reference_alias_ptr_type (ref), 0);
> (and for vectorized IFN_MASK_* copied over from the non-vectorized
> IFN_MASK_* call).
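I.e., something along these lines (just a sketch, arguments taken from the
hunk above, not the actual patch):

  /* fold_build2 folds an &a.b address into the MEM_REF offset, giving a
     canonical reference.  */
  ref.ref = fold_build2 (MEM_REF,
                         ref.is_read
                         ? TREE_TYPE (gimple_call_lhs (stmt))
                         : TREE_TYPE (gimple_call_arg (stmt, 3)),
                         gimple_call_arg (stmt, 0),
                         gimple_call_arg (stmt, 1));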
>
> > > @@ -4464,7 +4493,7 @@ graphite_find_data_references_in_stmt (l
> > >
> > > FOR_EACH_VEC_ELT (references, i, ref)
> > > {
> > > - dr = create_data_ref (nest, loop, *ref->pos, stmt, ref->is_read);
> > > + dr = create_data_ref (nest, loop, ref->ref, stmt, ref->is_read);
> > > gcc_assert (dr != NULL);
> > > datarefs->safe_push (dr);
> > > }
> >
> > Interesting that you succeeded in removing the indirection
> > on ref.pos ... I remember trying that twice at least and
> > failing ;)
> >
> > You can install that as cleanup now if you split it out (so hopefully
> > no users creep back that make removing it impossible).
>
> Ok, will do.
>
> > > + /* Check whether this is a load or store. */
> > > + lhs = gimple_assign_lhs (stmt);
> > > + if (TREE_CODE (lhs) != SSA_NAME)
> > > + {
> >
> > gimple_store_p ()?
>
> Likely.
>
> > > + if (!is_gimple_val (gimple_assign_rhs1 (stmt)))
> > > + return false;
> > > + op = maskstore_optab;
> > > + ref = lhs;
> > > + }
> > > + else if (gimple_assign_load_p (stmt))
> > > + {
> > > + op = maskload_optab;
> > > + ref = gimple_assign_rhs1 (stmt);
> > > + }
> > > + else
> > > + return false;
> > > +
> > > + /* And whether REF isn't a MEM_REF with non-addressable decl. */
> > > + if (TREE_CODE (ref) == MEM_REF
> > > + && TREE_CODE (TREE_OPERAND (ref, 0)) == ADDR_EXPR
> > > + && DECL_P (TREE_OPERAND (TREE_OPERAND (ref, 0), 0))
> > > + && !TREE_ADDRESSABLE (TREE_OPERAND (TREE_OPERAND (ref, 0), 0)))
> > > + return false;
> >
> > I think that's overly conservative and not conservative enough. Just
> > use may_be_nonaddressable_p () (even though the implementation can
> > need some TLC) and make sure to set TREE_ADDRESSABLE when you
> > end up taking its address.
>
> Will try.
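Roughly (sketch only, assuming may_be_nonaddressable_p is exported where
you need it):

  if (may_be_nonaddressable_p (ref))
    return false;

and later, at the point where the address of REF is actually taken:

  tree base = get_base_address (ref);
  if (base && DECL_P (base))
    TREE_ADDRESSABLE (base) = 1;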
>
> > Please factor out the target bits into a predicate in optabs.c
> > so you can reduce the amount of includes here. You can eventually
> > re-use that from the vectorization parts.
>
> Okay.
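Something like the following in optabs.c, say (name and exact signature
purely illustrative):

  /* Return true if the target has a masked load (IS_LOAD) or masked
     store optab for vector mode MODE.  */
  bool
  can_vec_mask_load_store_p (enum machine_mode mode, bool is_load)
  {
    optab op = is_load ? maskload_optab : maskstore_optab;
    return optab_handler (op, mode) != CODE_FOR_nothing;
  }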
>
> > > @@ -1404,7 +1530,8 @@ insert_gimplified_predicates (loop_p loo
> > > basic_block bb = ifc_bbs[i];
> > > gimple_seq stmts;
> > >
> > > - if (!is_predicated (bb))
> > > + if (!is_predicated (bb)
> > > + || dominated_by_p (CDI_DOMINATORS, loop->latch, bb))
> >
> > isn't that redundant now?
>
> Will try (and read the corresponding threads and IRC about that).
>
> > > for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
> > > - if ((stmt = gsi_stmt (gsi))
> > > - && gimple_assign_single_p (stmt)
> > > - && gimple_vdef (stmt))
> > > + if ((stmt = gsi_stmt (gsi)) == NULL
> >
> > I don't think gsi_stmt can be NULL
>
> It can be if gsi_end_p is true, but that is apparently handled in the loop
> It was preexisting code anyway, but will change.
>
> > > + will be then if-converted, the new copy of the loop will not,
> > > + and the LOOP_VECTORIZED internal call will be guarding which
> > > + loop to execute. The vectorizer pass will fold this
> > > + internal call into either true or false. */
> > >
> > > static bool
> > > +version_loop_for_if_conversion (struct loop *loop, bool *do_outer)
> > > +{
> >
> > What's the do_outer parameter?
>
> That is whether it should version also the outer loop.
> Though, perhaps if we solely use the loop versioning + IFN_LOOP_VECTORIZED
> for MASK_{LOAD,STORE} and perhaps MASK_CALL, we could avoid that and simply
> live with not vectorizing those in nested loops, perhaps that occurs rarely
> enough and will not be a vectorization regression. The reason I have added
> it was when the versioning was used always for if-conversion, then without
> it we have regressed quite some tests. Perhaps the outer loop versioning
> is too costly for too unlikely case. So, do you prefer to drop that, or
> keep?
If it's easy to rip out, it looks like that would simplify the patch at this
stage, which is good.
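For reference, the versioning leaves roughly this IL shape (schematic only,
not actual dump output):

  _1 = LOOP_VECTORIZED (<num of if-converted copy>, <num of original copy>);
  if (_1 != 0)
    goto <if-converted copy>;  /* candidate for vectorization */
  else
    goto <original copy>;      /* scalar fallback */
  /* The vectorizer folds the LOOP_VECTORIZED call to true or false,
     and cfg cleanup then removes the dead copy.  */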
> >
> > This needs a comment with explaining what code you create.
>
> Ok.
>
> > >
> > > - return changed;
> > > + if (todo && version_outer_loop)
> > > + {
> > > + if (todo & TODO_update_ssa_only_virtuals)
> > > + {
> > > + update_ssa (TODO_update_ssa_only_virtuals);
> > > + todo &= ~TODO_update_ssa_only_virtuals;
> > > + }
> >
> > Btw I hate that we do update_ssa multiple times per pass per
> > function. That makes us possibly worse than O(N^2) as update_ssa computes
> > the IDF of the whole function.
> >
> > This is something your patch introduces (it's only rewriting the
> > virtuals, not the incremental SSA update by BB copying).
>
> See above. If you don't like the 2x loop versioning, it can be dropped.
> That said, every loop versioning has one update_ssa anyway, so this
> isn't making the complexity any worse, other than constant factor.
Yeah. On my TODO list is still cheaper loop versioning (including
getting rid of the slpeel_* stuff and having only a single SESE
copying machinery).
> > > + maskt = gimple_call_arg (stmt, 2);
> > > + lhs = gimple_call_lhs (stmt);
> > > + type = TREE_TYPE (lhs);
> > > + rhs = build2 (MEM_REF, type, gimple_call_arg (stmt, 0),
> > > + gimple_call_arg (stmt, 1));
> >
> > That's possibly not canonical again, use fold_build2.
>
> Ok, will try.
>
> > > + gimple g = loop_vectorized_call;
> > > + tree lhs = gimple_call_lhs (g);
> > > + gimple_stmt_iterator gsi = gsi_for_stmt (g);
> > > + gimplify_and_update_call_from_tree (&gsi, boolean_true_node);
> >
> > plain update_call_from_tree should also work here, boolean_true_node
> > is already gimple.
>
> Will test.
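I.e. just (sketch of the simpler variant):

  gimple_stmt_iterator gsi = gsi_for_stmt (g);
  update_call_from_tree (&gsi, boolean_true_node);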
> >
> > > + gsi_next (&gsi);
> > > + if (!gsi_end_p (gsi))
> > > + {
> > > + g = gsi_stmt (gsi);
> > > + if (gimple_code (g) == GIMPLE_COND
> > > + && gimple_cond_lhs (g) == lhs)
> > > + {
> > > + gimple_cond_set_lhs (g, boolean_true_node);
> >
> > or simply replace all immediate uses of 'lhs' by boolean_true_node
> > and remove the loop_vectorized call?
>
> That would work, though e.g. removing of the call is something
> any DCE probably handles well and vectorizer relies on tons of DCE anyway.
Well, but you handle forwarding into the COND_EXPR here anyway
(probably so that cfgcleanup removes it), so simply forward it
to all uses.
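Sketch of what I mean (illustrative only):

  imm_use_iterator iter;
  use_operand_p use_p;
  gimple use_stmt;

  FOR_EACH_IMM_USE_STMT (use_stmt, iter, lhs)
    {
      FOR_EACH_IMM_USE_ON_STMT (use_p, iter)
        SET_USE (use_p, boolean_true_node);
      update_stmt (use_stmt);
    }
  /* DCE/cfg cleanup will then get rid of the dead LOOP_VECTORIZED call
     and the now-constant condition.  */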
> > See above. And factor this out into a function. Also move this
> > to the cleanup loop below.
>
> Ok.
Thanks.
Richard.