Detect EAF flags in ipa-modref

Jan Hubicka hubicka@ucw.cz
Tue Nov 10 12:54:18 GMT 2020


> > +  tree callee = gimple_call_fndecl (stmt);
> > +  if (callee)
> > +    {
> > +      cgraph_node *node = cgraph_node::get (callee);
> > +      modref_summary *summary = node ? get_modref_function_summary (node)
> > +				: NULL;
> > +
> > +      if (summary && summary->arg_flags.length () > arg)
> 
> So could we make modref "transform" push this as fnspec attribute or
> would that not really be an optimization?

It was my original plan to synthetize fnspecs, but I think it is not
very good idea: we have the summary readily available and we can
represent information that fnspecs can't
(do not have artificial limits on number of parameters or counts)

I would preffer fnspecs to be used only for in-compiler declarations.
> > +
> > +/* Analyze EAF flags for SSA name NAME.
> > +   KNOWN_FLAGS is a cache for flags we already determined.
> > +   DEPTH is a recursion depth used to make debug output prettier.  */
> > +
> > +static int
> > +analyze_ssa_name_flags (tree name, vec<int> *known_flags, int depth)
> 
> C++ has references which makes the access to known_flags nicer ;)

Yay, will chang that :)
> 
> > +{
> > +  imm_use_iterator ui;
> > +  gimple *use_stmt;
> > +  int flags = EAF_DIRECT | EAF_NOCLOBBER | EAF_NOESCAPE | EAF_UNUSED;
> > +
> > +  /* See if value is already computed.  */
> > +  if ((*known_flags)[SSA_NAME_VERSION (name)])
> > +    {
> > +      /* Punt on cycles for now, so we do not need dataflow.  */
> > +      if ((*known_flags)[SSA_NAME_VERSION (name)] == 1)
> > +	{
> > +	  if (dump_file)
> > +	    fprintf (dump_file,
> > +		     "%*sGiving up on a cycle in SSA graph\n", depth * 4, "");
> > +	  return 0;
> > +	}
> > +      return (*known_flags)[SSA_NAME_VERSION (name)] - 2;
> > +    }
> > +  /* Recursion guard.  */
> > +  (*known_flags)[SSA_NAME_VERSION (name)] = 1;
> 
> This also guards against multiple evaluations of the same stmts
> but only in some cases?  Consider
> 
>   _1 = ..;
>   _2 = _1 + _3;
>   _4 = _1 + _5;
>   _6 = _2 + _4;
> 
> where we visit _2 = and _4 = from _1 but from both are going
> to visit _6.

Here we first push _6, then we go for _2 then for _1 evaluate _1,
evalueate _2, go for _4 and evaluate _4, and evaluate _6.
It is DFS and you need backward edge in DFS (comming from a PHI).

Cycles seems to somewhat matter for GCC: we do have a lot of functions
that walk linked lists that we could track otherwise.
> 
> Maybe I'm blind but you're not limiting depth?  Guess that asks
> for problems, esp. as you are recursing rather than using a
> worklist or so?
> 
> I see you try to "optimize" the walk by only visiting def->use
> links from parameters but then a RPO walk over all stmts would
> be simpler iteration-wise ...
We usually evaluate just small part of bigger functions (since we lose
track quite easily after hitting first memory store).  My plan was to
change this to actual dataflow once we have it well defined 
(this means after discussing EAF flags with you and adding the logic to
track callsites for true IPA pass that midly complicated things - for
every ssa name I track callsite/arg pair where it is passed to
either directly or indirectly.  Then this is translaed into call summary
and used by IPA pass to compute final flags)

I guess I can add --param ipa-modref-walk-depth for now and handle
dataflow incremntally?
In particular I am not sure if I should just write iterated RPO myself
or use tree-ssa-propagate.h (the second may be overkill).
> 
> > +  if (dump_file)
> > +    {
> > +      fprintf (dump_file,
> > +	       "%*sAnalyzing flags of ssa name: ", depth * 4, "");
> > +      print_generic_expr (dump_file, name);
> > +      fprintf (dump_file, "\n");
> > +    }
> > +
> > +  FOR_EACH_IMM_USE_STMT (use_stmt, ui, name)
> > +    {
> > +      if (flags == 0)
> > +	{
> > +	  BREAK_FROM_IMM_USE_STMT (ui);
> > +	}
> > +      if (is_gimple_debug (use_stmt))
> > +	continue;
> > +      if (dump_file)
> > +	{
> > +	  fprintf (dump_file, "%*s  Analyzing stmt:", depth * 4, "");
> > +	  print_gimple_stmt (dump_file, use_stmt, 0);
> > +	}
> > +
> > +      /* Gimple return may load the return value.  */
> > +      if (gimple_code (use_stmt) == GIMPLE_RETURN)
> 
>  if (greturn *ret = dyn_cast <greturn *> (use_stmt))
> 
> makes the as_a below not needed, similar for the other cases (it
> also makes accesses cheaper, avoiding some checking code).

Looks cleaner indeed.
> 
> > +	{
> > +	  if (memory_access_to (gimple_return_retval
> > +				   (as_a <greturn *> (use_stmt)), name))
> > +	    flags &= ~EAF_UNUSED;
> > +	}
> > +      /* Account for LHS store, arg loads and flags from callee function.  */
> > +      else if (is_gimple_call (use_stmt))
> > +	{
> > +	  tree callee = gimple_call_fndecl (use_stmt);
> > +
> > +	  /* Recursion would require bit of propagation; give up for now.  */
> > +	  if (callee && recursive_call_p (current_function_decl, callee))
> > +	    flags = 0;
> > +	  else
> > +	    {
> > +	      int ecf_flags = gimple_call_flags (use_stmt);
> > +	      bool ignore_stores = ignore_stores_p (current_function_decl,
> > +						    ecf_flags);
> > +
> > +	      /* Handle *name = func (...).  */
> > +	      if (gimple_call_lhs (use_stmt)
> > +		  && memory_access_to (gimple_call_lhs (use_stmt), name))
> > +		flags &= ~(EAF_UNUSED | EAF_NOCLOBBER);
> > +
> > +	      /* Handle all function parameters.  */
> > +	      for (unsigned i = 0; i < gimple_call_num_args (use_stmt); i++)
> > +		/* Name is directly passed to the callee.  */
> > +		if (gimple_call_arg (use_stmt, i) == name)
> > +		  {
> > +		    if (ecf_flags & (ECF_CONST | ECF_NOVOPS))
> > +		      flags &= ignore_stores
> > +			       ? 0
> > +			       : call_lhs_flags (use_stmt, i,
> > +						 known_flags, depth);
> > +		    else
> > +		      {
> > +			int call_flags = gimple_call_arg_flags (as_a <gcall *>
> > +								 (use_stmt), i);
> > +			if (ignore_stores)
> > +			  call_flags |= EAF_NOCLOBBER | EAF_NOESCAPE;
> > +			else
> > +			  call_flags &= call_lhs_flags (use_stmt, i,
> > +							known_flags, depth);
> > +
> > +			flags &= call_flags;
> > +		      }
> > +		  }
> > +		/* Name is dereferenced and passed to a callee.  */
> > +		else if (memory_access_to (gimple_call_arg (use_stmt, i), name))
> > +		  {
> > +		    if (ecf_flags & (ECF_CONST | ECF_NOVOPS))
> > +		      flags &= ~EAF_UNUSED;
> > +		    else
> > +		      flags &= deref_flags (gimple_call_arg_flags
> > +						(as_a <gcall *> (use_stmt), i),
> > +					    ignore_stores);
> > +		    if (!ignore_stores)
> > +		      flags &= call_lhs_flags (use_stmt, i, known_flags,
> > +					       depth);
> > +		  }
> 
> Hmm, you forget gimple_call_chain at least which is at least
> argument passing?  Possibly the chain argument is never a function
> parameter but how do you know... (partial inlining?)

Ah, right. I forgot about chain.
I wil ladd it.
> 
> Then there's gimple_call_fn for indirect function calls to a
> function parameter?

Well, it is never load or store, so I do not really care about it.
for

void test (int (*foo)())
{
  foo();
}

I should be able to derive EAF_UNUSED.

void test (int (**foo)())
{
  (*foo)();
}

I will see sparate load.
> 
> I guess it would be nice to amend the gimple_walk_ stuff to have
> a SSA name callback where the tree and the SSA use is passed.
> Well, for another time.
> 
> > +	    }
> > +	  /* Only NOCLOBBER or DIRECT flags alone are not useful (see comments
> > +	     in tree-ssa-alias.c).  Give up earlier.  */
> > +	  if ((flags & ~(EAF_DIRECT | EAF_NOCLOBBER)) == 0)
> > +	    flags = 0;
> > +	}
> > +      else if (gimple_assign_load_p (use_stmt))
> > +	{
> > +	  /* Memory to memory copy.  */
> > +	  if (gimple_store_p (use_stmt))
> > +	    {
> > +	      /* Handle *name = *exp.  */
> > +	      if (memory_access_to (gimple_assign_lhs (use_stmt), name))
> > +		flags &= ~(EAF_UNUSED | EAF_NOCLOBBER);
> > +
> > +	      /* Handle *lhs = *name.
> > +
> > +		 We do not track memory locations, so assume that value
> > +		 is used arbitrarily.  */
> > +	      if (memory_access_to (gimple_assign_rhs1 (use_stmt), name))
> > +		flags = 0;
> > +	    }
> > +	  /* Handle lhs = *name.  */
> > +	  else if (memory_access_to (gimple_assign_rhs1 (use_stmt), name))
> > +	    flags &= deref_flags (analyze_ssa_name_flags
> > +				      (gimple_assign_lhs (use_stmt),
> > +				       known_flags, depth + 1), false);
> > +	}
> > +      else if (gimple_store_p (use_stmt))
> > +	{
> > +	  /* Handle *lhs = name.  */
> > +	  if (is_gimple_assign (use_stmt)
> > +	      && gimple_assign_rhs1 (use_stmt) == name)
> > +	    {
> > +	      if (dump_file)
> > +		fprintf (dump_file, "%*s  ssa name saved to memory\n",
> > +			 depth * 4, "");
> > +	      flags = 0;
> > +	    }
> > +	  /* Handle *name = exp.  */
> > +	  else if (is_gimple_assign (use_stmt)
> > +		   && memory_access_to (gimple_assign_lhs (use_stmt), name))
> > +	    flags &= ~(EAF_UNUSED | EAF_NOCLOBBER);
> > +	  /* ASM statements etc.  */
> > +	  else if (!is_gimple_assign (use_stmt))
> > +	    {
> > +	      if (dump_file)
> > +		fprintf (dump_file, "%*s  Unhandled store\n",
> > +			 depth * 4, "");
> > +	      flags = 0;
> > +	    }
> > +	}
> > +      else if (is_gimple_assign (use_stmt))
> > +	{
> > +	  enum tree_code code = gimple_assign_rhs_code (use_stmt);
> > +
> > +	  /* See if operation is a merge as considered by
> > +	     tree-ssa-structalias.c:find_func_aliases.  */
> > +	  if (!truth_value_p (code)
> > +	      && code != POINTER_DIFF_EXPR
> > +	      && (code != POINTER_PLUS_EXPR
> > +		  || gimple_assign_rhs1 (use_stmt) == name))
> > +	    flags &= analyze_ssa_name_flags
> > +			       (gimple_assign_lhs (use_stmt), known_flags,
> > +				depth + 1);
> > +	}
> > +      else if (gimple_code (use_stmt) == GIMPLE_PHI)
> > +	{
> > +	  flags &= analyze_ssa_name_flags
> > +			     (gimple_phi_result (use_stmt), known_flags,
> > +			      depth + 1);
> > +	}
> > +      /* Conditions are not considered escape points
> > +	 by tree-ssa-structalias.  */
> > +      else if (gimple_code (use_stmt) == GIMPLE_COND)
> > +	;
> > +      else
> > +	{
> > +	  if (dump_file)
> > +	    fprintf (dump_file, "%*s  Unhandled stmt\n", depth * 4, "");
> > +	  flags = 0;
> > +	}
> > +
> > +      if (dump_file)
> > +	{
> > +	  fprintf (dump_file, "%*s  current flags of ", depth * 4, "");
> > +	  print_generic_expr (dump_file, name);
> > +	  dump_eaf_flags (dump_file, flags);
> > +	}
> > +    }
> > +  if (dump_file)
> > +    {
> > +      fprintf (dump_file, "%*sflags of ssa name ", depth * 4, "");
> > +      print_generic_expr (dump_file, name);
> > +      dump_eaf_flags (dump_file, flags);
> > +    }
> > +  (*known_flags)[SSA_NAME_VERSION (name)] = flags + 2;
> > +  return flags;
> > +}
> > +
> > +/* Determine EAF flags for function parameters.  */
> > +
> > +static void
> > +analyze_parms (modref_summary *summary)
> > +{
> > +  unsigned int parm_index = 0;
> > +  unsigned int count = 0;
> > +
> > +  for (tree parm = DECL_ARGUMENTS (current_function_decl); parm;
> > +       parm = TREE_CHAIN (parm))
> > +    count++;
> > +
> > +  if (!count)
> > +    return;
> > +
> > +  auto_vec<int> known_flags;
> > +  known_flags.safe_grow_cleared (num_ssa_names);
> 
> , true for exact reserve.  Could be space optimized by not using
> auto_vec<int> but auto_vec<usigned char>?

I think there is not that much memory wasted, but indeed chars will be
more effective.
> 
> > +
> > +  for (tree parm = DECL_ARGUMENTS (current_function_decl); parm; parm_index++,
> > +       parm = TREE_CHAIN (parm))
> > +    {
> > +      tree name = ssa_default_def (cfun, parm);
> > +      if (!name)
> > +	continue;
> 
> looks like the vec might be quite sparse ...
> 
> > +      int flags = analyze_ssa_name_flags (name, &known_flags, 0);
> > +
> > +      if (flags)
> > +	{
> > +	  if (parm_index >= summary->arg_flags.length ())
> > +	    summary->arg_flags.safe_grow_cleared (count);
> 
> you want , true for exact reserve.
> 
> > +	  summary->arg_flags[parm_index] = flags;
> 
> maybe this as well - does it make sense to skip !is_gimple_reg_type
> params in the counting?  Guess it makes lookup too complicated.
> But we do have testcases with >30000 parameters ...

Well, the index needs to be actual index all call argument.
As said, i did not see any noticeable modref memory use increase at WPA
(I watched) since it is at most 1 byte for parameter in case something
got tracked.

Thanks for review!

Honza


More information about the Gcc mailing list