This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]
Other format:	[Raw text]
Re: [PATCH]: Structure aliasing, part 1

From: Daniel Berlin <dberlin at dberlin dot org>
To: Diego Novillo <dnovillo at redhat dot com>
Cc: gcc-patches at gcc dot gnu dot org
Date: Thu, 10 Mar 2005 10:33:09 -0500
Subject: Re: [PATCH]: Structure aliasing, part 1
References: <1110380866.5906.41.camel@linux.site> <20050310134043.GA12701@topo.toronto.redhat.com>
> This has been long overdue.  Thanks! There are various places
> that need blank lines around and comments that need reformatting.

I'll run through them all and make sure they are okay.

> 
> 
> > +/* Return the underlying variable for REF, if REF is something we can create
> > +   subvars for.  If the return value is a SSA_VAR_P, POFFSET will be the
> >
> This is confusing.  How about, "If REF is a COMPONENT_REF for a
> structure that can have sub-variables, return the sub-variable
> corresponding to the field referenced by REF."

Sure.

> 
> > +static inline tree
> > +okay_component_ref_for_subvars (tree ref, HOST_WIDE_INT *poffset,
> > +				HOST_WIDE_INT *psize)
> >
> Hmm, do we really want this in tree-flow-inline.h?  How about
> moving it to tree-dfa.c?

I moved it when i moved get_subvars_ and friends (they were originally
all in tree-ssa-alias.c).
I'm happy to move it back to tree-dfa.c if you like :)

> 
> 
> > @@ -347,6 +347,7 @@ init_tree_optimization_passes (void)
> >  
> >    p = &pass_all_optimizations.sub;
> >    NEXT_PASS (pass_referenced_vars);
> > +  NEXT_PASS (pass_create_structure_vars);
> >
> How about creating structure vars after SRA?  Do we really need
> to create sub-vars for all structures this hearly?

We had this discussion, actually, you probably don't remember it (but i
have it in my emails).

There are three possibilities:

1. Run structure variable creation very early (before SRA)
2. Run it after SRA
3. Run it both before and after SRA


The advantages of #1 is that
1. SRA has less work to do, and can make better decisions, because some
structs are now just completely gone due to const/copy prop.
2. You don't need to rerun may-alias and rewrite a whole mess of
variables.

The advantages of #2 is that
1. Theoretically, structure variable creation will create less variables
due to cleanup.

And the advantage of #3 is that
you get the advantages of #1 and #2 :)

However, #3 is harder, and actually slow.


The problem is that the advantage of #2 never actually happens.
It turns out that the scalar optimizers rarely remove an entire
structure access.  I had run some numbers, and it turns out that over an
entire bootstrap of gcc, we create *100* less SFT's if you run it after
SRA, as opposed to before it.   Yes. It's really that low of a number.

This is because even with SRA, you can't remove the loads and stores
without structure aliasing.

On the other hand, a lot of the time, running it pre-SRA *does* cause
SRA to create a lot less junk, because the work copy-prop and const prop
do on structure accesses do actually help make SRA have significantly
less work to do in terms of creating loads and stores.

I concluded at this point that it wasn't worth it to run it after SRA,
at least at the moment, and it was on my todo list to reevaluate after
TCB merges structure ccp/copy prop.




> 
> > +	      /* Since VAR is now a regular GIMPLE register, we will need
> > +		 to rename VAR into SSA afterwards.  */
> > +	      bitmap_set_bit (vars_to_rename, v_ann->uid);
> > +
> > +	      if (var_can_have_subvars (var)
> > +		  && (svars = get_subvars_for_var (var)))
> > +		{
> > +		  subvar_t sv;
> > +
> > +		  for (sv = svars; sv; sv = sv->next)
> > +		    {	      
> > +		      var_ann_t svann = var_ann (sv->var);
> > +		      if (bitmap_bit_p (ai->addresses_needed, svann->uid))
> > +			okay_to_mark = false;
> > +		      bitmap_set_bit (vars_to_rename, svann->uid);
> > +		    }
> > +		}
> >
> Could you factor the operations we do on the sub-vars?

This is a very hard trick, depending on what you mean.
I said around trying to figure out what to do about this before i
submitted it.


If you want me to make a FOR_EACH_SUBVAR, that is fine.
The problem is that the loop will still look like:

if (var_can_have_subvars (var) && (svars = get_subvars_for_var (var)))
{
	FOR_EACH_SUBVAR
	{
		do stuff
	}
}
else ...

The problem is that we have the else, because we do something completely
different when we don't actually have subvars for a variable.

The best i could come up with was

IF_AND_FOR_EACH_SUBVAR
{
	do stuff
}
else
{
	whatever.
}

Which seemed more confusing to follow than making it explicit :)



> All these loops inside the main logic make things harder to
> follow.
> 
> > +  if (TREE_CODE (addrop) == COMPONENT_REF
> > +      && (ref = okay_component_ref_for_subvars (addrop, &offset ,&size)))
> s/ ,&size/, &size/
> 
> > -  if (pt_var && SSA_VAR_P (pt_var))
> > +      for (sv = svars; sv; sv = sv->next)
> > +	{
> > +	  if (offset == sv->offset && size == sv->size)
> > +	    bitmap_set_bit (pi->pt_vars, var_ann (sv->var)->uid);
> > +	  else if (offset >= sv->offset && offset < (sv->offset + sv->size))
> > +	    bitmap_set_bit (pi->pt_vars, var_ann (sv->var)->uid);
> > +	  else if (offset < sv->offset 
> > +		   && (offset + size > sv->offset))
> > +	    bitmap_set_bit (pi->pt_vars, var_ann (sv->var)->uid);
> >
> This needs commenting.  All three alternatives do the same thing.
> Perhaps it's better if we just move the three predicates into
> one?

I'll factor it into subvar_overlaps_offset and comment it.

> 
> > +/* This structure is simply used during pushing fields onto the fieldstack
> > +   to track the offset of the field, since bitpos_of_field gives it relative
> > +   to it's immediate containing type, and we want it relative to the ultimate
> >
> s/it's/its/

I like apostrophes. :)  I'll fix all the it's vs its issues.


> 
> > +/* Return the position, in bits, of FIELD_DECL from the beginning of it's
> >
> Likewise.
> 
> > +/* Given an aggregate VAR, create the subvariables that represent it's
> >
> Likewise.
> 
> > +      /* Not all fields have DECL_SIZE set for some reason.  Also, we can't
> > +	 handle variable sized fields, or fields that are arrays.
> > +	 We *could* handle fields that are arrays, but this would require some
> > +	 additional hacking to tree-ssa-operands.c so ARRAY_REF's like
> > +	 a.b.c[5] were handled correctly (The ARRAY_REF will be handled first,
> > +	 so we need to go looking to see if it's part of a structure reference
> > +	 at that point, rather than where we do now)*/
> >
> Hmm?  I think I can parse this, but it sounds confusing.
> Needs reformatting too.

Will rewrite.

> 
> > +	  /* We need to copy the various flags from var to sv->var, so that
> > +	     they are is_global_var iff the original variable was.  */
> > +
> > +	  if (DECL_EXTERNAL (var))
> > +	    DECL_EXTERNAL (sv->var) = 1;
> >
> I'd prefer DECL_EXTERNAL (sv->var) = DECL_EXTERNAL (var).

Okeydokey.

> 
> > +	  if (TREE_PUBLIC (var))
> > +	    TREE_PUBLIC  (sv->var) = 1;
> > +	  if (TREE_STATIC (var))
> > +	    TREE_STATIC (sv->var) = 1;
> > +	  if (TREE_READONLY (fo->field))
> > +	    TREE_READONLY (sv->var) = 1;
> > +
> Likewise.
> 
> > +	  TREE_ADDRESSABLE (sv->var) = 1;
> >
> Oh?  Why are the SFTs always addressable?  I can see most of them
> being so, but are they always?
This was to work around it failing is_gimple_reg otherwise, and then it
would cutely try to create a real definition for them, which won't work.

You do the same thing in create_memory_tag, for the same reason:

  /* Memory tags are by definition addressable.  This also prevents
     is_gimple_ref frome confusing memory tags with optimizable
     variables.  */
  TREE_ADDRESSABLE (tag) = 1;

:)


> 
> > +static size_t old_referenced_vars;
> > +
> > +
> > +/* Create structure field variables for structures used in this function.  */
> > +
> > +static void
> > +create_structure_vars (void)
> > +{
> > +  basic_block bb;
> > +  size_t i;
> > +
> > +  old_referenced_vars = num_referenced_vars;
> > +  used_portions = xcalloc (num_referenced_vars, sizeof (used_part_t));
> > +  
> > +  FOR_EACH_BB (bb)
> > +    {
> > +      block_stmt_iterator bsi;
> > +      for (bsi = bsi_start (bb); !bsi_end_p (bsi); bsi_next (&bsi))
> > +	{
> > +	  walk_tree_without_duplicates (bsi_stmt_ptr (bsi), 
> > +					find_used_portions,
> > +					NULL);
> > +	}
> > +    }
> >
> This is why I think we probably want to do this after SRA.  Let
> the initial scalar cleanup passes remove unreachable code and all
> that and then go fishing.

As i mentioned it above, we don't actually save any work that way :)
In fact, the time for the extra may-alias pass and rewriting is more
than you save in time for overlap variables.

I will also note another data point:

XLC actually creates the overlap maps in the *front end* and encodes
them into the IL.  So they create it even earlier, although they do
scalarization, etc.

Though if you really feel strongly about this, i'll move it.  I just
don't currently see advantages (Not that i don't believe that with
structure CCP/copy prop, we might not have some, but i'd rather move it
after that point, when we have more benchmarks)


> > +  free (used_portions);
> > +}
> > +struct tree_opt_pass pass_create_structure_vars = 
> > +{
> >
> Spacing after closing brace.
Yup.

> 
> > -      if (code == COMPONENT_REF)
> > -	get_expr_operands (stmt, &TREE_OPERAND (expr, 2), opf_none);
> > -      return;
> > -
> > +      {
> > +	tree ref;
> > +	HOST_WIDE_INT offset, size;
> > + 	/* This component ref becomes an access to all of the subvariables
> > +	   it can touch,  if we can determine that, but *NOT* the real one.
> > +	   If we can't determine which fields we could touch, the recursion
> > +	   will eventually get to a variable and add *all* of it's subvars, or
> > +	   whatever is the minimum correct subset.  */
> > +
> > +	ref = okay_component_ref_for_subvars (expr, &offset, &size);
> > +	if (ref)
> > +	  {	  
> > +	    subvar_t svars = get_subvars_for_var (ref);
> > +	    subvar_t sv;
> > +	    for (sv = svars; sv; sv = sv->next)
> > +	      {
> > +		
> > +		if (offset == sv->offset && size == sv->size)
> > +		    add_stmt_operand (&sv->var, s_ann, flags);
> > +		else if (offset >= sv->offset 
> > +			 && offset < (sv->offset + sv->size))
> > +		  add_stmt_operand (&sv->var, s_ann, flags & ~opf_kill_def);
> > +		else if (offset < sv->offset
> > +			 && (offset + size > sv->offset))
> > +		  add_stmt_operand (&sv->var, s_ann, flags & ~opf_kill_def);
> > +	      }
> > +	  }
> >
> Question for future work.  Instead of having to traverse the
> sub-variables, would it be worth it to have a more direct mapping
> mechanism? 

How so?

>  I'm thinking either hashing the fields or using field
> uids. 

The problem is that you need a {VAR, FIELD_UID} mapping, because fields
are specific to types, not vars.

This was the same problem i ran into during pruning when trying to make
it just mark used fields on a per var basis.
The hash table was *really* large.

>  Everything inside get_*_operands is always on relatively
> hot paths so I don't know if the additional hash table hackery
> would help or make things worse.
> 
> How are our timings wrt operand scanning? 

Fine. 
No significant speed degradation in profiles.

>  Memory usage?

It's up slightly, of course, but not more than ~1% in most cases (SFT's
themselves are cheap, since they are simply variables.  We generally
don't have more vuses/vdefs for SFT's than we did for the TMT's, though
this changes for unions, etc).  If we had cases where 10% or more memory
usage or something was being used, i'd probably change the pruning
heuristics so that didn't happen, unless it was providing like a 40%
speedup or something :)




> 
> > +Optimizations wishing to create new variables based on existing
> > +structure variables need to make sure to either mark the new variables
> > +for renaming,  or copy the subvars.
> > +
> >
> I've no idea what you're trying to say here.

For those funky things that decide they want to copy type mem tags, they
need to copy subvars too.
Follow-Ups:
- Re: [PATCH]: Structure aliasing, part 1
  - From: Diego Novillo
References:
- [PATCH]: Structure aliasing, part 1
  - From: Daniel Berlin
- Re: [PATCH]: Structure aliasing, part 1
  - From: Diego Novillo
Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]