This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: RFC: expand from SSA form (1/2)


Hi,

On Tue, 21 Apr 2009, Andrew MacLeod wrote:

> > Anyway, here is how it works:
> > (1) in SSA form: create and coalesce partitions, as before, but don't
> >     rewrite anything.  Pass this info to cfgexpand.c.
> > (2) cfgexpand.c: allocate some space (in pseudos or stack) for each 
> >     partition not containing a default def (these are the ones corresponding
> >     to VAR_DECLs).  Set interesting attributes for that RTX according to the
> >     underlying SSA_NAME_VAR.  I.e. we don't have to create artificial abc.24
> >     variables for secondary partition.
> 
> Do you still create stack space for the original SSA_NAME variable? It 
> doesn't look like it, but I'm not real familiar with that code.

No, no space is allocated for the base variables itself.  Only for a 
partition which happens to be based on a PARM_DECL, but which doesn't 
contain the default def I have to allocate new space.  These partitions 
are the ones for which formerly new temporary variables would have been 
created (and hence new space allocated) as they weren't coalescable with 
the partition containing the default def.

> It shouldn't be needed normally right? I'm just wondering which 
> partitions without default defs would require stack space.

All partitions without default def require space.  If stack or pseudos is 
decided by use_register_for_decl(). 

> yeah, but TER is basically just a large expression table indexed by ssa 
> version.  So you just carry that table through to expand and utilize it 
> as you encounter the single use during expansion right?

Exactly.  So the effect of TER with those patches is (for now) only to 
reduce the lifetime of the LHS (most probably a pseudo).  It doesn't start 
at the original point of definition anymore, but only right before it's 
use (where it then also dies).  In the future we can make use of the table 
directly in expand, i.e. insert the RHS into the to-be-expanded 
expressions on the fly.

> > As out-of-ssa and expand are now basically one pass there can't be any 
> > passes working on non-SSA trees anymore.  Currently that's only two 
> > passes: tree-nrv, which is easily fixed, and mudflap, which I 
> > deactivated for now.
> 
> Has there been any thought to turning mudflap into a plugin now that we 
> have the machinery for that? Its seems like a ripe candidate.

At least not from my side.  I went the sissy way and instead fixed mudflap 
to work on SSA form :-|

Comments on the patch comments below :)  I'll soon send a new version of 
the patch that fixes all problems and testcases I encountered.


Ciao,
Michael.

> > + 	  /* This is a PARM_DECL or RESULT_DECL.  For those partitions that
> > + 	     contain the default def (representing the parm or result itself)
> > + 	     we don't do anything here.  But those which don't contain the
> > + 	     default def (representing a temporary based on the parm/result)
> > + 	     we need to allocate space just like for normal VAR_DECLs.  */
> 
> I presume the allocated space for these temps is normally in a register?

Yes, when optimization is on.  With -O0 mostly everything will be comitted
to stack.

> > + 	  int j = i;
> > + 	  struct partition_elem *start, *elem;
> > + 	  int has_default = 0;
> > + 	  if (SA.map->view_to_partition)
> > + 	    j = SA.map->view_to_partition[j];
> > + 	  j = partition_find (SA.map->var_partition, j);
> > + 	  start = elem = SA.map->var_partition->elements + j;
> > + 	  do
> 
> Ugg, do you have to expose the internal workings of the partition
> mechanism?  I tried to use only the published API in case I ever
> wanted to change it for performance reasons, or whatever... Maybe add
> a bitmap to the expansion structure which indicates which partitions
> have a default_def in them, and initialize it at the end of the
> out-of-ssa process once partitions don't change any more.

Yeah, I'm not terribly fond of the iteration above either.  I somehow
wanted to avoid walking all SSA names to set this to-be-invented flag,
but I also didn't want to slow the partition unioning by merging the 
flags.  I probably bite the apple and create a new bitmap per partition
as you suggested.

> > ! 		  /* Mark this stmt for removal if it is the list of replaceable
> > ! 		     expressions.  */
> 
> The stmt isnt being marked for removal, it just isnt having any expansion
> done on it...

Oh, right.  copy&pasted comments tend to become stale :)

> > +       /* Some RTL parts really want to look at DECL_RTL(x) when x
> > +          was a decl marked in REG_ATTR or MEM_ATTR.  We could use
> > + 	 SET_DECL_RTL here making this available, but that would mean
> > + 	 to select one of the potentially many RTLs for one DECL.  Instead
> > + 	 of doing that we simply reset the MEM_EXPR of the RTL in question,
> > + 	 then nobody can get at it and hence nobody can call DECL_RTL on it.  */
> 
> I presume that the MEM_EXPR is recreated somewhere? or it doesn't
> matter for some other reason?

MEM_ATTR (and hence MEM_EXPR) are created by
set_decl_rtl (via set_mem_attr*).  If it isn't set explicitely it
isn't recreated lazily anymore.  Which is a good thing.  It's used in
the RTL alias analysis.  So given two RTL MEMs it looks
up the MEM_EXPRs of both (getting at e.g. tree _DECL nodes), and
compares those.  But it then _also_ looks at DECL_RTL() of those nodes,
getting back to the RTL expression (either the original or one
representing the base object, e.g. without offset).  So it first goes
from RTL to tree and from there back to RTL.

The problem with that is that DECL_RTL() lazily tries to create the RTL
expression if it weren't set yet.  And we don't set DECL_RTL for
variables split into SSA partitions (because there are multiple RTL
expressions for each base var).  So this DECL_RTL lookup in alias.c
would lazily try to create the RTL, which then ICEs because such lazy
generation is not acceptable for VAR_DECLs.

So, we have to break the RTL->tree->RTL lookup at one of the two steps.
Either fix all RTL passes to not look up DECL_RTL() on some tree, or not
even providing the tree to start with.  I chose the latter for the
following reason: the problem only occurs with MEMs (not REGs).  We
generate MEMs for SSA names only if not optimizing or in exceptional
situations (impossible machine mode for registers for instance).  If not
optimizing the alias machinery isn't active, so doesn't need the
MEM_EXPR.  In the exceptional situations looking at the MEM RTL
expression itself is enough for disambiguation (and it happens very
seldomly).  So removing the MEM_EXPR from the MEM_ATTRs doesn't hinder
optimization, and is a central point to ensure that the later RTL
passes don't accidentally call DECL_RTL.

> > +   currently_expanding_to_rtl = 1;
> > +   FOR_BB_BETWEEN (bb, ENTRY_BLOCK_PTR->next_bb, EXIT_BLOCK_PTR, next_bb)
> 
> Wasn't 'currently_expanding_to_rtl' already set earlier in
> gimple_expand_cfg()? 

Yeah.  It's reset to 0 five lines above:
  currently_expanding_to_rtl = 0;
  convert_from_eh_region_ranges ();
  set_eh_throw_stmt_table (cfun, NULL);
  rebuild_jump_labels (get_insns ());
  find_exception_handler_labels ();
  currently_expanding_to_rtl = 1;
The four routines in between look as if they might be affected by the
setting of currently_expanding_to_rtl, but splitting edges needs to come
afterwards and needs to have it set.  greping around now I see that
before the patch currently_expanding_to_rtl is only used in two
backends, so it's probably safe to remove the funny reset/set.

> > +       /* ??? commit_one_edge_insertion might reorder the edges of the
> > +          current block (when it needs to split an edge), so that we
> > + 	 might miss some edges with instructions on them.  Pff.
> > + 	 (see execute/ashldi-1.c).  */
> > +       VEC(edge,gc) *edges = VEC_copy (edge,gc,bb->preds);
> > + 
> > +       for (ei = ei_start (edges); (e = ei_safe_edge (ei)); )
> > + 	{
> > + 	  ei_next (&ei);
> > + 	  if (e->insns.r)
> > + 	    commit_one_edge_insertion (e);
> > + 	}
> > +       VEC_free (edge, gc, edges);
> > +     }
> 
> how annoying eh. Why doesn't commit_edge_insertions() run into this
> problem as well?

Yeah, I was also confused about this.

> It just loops over the edges with a FOR... is it
> because it does SUCC's instead? could we use SUCCs here instead?

I tried using preds and succs, iterating from back or from front.
Nothing helped.  If one thinks about it it also can't help (the way
edges are removed/inserted to split them necessarily invalidates
iterators).  The only theory I have is, that in all places where
currently commit_edge_insertions() is called there either aren't that
many critical edges with insns on them, or at least there's only one
such edge per block.

The nature of the problem is the following:  Suppose you have this edge
list (preds or succs doesn't matter):  E2 E3* E4 E5*
(the * edges have insns, suppose E3 is critical).  Now when inserting
for E3 we need to split it, let's call it E3'.  For wiring this new edge
into the above list, we first remove E3 and end up with:
  E2 E5* E4  (that's the fast removal at work, fill up the empty slot
              with the last element)
and put back E3' :  E2 E5* E4 E3'.  Voila, our iterator (now pointing to
the next element, E4) won't come by E5 anymore and miss an insertion.

Thinking about it now I see a way to solve it.  Simply not advancing the
iterator when we have something to insert should do the trick.  Still
doesn't explain why commit_edge_insertions should work.
  
> > !   result = (* pp1)->cost - (* pp2)->cost;
> >     /* Since qsort does not guarantee stability we use the elements
> 	 as a secondary key.  This provides us with independence from
> 	 the host's implementation of the sorting algorithm.  */
> 
> Hmm. how long has this been backwards... 
> its seems fine in 4.2, I rewrote it for 4.3, and jeez, its been backwards
> since 4.3.0 was released it seems.  Zoinks.

:-)  It's a bit non-obvious, because the predicate looks correct at
first sight and only becomes incorrect considering that the sorted array
then is walked from back to front ;)


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]