[tree-ssa] Speedup alias analysis [patch]

Tue Jun 24 22:41:00 GMT 2003

On Tue, 2003-06-24 at 10:30, Andrew MacLeod wrote:
> On Tue, 2003-06-24 at 09:43, Diego Novillo wrote:
> > Right, and that's exactly what we do.  We only emit a copy when the
> > variables on either side of the VDEF end up in different partitions.
> > 
> 
> HOw do you know which VDEFs are significant? because there is no real
> def of them?
> 
Again: those that have their LHS and RHS in different partitions :)
The rationale is that given a VDEF of the form 'a_3 = VDEF <a_2>', if
a_3 and a_2 coalesce to different partitions, then we need to transfer
the value from a_2's partition to initialize a_3's partition.

This copy operation is apparently very rarely needed.  In the collection
of .i and .ii files I've got from GCC we do not generate a single
instance of this.  I've only seen it emitted in
gcc.c-torture/execute/980701-1.c.

It also triggers zero times in the POOMA testcases I've got.  Not
particularly encouraging :/  And I still haven't been able to find out
either a proof of correctness nor a counter-example that shows that the
transformation is wrong.  Sigh.

> So what does GLOBAL_VARs have to do with mixing virtual and non-virtual
> operands?  Its going to produce mixes like this which you are
> demonstrating?
>
No, the problem with .GLOBAL_VAR is that it is forcefully added to the
may-alias set of every call-clobbered variable.  So:

	{
	  foo (&a, &b);
	  a = 3;
	  b = 2;
	  return a + b;
	}

will add .GLOBAL_VAR to the may-alias list of 'a' and 'b', producing 

	{
	  a.1 = &a;
	  b.1 = &b;

	  # .GLOBAL_VAR_2 = VDEF <.GLOBAL_VAR_1>
	  foo (a.1, b.1);

	  # .GLOBAL_VAR_3 = VDEF <.GLOBAL_VAR_2>
	  a = 3;

	  # .GLOBAL_VAR_4 = VDEF <.GLOBAL_VAR_3>
	  b = 2;

	  # VUSE <.GLOBAL_VAR_4>
	  # VUSE <.GLOBAL_VAR_4>
	  return a + b;
	}

Instead of the more reasonable:	

	{
	  a.1 = &a;
	  b.1 = &b;

	  # a_2 = VDEF <a_1>
	  # b_4 = VDEF <b_3>
	  foo (a.1, b.1);

	  a_5 = 3;
	  b_8 = 2;

	  return 5;
	}

So, the trick is to handle .GLOBAL_VAR the other way around.  We add 'a'
and 'b' to the list of .GLOBAL_VAR's may-aliases.  The definition points
of .GLOBAL_VAR are function calls.  At every function call, we generate
one VDEF operation for each call-clobbered variable.  Meaning that we
shouldn't see references to .GLOBAL_VAR in most cases (which is good).

However, .GLOBAL_VAR is needed for the grouping heuristics.  If there
are thousands of call-clobbered variables in a function, each function
call will generate thousands of VDEFs.  So, when the may-alias set for
.GLOBAL_VAR grows too large, we need to revert to the current behaviour
and just put up with the loss of precision.

>  
> 
> > The only unaliased virtual ops are non-scalar data references.  Or you
> > mean a must-alias pass?  In the first case we need a scalarizer of some
> > kind.  In the second one, we'd prove that 'p' always points to 'a' and
> > replace every reference to '*p' with 'a' directly.
> > 
> 
> The stuff we talked about in ottawa which allows promotion of a VDEF'd
> variable to a real one when all its VUSES can be real uses. This allowed
> effective live-range splitting of variables into aliasined and
> non-aliased parts. Or something to that effect.
> 
Yes, that's when you can prove a must-alias relation in some section of
code:

	if (i_3 > j_2)
	  {
	    p_5 = &a;
	    b_8 = 9;
	    *p_5 = b_8 + 3;
	  }
	else
	  {
	    p_6 = &j;
	    *p_6 = b_1 - 10;
	  }
	# p_7 = PHI <p_5, p_6>
	< ... other de-references of p_7 ... >

You have 'p_5 must alias a' inside the THEN clause, and 'p_6 must alias
j' inside the ELSE clause.  As Jeff pointed out, we need to figure out
the SSA numbering in this case.  I seem to remember a paper that used
SSA information to do something along these lines.

> SO, a couple of things.
> First, why isnt the first one
> 
>   p_3 = &a_2 ? 
> 
ADDR_EXPR is a funny operator.  If you expose it in the IL you may run
into aliasing problems and the optimizers will likely mess things up. 
Two examples:

{
  a_1 = 0;
  p_2 = &a_1;
  return *p_2;
}

CCP will convert the above into:

{
  a_1 = 3;
  p_2 = &0;
  return *p_2;
}

At runtime, the program will SEGV.  The second one involves copy
propagation:

{
  a_2 = b_1;
  p_3 = &a_2;
  # a_4 = VDEF <a_2>
  *p_3 = 5;
  return a_4;
}

Copy propagation will turn this into:

{
  a_2 = b_1;
  p_3 = &b_1;
  # a_4 = VDEF <a_2>
  *p_3 = 5;
  return a_4;
}

At runtime, the program will never update 'a'.

> I thought we were going to expose that into the IL... otherwise
> rewriting it is a pain because I have to delve into the stmt to find the
> variable to rewrite if a_2 didn't get coalesced with the storage for
> 'a'.
> 
If we were to expose it into the IL we would have to teach the
optimizers to treat it with care.  I'm not sure I want that.  This topic
comes up every now and then.  We still haven't found a more gracious way
of dealing with ADDR_EXPRs, unfortunately.

Diego.