Run copy-prop before VRP to increase VRP optimization opportunities

Jan Hubicka hubicka@ucw.cz
Fri Jun 24 13:37:00 GMT 2005


> 
> While investigating the feasibility of removing the VRP bits from
> DOM I discovered that tree-vrp.c regularly misses optimization
> opportunities when copies are present in the IL. 
> 
> Here's a trivial little example (greatly simplified from
> alias.c::nonlocal_mentioned_p:
> 
> SSA form after inserting ASSERT_EXPRs
> nonlocal_mentioned_p (x)
> {
>   unsigned int D.1246;
>   int D.1245;
>   int D.1244;
> 
> <bb 0>:
>   D.1244_2 = x_1->code;
>   D.1245_3 = D.1244_2 - 6;
>   D.1246_4 = (unsigned int) D.1245_3;
>   if (D.1246_4 <= 1) goto <L0>; else goto <L3>;
> 
> <L0>:;
>   D.1244_5 = D.1244_2;
>   if (D.1244_5 == 7) goto <L1>; else goto <L3>;
> 
> <L1>:;
>   D.1244_6 = D.1244_2;
>   if (D.1244_6 != 7) goto <L2>; else goto <L3>;
> 
> <L2>:;
>   abort ();
> 
> <L3>:;
>   return;
> 
> }
> 
> 
> The above fragment is taken from the .vrp dump.  Look carefully at the
> conditionals after L0 and L1.  It's not terribly difficult to see
> that the conditional after <L1> is totally unnecessary.  However, it
> will not be removed by VRP.
> 
> I can make an educated guess that because the SSA_NAMEs in the
> conditionals have no other use we never insert any ASSERT_EXPRs
> for those SSA_NAMEs and we never make a real attempt to eliminate
> the second conditional.  Basically the copies hide the fact that we
> have a set of objects which are equivalent and a use of one member
> in the set should be viewed as using the set's representative member.
> 
> 
> Unfortunately, the code to build equivalency sets runs after we have
> already inserted our ASSERT_EXPRs.  So we can't just use that code to
> handle this case for us.
> 
> Luckily we can run copy-propagation to just before VRP to clean this
> stuff up.
> 
> If we take that approach we'll have the following VRP dump after
> inserting ASSERT_EXPRs:
> 
> nonlocal_mentioned_p (x)
> {
>   unsigned int D.1246;
>   int D.1245;
>   int D.1244;
> 
> <bb 0>:
>   D.1244_2 = x_1->code;
>   D.1245_3 = D.1244_2 - 6;
>   D.1246_4 = (unsigned int) D.1245_3;
>   if (D.1246_4 <= 1) goto <L0>; else goto <L3>;
> 
> <L0>:;
>   D.1244_5 = D.1244_2;
>   if (D.1244_2 == 7) goto <L1>; else goto <L3>;
> 
> <L1>:;
>   D.1244_8 = ASSERT_EXPR <D.1244_2, D.1244_2 == 7>;
>   D.1244_6 = D.1244_8;
>   if (D.1244_8 != 7) goto <L2>; else goto <L3>;
> 
> <L2>:;
>   abort ();
> 
> <L3>:;
>   return;
> 
> }
> 
> 
> 
> Now we have the ASSERT_EXPR we want after <L1> and we will
> have recorded the proper equivalences to allow us to optimize
> away the last conditional.
> 
> 
> While DOM would trivially catch this case, it seems to me that it would
> be better to catch it in VRP.  After all, this is precisely the kind of
> thing we want VRP to optimize for us.
> 
> Across my set of old cc1 .i files, we increase the number of
> conditionals which VRP can evaluate at compile time from ~6500
> to just over 7000 and we get a reduction in the number of 
> conditionals that DOM evaluates at compile time.
> 
> 
> The simplest approach is to just add another copy-prop pass before VRP;
> however, that approach will result in a compile-time hit on the order 
> of about 1%.  Not good.
> 
> Another approach would be to move the existing copy-prop pass which runs
> after VRP to just before VRP.  That ought to be compile-time neutral or
> slightly beneficial.  Unfortunately, this approach will leave in copies
> introduced by VRP when it eliminates ASSERT_EXPRs.  Those copies get in
> the way of later optimizers and ultimately cause regressions (failures
> to optimize in particular).
> 
> There is a third approach (and the one I think works best).  First we
> move the copy-prop pass which currently runs after VRP so that it runs
> just before VRP.  We then get more intelligent about eliminating 
> ASSERT_EXPRs -- it's pretty trivial to eliminate those copies as they're
> created rather than run the full copy-propagation pass.    This approach
> gets us the benefit of removing more conditionals in VRP, actually
> *improves* compile time slightly and doesn't leave copies in the IL
> stream to confuse later optimizers.

Jeff,
t looks like your change caused roughly 5 point SPECint
regression  in baseline that seems out of noise factor (it might make
sense to watch another run to be done this afternoon, see
http://www.suse.de/~aj/SPEC/amd64/CINT/sandbox-britten/recent.html there
are interesting jumps in peak but I am not sure if they seem trustable)
Would be possible to check this?

Honza



More information about the Gcc-patches mailing list