Extend tree-call-cdce to calls whose result is used

Michael Matz matz@suse.de
Mon Nov 9 17:02:00 GMT 2015


Hi,

On Mon, 9 Nov 2015, Richard Sandiford wrote:

> -ffast-math would already cause us to treat the function as not setting 
> errno, so the code wouldn't be used.

What is "the code"?  I don't see any checking of the relevant flags in 
tree-call-cdce.c, so I wonder what would prevent the addition of the 
unnecessary checking.  It's clear that such calls with unused results will 
be removed by normal DCE without errno (so cdce doesn't have anything to 
worry about, and hence doesn't add any conditions).  But you're adding 
handling of calls with _used_ result, i.e. the call itself will stay, and 
you're adding some conditions around this, like:

  if (x>=0)
    _1 = SQRT(x);
  else
    _2 = sqrt(x);
  res_3 = phi(_1, _2);

My question was, what transforms this into

  res_3 = SQRT(x);

under fast-math?  Before your patch this was done by removing the y!=y 
test eventually, but with your patch?

> > This somehow feels wrong.  Dead-code elimination doesn't have anything 
> > to do with the transformation you want, it rather is rewriting all 
> > feasible calls into something else, like fold_builtins does.  Also 
> > cdce currently doesn't seem to do any checks on the fast-math flags, 
> > so I wonder if some of the conditions that you now also insert for 
> > calls whose results are used stay until final code.
> 
> Isn't that just a question of what you call the current pass though?

The name definitely was always suspect, but I really think the pass itself 
is the problem, it should have been part of dce itself from the beginning.  
Adding more code to it that doesn't even have anything to do with the 
original purpose moves us away from the ideal situation IMHO.  We do have 
existing places that rewrite calls into something else, so instead of 
another pass over all instructions, why not extend those places (and then 
remove the cdce pass)?

> > The pass is somewhat expensive in that it removes dominator info and 
> > schedules a full ssa update.  The transformation is trivial enough 
> > that dominators and SSA form can be updated on the fly, I think 
> > without that it's not feasible for -O.
> 
> r229916 fixed that for the non-EH case.

Ah, missed it.  Even the EH case shouldn't be difficult.  If the original 
dominator of the EH destination was the call block it moves, otherwise it 
remains unchanged.

> I posted a patch to update the vops for the non-EH case as well:
> 
>     https://gcc.gnu.org/ml/gcc-patches/2015-10/msg03355.html

I see, here the EH case is a bit more difficult as you need to 
differentiate between VOP uses in the EH and the non-EH region, but not 
insurmountable.

> > But as said I think this transformation should better be moved into 
> > builtin folding (or other call folding), at which point also the 
> > fast-math flags can be checked.  The infrastructure routines of 
> > tree-call-cdce can be used there of course.  If so moved the cdce pass 
> > would be subsumed by that btw. (because the dead call result will be 
> > trivially exposed), and that would be a good thing.
> 
> Yeah, that was why I added the code to the cdce pass.  I don't think it 
> makes sense to have one pass that adds range checks for calls whose lhs 
> is unused and a separate pass for calls whose lhs isn't used.

Certainly not.  But it does make sense to add the range checks 
unconditionally in some rewriting pass, thereby obsoleting the cdce pass.

> But by "builtin folding", do you mean fold_builtin_n etc.?

I had the pass_fold_builtins in mind.

> I thought the point was that we were trying to move away from relying on 
> folding at that level.

If you mean in the sense of the various fold_builtin_n calls, yes.  But I 
don't see us moving to a state where we don't have _any_ specific pass 
that does a generic replace-these-patterns-with-those-patterns work, IOW I 
don't think we should make it so that everything is subsumed by the 
match.pd method.  For instance it doesn't make sense to diddle with the 
stdarg builtins before a specific time, and after diddling it doesn't make 
sense anymore to check for them, so putting this into the match.pd work 
would merely do useless work before that time.  Also match.pd (currently) 
isn't too happy with generating control flow.  And sometimes it simply 
makes sense to open-code the replacement sequences.  So, whereever we end 
up being, I think we always will have some pass going over all BBs/insns 
and pattern matching stuff.

Currently we have quite some of such passes (reassoc, forwprop, 
lower_vector_ssa, cse_reciprocals, cse_sincos (sigh!), optimize_bswap and 
others), but they are all handling only special situations in one way or 
the other.  pass_fold_builtins is another one, but it seems most related 
to what you want (replacing a call with something else), so I thought 
that'd be the natural choice.

call_cdce is also such a pass, but I think it's simply not the appropriate 
one (only in so far as its source file contains the helper routines you 
need), and in addition I think it shouldn't exist at all (and wouldn't 
need to if it had been part of DCE from the start, or if you implemented 
the conditionalizing as part of another pass).  Hey, you could be one to 
remove a pass! ;-)


Ciao,
Michael.



More information about the Gcc-patches mailing list