Extend tree-call-cdce to calls whose result is used
Mon Nov 9 17:02:00 GMT 2015
On Mon, 9 Nov 2015, Richard Sandiford wrote:
> -ffast-math would already cause us to treat the function as not setting
> errno, so the code wouldn't be used.
What is "the code"? I don't see any checking of the relevant flags in
tree-call-cdce.c, so I wonder what would prevent the addition of the
unnecessary checking. It's clear that such calls with unused results will
be removed by normal DCE without errno (so cdce doesn't have anything to
worry about, and hence doesn't add any conditions). But you're adding
handling of calls with _used_ result, i.e. the call itself will stay, and
you're adding some conditions around this, like:
_1 = SQRT(x);
_2 = sqrt(x);
res_3 = phi(_1, _2);
My question was, what transforms this into
res_3 = SQRT(x);
under fast-math? Before your patch this was done by removing the y!=y
test eventually, but with your patch?
> > This somehow feels wrong. Dead-code elimination doesn't have anything
> > to do with the transformation you want, it rather is rewriting all
> > feasible calls into something else, like fold_builtins does. Also
> > cdce currently doesn't seem to do any checks on the fast-math flags,
> > so I wonder if some of the conditions that you now also insert for
> > calls whose results are used stay until final code.
> Isn't that just a question of what you call the current pass though?
The name definitely was always suspect, but I really think the pass itself
is the problem, it should have been part of dce itself from the beginning.
Adding more code to it that doesn't even have anything to do with the
original purpose moves us away from the ideal situation IMHO. We do have
existing places that rewrite calls into something else, so instead of
another pass over all instructions, why not extend those places (and then
remove the cdce pass)?
> > The pass is somewhat expensive in that it removes dominator info and
> > schedules a full ssa update. The transformation is trivial enough
> > that dominators and SSA form can be updated on the fly, I think
> > without that it's not feasible for -O.
> r229916 fixed that for the non-EH case.
Ah, missed it. Even the EH case shouldn't be difficult. If the original
dominator of the EH destination was the call block it moves, otherwise it
> I posted a patch to update the vops for the non-EH case as well:
I see, here the EH case is a bit more difficult as you need to
differentiate between VOP uses in the EH and the non-EH region, but not
> > But as said I think this transformation should better be moved into
> > builtin folding (or other call folding), at which point also the
> > fast-math flags can be checked. The infrastructure routines of
> > tree-call-cdce can be used there of course. If so moved the cdce pass
> > would be subsumed by that btw. (because the dead call result will be
> > trivially exposed), and that would be a good thing.
> Yeah, that was why I added the code to the cdce pass. I don't think it
> makes sense to have one pass that adds range checks for calls whose lhs
> is unused and a separate pass for calls whose lhs isn't used.
Certainly not. But it does make sense to add the range checks
unconditionally in some rewriting pass, thereby obsoleting the cdce pass.
> But by "builtin folding", do you mean fold_builtin_n etc.?
I had the pass_fold_builtins in mind.
> I thought the point was that we were trying to move away from relying on
> folding at that level.
If you mean in the sense of the various fold_builtin_n calls, yes. But I
don't see us moving to a state where we don't have _any_ specific pass
that does a generic replace-these-patterns-with-those-patterns work, IOW I
don't think we should make it so that everything is subsumed by the
match.pd method. For instance it doesn't make sense to diddle with the
stdarg builtins before a specific time, and after diddling it doesn't make
sense anymore to check for them, so putting this into the match.pd work
would merely do useless work before that time. Also match.pd (currently)
isn't too happy with generating control flow. And sometimes it simply
makes sense to open-code the replacement sequences. So, whereever we end
up being, I think we always will have some pass going over all BBs/insns
and pattern matching stuff.
Currently we have quite some of such passes (reassoc, forwprop,
lower_vector_ssa, cse_reciprocals, cse_sincos (sigh!), optimize_bswap and
others), but they are all handling only special situations in one way or
the other. pass_fold_builtins is another one, but it seems most related
to what you want (replacing a call with something else), so I thought
that'd be the natural choice.
call_cdce is also such a pass, but I think it's simply not the appropriate
one (only in so far as its source file contains the helper routines you
need), and in addition I think it shouldn't exist at all (and wouldn't
need to if it had been part of DCE from the start, or if you implemented
the conditionalizing as part of another pass). Hey, you could be one to
remove a pass! ;-)
More information about the Gcc-patches