This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: Extend tree-call-cdce to calls whose result is used
- From: Richard Sandiford <richard dot sandiford at arm dot com>
- To: Michael Matz <matz at suse dot de>
- Cc: <gcc-patches at gcc dot gnu dot org>
- Date: Mon, 09 Nov 2015 18:02:32 +0000
- Subject: Re: Extend tree-call-cdce to calls whose result is used
- Authentication-results: sourceware.org; auth=none
- References: <874mgyorwm dot fsf at e105548-lin dot cambridge dot arm dot com> <alpine dot LSU dot 2 dot 20 dot 1511091509260 dot 11029 at wotan dot suse dot de> <87r3jznqvg dot fsf at e105548-lin dot cambridge dot arm dot com> <alpine dot LSU dot 2 dot 20 dot 1511091652400 dot 11029 at wotan dot suse dot de>
Michael Matz <matz@suse.de> writes:
> On Mon, 9 Nov 2015, Richard Sandiford wrote:
>
>> -ffast-math would already cause us to treat the function as not setting
>> errno, so the code wouldn't be used.
>
> What is "the code"? I don't see any checking of the relevant flags in
> tree-call-cdce.c, so I wonder what would prevent the addition of the
> unnecessary checking.
-ffast-math implies -fno-errno-math, which in turn changes how the
function attributes are set. E.g.:
#undef ATTR_MATHFN_FPROUNDING_ERRNO
#define ATTR_MATHFN_FPROUNDING_ERRNO (flag_errno_math ? \
ATTR_NOTHROW_LEAF_LIST : ATTR_MATHFN_FPROUNDING)
So with -ffast-math these functions don't set errno and don't have a vdef.
The patch checks for that here:
+/* Return true if built-in function call CALL could be implemented using
+ a combination of an internal function to compute the result and a
+ separate call to set errno. */
+
+static bool
+can_use_internal_fn (gcall *call)
+{
+ /* Only replace calls that set errno. */
+ if (!gimple_vdef (call))
+ return false;
Checking for a vdef seemed reasonable. If we treat these libm functions
as doing nothing other than producing a numerical result then we'll get
better optimisation across the board if we don't add unnecessary vops.
(Which we do, but see https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68235 )
>> > The pass is somewhat expensive in that it removes dominator info and
>> > schedules a full ssa update. The transformation is trivial enough
>> > that dominators and SSA form can be updated on the fly, I think
>> > without that it's not feasible for -O.
>>
>> r229916 fixed that for the non-EH case.
>
> Ah, missed it. Even the EH case shouldn't be difficult. If the original
> dominator of the EH destination was the call block it moves, otherwise it
> remains unchanged.
The target of the edge is easy in itself, I agree, but that isn't
necessarily the only affected block, if the EH handler doesn't
exit or rethrow.
>> I posted a patch to update the vops for the non-EH case as well:
>>
>> https://gcc.gnu.org/ml/gcc-patches/2015-10/msg03355.html
>
> I see, here the EH case is a bit more difficult as you need to
> differentiate between VOP uses in the EH and the non-EH region, but not
> insurmountable.
Well, I agree it's not insurmountable. :-)
>> But by "builtin folding", do you mean fold_builtin_n etc.?
>
> I had the pass_fold_builtins in mind.
OK.
> Currently we have quite some of such passes (reassoc, forwprop,
> lower_vector_ssa, cse_reciprocals, cse_sincos (sigh!), optimize_bswap and
> others), but they are all handling only special situations in one way or
> the other. pass_fold_builtins is another one, but it seems most related
> to what you want (replacing a call with something else), so I thought
> that'd be the natural choice.
Well, to be pedantic, it's not really replacing the call. Except for
the special case of targets that support direct assignments to errno,
it keeps the original call but ensures that it isn't usually executed.
From that point of view it doesn't really seem like a fold.
But I suppose that's just naming again :-). And it's easily solved with
s/fold/rewrite/.
> call_cdce is also such a pass, but I think it's simply not the appropriate
> one (only in so far as its source file contains the helper routines you
> need), and in addition I think it shouldn't exist at all (and wouldn't
> need to if it had been part of DCE from the start, or if you implemented
> the conditionalizing as part of another pass). Hey, you could be one to
> remove a pass! ;-)
It still seems a bit artificial to me to say that the transformation
with a null lhs is "DCE enough" to go in the main DCE pass (even
though like I say it doesn't actually eliminate any code from the IR,
it just adds more code) and should be kept in a separate pass from
the one that does the transformation on a non-null lhs.
Thanks,
Richard