On 23/08/18 11:13, Richard Sandiford wrote:
Kyrill Tkachov <email@example.com> writes:
This patch aims to optimise sequences involving uses of 1.0 / sqrt (a) under -freciprocal-math and -funsafe-math-optimizations.
In particular consider:
x = 1.0 / sqrt (a);
r1 = x * x; // same as 1.0 / a
r2 = a * x; // same as sqrt (a)
If x, r1 and r2 are all used further on in the code, this can be transformed into:
tmp1 = 1.0 / a;
tmp2 = sqrt (a);
tmp3 = tmp1 * tmp2;
x = tmp3;
r1 = tmp1;
r2 = tmp2;
Nice optimisation :-) Someone who knows the pass better should review, but a few comments from me below.
Thanks for the review.
There seems to be an implicit assumption that this is a win even
when the r1 and r2 assignments are only conditionally executed.
That's probably true, but it might be worth saying explicitly.
I'll admit I had not considered that case.
I think it won't make a difference in practice: the really expensive operations here
are the sqrt and the division, which are on the executed path in either case, and
making them independent should be a benefit of its own.
+/* Return TRUE if USE_STMT is a multiplication of DEF by A.  */
+static inline bool
+is_mult_by (gimple *use_stmt, tree def, tree a)
+{
+  if (gimple_code (use_stmt) == GIMPLE_ASSIGN
+      && gimple_assign_rhs_code (use_stmt) == MULT_EXPR)
+    {
+      tree op0 = gimple_assign_rhs1 (use_stmt);
+      tree op1 = gimple_assign_rhs2 (use_stmt);
+      return (op0 == def && op1 == a)
+	      || (op0 == a && op1 == def);
+    }
+  return false;
+}
Seems like is_square_of could now be a light-weight wrapper around this.
Indeed, I've done the wrapping now.
@@ -652,6 +669,180 @@ execute_cse_reciprocals_1 (gimple_stmt_iterator *def_gsi, tree def)
occ_head = NULL;
+/* Transform sequences like
+   x = 1.0 / sqrt (a);
+   r1 = x * x;
+   r2 = a * x;
+   into
+   tmp1 = 1.0 / a;
+   tmp2 = sqrt (a);
+   tmp3 = tmp1 * tmp2;
+   x = tmp3;
+   r1 = tmp1;
+   r2 = tmp2;
+   depending on the uses of x, r1, r2.  This removes one multiplication and
+   allows the sqrt and division operations to execute in parallel.
+   DEF_GSI is the gsi of the initial division by sqrt that defines
+   DEF (x in the example above).  */
+static void
+optimize_recip_sqrt (gimple_stmt_iterator *def_gsi, tree def)
+{
+  use_operand_p use_p;
+  imm_use_iterator use_iter;
+  gimple *stmt = gsi_stmt (*def_gsi);
+  tree x = def;
+  tree orig_sqrt_ssa_name = gimple_assign_rhs2 (stmt);
+  tree div_rhs1 = gimple_assign_rhs1 (stmt);
+  if (TREE_CODE (orig_sqrt_ssa_name) != SSA_NAME
+      || TREE_CODE (div_rhs1) != REAL_CST
+      || !real_equal (&TREE_REAL_CST (div_rhs1), &dconst1))
+    return;
+
+  gimple *sqrt_stmt = SSA_NAME_DEF_STMT (orig_sqrt_ssa_name);
+  if (!is_gimple_call (sqrt_stmt)
+      || !gimple_call_lhs (sqrt_stmt))
+    return;
+
+  gcall *call = as_a <gcall *> (sqrt_stmt);
Very minor, but:
  gcall *sqrt_stmt
    = dyn_cast <gcall *> (SSA_NAME_DEF_STMT (orig_sqrt_ssa_name));
  if (!sqrt_stmt || !gimple_call_lhs (sqrt_stmt))
would avoid the need for the separate as_a<>, and would mean that
we only call gimple_call_* on gcalls.
+  if (has_other_use)
+    {
+      /* Using the two temporaries tmp1, tmp2 from above
+	 the original x is now:
+	 x = tmp1 * tmp2.  */
+      gcc_assert (mult_ssa_name);
+      gcc_assert (sqr_ssa_name);
+      gimple_stmt_iterator gsi2 = gsi_for_stmt (stmt);
+
+      tree new_ssa_name
+	= make_temp_ssa_name (TREE_TYPE (a), NULL, "recip_sqrt_transformed");
+      gimple *new_stmt
+	= gimple_build_assign (new_ssa_name, MULT_EXPR,
+			       mult_ssa_name, sqr_ssa_name);
+      gsi_insert_before (&gsi2, new_stmt, GSI_SAME_STMT);
+      gcc_assert (gsi_stmt (gsi2) == stmt);
+      gimple_assign_set_rhs_from_tree (&gsi2, new_ssa_name);
+      fold_stmt (&gsi2);
+      update_stmt (stmt);
+    }
In this case we're replacing the statement in its original position,
so there's no real need to use a temporary. It seems better to
change the rhs_code, rhs1 and rhs2 of stmt in-place, with the same
lhs as before.
Yes, that's cleaner.
@@ -762,6 +953,23 @@ pass_cse_reciprocals::execute (function *fun)
if (optimize_bb_for_size_p (bb))
Seems unnecessary to skip the new optimisation when optimising for size.
Like you say, it saves a multiplication overall. Also:
+      if (flag_unsafe_math_optimizations)
+	for (gimple_stmt_iterator gsi = gsi_after_labels (bb);
+	     !gsi_end_p (gsi);
+	     gsi_next (&gsi))
+	  {
+	    gimple *stmt = gsi_stmt (gsi);
+
+	    if (gimple_has_lhs (stmt)
+		&& (def = SINGLE_SSA_TREE_OPERAND (stmt, SSA_OP_DEF)) != NULL
+		&& FLOAT_TYPE_P (TREE_TYPE (def))
+		&& TREE_CODE (def) == SSA_NAME
+		&& is_gimple_assign (stmt)
+		&& gimple_assign_rhs_code (stmt) == RDIV_EXPR)
+	      optimize_recip_sqrt (&gsi, def);
+	  }
It looks like this could safely be done in one of the existing walks
(e.g. the execute_cse_reciprocals_1 one, if we do this when optimising
for size too).
You're right. I've moved this into one of the walks above this.
Bootstrapped and tested on aarch64-none-linux-gnu and x86_64-unknown-linux-gnu.
CC'ing richi as he's reviewed these kinds of patches in the past.
Is this ok for trunk?
2018-08-23 Kyrylo Tkachov <firstname.lastname@example.org>
* tree-ssa-math-opts.c (is_mult_by): New function.
(is_square_of): Use the above.
(optimize_recip_sqrt): New function.
(pass_cse_reciprocals::execute): Use the above.
2018-08-23 Kyrylo Tkachov <email@example.com>
* gcc.dg/recip_sqrt_mult_1.c: New test.
* gcc.dg/recip_sqrt_mult_2.c: Likewise.
* gcc.dg/recip_sqrt_mult_3.c: Likewise.