[PATCH] Optimise sqrt reciprocal multiplications

Thu Aug 23 17:09:00 GMT 2018

Hi Richard,

On 23/08/18 11:13, Richard Sandiford wrote:
> Kyrill  Tkachov <kyrylo.tkachov@foss.arm.com> writes:
>> Hi all,
>>
>> This patch aims to optimise sequences involving uses of 1.0 / sqrt (a) under -freciprocal-math and -funsafe-math-optimizations.
>> In particular consider:
>>
>> x = 1.0 / sqrt (a);
>> r1 = x * x;  // same as 1.0 / a
>> r2 = a * x; // same as sqrt (a)
>>
>> If x, r1 and r2 are all used further on in the code, this can be transformed into:
>> tmp1 = 1.0 / a
>> tmp2 = sqrt (a)
>> tmp3 = tmp1 * tmp2
>> x = tmp3
>> r1 = tmp1
>> r2 = tmp2
> Nice optimisation :-)  Someone who knows the pass better should review,
> but:

Thanks for the review.

> There seems to be an implicit assumption that this is a win even
> when the r1 and r2 assignments are only conditionally executed.
> That's probably true, but it might be worth saying explicitly.

I'll admit I had not considered that case.
I think it won't make a difference in practice, as the really expensive operations here
are the sqrt and the division and they are on the executed path in either case and them
becoming independent should be a benefit of its own.

>> +/* Return TRUE if USE_STMT is a multiplication of DEF by A.  */
>> +
>> +static inline bool
>> +is_mult_by (gimple *use_stmt, tree def, tree a)
>> +{
>> +  if (gimple_code (use_stmt) == GIMPLE_ASSIGN
>> +      && gimple_assign_rhs_code (use_stmt) == MULT_EXPR)
>> +    {
>> +      tree op0 = gimple_assign_rhs1 (use_stmt);
>> +      tree op1 = gimple_assign_rhs2 (use_stmt);
>> +
>> +      return (op0 == def && op1 == a)
>> +	      || (op0 == a && op1 == def);
>> +    }
>> +  return 0;
>> +}
> Seems like is_square_of could now be a light-weight wrapper around this.

Indeed, I've done the wrapping now.

>> @@ -652,6 +669,180 @@ execute_cse_reciprocals_1 (gimple_stmt_iterator *def_gsi, tree def)
>>     occ_head = NULL;
>>   }
>>   
>> +/* Transform sequences like
>> +   x = 1.0 / sqrt (a);
>> +   r1 = x * x;
>> +   r2 = a * x;
>> +   into:
>> +   tmp1 = 1.0 / a;
>> +   tmp2 = sqrt (a);
>> +   tmp3 = tmp1 * tmp2;
>> +   x = tmp3;
>> +   r1 = tmp1;
>> +   r2 = tmp2;
>> +   depending on the uses of x, r1, r2.  This removes one multiplication and
>> +   allows the sqrt and division operations to execute in parallel.
>> +   DEF_GSI is the gsi of the initial division by sqrt that defines
>> +   DEF (x in the example abovs).  */
>> +
>> +static void
>> +optimize_recip_sqrt (gimple_stmt_iterator *def_gsi, tree def)
>> +{
>> +  use_operand_p use_p;
>> +  imm_use_iterator use_iter;
>> +  gimple *stmt = gsi_stmt (*def_gsi);
>> +  tree x = def;
>> +  tree orig_sqrt_ssa_name = gimple_assign_rhs2 (stmt);
>> +  tree div_rhs1 = gimple_assign_rhs1 (stmt);
>> +
>> +  if (TREE_CODE (orig_sqrt_ssa_name) != SSA_NAME
>> +      || TREE_CODE (div_rhs1) != REAL_CST
>> +      || !real_equal (&TREE_REAL_CST (div_rhs1), &dconst1))
>> +    return;
>> +
>> +  gimple *sqrt_stmt = SSA_NAME_DEF_STMT (orig_sqrt_ssa_name);
>> +  if (!is_gimple_call (sqrt_stmt)
>> +      || !gimple_call_lhs (sqrt_stmt))
>> +    return;
>> +
>> +  gcall *call = as_a <gcall *> (sqrt_stmt);
> Very minor, but:
>
>    gcall *sqrt_stmt
>      = dyn_cast <gcall *> (SSA_NAME_DEF_STMT (orig_sqrt_ssa_name));
>    if (!sqrt_stmt || !gimple_call_lhs (sqrt_stmt))
>      return;
>
> would avoid the need for the separate as_a<>, and would mean that
> we only call gimple_call_* on gcalls.

Ok.

>> +  if (has_other_use)
>> +    {
>> +      /* Using the two temporaries tmp1, tmp2 from above
>> +	 the original x is now:
>> +	 x = tmp1 * tmp2.  */
>> +      gcc_assert (mult_ssa_name);
>> +      gcc_assert (sqr_ssa_name);
>> +      gimple_stmt_iterator gsi2 = gsi_for_stmt (stmt);
>> +
>> +      tree new_ssa_name
>> +	= make_temp_ssa_name (TREE_TYPE (a), NULL, "recip_sqrt_transformed");
>> +      gimple *new_stmt
>> +	= gimple_build_assign (new_ssa_name, MULT_EXPR,
>> +			       mult_ssa_name, sqr_ssa_name);
>> +      gsi_insert_before (&gsi2, new_stmt, GSI_SAME_STMT);
>> +      gcc_assert (gsi_stmt (gsi2) == stmt);
>> +      gimple_assign_set_rhs_from_tree (&gsi2, new_ssa_name);
>> +      fold_stmt (&gsi2);
>> +      update_stmt (stmt);
> In this case we're replacing the statement in its original position,
> so there's no real need to use a temporary.  It seems better to
> change the rhs_code, rhs1 and rhs2 of stmt in-place, with the same
> lhs as before.

Yes, that's cleaner.

>> @@ -762,6 +953,23 @@ pass_cse_reciprocals::execute (function *fun)
>>         if (optimize_bb_for_size_p (bb))
>>           continue;
> Seems unnecessary to skip the new optimisation when optimising for size.
> Like you say, it saves a multiplication overall.  Also:

Indeed.

>> +      if (flag_unsafe_math_optimizations)
>> +	{
>> +	  for (gimple_stmt_iterator gsi = gsi_after_labels (bb);
>> +	       !gsi_end_p (gsi);
>> +	       gsi_next (&gsi))
>> +	    {
>> +	      gimple *stmt = gsi_stmt (gsi);
>> +
>> +	      if (gimple_has_lhs (stmt)
>> +		  && (def = SINGLE_SSA_TREE_OPERAND (stmt, SSA_OP_DEF)) != NULL
>> +		  && FLOAT_TYPE_P (TREE_TYPE (def))
>> +		  && TREE_CODE (def) == SSA_NAME
>> +		  && is_gimple_assign (stmt)
>> +		  && gimple_assign_rhs_code (stmt) == RDIV_EXPR)
>> +		optimize_recip_sqrt (&gsi, def);
>> +	    }
>> +	}
> It looks like this could safely be done in one of the existing walks
> (e.g. the execute_cse_reciprocals_1 one, if we do this when optimising
> for size).

You're right. I've moved this into one of the walks above this.

Bootstrapped and tested on aarch64-none-linux-gnu and x86_64-unknown-linux-gnu.
CC'ing richi as he's reviewed these kinds of patches in the past.

Is this ok for trunk?

Thanks,
Kyrill

2018-08-23  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>

     * tree-ssa-math-opts.c (is_mult_by): New function.
     (is_square_of): Use the above.
     (optimize_recip_sqrt): New function.
     (pass_cse_reciprocals::execute): Use the above.

2018-08-23  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>

     * gcc.dg/recip_sqrt_mult_1.c: New test.
     * gcc.dg/recip_sqrt_mult_2.c: Likewise.
     * gcc.dg/recip_sqrt_mult_3.c: Likewise.

> Thanks,
> Richard

-------------- next part --------------
A non-text attachment was scrubbed...
Name: recip-sqrt-mult.patch
Type: text/x-patch
Size: 9031 bytes
Desc: not available
URL: <http://gcc.gnu.org/pipermail/gcc-patches/attachments/20180823/bb6d9dc5/attachment.bin>