Re: [PATCH][GCC] Simplify to single precision where possible for binary/builtin maths operations.



On 9/3/19 9:23 AM, Richard Biener wrote:
> On Mon, 2 Sep 2019, Barnaby Wilks wrote:
> 
>> Hello,
>>
>> This patch introduces an optimization for narrowing binary and builtin
>> math operations to the smallest type when unsafe math optimizations are
>> enabled (typically -Ofast or -ffast-math).
>>
>> Consider the example:
>>
>>     float f (float x) {
>>       return 1.0 / sqrt (x);
>>     }
>>
>>     f:
>>       fcvt	d0, s0
>>       fmov	d1, 1.0e+0
>>       fsqrt	d0, d0
>>       fdiv	d0, d1, d0
>>       fcvt	s0, d0
>>       ret
>>
>> Given that all outputs are of float type, we can do the whole
>> calculation in single precision and avoid any potentially expensive
>> conversions between single and double precision.
>>
>> That is, the expression would end up looking more like:
>>
>>     float f (float x) {
>>       return 1.0f / sqrtf (x);
>>     }
>>
>>     f:
>>       fsqrt	s0, s0
>>       fmov	s1, 1.0e+0
>>       fdiv	s0, s1, s0
>>       ret
>>
>> This optimization will narrow casts around math builtins, and will also
>> not try to find the widest type for calculations when processing binary
>> math operations (if unsafe math optimizations are enabled).
>>
>> Added tests to verify that narrower math builtins are chosen and
>> no unnecessary casts are introduced when appropriate.
>>
>> Bootstrapped and regtested on aarch64 and x86_64 with no regressions.
>>
>> I don't have write access, so if this is OK for trunk could someone
>> commit it on my behalf?
> 
> @@ -5004,10 +5004,18 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>                && newtype == type
>                && types_match (newtype, type))
>              (op (convert:newtype @1) (convert:newtype @2))
> -           (with { if (TYPE_PRECISION (ty1) > TYPE_PRECISION (newtype))
> +           (with
> +             {
> +               if (!flag_unsafe_math_optimizations)
> +                 {
> +                   if (TYPE_PRECISION (ty1) > TYPE_PRECISION (newtype))
>                        newtype = ty1;
> +
>                      if (TYPE_PRECISION (ty2) > TYPE_PRECISION (newtype))
> -                     newtype = ty2; }
> +                     newtype = ty2;
> +                 }
> +             }
> +
>                 /* Sometimes this transformation is safe (cannot
>                    change results through affecting double rounding
>                    cases) and sometimes it is not.  If NEWTYPE is
> 
> The ChangeLog doesn't mention this change and I wonder what it is
> for - later flag_unsafe_math_optimizations is checked, in particular
> 
>                     && (flag_unsafe_math_optimizations
>                         || (TYPE_PRECISION (newtype) == TYPE_PRECISION (type)
>                             && real_can_shorten_arithmetic (TYPE_MODE (itype),
>                                                             TYPE_MODE (type))
>                             && !excess_precision_type (newtype)))
> 
> note the !excess_precision_type (newtype) which you fail to check
> below.

This change prevents the pattern from casting the operands to the widest 
type (the widest among the types of the operands and the type of the 
expression as a whole) when unsafe math optimizations are enabled.
The second check of flag_unsafe_math_optimizations, by contrast, is a 
shortcut that enables the transformation as a whole and does not affect 
the type to which the operands are cast.

Without the first check, the expression will always use the widest 
type, resulting in unnecessary casts. For example,

   float f (float x, float y) {
     double z = 1.0 / x;
     return z * y;
   }

will generate (with -Ofast):

   float D.3459;
   double z;

   _1 = (double) x;
   z = 1.0e+0 / _1;
   _2 = (double) y;
   _3 = z * _2;
   D.3459 = (float) _3;
   return D.3459;

Note how the parameters are cast to double, the whole calculation is 
done in double precision, and the result is cast back to single 
precision at the end (because double is the widest type in the 
expression).

Whereas if you include the first flag_unsafe_math_optimizations check 
and prevent the widening, you get:

   float D.3459;
   double z;

   _1 = (double) x;
   z = 1.0e+0 / _1;
   _2 = (float) z;
   D.3459 = y * _2;
   return D.3459;

Here only "double z = 1.0 / x" happens in double precision; the rest 
of the calculation is done in single precision, reducing the number of 
casts.

The benefit can be seen in the generated code. Without the 
flag_unsafe_math_optimizations check:

         fcvt    d1, s1
         fcvt    d0, s0
         fdiv    d0, d1, d0
         fcvt    s0, d0
         ret

With the check:

         fdiv    s0, s1, s0
         ret

> 
> @@ -5654,3 +5662,24 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>   (simplify
>    (vec_perm vec_same_elem_p@0 @0 @1)
>    @0)
> +
> +/* Convert expressions of the form
> +   (x) math_call1 ((y) z) where (x) and z are the same type, into
> +   math_call2 (z), where math_call2 is the math builtin for
> +   type x.  Type x (and therefore type of z) must be a lower precision
> +   than y/math_call1.  */
> +(if (flag_unsafe_math_optimizations && !flag_errno_math)
> +  (for op (COSH EXP EXP10 EXP2 EXPM1 GAMMA J0 J1 LGAMMA
> +          POW10 SINH TGAMMA Y0 Y1 ACOS ACOSH ASIN ASINH
> +          ATAN ATANH CBRT COS ERF ERFC LOG LOG10 LOG2
> +          LOG1P SIN TAN TANH SQRT FABS LOGB)
> +    (simplify
> +      (convert (op@0 (convert@1 @2)))
> +       (if (SCALAR_FLOAT_TYPE_P (type) && SCALAR_FLOAT_TYPE_P (TREE_TYPE (@1))
> +             && SCALAR_FLOAT_TYPE_P (TREE_TYPE (@2))
> +             && types_match (type, TREE_TYPE (@2))
> +             && TYPE_PRECISION (type) < TYPE_PRECISION (TREE_TYPE (@1)))
> +         (with { enum built_in_function fcode = builtin_mathfn_code (@0);
> +                 tree fn = mathfn_built_in (type, fcode, false); }
> +           (if (fn)
> +             (convert { build_call_expr (fn, 1, @2); })))))))
> 
> This (convert { build_call_expr (..) } ) only works on GENERIC.

I didn't realize that build_call_expr only works on GENERIC; I took 
that snippet from convert_to_real_1 in convert.c, which does a similar 
thing, but it is not very generic and won't cover all the cases.
I suppose this doesn't matter if the transformation is being moved out 
of match.pd into the backprop pass, but if not, is there a more generic 
(if you'll excuse the pun) way to create function call nodes for both 
GIMPLE and GENERIC?

> I also wonder why you needed the mathfn_built_in change.

Because some builtins are not marked as implicit - more specifically, 
those that are reserved in C90 and only actually specified in C99.
To be honest, I wasn't really happy with doing this, as I'm not sure of 
its implications, so if you have a better way to do this then that 
would be much appreciated.
From what I can tell it's the float versions of the math builtins that 
are not implicit, and these are the functions that narrowing most 
commonly needs to target.
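
For instance, something like the following (a purely hypothetical 
illustration, not one of the added tests) is the kind of code 
affected: compiled as C90 with -ffast-math, the narrowed call would 
have to be to sqrtf, whose builtin is only reserved rather than 
implicit, so an implicit-only mathfn_built_in lookup would return 
NULL_TREE and the simplification would never fire.

   #include <math.h>

   /* Hypothetical example: we would like to narrow this to sqrtf (x),
      but under C90 BUILT_IN_SQRTF is only reserved, not implicit.  */
   float f (float x) {
     return (float) sqrt (x);
   }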

> If you look at other examples in match.pd you'd see you should have
> used sth like
> 
>   (for op (BUILT_IN_COSH BUILT_IN_EXP ...)
>        opf (BUILT_IN_COSHF BUILT_IN_EXPF ...)
>     (simplify
> ...
>        (if (types_match (type, float_type_node))
>          (opf @2)))
> 
> and you have to repeat this for the COSHL (long double) case
> with appropriate opd and opf lists.  In theory, if we'd extend
> genmatch to 'transform' builtin function kinds that could be
> done prettier like for example with

Why would this need to be the case? The code already does practically 
the same thing by matching on all the builtins and then transforming 
down to the narrowest type with the 
builtin_mathfn_code/mathfn_built_in combination.
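
For comparison, my reading of the explicit per-type form you describe 
is roughly the sketch below (untested, and as you note it would have 
to be repeated with the long double lists for the COSHL case):

   /* Untested sketch of the suggested per-type form: match the double
      builtin wrapped in casts and emit the float builtin directly in
      the result, so that no build_call_expr is needed and genmatch can
      generate both the GIMPLE and the GENERIC code.  */
   (if (flag_unsafe_math_optimizations && !flag_errno_math)
    (for op  (BUILT_IN_COSH  BUILT_IN_EXP  BUILT_IN_SQRT  /* ... */)
         opf (BUILT_IN_COSHF BUILT_IN_EXPF BUILT_IN_SQRTF /* ... */)
     (simplify
      (convert (op (convert@1 @2)))
      (if (types_match (type, float_type_node)
           && types_match (type, TREE_TYPE (@2))
           && TYPE_PRECISION (type) < TYPE_PRECISION (TREE_TYPE (@1)))
       (opf @2)))))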

>   (for op (COSH EXP ...)
>    (simplify
> ...
>     (op:type @2))
> 
> which I'd kind-of like.  Note it's not as simple as passing
> 'type' to mathfn_built_in since that expects literal
> double_type_node and friends but we could use a {gimple,generic}-match.c
> private helper for that.

Would "type" not be a double_type_node (or related) literal already?
If mathfn_built_in does not recognise the given type then it will just 
spit out NULL_TREE, which is checked by:

   +                 tree fn = mathfn_built_in (type, fcode, false); }
   +           (if (fn)
   +             (convert { build_call_expr (fn, 1, @2); })))))))


Apologies if any of these are obvious questions - I'm quite new to GCC 
internals.

Regards,
Barney

> Now - as a general comment I think adding this kind of narrowing is
> good but doing it via match.pd patterns is quite limiting - eventually
> the backprop pass would be a fit for propagating "needed precision"
> and narrowing feeding stmts accordingly in a more general way?
> Richard can probably tell quickest if it is feasible in that framework.
> 
> Thanks,
> Richard.
> 
> 
>> Regards,
>> Barney
>>
>> gcc/ChangeLog:
>>
>> 2019-09-02  Barnaby Wilks  <barnaby.wilks@arm.com>
>>
>> 	* builtins.c (mathfn_built_in): Expose find implicit builtin parameter.
>> 	* builtins.h (mathfn_built_in): Likewise.
>> 	* match.pd: Add expressions for simplifying builtin and binary
>> 	math expressions.
>>
>> gcc/testsuite/ChangeLog:
>>
>> 2019-09-02  Barnaby Wilks  <barnaby.wilks@arm.com>
>>
>> 	* gcc.dg/fold-single-precision.c: New test.
>>
> 
