[PATCH][GCC] Simplify to single precision where possible for binary/builtin maths operations.

Tue Sep 3 08:23:00 GMT 2019

On Mon, 2 Sep 2019, Barnaby Wilks wrote:

> Hello,
> 
> This patch introduces an optimization for narrowing binary and builtin
> math operations to the smallest type when unsafe math optimizations are
> enabled (typically -Ofast or -ffast-math).
> 
> Consider the example:
> 
>    float f (float x) {
>      return 1.0 / sqrt (x);
>    }
> 
>    f:
>      fcvt	d0, s0
>      fmov	d1, 1.0e+0
>      fsqrt	d0, d0
>      fdiv	d0, d1, d0
>      fcvt	s0, d0
>      ret
> 
> Given that all outputs are of float type, we can do the whole 
> calculation in single precision and avoid any potentially expensive 
> conversions between single and double precision.
> 
> That is, the expression would end up looking more like
> 
>    float f (float x) {
>      return 1.0f / sqrtf (x);
>    }
> 
>    f:
>      fsqrt	s0, s0
>      fmov	s1, 1.0e+0
>      fdiv	s0, s1, s0
>      ret
> 
> This optimization will narrow casts around math builtins, and also
> not try to find the widest type for calculations when processing binary
> math operations (if unsafe math optimizations are enabled).
> 
> Added tests to verify that narrower math builtins are chosen and
> no unnecessary casts are introduced when appropriate.
> 
> Bootstrapped and regtested on aarch64 and x86_64 with no regressions.
> 
> I don't have write access, so if OK for trunk then can someone commit on 
> my behalf?

@@ -5004,10 +5004,18 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
              && newtype == type
              && types_match (newtype, type))
            (op (convert:newtype @1) (convert:newtype @2))
-           (with { if (TYPE_PRECISION (ty1) > TYPE_PRECISION (newtype))
+           (with
+             {
+               if (!flag_unsafe_math_optimizations)
+                 {
+                   if (TYPE_PRECISION (ty1) > TYPE_PRECISION (newtype))
                      newtype = ty1;
+
                    if (TYPE_PRECISION (ty2) > TYPE_PRECISION (newtype))
-                     newtype = ty2; }
+                     newtype = ty2;
+                 }
+             }
+
               /* Sometimes this transformation is safe (cannot
                  change results through affecting double rounding
                  cases) and sometimes it is not.  If NEWTYPE is

The ChangeLog doesn't mention this change and I wonder what it is
for - later flag_unsafe_math_optimizations is checked, in particular

                   && (flag_unsafe_math_optimizations
                       || (TYPE_PRECISION (newtype) == TYPE_PRECISION (type)
                           && real_can_shorten_arithmetic (TYPE_MODE (itype),
                                                           TYPE_MODE (type))
                           && !excess_precision_type (newtype)))

note the !excess_precision_type (newtype) which you fail to check
below.
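
To illustrate the "sometimes safe" case the quoted comment refers to, here
is a minimal C sketch, not taken from the patch. It assumes IEEE binary32
float and binary64 double with no excess precision in evaluation; the
function names are hypothetical. Under those assumptions a single division
performed in double and then rounded to float gives the same value as the
division performed directly in float.

```c
/* Divide in double, then round the quotient to float. */
static float div_wide(float a, float b)
{
    return (float)((double)a / (double)b);
}

/* Divide directly in float. */
static float div_narrow(float a, float b)
{
    return a / b;
}

/* Count pairs (i, j) in [1, n] x [1, n] where the two results differ.
   Assumption: IEEE float/double and no excess precision. */
static int count_div_mismatches(int n)
{
    int mismatches = 0;
    for (int i = 1; i <= n; i++) {
        for (int j = 1; j <= n; j++) {
            /* volatile keeps the compiler from folding the comparison. */
            volatile float a = (float)i;
            volatile float b = (float)j;
            if (div_wide(a, b) != div_narrow(a, b))
                mismatches++;
        }
    }
    return mismatches;
}
```

On such a target count_div_mismatches (200) should return 0.  That
guarantee covers only a single operation; it does not extend to a chain
such as 1.0 / sqrt (x), where the intermediate result is never rounded to
float in the wide version.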


@@ -5654,3 +5662,24 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
 (simplify
  (vec_perm vec_same_elem_p@0 @0 @1)
  @0)
+
+/* Convert expressions of the form
+   (x) math_call1 ((y) z) where (x) and z are the same type, into
+   math_call2 (z), where math_call2 is the math builtin for
+   type x.  Type x (and therefore type of z) must be a lower precision
+   than y/math_call1.  */
+(if (flag_unsafe_math_optimizations && !flag_errno_math)
+  (for op (COSH EXP EXP10 EXP2 EXPM1 GAMMA J0 J1 LGAMMA
+          POW10 SINH TGAMMA Y0 Y1 ACOS ACOSH ASIN ASINH
+          ATAN ATANH CBRT COS ERF ERFC LOG LOG10 LOG2
+          LOG1P SIN TAN TANH SQRT FABS LOGB)
+    (simplify
+      (convert (op@0 (convert@1 @2)))
+       (if (SCALAR_FLOAT_TYPE_P (type) && SCALAR_FLOAT_TYPE_P (TREE_TYPE (@1))
+             && SCALAR_FLOAT_TYPE_P (TREE_TYPE (@2))
+             && types_match (type, TREE_TYPE (@2))
+             && TYPE_PRECISION (type) < TYPE_PRECISION (TREE_TYPE (@1)))
+         (with { enum built_in_function fcode = builtin_mathfn_code (@0);
+                 tree fn = mathfn_built_in (type, fcode, false); }
+           (if (fn)
+             (convert { build_call_expr (fn, 1, @2); })))))))

This (convert { build_call_expr (..) } ) only works on GENERIC.
I also wonder why you needed the mathfn_built_in change.

If you look at other examples in match.pd you'll see that you should have
used something like

 (for op (BUILT_IN_COSH BUILT_IN_EXP ...)
      opf (BUILT_IN_COSHF BUILT_IN_EXPF ...)
   (simplify
...
      (if (types_match (type, float_type_node))
        (opf @2)))

and you have to repeat this for the COSHL (long double) case
with appropriate opd and opf lists.  In theory, if we extended
genmatch to 'transform' builtin function kinds, this could be
written more neatly, for example as
 (for op (COSH EXP ...)
  (simplify
...
   (op:type @2))

which I'd kind-of like.  Note it's not as simple as passing
'type' to mathfn_built_in since that expects literal
double_type_node and friends but we could use a {gimple,generic}-match.c
private helper for that.
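
As a rough model of what the paired op/opf lists express, here is a small
C sketch.  It is only an illustration: the enum and the helper name are
hypothetical stand-ins, not GCC's built_in_function codes or any existing
helper.  The two arrays are walked in lockstep, so entry i of op maps to
entry i of opf.

```c
#include <stddef.h>

/* Hypothetical stand-ins for builtin codes; not GCC's real enum. */
enum fn_code {
    FN_COSH, FN_EXP, FN_SQRT,      /* double variants */
    FN_COSHF, FN_EXPF, FN_SQRTF,   /* float variants */
    FN_NONE                        /* no narrower variant known */
};

/* Parallel lists: op[i] is narrowed to opf[i]. */
static const enum fn_code op[]  = { FN_COSH,  FN_EXP,  FN_SQRT  };
static const enum fn_code opf[] = { FN_COSHF, FN_EXPF, FN_SQRTF };

/* Return the float variant paired with CODE, or FN_NONE if CODE
   does not appear in the op list. */
static enum fn_code narrow_to_float(enum fn_code code)
{
    for (size_t i = 0; i < sizeof op / sizeof op[0]; i++)
        if (op[i] == code)
            return opf[i];
    return FN_NONE;
}
```

The same shape would have to be repeated with a second pair of lists for
the long double case, which is the duplication an (op:type @2) style
extension would avoid.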

Now, as a general comment: I think adding this kind of narrowing is
good, but doing it via match.pd patterns is quite limiting.  Perhaps
the backprop pass would eventually be a better fit for propagating
"needed precision" and narrowing feeding stmts accordingly in a more
general way?
Richard can probably tell quickest if it is feasible in that framework.

Thanks,
Richard.


> Regards,
> Barney
> 
> gcc/ChangeLog:
> 
> 2019-09-02  Barnaby Wilks  <barnaby.wilks@arm.com>
> 
> 	* builtins.c (mathfn_built_in): Expose find implicit builtin parameter.
> 	* builtins.h (mathfn_built_in): Likewise.
> 	* match.pd: Add expressions for simplifying builtin and binary
> 	math expressions.
> 
> gcc/testsuite/ChangeLog:
> 
> 2019-09-02  Barnaby Wilks  <barnaby.wilks@arm.com>
> 
> 	* gcc.dg/fold-single-precision.c: New test.
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer; HRB 247165 (AG München)