This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: [PATCH][GCC] Simplify to single precision where possible for binary/builtin maths operations.
- From: Barnaby Wilks <Barnaby dot Wilks at arm dot com>
- To: Richard Biener <rguenther at suse dot de>
- Cc: "gcc-patches at gcc dot gnu dot org" <gcc-patches at gcc dot gnu dot org>, nd <nd at arm dot com>, "law at redhat dot com" <law at redhat dot com>, "ian at airs dot com" <ian at airs dot com>, Tamar Christina <Tamar dot Christina at arm dot com>, Wilco Dijkstra <Wilco dot Dijkstra at arm dot com>, Richard Sandiford <Richard dot Sandiford at arm dot com>
- Date: Tue, 3 Sep 2019 15:22:50 +0000
- Subject: Re: [PATCH][GCC] Simplify to single precision where possible for binary/builtin maths operations.
- Arc-authentication-results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=arm.com; dmarc=pass action=none header.from=arm.com; dkim=pass header.d=arm.com; arc=none
- Arc-message-signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=3sDEZ3n0mmv732u9a1MRdvBBVWfi2eBOxmulIAJEU24=; b=noDBhxiCVRgl1Tam3zz4Fc3AMDhQ8ph5kJnjUsc8gcyqaZz1r954kDCOma2lYkGJ2nj6T7z50CZNt6fMYN5mlf3Y9iu58wM88fokUB/b8cjV8Liunj0HrB3sgqY29nKmDjZNaMcSeSw3xfo6J591hcnbCe2EDr6QOa4ZCEaa6oa/v2A74XyLRE6sMdMvodJczgpx5U1x6qsSGhzZ7gyhort89j25tevJueAji02xYrdNkqmKysno9UKvZN+LW+SLxQKbhxs4DHa4JnrFzUZ2/SIqYtQ4jZQwJVr+uNxgSYsn5q3Z+N8Ogj18vx/OadqhJl5bT9a/uTFqTJINUBOt8g==
- Arc-seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=LBeH3M3XUYbDQDMnyr2Jw1xFbxb0zqrly5KaP29QInIDDWKuvZm1DlE9seMpmiVVlyiKv8oaRXP3VLX6iMGCjmWYqqrVPeNtIK5QXDH4RuqDn6lv6fmr1mcRY3Xg8cnZhqF3ieg5NN7Tw+lntPsSJbhrELQrI0dTr3j61MkjZaygmU2HXC18PU5UpXqboxXoGswP4GSqIky54K+Ej6P/NQOxjm5+r5MOaj13FUf2UHPxmQOU8l/ClSqTUgTs+O4luX37O4Rk/nblsWjAov8RRUgCmcBT6EFwn0/TpQAg4HkieIDtw0wweyk5fzIgc23tnf0czs/KlME8i1GJJM9VxA==
- Original-authentication-results: spf=none (sender IP is ) smtp.mailfrom=Barnaby dot Wilks at arm dot com;
- References: <571395fe-921b-5a68-ec8d-84850a732253@arm.com> <alpine.LSU.2.20.1909031006590.32458@zhemvz.fhfr.qr>
On 9/3/19 9:23 AM, Richard Biener wrote:
> On Mon, 2 Sep 2019, Barnaby Wilks wrote:
>
>> Hello,
>>
>> This patch introduces an optimization for narrowing binary and builtin
>> math operations to the smallest type when unsafe math optimizations are
>> enabled (typically -Ofast or -ffast-math).
>>
>> Consider the example:
>>
>> float f (float x) {
>> return 1.0 / sqrt (x);
>> }
>>
>> f:
>> fcvt d0, s0
>> fmov d1, 1.0e+0
>> fsqrt d0, d0
>> fdiv d0, d1, d0
>> fcvt s0, d0
>> ret
>>
>> Given that all outputs are of float type, we can do the whole
>> calculation in single precision and avoid any potentially expensive
>> conversions between single and double precision.
>>
>> Aka the expression would end up looking more like
>>
>> float f (float x) {
>> return 1.0f / sqrtf (x);
>> }
>>
>> f:
>> fsqrt s0, s0
>> fmov s1, 1.0e+0
>> fdiv s0, s1, s0
>> ret
>>
>> This optimization will narrow casts around math builtins, and also
>> not try to find the widest type for calculations when processing binary
>> math operations (if unsafe math optimizations are enable).
>>
>> Added tests to verify that narrower math builtins are chosen and
>> no unnecessary casts are introduced when appropriate.
>>
>> Bootstrapped and regtested on aarch64 and x86_64 with no regressions.
>>
>> I don't have write access, so if OK for trunk then can someone commit on
>> my behalf?
>
> @@ -5004,10 +5004,18 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> && newtype == type
> && types_match (newtype, type))
> (op (convert:newtype @1) (convert:newtype @2))
> - (with { if (TYPE_PRECISION (ty1) > TYPE_PRECISION (newtype))
> + (with
> + {
> + if (!flag_unsafe_math_optimizations)
> + {
> + if (TYPE_PRECISION (ty1) > TYPE_PRECISION (newtype))
> newtype = ty1;
> +
> if (TYPE_PRECISION (ty2) > TYPE_PRECISION (newtype))
> - newtype = ty2; }
> + newtype = ty2;
> + }
> + }
> +
> /* Sometimes this transformation is safe (cannot
> change results through affecting double rounding
> cases) and sometimes it is not. If NEWTYPE is
>
> The ChangeLog doesn't mention this change and I wonder what it is
> for - later flag_unsafe_math_optimizations is checked, in particular
>
> && (flag_unsafe_math_optimizations
> || (TYPE_PRECISION (newtype) == TYPE_PRECISION
> (type)
> && real_can_shorten_arithmetic (TYPE_MODE
> (itype),
> TYPE_MODE
> (type))
> && !excess_precision_type (newtype)))
>
> note the !excess_precision_type (newtype) which you fail to check
> below.
This change prevents the pattern from casting the operands to the widest
type (the widest between the type of the operands and the type of the
expression as a whole) if unsafe math optimizations are enabled.
Whereas the second check of flag_unsafe_math_optimizations is a shortcut
to enable the transformation as a whole, and does not affect the type
that the operands are being cast to.
Without the first check then the expression will always use the widest
type, resulting in unnecessary casts. For example
float f (float x, float y) {
double z = 1.0 / x;
return z * y;
}
Will generate (With -Ofast)
float D.3459;
double z;
_1 = (double) x;
z = 1.0e+0 / _1;
_2 = (double) y;
_3 = z * _2;
D.3459 = (float) _3;
return D.3459;
Note how the parameters are cast to doubles, the whole calculation is
done in double precision and then cast out to single precision at the
end. (because double is the widest type in the expression)
Whereas if you include the first flag_unsafe_math_optimizations check,
and prevent the widening then you get
float D.3459;
double z;
_1 = (double) x;
z = 1.0e+0 / _1;
_2 = (float) z;
D.3459 = y * _2;
return D.3459;
Where only "double z = 1.0 / x" happens in double precision, and the
rest of the calculation is done in single precision, reducing the amount
of casts.
The benefits here can be seen in the generated code:
Without the flag_unsafe_math_optimizations check
fcvt d1, s1
fcvt d0, s0
fdiv d0, d1, d0
fcvt s0, d0
ret
With the check
fdiv s0, s1, s0
ret
>
> @@ -5654,3 +5662,24 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> (simplify
> (vec_perm vec_same_elem_p@0 @0 @1)
> @0)
> +
> +/* Convert expressions of the form
> + (x) math_call1 ((y) z) where (x) and z are the same type, into
> + math_call2 (z), where math_call2 is the math builtin for
> + type x. Type x (and therefore type of z) must be a lower precision
> + than y/math_call1. */
> +(if (flag_unsafe_math_optimizations && !flag_errno_math)
> + (for op (COSH EXP EXP10 EXP2 EXPM1 GAMMA J0 J1 LGAMMA
> + POW10 SINH TGAMMA Y0 Y1 ACOS ACOSH ASIN ASINH
> + ATAN ATANH CBRT COS ERF ERFC LOG LOG10 LOG2
> + LOG1P SIN TAN TANH SQRT FABS LOGB)
> + (simplify
> + (convert (op@0 (convert@1 @2)))
> + (if (SCALAR_FLOAT_TYPE_P (type) && SCALAR_FLOAT_TYPE_P (TREE_TYPE
> (@1))
> + && SCALAR_FLOAT_TYPE_P (TREE_TYPE (@2))
> + && types_match (type, TREE_TYPE (@2))
> + && TYPE_PRECISION (type) < TYPE_PRECISION (TREE_TYPE (@1)))
> + (with { enum built_in_function fcode = builtin_mathfn_code (@0);
> + tree fn = mathfn_built_in (type, fcode, false); }
> + (if (fn)
> + (convert { build_call_expr (fn, 1, @2); })))))))
>
> This (convert { build_call_expr (..) } ) only works on GENERIC.
I didn't realize that build_call_expr only works on GENERIC, I took
that snippet from convert_to_real_1 in convert.c, which does a similar
thing, but is not very generic and wont cover all the cases.
I suppose this doesn't matter if the transformation is being moved out
of match.pd into the backprop pass, or if not is there a more generic
(if you excuse the pun) way to create function call nodes for GIMPLE &
GENERIC?
> I also wonder why you needed the mathfn_built_in change.
Because some builtins are not marked as implicit - or more specifically
reserved in C90 and actually specified in C99.
To be honest, I wasn't really happy with doing this, as I'm not sure of
it's implications, so if you have a better way to do this then that
would be much appreciated?
From what I can tell its the float versions of the math builtins that
are not implicit, and these are the functions that are most commonly
needed to narrow to.
> If you look at other examples in match.pd you'd see you should have
> used sth like
>
> (for op (BUILT_IN_COSH BUILT_IN_EXP ...)
> opf (BUILT_IN_COSHF BUILT_IN_EXPF ...)
> (simplify
> ...
> (if (types_match (type, float_type_node))
> (opf @2)))
>
> and you have to repeat this for the COSHL (long double) case
> with appropriate opd and opf lists. In theory, if we'd extend
> genmatch to 'transform' builtin function kinds that could be
> done prettier like for example with
Why would this need to be the case? The code already does practially the
same thing by matching on all the builtins and then transforming down to
the narrowest type with the builtin_mathfn_code/mathfn_built_in combination.
> (for op (COSH EXP ...)
> (simplify
> ...
> (op:type @2))
>
> which I'd kind-of like. Note it's not as simple as passing
> 'type' to mathfn_built_in since that expects literal
> double_type_node and friends but we could use a {gimple,generic}-match.c
> private helper for that.
Would "type" not be a double_type_node (or related) literal already?
If mathfn_built_in does not recognise the given type then it will just
spit out NULL_TREE, which is checked by:
+ tree fn = mathfn_built_in (type, fcode, false); }
+ (if (fn)
+ (convert { build_call_expr (fn, 1, @2); })))))))
Apologies if any of these seem obvious questions - I'm quite new to GCC
internals.
Regards,
Barney
> Now - as a general comment I think adding this kind of narrowing is
> good but doing it via match.pd patterns is quite limiting - eventually
> the backprop pass would be a fit for propagating "needed precision"
> and narrowing feeding stmts accordingly in a more general way?
> Richard can probably tell quickest if it is feasible in that framework.
>
> Thanks,
> Richard.
>
>
>> Regards,
>> Barney
>>
>> gcc/ChangeLog:
>>
>> 2019-09-02 Barnaby Wilks <barnaby.wilks@arm.com>
>>
>> * builtins.c (mathfn_built_in): Expose find implicit builtin parameter.
>> * builtins.h (mathfn_built_in): Likewise.
>> * match.pd: Add expressions for simplifying builtin and binary
>> math expressions.
>>
>> gcc/testsuite/ChangeLog:
>>
>> 2019-09-02 Barnaby Wilks <barnaby.wilks@arm.com>
>>
>> * gcc.dg/fold-single-precision.c: New test.
>>
>