Re: [PATCH][GCC] Simplify to single precision where possible for binary/builtin maths operations.
- From: Richard Sandiford <richard dot sandiford at arm dot com>
- To: Richard Biener <rguenther at suse dot de>
- Cc: Barnaby Wilks <Barnaby dot Wilks at arm dot com>, "gcc-patches@gcc.gnu.org" <gcc-patches at gcc dot gnu dot org>, nd <nd at arm dot com>, "law@redhat.com" <law at redhat dot com>, "ian@airs.com" <ian at airs dot com>, Tamar Christina <Tamar dot Christina at arm dot com>, Wilco Dijkstra <Wilco dot Dijkstra at arm dot com>
- Date: Tue, 03 Sep 2019 15:19:30 +0100
- Subject: Re: [PATCH][GCC] Simplify to single precision where possible for binary/builtin maths operations.
- References: <571395fe-921b-5a68-ec8d-84850a732253@arm.com> <alpine.LSU.2.20.1909031006590.32458@zhemvz.fhfr.qr>
Richard Biener <rguenther@suse.de> writes:
> On Mon, 2 Sep 2019, Barnaby Wilks wrote:
>
>> Hello,
>>
>> This patch introduces an optimization that narrows binary and builtin
>> math operations to the narrowest suitable type when unsafe math
>> optimizations are enabled (typically via -Ofast or -ffast-math).
>>
>> Consider the example:
>>
>> float f (float x) {
>>   return 1.0 / sqrt (x);
>> }
>>
>> f:
>>         fcvt    d0, s0
>>         fmov    d1, 1.0e+0
>>         fsqrt   d0, d0
>>         fdiv    d0, d1, d0
>>         fcvt    s0, d0
>>         ret
>>
>> Given that the input and the result are both of float type, we can do
>> the whole calculation in single precision and avoid any potentially
>> expensive conversions between single and double precision.
>>
>> That is, the expression would end up looking more like:
>>
>> float f (float x) {
>>   return 1.0f / sqrtf (x);
>> }
>>
>> f:
>>         fsqrt   s0, s0
>>         fmov    s1, 1.0e+0
>>         fdiv    s0, s1, s0
>>         ret
>>
>> This optimization will narrow casts around math builtins, and also
>> not try to find the widest type for calculations when processing binary
>> math operations (if unsafe math optimizations are enabled).
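(For reference, the explicit-cast case being described has the form

  float g (float x) {
    return (float) sin ((double) x);   /* narrowed to sinf (x) */
  }

with sin/sinf just being an example builtin; as above, the rewrite is
only valid under -funsafe-math-optimizations.)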
>>
>> Added tests to verify that narrower math builtins are chosen and
>> no unnecessary casts are introduced when appropriate.
>>
>> Bootstrapped and regtested on aarch64 and x86_64 with no regressions.
>>
>> I don't have write access, so if OK for trunk then can someone commit on
>> my behalf?
> [...]
>
> Now, as a general comment: I think adding this kind of narrowing is
> good, but doing it via match.pd patterns is quite limiting. Eventually
> the backprop pass would be a fit for propagating "needed precision"
> and narrowing the feeding stmts accordingly in a more general way?
> Richard can probably tell quickest whether it is feasible in that
> framework.
Yeah, I think it would be a good fit, and would for example cope with
cases in which we select between two double results before doing the
truncation to float. I'd wanted to do something similar for integer
truncation but never found the time...
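
E.g. something like this (just to illustrate the selection case):

  float f (float x, float y) {
    return x > y ? sqrt (x) : sqrt (y);
  }

Here both sqrt results and the selected value only ever feed a float
result, so the whole thing could be done in single precision, but the
truncation isn't wrapped directly around either call, so there's no
single expression for a match.pd pattern to match.
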
At the moment, backprop handles a single piece of information: whether
the sign of the value matters. This is (over?)generalised to be one bit
of information in a word of flags. I guess we could take the same
approach here and have flags for certain well-known floating-point
types, but it might be cleaner to instead have a field that records the
widest mode that users of the result want.
I think to do this we'd need to build an array that maps floating-point
machine_modes to their order in the FOR_EACH_MODE_IN_CLASS chain.
That'll give us a total ordering over floating-point modes and mean
that operator & (the usage_info confluence function) can just take
whichever of the input usage_info modes has the highest index in this
chain.
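
Very roughly, and only as a self-contained toy model (made-up names,
not the real backprop code; in GCC the rank table would be built by
walking FOR_EACH_MODE_IN_CLASS over MODE_FLOAT rather than hard-coded):

  /* Rank the floating-point "modes" narrow-to-wide and make the
     confluence operator keep the widest precision any user asks for.  */
  enum fp_mode_model { FP_SF, FP_DF, FP_XF, FP_TF, NUM_FP_MODE_MODELS };

  /* Position of each mode in the narrow-to-wide chain; here the enum
     order already is the chain order.  */
  static const int fp_mode_rank[NUM_FP_MODE_MODELS] = { 0, 1, 2, 3 };

  struct usage_info_model
  {
    fp_mode_model widest_mode_wanted;  /* widest precision any use needs */

    /* Confluence of two uses: the definition has to satisfy the wider
       requirement, i.e. the mode with the higher rank.  */
    usage_info_model operator& (const usage_info_model &other) const
    {
      return (fp_mode_rank[widest_mode_wanted]
              >= fp_mode_rank[other.widest_mode_wanted]) ? *this : other;
    }
  };

A definition whose combined usage_info only asks for SFmode precision
could then have its statement (and, transitively, its feeding
statements) narrowed to single precision, given the usual unsafe-math
guards.
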
Thanks,
Richard