[PATCH][Fortran] Use MIN/MAX_EXPR for intrinsics or __builtin_fmin/max when appropriate

Wed Jul 18 09:50:00 GMT 2018

On 18/07/18 10:44, Richard Biener wrote:
> On Tue, Jul 17, 2018 at 3:46 PM Kyrill Tkachov
> <kyrylo.tkachov@foss.arm.com> wrote:
>> Hi Richard,
>>
>> On 17/07/18 14:27, Richard Biener wrote:
>>> On Tue, Jul 17, 2018 at 2:35 PM Kyrill Tkachov
>>> <kyrylo.tkachov@foss.arm.com> wrote:
>>>> Hi all,
>>>>
>>>> This is my first Fortran patch, so apologies if I'm missing something.
>>>> The current expansion of the min and max intrinsics explicitly expands
>>>> the comparisons between each argument to calculate the global min/max.
>>>> Some targets, like aarch64, have instructions that can calculate the min/max
>>>> of two real (floating-point) numbers with the proper NaN-handling semantics
>>>> (if both inputs are NaN, return Nan. If one is NaN, return the other) and those
>>>> are the semantics provided by the __builtin_fmin/max family of functions that expand
>>>> to these instructions.
>>>>
>>>> This patch makes the frontend emit __builtin_fmin/max directly to compare each
>>>> pair of numbers when the numbers are floating-point, and use MIN_EXPR/MAX_EXPR otherwise
>>>> (integral types and -ffast-math) which should hopefully be easier to recognise in the
>>> What is Fortrans requirement on min/max intrinsics?  Doesn't it only
>>> require things that
>>> are guaranteed by MIN/MAX_EXPR anyways?  The only restriction here is
>> The current implementation expands to:
>>       mvar = a1;
>>       if (a2 .op. mvar || isnan (mvar))
>>         mvar = a2;
>>       if (a3 .op. mvar || isnan (mvar))
>>         mvar = a3;
>>       ...
>>       return mvar;
>>
>> That is, if one of the operands is a NaN it will return the other argument.
>> If both (all) are NaNs, it will return NaN. This is the same as the semantics of fmin/max
>> as far as I can tell.
>>
>>> /* Minimum and maximum values.  When used with floating point, if both
>>>      operands are zeros, or if either operand is NaN, then it is unspecified
>>>      which of the two operands is returned as the result.  */
>>>
>>> which means MIN/MAX_EXPR are not strictly IEEE compliant with signed
>>> zeros or NaNs.
>>> Thus the correct test would be !HONOR_SIGNED_ZEROS && !HONOR_NANS if singed
>>> zeros are significant.
>> True, MIN/MAX_EXPR would not be appropriate in that condition. I guarded their use
>> on !HONOR_NANS (type) only. I'll update it to !HONOR_SIGNED_ZEROS (type) && !HONOR_NANS (type).
>>
>>
>>> I'm not sure if using fmin/max calls when we cannot use MIN/MAX_EXPR
>>> is a good idea,
>>> this may both generate bigger code and be slower.
>> The patch will generate fmin/fmax calls (or the fminf,fminl variants) when mathfn_built_in advertises
>> them as available (does that mean they'll have a fast inline implementation?)
> This doesn't mean anything given you make them available with your
> patch ;)  So I expect it may
> cause issues for !c99_runtime targets (and long double at least).

Urgh, that can cause headaches...

>> If the above doesn't hold and we can't use either MIN/MAX_EXPR of fmin/fmax then the patch falls back
>> to the existing expansion.
> As said I would not use fmin/fmax calls here at all.

... Given the comments from Thomas and Janne, maybe we should just emit MIN/MAX_EXPRs here
since there is no language requirement on NaN/signed zero handling on these intrinsics?
That should make it simpler and more portable.

>> FWIW, this patch does improve performance on 521.wrf from SPEC2017 on aarch64.
> You said that, yes.  Even without -ffast-math?

It improves at -O3 without -ffast-math in particular. With -ffast-math phiopt optimisation
is more aggressive and merges the conditionals into MIN/MAX_EXPRs (minmax_replacement in tree-ssa-phiopt.c)

Thanks,
Kyrill

> Richard.
>
>> Thanks,
>> Kyrill
>>
>>> Richard.
>>>
>>>> midend and optimise. The previous approach of generating the open-coded version of that
>>>> is used when we don't have an appropriate __builtin_fmin/max available.
>>>> For example, for a configuration of x86_64-unknown-linux-gnu that I tested there was no
>>>> 128-bit __built_fminl available.
>>>>
>>>> With this patch I'm seeing more than 7000 FMINNM/FMAXNM instructions being generated at -O3
>>>> on aarch64 for 521.wrf from fprate SPEC2017 where none before were generated
>>>> (we were generating explicit comparisons and NaN checks). This gave a 2.4% improvement
>>>> in performance on a Cortex-A72.
>>>>
>>>> Bootstrapped and tested on aarch64-none-linux-gnu and x86_64-unknown-linux-gnu.
>>>>
>>>> Ok for trunk?
>>>> Thanks,
>>>> Kyrill
>>>>
>>>> 2018-07-17  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
>>>>
>>>>        * f95-lang.c (gfc_init_builtin_functions): Define __builtin_fmin,
>>>>        __builtin_fminf, __builtin_fminl, __builtin_fmax, __builtin_fmaxf,
>>>>        __builtin_fmaxl.
>>>>        * trans-intrinsic.c: Include builtins.h.
>>>>        (gfc_conv_intrinsic_minmax): Emit __builtin_fmin/max or MIN/MAX_EXPR
>>>>        functions to calculate the min/max.
>>>>
>>>> 2018-07-17  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
>>>>
>>>>        * gfortran.dg/max_fmaxf.f90: New test.
>>>>        * gfortran.dg/min_fminf.f90: Likewise.
>>>>        * gfortran.dg/minmax_integer.f90: Likewise.
>>>>        * gfortran.dg/max_fmaxl_aarch64.f90: Likewise.
>>>>        * gfortran.dg/min_fminl_aarch64.f90: Likewise.