This is the mail archive of the
mailing list for the GCC project.
Re: [PING][Patch] Add support for IEEE-conformant versions of scalar fmin* and fmax*
- From: Richard Sandiford <richard dot sandiford at arm dot com>
- To: Richard Biener <richard dot guenther at gmail dot com>
- Cc: David Sherwood <David dot Sherwood at arm dot com>, GCC Patches <gcc-patches at gcc dot gnu dot org>
- Date: Wed, 19 Aug 2015 13:11:45 +0100
- Subject: Re: [PING][Patch] Add support for IEEE-conformant versions of scalar fmin* and fmax*
- Authentication-results: sourceware.org; auth=none
- References: <000001d0d5b0$5da4dbb0$18ee9310$ at arm dot com> <CAFiYyc1bgWwdV4PRLBuUv3yC0X-k5gJVuyyV9V7Vrz3Lte+wZw at mail dot gmail dot com> <000001d0d8cf$2fb42770$8f1c7650$ at arm dot com> <CAFiYyc2XT+iqyRNgp+N2gWsaP-=1xVUWsuUEj+bOq_UmE_1eLw at mail dot gmail dot com> <000001d0d9a6$1efdc350$5cf949f0$ at arm dot com> <CAFiYyc3CLF8beK5GaB86Ad7623gWc9yhc8nTom-ByoaHTEMyOg at mail dot gmail dot com> <87fv3gbs36 dot fsf at e105548-lin dot cambridge dot arm dot com> <CAFiYyc1q7deng7mjnt788RwqkHuvdDCKW=uWpd=goEvTeZiK5Q at mail dot gmail dot com> <8737zfbo2j dot fsf at e105548-lin dot cambridge dot arm dot com> <CAFiYyc1any7rSNCYqEpMDqsCesPte1N=ancreby-XSFBJmJ1Tg at mail dot gmail dot com>
Richard Biener <firstname.lastname@example.org> writes:
> On Wed, Aug 19, 2015 at 11:54 AM, Richard Sandiford
> <email@example.com> wrote:
>> Richard Biener <firstname.lastname@example.org> writes:
>>> On Tue, Aug 18, 2015 at 4:15 PM, Richard Sandiford
>>> <email@example.com> wrote:
>>>> Richard Biener <firstname.lastname@example.org> writes:
>>>>> On Tue, Aug 18, 2015 at 1:07 PM, David Sherwood
>>>>> <email@example.com> wrote:
>>>>>>> On Mon, Aug 17, 2015 at 11:29 AM, David Sherwood
>>>>>>> <firstname.lastname@example.org> wrote:
>>>>>>> > Hi Richard,
>>>>>>> > Thanks for the reply. I'd chosen to add new expressions as this
>>>>>>> > seemed more
>>>>>>> > consistent with the existing MAX_EXPR and MIN_EXPR tree codes. In
>>>>>>> > addition it
>>>>>>> > would seem to provide more opportunities for optimisation than a
>>>>>>> > target-specific
>>>>>>> > builtin implementation would. I accept that optimisation
>>>>>>> > opportunities will
>>>>>>> > be more limited for strict math compilation, but that it was still
>>>>>>> > worth having
>>>>>>> > them. Also, if we did map it to builtins then the scalar
>>>>>>> > version would go
>>>>>>> > through the optabs and the vector version would go through the
>>>>>>> > target's builtin
>>>>>>> > expansion, which doesn't seem very consistent.
>>>>>>> On another note ISTR you can't associate STRICT_MIN/MAX_EXPR and thus
>>>>>>> you can't vectorize anyway? (strict IEEE behavior is about NaNs,
>>>>>> I thought for this particular case associativity wasn't an issue?
>>>>>> We're not doing any
>>>>>> reductions here, just simply performing max/min operations on each
>>>>>> pair of elements
>>>>>> in the vectors. I thought for IEEE-compliant behaviour we just need to
>>>>>> ensure that for
>>>>>> each pair of elements if one element is a NaN we return the other one.
>>>>> Hmm, true. Ok, my comment still stands - I don't see that using a
>>>>> tree code is the best thing to do here. You can add fmin/max optabs
>>>>> and special expansion of BUILT_IN_FMIN/MAX and you can use a target
>>>>> builtin for the vectorized variant.
>>>>> The reason I am pushing against a new tree code is that we'd have an
>>>>> awful lot of similar codes when pushing other flag related IL
>>>>> specialities to actual IL constructs. And we still need to find a
>>>>> consistent way to do that.
>>>> In this case though the new code is really the "native" min/max operation
>>>> for fp, rather than some weird flag-dependent behaviour. Maybe it's
>>>> a bit unfortunate that the non-strict min/max fp operation got mapped
>>>> to the generic MIN_EXPR and MAX_EXPR when the non-strict version is really
>>>> the flag-related modification. The STRICT_* prefix is forced by that and
>>>> might make it seem like more of a special case than it really is.
>>> In some sense. But the "strict" version already has a builtin (just no
>>> special expander in builtins.c). We usually don't add 1:1 tree codes
>>> for existing builtins (why have builtins at all then?).
>> We still need the builtin to match the C function (and to allow direct
>> calls to __builtin_fmin, etc., which are occasionally useful).
>>>> If you're still not convinced, how about an internal function instead
>>>> of a built-in function, so that we can continue to use optabs for all
>>>> cases? I'd really like to avoid forcing such a generic concept down to
>>>> target-specific builtins with target-specific expansion code, especially
>>>> when the same concept is exposed by target-independent code for scalars.
>>> The target builtin is for the vectorized variant - not all targets might have
>>> that and we'd need to query the target about this. So using a IFN would
>>> mean adding a target hook for that query.
>> No, the idea is that if we have a tree code or an internal function, the
>> decision about whether we have target support is based on a query of the
>> optabs (just like it is for scalar, and for other vectorisable tree codes).
>> No new hooks are needed.
>> The patch checked for target support that way.
> Fair enough. Still this means we should have tree codes for all builtins
> that eventually are vectorized? So why don't we have SIN_EXPR,
> POW_EXPR (ok, I did argue and have patches for that in the past),
> RINT_EXPR, SQRT_EXPR, etc?
Yeah, it doesn't sound so bad to me :-) The choice of what's a function
in C and what's inherent is pretty arbitrary. E.g. % on doubles could
have implemented fmod() or remainder(). Casts from double to int could
have used the current rounding mode, but instead they truncate and
conversions using the rounding mode need to go through something like
(l)lrint(). Like you say, pow() could have been an operator (and is in
many languages), but instead it's a function.
> This patch starts to go down that route which is why I ask for the
> whole picture to be considered and hinted at the alternative implementation
> which follows existing practice. Add a expander in builtins.c, add an optab,
> and eventual support to vectorized_function.
> See for example ix86_builtin_vectorized_function which handles
> sqrt, floor, ceil, etc. and even FMA (we only fold FMA to FMA_EXPR
> if the target supports it for the scalar mode, so not sure if there is
> any x86 ISA where it has vectorized FMA but not scalar FMA).
Yeah. TBH I'm really against doing that unless (a) there's good reason
to believe that the concept really is specific to one target and
wouldn't be implemented on others or (b) there really is a function
rather than an instruction underneath (usually the case for sin, etc.).
But (b) could also be handled by the optab support library mechanism.
Reasons against using target-specific builtins for operations that
have direct support in the ISA:
1. Like you say, in practice vector ops only tend to be supported if the
associated scalar op is also supported. Sticking to this approach
means that vector ops follow a different path from scalar ops whereas
(for example) division follows the same path for both. It just seems
confusing to have some floating-point optabs that support both scalar
and vector operands and others that only support scalar operands.
2. Once converted to a target-specific function, the target-independent
code has no idea what the function does or how expensive it is.
We might start out with just one hook to convert a scalar operation
to a target-dependent built-in function, but I bet over time we'll
grow other hooks to query properties about the function, such as
3. builtin_vectorized_function returns a decl rather than a call.
If the target's vector API doesn't already have a built-in for the
operation we need, with the exact types and arguments that we expect,
the target needs to define one, presumably marked so that it isn't
callable by input code.
E.g. on targets where FP conversion instructions allow an explicit
rounding mode to be specified as an operand, it's reasonable for a
target's vector API to expose that operand as a constant argument to
the API function. There'd then be one API function for all vector-
float-to-vector-float integer rounding operations, rather than one
for vector rint(), one for vector ceil(), etc. (I'm thinking of
System z instructions here, although I don't know offhand what the
vector API is there.) IMO it doesn't make sense to force the target
to define "fake" built-in functions for all those possibilities
purely for the sake of the target hook. It's a lot of extra code,
and it's extra code that would be duplicated on any target that needs
to do this.
IMO optabs are the best way for the target to tell the target-independent
code what it can do. If it supports sqrt on df it defines sqrtdf and
if it supports vector sqrt on v2df it defines sqrtv2df. These patterns
will often be a single define_expand or define_insn template -- the
vectorness often comes "for free" in terms of writing the pattern.
>>> > TBH though I'm not sure why an internal_fn value (or a target-specific
>>> > builtin enum value) is worse than a tree-code value, unless the limit
>>> > of the tree_code bitfield is in sight (maybe it is).
>>> I think tree_code is 64bits now.
>> Even better :-)
> I'm not against adding a corresponding tree code for all math builtin functions,
> we just have to decide whether this is the way to go (and of course support
> expanding those back to libcalls to libc/m rather than libgcc). There are
> also constraints on what kind of STRICT_FMIN_EXPR the compiler may
> generate as the target may not be able to expand the long double variant
> directly but needs a libcall but libm might not be linked or may not
> have support
> for it. That would be a new thing compared to libgcc providing a fallback
> for all other tree codes.
True, but that doesn't seem too bad. The constraints would be the same
if we're operating on built-in functions rather than codes. I suppose
built-in functions make this more explicit, but at the end of the day
it's a costing decision. We should no more be converting a cheap
operation into an expensive libgcc function than converting a cheap
operation into an expensive libm function, even if the libgcc conversion
There's certainly precedent for introducing calls to things that libgcc
doesn't define. E.g. we already introduce calls to memcpy in things
like loop distribution, even though we don't provide a fallback memcpy