This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
Re: [PING][Patch] Add support for IEEE-conformant versions of scalar fmin* and fmax*
- From: Richard Biener <richard dot guenther at gmail dot com>
- To: David Sherwood <david dot sherwood at arm dot com>
- Cc: GCC Patches <gcc-patches at gcc dot gnu dot org>, Richard Sandiford <Richard dot Sandiford at arm dot com>
- Date: Wed, 25 Nov 2015 13:38:33 +0100
- Subject: Re: [PING][Patch] Add support for IEEE-conformant versions of scalar fmin* and fmax*
On Mon, Nov 23, 2015 at 10:21 AM, David Sherwood <david.sherwood@arm.com> wrote:
> Hi,
>
> This is part 1 of a reworked version of a patch I originally submitted in
> August, rebased after Richard Sandiford's recent work on the internal
> functions. This first patch adds the internal function definitions and optabs
> that provide support for IEEE fmax()/fmin() functions.
>
> Later patches will add the appropriate aarch64/aarch32 vector instructions.
Ok.
Thanks,
Richard.
> Tested:
>
> x86_64-linux: no regressions
> aarch64-none-elf: no regressions
> arm-none-eabi: no regressions
>
> Regards,
> David Sherwood.
>
> ChangeLog:
>
> 2015-11-19 David Sherwood <david.sherwood@arm.com>
>
> gcc/
> * optabs.def: Add new optabs fmax_optab/fmin_optab.
> * internal-fn.def: Add new fmax/fmin internal functions.
> * config/aarch64/aarch64.md: New pattern.
> * config/aarch64/aarch64-simd.md: Likewise.
> * config/aarch64/iterators.md: New unspecs, iterators.
> * config/arm/iterators.md: New iterators.
> * config/arm/unspecs.md: New unspecs.
> * config/arm/neon.md: New pattern.
> * config/arm/vfp.md: Likewise.
> * doc/md.texi: Add fmin and fmax patterns.
> gcc/testsuite/
> * gcc.target/aarch64/fmaxmin.c: New test.
> * gcc.target/arm/fmaxmin.c: New test.
>
>
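For readers following the ChangeLog above, the new entries it describes take roughly the following shape. This is a sketch in the style of GCC's optabs.def and internal-fn.def; the exact macro arguments in the committed patch may differ:

```
; optabs.def -- sketch of the two new optabs
OPTAB_D (fmax_optab, "fmax$a3")
OPTAB_D (fmin_optab, "fmin$a3")

; internal-fn.def -- IEEE min/max internal functions wired to those optabs
DEF_INTERNAL_FLT_FN (FMAX, ECF_CONST, fmax, binary)
DEF_INTERNAL_FLT_FN (FMIN, ECF_CONST, fmin, binary)
```

With these in place, calls to the fmax/fmin built-ins can be expanded through the optab whenever the target provides a matching `fmaxM3`/`fminM3` pattern.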
>> -----Original Message-----
>> From: Richard Biener [mailto:richard.guenther@gmail.com]
>> Sent: 19 August 2015 13:35
>> To: Richard Biener; David Sherwood; GCC Patches; Richard Sandiford
>> Subject: Re: [PING][Patch] Add support for IEEE-conformant versions of scalar fmin* and fmax*
>>
>> On Wed, Aug 19, 2015 at 2:11 PM, Richard Sandiford
>> <richard.sandiford@arm.com> wrote:
>> > Richard Biener <richard.guenther@gmail.com> writes:
>> >> On Wed, Aug 19, 2015 at 11:54 AM, Richard Sandiford
>> >> <richard.sandiford@arm.com> wrote:
>> >>> Richard Biener <richard.guenther@gmail.com> writes:
>> >>>> On Tue, Aug 18, 2015 at 4:15 PM, Richard Sandiford
>> >>>> <richard.sandiford@arm.com> wrote:
>> >>>>> Richard Biener <richard.guenther@gmail.com> writes:
>> >>>>>> On Tue, Aug 18, 2015 at 1:07 PM, David Sherwood
>> >>>>>> <david.sherwood@arm.com> wrote:
>> >>>>>>>> On Mon, Aug 17, 2015 at 11:29 AM, David Sherwood
>> >>>>>>>> <david.sherwood@arm.com> wrote:
>> >>>>>>>> > Hi Richard,
>> >>>>>>>> >
>> >>>>>>>> > Thanks for the reply. I'd chosen to add new expressions as this
>> >>>>>>>> > seemed more consistent with the existing MAX_EXPR and MIN_EXPR
>> >>>>>>>> > tree codes. In addition it would seem to provide more
>> >>>>>>>> > opportunities for optimisation than a target-specific builtin
>> >>>>>>>> > implementation would. I accept that optimisation opportunities
>> >>>>>>>> > will be more limited for strict math compilation, but I felt they
>> >>>>>>>> > were still worth having. Also, if we did map it to builtins then
>> >>>>>>>> > the scalar version would go through the optabs and the vector
>> >>>>>>>> > version would go through the target's builtin expansion, which
>> >>>>>>>> > doesn't seem very consistent.
>> >>>>>>>>
>> >>>>>>>> On another note ISTR you can't associate STRICT_MIN/MAX_EXPR and thus
>> >>>>>>>> you can't vectorize anyway? (strict IEEE behavior is about NaNs,
>> >>>>>>>> correct?)
>> >>>>>>> I thought for this particular case associativity wasn't an issue?
>> >>>>>>> We're not doing any reductions here, simply performing max/min
>> >>>>>>> operations on each pair of elements in the vectors. I thought for
>> >>>>>>> IEEE-compliant behaviour we just need to ensure that, for each pair
>> >>>>>>> of elements, if one element is a NaN we return the other one.
>> >>>>>>
>> >>>>>> Hmm, true. Ok, my comment still stands - I don't see that using a
>> >>>>>> tree code is the best thing to do here. You can add fmin/max optabs
>> >>>>>> and special expansion of BUILT_IN_FMIN/MAX and you can use a target
>> >>>>>> builtin for the vectorized variant.
>> >>>>>>
>> >>>>>> The reason I am pushing against a new tree code is that we'd have an
>> >>>>>> awful lot of similar codes when pushing other flag related IL
>> >>>>>> specialities to actual IL constructs. And we still need to find a
>> >>>>>> consistent way to do that.
>> >>>>>
>> >>>>> In this case though the new code is really the "native" min/max operation
>> >>>>> for fp, rather than some weird flag-dependent behaviour. Maybe it's
>> >>>>> a bit unfortunate that the non-strict min/max fp operation got mapped
>> >>>>> to the generic MIN_EXPR and MAX_EXPR when the non-strict version is really
>> >>>>> the flag-related modification. The STRICT_* prefix is forced by that and
>> >>>>> might make it seem like more of a special case than it really is.
>> >>>>
>> >>>> In some sense. But the "strict" version already has a builtin (just no
>> >>>> special expander in builtins.c). We usually don't add 1:1 tree codes
>> >>>> for existing builtins (why have builtins at all then?).
>> >>>
>> >>> We still need the builtin to match the C function (and to allow direct
>> >>> calls to __builtin_fmin, etc., which are occasionally useful).
>> >>>
>> >>>>> If you're still not convinced, how about an internal function instead
>> >>>>> of a built-in function, so that we can continue to use optabs for all
>> >>>>> cases? I'd really like to avoid forcing such a generic concept down to
>> >>>>> target-specific builtins with target-specific expansion code, especially
>> >>>>> when the same concept is exposed by target-independent code for scalars.
>> >>>>
>> >>>> The target builtin is for the vectorized variant - not all targets might have
>> >>>> that and we'd need to query the target about this. So using a IFN would
>> >>>> mean adding a target hook for that query.
>> >>>
>> >>> No, the idea is that if we have a tree code or an internal function, the
>> >>> decision about whether we have target support is based on a query of the
>> >>> optabs (just like it is for scalar, and for other vectorisable tree codes).
>> >>> No new hooks are needed.
>> >>>
>> >>> The patch checked for target support that way.
>> >>
>> >> Fair enough. Still this means we should have tree codes for all builtins
>> >> that eventually are vectorized? So why don't we have SIN_EXPR,
>> >> POW_EXPR (ok, I did argue and have patches for that in the past),
>> >> RINT_EXPR, SQRT_EXPR, etc?
>> >
>> > Yeah, it doesn't sound so bad to me :-) The choice of what's a function
>> > in C and what's inherent is pretty arbitrary. E.g. % on doubles could
>> > have implemented fmod() or remainder(). Casts from double to int could
>> > have used the current rounding mode, but instead they truncate and
>> > conversions using the rounding mode need to go through something like
>> > (l)lrint(). Like you say, pow() could have been an operator (and is in
>> > many languages), but instead it's a function.
>> >
>> >> This patch starts to go down that route which is why I ask for the
>> >> whole picture to be considered and hinted at the alternative implementation
>> >> which follows existing practice. Add an expander in builtins.c, add an optab,
>> >> and eventual support to vectorized_function.
>> >>
>> >> See for example ix86_builtin_vectorized_function which handles
>> >> sqrt, floor, ceil, etc. and even FMA (we only fold FMA to FMA_EXPR
>> >> if the target supports it for the scalar mode, so not sure if there is
>> >> any x86 ISA where it has vectorized FMA but not scalar FMA).
>> >
>> > Yeah. TBH I'm really against doing that unless (a) there's good reason
>> > to believe that the concept really is specific to one target and
>> > wouldn't be implemented on others or (b) there really is a function
>> > rather than an instruction underneath (usually the case for sin, etc.).
>> > But (b) could also be handled by the optab support library mechanism.
>> >
>> > Reasons against using target-specific builtins for operations that
>> > have direct support in the ISA:
>> >
>> > 1. Like you say, in practice vector ops only tend to be supported if the
>> > associated scalar op is also supported. Sticking to this approach
>> > means that vector ops follow a different path from scalar ops whereas
>> > (for example) division follows the same path for both. It just seems
>> > confusing to have some floating-point optabs that support both scalar
>> > and vector operands and others that only support scalar operands.
>> >
>> > 2. Once converted to a target-specific function, the target-independent
>> > code has no idea what the function does or how expensive it is.
>> > We might start out with just one hook to convert a scalar operation
>> > to a target-dependent built-in function, but I bet over time we'll
>> > grow other hooks to query properties about the function, such as
>> > costs.
>> >
>> > 3. builtin_vectorized_function returns a decl rather than a call.
>> > If the target's vector API doesn't already have a built-in for the
>> > operation we need, with the exact types and arguments that we expect,
>> > the target needs to define one, presumably marked so that it isn't
>> > callable by input code.
>> >
>> > E.g. on targets where FP conversion instructions allow an explicit
>> > rounding mode to be specified as an operand, it's reasonable for a
>> > target's vector API to expose that operand as a constant argument to
>> > the API function. There'd then be one API function for all vector-
>> > float-to-vector-float integer rounding operations, rather than one
>> > for vector rint(), one for vector ceil(), etc. (I'm thinking of
>> > System z instructions here, although I don't know offhand what the
>> > vector API is there.) IMO it doesn't make sense to force the target
>> > to define "fake" built-in functions for all those possibilities
>> > purely for the sake of the target hook. It's a lot of extra code,
>> > and it's extra code that would be duplicated on any target that needs
>> > to do this.
>> >
>> > IMO optabs are the best way for the target to tell the target-independent
>> > code what it can do. If it supports sqrt on df it defines sqrtdf and
>> > if it supports vector sqrt on v2df it defines sqrtv2df. These patterns
>> > will often be a single define_expand or define_insn template -- the
>> > vectorness often comes "for free" in terms of writing the pattern.
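To illustrate that last point, a single define_insn template keyed off a mode iterator yields the whole family of optab patterns at once. The sketch below is loosely modelled on the AArch64 fmaxnm/fminnm vector instructions; the iterator and unspec names (`VDQF`, `FMAXMIN_UNS`, `<maxmin_uns>`) are assumed for illustration and are not the patch text:

```
;; Sketch only: <maxmin_uns> iterates over the IEEE max/min unspecs and
;; VDQF over the vector float modes, so this one template provides
;; fmaxv2sf3, fminv2df3, etc. without separate per-mode patterns.
(define_insn "<maxmin_uns><mode>3"
  [(set (match_operand:VDQF 0 "register_operand" "=w")
        (unspec:VDQF [(match_operand:VDQF 1 "register_operand" "w")
                      (match_operand:VDQF 2 "register_operand" "w")]
                     FMAXMIN_UNS))]
  "TARGET_SIMD"
  "<maxmin_uns_op>\t%0.<Vtype>, %1.<Vtype>, %2.<Vtype>"
)
```

The scalar fmaxdf3/fmindf3 patterns follow the same scheme with scalar float modes, which is what makes the scalar and vector optab queries uniform.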
>> >
>> >>>> > TBH though I'm not sure why an internal_fn value (or a target-specific
>> >>>> > builtin enum value) is worse than a tree-code value, unless the limit
>> >>>> > of the tree_code bitfield is in sight (maybe it is).
>> >>>>
>> >>>> I think tree_code is 64 bits now.
>> >>>
>> >>> Even better :-)
>> >>
>> >> Yes.
>> >>
>> >> I'm not against adding a corresponding tree code for all math builtin functions,
>> >> we just have to decide whether this is the way to go (and of course support
>> >> expanding those back to libcalls to libc/m rather than libgcc). There are
>> >> also constraints on what kind of STRICT_FMIN_EXPR the compiler may
>> >> generate: the target may not be able to expand the long double variant
>> >> directly and would need a libcall, but libm might not be linked or
>> >> might not have support for it. That would be a new thing compared to
>> >> libgcc providing a fallback for all other tree codes.
>> >
>> > True, but that doesn't seem too bad. The constraints would be the same
>> > if we're operating on built-in functions rather than codes. I suppose
>> > built-in functions make this more explicit, but at the end of the day
>> > it's a costing decision. We should no more be converting a cheap
>> > operation into an expensive libgcc function than converting a cheap
>> > operation into an expensive libm function, even if the libgcc conversion
>> > links.
>> >
>> > There's certainly precedent for introducing calls to things that libgcc
>> > doesn't define. E.g. we already introduce calls to memcpy in things
>> > like loop distribution, even though we don't provide a fallback memcpy
>> > in libgcc.
>>
>> As an additional point for many math functions we have to support errno
>> which means BUILT_IN_SQRT can be rewritten to SQRT_EXPR
>> only if -fno-math-errno is in effect. But then code has to handle
>> both variants for things like constant folding and expression combining.
>> That's very unfortunate and something we want to avoid (one reason
>> the POW_EXPR thing didn't fly when I tried). STRICT_FMIN/MAX_EXPR
>> is an example where this doesn't apply, of course (but I detest the name,
>> just use FMIN/FMAX_EXPR?). Still you'd need to handle both,
>> FMIN_EXPR and BUILT_IN_FMIN, in code doing analysis/transform.
>>
>> Richard.
>>
>>
>> > Thanks,
>> > Richard
>> >
>