[ARM] PR66791: Gate comparison in vca intrinsics on __FAST_MATH__

Wed Jun 30 09:05:19 GMT 2021

On Wed, 30 Jun 2021 at 14:00, Kyrylo Tkachov <Kyrylo.Tkachov@arm.com> wrote:
>
>
>
> > -----Original Message-----
> > From: Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org>
> > Sent: 29 June 2021 08:21
> > To: gcc Patches <gcc-patches@gcc.gnu.org>; Kyrylo Tkachov
> > <Kyrylo.Tkachov@arm.com>
> > Subject: Re: [ARM] PR66791: Gate comparison in vca intrinsics on
> > __FAST_MATH__
> >
> > On Tue, 22 Jun 2021 at 15:04, Prathamesh Kulkarni
> > <prathamesh.kulkarni@linaro.org> wrote:
> > >
> > > Hi,
> > > The attached patch gates abs(__a) cmp abs(__b) for the vca intrinsics on
> > > __FAST_MATH__. I moved the vabs intrinsics before vcage_f32 since the vca
> > > intrinsics use them.
> > > Bootstrapped+tested on arm-linux-gnueabihf.
> > > OK to commit?
> > ping https://gcc.gnu.org/pipermail/gcc-patches/2021-June/573384.html
>
> Hmm, does this result in better optimisation? I guess it's expressing the operation at a higher level, but there are now conceptually three operations (2x vabs + 1 comparison) that would need to be folded away by the optimisers...
Hi Kyrill,
That was my motivation for PR97906 ;-)
With that fix in place, c = vabs(a) >= vabs(b) now folds to a single
vacle z, b, a when __FAST_MATH__ is defined.
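To make the operand swap concrete, here is a rough portable C model of the lane semantics. It is not arm_neon.h: the types f32x2/u32x2 and the functions abs_f32x2, cage_f32x2 and cale_f32x2 are hypothetical stand-ins for float32x2_t, uint32x2_t, vabs_f32, vcage_f32 and vcale_f32. A lane of |a| >= |b| yields the same all-ones/zero mask as |b| <= |a|, which is why c = vabs(a) >= vabs(b) can be emitted as vacle z, b, a.

```c
#include <math.h>
#include <stdint.h>

/* Hypothetical portable stand-ins for float32x2_t and uint32x2_t. */
typedef struct { float val[2]; } f32x2;
typedef struct { uint32_t val[2]; } u32x2;

/* Lane-wise |x|, the value vabs_f32 would produce. */
static f32x2 abs_f32x2(f32x2 x)
{
  f32x2 r;
  for (int i = 0; i < 2; i++)
    r.val[i] = fabsf(x.val[i]);
  return r;
}

/* Models vcage_f32 (a, b): a lane is all-ones if |a| >= |b|, zero
   otherwise.  A NaN in either lane makes the comparison false.  */
static u32x2 cage_f32x2(f32x2 a, f32x2 b)
{
  f32x2 aa = abs_f32x2(a), ab = abs_f32x2(b);
  u32x2 r;
  for (int i = 0; i < 2; i++)
    r.val[i] = aa.val[i] >= ab.val[i] ? UINT32_MAX : 0u;
  return r;
}

/* Models vcale_f32 (a, b): a lane is all-ones if |a| <= |b|.
   cale_f32x2 (b, a) therefore matches cage_f32x2 (a, b).  */
static u32x2 cale_f32x2(f32x2 a, f32x2 b)
{
  f32x2 aa = abs_f32x2(a), ab = abs_f32x2(b);
  u32x2 r;
  for (int i = 0; i < 2; i++)
    r.val[i] = aa.val[i] <= ab.val[i] ? UINT32_MAX : 0u;
  return r;
}
```

For example, with a = {-3, 1} and b = {2, -4}, both cage_f32x2(a, b) and cale_f32x2(b, a) give {0xffffffff, 0}. The NaN behaviour relies on IEEE comparison semantics, so the model assumes it is built without -ffast-math.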

Thanks,
Prathamesh
>
> Thanks,
> Kyrill
> >
> > Thanks,
> > Prathamesh
> > >
> > > Thanks,
> > > Prathamesh