[Bug tree-optimization/88713] Vectorized code slow vs. flang

rguenther at suse dot de gcc-bugzilla@gcc.gnu.org
Wed Jan 23 13:56:00 GMT 2019


https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88713

--- Comment #37 from rguenther at suse dot de <rguenther at suse dot de> ---
On Wed, 23 Jan 2019, hjl.tools at gmail dot com wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88713
> 
> --- Comment #36 from H.J. Lu <hjl.tools at gmail dot com> ---
> (In reply to Richard Biener from comment #34)
> > GCC definitely fails to see the FMA use as opportunity in
> > ix86_emit_swsqrtsf, the a == 0 checking is because of the missing
> > expander w/o avx512er where we could still use the NR sequence
> > with the other instruction.  HJ?
> 
> Like this?

Yes.  The lack of an expander for the rsqrt operation is probably
more severe though (causing sqrt + approx recip to appear).

> diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
> index e0d7c74fcec..0bbe3772ab7 100644
> --- a/gcc/config/i386/i386.c
> +++ b/gcc/config/i386/i386.c
> @@ -44855,14 +44855,22 @@ void ix86_emit_swsqrtsf (rtx res, rtx a, machine_mode mode, bool recip)
>         }
>      }
> 
> +  mthree = force_reg (mode, mthree);
> +
>    /* e0 = x0 * a */
>    emit_insn (gen_rtx_SET (e0, gen_rtx_MULT (mode, x0, a)));
> -  /* e1 = e0 * x0 */
> -  emit_insn (gen_rtx_SET (e1, gen_rtx_MULT (mode, e0, x0)));
> 
> -  /* e2 = e1 - 3. */
> -  mthree = force_reg (mode, mthree);
> -  emit_insn (gen_rtx_SET (e2, gen_rtx_PLUS (mode, e1, mthree)));
> +  if (TARGET_FMA || TARGET_AVX512F)
> +    emit_insn (gen_rtx_SET (e2,
> +                           gen_rtx_FMA (mode, e0, x0, mthree)));
> +  else
> +    {
> +      /* e1 = e0 * x0 */
> +      emit_insn (gen_rtx_SET (e1, gen_rtx_MULT (mode, e0, x0)));
> +
> +      /* e2 = e1 - 3. */
> +      emit_insn (gen_rtx_SET (e2, gen_rtx_PLUS (mode, e1, mthree)));
> +    }
> 
>    mhalf = force_reg (mode, mhalf);
>    if (recip)

