[Bug tree-optimization/88713] Vectorized code slow vs. flang
rguenther at suse dot de
gcc-bugzilla@gcc.gnu.org
Wed Jan 23 13:56:00 GMT 2019
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88713
--- Comment #37 from rguenther at suse dot de <rguenther at suse dot de> ---
On Wed, 23 Jan 2019, hjl.tools at gmail dot com wrote:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88713
>
> --- Comment #36 from H.J. Lu <hjl.tools at gmail dot com> ---
> (In reply to Richard Biener from comment #34)
> > GCC definitely fails to see the FMA use as opportunity in
> > ix86_emit_swsqrtsf, the a == 0 checking is because of the missing
> > expander w/o avx512er where we could still use the NR sequence
> > with the other instruction. HJ?
>
> Like this?
Yes.  The lack of an expander for the rsqrt operation is probably
more severe, though (it causes sqrt + approx recip to appear).
> diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
> index e0d7c74fcec..0bbe3772ab7 100644
> --- a/gcc/config/i386/i386.c
> +++ b/gcc/config/i386/i386.c
> @@ -44855,14 +44855,22 @@ void ix86_emit_swsqrtsf (rtx res, rtx a, machine_mode mode, bool recip)
> }
> }
>
> + mthree = force_reg (mode, mthree);
> +
> /* e0 = x0 * a */
> emit_insn (gen_rtx_SET (e0, gen_rtx_MULT (mode, x0, a)));
> - /* e1 = e0 * x0 */
> - emit_insn (gen_rtx_SET (e1, gen_rtx_MULT (mode, e0, x0)));
>
> - /* e2 = e1 - 3. */
> - mthree = force_reg (mode, mthree);
> - emit_insn (gen_rtx_SET (e2, gen_rtx_PLUS (mode, e1, mthree)));
> + if (TARGET_FMA || TARGET_AVX512F)
> + emit_insn (gen_rtx_SET (e2,
> + gen_rtx_FMA (mode, e0, x0, mthree)));
> + else
> + {
> + /* e1 = e0 * x0 */
> + emit_insn (gen_rtx_SET (e1, gen_rtx_MULT (mode, e0, x0)));
> +
> + /* e2 = e1 - 3. */
> + emit_insn (gen_rtx_SET (e2, gen_rtx_PLUS (mode, e1, mthree)));
> + }
>
> mhalf = force_reg (mode, mhalf);
> if (recip)