[Bug tree-optimization/88713] Vectorized code slow vs. flang
rguenther at suse dot de
gcc-bugzilla@gcc.gnu.org
Wed Jan 23 14:05:00 GMT 2019
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88713
--- Comment #39 from rguenther at suse dot de <rguenther at suse dot de> ---
On Wed, 23 Jan 2019, hjl.tools at gmail dot com wrote:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88713
>
> --- Comment #38 from H.J. Lu <hjl.tools at gmail dot com> ---
> (In reply to rguenther@suse.de from comment #37)
> > On Wed, 23 Jan 2019, hjl.tools at gmail dot com wrote:
> >
> > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88713
> > >
> > > --- Comment #36 from H.J. Lu <hjl.tools at gmail dot com> ---
> > > (In reply to Richard Biener from comment #34)
> > > > GCC definitely fails to see the FMA use as opportunity in
> > > > ix86_emit_swsqrtsf, the a == 0 checking is because of the missing
> > > > expander w/o avx512er where we could still use the NR sequence
> > > > with the other instruction. HJ?
> > >
> > > Like this?
> >
> > Yes. The lack of an expander for the rsqrt operation is probably
> > more severe though (causing sqrt + approx recip to appear)
> >
>
> Can we use UNSPEC_RSQRT14 here if UNSPEC_RSQRT28 isn't available?

I think we can, but we lack an expander for this. IIRC the RTL pattern
of the following existing expander is ignored, so we could simply
replace the TARGET_AVX512ER check with TARGET_AVX512F?

(define_expand "rsqrtv16sf2"
  [(set (match_operand:V16SF 0 "register_operand")
        (unspec:V16SF
          [(match_operand:V16SF 1 "vector_operand")]
          UNSPEC_RSQRT28))]
  "TARGET_SSE_MATH && TARGET_AVX512ER"
{
  ix86_emit_swsqrtsf (operands[0], operands[1], V16SFmode, true);
  DONE;
})