[Bug rtl-optimization/89155] Suboptimal code generation for SSE intrinsics based rsqrt
glisse at gcc dot gnu.org
gcc-bugzilla@gcc.gnu.org
Fri Feb 1 22:34:00 GMT 2019
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89155
--- Comment #1 from Marc Glisse <glisse at gcc dot gnu.org> ---
Trying 7 -> 9:
    7: r87:V4SF=vec_merge(unspec[r86:V4SF] 45,r86:V4SF,0x1)
      REG_DEAD r86:V4SF
    9: r88:SF=vec_select(r87:V4SF,parallel)
      REG_DEAD r87:V4SF
Failed to match this instruction:
(set (reg:SF 88)
    (vec_select:SF (unspec:V4SF [
                (reg:V4SF 86)
            ] UNSPEC_RSQRT)
        (parallel [
                (const_int 0 [0])
            ])))
GCC doesn't know that UNSPEC_RSQRT acts element-wise, so it doesn't know that
it could swap the unspec and the vec_select to get something like
(unspec:SF [ (vec_select:SF (reg:V4SF 86) (parallel [ (const_int 0) ])) ]
    UNSPEC_RSQRT)
which, if we split it, might then be able to use "*rsqrtsf2_sse", and the
vec_select would combine nicely with the vec_merges.
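
To make the element-wise argument concrete, here is a small portable C model of
the two forms. It is only a sketch and not GCC code: the names v4sf,
rsqrt_elem, rsqrt_v4sf, vec_merge, vec_select0, insn7_then_insn9, swapped_form
and check_swap are all hypothetical, and rsqrt_elem is a stand-in (bit-trick
estimate plus one Newton step) that does not reproduce the value rsqrtss
returns. The only property it relies on is the one GCC is missing, namely that
the operation treats each element independently.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Portable stand-in for a 4 x float vector (V4SF in the dump). */
typedef struct { float e[4]; } v4sf;

/* Hypothetical stand-in for UNSPEC_RSQRT on a single element.
   Assumption: only the element-wise shape matters, not the exact value. */
static float rsqrt_elem(float x)
{
    uint32_t bits;
    float y;
    memcpy(&bits, &x, sizeof bits);
    bits = 0x5f3759dfu - (bits >> 1);      /* initial estimate */
    memcpy(&y, &bits, sizeof y);
    return y * (1.5f - 0.5f * x * y * y);  /* one Newton step */
}

/* (unspec:V4SF [v] UNSPEC_RSQRT), modeled as acting on each element. */
static v4sf rsqrt_v4sf(v4sf v)
{
    v4sf r;
    for (int i = 0; i < 4; i++)
        r.e[i] = rsqrt_elem(v.e[i]);
    return r;
}

/* (vec_merge a b mask): element i comes from a if bit i of mask is set,
   otherwise from b. */
static v4sf vec_merge(v4sf a, v4sf b, unsigned mask)
{
    v4sf r;
    for (int i = 0; i < 4; i++)
        r.e[i] = ((mask >> i) & 1u) ? a.e[i] : b.e[i];
    return r;
}

/* (vec_select:SF v (parallel [(const_int 0)])) */
static float vec_select0(v4sf v)
{
    return v.e[0];
}

/* Insns 7 and 9 as they appear in the dump. */
static float insn7_then_insn9(v4sf r86)
{
    v4sf r87 = vec_merge(rsqrt_v4sf(r86), r86, 0x1);
    return vec_select0(r87);
}

/* The swapped form: select element 0 first, then apply the scalar op. */
static float swapped_form(v4sf r86)
{
    return rsqrt_elem(vec_select0(r86));
}

/* Both forms agree for an element-wise operation (non-NaN results). */
static void check_swap(v4sf r86)
{
    assert(insn7_then_insn9(r86) == swapped_form(r86));
}
```

Calling check_swap on any vector with a positive first element should pass in
this model. Note that elements 1 to 3 never influence the result, which is why
the scalar form would be enough here.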