[Bug rtl-optimization/89155] Suboptimal code generation for SSE intrinsics based rsqrt

glisse at gcc dot gnu.org gcc-bugzilla@gcc.gnu.org
Fri Feb 1 22:34:00 GMT 2019


https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89155

--- Comment #1 from Marc Glisse <glisse at gcc dot gnu.org> ---
Trying 7 -> 9:
    7: r87:V4SF=vec_merge(unspec[r86:V4SF] 45,r86:V4SF,0x1)
      REG_DEAD r86:V4SF
    9: r88:SF=vec_select(r87:V4SF,parallel)
      REG_DEAD r87:V4SF
Failed to match this instruction:
(set (reg:SF 88)
    (vec_select:SF (unspec:V4SF [
                (reg:V4SF 86)
            ] UNSPEC_RSQRT)
        (parallel [
                (const_int 0 [0])
            ])))

GCC doesn't know that UNSPEC_RSQRT acts element-wise. If it did, it could swap
the unspec and the vec_select to get something like
(unspec:SF [ (vec_select:SF (reg:V4SF 86) (parallel [ (const_int 0) ] )) ]
UNSPEC_RSQRT)
which, once split, might be able to match "*rsqrtsf2_sse", and the
vec_select would then combine nicely with the vec_merges.
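
To illustrate why the swap is safe, here is a rough plain-C model of insns 7
and 9, not GCC code. Everything in it is an assumption made for illustration:
the v4sf struct, the helper names, and op_scalar, which stands in for
UNSPEC_RSQRT (any lane-wise function works for the argument; halving is used
only so that the results are exact).

```c
/* Hypothetical stand-in for a V4SF register: four float lanes. */
typedef struct { float e[4]; } v4sf;

/* Stand-in for UNSPEC_RSQRT applied to one lane (an assumption, not rsqrt). */
static float op_scalar(float x) { return x * 0.5f; }

/* The same operation applied to every lane, i.e. the V4SF unspec. */
static v4sf op_vector(v4sf v)
{
    v4sf r;
    for (int i = 0; i < 4; i++)
        r.e[i] = op_scalar(v.e[i]);
    return r;
}

/* RTL vec_merge: lane i comes from a if bit i of mask is set, else from b. */
static v4sf vec_merge(v4sf a, v4sf b, unsigned mask)
{
    v4sf r;
    for (int i = 0; i < 4; i++)
        r.e[i] = ((mask >> i) & 1u) ? a.e[i] : b.e[i];
    return r;
}

/* RTL vec_select with (parallel [(const_int 0)]): read lane 0. */
static float vec_select0(v4sf v) { return v.e[0]; }

/* Insns 7 and 9 as written: vector op, merge with mask 0x1, select lane 0. */
static float before_swap(v4sf r86)
{
    return vec_select0(vec_merge(op_vector(r86), r86, 0x1));
}

/* Proposed form: select lane 0 first, then apply the scalar op. */
static float after_swap(v4sf r86)
{
    return op_scalar(vec_select0(r86));
}
```

Because the operation never mixes lanes, before_swap and after_swap agree for
every input, which is the property combine would need to know about
UNSPEC_RSQRT.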

