This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.



[Bug middle-end/29756] SSE intrinsics hard to use without redundant temporaries appearing


https://gcc.gnu.org/bugzilla/show_bug.cgi?id=29756

--- Comment #8 from Richard Biener <rguenth at gcc dot gnu.org> ---
So the remaining piece may be the init-regs issue.  We have

  vf_24 = BIT_INSERT_EXPR <vf_23(D), _26, 0 (32 bits)>;

which leaves the upper elements undefined, but init-regs forces them to zero.
Another issue is that in

  _26 = BIT_FIELD_REF <v_13(D), 32, 32>;
  vf_24 = BIT_INSERT_EXPR <vf_23(D), _26, 0 (32 bits)>;
  _25 = __builtin_ia32_shufps (vf_24, vf_24, 0);

the shufps is not exposed to the GIMPLE optimizations, and thus we can't simplify
it in any way.  Only the backend knows that it could be simplified to

  _25 = __builtin_ia32_shufps (v_13(D), v_13(D), 85);

so the backend might want to "expand" __builtin_ia32_shufps to a VEC_PERM_EXPR
in its target specific builtin folding hook (making sure the reverse works
well enough obviously).
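
To make the mask arithmetic above concrete, here is a minimal scalar sketch
in plain C.  It is not GCC code: vec4, model_shufps, model_vec_perm and
shufps_imm_to_sel are hypothetical names made up for illustration, and the
permute model only assumes the usual two-input selector convention (0-3
index the first operand, 4-7 the second).  It checks that inserting element
1 of v into lane 0 and shuffling with mask 0 gives the same result as
shuffling v directly with mask 85, and shows the selector a fold to a
permute would have to produce.

```c
#include <assert.h>

/* Four-element float vector standing in for an SSE register. */
typedef struct { float e[4]; } vec4;

/* Scalar model of SHUFPS: the low two result elements are selected from a,
   the high two from b, using two bits of imm per element. */
static vec4 model_shufps(vec4 a, vec4 b, unsigned imm)
{
    vec4 r;
    r.e[0] = a.e[imm & 3];
    r.e[1] = a.e[(imm >> 2) & 3];
    r.e[2] = b.e[(imm >> 4) & 3];
    r.e[3] = b.e[(imm >> 6) & 3];
    return r;
}

/* Scalar model of a two-input permute (assumed convention: selector
   values 0-3 index a, values 4-7 index b). */
static vec4 model_vec_perm(vec4 a, vec4 b, const unsigned sel[4])
{
    vec4 r;
    for (int i = 0; i < 4; i++) {
        unsigned s = sel[i] & 7;
        r.e[i] = (s < 4) ? a.e[s] : b.e[s - 4];
    }
    return r;
}

/* Hypothetical helper: the permute selector equivalent to
   shufps (a, b, imm). */
static void shufps_imm_to_sel(unsigned imm, unsigned sel[4])
{
    sel[0] = imm & 3;
    sel[1] = (imm >> 2) & 3;
    sel[2] = 4 + ((imm >> 4) & 3);
    sel[3] = 4 + ((imm >> 6) & 3);
}

/* The sequence from the dump: take element 1 of v (bit offset 32),
   insert it at lane 0 of vf, then broadcast lane 0 with mask 0. */
static vec4 broadcast_as_in_dump(vec4 v, vec4 vf)
{
    vf.e[0] = v.e[1];
    return model_shufps(vf, vf, 0);
}

/* The simplified form: mask 85 is 0b01010101, so every lane picks
   element 1 of v directly. */
static vec4 broadcast_simplified(vec4 v)
{
    return model_shufps(v, v, 85);
}

static int same_vec4(vec4 x, vec4 y)
{
    for (int i = 0; i < 4; i++)
        if (x.e[i] != y.e[i])
            return 0;
    return 1;
}

/* Self-check tying the three forms together. */
static void self_check(void)
{
    vec4 v  = {{1.0f, 2.0f, 3.0f, 4.0f}};
    vec4 vf = {{9.0f, 8.0f, 7.0f, 6.0f}};   /* stands in for undefined vf_23(D) */
    unsigned sel[4];

    shufps_imm_to_sel(85, sel);
    assert(same_vec4(broadcast_as_in_dump(v, vf), broadcast_simplified(v)));
    assert(same_vec4(model_vec_perm(v, v, sel), broadcast_simplified(v)));
}
```

Calling self_check() from a main() exercises all three forms.  The initial
contents of vf do not affect the result, which is why the upper elements
being undefined is harmless here.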