This is the mail archive of the
gcc-bugs@gcc.gnu.org
mailing list for the GCC project.
[Bug middle-end/29756] SSE intrinsics hard to use without redundant temporaries appearing
- From: "rguenth at gcc dot gnu.org" <gcc-bugzilla at gcc dot gnu dot org>
- To: gcc-bugs at gcc dot gnu dot org
- Date: Thu, 19 May 2016 09:52:21 +0000
- Subject: [Bug middle-end/29756] SSE intrinsics hard to use without redundant temporaries appearing
- Auto-submitted: auto-generated
- References: <bug-29756-4 at http dot gcc dot gnu dot org/bugzilla/>
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=29756
--- Comment #8 from Richard Biener <rguenth at gcc dot gnu.org> ---
So the remaining piece may be the init-regs issue. We have
vf_24 = BIT_INSERT_EXPR <vf_23(D), _26, 0 (32 bits)>;
which leaves the upper elements undefined, but init-regs forces them to zero.
Another issue is that in
_26 = BIT_FIELD_REF <v_13(D), 32, 32>;
vf_24 = BIT_INSERT_EXPR <vf_23(D), _26, 0 (32 bits)>;
_25 = __builtin_ia32_shufps (vf_24, vf_24, 0);
the shufps is not exposed to gimple optimizations and thus we can't simplify
it in any way. Only the backend knows that it could be simplified to
_25 = __builtin_ia32_shufps (v_13(D), v_13(D), 85);
so the backend might want to "expand" __builtin_ia32_shufps to a VEC_PERM_EXPR
in its target-specific builtin folding hook (making sure, obviously, that the
reverse direction works well enough).
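
To illustrate why the three statements collapse into a single shuffle with
selector 85 (0x55), here is a minimal sketch in plain C. It uses a scalar
model of shufps rather than the real builtin, so it does not depend on SSE;
the names v4sf, model_shufps, original_sequence, simplified and same are
hypothetical helpers made up for this example, not GCC internals.

```c
#include <assert.h>
#include <string.h>

/* Four packed floats, standing in for the V4SF vector type. */
typedef struct { float e[4]; } v4sf;

/* Scalar model of SHUFPS: the two low result lanes are picked from a,
   the two high result lanes from b; each 2-bit field of imm selects
   the source lane. */
static v4sf model_shufps(v4sf a, v4sf b, unsigned imm)
{
    v4sf r;
    r.e[0] = a.e[imm & 3];
    r.e[1] = a.e[(imm >> 2) & 3];
    r.e[2] = b.e[(imm >> 4) & 3];
    r.e[3] = b.e[(imm >> 6) & 3];
    return r;
}

/* The sequence from the dump. vf stands in for the undefined vf_23(D);
   whatever its lanes hold must not affect the result. */
static v4sf original_sequence(v4sf v, v4sf vf)
{
    float t = v.e[1];                 /* _26 = BIT_FIELD_REF <v, 32, 32>     */
    vf.e[0] = t;                      /* vf_24 = BIT_INSERT_EXPR <vf, _26, 0> */
    return model_shufps(vf, vf, 0);   /* _25 = shufps (vf_24, vf_24, 0)       */
}

/* The single shuffle the backend could use instead: 85 == 0x55 selects
   lane 1 in all four 2-bit fields, so lane 1 of v is broadcast. */
static v4sf simplified(v4sf v)
{
    return model_shufps(v, v, 85);
}

/* Bytewise comparison of two vectors. */
static int same(v4sf x, v4sf y)
{
    return memcmp(&x, &y, sizeof x) == 0;
}
```

Since selector 0 only ever reads lane 0 of vf_24, the upper lanes of vf_23(D)
never reach the result, which is also why zeroing them in init-regs is wasted
work here.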