This is the mail archive of the
gcc-bugs@gcc.gnu.org
mailing list for the GCC project.
[Bug middle-end/39840] Non-optimal (or wrong) implementation of SSE intrinsics
- From: "hjl dot tools at gmail dot com" <gcc-bugzilla at gcc dot gnu dot org>
- To: gcc-bugs at gcc dot gnu dot org
- Date: 21 Apr 2009 20:34:10 -0000
- Subject: [Bug middle-end/39840] Non-optimal (or wrong) implementation of SSE intrinsics
- References: <bug-39840-700@http.gcc.gnu.org/bugzilla/>
- Reply-to: gcc-bugzilla at gcc dot gnu dot org
------- Comment #5 from hjl dot tools at gmail dot com 2009-04-21 20:34 -------
Created an attachment (id=17667)
--> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=17667&action=view)
An example
I am enclosing a modified example that can be compiled with both
icc and gcc. I have also included the assembly code generated by
"icc -O2" and "gcc -mavx -O2". Icc generates:
  54: c5 ff 7c c8          vhaddps %ymm0,%ymm0,%ymm1
  58: c5 f7 7c d1          vhaddps %ymm1,%ymm1,%ymm2
  5c: c5 ef 7c da          vhaddps %ymm2,%ymm2,%ymm3
  60: c5 fc 29 5c 24 e0    vmovaps %ymm3,-0x20(%rsp)
  66: f3 0f 10 44 24 e0    movss  -0x20(%rsp),%xmm0
for
if (has_avx ())
{
...
}
There is
f3 0f 10 44 24 e0 movss -0x20(%rsp),%xmm0
although this code will only run on AVX targets. Since we don't
support basic block optimization, I don't see how we can avoid
SSE instructions in the AVX code path. The best option I can think
of is function-level optimization. But as we all know, function-level
optimization isn't usable, at least in this context. I think we
should go back and take another look at function-level
optimization. We should do it right this time. I have some
ideas in
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37565
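To illustrate what function-level optimization could look like here,
the following is a minimal sketch, not the code from the attachment.
It assumes an x86-64 GCC recent enough to accept AVX intrinsics inside
a function marked with the target attribute, and it uses
__builtin_cpu_supports as a stand-in for has_avx (). The names
hsum8_scalar, hsum8_avx and hsum8 are hypothetical. The point is that
only the AVX function is compiled with AVX enabled, so the compiler is
free to use VEX encodings for everything in it, including the final
scalar extract, while the rest of the file keeps the default options.

```c
#include <immintrin.h>

/* Scalar fallback, compiled with the file's default target options. */
static float hsum8_scalar(const float *v)
{
    float s = 0.0f;
    for (int i = 0; i < 8; i++)
        s += v[i];
    return s;
}

/* AVX path: the target attribute enables AVX for this function only
   (assumption: the compiler supports per-function target options). */
__attribute__((target("avx")))
static float hsum8_avx(const float *v)
{
    __m256 x  = _mm256_loadu_ps(v);            /* load 8 floats */
    __m128 lo = _mm256_castps256_ps128(x);     /* elements 0..3 */
    __m128 hi = _mm256_extractf128_ps(x, 1);   /* elements 4..7 */
    __m128 s  = _mm_add_ps(lo, hi);            /* 4 partial sums */
    s = _mm_hadd_ps(s, s);                     /* 2 distinct partial sums */
    s = _mm_hadd_ps(s, s);                     /* total in every element */
    return _mm_cvtss_f32(s);                   /* extract element 0 */
}

/* Hypothetical stand-in for the has_avx () check in the example. */
static int has_avx(void)
{
    return __builtin_cpu_supports("avx");
}

/* Runtime dispatch between the two versions. */
float hsum8(const float *v)
{
    return has_avx() ? hsum8_avx(v) : hsum8_scalar(v);
}
```

Calling hsum8 on the eight values 1.0f through 8.0f returns 36.0f on
either path. Whether the AVX function ends up free of legacy SSE
encodings still depends on the compiler, so the generated assembly
should be checked rather than assumed.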
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39840