[Bug rtl-optimization/31485] C complex numbers, amd64 SSE, missed optimization opportunity
ubizjak at gmail dot com
gcc-bugzilla@gcc.gnu.org
Sat Aug 2 13:01:00 GMT 2008
------- Comment #4 from ubizjak at gmail dot com 2008-08-02 13:00 -------
(In reply to comment #3)
> Operations in loops should now be vectorized. The original testcase is
> probably not worth vectorizing due to calling convention problems (_Complex T
> is not passed as a vector).
Not really. For some unknown reason, _Complex float is passed as a two-element
vector in a single SSE register. This introduces a (double!) store forwarding
penalty, since we have to split the value into a pair of SSE registers before
processing. This is wrong ABI design, as shown by comparing the code generated
for the following example:
--cut here--
_Complex float testf (_Complex float a, _Complex float b)
{
  return a + b;
}

_Complex double testd (_Complex double a, _Complex double b)
{
  return a + b;
}
--cut here--
testf:
        movq    %xmm0, -8(%rsp)
        movq    %xmm1, -16(%rsp)
        movss   -8(%rsp), %xmm0
        movss   -4(%rsp), %xmm2
        addss   -16(%rsp), %xmm0
        addss   -12(%rsp), %xmm2
        movss   %xmm0, -24(%rsp)
        movss   %xmm2, -20(%rsp)
        movq    -24(%rsp), %xmm0
        ret

testd:
        addsd   %xmm3, %xmm1
        addsd   %xmm2, %xmm0
        ret
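
The split/add/repack pattern in testf can be mirrored at the C level. What
follows is only a rough sketch, assuming a 4-byte float; the helper names
split_parts and add_by_parts are made up for illustration, and the memcpy
calls merely stand in for the stack stores and reloads seen in the assembly:

```c
#include <assert.h>
#include <complex.h>
#include <string.h>

/* Assumption: float is 4 bytes, so both parts of a _Complex float fit in
   one 8-byte unit (the low half of a single SSE register on amd64). */
static_assert(sizeof(_Complex float) == 2 * sizeof(float),
              "_Complex float is laid out as two adjacent floats");

/* Hypothetical helper: copy out the real (index 0) and imaginary (index 1)
   parts. C guarantees a complex type has the layout of a two-element array
   of the corresponding real type. */
static void split_parts(_Complex float z, float out[2])
{
    memcpy(out, &z, 2 * sizeof(float));
}

/* Hypothetical helper: unpack both operands, add the parts one by one with
   scalar additions, then pack the two sums back into one 8-byte value. */
static _Complex float add_by_parts(_Complex float a, _Complex float b)
{
    float pa[2], pb[2], sum[2];
    _Complex float result;

    split_parts(a, pa);
    split_parts(b, pb);
    sum[0] = pa[0] + pb[0];   /* real parts */
    sum[1] = pa[1] + pb[1];   /* imaginary parts */
    memcpy(&result, sum, sizeof result);
    return result;
}
```

Calling add_by_parts(a, b) should give the same value as a + b. In testd no
such unpacking step is needed, since each part already arrives in its own
register (%xmm0 to %xmm3).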
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31485