This is the mail archive of the
gcc-help@gcc.gnu.org
mailing list for the GCC project.
Why does GCC store XMM registers into RAM then load them back instead of using them directly?
- From: Liu Hao <ltpmouse at gmail dot com>
- To: "gcc-help at gcc dot gnu dot org" <gcc-help at gcc dot gnu dot org>
- Date: Tue, 2 May 2017 14:15:33 +0800
- Subject: Why does GCC store XMM registers into RAM then load them back instead of using them directly?
The behavior can be observed in the following example
(for your reference: https://godbolt.org/g/toFOVc ):
```c++
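#include <emmintrin.h>

// Variant 1: SSE2 intrinsics.
double my_fmax_1(double x, double y){
    return _mm_cvtsd_f64(_mm_max_sd(_mm_set_sd(x), _mm_set_sd(y)));
}

// Variant 2: inline assembly.
double my_fmax_2(double x, double y){
    double r;
    __asm__ (
        // Operand placeholders instead of hard-coded registers: the "x"
        // constraint lets the compiler pick any XMM register.
        "maxsd %2, %0"
        : "=x"(r)
        : "0"(x), "x"(y)
    );
    return r;
}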
```
After being compiled with `-O3`, this snippet results in the following
assembly:
```assembly
my_fmax_1(double, double):
movsd %xmm0, -24(%rsp)
movsd %xmm1, -16(%rsp)
movsd -24(%rsp), %xmm0
movsd -16(%rsp), %xmm1
maxsd %xmm1, %xmm0
ret
my_fmax_2(double, double):
maxsd %xmm1, %xmm0
ret
```
The first function seems very inefficient. Is there any particular
reason why GCC doesn't optimize it as well as the second function?
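For comparison, here is a minimal sketch of what both functions are meant to compute, written without intrinsics or inline assembly. The names `my_fmax_intrin`, `my_fmax_plain` and `variants_agree` are hypothetical and only serve this illustration. It assumes an x86 target with SSE2, and it makes no claim about what code GCC generates for the plain version; that depends on the compiler version and flags and would need to be checked separately.

```c++
#include <emmintrin.h>  // SSE2 intrinsics; assumes an x86 target with SSE2

// Same body as my_fmax_1 from the example: MAXSD via intrinsics.
double my_fmax_intrin(double x, double y) {
    return _mm_cvtsd_f64(_mm_max_sd(_mm_set_sd(x), _mm_set_sd(y)));
}

// Hypothetical plain C++ reference (not from the original post).
// MAXSD yields its second source operand unless the first is strictly
// greater, which is what this conditional expresses.
double my_fmax_plain(double x, double y) {
    return x > y ? x : y;
}

// Returns true when both variants agree on a few ordinary inputs.
bool variants_agree() {
    const double samples[][2] = {
        {1.0, 2.0}, {2.0, 1.0}, {-3.5, -3.25}, {0.0, 7.0}
    };
    for (const auto& s : samples) {
        if (my_fmax_intrin(s[0], s[1]) != my_fmax_plain(s[0], s[1])) {
            return false;
        }
    }
    return true;
}
```

Calling `variants_agree()` from a small test program is one way to confirm that any rewrite of `my_fmax_1` keeps the same results before comparing the generated assembly.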
--
Best regards,
ltpmouse