Why does GCC store XMM registers into RAM then load them back instead of using them directly?

Liu Hao ltpmouse@gmail.com
Tue May 2 06:16:00 GMT 2017


This can be observed in the following example
(see also https://godbolt.org/g/toFOVc):

```c++
#include <emmintrin.h>

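double my_fmax_1(double x, double y){
     // Build {x, 0.0} and {y, 0.0}, take the max of the low lanes,
     // then extract the low lane as a double.
     return _mm_cvtsd_f64(_mm_max_sd(_mm_set_sd(x), _mm_set_sd(y)));
}
double my_fmax_2(double x, double y){
     double r;
     // Operand placeholders instead of hard-coded %xmm0/%xmm1, which would
     // only be correct while x and y happen to sit in those registers.
     __asm__ (
         "maxsd   %2, %0"
         : "=x"(r)          // result in an SSE register
         : "0"(x), "x"(y)   // x shares r's register; y in any SSE register
     );
     return r;
}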
```
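To make the semantics of the three intrinsics in `my_fmax_1` concrete, here is a small sketch, assuming an x86-64 target with SSE2 and compilation as C++; `store_lanes` and `intrinsic_lanes_as_expected` are hypothetical helper names. It inspects both 64-bit lanes: `_mm_set_sd` has to produce a zeroed upper lane, while the asm version never touches the upper lane. That is one visible difference between the two functions; whether it explains the spill and reload is not established here.

```cpp
#include <emmintrin.h>
#include <cassert>

// Hypothetical helper: copies both 64-bit lanes of v to
// out[0] (low lane) and out[1] (high lane).
void store_lanes(__m128d v, double out[2]) {
    _mm_storeu_pd(out, v);
}

// Hypothetical helper: checks the lane layout produced by the
// intrinsics used in my_fmax_1.
bool intrinsic_lanes_as_expected() {
    double out[2];

    // _mm_set_sd(x) builds {x, 0.0}: low lane x, high lane zeroed.
    store_lanes(_mm_set_sd(3.0), out);
    if (out[0] != 3.0 || out[1] != 0.0) return false;

    // _mm_set_pd takes the HIGH lane first.
    __m128d a = _mm_set_pd(7.0, 1.0);  // low 1.0, high 7.0
    __m128d b = _mm_set_pd(9.0, 2.0);  // low 2.0, high 9.0

    // _mm_max_sd: low lane = max of the low lanes, high lane copied from a.
    store_lanes(_mm_max_sd(a, b), out);
    if (out[0] != 2.0 || out[1] != 7.0) return false;

    // _mm_cvtsd_f64 returns only the low lane.
    return _mm_cvtsd_f64(_mm_max_sd(a, b)) == 2.0;
}
```

Calling `intrinsic_lanes_as_expected()` from a test driver should return `true` on any SSE2-capable x86 target.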

When compiled with `-O3`, this snippet produces the following
assembly:

```assembly
my_fmax_1(double, double):
         movsd   %xmm0, -24(%rsp)
         movsd   %xmm1, -16(%rsp)
         movsd   -24(%rsp), %xmm0
         movsd   -16(%rsp), %xmm1
         maxsd   %xmm1, %xmm0
         ret
my_fmax_2(double, double):
         maxsd   %xmm1, %xmm0
         ret
```

The first function seems very inefficient: it stores both arguments to the
stack and immediately loads them back before the `maxsd`. Is there any
particular reason why GCC doesn't compile it as tightly as the second function?
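The comparison assumes the two functions are interchangeable. A minimal sketch to check that on ordinary inputs, assuming an x86-64 target with SSE2 and AT&T asm syntax (GCC's default): `fmax_intrin`, `fmax_asm` and `versions_agree` are hypothetical names mirroring `my_fmax_1` and `my_fmax_2`. NaNs and signed zeros are deliberately left out, since for those `maxsd` returns its second (source) operand rather than a symmetric result.

```cpp
#include <emmintrin.h>
#include <cassert>

// Hypothetical copy of my_fmax_1: intrinsic version.
double fmax_intrin(double x, double y) {
    return _mm_cvtsd_f64(_mm_max_sd(_mm_set_sd(x), _mm_set_sd(y)));
}

// Hypothetical copy of my_fmax_2: inline-asm version using operand
// placeholders (AT&T order: maxsd src, dst computes dst = max(dst, src)).
double fmax_asm(double x, double y) {
    double r;
    __asm__("maxsd %2, %0" : "=x"(r) : "0"(x), "x"(y));
    return r;
}

// Returns true if both versions agree on a few ordinary
// (non-NaN, non-zero) inputs, covering every ordered pair.
bool versions_agree() {
    const double xs[] = {1.0, -2.5, 3.0, 1e300, -1e-300};
    for (double x : xs)
        for (double y : xs)
            if (fmax_intrin(x, y) != fmax_asm(x, y)) return false;
    return true;
}
```

If `versions_agree()` holds, the difference between the two listings is purely a matter of code generation, not of semantics on these inputs.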

-- 
Best regards,
ltpmouse



More information about the Gcc-help mailing list