This is the mail archive of the mailing list for the GCC project.

Why does GCC store XMM registers into RAM then load them back instead of using them directly?

This can be observed from the following example:
(For your reference: )

#include <emmintrin.h>

double my_fmax_1(double x, double y){
    return _mm_cvtsd_f64(_mm_max_sd(_mm_set_sd(x), _mm_set_sd(y)));
}

double my_fmax_2(double x, double y){
    double r;
    __asm__ (
        "maxsd   %%xmm1, %%xmm0"
        : "=x"(r)
        : "0"(x), "x"(y)
    );
    return r;
}

After being compiled with `-O3`, this snippet results in the following assembly:

my_fmax_1(double, double):
        movsd   %xmm0, -24(%rsp)
        movsd   %xmm1, -16(%rsp)
        movsd   -24(%rsp), %xmm0
        movsd   -16(%rsp), %xmm1
        maxsd   %xmm1, %xmm0
        ret
my_fmax_2(double, double):
        maxsd   %xmm1, %xmm0
        ret

The first function seems very inefficient. Is there any particular reason why GCC doesn't optimize it as well as the second function?
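
As a side note, the inline-asm variant above names %xmm0 and %xmm1 directly, which is only correct if the compiler happens to assign exactly those registers to the operands. A minimal sketch of a more robust form, assuming an x86-64 target with SSE2 and GCC-style extended asm, refers to the operands as %0 and %2 instead. The name my_fmax_3 is made up for this illustration:

```c
/* Hypothetical variant of my_fmax_2 (the name my_fmax_3 is an assumption).
   The operands are referenced by number, so the compiler may pick any
   SSE registers for them. */
double my_fmax_3(double x, double y){
    double r;
    __asm__ (
        "maxsd   %2, %0"    /* AT&T order: %0 = max(%0, %2) for non-NaN inputs */
        : "=x"(r)           /* %0: result, any SSE register */
        : "0"(x), "x"(y)    /* %1: x, tied to %0; %2: y, any SSE register */
    );
    return r;
}
```

For ordinary (non-NaN) inputs this should return the same value as both functions above, for example 2.0 for my_fmax_3(1.0, 2.0).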

Best regards,
