gcc-3.4.4 casting builtin float vector to float for use in x86-64 register return convention of an inline function

Thu Mar 2 15:22:00 GMT 2006

Take a simple, suboptimal inline implementation of hadd using the SSE
builtin intrinsics:

typedef float v4sf __attribute__ ((vector_size (16)));

inline v4sf hadd(v4sf src)
{
  src = __builtin_ia32_addps(src, __builtin_ia32_movhlps(src, src));
  return __builtin_ia32_addss(src, __builtin_ia32_shufps(src, src, 0xE5));
}

This gets compiled to:

movaps  %xmm0, %xmm1
movhlps %xmm0, %xmm1
addps   %xmm1, %xmm0
movaps  %xmm0, %xmm1
shufps  $229, %xmm0, %xmm1
addss   %xmm1, %xmm0

Apparently printf("%f\n", hadd(something)) works with the argument
in %xmm0 and doesn't even convert the single-precision float to a
double.

But if you want to continue to pass the value in %xmm0 to a function
which takes a float like this one:

float foo(float a)
{
  return a;
}

With foo(hadd(something)) you get:
error: incompatible type for argument 1 of `foo'

So consider this instead:

inline float hadd(v4sf src)
{
  src = __builtin_ia32_addps(src, __builtin_ia32_movhlps(src, src));
  return (float)__builtin_ia32_addss(src, __builtin_ia32_shufps(src, src, 0xE5));
}

Oops: casting v4sf to float doesn't work, even though the value already
sits in %xmm0 and would be returned in that same register.
This essentially prevents one from writing efficient inline functions
that avoid storing the return value on the stack or elsewhere in
memory.

I really would like to know if there is a way around it.
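
One candidate, as a sketch only: read element 0 of the vector through a
union instead of casting. This relies on GCC allowing a union member to
be read after a different member was written. The helper name v4sf_low
is made up for illustration. I have not checked whether gcc-3.4.4 keeps
the value in %xmm0 here or spills it through the stack, so it may not
solve the efficiency problem.

```c
typedef float v4sf __attribute__ ((vector_size (16)));

/* Hypothetical helper (not a GCC builtin): reinterpret the vector
   through a union and return its lowest element. */
static inline float v4sf_low(v4sf v)
{
  union { v4sf v; float f[4]; } u;
  u.v = v;
  return u.f[0];
}

/* Same two reduction steps as above; only the final extraction differs. */
static inline float hadd(v4sf src)
{
  /* (a0, a1, a2, a3) + (a2, a3, a2, a3): low two lanes hold a0+a2, a1+a3 */
  src = __builtin_ia32_addps(src, __builtin_ia32_movhlps(src, src));
  /* 0xE5 puts lane 1 into lane 0, so addss leaves the full sum in lane 0 */
  return v4sf_low(__builtin_ia32_addss(src,
                                       __builtin_ia32_shufps(src, src, 0xE5)));
}

float foo(float a)
{
  return a;
}
```

With this, foo(hadd(something)) at least type-checks, since hadd now
returns a plain float.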

Jon Daniel
