GCC appears to try to pass half and quad-precision floating point numbers (_Float16 and __float128) using the integer calling convention on x86-64 MinGW. They should get passed with the floating point CC in SSE registers, like other floats, as is done by GCC x86-64 linux, Clang x86-64 linux, and Clang x86-64 MinGW.

Sample:

    void f16_ext(_Float16);
    void f16_entry(float _, _Float16 a) {
        asm("nop # marker");
        f16_ext(a);
    }

    void f32_ext(float);
    void f32_entry(float _, float a) {
        asm("nop # marker");
        f32_ext(a);
    }

    void f64_ext(double);
    void f64_entry(float _, double a) {
        asm("nop # marker");
        f64_ext(a);
    }

    void f128_ext(__float128);
    void f128_entry(float _, __float128 a) {
        asm("nop # marker");
        f128_ext(a);
    }

Incorrect output from GCC on x64 MinGW (O2):

    f16_entry:
            mov     ecx, edx
            nop # marker
            jmp     f16_ext
    f32_entry:
            movaps  xmm0, xmm1
            nop # marker
            jmp     f32_ext
    f64_entry:
            movapd  xmm0, xmm1
            nop # marker
            jmp     f64_ext
    f128_entry:
            sub     rsp, 56
            movdqa  xmm0, XMMWORD PTR [rdx]
            nop # marker
            lea     rcx, 32[rsp]
            movaps  XMMWORD PTR 32[rsp], xmm0
            call    f128_ext
            nop
            add     rsp, 56
            ret

Correct output from GCC on x64 Linux (O2):

    f16_entry:
            movaps  xmm0, xmm1
            nop # marker
            jmp     f16_ext
    f32_entry:
            movaps  xmm0, xmm1
            nop # marker
            jmp     f32_ext
    f64_entry:
            movapd  xmm0, xmm1
            nop # marker
            jmp     f64_ext
    f128_entry:
            movdqa  xmm0, xmm1
            nop # marker
            jmp     f128_ext

Correct output from Clang on both x64 Linux and x64 MinGW:

    f16_entry:                              # @f16_entry
            movaps  xmm0, xmm1
            nop # marker
            jmp     f16_ext                 # TAILCALL
    f32_entry:                              # @f32_entry
            movaps  xmm0, xmm1
            nop # marker
            jmp     f32_ext                 # TAILCALL
    f64_entry:                              # @f64_entry
            movaps  xmm0, xmm1
            nop # marker
            jmp     f64_ext                 # TAILCALL
    f128_entry:                             # @f128_entry
            movaps  xmm0, xmm1
            nop # marker
            jmp     f128_ext                # TAILCALL

Tested with GCC 13.1.

Link: https://gcc.godbolt.org/z/hdojahes5
What does MSVC do?
Does Microsoft's ABI document this case? If not, then GCC is as correct here as Clang is.
https://learn.microsoft.com/en-us/cpp/build/x64-calling-convention?view=msvc-170#parameter-passing

So it uses floating point as the type. But then it is vague on those kinds of types. GCC treats _Float16 similarly to how __m64 and __m128 are passed, that is, via integer registers.
To my knowledge, MSVC does not support or specify an ABI for 16- or 128-bit IEEE floating point types, so I do suppose that either GCC or Clang could be considered correct here. SysV doesn't say anything about f16 but does clarify that f128 should be SSE:

> Arguments of types __float128, _Decimal128 and __m128 are split into two halves. The least significant ones belong to class SSE, the most significant one to class SSEUP

Falling back to what SysV says seems reasonable to me since MSVC doesn't provide any guidance, and passing via xmm is better register use anyway.

Is there any reason not to match SysV and Clang here? One side needs to change, because the mismatch is causing problems with rt math symbols.
Looks like bugz cut off part of the sysv quote, here for reference:

> Arguments of types __float128, _Decimal128 and __m128 are split
> into two halves. The least significant ones belong to class SSE, the most
> significant one to class SSEUP.
According to: https://cs61.seas.harvard.edu/site/pdf/x86-64-abi-20210928.pdf

> Arguments of types _Float16, float, double, _Decimal32, _Decimal64 and __m64 are in class SSE.

So `_Float16` is SSE as well.
(In reply to connor horman from comment #6)
> According to: https://cs61.seas.harvard.edu/site/pdf/x86-64-abi-20210928.pdf
>
> > Arguments of types _Float16, float, double, _Decimal32, _Decimal64 and __m64 are
> > in class SSE.
>
> So `_Float16` is SSE as well.

I must have been looking at an ancient version of the psABI; it looks like _Float16 was added five years ago [1]. Thanks for the update.

I think all of this is substantial enough reasoning for GCC to be the one to change its behavior. If Microsoft winds up adding these types then it seems rather likely that they will follow the precedent of SysV and their own float/double ABI, especially if the MinGW ecosystem does the same.

This doesn't help anything here, but I found some discussion on MSVC _Float16/_Float128 at [2].

[1]: https://gitlab.com/x86-psABIs/x86-64-ABI/-/commit/71d1183e7bb95e9f8ad732e0f2b5a4f127796e2a
[2]: https://developercommunity.visualstudio.com/t/Implement-the-C23-Extended-Precision-P/10374212?viewtype=all
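For reference, the two psABI sentences quoted above can be written down as a tiny lookup. This is only a sketch of those two sentences, not code from GCC, Clang, or the psABI; the enum and function names (`arg_type`, `sysv_class`, `sysv_classify`) are hypothetical and exist only for illustration.

```c
#include <assert.h>

/* Hypothetical names for illustration; not taken from any compiler. */
enum arg_type   { T_FLOAT16, T_FLOAT, T_DOUBLE, T_FLOAT128 };
enum sysv_class { CLASS_NONE, CLASS_SSE, CLASS_SSEUP };

/* Class of the low and high eightbyte of an argument. */
struct sysv_classes {
    enum sysv_class lo;
    enum sysv_class hi;
};

/* Classify a type using only the two psABI sentences quoted in this thread. */
static struct sysv_classes sysv_classify(enum arg_type t)
{
    struct sysv_classes c = { CLASS_NONE, CLASS_NONE };

    switch (t) {
    case T_FLOAT16:
    case T_FLOAT:
    case T_DOUBLE:
        /* "_Float16, float, double ... are in class SSE" */
        c.lo = CLASS_SSE;
        break;
    case T_FLOAT128:
        /* "split into two halves": low half SSE, high half SSEUP */
        c.lo = CLASS_SSE;
        c.hi = CLASS_SSEUP;
        break;
    }
    return c;
}

/* Small self-check mirroring the quotes. */
static void sysv_check_examples(void)
{
    assert(sysv_classify(T_FLOAT16).lo == CLASS_SSE);
    assert(sysv_classify(T_FLOAT128).hi == CLASS_SSEUP);
}
```

Under this reading, every type in the sample from the original report lands in a single XMM register on SysV, which matches the Linux output shown there.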
After some more digging it sounds like passing _Float128 arguments indirectly is the correct thing to do, which is GCC's current behavior. Excerpts from the calling convention [1]:

> Integer arguments are passed in registers RCX, RDX, R8, and R9.
> Floating point arguments are passed in XMM0L, XMM1L, XMM2L, and XMM3L.
> 16-byte arguments are passed by reference.

And

> __m128 types, arrays, and strings are never passed by immediate
> value. Instead, a pointer is passed to memory allocated by the
> caller.

Additionally, varargs don't work correctly if _Float128 is passed by value (thanks lh_mouse on IRC for pointing this out). So GCC is doing the right thing for arguments, and I have a WIP patch to make LLVM agree [2].

However, the return section of the calling convention says the following:

> A scalar return value that can fit into 64 bits, including the __m64
> type, is returned through RAX. Non-scalar types including floats,
> doubles, and vector types such as __m128, __m128i, __m128d are
> returned in XMM0.

My read of that is that even though _Float128 needs to be passed indirectly, it should be returned in XMM0. GCC currently returns on the stack so I am thinking this should change. This would make _Float128 share the same calling convention as __m128.

For _Float16, it seems like following _Float32/_Float64 would be reasonable since the calling convention generally seems to indicate that floats get passed in vector registers, and it fits.

In short, the changes I would propose are:

- GCC change to return _Float128 in XMM0, continue passing on the stack
- GCC change to pass and return _Float16 in XMM
- Clang change to pass _Float128 indirectly, continue returning in XMM0

Obviously there is some extrapolation here since the type isn't officially supported, but this seems more or less consistent with existing types.

Andrew, do you have any thoughts here?
[1]: https://learn.microsoft.com/en-us/cpp/build/x64-calling-convention?view=msvc-170
[2]: https://github.com/llvm/llvm-project/pull/115052
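
To make the proposal above concrete, here is a minimal sketch of the convention it would give on x86-64 MinGW. It models the proposal only, not what any released GCC or Clang does today, and all names (`win64_type`, `win64_loc`, `win64_arg_loc`, `win64_ret_loc`) are hypothetical. An integer case is included purely for contrast.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names for illustration; this models the proposal only. */
enum win64_type { TY_INT64, TY_FLOAT16, TY_FLOAT32, TY_FLOAT64,
                  TY_FLOAT128, TY_M128 };
enum win64_loc  { LOC_GPR, LOC_XMM, LOC_BY_REFERENCE };

/* Size in bytes of each modeled type. */
static size_t win64_size(enum win64_type t)
{
    switch (t) {
    case TY_FLOAT16:  return 2;
    case TY_FLOAT32:  return 4;
    case TY_INT64:
    case TY_FLOAT64:  return 8;
    case TY_FLOAT128:
    case TY_M128:     return 16;
    }
    return 0;
}

/* Where an argument goes: "16-byte arguments are passed by reference",
   other floating point arguments go in XMM, integers in a GPR. */
static enum win64_loc win64_arg_loc(enum win64_type t)
{
    if (win64_size(t) == 16)
        return LOC_BY_REFERENCE;
    if (t == TY_INT64)
        return LOC_GPR;
    return LOC_XMM;
}

/* Where a return value goes: floats, doubles, and __m128-like types
   come back in XMM0; a 64-bit integer scalar comes back in RAX. */
static enum win64_loc win64_ret_loc(enum win64_type t)
{
    return (t == TY_INT64) ? LOC_GPR : LOC_XMM;
}

/* Small self-check of the three bullet points in the proposal. */
static void win64_check_proposal(void)
{
    assert(win64_arg_loc(TY_FLOAT128) == LOC_BY_REFERENCE);
    assert(win64_ret_loc(TY_FLOAT128) == LOC_XMM);
    assert(win64_arg_loc(TY_FLOAT16) == LOC_XMM);
    assert(win64_ret_loc(TY_FLOAT16) == LOC_XMM);
}
```

In this model _Float128 and __m128 get identical answers for both arguments and returns, which is the point of the proposal.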