Bug 115054 - __float128 and _Float16 use incorrect ABI on x86-64 MinGW
Status: UNCONFIRMED
Alias: None
Product: gcc
Classification: Unclassified
Component: target
Version: unknown
Importance: P3 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords: ABI
Depends on:
Blocks:
 
Reported: 2024-05-12 21:28 UTC by Trevor Gross
Modified: 2024-11-23 02:45 UTC

See Also:
Host:
Target: X86_64-mingw
Build:
Known to work:
Known to fail:
Last reconfirmed:


Description Trevor Gross 2024-05-12 21:28:49 UTC
GCC appears to pass half- and quad-precision floating-point numbers (_Float16 and __float128) using the integer calling convention on x86-64 MinGW. They should be passed with the floating-point calling convention in SSE registers, like other floats, as is done by GCC on x86-64 Linux, Clang on x86-64 Linux, and Clang on x86-64 MinGW.

Sample:

    void f16_ext(_Float16);
    void f16_entry(float _, _Float16 a) {
        asm("nop # marker");
        f16_ext(a);
    }

    void f32_ext(float);
    void f32_entry(float _, float a) {
        asm("nop # marker");
        f32_ext(a);
    }

    void f64_ext(double);
    void f64_entry(float _, double a) {
        asm("nop # marker");
        f64_ext(a);
    }

    void f128_ext(__float128);
    void f128_entry(float _, __float128 a) {
        asm("nop # marker");
        f128_ext(a);
    }

Incorrect output from GCC on x64 MinGW (-O2):

    f16_entry:
        mov     ecx, edx
        nop # marker
        jmp     f16_ext
    f32_entry:
        movaps  xmm0, xmm1
        nop # marker
        jmp     f32_ext
    f64_entry:
        movapd  xmm0, xmm1
        nop # marker
        jmp     f64_ext
    f128_entry:
        sub     rsp, 56
        movdqa  xmm0, XMMWORD PTR [rdx]
        nop # marker
        lea     rcx, 32[rsp]
        movaps  XMMWORD PTR 32[rsp], xmm0
        call    f128_ext
        nop
        add     rsp, 56
        ret

Correct output from GCC on x64 Linux (-O2):

    f16_entry:
        movaps  xmm0, xmm1
        nop # marker
        jmp     f16_ext
    f32_entry:
        movaps  xmm0, xmm1
        nop # marker
        jmp     f32_ext
    f64_entry:
        movapd  xmm0, xmm1
        nop # marker
        jmp     f64_ext
    f128_entry:
        movdqa  xmm0, xmm1
        nop # marker
        jmp     f128_ext

Correct output from Clang on both x64 Linux and x64 MinGW:

    f16_entry:                              # @f16_entry
        movaps  xmm0, xmm1
        nop     # marker
        jmp     f16_ext                         # TAILCALL
    f32_entry:                              # @f32_entry
        movaps  xmm0, xmm1
        nop     # marker
        jmp     f32_ext                         # TAILCALL
    f64_entry:                              # @f64_entry
        movaps  xmm0, xmm1
        nop     # marker
        jmp     f64_ext                         # TAILCALL
    f128_entry:                             # @f128_entry
        movaps  xmm0, xmm1
        nop     # marker
        jmp     f128_ext                        # TAILCALL


Tested with GCC 13.1. Link: https://gcc.godbolt.org/z/hdojahes5
Comment 1 Andrew Pinski 2024-05-12 21:35:11 UTC
What does MSVC do?
Comment 2 Andrew Pinski 2024-05-12 21:37:00 UTC
Does Microsoft's ABI document this case?

If not, then GCC is as correct here as Clang is.
Comment 3 Andrew Pinski 2024-05-12 21:42:52 UTC
https://learn.microsoft.com/en-us/cpp/build/x64-calling-convention?view=msvc-170#parameter-passing

So it refers to floating point as the type, but it is vague about these kinds of types. GCC treats _Float16 the same way __m64 and __m128 are passed, that is, via integer registers.
Comment 4 Trevor Gross 2024-05-12 22:37:52 UTC
To my knowledge, MSVC does not support or specify an ABI for 16- or 128-bit IEEE floating point types, so I do suppose that either GCC or Clang could be considered correct here.

SysV doesn't say anything about f16 but does clarify that f128 should be SSE:

> Arguments of types __float128, _Decimal128 and __m128 are split
into two halves. The least significant ones belong to class SSE, the most
significant one to class SSEUP

Falling back to what SysV says seems reasonable to me since MSVC doesn't provide any guidance, and passing via xmm is better register use anyway. Is there any reason not to match SysV and Clang here? One side needs to change, since the mismatch is causing problems with rt math symbols.
Comment 5 Trevor Gross 2024-05-12 22:43:39 UTC
Looks like bugz cut off part of the sysv quote, here for reference:

> Arguments of types __float128, _Decimal128 and __m128 are split
> into two halves. The least significant ones belong to class SSE, the most
> significant one to class SSEUP.
Comment 6 connor horman 2024-08-23 15:34:18 UTC
According to: https://cs61.seas.harvard.edu/site/pdf/x86-64-abi-20210928.pdf

> Arguments of types _Float16, float, double, _Decimal32, _Decimal64 and __m64 are
> in class SSE.

So `_Float16` is SSE as well.
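
As a rough illustration of the two psABI excerpts quoted in this thread, the classification can be modeled in a few lines of C. This is only a sketch: the enum, struct, and function names are hypothetical and do not come from GCC, Clang, or the psABI, and only the types named in the quotes are covered.

```c
#include <assert.h>

/* Hypothetical type tags for the types named in the quoted psABI text. */
enum scalar_type {
    TY_FLOAT16, TY_FLOAT, TY_DOUBLE, TY_DECIMAL32, TY_DECIMAL64, TY_M64,
    TY_FLOAT128, TY_DECIMAL128, TY_M128,
    TY_COUNT
};

enum sysv_class { CLASS_NONE, CLASS_SSE, CLASS_SSEUP };

/* Classes of the least and most significant halves of an argument. */
struct sysv_halves {
    enum sysv_class low;
    enum sysv_class high;   /* CLASS_NONE when the type is not split */
};

struct sysv_halves sysv_classify(enum scalar_type ty)
{
    /* Every type in the quoted lists has an SSE low half. */
    struct sysv_halves result = { CLASS_SSE, CLASS_NONE };

    assert((unsigned)ty < (unsigned)TY_COUNT);

    /* __float128, _Decimal128 and __m128 are split into SSE + SSEUP. */
    if (ty == TY_FLOAT128 || ty == TY_DECIMAL128 || ty == TY_M128)
        result.high = CLASS_SSEUP;
    return result;
}
```

Under this model both `sysv_classify(TY_FLOAT16)` and `sysv_classify(TY_FLOAT128)` report an SSE low half, which is why the Linux output keeps both values in xmm registers.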
Comment 7 Trevor Gross 2024-08-23 21:01:37 UTC
(In reply to connor horman from comment #6)
> According to: https://cs61.seas.harvard.edu/site/pdf/x86-64-abi-20210928.pdf
> 
> > Arguments of types _Float16, float, double, _Decimal32, _Decimal64 and __m64 are
> > in class SSE.
> 
> So `_Float16` is SSE as well.

I must have been looking at an ancient version of the psABI; it looks like _Float16 was added five years ago [1]. Thanks for the update.

I think all of this is substantial enough reasoning for GCC to be the one to change its behavior. If Microsoft winds up adding these types then it seems rather likely for them to follow the precedent of SysV and their own float/double ABI, especially if the MinGW ecosystem does the same.

Doesn't help anything here but I found some discussion on MSVC _Float16/_Float128 at [2].

[1]: https://gitlab.com/x86-psABIs/x86-64-ABI/-/commit/71d1183e7bb95e9f8ad732e0f2b5a4f127796e2a
[2]: https://developercommunity.visualstudio.com/t/Implement-the-C23-Extended-Precision-P/10374212?viewtype=all
Comment 8 Trevor Gross 2024-11-05 21:30:05 UTC
After some more digging it sounds like passing _Float128 arguments indirectly is the correct thing to do, which is GCC's current behavior. Excerpts from the calling convention [1]:

> Integer arguments are passed in registers RCX, RDX, R8, and R9.
> Floating point arguments are passed in XMM0L, XMM1L, XMM2L, and XMM3L.
> 16-byte arguments are passed by reference. 

And

> __m128 types, arrays, and strings are never passed by immediate
> value. Instead, a pointer is passed to memory allocated by the
> caller.

Additionally, varargs don't work correctly if _Float128 is passed by value (thanks lh_mouse on IRC for pointing this out). So GCC is doing the right thing for arguments, and I have a WIP patch to make LLVM agree [2].
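
To make the quoted argument rules concrete, here is a minimal C sketch of how I read them. All names (`win64_arg_location`, `struct arg_loc`, and the enums) are hypothetical and exist only for this illustration. It also assumes that arguments beyond the fourth slot go on the stack, which comes from the linked Microsoft document rather than from the excerpts above, and it treats any floating-point type smaller than 16 bytes like float/double, which is an extrapolation for _Float16.

```c
#include <assert.h>
#include <stddef.h>

enum arg_kind { ARG_INTEGER, ARG_FLOAT };
enum loc_kind { LOC_INT_REG, LOC_SSE_REG, LOC_STACK };

struct arg_loc {
    enum loc_kind kind;
    int slot;          /* 0..3 selects RCX/RDX/R8/R9 or XMM0..XMM3 */
    int by_reference;  /* nonzero: the slot holds a pointer to the value */
};

/* `position` is the zero-based index of the argument in the call. */
struct arg_loc win64_arg_location(int position, enum arg_kind kind, size_t size)
{
    struct arg_loc loc;

    assert(position >= 0 && size > 0);

    loc.slot = position;
    /* 16-byte arguments are passed by reference (caller-allocated memory). */
    loc.by_reference = (size == 16);

    if (position >= 4)
        loc.kind = LOC_STACK;       /* assumption: only four register slots */
    else if (kind == ARG_FLOAT && !loc.by_reference)
        loc.kind = LOC_SSE_REG;     /* value travels in XMM<slot> */
    else
        loc.kind = LOC_INT_REG;     /* value or pointer travels in RCX/RDX/R8/R9 */
    return loc;
}
```

For the sample in the description, `a` is the second argument (position 1). With `size == 16` the sketch yields an integer register in slot 1 holding a pointer, which matches the `movdqa xmm0, XMMWORD PTR [rdx]` load in GCC's f128_entry output, while a double in the same position yields XMM1.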

However, the return section of the calling convention says the following:

> A scalar return value that can fit into 64 bits, including the __m64
> type, is returned through RAX. Non-scalar types including floats,
> doubles, and vector types such as __m128, __m128i, __m128d are
> returned in XMM0.

My read of that is that even though _Float128 needs to be passed indirectly, it should be returned in XMM0. GCC currently returns on the stack so I am thinking this should change. This would make _Float128 share the same calling convention as __m128.
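
That reading of the return rule can be sketched in C as follows. The names are hypothetical, and placing _Float16 and _Float128 in the floating-point category is my extrapolation rather than anything the Microsoft document states.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical categories for this sketch. */
enum ret_kind {
    RET_INTEGER_SCALAR,  /* scalars that fit into 64 bits, including __m64 */
    RET_FLOATING,        /* float, double; extrapolated to _Float16/_Float128 */
    RET_VECTOR128        /* __m128, __m128i, __m128d */
};

enum ret_loc { RET_IN_RAX, RET_IN_XMM0 };

enum ret_loc win64_return_location(enum ret_kind kind, size_t size)
{
    assert(size > 0);

    if (kind == RET_INTEGER_SCALAR) {
        /* Larger integer scalars are outside the quoted rule and this sketch. */
        assert(size <= 8);
        return RET_IN_RAX;
    }

    /* Floats, doubles and 128-bit vectors come back in XMM0. */
    assert(size <= 16);
    return RET_IN_XMM0;
}
```

With this model a 16-byte floating-point return lands in XMM0, the same place as __m128, even though the corresponding argument would be passed indirectly.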

For _Float16, it seems like following _Float32/_Float64 would be reasonable since the calling convention generally seems to indicate that floats get passed in vector registers, and it fits.

In short, the changes I would propose are:

- GCC change to return _Float128 in XMM0, continue passing indirectly (by reference)
- GCC change to pass and return _Float16 in XMM
- Clang change to pass _Float128 indirectly, continue returning in XMM0

Obviously there is some extrapolation here since the type isn't officially supported, but this seems more or less consistent with existing types.

Andrew, do you have any thoughts here?

[1]: https://learn.microsoft.com/en-us/cpp/build/x64-calling-convention?view=msvc-170
[2]: https://github.com/llvm/llvm-project/pull/115052