Bug 91970 - arm: 64bit int to double conversion does not respect rounding mode
Summary: arm: 64bit int to double conversion does not respect rounding mode
Status: NEW
Alias: None
Product: gcc
Classification: Unclassified
Component: target (show other bugs)
Version: 10.0
: P3 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords:
: 94646 (view as bug list)
Depends on:
Blocks:
 
Reported: 2019-10-02 15:52 UTC by nsz
Modified: 2024-04-11 01:33 UTC (History)
2 users (show)

See Also:
Host:
Target: arm*-*hf
Build:
Known to work:
Known to fail:
Last reconfirmed: 2020-04-18 00:00:00


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description nsz 2019-10-02 15:52:57 UTC
on arm-* with

#include <fenv.h>
#include <stdio.h>
int main()
{
	long long x = (1LL << 60) - 1;
	double y;
	fesetround(FE_DOWNWARD);
	__asm__ __volatile__ ("" : "+m" (x));
	y = x;
	__asm__ __volatile__ ("" : "+m" (y));
	fesetround(FE_TONEAREST);
	printf("%a\n", y);
}

i get

0x1p60

instead of

0x1.fffffffffffffp+59

i assume this is because the conversion is handled by __aeabi_l2d
(also known as __floatdidf in libgcc) which is not rounding mode
aware.

this affects hardfloat targets which otherwise support directed
rounding modes.
Comment 1 nsz 2019-10-02 16:24:47 UTC
floating-point exceptions are also missing for the same reason.
Comment 2 Andreas Schwab 2019-10-02 16:55:05 UTC
Don't you need #pragma STDC FENV_ACCESS?
Comment 3 nsz 2019-10-02 17:11:47 UTC
(In reply to Andreas Schwab from comment #2)
> Don't you need #pragma STDC FENV_ACCESS?

yes, for iso c conformance you need it, but gcc does not
handle it anyway, instead it requires -frounding-math.

however if double prec instructions are available, using them
may be even faster in the difficult inexact case, e.g.

double uconv64(uint64_t x)
{
double lo = uconv32(x); // single instruction, always exact
double hi = uconv32(x>>32);
return lo + hi*0x1p32;
}

so i would not make the fix depend on -frounding-math,
just always use hardfloat instructions on hardfloat targets
to do the conv.

(i suspect it affects more than just armhf)
Comment 4 jsm-csl@polyomino.org.uk 2019-10-02 19:55:54 UTC
The libgcc2.c functions for conversions that get used by default on most 
architectures should respect the rounding mode if the underlying 
single-word-to-floating-point instruction does so.  (They have more 
complicated logic to handle the __floatdisf case and other such cases 
where a single word is wider than the precision of the floating-point 
type.)

The libgcc2.c __fix* functions may raise spurious "inexact" exceptions, 
but that's a different matter.
Comment 5 nsz 2019-10-03 09:53:51 UTC
ok so the real problem is that libgcc does not define
FP_INIT_ROUNDMODE and FP_HANDLE_EXCEPTIONS etc for
hardfloat arm targets.
Comment 6 jsm-csl@polyomino.org.uk 2019-10-03 17:33:20 UTC
Arm only uses soft-fp for Thumb-1; otherwise it uses Arm-specific assembly 
code.

A natural fix might be, for these particular functions, in the hard-float 
case, to use the libgcc2.c versions instead; I expect that's faster than 
doing them entirely as software floating-point while respecting exceptions 
and rounding modes.  Arguably for the other functions, in the hard-float 
case, it would be best to build the t-hardfp versions to keep the libgcc 
size down, instead of the assembly versions.  In both cases you'd need to 
make sure the functions have the ABI-specified names, that they get 
properly built for the base PCS rather than the VFP variant, where the ABI 
requires that, and that any other ABI requirements are met (the default 
libgcc versions, compiled as C code, might not always follow the same 
interface as the ABI-defined functions).

The only case where soft-fp support for exceptions and rounding modes 
seems likely to be useful for Arm is those processors that do 
single-precision-only floating point in hardware, for which use of soft-fp 
for double precision would make sense in order to support exceptions and 
rounding modes (along with t-hardfp for single precision).  (Also, I 
suppose, use of soft-fp with exceptions and rounding modes would make 
sense for half-precision on hardware with floating-point support only for 
single and maybe double precision.)
Comment 7 nsz 2019-10-07 12:47:58 UTC
i think the code snippet i posted is more efficient and significantly
smaller than using libgcc (which also sounds hard to wire up to do the
right thing). the code sequence can possibly be even inlined.
(and i don't mind if ucontrollers with single precision only fpu
don't have correct fenv behaviour)
Comment 8 jsm-csl@polyomino.org.uk 2019-10-07 21:52:02 UTC
The code snippet you posted looks like exactly what libgcc2.c would 
typically do for __floatundidf.  Given that, I'd prefer building the 
relevant function from libgcc2.c rather than having an Arm-specific 
duplicate.

  /* When the word size is small, we never get any rounding error.  */
  FSTYPE f = (UWtype) (u >> W_TYPE_SIZE);
  f *= Wtype_MAXp1_F;
  f += (UWtype)u;
  return f;
Comment 9 nsz 2019-10-08 09:12:55 UTC
ok i was looking at the wrong code, didn't know libgcc2,
i agree that's the right way to fix this.
Comment 10 nsz 2020-04-18 08:33:35 UTC
*** Bug 94646 has been marked as a duplicate of this bug. ***
Comment 11 nsz 2020-04-18 08:41:54 UTC
confirmed
Comment 12 Rich Felker 2020-04-18 16:47:42 UTC
There's some awful hand-written asm in libgcc/config/arm/ieee754-df.S replacing the standard libgcc2.c versions; that's the problem. But in order to use the latter it would need to be compiled with -mfloat-abi=softfp since the __aeabi_l2d function (and all the __aeabi_* apparently) use the standard soft-float EABI even on EABIHF targets.

I'm not sure why you want a library function to be called for this on hardfloat targets anyway. Inlining the hi*0x1p32+lo is almost surely smaller than the function call, counting spills and conversion of the result back from GP registers to an FP register. It seems like GCC should be able to inline this idiom at a high level for *all* targets that lack a floatdidf operation but have floatsidf.

Of course a high level fix is going to be hell to backport, and this really needs a backportable fix or workaround (maintained in mcm not upstream gcc) from musl perspective. Maybe the easiest way to do that is just to hack the right preprocessor conditions for a hardfloat implementation into ieee754-df.S...
Comment 13 jsm-csl@polyomino.org.uk 2020-04-20 22:09:08 UTC
bpabi-lib.h is the existing mechanism for renaming libgcc2.c functions and 
declaring them with __attribute__((pcs("aapcs"))).  (But when causing more 
such functions to be used, or such functions to be used when they weren't 
before, we'd need to make sure all the declarations therein are being 
applied properly in the new cases to give the functions the desired ABI.)
Comment 14 Thiago Jung Bauermann 2024-04-11 00:55:02 UTC
This came up in the context of a Linux kernel patch series providing a "Unified cross-architecture kernel-mode FPU API" because the AMD GPU driver fails to build on arm with the series applied because of this issue:

https://lore.kernel.org/linux-arm-kernel/75a37a4b-f516-40a3-b6b5-4aa1636f9b60@sifive.com/

Copying the patch author's response here:

> In both cases, the issue is that the toolchain requires runtime support to
> convert between `unsigned long long` and `double`, even when hardware FP is
> enabled. There was some past discussion about GCC inlining some of these
> conversions[1], but that did not get implemented.
> 
> The short-term fix would be to drop the `select ARCH_HAS_KERNEL_FPU_SUPPORT`
> for 32-bit arm until we can provide these runtime library functions.
Comment 15 Andrew Pinski 2024-04-11 01:03:03 UTC
(In reply to Thiago Jung Bauermann from comment #14)
> This came up in the context of a Linux kernel patch series providing a
> "Unified cross-architecture kernel-mode FPU API" because the AMD GPU driver
> fails to build on arm with the series applied because of this issue:
> 
> https://lore.kernel.org/linux-arm-kernel/75a37a4b-f516-40a3-b6b5-
> 4aa1636f9b60@sifive.com/

That is a totally unrelated issue from this one. The issue there is arm EABI defines the function names rather than GCC and GCC follows the ARM EABI here (so does clang/LLVM). so you would need to do a similar renaming for the functions on arm (32bit).
Comment 16 Thiago Jung Bauermann 2024-04-11 01:33:59 UTC
(In reply to Andrew Pinski from comment #15)
> (In reply to Thiago Jung Bauermann from comment #14)
> > This came up in the context of a Linux kernel patch series providing a
> > "Unified cross-architecture kernel-mode FPU API" because the AMD GPU driver
> > fails to build on arm with the series applied because of this issue:
> > 
> > https://lore.kernel.org/linux-arm-kernel/75a37a4b-f516-40a3-b6b5-
> > 4aa1636f9b60@sifive.com/
> 
> That is a totally unrelated issue from this one. The issue there is arm EABI
> defines the function names rather than GCC and GCC follows the ARM EABI here
> (so does clang/LLVM). so you would need to do a similar renaming for the
> functions on arm (32bit).

Thank you! I relayed your point (hopefully correctly) in the email thread above.