[PATCH] builtin fadd variants implementation

Joseph Myers joseph@codesourcery.com
Mon Sep 2 16:38:00 GMT 2019


On Mon, 2 Sep 2019, Tejas Joshi wrote:

> Hello.
> Should a result like 1.4 be considered as inexact if truncating
> (narrowing?) from double to float? (due to loss of trailing bits)

If the mathematical result of the arithmetic operation is literally the 
decimal number 1.4, as opposed to the double value represented by the C 
constant 1.4 which is actually 0x1.6666666666666p+0, then it is inexact 
regardless of the (non-decimal) types involved.  For example, fdiv (7, 5), 
ddivl (7, 5), etc. are always inexact.

If the mathematical result of the arithmetic operation is 
0x1.6666666666666p+0, the closest approximation to 1.4 in IEEE binary64, 
then it is inexact for result formats narrower than binary64 and exact for 
result formats that can represent that value.  For example, fadd (1.4,
0.0) is inexact (the narrowing to float is inexact although the addition
is exact).  But daddl (1.4, 0.0) - note the arguments are double
constants, not long double - is exact, because the mathematical result is
exactly representable in double.  By contrast, daddl (1.4L, 0.0L) would
be inexact if long double is wider than double.

The question is always whether the infinite-precision mathematical result 
of the arithmetic operation - which takes values representable in its 
argument types - is exactly representable in the final result type.

-- 
Joseph S. Myers
joseph@codesourcery.com


