This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
Re: [PATCH] builtin fadd variants implementation
- From: Joseph Myers <joseph at codesourcery dot com>
- To: Tejas Joshi <tejasjoshi9673 at gmail dot com>
- Cc: <gcc-patches at gcc dot gnu dot org>, Martin Jambor <mjambor at suse dot cz>
- Date: Mon, 2 Sep 2019 16:38:51 +0000
- Subject: Re: [PATCH] builtin fadd variants implementation
- References: <CACMrGjA_V1D9UJ=Ldy+HfkCub3Tw6rdFAVr0nuxvtu6aLGZgYA@mail.gmail.com> <alpine.DEB.2.21.1908211634430.19918@digraph.polyomino.org.uk> <CACMrGjCjsWpp9YCmN5W23sOT=+xMcLA8QbZbdvUz7zgoxsOChw@mail.gmail.com> <alpine.DEB.2.21.1908272239380.31674@digraph.polyomino.org.uk> <CACMrGjDSOKSqZvduQUPEGCzje1uZD3o7Op0qH7v4VL2kNuWL4g@mail.gmail.com>
On Mon, 2 Sep 2019, Tejas Joshi wrote:
> Hello.
> Should a result like 1.4 be considered as inexact if truncating
> (narrowing?) from double to float? (due to loss of trailing bits)
If the mathematical result of the arithmetic operation is literally the
decimal number 1.4, as opposed to the double value represented by the C
constant 1.4 which is actually 0x1.6666666666666p+0, then it is inexact
regardless of the (non-decimal) types involved. For example, fdiv (7, 5),
ddivl (7, 5), etc. are always inexact.
If the mathematical result of the arithmetic operation is
0x1.6666666666666p+0, the closest approximation to 1.4 in IEEE binary64,
then it is inexact for result formats narrower than binary64 and exact for
result formats that can represent that value. For example, fadd (1.4,
0.0) is inexact (the rounding to float is inexact although the addition
is exact). But daddl (1.4, 0.0) - note the arguments are double
constants, not long double - is exact, because the mathematical result is
exactly representable in double. Whereas daddl (1.4L, 0.0L) would be
inexact if long double is wider than double.
The question is always whether the infinite-precision mathematical result
of the arithmetic operation - which takes values representable in its
argument types - is exactly representable in the final result type.
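That rule can also be tested without floating-point flags. The sketch below does so for the fadd case using Knuth's TwoSum, a substitute technique chosen for illustration and not how GCC folds these built-ins. TwoSum recovers the rounding error of a + b exactly, so the infinite-precision sum is representable in double if and only if that error is zero; it then only remains to check whether the sum round-trips through float. It assumes round-to-nearest, no excess precision (FLT_EVAL_METHOD == 0), no -ffast-math and finite operands; fadd_is_exact is a hypothetical name.

```c
#include <stdbool.h>

/* Hypothetical helper: true if the exact sum a + b of two finite doubles
   is exactly representable in float, i.e. fadd (a, b) would be exact.
   Assumes round-to-nearest and FLT_EVAL_METHOD == 0. */
bool fadd_is_exact(double a, double b)
{
    /* Knuth's TwoSum: s is the rounded sum and err the rounding error,
       so that a + b == s + err exactly. */
    double s = a + b;
    double bb = s - a;
    double err = (a - (s - bb)) + (b - bb);
    if (err != 0.0)
        return false;   /* the exact sum does not even fit in double */
    /* s is the exact sum; check that narrowing to float loses nothing. */
    return (double)(float)s == s;
}
```

Under those assumptions fadd_is_exact (1.4, 0.0) is false, matching the fadd (1.4, 0.0) example above, while fadd_is_exact (0.5, 0.25) is true.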
--
Joseph S. Myers
joseph@codesourcery.com