Bug 111709 - [13/14/15 Regression] Miscompilation of sysdeps/ieee754/dbl-64/s_fma.c since r13-1268-g8c99e307b20c50
Summary: [13/14/15 Regression] Miscompilation of sysdeps/ieee754/dbl-64/s_fma.c since ...
Status: RESOLVED MOVED
Alias: None
Product: gcc
Classification: Unclassified
Component: target
Version: 13.1.0
Importance: P3 normal
Target Milestone: 13.4
Assignee: Not yet assigned to anyone
URL:
Keywords: wrong-code
Depends on:
Blocks:
 
Reported: 2023-10-05 17:49 UTC by John David Anglin
Modified: 2025-02-06 18:31 UTC
CC List: 8 users

See Also:
Host: hppa*-*-linux*
Target: hppa*-*-linux*
Build: hppa*-*-linux*
Known to work:
Known to fail:
Last reconfirmed: 2024-05-02 00:00:00


Attachments
Preprocessed source generated using gcc-12 (7.77 KB, text/plain)
2023-10-05 17:49 UTC, John David Anglin
Preprocessed source generated using gcc-13 (7.97 KB, text/plain)
2023-10-05 17:50 UTC, John David Anglin
.s file for s_fma.c generated using gcc-12 (12.00 KB, text/plain)
2023-10-05 17:51 UTC, John David Anglin
.s file for s_fma.c generated using gcc-13 (11.60 KB, text/plain)
2023-10-05 17:52 UTC, John David Anglin
non pic .s file for s_fma.c generated using gcc-12 without debug info (2.88 KB, text/plain)
2023-10-05 18:01 UTC, John David Anglin
non pic .s file for s_fma.c generated using gcc-13 without debug info (2.76 KB, text/plain)
2023-10-05 18:01 UTC, John David Anglin
Diff between s_fma_12.s and s_fma_13.s (3.85 KB, text/plain)
2023-10-05 18:06 UTC, John David Anglin
Patch (402 bytes, patch)
2025-02-06 15:41 UTC, John David Anglin
Patch (345 bytes, patch)
2025-02-06 15:45 UTC, John David Anglin

Description John David Anglin 2023-10-05 17:49:41 UTC
Created attachment 56056
Preprocessed source generated using gcc-12

This failure occurs with gcc version 13.1.0 (Debian 13.1.0-8) in the .

dave@mx3210:~/gnu/glibc/objdir$ make test t=math/test-double-fma
make -r PARALLELMFLAGS="" -C ../glibc objdir=`pwd` test
make[1]: Entering directory '/home/dave/gnu/glibc/glibc'
make subdir=math -C math/ ..=../ /home/dave/gnu/glibc/objdir/math/test-double-fma.out
make[2]: Entering directory '/home/dave/gnu/glibc/glibc/math'
gcc -o /home/dave/gnu/glibc/objdir/math/test-double-fma -nostdlib -nostartfiles     -Wl,-z,relro  /home/dave/gnu/glibc/objdir/csu/crt1.o /home/dave/gnu/glibc/objdir/csu/crti.o `gcc  --print-file-name=crtbegin.o` /home/dave/gnu/glibc/objdir/math/test-double-fma.o /home/dave/gnu/glibc/objdir/support/libsupport_nonshared.a /home/dave/gnu/glibc/objdir/math/libm-test-support-double.o /home/dave/gnu/glibc/objdir/math/libm.so.6  -Wl,-dynamic-linker=/lib/ld.so.1 -Wl,-rpath-link=/home/dave/gnu/glibc/objdir:/home/dave/gnu/glibc/objdir/math:/home/dave/gnu/glibc/objdir/elf:/home/dave/gnu/glibc/objdir/dlfcn:/home/dave/gnu/glibc/objdir/nss:/home/dave/gnu/glibc/objdir/nis:/home/dave/gnu/glibc/objdir/rt:/home/dave/gnu/glibc/objdir/resolv:/home/dave/gnu/glibc/objdir/mathvec:/home/dave/gnu/glibc/objdir/support:/home/dave/gnu/glibc/objdir/crypt:/home/dave/gnu/glibc/objdir/nptl -lgcc -Wl,--as-needed -lgcc_s  -Wl,--no-as-needed /home/dave/gnu/glibc/objdir/libc.so.6 /home/dave/gnu/glibc/objdir/libc_nonshared.a -Wl,--as-needed /home/dave/gnu/glibc/objdir/elf/ld.so -Wl,--no-as-needed -lgcc -Wl,--as-needed -lgcc_s  -Wl,--no-as-needed `gcc  --print-file-name=crtend.o` /home/dave/gnu/glibc/objdir/csu/crtn.o
env GCONV_PATH=/home/dave/gnu/glibc/objdir/iconvdata LOCPATH=/home/dave/gnu/glibc/objdir/localedata LC_ALL=C   /home/dave/gnu/glibc/objdir/elf/ld.so.1 --library-path /home/dave/gnu/glibc/objdir:/home/dave/gnu/glibc/objdir/math:/home/dave/gnu/glibc/objdir/elf:/home/dave/gnu/glibc/objdir/dlfcn:/home/dave/gnu/glibc/objdir/nss:/home/dave/gnu/glibc/objdir/nis:/home/dave/gnu/glibc/objdir/rt:/home/dave/gnu/glibc/objdir/resolv:/home/dave/gnu/glibc/objdir/mathvec:/home/dave/gnu/glibc/objdir/support:/home/dave/gnu/glibc/objdir/crypt:/home/dave/gnu/glibc/objdir/nptl /home/dave/gnu/glibc/objdir/math/test-double-fma  > /home/dave/gnu/glibc/objdir/math/test-double-fma.out; \
../scripts/evaluate-test.sh math/test-double-fma $? false false > /home/dave/gnu/glibc/objdir/math/test-double-fma.test-result
make[2]: Leaving directory '/home/dave/gnu/glibc/glibc/math'
FAIL: math/test-double-fma
original exit status 1
testing double (without inline functions)
Failure: fma (-0x7.ffffffffffffp-1024, 0x8.0000000000008p-4, -0x4p-1076): Exception "Underflow" set
Failure: fma (0x7.ffffffffffffp-1024, 0x8.0000000000008p-4, 0x4p-1076): Exception "Underflow" set
Failure: fma_downward (-0x4p-1076, 0x8.8p-4, -0x3.ffffffffffffcp-1024): Exception "Underflow" set
Failure: fma_downward (-0x7.ffffffffffffp-1024, 0x8.0000000000008p-4, -0x4p-1076): Exception "Underflow" set
Failure: Test: fma_upward (-0x3.ffffffffffffep-712, 0x3.ffffffffffffep-276, 0x3.fffffc0000ffep-984)
Result:
 is:          1.8348707892449242e-296   0x1.7ffffe00007ffp-983
 should be:   1.8348707892449245e-296   0x1.7ffffe0000800p-983
 difference:  2.7161546124355486e-312   0x0.0008000000000p-1022
 ulp       :  1.0000
 max.ulp   :  0.0000
Failure: fma_upward (0x4p-1076, 0x8.8p-4, 0x3.ffffffffffffcp-1024): Exception "Underflow" set
Failure: fma_upward (0x7.ffffffffffffp-1024, 0x8.0000000000008p-4, 0x4p-1076): Exception "Underflow" set

Test suite completed:
  2524 test cases plus 2520 tests for exception flags and
    2520 tests for errno executed.
  7 errors occurred.
make[1]: Leaving directory '/home/dave/gnu/glibc/glibc'

The test doesn't fail with gcc-12.

Similar fails:
FAIL: math/test-double-ldouble-fma
FAIL: math/test-float32x-float64-fma
FAIL: math/test-float32x-fma
FAIL: math/test-float64-fma
FAIL: math/test-ldouble-fma

If s_fma.c is compiled with gcc-12, these fma failures don't occur.

This is glibc BZ 30664.

This is the compile command for s_fma.c:

gcc-13 ../sysdeps/ieee754/dbl-64/s_fma.c -c -std=gnu11 -fgnu89-inline  -g -O2 -Wall -Wwrite-strings -Wundef -Werror -fmerge-all-constants -frounding-math -fno-stack-protector -fno-common -Wp,-U_FORTIFY_SOURCE -Wstrict-prototypes -Wold-style-definition -fno-math-errno    -fPIC   -fno-builtin-fmal -fno-builtin-fmaf32x -fno-builtin-fmaf64      -DNO_LONG_DOUBLE -I../include -I/home/dave/gnu/glibc/objdir/math  -I/home/dave/gnu/glibc/objdir  -I../sysdeps/unix/sysv/linux/hppa  -I../sysdeps/hppa/nptl  -I../sysdeps/unix/sysv/linux/include -I../sysdeps/unix/sysv/linux  -I../sysdeps/nptl  -I../sysdeps/pthread  -I../sysdeps/gnu  -I../sysdeps/unix/inet  -I../sysdeps/unix/sysv  -I../sysdeps/unix  -I../sysdeps/posix  -I../sysdeps/hppa/hppa1.1  -I../sysdeps/wordsize-32  -I../sysdeps/ieee754/flt-32  -I../sysdeps/ieee754/dbl-64  -I../sysdeps/hppa/fpu  -I../sysdeps/hppa  -I../sysdeps/ieee754  -I../sysdeps/generic  -I.. -I../libio -I. -nostdinc -isystem /usr/lib/gcc/hppa-linux-gnu/13/include -isystem /usr/include -D_LIBC_REENTRANT -include /home/dave/gnu/glibc/objdir/libc-modules.h -DMODULE_NAME=libm -include ../include/libc-symbols.h  -DPIC -DSHARED     -DTOP_NAMESPACE=glibc -o /home/dave/gnu/glibc/objdir/math/s_fma.os -MD -MP -MF /home/dave/gnu/glibc/objdir/math/s_fma.os.dt -MT /home/dave/gnu/glibc/objdir/math/s_fma.os
Comment 1 John David Anglin 2023-10-05 17:50:34 UTC
Created attachment 56057
Preprocessed source generated using gcc-13
Comment 2 John David Anglin 2023-10-05 17:51:45 UTC
Created attachment 56058
.s file for s_fma.c generated using gcc-12
Comment 3 John David Anglin 2023-10-05 17:52:33 UTC
Created attachment 56059
.s file for s_fma.c generated using gcc-13
Comment 4 John David Anglin 2023-10-05 18:01:18 UTC
Created attachment 56060
non pic .s file for s_fma.c generated using gcc-12 without debug info
Comment 5 John David Anglin 2023-10-05 18:01:51 UTC
Created attachment 56061
non pic .s file for s_fma.c generated using gcc-13 without debug info
Comment 6 John David Anglin 2023-10-05 18:06:54 UTC
Created attachment 56062
Diff between s_fma_12.s and s_fma_13.s
Comment 7 John David Anglin 2023-10-05 18:25:30 UTC
Joseph, is there a way to simplify the glibc test to the failing cases?

Maybe you have a clue as to what has changed.
Comment 8 jsm-csl@polyomino.org.uk 2023-10-05 20:33:42 UTC
Typically these sorts of issues result from floating-point operations 
being moved past environment manipulation (fesetround, feupdateenv, 
feholdexcept, etc.) - in either direction.  This might be a compiler 
issue, or it might well be a bug in the glibc function implementation 
(insufficient use of math_opt_barrier / math_force_eval to prevent such 
movement).  If the latter, make sure to fix it in all similar 
implementations of fma functions, not just the dbl-64 one.
Comment 9 Richard Biener 2023-10-06 07:50:27 UTC
Does it work on trunk?
Comment 10 dave.anglin 2023-10-07 22:33:14 UTC
On 2023-10-06 3:50 a.m., rguenth at gcc dot gnu.org wrote:
> Does it work on trunk? 
No.  Test results with gcc trunk are identical to with Debian gcc-13.

I tried both rebuilding just s_fma.c and doing a full build and check.
Comment 11 John David Anglin 2023-10-11 15:36:37 UTC
This is proving difficult to bisect due to _Floatn issues.

I know commit b85e79dce149df68b92ef63ca2a40ff1dfa61396 is good and
commit b939a5cc4143908ddda4b85a848c313136ff6e0c is bad.

The following glibc change breaks the gcc build when BASE-VER changes to 13:
https://sourceware.org/git/?p=glibc.git;a=commitdiff;h=3e5760fcb48528d48deeb60cb885a97bb731160c

If I change the __GNUC_PREREQ arguments to (13, 1), I hit errors like:

In file included from /home/dave/gnu/gcc/objdir/hppa-linux-gnu/libstdc++-v3/include/cmath:45,
                 from /home/dave/gnu/gcc/objdir/hppa-linux-gnu/libstdc++-v3/include/complex:44,
                 from ../../../../../gcc/libstdc++-v3/src/c++98/complex_io.cc:25:
/usr/include/math.h:1395:19: error: redefinition of ‘struct __iseqsig_type<float>’
 1395 | template<> struct __iseqsig_type<_Float32>
      |                   ^~~~~~~~~~~~~~~~~~~~~~~~
/usr/include/math.h:1366:19: note: previous definition of ‘struct __iseqsig_type<float>’
 1366 | template<> struct __iseqsig_type<float>
      |                   ^~~~~~~~~~~~~~~~~~~~~

There are a lot of VRP changes for floats in the range that I haven't been able to bisect.
Comment 12 John David Anglin 2023-10-15 01:15:50 UTC
The miscompilation of s_fma.c was introduced by the following change:

dave@atlas:~/gnu/gcc/gcc$ git bisect good
8c99e307b20c502e55c425897fb3884ba8f05882 is the first bad commit
commit 8c99e307b20c502e55c425897fb3884ba8f05882
Author: Aldy Hernandez <aldyh@redhat.com>
Date:   Sat Jun 25 18:58:02 2022 -0400

    Convert DOM to use Ranger rather than EVRP

    [Jeff, this is the same patch I sent you last week with minor tweaks
    to the commit message.]

    [Despite the verbosity of the message, this is actually a pretty
    straightforward patch.  It should've gone in last cycle, but there
    was a nagging regression I couldn't get to until after stage1
    had closed.]

    There are 3 uses of EVRP in DOM that must be converted.
    Unfortunately, they need to be converted in one go, so further
    splitting of this patch would be problematic.

    There's nothing here earth shattering.  It's all pretty obvious in
    retrospect, but I've added a short description of each use to aid in
    reviewing:

    * Convert evrp use in cprop to ranger.

      This is easy, as cprop in DOM was converted to the ranger API last
      cycle, so this is just a matter of using a ranger instead of an
      evrp_range_analyzer.

    * Convert evrp use in threader to ranger.

      The idea here is to use the hybrid approach we used for the initial
      VRP threader conversion last cycle.  The DOM threader will continue
      using the forward threader infrastructure while continuing to query
      DOM data structures, and only if the conditional does not resolve,
      using the ranger.  This gives us the best of both worlds, and is a
      proven approach.

      Furthermore, as frange and prange come live in the next cycle, we
      can move away from the forward threader altogether, and just add
      another backward threader.  This will not only remove the last use
      of the forward threader, but will allow us to remove at least 1 or 2
      threader instances.

    * Convert conditional folding to use the method used by the ranger and
      evrp.  Previously DOM was calling into the guts of
      simplify_using_ranges::vrp_visit_cond_stmt.  The blessed way now is
      using fold_cond() which rewrites the conditional and edges
      automatically.

      When legacy is removed, simplify_using_ranges will be further
      cleaned up, and there will only be one entry point into simplifying
      a statement.

    * DOM was setting global ranges determined from unreachable edges as a
      side-effect of using the evrp engine.  We must handle these cases
      before nuking evrp, and DOM seems like a good fit.  I've just moved
      the snippet to DOM, but it could live anywhere else we do a DOM
      walk.

      For the record, this is the case *vrp handled:

            <bb C>:
            ...
            if (c_5(D) != 5)
            goto <bb N>;
            else
            goto <bb M>;
            <bb N>:
            __builtin_unreachable ();
            <bb M>:

      If M dominates all uses of c_5, we can set the global range of c_5
      to [5,5].

    I have tested on x86-64, ppc64le, and aarch64 Linux.

    I also ran threading benchmarks as well as performance benchmarks.

    DOM threads 1.56% more paths which ultimately yields a minuscule total
    increase of 0.03%.

    The conversion to ranger brings a 7.87% performance drop in DOM, which
    is a wash in overall compilation.  This is in line with other
    replacements of legacy evrp with ranger.  We handle a lot more cases.
    It's not free.

    There is a regression on Wstringop-overflow-4.C which I'm planning
    on XFAILing.  It's another variant of the usual middle-end false
    positives: having no ranges produces no warnings, but slightly refined
    ranges, or worse-- isolating specific problematic cases in the
    threader causes flare-ups.

    As an aside, as Richi has suggested, I think we should discuss
    restricting the threader's ability to thread highly unlikely paths.
    These cause no end of pain for middle-end warnings.  However,
    I don't know if this would conflict with path isolation for
    things like null dereferencing.  ISTR you were interested in this.

    BTW, I think the Wstringop-overflow-4.C test is problematic and I've
    attached my analysis.  Basically the regression is caused by a bad
    interaction with the rounding/alignment that placement new has inlined
    into the IL.  This happens for int16_r[] which the test is testing.
    Ranger can glean some range info, which causes DOM threading to
    isolate a path which causes a warning.

    OK for trunk?

    gcc/ChangeLog:

            * tree-ssa-dom.cc (dom_jt_state): Pass ranger to constructor
            instead of evrp.
            (dom_jt_state::push): Remove m_evrp.
            (dom_jt_state::pop): Same.
            (dom_jt_state::record_ranges_from_stmt): Remove.
            (dom_jt_state::register_equiv): Remove updating of evrp ranges.
            (class dom_jt_simplifier): Pass ranger to constructor.
            Inherit from hybrid_jt_simplifier.
            (dom_jt_simplifier::simplify): Convert to ranger.
            (pass_dominator::execute): Same.
            (all_uses_feed_or_dominated_by_stmt): New.
            (dom_opt_dom_walker::set_global_ranges_from_unreachable_edges): New.
            (dom_opt_dom_walker::before_dom_children): Call
            set_global_ranges_from_unreachable_edges.
            Do not call record_ranges_from_stmt.
            (dom_opt_dom_walker::after_dom_children): Remove evrp use.
            (cprop_operand): Use int_range<> instead of value_range.
            (dom_opt_dom_walker::fold_cond): New.
            (dom_opt_dom_walker::optimize_stmt): Pass ranger to
            cprop_into_stmt.
            Use fold_cond() instead of vrp_visit_cond_stmt().
            * tree-ssa-threadedge.cc (jt_state::register_equivs_stmt): Do not
            pass state to simplifier.
            * vr-values.h (class vr_values): Make fold_cond public.

    gcc/testsuite/ChangeLog:

            * gcc.dg/sancov/cmp0.c: Adjust for conversion to ranger.
            * gcc.dg/tree-ssa/ssa-dom-branch-1.c: Same.
            * gcc.dg/tree-ssa/ssa-dom-thread-7.c: Same.
            * gcc.dg/vect/bb-slp-pr81635-2.c: Same.
            * gcc.dg/vect/bb-slp-pr81635-4.c: Same.
            * g++.dg/warn/Wstringop-overflow-4.C: Likewise.
            * gcc.target/mips/data-sym-multi-pool.c: Likewise.
            * gcc.target/mips/mips.exp: Likewise.

 gcc/testsuite/g++.dg/warn/Wstringop-overflow-4.C   |  34 ++++
 gcc/testsuite/gcc.dg/sancov/cmp0.c                 |   2 +-
 gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-branch-1.c   |   5 +-
 gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-7.c   |   2 +-
 gcc/testsuite/gcc.dg/vect/bb-slp-pr81635-2.c       |   2 +-
 gcc/testsuite/gcc.dg/vect/bb-slp-pr81635-4.c       |   6 +-
 .../gcc.target/mips/data-sym-multi-pool.c          |   2 +-
 gcc/testsuite/gcc.target/mips/mips.exp             |   1 +
 gcc/tree-ssa-dom.cc                                | 223 +++++++++++----------
 gcc/tree-ssa-threadedge.cc                         |   4 +-
 gcc/vr-values.h                                    |   2 +-
 11 files changed, 170 insertions(+), 113 deletions(-)
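The `__builtin_unreachable ()` case in the commit message corresponds roughly to C source like the hypothetical function below (a sketch, not taken from the GCC testsuite). Because the c != 5 path never continues, every use of c dominated by the other edge sees c == 5, which is the [5,5] global range the patch records; whether a given GCC version actually folds the return to a constant is not guaranteed.

```c
#include <assert.h>

/* Hypothetical example with the same shape as the GIMPLE in the commit
   message.  Passing anything other than 5 is undefined behavior when
   NDEBUG is defined.  */
int scaled_five (int c)
{
  assert (c == 5);             /* debug builds: diagnose a bad caller */
  if (c != 5)
    __builtin_unreachable ();  /* tells GCC this edge is never taken */
  return c * 2;                /* c is known to be 5 here */
}
```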

I don't know anything about ranger, but I wonder if this has to do with
the mips/hppa NaN representation.
Comment 13 matoro 2024-05-02 03:04:23 UTC
Current state of this has expanded to more of the math tests.

FAIL: math/test-double-fma
FAIL: math/test-double-j0
FAIL: math/test-double-j1
FAIL: math/test-double-ldouble-fma
FAIL: math/test-double-log
FAIL: math/test-float32x-float64-fma
FAIL: math/test-float32x-fma
FAIL: math/test-float32x-j0
FAIL: math/test-float32x-j1
FAIL: math/test-float32x-log
FAIL: math/test-float64-fma
FAIL: math/test-float64-j0
FAIL: math/test-float64-j1
FAIL: math/test-float64-log
FAIL: math/test-ldouble-fma
FAIL: math/test-ldouble-j0
FAIL: math/test-ldouble-j1
FAIL: math/test-ldouble-log
Comment 14 Jakub Jelinek 2024-05-21 09:18:05 UTC
GCC 13.3 is being released, retargeting bugs to GCC 13.4.
Comment 15 Andreas K. Huettel 2024-07-11 19:31:12 UTC
(In reply to matoro from comment #13)
> Current state of this has expanded to more of the math tests.
> 
> FAIL: math/test-double-fma
> FAIL: math/test-double-j0
> FAIL: math/test-double-j1
> FAIL: math/test-double-ldouble-fma
> FAIL: math/test-double-log
> FAIL: math/test-float32x-float64-fma
> FAIL: math/test-float32x-fma
> FAIL: math/test-float32x-j0
> FAIL: math/test-float32x-j1
> FAIL: math/test-float32x-log
> FAIL: math/test-float64-fma
> FAIL: math/test-float64-j0
> FAIL: math/test-float64-j1
> FAIL: math/test-float64-log
> FAIL: math/test-ldouble-fma
> FAIL: math/test-ldouble-j0
> FAIL: math/test-ldouble-j1
> FAIL: math/test-ldouble-log

I think the log, j0, j1 failures have a different origin (the glibc testsuite needed an update of the ulps file). 

See the current results at:
https://sourceware.org/glibc/wiki/Release/2.40#HPPA
Comment 16 John David Anglin 2024-07-11 21:09:02 UTC
Correct.  I recently did a couple of updates to the test ulps and now
only the fma tests fail when building glibc with PA 1.1 code.  I don't
know about PA 2.0.

I noticed that some RISC-V processors have problems with underflow.  I
adapted an fdiv test to test the glibc fma function:

dave@atlas:~$ cat fma-repro.c
#include <math.h>
#include <fenv.h>
#include <stdio.h>

int main(void) {
  if (fesetround (FE_DOWNWARD)) {
    printf("ERROR: Failed to set rounding mode!\n");
    return 1;
  }
  fma(-0x7.ffffffffffffp-1024, 0x8.0000000000008p-4, -0x4p-1076);
  if(fetestexcept (FE_UNDERFLOW)) {
    printf("Failure: Exception Underflow is set!\n");
  } else {
    printf("Success: Exception Underflow is not set!!!\n");
  }
}

dave@atlas:~$ gcc fma-repro.c  -lm -std=c2x -o fma-repro -fno-builtin
dave@atlas:~$ ./fma-repro
Failure: Exception Underflow is set!

The -fno-builtin option is needed to ensure the fma call isn't optimized away.
The test passes if the call is optimized away.
Comment 17 Richard Biener 2024-10-14 07:20:06 UTC
Any update?  What's the current state?
Comment 18 John David Anglin 2024-10-14 14:29:31 UTC
As of my last build on Oct. 9, the following tests still fail:
FAIL: math/test-double-fma
FAIL: math/test-double-ldouble-fma
FAIL: math/test-float32x-float64-fma
FAIL: math/test-float32x-fma
FAIL: math/test-float64-fma
FAIL: math/test-ldouble-fma
Comment 19 Richard Biener 2025-02-03 12:27:48 UTC
Note that the testcase in comment #16 can hardly be miscompiled by VRP; are you sure glibc doesn't set spurious flags in fma() on hppa?
Comment 20 John David Anglin 2025-02-03 18:45:36 UTC
I will investigate further, but it's not the testcase that is miscompiled; it's glibc's fma/fmal.
Comment 21 John David Anglin 2025-02-05 16:24:53 UTC
For the testcase in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111709#c16,
the underflow bit is generated in this glibc code in s_fma.c:

          /* v.ieee.mantissa1 & 2 is LSB bit of the result before rounding,
             v.ieee.mantissa1 & 1 is the round bit and j is our sticky
             bit.  */
          w.d = 0.0;
          w.ieee.mantissa1 = ((v.ieee.mantissa1 & 3) << 1) | j;
          w.ieee.negative = v.ieee.negative;
          v.ieee.mantissa1 &= ~3U;
          v.d *= 0x1p-108;
          w.d *= 0x1p-2;
          return v.d + w.d;
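The snippet relies on the FPU to do the final rounding. A sketch of that trick, assuming IEEE 754 binary64 doubles that share byte order with uint64_t and no flush-to-zero mode; round_via_hardware is a hypothetical helper that works on raw bits instead of glibc's ieee754_double union and covers only the positive case. The three low bits of w hold the LSB, round and sticky bits, so w is a tiny subnormal; multiplying by 0x1p-2 shifts the round and sticky bits out, and the FPU's rounding in the current mode decides whether the kept bit is incremented.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: build the positive subnormal whose significand is
   (lsb << 2) | (rnd << 1) | sticky, as the glibc code does with
   w.ieee.mantissa1, multiply it by 0x1p-2 in the current rounding mode,
   and return the significand of the product.  */
static uint64_t round_via_hardware (unsigned lsb, unsigned rnd,
                                    unsigned sticky)
{
  assert (lsb <= 1 && rnd <= 1 && sticky <= 1);
  uint64_t bits = ((uint64_t) lsb << 2) | ((uint64_t) rnd << 1) | sticky;
  double w;
  memcpy (&w, &bits, sizeof w);      /* value is bits * 2^-1074 */
  volatile double quarter = 0x1p-2;  /* keep the multiply at run time */
  double r = w * quarter;            /* rounds bits / 4 to an integer */
  uint64_t out;
  memcpy (&out, &r, sizeof out);
  return out;
}
```

In the default round-to-nearest mode, round_via_hardware (0, 1, 0) is 0 (a tie rounds to even), (1, 1, 0) is 2, (0, 1, 1) is 1 and (1, 0, 1) is 1. Each of these four products is inexact and tiny, so each also raises the underflow flag by default.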

Before the multiplication, w is

$4 = {d = -3.4584595208887258e-323, ieee = {negative = 1, exponent = 0,
    mantissa0 = 0, mantissa1 = 7}, ieee_nan = {negative = 1, exponent = 0,
    quiet_nan = 0, mantissa0 = 0, mantissa1 = 7}}

After the multiplication by 0x1p-2 (0.25), w is

$3 = {d = -4.9406564584124654e-324, ieee = {negative = 1, exponent = 0,
    mantissa0 = 0, mantissa1 = 1}, ieee_nan = {negative = 1, exponent = 0,
    quiet_nan = 0, mantissa0 = 0, mantissa1 = 1}}

and the inexact and underflow bits are set on hppa.  The result looks
reasonable to me.

I'm not sure about IEEE 754, but different values can be returned when
a non-trapping underflow occurs, so the above code seems suspect.

Maybe underflow and inexact should be cleared after w.d is multiplied
by 0x1p-2?

Given the difference between gcc-12 and gcc-13, I'm not sure the return
paths are the same.
Comment 22 John David Anglin 2025-02-05 16:37:28 UTC
Actually, it appears the multiplication by 0.25 can be avoided by
setting w directly.
Comment 23 Joseph S. Myers 2025-02-05 16:39:52 UTC
hppa is an after-rounding architecture and this test is only meant to produce underflow on before-rounding architectures. You should investigate why the code in question is entered at all. I'd have expected

          /* If the exponent would be in the normal range when
             rounding to normal precision with unbounded exponent
             range, the exact result is known and spurious underflows
             must be avoided on systems detecting tininess after
             rounding.  */
          if (TININESS_AFTER_ROUNDING)
            {
              w.d = a1 + u.d;
              if (w.ieee.exponent == 109)
                return w.d * 0x1p-108;
            }

to have dealt with this case.
Comment 24 Joseph S. Myers 2025-02-05 16:42:42 UTC
See my previous comment about possible code movement / need for more usage of math_opt_barrier. Maybe the a1 + u.d computation got moved before the rounding mode was restored, or something like that?
Comment 25 John David Anglin 2025-02-05 18:35:15 UTC
The unions in this code have been completely optimized away, making
this code very difficult to debug.  I worry that there is a disconnect
between the floating and integer values in the unions.  Float computations
may need to be forced to complete to ensure the union values are
updated.  I had to do this to determine where the underflow flag was
being set.

But maybe you are right about needing a barrier after the rounding
mode change.
Comment 26 John David Anglin 2025-02-05 21:13:44 UTC
(In reply to Joseph S. Myers from comment #23)
> hppa is an after-rounding architecture and this test is only meant to
> produce underflow on before-rounding architectures. You should investigate
> why the code in question is entered at all. I'd have expected
> 
>           /* If the exponent would be in the normal range when
>              rounding to normal precision with unbounded exponent
>              range, the exact result is known and spurious underflows
>              must be avoided on systems detecting tininess after
>              rounding.  */
>           if (TININESS_AFTER_ROUNDING)
>             {
>               w.d = a1 + u.d;
>               if (w.ieee.exponent == 109)
>                 return w.d * 0x1p-108;
>             }

The return in the above hunk isn't taken because the exponent calculated
for "a1 + u.d" is 108.

Note: The "w.d = a1 + u.d;" line seems redundant as "a1 + u.d" is
previously calculated in this hunk:

  if (__glibc_unlikely (adjust < 0))
    {
      if ((u.ieee.mantissa1 & 1) == 0)
        u.ieee.mantissa1 |= libc_fetestexcept (FE_INEXACT) != 0;
      v.d = a1 + u.d;
      /* Ensure the addition is not scheduled after fetestexcept call.  */
      math_force_eval (v.d);
    }

Here, adjust = -1.

Should we just return "v.d * 0x1p-108" when TININESS_AFTER_ROUNDING is true?
Comment 27 Joseph S. Myers 2025-02-05 21:17:56 UTC
That's not redundant, the previous calculation is in FE_TOWARDZERO mode, before the call to libc_feupdateenv_test. But maybe that call needs to be followed by "a1 = math_opt_barrier (a1);" or similar to ensure the value doesn't get wrongly reused in the new rounding mode. With a value calculated in the correct rounding mode, the exponent should be 109, not 108.
Comment 28 John David Anglin 2025-02-06 15:41:53 UTC
Created attachment 60399
Patch

Fixes fma testcases on hppa.

This is a glibc bug.
Comment 29 John David Anglin 2025-02-06 15:45:23 UTC
Created attachment 60400
Patch

Sorry, this replaces previous patch version.