85811 – Invalid optimization with fmax, fabs and nan

Bug 85811 - Invalid optimization with fmax, fabs and nan

Summary: Invalid optimization with fmax, fabs and nan

Status:	RESOLVED FIXED

Alias:	None

Product:	gcc
Classification:	Unclassified
Component:	middle-end (show other bugs)
Version:	8.1.0

Importance:	P3 normal
Target Milestone:	---
Assignee:	Roger Sayle

URL:
Keywords:	wrong-code

Depends on:
Blocks:

Reported:	2018-05-16 21:18 UTC by Matt Peddie
Modified:	2022-02-05 17:55 UTC (History)
CC List:	3 users (show)

See Also:
Host:
Target:
Build:
Known to work:	11.0
Known to fail:
Last reconfirmed:	2018-05-17 00:00:00

Attachments
Preprocessed source file (10.23 KB, text/plain) 2018-05-16 21:18 UTC, Matt Peddie	Details
View All Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description Matt Peddie 2018-05-16 21:18:23 UTC

Created attachment 44139 [details]
Preprocessed source file

I apologize in advance if I've chosen the wrong category for this bug; I have to choose a category and don't know what's correct.

The output from my small test program varies depending on whether I enable optimizations with -O.  The un-preprocessed program is the following (preprocessed output attached):

    #include <math.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(int argc __attribute__((unused)), char **argv __attribute__((unused))) {
      const double negval = -1.0;
      const double nanval = 0.0 / 0.0;
      const double val = fmax(negval, nanval);
      const double absval = fabs(val);
      printf("fabs(%.16e) = %.16e\n", val, absval);
      return absval >= 0 ? 0 : 1;
    }

With optimizations enabled, this prints

    fabs(-1.0000000000000000e+00) = -1.0000000000000000e+00

and returns 1.  This result is wrong (the absolute value must be positive).  Without optimizations, it prints

    fabs(1.0000000000000000e+00) = 1.0000000000000000e+00

and returns 0 as expected.  If I make the arguments to fmax() depend on inputs, for example the value of argc, the program behaves correctly.  If I call fmin() instead of fmax(), the program behaves correctly.  If neither of the arguments to fmax() is NaN, the program behaves correctly; the correct behavior happens if one argument is -NaN, INFINITY or -INFINITY.  The order of arguments to fmax() does not affect the result of the program.

If I look at the generated assembly, substituting fmin() for fmax() in the test program results in an andpd instruction immediately after fmin() that isn't emitted when fmax() is called.

The attached file test.i is the preprocessed source file.  The test program compiles without errors or warnings and never triggers the undefined-behavior sanitizer.  I've observed the same problem with gcc version 7.3.0 and 6.4.0 for x86_64 as well as gcc version 6.3.0 for arm32.  I don't see the problem with gcc version 5.5.0 for x86_64.  Below is the command-line invocation of gcc along with its complete output.




gcc-8 -v -save-temps -O -Wall -Wextra -Werror -fsanitize=undefined -fno-strict-aliasing -fwrapv -fno-aggressive-loop-optimizations -lm test.c -o test

Using built-in specs.
COLLECT_GCC=gcc-8
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/8/lto-wrapper
OFFLOAD_TARGET_NAMES=nvptx-none
OFFLOAD_TARGET_DEFAULT=1
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Debian 8.1.0-3' --with-bugurl=file:///usr/share/doc/gcc-8/README.Bugs --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++ --prefix=/usr --with-gcc-major-version-only --program-suffix=-8 --program-prefix=x86_64-linux-gnu- --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --libdir=/usr/lib --enable-nls --with-sysroot=/ --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new --enable-gnu-unique-object --disable-vtable-verify --enable-libmpx --enable-plugin --enable-default-pie --with-system-zlib --with-target-system-zlib --enable-objc-gc=auto --enable-multiarch --disable-werror --with-arch-32=i686 --with-abi=m64 --with-multilib-list=m32,m64,mx32 --enable-multilib --with-tune=generic --enable-offload-targets=nvptx-none --without-cuda-driver --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu
Thread model: posix
gcc version 8.1.0 (Debian 8.1.0-3)
COLLECT_GCC_OPTIONS='-v' '-save-temps' '-O' '-Wall' '-Wextra' '-Werror' '-fsanitize=undefined' '-fno-strict-aliasing' '-fwrapv' '-fno-aggressive-loop-optimizations' '-o' 'test' '-mtune=generic' '-march=x86-64'
 /usr/lib/gcc/x86_64-linux-gnu/8/cc1 -E -quiet -v -imultiarch x86_64-linux-gnu test.c -mtune=generic -march=x86-64 -Wall -Wextra -Werror -fsanitize=undefined -fno-strict-aliasing -fwrapv -fno-aggressive-loop-optimizations -O -fpch-preprocess -o test.i
ignoring nonexistent directory "/usr/local/include/x86_64-linux-gnu"
ignoring nonexistent directory "/usr/lib/gcc/x86_64-linux-gnu/8/../../../../x86_64-linux-gnu/include"
#include "..." search starts here:
#include <...> search starts here:
 /usr/lib/gcc/x86_64-linux-gnu/8/include
 /usr/local/include
 /usr/lib/gcc/x86_64-linux-gnu/8/include-fixed
 /usr/include/x86_64-linux-gnu
 /usr/include
End of search list.
COLLECT_GCC_OPTIONS='-v' '-save-temps' '-O' '-Wall' '-Wextra' '-Werror' '-fsanitize=undefined' '-fno-strict-aliasing' '-fwrapv' '-fno-aggressive-loop-optimizations' '-o' 'test' '-mtune=generic' '-march=x86-64'
 /usr/lib/gcc/x86_64-linux-gnu/8/cc1 -fpreprocessed test.i -quiet -dumpbase test.c -mtune=generic -march=x86-64 -auxbase test -O -Wall -Wextra -Werror -version -fsanitize=undefined -fno-strict-aliasing -fwrapv -fno-aggressive-loop-optimizations -o test.s
GNU C17 (Debian 8.1.0-3) version 8.1.0 (x86_64-linux-gnu)
	compiled by GNU C version 8.1.0, GMP version 6.1.2, MPFR version 4.0.1, MPC version 1.1.0, isl version isl-0.19-GMP

GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072
GNU C17 (Debian 8.1.0-3) version 8.1.0 (x86_64-linux-gnu)
	compiled by GNU C version 8.1.0, GMP version 6.1.2, MPFR version 4.0.1, MPC version 1.1.0, isl version isl-0.19-GMP

GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072
Compiler executable checksum: bbc65a5a9118b9b79402871f4ead4543
COLLECT_GCC_OPTIONS='-v' '-save-temps' '-O' '-Wall' '-Wextra' '-Werror' '-fsanitize=undefined' '-fno-strict-aliasing' '-fwrapv' '-fno-aggressive-loop-optimizations' '-o' 'test' '-mtune=generic' '-march=x86-64'
 as -v --64 -o test.o test.s
GNU assembler version 2.30 (x86_64-linux-gnu) using BFD version (GNU Binutils for Debian) 2.30
COMPILER_PATH=/usr/lib/gcc/x86_64-linux-gnu/8/:/usr/lib/gcc/x86_64-linux-gnu/8/:/usr/lib/gcc/x86_64-linux-gnu/:/usr/lib/gcc/x86_64-linux-gnu/8/:/usr/lib/gcc/x86_64-linux-gnu/
LIBRARY_PATH=/usr/lib/gcc/x86_64-linux-gnu/8/:/usr/lib/gcc/x86_64-linux-gnu/8/../../../x86_64-linux-gnu/:/usr/lib/gcc/x86_64-linux-gnu/8/../../../../lib/:/lib/x86_64-linux-gnu/:/lib/../lib/:/usr/lib/x86_64-linux-gnu/:/usr/lib/../lib/:/usr/lib/gcc/x86_64-linux-gnu/8/../../../:/lib/:/usr/lib/
COLLECT_GCC_OPTIONS='-v' '-save-temps' '-O' '-Wall' '-Wextra' '-Werror' '-fsanitize=undefined' '-fno-strict-aliasing' '-fwrapv' '-fno-aggressive-loop-optimizations' '-o' 'test' '-mtune=generic' '-march=x86-64'
 /usr/lib/gcc/x86_64-linux-gnu/8/collect2 -plugin /usr/lib/gcc/x86_64-linux-gnu/8/liblto_plugin.so -plugin-opt=/usr/lib/gcc/x86_64-linux-gnu/8/lto-wrapper -plugin-opt=-fresolution=test.res -plugin-opt=-pass-through=-lgcc -plugin-opt=-pass-through=-lgcc_s -plugin-opt=-pass-through=-lc -plugin-opt=-pass-through=-lgcc -plugin-opt=-pass-through=-lgcc_s --sysroot=/ --build-id --eh-frame-hdr -m elf_x86_64 --hash-style=gnu -dynamic-linker /lib64/ld-linux-x86-64.so.2 -pie -o test /usr/lib/gcc/x86_64-linux-gnu/8/../../../x86_64-linux-gnu/Scrt1.o /usr/lib/gcc/x86_64-linux-gnu/8/../../../x86_64-linux-gnu/crti.o /usr/lib/gcc/x86_64-linux-gnu/8/crtbeginS.o -L/usr/lib/gcc/x86_64-linux-gnu/8 -L/usr/lib/gcc/x86_64-linux-gnu/8/../../../x86_64-linux-gnu -L/usr/lib/gcc/x86_64-linux-gnu/8/../../../../lib -L/lib/x86_64-linux-gnu -L/lib/../lib -L/usr/lib/x86_64-linux-gnu -L/usr/lib/../lib -L/usr/lib/gcc/x86_64-linux-gnu/8/../../.. -lm test.o -lubsan -lgcc --push-state --as-needed -lgcc_s --pop-state -lc -lgcc --push-state --as-needed -lgcc_s --pop-state /usr/lib/gcc/x86_64-linux-gnu/8/crtendS.o /usr/lib/gcc/x86_64-linux-gnu/8/../../../x86_64-linux-gnu/crtn.o
COLLECT_GCC_OPTIONS='-v' '-save-temps' '-O' '-Wall' '-Wextra' '-Werror' '-fsanitize=undefined' '-fno-strict-aliasing' '-fwrapv' '-fno-aggressive-loop-optimizations' '-o' 'test' '-mtune=generic' '-march=x86-64'

Comment 1 Marc Glisse 2018-05-16 21:37:27 UTC

tree_binary_nonnegative_warnv_p for RDIV_EXPR does RECURSE (op0) && RECURSE (op1), but that doesn't work so well when the denominator can be 0. I guess it is still ok when finite-math-only (or no-nans and no-signed-zeros maybe?), or when we can prove that the denominator is non-zero.

Comment 2 Andrew Pinski 2018-05-16 21:46:30 UTC

Applying pattern match.pd:832, gimple-match.c:11245
Match-and-simplified ABS_EXPR <val_5> to val_5

Comment 3 Andrew Pinski 2018-05-16 21:47:57 UTC

(In reply to Andrew Pinski from comment #2)
> Applying pattern match.pd:832, gimple-match.c:11245
> Match-and-simplified ABS_EXPR <val_5> to val_5

Which is:
(simplify
 (abs tree_expr_nonnegative_p@0)
 @0)

Comment 4 Andrew Pinski 2018-05-16 21:53:37 UTC

    case MAX_EXPR:
      return RECURSE (op0) || RECURSE (op1);

This is not true if one is a NAN.

Comment 5 Andrew Pinski 2018-05-16 21:56:04 UTC

(In reply to Andrew Pinski from comment #4)
>     case MAX_EXPR:
>       return RECURSE (op0) || RECURSE (op1);
> 
> This is not true if one is a NAN.

And the reason why it is not true for NAN is simple:
If one of the two arguments is NaN, the value of the other argument is returned

Comment 6 Marc Glisse 2018-05-16 22:50:33 UTC

What does tree_expr_nonnegative_p call non-negative? A natural definition would exclude NaN, but for REAL_CST we just return ! REAL_VALUE_NEGATIVE.

Comment 7 Richard Biener 2018-05-17 07:16:58 UTC

(In reply to Marc Glisse from comment #6)
> What does tree_expr_nonnegative_p call non-negative? A natural definition
> would exclude NaN, but for REAL_CST we just return ! REAL_VALUE_NEGATIVE.

Is there sth like -NaN?!

Anyway, the bug is clearly in tree_expr_nonnegative_p and siblings.  Btw, tree_binary_nonzero_warnv_p looks similarly suspicious.

I guess first of all

diff --git a/gcc/fold-const.c b/gcc/fold-const.c
index faa184a2bbd..cb3de26f9f1 100644
--- a/gcc/fold-const.c
+++ b/gcc/fold-const.c
@@ -12949,7 +12949,8 @@ tree_single_nonnegative_warnv_p (tree t, bool *strict_overflow_p, int depth)
       return tree_int_cst_sgn (t) >= 0;
 
     case REAL_CST:
-      return ! REAL_VALUE_NEGATIVE (TREE_REAL_CST (t));
+      return (! REAL_VALUE_ISNAN (TREE_REAL_CST (t))
+             && ! REAL_VALUE_NEGATIVE (TREE_REAL_CST (t)));
 
     case FIXED_CST:
       return ! FIXED_VALUE_NEGATIVE (TREE_FIXED_CST (t));

but then how do we want to deal with MAX_EXPR (and MIN_EXPR which after the
above could see an optimization for min (NaN, x)?).  We'd have to special
case real-type MAX_EXPR (what about real complex/vector?  tree_single_nonnegative_warnv_p doesn't handle complex/vector constants
but at least we should be correct in the uplevel treatment).

Comment 8 Andreas Schwab 2018-05-17 08:33:01 UTC

> Is there sth like -NaN?!

signbit can tell you.

Comment 9 rguenther@suse.de 2018-05-17 08:47:38 UTC

On Thu, 17 May 2018, schwab@linux-m68k.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85811
> 
> --- Comment #8 from Andreas Schwab <schwab@linux-m68k.org> ---
> > Is there sth like -NaN?!
> 
> signbit can tell you.

True.  I guess that also asks for the guarantee on the operand
for tree_single_nonnegative_warnv_p returning true.  Does it
guarantee that OP >= 0 compares as true?  Or merely !(OP < 0)?
So does it make sense for the function to return true for +NaN
or should we err on the side of caution and not do that?

Comment 10 GCC Commits 2020-11-18 22:10:51 UTC

The master branch has been updated by Jeff Law <law@gcc.gnu.org>:

https://gcc.gnu.org/g:1be4878116a2be82552bd59c3c1c9adcac3d106b

commit r11-5152-g1be4878116a2be82552bd59c3c1c9adcac3d106b
Author: Roger Sayle <roger@nextmovesoftware.com>
Date:   Wed Nov 18 15:09:46 2020 -0700

    Fix middle-end/85811: Introduce tree_expr_maybe_non_p et al.
    
    The motivation for this patch is PR middle-end/85811, a wrong-code
    regression entitled "Invalid optimization with fmax, fabs and nan".
    The optimization involves assuming max(x,y) is non-negative if (say)
    y is non-negative, i.e. max(x,2.0).  Unfortunately, this is an invalid
    assumption in the presence of NaNs.  Hence max(x,+qNaN), with IEEE fmax
    semantics will always return x even though the qNaN is non-negative.
    Worse, max(x,2.0) may return a negative value if x is -sNaN.
    
    I'll quote Joseph Myers (many thanks) who describes things clearly as:
    > (a) When both arguments are NaNs, the return value should be a qNaN,
    > but sometimes it is an sNaN if at least one argument is an sNaN.
    > (b) Under TS 18661-1 semantics, if either argument is an sNaN then the
    > result should be a qNaN (whereas if one argument is a qNaN and the
    > other is not a NaN, the result should be the non-NaN argument).
    > Various implementations treat sNaNs like qNaNs here.
    
    Under this logic, the tree_expr_nonnegative_p for IEEE fmax should be:
    
        CASE_CFN_FMAX:
        CASE_CFN_FMAX_FN:
          /* Usually RECURSE (arg0) || RECURSE (arg1) but NaNs complicate
             things.  In the presence of sNaNs, we're only guaranteed to be
             non-negative if both operands are non-negative.  In the presence
             of qNaNs, we're non-negative if either operand is non-negative
             and can't be a qNaN, or if both operands are non-negative.  */
          if (tree_expr_maybe_signaling_nan_p (arg0) ||
              tree_expr_maybe_signaling_nan_p (arg1))
            return RECURSE (arg0) && RECURSE (arg1);
          return RECURSE (arg0) ? (!tree_expr_maybe_nan_p (arg0)
                                  || RECURSE (arg1))
                                : (RECURSE (arg1)
                                  && !tree_expr_maybe_nan_p (arg1));
    
    Which indeed resolves the wrong code in the PR.  The infrastructure that
    makes this possible are the two new functions tree_expr_maybe_nan_p and
    tree_expr_maybe_signaling_nan_p which test whether a value may potentially
    be a NaN or a signaling NaN respectively.  In fact, this patch adds seven
    new predicates to the middle-end:
    
    bool tree_expr_finite_p (const_tree);
    bool tree_expr_infinite_p (const_tree);
    bool tree_expr_maybe_infinite_p (const_tree);
    bool tree_expr_signaling_nan_p (const_tree);
    bool tree_expr_maybe_signaling_nan_p (const_tree);
    bool tree_expr_nan_p (const_tree);
    bool tree_expr_maybe_nan_p (const_tree);
    
    These functions correspond to the "must" and "may" operators in modal logic,
    and allow us to triage expressions in the middle-end; definitely a NaN,
    definitely not a NaN, and unknown at compile-time, etc.  A prime example of
    the utility of these functions is that a IEEE floating point value promoted
    from an integer type can't be a NaN or infinite.  Hence (double)i+0.0 where
    i is an integer can be simplified to (double)i even with -fsignaling-nans.
    Currently in GCC optimizations are enabled/disabled based on whether the
    expression's type supports NaNs or sNaNs; with these new predicates they
    can be controlled by whether the actual operands may or may not be NaNs.
    
    Having added these extremely useful helper functions to the middle-end,
    I couldn't help by use then in a few places in fold-const.c, builtins.c
    and match.pd.  In the near term, these can/should be used in places
    where the tree optimizers test for HONOR_NANS, HONOR_INFINITIES or
    HONOR_SNANS, or explicitly test whether a REAL_CST is a NaN or Inf.
    In the longer term (I'm not volunteering) these predicates could perhaps
    be hooked into the middle-end's SSA chaining and/or VRP machinery,
    allowing finiteness to propagated around the CFG, much like we
    currently propagate value ranges.
    
    This patch has been tested on x86_64-pc-linux-gnu with a "make bootstrap"
    and "make -k check".
    Ok for mainline?
    
    2020-08-15  Roger Sayle  <roger@nextmovesoftware.com>
    
    gcc/ChangeLog
            PR middle-end/85811
            * fold-const.c (tree_expr_finite_p): New function to test whether
            a tree expression must be finite, i.e. not a FP NaN or infinity.
            (tree_expr_infinite_p):  New function to test whether a tree
            expression must be infinite, i.e. a FP infinity.
            (tree_expr_maybe_infinite_p): New function to test whether a tree
            expression may be infinite, i.e. a FP infinity.
            (tree_expr_signaling_nan_p): New function to test whether a tree
            expression must evaluate to a signaling NaN (sNaN).
            (tree_expr_maybe_signaling_nan_p): New function to test whether a
            tree expression may be a signaling NaN (sNaN).
            (tree_expr_nan_p): New function to test whether a tree expression
            must evaluate to a (quiet or signaling) NaN.
            (tree_expr_maybe_nan_p): New function to test whether a tree
            expression me be a (quiet or signaling) NaN.
    
            (tree_binary_nonnegative_warnv_p) [MAX_EXPR]: In the presence
            of NaNs, MAX_EXPR is only guaranteed to be non-negative, if both
            operands are non-negative.
            (tree_call_nonnegative_warnv_p) [CASE_CFN_FMAX,CASE_CFN_FMAX_FN]:
            In the presence of signaling NaNs, fmax is only guaranteed to be
            non-negative if both operands are negative.  In the presence of
            quiet NaNs, fmax is non-negative if either operand is non-negative
            and not a qNaN, or both operands are non-negative.
    
            * fold-const.h (tree_expr_finite_p, tree_expr_infinite_p,
            tree_expr_maybe_infinite_p, tree_expr_signaling_nan_p,
            tree_expr_maybe_signaling_nan_p, tree_expr_nan_p,
            tree_expr_maybe_nan_p): Prototype new functions here.
    
            * builtins.c (fold_builtin_classify) [BUILT_IN_ISINF]: Fold to
            a constant if argument is known to be (or not to be) an Infinity.
            [BUILT_IN_ISFINITE]: Fold to a constant if argument is known to
            be (or not to be) finite.
            [BUILT_IN_ISNAN]: Fold to a constant if argument is known to be
            (or not to be) a NaN.
            (fold_builtin_fpclassify): Check tree_expr_maybe_infinite_p and
            tree_expr_maybe_nan_p instead of HONOR_INFINITIES and HONOR_NANS
            respectively.
            (fold_builtin_unordered_cmp): Fold UNORDERED_EXPR to a constant
            when its arguments are known to be (or not be) NaNs.  Check
            tree_expr_maybe_nan_p instead of HONOR_NANS when choosing between
            unordered and regular forms of comparison operators.
    
            * match.pd (ordered(x,y)->true/false): Constant fold ORDERED_EXPR
            if its operands are known to be (or not to be) NaNs.
            (unordered(x,y)->true/false): Constant fold UNORDERED_EXPR if its
            operands are known to be (or not to be) NaNs.
            (sqrt(x)*sqrt(x)->x): Check tree_expr_maybe_signaling_nan_p instead
            of HONOR_SNANS.
    
    gcc/testsuite/ChangeLog
            PR middle-end/85811
            * gcc.dg/pr85811.c: New test.
            * gcc.dg/fold-isfinite-1.c: New test.
            * gcc.dg/fold-isfinite-2.c: New test.
            * gcc.dg/fold-isinf-1.c: New test.
            * gcc.dg/fold-isinf-2.c: New test.
            * gcc.dg/fold-isnan-1.c: New test.
            * gcc.dg/fold-isnan-2.c: New test.

Comment 11 Arseny Solokha 2020-12-02 05:32:21 UTC

Are there backports pending?

Comment 12 Richard Biener 2021-01-11 12:23:54 UTC

Possibly could be backported even if not a regression but I guess the wrong-code is really restricted to cases we don't hit in the wild.

That said, not objecting if anybody wants to backport to GCC 10.

Comment 13 Roger Sayle 2022-02-05 17:55:06 UTC

Fixed on mainline.  [Sorry I've only just noticed this hadn't been closed].