100927 – [sse2] floating point to integer conversion functions incorrect results w/ NaN constants + optimization

Status:	UNCONFIRMED

Alias:	None

Product:	gcc
Classification:	Unclassified
Component:	target
Version:	11.1.1

Importance:	P3 normal
Target Milestone:	---
Assignee:	Not yet assigned to anyone

URL:
Keywords:	wrong-code

Depends on:
Blocks:	115115

Reported:	2021-06-05 18:40 UTC by Evan Nemerson
Modified:	2024-06-05 04:09 UTC
CC List:	2 users

See Also:	115115
Host:
Target:	x86_64-*-* i?86-*-*
Build:
Known to work:
Known to fail:
Last reconfirmed:

Description Evan Nemerson 2021-06-05 18:40:27 UTC

_mm_cvttpd_epi32, _mm_cvttpd_pi32, _mm_cvttps_epi32, and _mm_cvttsd_si32 are supposed to return INT32_MIN for NaN inputs.  However, when compiled with optimization on GCC, NaN inputs whose values are known at compile time produce 0 in the output instead.

Here is a quick test case, using _mm_cvttpd_epi32:


#include <emmintrin.h> /* SSE2 intrinsics, including _mm_cvttpd_epi32 */
#include <stdint.h>    /* int32_t */
#include <stdio.h>

int main(void) {
  static const double values[] = {
    __builtin_nan(""), -__builtin_nan("")
  };
  int32_t res[4];

  _mm_storeu_si128((__m128i*) res, _mm_cvttpd_epi32(_mm_loadu_pd(values)));

  for (int i = 0 ; i < 4 ; i++) {
    printf("%d\n", res[i]);
  }

  return 0;
}


Compile with `gcc -O1 -o test test.c` and you get all zeros; compile with `gcc -O0 -o test test.c` and the first two elements of the result are INT32_MIN, as they should be.  Changing the const to volatile (and adding -Wno-discarded-qualifiers) "fixes" the issue.
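
As a minimal sketch of the volatile workaround mentioned above, the helper below reads the NaN inputs through a volatile array so that the compiler cannot treat them as compile-time constants. The function name `cvtt_nan_pair` and the copy into a plain temporary (which avoids the discarded-qualifier warning) are assumptions made for illustration; they are not part of the original test case.

```c
#include <emmintrin.h>  /* SSE2: _mm_cvttpd_epi32, _mm_loadu_pd, _mm_storeu_si128 */
#include <stdint.h>

/* Hypothetical helper: converts two NaNs whose values are hidden from the
   optimizer and writes the four 32-bit result lanes to out. */
void cvtt_nan_pair(int32_t out[4]) {
  /* Reads of a volatile object must happen at run time, so the inputs
     are not compile-time constants and the conversion is not folded. */
  volatile double hidden[2] = { __builtin_nan(""), -__builtin_nan("") };
  double values[2] = { hidden[0], hidden[1] };

  _mm_storeu_si128((__m128i *) out, _mm_cvttpd_epi32(_mm_loadu_pd(values)));
}
```

Called from the loop in the test case, this should give INT32_MIN in the first two lanes at any optimization level, because the CVTTPD2DQ instruction is actually executed; the upper two lanes are zero because the instruction clears them.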

Comment 1 Michael Crusoe 2023-02-12 09:15:22 UTC

2023 update: this is still happening in GCC 10.1+ including trunk

https://godbolt.org/z/YKKcdP8MY

Comment 2 Andrew Pinski 2023-02-12 09:20:19 UTC

Are you sure _mm_cvttpd_epi32 is documented that way? I suspect it is just unspecified behavior.

Comment 3 Michael Crusoe 2023-02-12 11:42:06 UTC

Good question, let's check the reference.

Summary: it is specified behavior that _mm_cvttpd_epi32 returns Integer Indefinite (80000000H) for NaN inputs.

All references below are from the December 2022 edition (Order Number: 325462-078US) of "Intel® 64 and IA-32 Architectures Software Developer’s Manual Combined Volumes: 1, 2A, 2B, 2C, 2D, 3A, 3B, 3C, 3D, and 4" from https://www.intel.com/content/www/us/en/developer/articles/technical/intel-sdm.html

The formal signature of the _mm_cvttpd_epi32 intrinsic is in Table C-1 "Simple Intrinsics" on page 2987, reminding us that the mnemonic is CVTTPD2DQ.

The formal definition of CVTTPD2DQ is given in section 5.6.1.6 "Intel® SSE2 Conversion Instructions" on page 133:

> Convert with truncation packed double precision floating-point values to packed doubleword integers.

On page 106 we learn more about what truncation means in the definition of CVTTPD2DQ:

> 4.8.4.2 Truncation with Intel® SSE, SSE2, and AVX Conversion Instructions
> The following Intel SSE/SSE2 instructions automatically truncate the results of
> conversions from floating-point values to integers when the result is inexact: CVTTPD2DQ,
> CVTTPS2DQ, CVTTPD2PI, CVTTPS2PI, CVTTSD2SI, and CVTTSS2SI. Here, truncation means the
> round toward zero mode described in Table 4-8. There are also several Intel AVX2 and
> AVX-512 instructions which use truncation (VCVTT*).

Table 4-8 from section 4.8.4 states

> Rounding Mode: Round toward zero (Truncate)
> Description: Rounded result is closest to but no greater in absolute value than the infinitely precise result.

Section 11.4.1.6 ("SSE2 Conversion Instructions") states that

> The CVTTPD2DQ (convert with truncation packed double precision floating-point values to
> packed doubleword integers) instruction is similar to the CVTPD2DQ instruction except
> that truncation is used to round a source value to an integer value.

Table 11-1. "Masked Responses of SSE/SSE2/SSE3 Instructions to Invalid Arithmetic Operations" states that

> Condition: Conversion to integer when the value in the source register is a NaN, ∞, or
> exceeds the representable range for CVTPS2PI, CVTTPS2PI, CVTSS2SI, CVTTSS2SI, CVTPD2PI,
> CVTSD2SI, CVTPD2DQ, CVTTPD2PI, CVTTSD2SI, CVTTPD2DQ, CVTPS2DQ, or CVTTPS2DQ

> Masked Response: Return the integer Indefinite

This is stated more explicitly in section D.4.2.2 "Results of Operations with NaN Operands or a NaN Result for SSE/SSE2/SSE3 Numeric Instructions", where Table D-8 (page 455) ("CVTPS2PI, CVTSS2SI, CVTTPS2PI, CVTTSS2SI, CVTPD2PI, CVTSD2SI, CVTTPD2PI, CVTTSD2SI, CVTPS2DQ, CVTTPS2DQ, CVTPD2DQ, CVTTPD2DQ") states that the masked result from any type of NaN (SNaN or QNaN) will be the Integer Indefinite (80000000H for 32-bit values).
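
The masked responses quoted above can be summarized in a small software model of one 32-bit lane. This is only a sketch of the documented behavior, not code from Intel or GCC; the function name `cvtt_f64_to_i32_model` is hypothetical, and the range bounds are written out under the assumption that "exceeds the representable range" refers to the truncated result.

```c
#include <math.h>    /* isnan */
#include <stdint.h>

/* Hypothetical model of one lane of a truncating double -> int32
   conversion with the invalid-operation exception masked. */
int32_t cvtt_f64_to_i32_model(double x) {
  /* NaN, infinities, and values whose truncation does not fit in
     int32_t all yield the Integer Indefinite, 80000000H. */
  if (isnan(x) || x >= 2147483648.0 || x <= -2147483649.0)
    return INT32_MIN;

  /* In range: a C cast truncates toward zero, matching Table 4-8. */
  return (int32_t) x;
}
```

For example, the model maps 1.9 to 1 and -1.9 to -1, while NaN, either infinity, and 3e9 all map to INT32_MIN, which has the bit pattern 80000000H.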

Comment 4 Hongtao.liu 2023-02-13 06:53:52 UTC

The intrinsic is expanded to RTL FIX, which is then constant-folded to 0 for NaNs.

      /* Although the overflow semantics of RTL's FIX and UNSIGNED_FIX
         operators are intentionally left unspecified (to ease implementation
         by target backends), for consistency, this routine implements the
         same semantics for constant folding as used by the middle-end.  */

      /* This was formerly used only for non-IEEE float.
         eggert@twinsun.com says it is safe for IEEE also.  */
      REAL_VALUE_TYPE t;
      const REAL_VALUE_TYPE *x = CONST_DOUBLE_REAL_VALUE (op);
      wide_int wmax, wmin;
      /* This is part of the abi to real_to_integer, but we check
         things before making this call.  */
      bool fail;

      switch (code)
        {
        case FIX:
          if (REAL_VALUE_ISNAN (*x))
            return const0_rtx;

According to IEEE 754-2019, when a NaN or infinite operand cannot be represented in the destination format and this cannot otherwise be indicated, the invalid operation exception shall be signaled.
The comments also say "for consistency, this routine implements the same semantics for constant folding as used by the middle-end." and "This was formerly used only for non-IEEE float."

Maybe we should prevent this.

Comment 5 GCC Commits 2024-06-05 04:09:07 UTC

The master branch has been updated by hongtao Liu <liuhongt@gcc.gnu.org>:

https://gcc.gnu.org/g:b05288d1f1e4b632eddf8830b4369d4659f6c2ff

commit r15-1022-gb05288d1f1e4b632eddf8830b4369d4659f6c2ff
Author: liuhongt <hongtao.liu@intel.com>
Date:   Tue May 21 16:57:17 2024 +0800

    Don't simplify NAN/INF or out-of-range constant for FIX/UNSIGNED_FIX.
    
    According to IEEE standard, for conversions from floating point to
    integer. When a NaN or infinite operand cannot be represented in the
    destination format and this cannot otherwise be indicated, the invalid
    operation exception shall be signaled. When a numeric operand would
    convert to an integer outside the range of the destination format, the
    invalid operation exception shall be signaled if this situation cannot
    otherwise be indicated.
    
    The patch prevents simplification of the conversion from floating point
    to integer for NAN/INF/out-of-range constants when flag_trapping_math.
    
    gcc/ChangeLog:
    
            PR rtl-optimization/100927
            PR rtl-optimization/115161
            PR rtl-optimization/115115
            * simplify-rtx.cc (simplify_const_unary_operation): Prevent
            simplication of FIX/UNSIGNED_FIX for NAN/INF/out-of-range
            constant when flag_trapping_math.
            * fold-const.cc (fold_convert_const_int_from_real): Don't fold
            for overflow value when_trapping_math.
    
    gcc/testsuite/ChangeLog:
    
            * gcc.dg/pr100927.c: New test.
            * c-c++-common/Wconversion-1.c: Add -fno-trapping-math.
            * c-c++-common/dfp/convert-int-saturate.c: Ditto.
            * g++.dg/ubsan/pr63956.C: Ditto.
            * g++.dg/warn/Wconversion-real-integer.C: Ditto.
            * gcc.c-torture/execute/20031003-1.c: Ditto.
            * gcc.dg/Wconversion-complex-c99.c: Ditto.
            * gcc.dg/Wconversion-real-integer.c: Ditto.
            * gcc.dg/c90-const-expr-11.c: Ditto.
            * gcc.dg/overflow-warn-8.c: Ditto.
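
The rule described in the commit message can be sketched as a standalone predicate. This is an illustration only and not the code added to simplify-rtx.cc or fold-const.cc; the name `may_fold_fix_to_i32` and the `trapping_math` parameter (standing in for flag_trapping_math) are hypothetical, and the sketch covers only a signed 32-bit destination.

```c
#include <math.h>     /* isnan, isinf */
#include <stdbool.h>

/* Hypothetical predicate: may a constant double -> int32 conversion be
   folded at compile time under the rule from the commit message? */
bool may_fold_fix_to_i32(double x, bool trapping_math) {
  /* Out of range means the truncated value does not fit in int32_t. */
  bool out_of_range = x >= 2147483648.0 || x <= -2147483649.0;

  /* With trapping math, NaN/Inf/out-of-range constants are left alone
     so the conversion instruction runs and can signal invalid. */
  if (trapping_math && (isnan(x) || isinf(x) || out_of_range))
    return false;

  return true;
}
```

Under this sketch, a NaN constant is left for the hardware when `trapping_math` is true, which corresponds to the test-suite changes above that add -fno-trapping-math to tests that still expect folding.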