Bug 106165 - incorrect result when using inlined asm implementation of floor() on i686
Summary: incorrect result when using inlined asm implementation of floor() on i686
Status: RESOLVED DUPLICATE of bug 323
Alias: None
Product: gcc
Classification: Unclassified
Component: target (show other bugs)
Version: 11.1.0
: P3 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords: inline-asm, wrong-code
Depends on:
Blocks:
 
Reported: 2022-07-02 01:30 UTC by xeioex
Modified: 2022-10-14 09:47 UTC (History)
1 user (show)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description xeioex 2022-07-02 01:30:37 UTC
$ gcc-11 -v -save-temps -O2 minified_to_string_radix.i -o 507 -lm
Using built-in specs.                                                                                                                   
COLLECT_GCC=gcc-11                                                                                                                      
COLLECT_LTO_WRAPPER=/usr/lib/gcc/i686-linux-gnu/11/lto-wrapper                                                                          
Target: i686-linux-gnu                                                                                                                  
Configured with: ../src/configure -v --with-pkgversion='Ubuntu 11.1.0-1ubuntu1~18.04.1' --with-bugurl=file:///usr/share/doc/gcc-11/READM
E.Bugs --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++,m2 --prefix=/usr --with-gcc-major-version-only --program-suffix=-11 -
-program-prefix=i686-linux-gnu- --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threa
ds=posix --libdir=/usr/lib --enable-nls --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --with-default-libstdc
xx-abi=new --enable-gnu-unique-object --disable-vtable-verify --enable-plugin --enable-default-pie --with-system-zlib --enable-libphobos
-checking=release --with-target-system-zlib=auto --enable-objc-gc=auto --enable-targets=all --enable-multiarch --disable-werror --disabl
e-cet --with-arch-32=i686 --with-multilib-list=m32,m64,mx32 --enable-multilib --with-tune=generic --enable-checking=release --build=i686
-linux-gnu --host=i686-linux-gnu --target=i686-linux-gnu                                                                                
Thread model: posix                                                                                                                     
Supported LTO compression algorithms: zlib zstd                                                                                         
gcc version 11.1.0 (Ubuntu 11.1.0-1ubuntu1~18.04.1)                                                                                     
COLLECT_GCC_OPTIONS='-v' '-save-temps' '-O2' '-o' '507' '-mtune=generic' '-march=i686' '-dumpdir' '507-'                                
 /usr/lib/gcc/i686-linux-gnu/11/cc1 -E -quiet -v -imultiarch i386-linux-gnu 507_mini.c -mtune=generic -march=i686 -O2 -fpch-preprocess -
fasynchronous-unwind-tables -fstack-protector-strong -Wformat -Wformat-security -o 507-507_mini.i                                       
ignoring nonexistent directory "/usr/local/include/i386-linux-gnu"                                                                      
ignoring nonexistent directory "/usr/lib/gcc/i686-linux-gnu/11/include-fixed"                                                           
ignoring nonexistent directory "/usr/lib/gcc/i686-linux-gnu/11/../../../../i686-linux-gnu/include"                                      
#include "..." search starts here: 
#include <...> search starts here: 
 /usr/lib/gcc/i686-linux-gnu/11/include
 /usr/local/include
 /usr/include/i386-linux-gnu
 /usr/include
End of search list.
COLLECT_GCC_OPTIONS='-v' '-save-temps' '-O2' '-o' '507' '-mtune=generic' '-march=i686' '-dumpdir' '507-'
 /usr/lib/gcc/i686-linux-gnu/11/cc1 -fpreprocessed 507-507_mini.i -quiet -dumpdir 507- -dumpbase 507_mini.c -dumpbase-ext .c -mtune=generic -march=i686 -O2 -version -fasynchronous-unwind-tables -fstack-protector-strong -Wformat -Wformat-security -o 507-507_mini.s
GNU C17 (Ubuntu 11.1.0-1ubuntu1~18.04.1) version 11.1.0 (i686-linux-gnu)
        compiled by GNU C version 11.1.0, GMP version 6.1.2, MPFR version 4.0.1, MPC version 1.1.0, isl version isl-0.19-GMP

GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072
GNU C17 (Ubuntu 11.1.0-1ubuntu1~18.04.1) version 11.1.0 (i686-linux-gnu)
        compiled by GNU C version 11.1.0, GMP version 6.1.2, MPFR version 4.0.1, MPC version 1.1.0, isl version isl-0.19-GMP

GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072
Compiler executable checksum: 81b68deb607ea376c8bc5126cafbdd31
COLLECT_GCC_OPTIONS='-v' '-save-temps' '-O2' '-o' '507' '-mtune=generic' '-march=i686' '-dumpdir' '507-'
 as -v --32 -o 507-507_mini.o 507-507_mini.s
GNU assembler version 2.30 (i686-linux-gnu) using BFD version (GNU Binutils for Ubuntu) 2.30
COMPILER_PATH=/usr/lib/gcc/i686-linux-gnu/11/:/usr/lib/gcc/i686-linux-gnu/11/:/usr/lib/gcc/i686-linux-gnu/:/usr/lib/gcc/i686-linux-gnu/11/:/usr/lib/gcc/i686-linux-gnu/
LIBRARY_PATH=/usr/lib/gcc/i686-linux-gnu/11/:/usr/lib/gcc/i686-linux-gnu/11/../../../i386-linux-gnu/:/usr/lib/gcc/i686-linux-gnu/11/../../../../lib/:/lib/i386-linux-gnu/:/lib/../lib/:/usr/lib/i386-linux-gnu/:/usr/lib/../lib/:/usr/lib/gcc/i686-linux-gnu/11/../../../:/lib/:/usr/lib/
COLLECT_GCC_OPTIONS='-v' '-save-temps' '-O2' '-o' '507' '-mtune=generic' '-march=i686' '-dumpdir' '507.'
 /usr/lib/gcc/i686-linux-gnu/11/collect2 -plugin /usr/lib/gcc/i686-linux-gnu/11/liblto_plugin.so -plugin-opt=/usr/lib/gcc/i686-linux-gnu/11/lto-wrapper -plugin-opt=-fresolution=507.res -plugin-opt=-pass-through=-lgcc -plugin-opt=-pass-through=-lgcc_s -plugin-opt=-pass-through=-lc -plugin-opt=-pass-through=-lgcc -plugin-opt=-pass-through=-lgcc_s --build-id --eh-frame-hdr -m elf_i386 --hash-style=gnu --as-needed -dynamic-linker /lib/ld-linux.so.2 -pie -z now -z relro -o 507 /usr/lib/gcc/i686-linux-gnu/11/../../../i386-linux-gnu/Scrt1.o /usr/lib/gcc/i686-linux-gnu/11/../../../i386-linux-gnu/crti.o /usr/lib/gcc/i686-linux-gnu/11/crtbeginS.o -L/usr/lib/gcc/i686-linux-gnu/11 -L/usr/lib/gcc/i686-linux-gnu/11/../../../i386-linux-gnu -L/usr/lib/gcc/i686-linux-gnu/11/../../../../lib -L/lib/i386-linux-gnu -L/lib/../lib -L/usr/lib/i386-linux-gnu -L/usr/lib/../lib -L/usr/lib/gcc/i686-linux-gnu/11/../../.. 507-507_mini.o -lm -lgcc --push-state --as-needed -lgcc_s --pop-state -lc -lgcc --push-state --as-needed -lgcc_s --pop-state /usr/lib/gcc/i686-linux-gnu/11/crtendS.o /usr/lib/gcc/i686-linux-gnu/11/../../../i386-linux-gnu/crtn.o
COLLECT_GCC_OPTIONS='-v' '-save-temps' '-O2' '-o' '507' '-mtune=generic' '-march=i686' '-dumpdir' '507.'

$ uname -a
Linux c6c6b3a033a2 5.13.0-39-generic #44~20.04.1-Ubuntu SMP Thu Mar 24 16:43:35 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 18.04.4 LTS
Release:        18.04
Codename:       bionic

minified_to_string_radix.i:
```
typedef unsigned int size_t;
typedef signed int __int32_t;
typedef unsigned int __uint32_t;
__extension__ typedef signed long long int __int64_t;
__extension__ typedef unsigned long long int __uint64_t;
int printf(const char *format, ...);
int memcmp(const void *s1, const void *s2, size_t n);

__extension__ typedef unsigned long long int __uint64_t;

typedef long double float_t;

typedef __int64_t int64_t;
typedef __uint32_t uint32_t;
typedef __uint64_t uint64_t;


extern double fmod (double __x, double __y);

extern __inline __attribute__ ((__always_inline__)) __attribute__ ((__gnu_inline__)) double __attribute__ ((__nothrow__)) floor (double
__x) { register long double __value; register int __ignore; unsigned short int __cw; unsigned short int __cwtmp; __asm __volatile ("fnstcw %3\n\t" "movzwl %3, %1\n\t" "andl $0xf3ff, %1\n\t" "orl $0x0400, %1\n\t" "movw %w1, %2\n\t" "fldcw %2\n\t" "frndint\n\t" "fldcw %3" : "=t" (__value), "=&q" (__ignore), "=m" (__cwtmp), "=m" (__cw) : "0" (__x)); return __value; }



typedef struct {
    uint64_t significand;
    int exp;
} njs_diyfp_t;

static  njs_diyfp_t
njs_d2diyfp(double d)
{
    int biased_exp;
    uint64_t significand;
    njs_diyfp_t r;

    union {
        double d;
        uint64_t u64;
    } u;

    u.d = d;

    biased_exp = (u.u64 & (((uint64_t) (0x7FF00000) << 32) + (0x00000000))) >> 52;
    significand = u.u64 & (((uint64_t) (0x000FFFFF) << 32) + (0xFFFFFFFF));

    if (biased_exp != 0) {
        r.significand = significand + (((uint64_t) (0x00100000) << 32) + (0x00000000));
        r.exp = biased_exp - (((int64_t) 0x3ff) + 52);

    } else {
        r.significand = significand;
        r.exp = (-(((int64_t) 0x3ff) + 52)) + 1;
    }

    return r;
}

static int
number_to_string_radix(double number, uint32_t radix)
{
    int digit;
    char ch;
    double n, remainder, integer, fraction, delta;
    char *p, *end;
    uint32_t size;
    char buf[((1 + 1024) + (1 + 1075))];

    static const char *digits = "0123456789abcdefghijklmnopqrstuvwxyz";

    p = buf + (1 + 1024);
    end = p;

    n = number;

    integer = floor(n);
    p = buf + (1 + 1024);

    while (njs_d2diyfp(integer / radix).exp > 0) {
        integer /= radix;
        *(--p) = '0';
    }

    do {
        remainder = fmod(integer, radix);
        *(--p) = digits[(int) remainder];
        integer = (integer - remainder) / radix;

    } while (integer > 0);

    size = (uint32_t) (end - p);

    printf("%g.toString(%d) = %.*s\n", number, radix, size, p);

    if (memcmp(p, "ga894a06abs0000", size) != 0) {
        printf("ERROR expected \"ga894a06abs0000\"\n");
        return -1;
    }

    return 0;
}

int main() {
    number_to_string_radix(1e23, 36);
    return 0;
}
```

1) gcc -O2  minified_to_string_radix.i -o 507 -lm && ./507
1e+23.toString(36) = ga894a06ac80000
ERROR expected "ga894a06abs0000"

2) gcc -O1  minified_to_string_radix.i -o 507 -lm && ./507
1e+23.toString(36) = ga894a06ac80000
OK

3) gcc -O1 -ffast-math minified_to_string_radix.i -o 507 -lm && ./507
1e+23.toString(36) = ga894a06ac80000
OK

4) gcc -O1 -fstrict-aliasing minified_to_string_radix.i -o 507 -lm && ./507
1e+23.toString(36) = ga894a06ac80000
OK

5) removing inlined floor() fixed the issue

6) clang-6.0  -O2  minified_to_string_radix.i -o 507 -lm && ./507
1e+23.toString(36) = ga894a06abs0000
Comment 1 xeioex 2022-07-02 02:06:30 UTC
TO fix the last confirmations

1) gcc -O2  minified_to_string_radix.i -o 507 -lm && ./507
1e+23.toString(36) = ga894a06ac80000
ERROR expected "ga894a06abs0000"

2) gcc -O1  minified_to_string_radix.i -o 507 -lm && ./507
1e+23.toString(36) = ga894a06abs0000
OK

3) gcc -O1 -ffast-math minified_to_string_radix.i -o 507 -lm && ./507
1e+23.toString(36) = ga894a06abs0000
OK

4) gcc -O1 -fstrict-aliasing minified_to_string_radix.i -o 507 -lm && ./507
1e+23.toString(36) = ga894a06abs0000
OK 

5) removing inlined floor() fixed the issue

6) clang-6.0  -O2  minified_to_string_radix.i -o 507 -lm && ./507
1e+23.toString(36) = ga894a06abs0000
Comment 2 Andrew Pinski 2022-07-02 02:09:25 UTC
-fexcess-precision=standard (-std=c99 enables -fexcess-precision=standard) or -mfpmath=sse fixes the issue.

This is not wrong code but rather the way x87 works for GCC.
GCC defaults to using the execess precision of x87 (80bits) and sometimes if the floating point value is kept on the fpu stack, there is no rounding back to 64bits. And that is exactly what you are seeing here really.

Anyways this is a dup of bug 323.
The reason why it works for the non-inline floor case is because well there is a rounding step that happens.

The reason why -fexcess-precision=standard works (it is only implemented for the C front-end) is because there rounding steps are now explict in the IR and will use the 80bit fpu and then force the rounding back.

The reason why -mfpmath=sse works is instead of using x87, GCC will use the sse fpu implementation which is 64bit without excess precision.

This is kinda not a bug, just you not understanding fpu and execess precision.

*** This bug has been marked as a duplicate of bug 323 ***
Comment 3 xeioex 2022-07-06 00:34:41 UTC
Is there a portable (across platforms and compilers) to ensure that double values are always 64 bits?
Comment 4 Andrew Pinski 2022-07-06 00:44:52 UTC
(In reply to xeioex from comment #3)
> Is there a portable (across platforms and compilers) to ensure that double
> values are always 64 bits?

It is still 64bit storage on i686, just uses the excessive precission while doing calculations.
You should read "What Every Computer Scientist Should Know About Floating-Point Arithmetic": https://www.itu.dk/~sestoft/bachelor/IEEE754_article.pdf about this and all.
Comment 5 xeioex 2022-07-06 01:00:46 UTC
My question is more practical. For example while `-fexcess-precision=standard` fixes the problem in GCC. But, I am left with the same problem regarding other compilers. At least am looking for a way to detect excessive precision as soon as possible (during configure time).

Tried to use FLT_EVAL_METHOD

t.c
```
#include <stdio.h>
#include <math.h>
#include <float.h>
#include <fenv.h>

int main(){
  printf("%d\n", (int) FLT_EVAL_METHOD);
}
```

1) gcc -o t t.c && ./t
2

2) gcc -fexcess-precision=standard -o t t.c && ./t
2


How am I expected to use FLT_EVAL_METHOD correctly?
Comment 6 Vincent Lefèvre 2022-10-14 09:47:33 UTC
(In reply to xeioex from comment #5)
> My question is more practical. For example while
> `-fexcess-precision=standard` fixes the problem in GCC. But, I am left with
> the same problem regarding other compilers.

The need for -fexcess-precision=standard is due to a "bug" in GCC (more precisely, a non-conformance issue with the ISO C standard). If the other compilers conform to the C standard by default, you won't have to do anything.

Note that even with -fexcess-precision=standard, you get double rounding, i.e. a first rounding to extended precision, then a second rounding to double precision. This is allowed by the C standard, but may break some algorithms or give results different from platforms with a single rounding.