GCC 4.3.0 compiled on Linux does NOT pass as many tests as GCC 3.4.4 for Cygwin. How seriously will people take _newer_ versions of gcc if they can't pass the same tests as older versions did? Something has slipped over the years.

Now that I have a working compiler I decided to run some tests to see how well it performs. I've done the usual "make -i check" tests and submitted the results. I then tried some math library tests available on the internet. These tests are designed to catch flaws, not to check trivial math operations; they were written by people whose life's work is mathematics. My Cygwin gcc 3.4.4 passed almost all of these tests; my Linux compilers did not. I did not build _all_ of the Linux versions of gcc myself; I only built the 4.2.0 and 4.3.0 versions. Please obtain and build these tests yourselves. I did not run every single test on every possible version, but I did run the shortest test on every gcc I have. I am satisfied that gcc 4.3.0 does not pass all the tests that I tried, and that Cygwin gcc 3.4.4 passed almost every test I tried.

Here are some of my notes:

Platform           GCC Version                    Output File Name
i686-pc-cygwin     gcc 3.4.4 release              paranoia_3.4.4_release-cygwin.txt
i686-pc-linux-gnu  gcc 4.2.0 20070501             paranoia_4.2.0_20070501-linux.txt
i686-pc-linux-gnu  gcc 4.3.0 20070529             paranoia_4.3.0_20070529-linux.txt
i486-linux         gcc 3.3.5 (Debian 1:3.3.5-13)  paranoia_3.3.5_Debian-1:3.3.5-13-linux.txt
i486-linux-gnu     gcc 3.4.6 (Debian 3.4.6-5)     paranoia_3.4.6_Debian-3.4.6-5-linux.txt
i486-linux-gnu     gcc 4.1.2 (Debian 4.1.1-21)    paranoia_4.1.2_Debian-4.1.1-21-linux.txt

All of these diffs produce no output - the Linux results are all identical:
# diff -Naur paranoia_4.2.0_20070501-linux.txt paranoia_4.3.0_20070529-linux.txt
# diff -Naur paranoia_4.2.0_20070501-linux.txt paranoia_3.3.5_Debian-1:3.3.5-13-linux.txt
# diff -Naur paranoia_4.2.0_20070501-linux.txt paranoia_3.4.6_Debian-3.4.6-5-linux.txt
# diff -Naur paranoia_4.2.0_20070501-linux.txt paranoia_4.1.2_Debian-4.1.1-21-linux.txt

Here is the diff for Cygwin vs. Linux:

# diff -Naur paranoia_3.4.4_release-cygwin.txt paranoia_4.3.0_20070529-linux.txt
--- paranoia_3.4.4_release-cygwin.txt   2007-05-31 11:02:18.000000000 -0700
+++ paranoia_4.3.0_20070529-linux.txt   2007-05-31 11:04:38.000000000 -0700
@@ -127,7 +127,8 @@
 Test for sqrt monotonicity.
 sqrt has passed a test for Monotonicity.
 Testing whether sqrt is rounded or chopped.
-Square root appears to be correctly rounded.
+Square root is neither chopped nor correctly rounded.
+Observed errors run from -5.0000000e-01 to 5.0000000e-01 ulps.
 To continue, press RETURN
 Diagnosis resumes after milestone Number 90          Page: 7
@@ -152,7 +153,11 @@
 This computed value is O.K.
 Testing X^((X + 1) / (X - 1)) vs. exp(2) = 7.38905609893065218e+00 as X -> 1.
-Accuracy seems adequate.
+DEFECT: Calculated 7.38905609548934539e+00 for
+        (1 + (-1.11022302462515654e-16) ^ (-1.80143985094819840e+16);
+        differs from correct value by -3.44130679508225512e-09 .
+        This much error may spoil financial
+        calculations involving tiny interest rates.
 Testing powers Z^Q at four nearly extreme values.
 ... no discrepancies found.
@@ -188,7 +193,9 @@
 Diagnosis resumes after milestone Number 220         Page: 10
+The number of DEFECTs discovered = 1.
 The number of FLAWs discovered = 1.
-The arithmetic diagnosed seems Satisfactory though flawed.
+The arithmetic diagnosed may be Acceptable
+despite inconvenient Defects.
 END OF TEST.

Both Cygwin's gcc 3.4.4 and Linux gcc 3.3.5, 3.4.6, 4.1.1, 4.2.0 and 4.3.0 lack a sticky bit, which the test counts as a "flaw"; that alone may still be rated "Satisfactory":
Checking rounding on multiply, divide and add/subtract.
* is neither chopped nor correctly rounded.
/ is neither chopped nor correctly rounded.
Addition/Subtraction neither rounds nor chops.
Sticky bit used incorrectly or not at all.
FLAW: lack(s) of guard digits or failure(s) to correctly round or chop (noted above) count as one flaw in the final tally below.

Only the Linux gcc compilers, and not Cygwin's 3.4.4 (release) version, are off on one of the calculations by -3.44130679508225512e-09. That is not much to some people; for others it is a lot. This led me to more testing.

Here is another math test - http://www.netlib.org/fp/ucbtest.tgz :

# ucbREADME/linux.sh
Total   60 tests: pass   59, flags err 0, value err  1, acosd
Total  352 tests: pass  352, flags err 0, value err  0, addd
Total   77 tests: pass   77, flags err 0, value err  0, asind
Total  104 tests: pass  104, flags err 0, value err  0, atan2d
Total   57 tests: pass   57, flags err 0, value err  0, atand
Total  126 tests: pass  126, flags err 0, value err  0, cabsd
Total   99 tests: pass   99, flags err 0, value err  0, ceild
Total   53 tests: pass   53, flags err 0, value err  0, cosd
Total   68 tests: pass   56, flags err 0, value err 12, coshd
Total  383 tests: pass  383, flags err 0, value err  0, divd
Total   97 tests: pass   86, flags err 0, value err 11, expd
Total   37 tests: pass   37, flags err 0, value err  0, fabsd
Total  103 tests: pass  103, flags err 0, value err  0, floord
Total  352 tests: pass  352, flags err 0, value err  0, fmodd
Total  126 tests: pass  126, flags err 0, value err  0, hypotd
Total   89 tests: pass   89, flags err 0, value err  0, log10d
Total   83 tests: pass   83, flags err 0, value err  0, logd
Total  340 tests: pass  340, flags err 0, value err  0, muld
Total 1543 tests: pass 1505, flags err 0, value err 38, powd
Total   52 tests: pass   52, flags err 0, value err  0, sind
Total   72 tests: pass   66, flags err 0, value err  6, sinhd
Total  102 tests: pass  102, flags err 0, value err  0, sqrtd
Total  321 tests: pass  321, flags err 0, value err  0, subd
Total   54 tests: pass   54, flags err 0, value err  0, tand
Total   72 tests: pass   68, flags err 0, value err  4, tanhd

That doesn't happen with Cygwin's gcc 3.4.4.

Here are the TestFloat / SoftFloat tests:

TestFloat is a program for testing whether a computer's floating-point conforms to the IEC/IEEE Standard for Binary Floating-point Arithmetic. TestFloat works by comparing the behavior of the machine's floating-point with that of the SoftFloat software implementation of floating-point. Any differences found are reported as probable errors in the machine's floating-point.
http://www.jhauser.us/arithmetic/TestFloat.html

TestFloat and SoftFloat source files:
compressed tar archive, TestFloat-2a.tar.Z [150 kB]: http://www.jhauser.us/arithmetic/TestFloat-2a.tar.Z
compressed tar archive, SoftFloat-2b.tar.Z [165 kB]: http://www.jhauser.us/arithmetic/SoftFloat-2b.tar.Z

Here are the results of only two of the tests:

# ./testsoftfloat -level 1 -errors 2000000 int32_to_float32 float32_add > /dev/null
Testing float32_add, rounding nearest_even.
46464 tests total.
46464 tests performed; 39069 errors found.
Testing float32_add, rounding to_zero.
46464 tests total.
46464 tests performed; 39099 errors found.
Testing float32_add, rounding down.
46464 tests total.
46464 tests performed; 39213 errors found.
Testing float32_add, rounding up.
46464 tests total.
46464 tests performed; 39150 errors found.

# ./testsoftfloat -level 1 -errors 2000000 int32_to_float32 float32_eq > /dev/null
Testing float32_eq.
46464 tests total.
46464 tests performed; 1321 errors found.

Finally I tested GSL - the GNU Scientific Library: http://www.gnu.org/software/gsl

The GNU Scientific Library (GSL) is a numerical library for C and C++ programmers. The current version is GSL-1.9. It was released on 21 February 2007. This is a stable release.
ftp://ftp.gnu.org/gnu/gsl/
02/20/2007 03:36PM 2,574,939 gsl-1.9.tar.gz
ftp://ftp.gnu.org/gnu/gsl/gsl-1.9.tar.gz

When compiled and checked, GSL works flawlessly under Cygwin's gcc 3.4.4 but fails both the interpolation and sort tests under gcc 4.3.0 20070529.

WinXP - i686-pc-cygwin:
$ cd /cygdrive/c/gsl-1.9
$ make -i -k check 2>&1 | tee check_1_log.txt
$ grep -B 2 -A 2 fail check_1_log.txt
$ (Prints NOTHING)

Debian - i686-pc-linux-gnu:
# cd /root/downloads/gsl-1.9
# make -i -k check 2>&1 | tee check_1_log.txt
# grep -B 2 -A 2 fail check_1_log.txt
FAIL: test
===================
1 of 1 tests failed
===================
make[2]: [check-TESTS] Error 1 (ignored)
--
FAIL: test
===================
1 of 1 tests failed
===================
make[2]: [check-TESTS] Error 1 (ignored)

Cygwin:
make[2]: Entering directory `/cygdrive/c/gsl-1.9/interpolation'
Completed [1100/1100]
PASS: test.exe
==================
All 1 tests passed
==================
make[2]: Leaving directory `/cygdrive/c/gsl-1.9/interpolation'

Cygwin:
make[2]: Entering directory `/cygdrive/c/gsl-1.9/sort'
Completed [21600/21600]
PASS: test.exe
==================
All 1 tests passed
==================
make[2]: Leaving directory `/cygdrive/c/gsl-1.9/sort'

Linux:
make[2]: Entering directory `/root/downloads/gsl-1.9/interpolation'
FAIL: gsl_interp_eval_e linear [7]
FAIL: gsl_interp_eval_deriv_e linear [8]
FAIL: linear deriv 0 (0 observed vs 5.30544087554984718e-315 expected) [test uses subnormal value] [11]
FAIL: linear integ 0 (0 observed vs 2.19361877441406383 expected) [12]
(Over 900 lines of FAIL:)
FAIL: cspline-periodic 60 (4.99961591105934785e-270 observed vs 6.8941659943411544 expected) [1072]
FAIL: cspline-periodic deriv 60 (0 observed vs 7.16157787531728253e-313 expected) [test uses subnormal value] [1073]
FAIL: cspline-periodic integ 60 (0 observed vs 6.89057922363291553 expected) [1074]
FAIL: cspline periodic 3pt interpolation [1075]
FAIL: test
===================
1 of 1 tests failed
===================
make[2]: [check-TESTS] Error 1 (ignored)
make[2]: Leaving directory `/root/downloads/gsl-1.9/interpolation'

Linux:
make[2]: Entering directory `/root/downloads/gsl-1.9/sort'
FAIL: indexing gsl_vector_char, n = 128, stride = 1, ordered [19999]
FAIL: sorting, gsl_vector_char, n = 128, stride = 1, ordered [20000]
FAIL: smallest, gsl_vector_char, n = 128, stride = 1, ordered [20001]
FAIL: largest, gsl_vector_char, n = 128, stride = 1, ordered [20002]
(Over 120 lines of FAIL:)
FAIL: sorting, gsl_vector_char, n = 512, stride = 3, randomized [21596]
FAIL: smallest, gsl_vector_char, n = 512, stride = 3, randomized [21597]
FAIL: largest, gsl_vector_char, n = 512, stride = 3, randomized [21598]
FAIL: test
===================
1 of 1 tests failed
===================
make[2]: [check-TESTS] Error 1 (ignored)
make[2]: Leaving directory `/root/downloads/gsl-1.9/sort'

GCC needs a Ph.D. in math to give it a once-over. If an old version of gcc can pass these tests, there is an error preventing newer versions from passing.

DRAFT Standard for Floating-Point Arithmetic P754 - Draft 1.3.0
Modified at 17:15 GMT on February 23, 2007
http://www.validlab.com/754R/drafts/archive/2007-02-23.pdf
Can you try with -ffloat-store? This might be the normal x87 excess-precision issue (see PR 323).
I ran GSL and Paranoia with -ffloat-store for gcc 4.3.0: same result. Instead of "the normal x87 issue" it might be a "libm issue", since everything works with Cygwin's gcc but fails with all the Linux gcc's. Here is something that GSL's configure prints:

checking for inline... inline
checking for extern inline... yes
checking ieeefp.h usability... no
checking ieeefp.h presence... no
checking for ieeefp.h... no
checking for vprintf... yes
checking for _doprnt... no
...
checking for cos in -lm... yes
checking whether feenableexcept is declared... yes
checking whether fesettrapenable is declared... no
checking whether hypot is declared... yes
checking whether expm1 is declared... yes
checking whether acosh is declared... yes
checking whether asinh is declared... yes
checking whether atanh is declared... yes
checking whether ldexp is declared... yes
checking whether frexp is declared... yes
checking whether isinf is declared... yes
checking whether finite is declared... yes
checking whether isfinite is declared... no
checking whether isnan is declared... yes
checking whether log1p is declared... yes
checking for long double stdio... yes
checking for extended floating point registers... yes
checking for IEEE arithmetic interface type... gnux86
checking for FPU_SETCW... yes
checking for IEEE compiler flags... none
checking for IEEE comparisons... yes
checking for IEEE denormalized values... yes
...

When building under Cygwin, these two checks:

checking ieeefp.h usability... no
checking ieeefp.h presence... no

are answered yes instead.

It is good to see that GSL checks for the need for FPU_SETCW. GSL, like GCC, uses "-lm", whereas GCC's libjava builds and uses its own libfdlibm.a (but unfortunately _seems_ to use "-lm" in a few places).

If these programs pass their checks when built on target i686-pc-cygwin (old gcc) and fail them on target i686-pc-linux-gnu (newer gcc), then the newer gcc is not implementing its math with the same accuracy as the older gcc does.
I am using the same computer to run both sets of tests, so, in theory, it should be possible to alter GCC 4.3.0 (and 4.2.1, etc.) so that its math results are as accurate as they once were.
Here is a simple test for the "float-store" issue:

#include <stdio.h>

int main(void)
{
    /* volatile keeps the compiler from folding the expression at compile
       time; without -ffloat-store the x87 may carry the product in an
       80-bit register and never overflow to inf. */
    volatile double v = 1E308;
    double x = (v * v) / v;
    printf("Try compiling with and without -ffloat-store\n");
    printf("(1E308 * 1E308) / 1E308\n");
    printf("Correct output is 'inf 0' - Incorrect output is '1e+308 1'.\n");
    printf("%g %d\n", x, x == v);
    return 0;
}

Reading the info file for cpp, I notice in the node "Common Predefined Macros" that we define macros for many things but nothing for "-ffloat-store":

We have __NO_INLINE__ for -fno-inline
We have __USER_LABEL_PREFIX__ for -f(no-)underscores
We have __EXCEPTIONS for -fno-exceptions
We have __NEXT_RUNTIME__ for -fnext-runtime
We have __SSP__ for -fstack-protector
We have __SSP_ALL__ for -fstack-protector-all

On the basis that "-ffloat-store" alters the resulting output of gcc, I propose we use cpp_define() in gcc-4_3-trunk/gcc/c-cppbuiltin.c to add "__FLOAT_STORE__" for -ffloat-store. We should also fix the code so the math is as accurate as the standard provides for, regardless of the use of -ffloat-store, -ffast-math, -msoft-float, -mfpmath=xyz, -msse*, -minline-float-*, -minline-sqrt-*, or anything else.
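The proposed change could look like the following minimal, untested sketch, modeled on the existing -fno-inline/__NO_INLINE__ handling in c_cpp_builtins(); the use of flag_float_store as the option variable is my assumption:

/* Hypothetical fragment for gcc/c-cppbuiltin.c (c_cpp_builtins),
   mirroring the way -fno-inline defines __NO_INLINE__.  */
if (flag_float_store)
  cpp_define (pfile, "__FLOAT_STORE__");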
I copied gcc-4_3-build/i686-pc-linux-gnu/libjava/classpath/native/fdlibm/.libs/libfdlibm.a to my current directory and, instead of using "-lm", linked with "./libfdlibm.a" ... Guess what.

I propose this simple and useful fix: we re-arrange gcc to build directories "gcc-4_3-build/libfdlibm" and "gcc-4_3-build/prev-libfdlibm", the way "gcc-4_3-build/libdecnumber" and "gcc-4_3-build/prev-libdecnumber" are built. The first pass can build it with "-O0" so it is certain to work (or we must fix that), and the next pass can build it with profiling and "-O3" enabled. Profiling works on 4.2.0 (but not on 4.3.0). If profiling the math library fails, a message can be printed saying to rebuild by typing "make nomathprofile". The third pass can use the profile information to build the math library in a fully optimized state. The resulting super-optimized math library can then be used for the libjava library in the fourth "library build pass", and by any other language that wants to take advantage of it. No effort would be wasted and we would have a fast, accurate library that works on all platforms - OR is libfdlibm not a good plan, and should it be removed from libjava?

If you have this page on your computer, look at /usr/share/doc/glibc-2.5/manual/libc/Errors-in-Math-Functions.html - it is created when you compile libc6. That page is a report of the libc6 tests that are run when the code is built, and it shows the resulting errors for every function (as is suggested in IEEE 754). It looks like this (edited for brevity):

The table lists the ULP values for different architectures. Different architectures have different results since their hardware support for floating-point operations varies and also the existing hardware support is different.

Function  Alpha  Generic  ix86  IA64  PowerPC
acosf     -      -        -     -     -
acos      -      -        -     -     -
cosf      1      -        1     1     1
cos       2      -        2     2     2
cosl      -      -        1     1     1
logl      -      -        -     -     1
log10f    2      -        1     1     2
log10     1      -        -     -     1
log10l    -      -        1     1     1
powl      -      -        -     -     1
...
Since we link some of gcc with "-lm" and some with libfdlibm, we have an inconsistent math library upon which gcc is based. We can standardize and test our math library to provide the same (correct) result on all platforms (as IEEE 754 requires). Let's remove "-lm" from gcc and use libfdlibm on as many platforms as possible; where that doesn't work, those platforms can simply keep using "-lm" as they did before, and they will be no worse off. The rest of us will be much better off.
(In reply to comment #4)
> Function  Alpha  Generic  ix86  IA64  PowerPC
> acosf     -      -        -     -     -
> acos      -      -        -     -     -
> cosf      1      -        1     1     1
> cos       2      -        2     2     2
> cosl      -      -        1     1     1
> logl      -      -        -     -     1
> log10f    2      -        1     1     2
> log10     1      -        -     -     1
> log10l    -      -        1     1     1
> powl      -      -        -     -     1
> ...
> Since we link some of gcc with "-lm" and some with libfdlibm we have an
> inconsistent math library upon which gcc is based. We can standardize and test
> our math library to provide the same (correct) result on all platforms (as IEEE
> 754 requires).

IEEE 754 does not discuss any of the functions you list above. In fact, IEEE 754 places requirements on exactly one function in libm, and that is sqrt(), which must be exact in all rounding modes.
I just ran a C version of double-precision paranoia, and a single-precision f77 version, with latest gcc and gfortran trunk as well as with g77 of 3.4 vintage, and in all cases I get this:

No failures, defects nor flaws have been discovered.
Rounding appears to conform to the proposed IEEE standard P754.
The arithmetic diagnosed appears to be Excellent!
END OF TEST.

Maybe I don't have the latest version of paranoia. Can you suggest a download URL?
>> kargl@gcc.gnu.org
>> IEEE 754 does not discuss any of the functions you list above. In fact,
>> IEEE 754 places requirements on exactly one function in libm, and that is
>> sqrt(), which must be exact in all rounding modes.

I did a search of this document for "sqrt". Adobe Reader says the word is not there.

DRAFT Standard for Floating-Point Arithmetic P754 - Draft 1.3.0
Modified at 17:15 GMT on February 23, 2007
http://www.validlab.com/754R/drafts/archive/2007-02-23.pdf

I realize that is a draft; perhaps there is an omission. Here is a clip from the document:

Introduction
...
PURPOSE: This standard provides a discipline for performing floating-point computation that yields results independent of whether the processing is done in hardware, software, or a combination of the two. For operations specified in the normative part of this standard, numerical results and exceptions are uniquely determined by the values of the input data, sequence of operations, and destination formats, all under user control.

This standard defines a family of commercially feasible ways for systems to perform binary and decimal floating-point arithmetic. Among the desiderata that guided the formulation of this standard were:

a) Facilitate movement of existing programs from diverse computers to those that adhere to this standard.

b) Enhance the capabilities and safety available to users and programmers who, though not expert in numerical methods, may well be attempting to produce numerically sophisticated programs. However, we recognize that utility and safety are sometimes antagonists.

c) Encourage experts to develop and distribute robust and efficient numerical programs that are portable, by way of minor editing and recompilation, onto any computer that conforms to this standard and possesses adequate capacity. When restricted to a declared subset of the standard, these programs should produce identical results on all conforming systems.
d) Provide direct support for
   1) Execution-time diagnosis of anomalies
   2) Smoother handling of exceptions
   3) Interval arithmetic at a reasonable cost.

e) Provide for development of
   1) Standard elementary functions such as exp and cos
   2) Very high precision (multiword) arithmetic
   3) Coupling of numerical and symbolic algebraic computation.

f) Enable rather than preclude further refinements and extensions.

2. References

The following referenced documents are indispensable for the application of this standard:
ANSI/IEEE Std 754-1985 R1990, IEEE Standard for Binary Floating-Point Arithmetic. [1]
ISO/IEC 9899, Second edition 1999-12-01, Programming languages - C. [2]

[1] IEEE publications are available from the Institute of Electrical and Electronics Engineers, 445 Hoes Lane, P.O. Box 1331, Piscataway, NJ 08855-1331, USA.
[2] ISO publications are available from the ISO Central Secretariat, Case Postale 56, 1 rue de Varembé, CH-1211, Genève 20, Switzerland/Suisse. ISO publications are also available in the United States from the Sales Department, American National Standards Institute, 11 West 42nd Street, 13th Floor, New York, NY 10036, USA.

Annex B  Expression evaluation

B.3 Reproducible results
Languages should provide means for programmers to specify reproducible results that are identical on all platforms supporting that language and this standard, for operations completely specified by this standard.

Annex D  Elementary transcendental functions

All the directives in this annex are optional.
...
Functions implemented as directed here have the properties that they preserve monotonicity and reproduce results exactly from implementation to implementation.
...
Languages should define which functions are required or recommended to be provided in correctly-rounded versions.
When a language chooses not to specify a function as conforming to this annex, each implementation should document the available domain, exceptional cases, worst-case accuracies achieved, and indicate whether the accuracies are proven or measured for a subset of inputs.

---

Please note:

"Languages should provide means for programmers to specify reproducible results that are identical on all platforms supporting that language and this standard, for operations completely specified by this standard."

"When a language chooses not to specify a function as conforming to this annex, each implementation should document the available domain, exceptional cases, worst-case accuracies achieved, and indicate whether the accuracies are proven or measured for a subset of inputs."

I understand "should" is defined in the document in section 3.1.4.

You said: "IEEE 754 does not discuss any of the functions you list above. In fact, IEEE 754 places requirements on exactly one function in libm, and that is sqrt(), which must be exact in all rounding modes."

I say: the language's standard chooses the functions it will support. For example, BASIC does not have to support complex numbers (in the old days; maybe today it does). IEEE 754r does not say how to write a compliant BASIC compiler (another standard might). IEEE 754r does say that if you include a function then you should be able to "reproduce results exactly from implementation to implementation". It is fair to say _correct_ results should be identical on any platform.
> jvdelisle@gcc.gnu.org
> Can you suggest a download URL?

Mine is from AT&T; I didn't save the URL. Your output is certainly a few pages shorter than mine. There is a Paranoia test included in ucbtest that has output similar to what I get - the URL for that is given above. What hardware are you using, and which version of libm is installed?
There is a wiki here - your contributions are appreciated. http://en.wikipedia.org/wiki/IEEE_754r
(In reply to comment #7)
> >> kargl@gcc.gnu.org
> >> IEEE 754 does not discuss any of the functions you list above. In fact,
> >> IEEE 754 places requirements on exactly one function in libm, and that is
> >> sqrt(), which must be exact in all rounding modes.
>
> I did a search of this document for "sqrt". Adobe Reader says the word is not
> there.
>
> DRAFT Standard for Floating-Point Arithmetic P754 - Draft 1.3.0
> Modified at 17:15 GMT on February 23, 2007
> http://www.validlab.com/754R/drafts/archive/2007-02-23.pdf

Comparing the current working draft of the revised IEEE 754 standard to the standing Standard is like comparing a Boeing 747 to the Wright brothers' aircraft. The standing IEEE 754 standard is 23 pages in total length. Here is the text concerning square root:

5.2. Square Root
The square root operation shall be provided in all supported formats. The result is defined and has a positive sign for all operands >= 0, except that sqrt(-0) shall be -0. The destination format shall be at least as wide as the operand's. The result shall be rounded as specified in Section 4.

There are a few functions described in the Appendix, which contains this disclaimer: "(This Appendix is not a part of ANSI/IEEE Std 754-1985, IEEE Standard for Binary Floating-Point Arithmetic.)"
>> kargl@gcc.gnu.org
>> IEEE 754 does not discuss any of the functions you list above.

>> Comment #4 From Rob
>> That page is a report of the libc6 tests that are run when the code is built

>> kargl@gcc.gnu.org
>> Compare the ... is similar to a comparison of a Boeing 747 to the Wright
>> brothers' aircraft.

That is why it was good of the people who replied to tell us about the version of libm and the hardware they are using, along with any parameters they may have altered that might affect testing. It was helpful of them to either "confirm" or "deny" _this_ bug report on target i686-pc-linux-gnu - but "it works on a Cray" does not assist this report. While Boeing _may_ have had pictures of the Wright brothers' aircraft on the wall and considered its principles when the 747 was designed, they are likely going with modern methods of design and construction.

I made some more builds of GSL using various combinations of -mfpmath=sse, -mfpmath=387, -ffloat-store, -fno-float-store and the OS's libm vs. fdlibm:

# grep FAIL: ../gsl-1.9-New-Lib/check_1_log.txt | wc -l
1031
# grep FAIL: ../gsl-1.9-New-Lib_float-store/check_1_log.txt | wc -l
1051
# grep FAIL: ../gsl-1.9-fdlibm/check_1_log.txt | wc -l
1036
# grep FAIL: ../gsl-1.9-New-Lib_SSE/check_1_log.txt | wc -l
1030
# grep FAIL: ../gsl-1.9-Old-Lib/check_1_log.txt | wc -l
1063
# grep FAIL: ../gsl-1.9-New-Lib_SSE_float-store/check_1_log.txt | wc -l
1051

Currently, using the new libc6 2.6 libm and -mfpmath=sse is one fail better than using -mfpmath=387. I'm doing some more testing to see if I can track down the problem.
I've done some more testing. With GNU/Linux 4.0 the file /usr/include/bits/mathdef.h has this in it:

# if defined __FLT_EVAL_METHOD__ && __FLT_EVAL_METHOD__ == 0
/* When using -mfpmath=sse, values are computed with the precission of the
   used type.  */
typedef float float_t;        /* `float' expressions are evaluated as `float'.  */
typedef double double_t;      /* `double' expressions are evaluated as `double'.  */
# else
/* The ix87 FPUs evaluate all values in the 80 bit floating-point format
   which is also available for the user as `long double'.  Therefore we
   define:  */
typedef long double float_t;  /* `float' expressions are evaluated as `long double'.  */
typedef long double double_t; /* `double' expressions are evaluated as `long double'.  */
# endif

That means, for many people but NOT everyone, that floats and doubles are actually evaluated as _long_ doubles. I have not examined every line of GCC to determine whether GCC is in fact using floats and doubles at the correct size, regardless of the size they are stored at. Since Cygwin's GCC 3.4.4 doesn't have long doubles, it does not do that. Memory access is quicker, but things like sticky bits will not be in the right spot and there will be too much precision. "Too much precision" is bad when tests rely on an exact implementation; otherwise it is usually OK.

I've done a simple hack to the _root_ ./configure file ONLY, and NOT changed one line of GCC's source code, to alter the manner in which GCC is configured. It will need more testing - you _might_ be able to simulate what I am doing by typing:

export CFLAGS="-mfpmath=sse -msse2"

After doing that, simply configure and make GCC as you normally would. (Note: I am _not_ using the above CFLAGS method but making more complicated changes within the root configure file, so your results _might_ not be the same.)

After building GCC, all my tests were the SAME: failures and passes were equal. I was also able to compile mpfr without error (as before).
The ONLY difference was that Paranoia was in a worse state than before, failing miserably. I altered Paranoia to use long doubles instead of doubles and re-ran it. This is what it says now (edited):

...
Precision:  long double;
Version:  10 February 1989;
...
Searching for Radix and Precision.
Radix = 2 .
Closest relative separation found is U1 = 5.421010862427522170037264e-20 .
The number of significant digits of the Radix is 64 .
...
Checking rounding on multiply, divide and add/subtract.
Multiplication appears to round correctly.
Division appears to round correctly.
Addition/Subtraction appears to round correctly.
Checking for sticky bit.
Sticky bit apparently used correctly.
...
Running test of square root(x).
Testing if sqrt(X * X) == X for 20 Integers X.
Test for sqrt monotonicity.
sqrt has passed a test for Monotonicity.
Testing whether sqrt is rounded or chopped.
Square root appears to be correctly rounded.
...
Seeking Underflow thresholds UfThold and E0.
Smallest strictly positive number found is E0 = 3.6452e-4951 .
Since comparison denies Z = 0, evaluating (Z + Z) / Z should be safe.
What the machine gets for (Z + Z) / Z is 2 .
This is O.K., provided Over/Underflow has NOT just been signaled.
Underflow is gradual; it incurs Absolute Error = (roundoff in UfThold) < E0.
The Underflow threshold is 3.36210314311209350662719777e-4932, below which
calculation may suffer larger Relative error than merely roundoff.
Since underflow occurs below the threshold UfThold = (2) ^ (-16382)
only underflow should afflict the expression (2) ^ (-32764);
actually calculating yields: 0 .
This computed value is O.K.
Testing X^((X + 1) / (X - 1)) vs. exp(2) = 7.38905609893065022739794268 as X -> 1.
Accuracy seems adequate.
Testing powers Z^Q at four nearly extreme values.
... no discrepancies found.
...
Searching for Overflow threshold:
This may generate an error.
Can `Z = -Y' overflow?
Trying it on Y = -inf .
Seems O.K.
Overflow threshold is V = 1.18973149535723176502126385e+4932 .
Overflow saturates at V0 = inf .
No Overflow should be signaled for V * 1 = 1.18973149535723176502126385e+4932
nor for V / 1 = 1.18973149535723176502126385e+4932 .
No failures, defects nor flaws have been discovered.
Rounding appears to conform to the proposed IEEE standard P854.
The arithmetic diagnosed appears to be Excellent!

--

Note: previously I was getting output like this:

Searching for Radix and Precision.
Radix = 2.000000 .
Closest relative separation found is U1 = 1.1102230e-16 .
Recalculating radix and precision confirms closest relative separation U1 .
Radix confirmed.
The number of significant digits of the Radix is 53.000000 .
...
Checking rounding on multiply, divide and add/subtract.
* is neither chopped nor correctly rounded.
/ is neither chopped nor correctly rounded.
Addition/Subtraction neither rounds nor chops.
Sticky bit used incorrectly or not at all.
FLAW: lack(s) of guard digits or failure(s) to correctly round or chop (noted above) count as one flaw in the final tally below.
Does Multiplication commute? Testing on 20 random pairs.
No failures found in 20 integer pairs.
Running test of square root(x).
Testing if sqrt(X * X) == X for 20 Integers X.
Test for sqrt monotonicity.
sqrt has passed a test for Monotonicity.
Testing whether sqrt is rounded or chopped.
Square root is neither chopped nor correctly rounded.
Observed errors run from -5.0000000e-01 to 5.0000000e-01 ulps.
...
Seeking Underflow thresholds UfThold and E0.
Smallest strictly positive number found is E0 = 4.94066e-324 .
Since comparison denies Z = 0, evaluating (Z + Z) / Z should be safe.
What the machine gets for (Z + Z) / Z is 2.00000000000000000e+00 .
This is O.K., provided Over/Underflow has NOT just been signaled.
Underflow is gradual; it incurs Absolute Error = (roundoff in UfThold) < E0.
The Underflow threshold is 2.22507385850720188e-308, below which
calculation may suffer larger Relative error than merely roundoff.
Since underflow occurs below the threshold UfThold = (2.00000000000000000e+00) ^ (-1.02200000000000000e+03)
only underflow should afflict the expression (2.00000000000000000e+00) ^ (-2.04400000000000000e+03);
actually calculating yields: 0.00000000000000000e+00 .
This computed value is O.K.
Testing X^((X + 1) / (X - 1)) vs. exp(2) = 7.38905609893065218e+00 as X -> 1.
DEFECT: Calculated 7.38905609548934539e+00 for
        (1 + (-1.11022302462515654e-16) ^ (-1.80143985094819840e+16);
        differs from correct value by -3.44130679508225512e-09 .
        This much error may spoil financial
        calculations involving tiny interest rates.
Testing powers Z^Q at four nearly extreme values.
... no discrepancies found.
...
Searching for Overflow threshold:
This may generate an error.
Can `Z = -Y' overflow?
Trying it on Y = -inf .
Seems O.K.
Overflow threshold is V = 1.79769313486231571e+308 .
Overflow saturates at V0 = inf .
No Overflow should be signaled for V * 1 = 1.79769313486231571e+308
nor for V / 1 = 1.79769313486231571e+308 .
...
The number of DEFECTs discovered = 1.
The number of FLAWs discovered = 1.
The arithmetic diagnosed may be Acceptable
despite inconvenient Defects.

As previously stated, I did NOT alter the _source_ code, only the way that GCC is configured. GCC was already using "too much precision"; now it has "way too much precision", which is better and no worse than "too much precision". All the programs I tested compiled the same, with the same passes or failures. Only Paranoia performs differently, and after fixing it to test at the higher precision it says that GCC is better than it ever was. I will do more testing before submitting a dozen-line patch.
One other thing: when I build and test the original (un-modified) Paranoia with GCC I can compile with different flags and get different results - even though these are "_similar_" flags :(

 1 Defect 1 Flaw   -mmmx -msse -m3dnow
 1 Defect          -msse2 -msse3 -mssse3 -msse4.1 -msse4.2 -msse4 -msse4a

The code built with "-mmmx", "-msse", and "-m3dnow" produces worse results on the tests than the other SSE flags - the results should be the same (only the FPU code should differ, and that should not alter the result of math operations, since they are all SSE and not 387).
Here are the test results. I enabled almost every possible option and all the checking that is functional. Nearly every test passed. I diffed the results against a run from a few days ago (before the modification), and I do not seem to have caused any failures. I now have more accurate math than a "normal" build.

Results for 4.3.0 20070606 (experimental) testsuite on i686-pc-linux-gnu
http://gcc.gnu.org/ml/gcc-testresults/2007-06/msg00372.html
Correctly Rounded mathematical library
http://lipforge.ens-lyon.fr/www/crlibm/index.html

CRlibm, an efficient and proven correctly-rounded mathematical library

CRlibm is a free mathematical library (libm) which provides:
* implementations of the double-precision C99 standard elementary functions,
* correctly rounded in the four IEEE-754 rounding modes,
* with a comprehensive proof of both the algorithms used and their implementation,
* sufficiently efficient in average time, worst-case time, and memory consumption to replace existing libms transparently.

CRlibm is distributed under the GNU Lesser General Public License (LGPL). Included in the distribution is extensive documentation with the proof of each function (currently more than 100 pages), as well as all the Maple scripts used to develop the functions. This makes the library an excellent tutorial on software elementary function development.

Note that they provide "proof" for the library. I'll examine this link more closely. Nearly finished my patch testing - may be a few days. Some functions are _many_ times faster than MPFR.
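Speed claims like this can be spot-checked with a rough harness. This Python sketch is not crlibm's C timing harness (and it measures Python call overhead on top of libm), but it gathers the same min/avg/max shape of numbers as the table below:

```python
# Rough min/avg/max timing harness for libm-backed functions.
import math
import random
import time

def time_calls(fn, args, reps=100):
    """Best-of-reps call time per argument, in nanoseconds."""
    per_arg = []
    for a in args:
        best = None
        for _ in range(reps):
            t0 = time.perf_counter_ns()
            fn(a)
            dt = time.perf_counter_ns() - t0
            if best is None or dt < best:
                best = dt
        per_arg.append(best)
    return min(per_arg), sum(per_arg) / len(per_arg), max(per_arg)

random.seed(1)
args = [random.uniform(0.001, 100.0) for _ in range(200)]
for name, fn in (("log", math.log), ("exp", math.exp), ("sin", math.sin)):
    lo, avg, hi = time_calls(fn, args)
    print(f"{name:4s} min {lo:6.0f} ns  avg {avg:8.1f} ns  max {hi:6.0f} ns")
```

Taking the best of many repetitions per argument, as crlibm's own harness does, filters out scheduler noise; the spread between avg and max is what reveals the slow "hard to round" cases.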
I did some testing on the CVS of crlibm (it needs a few files from crlibm-1.0beta1.tar.gz). The new docs list these interesting features:

• portable to any system implementing the ISO-C99 and IEEE-754 standards,
• correctly rounded in the four rounding modes,
• proven, both theoretically and in the implementation,
• reasonably efficient in terms of performance (both average and worst-case) and resource usage,
• exploits fused multiply-and-add operators on the Itanium and PowerPC platforms,
• optimized for double precision arithmetic and float,
• the subtleties of subnormal numbers are handled properly,
• compared with the best libraries, crlibm offers a precision improvement of a fraction of a unit in the last place (ulp) in round-to-nearest mode, and in the three other rounding modes (used in interval arithmetic) gains up to one ulp of precision in each computation.

Results: GNU C Library 2.6, crlibm version 1.0beta1, GMP version 4.2.1, MPFR version 2.2.1-p5

               min time   avg time   max time
-------------- log --------------
default libm       65        204       1006
MPFR              155      17637      67921
CRLibm             35        127        896
-------------- exp --------------
default libm      151        154        164
MPFR             8951      11970      24460
CRLibm            215        222       1252
-------------- sin --------------
default libm      118        137        174
MPFR             9189      31276      85022
CRLibm             77        198      10513
-------------- cos --------------
default libm      116        136        177
MPFR             4065      21546      66963
CRLibm             69        203       9929
-------------- tan --------------
default libm      126        150        188
MPFR            12525      36528      93573
CRLibm             97        335      24587
-------------- asin --------------
default libm      257        270        278
MPFR            31985     104582     533268
CRLibm             69        319       2442
-------------- acos --------------
default libm      266        281        293
MPFR            32585     102440     533559
CRLibm             99        335       2435
-------------- atan --------------
default libm      178        192        197
MPFR            18172      35185     261762
CRLibm            101        265      11572
-------------- log10 --------------
default libm       67        206       1004
MPFR              157      39135     159891
CRLibm             38        326       1691
-------------- log2 --------------
default libm       68        212       1021
MPFR              155      21607      84683
CRLibm             38        327       1695
-------------- sinpi --------------
default libm      127        136        153
MPFR            12884      20203      70476
CRLibm            324        349       1242
-------------- cospi --------------
default libm      127        136        157
MPFR             8461      13748      37559
CRLibm            914        935        961
-------------- tanpi --------------
default libm      135        148        169
MPFR            15784      23623      59309
CRLibm           2340       2441       2479

The libm library is the least accurate and, on average, the fastest, though not the fastest for every function. The most accurate is always CRLibm, and sometimes it is also the fastest. The MPFR library is the second most accurate and is 50 to over 300 times slower than CRLibm.

Here are a few tests of transcendental functions using round-to-nearest:

# ./crlibm_soaktest acos RN 1 | grep 1000000
CRLIBM : 0 failures out of 1000000 (ratio 0.000000e+00)
LIBM : 275 failures out of 1000000 (ratio 2.750000e-04)
# ./crlibm_soaktest acospi RN 1 | grep 1000000
CRLIBM : 0 failures out of 1000000 (ratio 0.000000e+00)
LIBM : 17393 failures out of 1000000 (ratio 1.739300e-02)
# ./crlibm_soaktest asin RN 1 | grep 1000000
CRLIBM : 0 failures out of 1000000 (ratio 0.000000e+00)
LIBM : 5837 failures out of 1000000 (ratio 5.837000e-03)
# ./crlibm_soaktest asinpi RN 1 | grep 1000000
CRLIBM : 0 failures out of 1000000 (ratio 0.000000e+00)
LIBM : 252488 failures out of 1000000 (ratio 2.524880e-01)
# ./crlibm_soaktest atan RN 1 | grep 1000000
CRLIBM : 0 failures out of 1000000 (ratio 0.000000e+00)
LIBM : 220 failures out of 1000000 (ratio 2.200000e-04)
# ./crlibm_soaktest atanpi RN 1 | grep 1000000
CRLIBM : 0 failures out of 1000000 (ratio 0.000000e+00)
LIBM : 318734 failures out of 1000000 (ratio 3.187340e-01)
# ./crlibm_soaktest cos RN 1 | grep 1000000
CRLIBM : 0 failures out of 1000000 (ratio 0.000000e+00)
LIBM : 413003 failures out of 1000000 (ratio 4.130030e-01)

Here is one example of testing the "cos" function with the other roundings:

# ./crlibm_soaktest cos RU 1 | grep 1000000
CRLIBM : 0 failures out of 1000000 (ratio 0.000000e+00)
LIBM : 695290 failures out of 1000000
(ratio 6.952900e-01)
# ./crlibm_soaktest cos RD 1 | grep 1000000
CRLIBM : 0 failures out of 1000000 (ratio 0.000000e+00)
LIBM : 694636 failures out of 1000000 (ratio 6.946360e-01)
# ./crlibm_soaktest cos RZ 1 | grep 1000000
CRLIBM : 0 failures out of 1000000 (ratio 0.000000e+00)
LIBM : 695241 failures out of 1000000 (ratio 6.952410e-01)

The libm library does better with round-to-nearest; in the other modes it is worse.

Here is another compilation of Paranoia. I replaced the "log" and "pow" functions with crlibm; "sqrt" and "floor" are not implemented in crlibm, so I used libm for those.

# gcc -o paranoia_crlibm paranoia.c -Dlog=log_rn -Dpow=pow_rn -lcrlibm -lm
# ./paranoia_crlibm
...
No failures, defects nor flaws have been discovered.
Rounding appears to conform to the proposed IEEE standard P754,
except for possibly Double Rounding during Gradual Underflow.
The arithmetic diagnosed appears to be Excellent!

Considering the nasty command line used to compile the test, it worked well. After some more testing I am going to see if I can make a patch that takes parts of glibc, fdlibm, and _all_ of crlibm and uses it to replace all instances of "-lm" and "-lfdlibm" in gcc 4.3.0. Then gcc could provide the same answer (exactly correct to a fraction of a ulp) on any platform. The support for PowerPC and Itanium processors will appeal to some. The compilation takes a couple of minutes; the check tests perform many thousands of tests in seconds.

I'm still reading the "proof". Note: "proof" does NOT mean it passes "check tests" - it means there is a mechanical proof (a computer program) that shows, with advanced math, that the algorithms are correct. This will improve our documentation where we make claims of IEEE 754 compatibility (but don't offer it on all platforms).
Subject: Re: Paranoia UCB GSL TestFloat libm tests fail - accuracy of recent gcc math poor

I previously suggested to the crlibm authors that they consider assigning it to the FSF for contribution to libgcc-math
<http://sourceware.org/ml/libc-alpha/2007-02/msg00010.html>.
I don't know if anything has happened on that since then.
> The libm library is the least accurate and on average the fastest; though not
> fastest for every function instance. The most accurate is always CRLibm,
> sometimes it is fastest. The MPFR library is second most accurate and from 50
> to over 300 times slower than CRLibm.

You've shown nothing to validate that crlibm is more accurate than mpfr. So how did you do this measurement?
>> You've shown nothing to validate that crlibm is more accurate than mpfr.
>> So how did you do this measurement?

Read the first section of http://www.mpfr.org/faq.html and either crlibm-0.18beta1.pdf or, better still, crlibm-1.01beta1.pdf (built from the CVS).

I've done some tweaking which further doubles the speed in some cases by playing with the gcc 4.3.0 command-line options. If all you want is _one_ ulp, then the speed increase alone should be enough to prefer it over mpfr; at such a low standard no difference can be found (except in speed and proof, of which mpfr has none).
Created attachment 13709 [details]
Specific example where libm, libcrlibm, and mpfr differ

Here is a specific example of three different math libraries providing three different answers to the same question. The number does not use very many decimal places and thus __could__ come up in common use.

It is not really how likely the number is to be used that matters, but what the consequence of using the answer would be. If your life is important, don't bet it on this number.

This is just one number. How many more could there be? How will you prove you are correct and deduce the actual correct answer in those instances? That is what you must answer. This is why we need a fast, simple library that is accurate and comes with "proof".
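When two libraries return different doubles for the same input, the size of the disagreement is usually stated in ulps, and it can be measured directly from the IEEE-754 bit patterns, since adjacent doubles have adjacent integer encodings. A sketch (the sample values are illustrative, not the ones from attachment 13709):

```python
# Measure the distance between two doubles in ulps via their
# IEEE-754 bit patterns.
import math
import struct

def ulps_apart(a: float, b: float) -> int:
    def ordered(x: float) -> int:
        # Reinterpret the double's bits as a signed 64-bit integer,
        # then flip negatives so the mapping is monotone over all reals.
        n = struct.unpack('<q', struct.pack('<d', x))[0]
        return n if n >= 0 else -(n + 2 ** 63)
    return abs(ordered(a) - ordered(b))

a = 7.38905609893065218              # e.g. one library's answer for exp(2)
b = math.nextafter(a, math.inf)      # the adjacent double above it
print(ulps_apart(a, b))              # -> 1
print(ulps_apart(a, a))              # -> 0
```

A last-place disagreement like the one in the attachment shows up here as 1 ulp; anything larger means at least one of the libraries is not correctly rounded for that input.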
(In reply to comment #20)
> Created an attachment (id=13709) [edit]
> Specific example where libm, libcrlibm, and mpfr differ
>
> Here is a specific example of three different math libraries providing three
> different answers to the same question. The number does not use very many
> decimal places and thus __could__ come up in common use.
>
> It is not really how likely the number would be used that is important but what
> the result of using the answer would be. If your life is important don't be it
> on this number.
>
> This is just one number. How many more could there be, how will you prove you
> are correct and deduce the actual correct answer in those instances. This is
> what you must answer. This is why we need a fast, simple, library that is
> accurate and comes with "proof".

What does this have to do with GCC development? GCC does not supply libm. That's an OS vendor library. GCC can't include crlibm at this time. See Joseph's comment. Additionally, crlibm does not support float or long double.

GCC does not supply mpfr. It uses mpfr to do constant folding. AFAIK, sufficient guard digits are used to achieve the desired accuracy.

Among all the noise that you've generated, I've lost track of what exactly you think is broken. On x86_64-*-*freebsd, with dpara.f from netlib and gfortran 4.3, I get

No failures, defects nor flaws have been discovered.
Rounding appears to conform to the proposed IEEE standard P754
The arithmetic diagnosed appears to be Excellent!
End of Test.

with

gfc4x -o z dpara.f
gfc4x -o z -O dpara.f
gfc4x -o z -O2 dpara.f

I get similar results with spara.f from netlib. What, in 25 or fewer words, am I missing?
Subject: Re: Paranoia UCB GSL TestFloat libm tests fail - accuracy of recent gcc math poor

On Fri, 15 Jun 2007, rob1weld at aol dot com wrote:

> This is just one number. How many more could there be, how will you prove you
> are correct and deduce the actual correct answer in those instances. This is
> what you must answer. This is why we need a fast, simple, library that is
> accurate and comes with "proof".

You are ascribing too much significance to the proofs. The proofs are proofs of bounds on the accumulation of error during the calculation (round-off errors and errors arising from the approximation to the function being used in the algorithm), not of the actual implementations being correct. To go from there to the functions being correct, the proofs rely on:

* The tables of values / polynomial coefficients / ... used in the approximations must have been computed correctly. This relies on another implementation (such as Maple) being correct.
* The exhaustive searches for worst cases for approximation must also have used another correct implementation of the functions at higher precision.
* Where the exhaustive searches haven't been able to cover the whole domain of the function, there must be no particularly bad problem cases outside the area covered.
* The C code for the functions must accurately correspond to the algorithm whose error bounds are proved. Proving things directly about C code is hard in practice.
> Comment #17 From joseph@codesourcery.com 2007-06-13 11:30 [reply] -------
>>> On Tue, 6 Feb 2007, Florent de Dinechin wrote:
>>> We are the maintainers of the crlibm project, which aims at developping a
>>> modern, correctly rounded version of the libm licenced under the LGPL.
>>> Project homepage is https://lipforge.ens-lyon.fr/projects/crlibm/
>>>
>>> I feel that this project is now achieving sufficient maturity to be a
>>> candidate for gradual inclusion in the glibc. I would appreciate any feedback
>>> on this question, and probably some practical advice.
>>>
>> The following are just my views, I don't speak for the glibc maintainers.
>> First, I expect it will be necessary to follow the FSF copyright
>> assignment process. While various parts of glibc at present, including
> ... (many lines deleted)
>> The glibc bug database <http://sources.redhat.com/bugzilla/> has 57 open
>> bugs in the "math" component, mostly suspended until someone comes forward
>> with patches for them. Perhaps your new implementations fix some of them?
> I previously suggested to the crlibm authors that they consider assigning
> it to the FSF for contribution to libgcc-math
> <http://sourceware.org/ml/libc-alpha/2007-02/msg00010.html>. I don't know
> if anything has happened on that since then.

I guess not. When someone offers you something, tell them that. ;)

I did follow the libgcc-math thread. The glibc people refused libgcc-math permission to use glibc code. Numerous patches were backed out, leaving little left.
http://gcc.gnu.org/ml/gcc/2006-05/msg00408.html
http://gcc.gnu.org/gcc-4.3/changes.html#mpfropts
I understand that this is INVALID because of all the points raised in comment #21. If crlibm is better than what we have but we cannot use it, that is the same as if it didn't exist.
(In reply to comment #25)
> I understand that this is INVALID because all the points raised by comment #21.
> If crlibm is better than what we have, but we cannot use it, it is the same as
> if it didn't exist.

It is possible to use various other programs (or libraries, with a bit of programming) and run them separately from a test script to check that both programs give the same result. We could compile and run 'crlibm' (and some other programs) _without_ incorporating its code into gcc, and compare the results of the two programs with each other using 'diff'.

Doing the above is useful for _testing_, but ultimately we should either 'fix' our code to work with the default flags (so the math is not broken) _or_ use the flags that I suggested as the defaults and avoid the problem altogether. We could (but do not have to) use a different math library.

The existence of other code that works correctly proves that we could write our own code differently (properly) and get correct results, instead of insisting that we can only have a wrong result and that we should accept it.
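The diff-based cross-check described above needs nothing from gcc itself: run the same inputs through two independently built implementations and compare the formatted output. A minimal runnable sketch, using awk's libm-backed log and Python's math.log as the two stand-in "programs" (with crlibm installed, one side would be a small crlibm-linked binary instead):

```shell
# Cross-check two independent log implementations by diffing output
# rounded to 15 significant digits.
printf '0.5\n1.0\n2.0\n10.0\n' > inputs.txt

awk '{ printf "%.15g\n", log($1) }' inputs.txt > out_awk.txt

python3 -c 'import math, sys
for line in sys.stdin:
    print("%.15g" % math.log(float(line)))' < inputs.txt > out_py.txt

diff -u out_awk.txt out_py.txt && echo "results agree"
```

On one machine both sides call the same libm, so this mostly checks the harness; the interesting run is diffing output files produced on two different platforms or against a correctly rounded library, where any line that differs is a candidate bug.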