The following testcase fails on s390x and Power. Constant folding and runtime execution of a division of complex numbers produce different results. The testcase works fine on x86 so it looks like S/390 and Power do something different here. It looks somewhat like: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=30789 #include <complex.h> #include <stdio.h> #include <stdlib.h> _Complex float __attribute__ ((noinline)) calc (_Complex float a, _Complex float b) { return a / b; } int main (int argc, char **argv) { _Complex float a = calc (10 + 6 * I, 5 + 12 * I); _Complex float b = (10 + 6 * I) / (5 + 12 * I); printf ("%ap + %ap * i\n", creal (a), cimag (a)); printf ("%ap + %ap * i\n", creal (b), cimag (b)); if (a != b) abort (); return 0; } gcc -O0 t.c -o f ./f 0x1.719c08p-1p + -0x1.10a9a8p-1p * i 0x1.719c06p-1p + -0x1.10a9a8p-1p * i Aborted
There are no specified accuracy requirements for complex multiplication / division, even under Annex G (parts of which - imaginary types in particular - are not implemented in GCC at present), beyond certain requirements for special cases and avoiding certain cases of overflow from the simplest formulas. Constant folding is correctly rounding, runtime complex multiplication / division isn't (and given complaints about slowness at present, I don't think users would want it to be even slower, though there may well be a case for defining a standard library interface for correctly rounding complex multiplication / division). x86 probably benefits from excess precision being used implicitly by GCC when compiling the implementations of complex multiplication and division.
I think that it needs to be decided on a case-by-case basis whether the runtime complex division routine is "precise enough". But yes, you generally cannot expect constant folding and runtime execution to produce the exact same result. This is FP after all ... (I would expect it for operations that are specified to be rounded correctly to 0.5ulp precision though) Note that the goal we have instead is that cross-compiling from all hosts produces the same constant folding results for a target (code generation doesn't depend on the host).
I'll keep the bugreport open with low prio. If I find the time I will at least try to understand what's going on before closing it. The testcase is extracted from gcc/testsuite/go.test/test/ken/cplx2.go which fails due to this problem currently on s390.
Could this because of the use of fma for s390 and PPC inside the division code?
*** Bug 63172 has been marked as a duplicate of this bug. ***
If the last comment is true, does that mean the fold_const.c file in gcc should be built in a way so that it doesn't use the fma, like using some kind of option during the build of gcc at least for this file on the s390 and Power?
confirmed.
(In reply to boger from comment #6) > If the last comment is true, does that mean the fold_const.c file in gcc > should be built in a way so that it doesn't use the fma, like using some > kind of option during the build of gcc at least for this file on the s390 > and Power? No it is fma usage in the runtime sources and not in the fold-const case.
Here is another example: $ cat t.c #include <complex.h> #include <stdio.h> #include <stdlib.h> _Complex double __attribute__ ((noinline)) calc_div (_Complex double a, _Complex double b) { return a / b; } _Complex double __attribute__ ((noinline)) calc_mul (_Complex double a, _Complex double b) { return a * b; } int main (int argc, char **argv) { double x, y, z, t; _Complex double a, b, c, d; x = -1.0689346237980500e+252; y = 7.8846096489686452e-145; z = 5.1152592576527620e-318; t = 6.7327111521288819e+78; a = calc_div (x + y * I, z + t * I); b = (x + y * I) / (z + t * I); printf ("calc_div: %ap + %ap * i\n", creal (a), cimag (a)); printf ("div: %ap + %ap * i\n", creal (b), cimag (b)); x = -4.8798814420289904e-22; y = -6.4569205261627488e+209; z = -1.0918976936190148e+147; t = -8.2942999263695497e-85; c = calc_mul (x + y * I, z + t * I); d = (x + y * I) * (z + t * I); printf ("calc_mul: %ap + %ap * i\n", creal (c), cimag (c)); printf ("mul: %ap + %ap * i\n", creal (d), cimag (d)); return 0; } With gcc version 9.2.1 20200123 (Debian 9.2.1-25) on x86_64 I get: $ gcc -O0 t.c; ./a.out calc_div: 0x1.5ac8471ecdabp-741p + 0x1.48aa457eda0eep+575p * i div: 0x1.5ac8471ecdabp-741p + 0x1.48aa457eda0eep+575p * i calc_mul: -0x1.07a6046b053p+410p + infp * i mul: -0x1.07a6046b053p+410p + infp * i $ gcc -O2 t.c; ./a.out calc_div: 0x1.5ac8471ecdabp-741p + 0x1.48aa457eda0eep+575p * i div: -0x1.4d35f7c831ef5p-746p + 0x1.48aa457eda0eep+575p * i calc_mul: -0x1.07a6046b053p+410p + infp * i mul: -0x1.07a6046b053p+410p + infp * i For division, the correctly rounded result is the one for "div" with -O2 (negative real part). For multiplication, all results are incorrectly rounded, we should get -0x1.07a6046b05347p410 if I'm correct. I understand there is no accuracy requirement, but I thought with -O2 gcc was using GNU MPC to get correct rounding, thus with -O2 both the "div" and "mul" results should be correctly rounded. Why isn't it the case for mul?
Flags like -ftrapping-math can prevent gcc from folding at compile-time when the result is infinite (or maybe it always refuses to fold in that case). In your example, gcc generates a runtime call to __muldc3 in main. The behavior with -fno-trapping-math is a bit surprising. CCP folds the division but not the multiplication (not even with -ffast-math)). CPLXLOWER turns it into __muldc3, and DOM2 folds that to a (bad?) constant. So in some sense, when gcc fails to fold to a constant, then we go to the runtime behavior, and gcc can still fold that one to a constant (not as good as what it could get in the first round).