Created attachment 40699 [details]
test case

In some very specific code paths, with specific optimization options, the compiler generates VNEG (negate) + VFMA (fused multiply-accumulate) instead of VSUB + VMUL. This is most likely not a problem in most cases, but in the attached test case it leads to inexact results in well-defined code.

The variable "distance", which is essentially the length of the difference of two _IDENTICAL_ vectors multiplied by a constant factor, is expected to be 0: "x - x" for each component of the vectors gives zero, "0 * 0" gives zero, "0 + 0" gives zero, "sqrtf(0)" gives zero, and "0 * x" gives zero. These are basically all the operations in the example. However, the value of "distance" is computed as 1.34925369e-06 and the assertion fails.

Several notes:

1. The code seems complicated, but if I simplify it, a different sequence of instructions is generated; the VNEG + VFMA used instead of VSUB + VMUL is essential to the problem. To generate a slightly different sequence it is enough to uncomment the assertion that checks the equality of the vectors.

2. The problem appears at -O2, -O3 and -Os. It does not appear at -O1 or -Og (probably not at -O0 either, but I did not check).

3. The problem can be observed with GCC 6.3.0 and GCC 5.3.1, with or without ARM patches.

Exact compilation command:

arm-none-eabi-g++ -Wall -Wextra -Wshadow -std=gnu++11 -g -ggdb3 -mcpu=cortex-m4 -mthumb -mfloat-abi=hard -mfpu=fpv4-sp-d16 -O2 -ffunction-sections -fdata-sections -fno-rtti -fno-exceptions -c vfma.cpp -o vfma.o

I also attach the assembly output of the "invalid" version (-O2) and the "valid" version (-O1).
Created attachment 40700 [details] assembly dump of invalid version
Created attachment 40701 [details] assembly dump of valid version
https://gcc.gnu.org/onlinedocs/gcc-6.3.0/gcc/Optimize-Options.html#index-ffp-contract-715

-ffp-contract=style

-ffp-contract=off disables floating-point expression contraction.
-ffp-contract=fast enables floating-point expression contraction such as forming of fused multiply-add operations if the target has native support for them.
-ffp-contract=on enables floating-point expression contraction if allowed by the language standard. This is currently not implemented and treated as equal to -ffp-contract=off.

The default is -ffp-contract=fast.

In this case what is happening is the following. We get:

a * b - a * b;

which gets transformed into:

c = a * b
c1 = -c
d = a * b + c1   (a single fused multiply-add)

This is correct but not what you are expecting. See the note above about -ffp-contract=.
Hello Andrew, and thanks for your answer.

Generally I don't care about the sequence of operations or the exact instructions that get generated, but in this case this default behaviour produces invalid results.

Overall the calculations look like this:

mx = ex - sx;
my = ey - sy;
distance = sqrtf(mx * mx + my * my) * constant;

The important thing here is that ex and sx are bitwise identical, just as ey and sy. Thus everything above can be transformed to:

mx = x - x;
my = y - y;
distance = sqrtf(mx * mx + my * my) * constant;

and then:

mx = 0;
my = 0;
distance = sqrtf(0 * 0 + 0 * 0) * constant;

However you rearrange that, the expected result is 0, and I see no place for "typical" floating-point inaccuracies here. Let me reiterate: "startVector" and "endVector" in my test case are bitwise identical. Yet the code produces 1.34925369e-06 at the end...

The same code compiled at -O2 for x86-64 does not assert. I don't know x86-64 assembly, but I'm pretty sure it supports this kind of instruction.

If the results of VFMA are considered "good enough", I think the default value of -ffp-contract should be changed to "off" - after all, -funsafe-math-optimizations and -ffast-math are not enabled by default either. BTW - VFMA is used also with "-std=c++11".

Thus I think that the bug is not invalid and I ask you to reconsider. Thanks in advance!
> this default behaviour produces invalid results.

No. Read http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.22.6768

> The same code compiled at -O2 for x86-64 does not assert.

That is because the default x86-64 target does not have the fused multiply-add instruction. On a newer Intel (or AMD) machine, add -march=native and you will see the same behavior.

You are misunderstanding what the fused multiply-add instruction does and how floating point works. VFMA is not just multiply and accumulate but rather fused multiply-add: the multiplication is done in infinite precision, the addition is done in that same infinite precision, and only then is the result rounded to the destination precision. With -ffp-contract=off the fused multiply-add instruction is not used, which means there is a rounding between the multiplication and the addition.
> On a newer Intel (or AMD) machine, add -march=native and you will
> see the same behavior.

You are right, adding that switch causes the assert to appear...

> VFMA is not just multiply and accumulate but rather fused multiply-add:
> the multiplication is done in infinite precision, the addition is done
> in that same infinite precision, and only then is the result rounded.

I've read that info already, but in my (limited) understanding subtracting two identical numbers still gives 0 at infinite precision, no matter how you round the result ;) But I get your point - somehow after re-arranging the whole sequence, the result got inexact (from my point of view).

Any advice on how to handle this problem in this particular code? I'd prefer not to just set "-ffp-contract=off" for my whole project, as the toolchain libraries are compiled with contraction enabled anyway (I see A LOT of VFMA in functions like sinf() or cosf()), so that "solution" would be partial at best. Is there any "generic" way to write code that would give the "correct" result here, while still allowing optimizations which don't change accuracy?
double __attribute__((optimize("fp-contract=off")))
x (double a, double b, double c)
{
  return a*b + c;
}

You might also need to mark the function as noinline to prevent it from being inlined into functions where the default still applies.