The program below shows (at all optimization levels) a miscompilation of the remainder expression: INT_MIN % -1 raises SIGFPE on CPUs of the i386 family.

#include <limits.h>
#include <stdio.h>

int minus_one(int n)
{
  return (n+1)*(n-1)-n*n;
}

void p(int x, int y)
{
  int z = x % y;
  printf("%d %% %d -> %d\n", x, y, z);
}

int main(int argc, char** argv)
{
  p(INT_MIN, minus_one(argc));
}

For simpler programs, the behavior depends on the ability of the optimizer to realize that the divisor is -1; in that case the compiler evaluates the remainder expression to 0 at compile time and no signal is raised. Since the remainder is always defined, this violates the C standard. By the way, the Ariane 5 Flight 501 crash was caused by an unexpected exception (http://en.wikipedia.org/wiki/Ariane_5_Flight_501).
Is this specific to x86? On PowerPC (gcc 4.0.1 from Mac OS X), I get:

-2147483648 % -1 -> -2147483648

Ditto with:

#include <limits.h>
#include <stdio.h>

int main (void)
{
  volatile int i = INT_MIN, j = -1;
  printf ("%d\n", i % j);
  return 0;
}
The result -2147483648 above was on a G5, with gcc 4.0.1 under Mac OS X. On a G4 under Linux, with a gcc 4.1.2 prerelease (Debian), I get 2147483647.
(In reply to comment #1)
> Is this specific to x86? On PowerPC (gcc 4.0.1 from Mac OS X), I get:

This is because the PPC ISA says for divide:

  If an attempt is made to perform either of the divisions -- 0x8000_0000 / -1
  or <anything> / 0, then the contents of rD are undefined, as are the
  contents of the LT, GT, and EQ bits of the CR0 field (if Rc = 1). In this
  case, if OE = 1 then OV is set.

  The 32-bit signed remainder of dividing the contents of rA by the contents
  of rB can be computed as follows, except in the case that the contents of
  rA = -2^31 and the contents of rB = -1.

    divw  rD, rA, rB
    mullw rD, rD, rB
    subf  rD, rD, rA

So the ISA in fact even mentions this case :).
(In reply to comment #3)
> So the ISA in fact even mentions this case :).

But the PPC compiler writer's guide does not talk about that case, hmmm.
(In reply to comment #0)
> The program below shows (at all the optimization levels) a miscompilation of
> the remainder expression that causes INT_MIN % -1 to cause a SIGFPE on CPUs
> of the i386 family.

Notice that this is language-dependent: in Fortran, the equivalent of the above INT_MIN % -1 is undefined. So, whatever the fix for C and friends, it should not slow down Fortran programs using MOD.
(In reply to comment #0)
> The program below shows (at all the optimization levels) a miscompilation of
> the remainder expression that causes INT_MIN % -1 to cause a SIGFPE on CPUs
> of the i386 family.

For the record, on Linux i386, this was my suggestion for the best-performing work-around (fastest at least for all cases other than INT_MIN % -1): intercept SIGFPE, and if it comes from idivl then set the remainder to 0 and the quotient to INT_MIN (in C/C++ we are allowed to do this because INT_MIN/-1 is undefined).

There are several technical problems with this suggestion:

(1) To avoid interference with user assembly code that expects SIGFPE in case of INT_MIN/-1 (e.g. -ftrapv), the compiler would have to mark this idivl in some special way (e.g. add some useless prefixes, or write something in one of the ELF sections).

(2) Who should intercept SIGFPE? User space or kernel?

(2.1) User space is much more complicated, because it might interfere with other user-set SIGFPE signal handlers. libgcc would have to chain the signal handlers.

(2.2) If implemented in the kernel, it will take much more time to see this change propagate to all users. This also means that BSD, Hurd and Cygwin would each have to use a different fix.
*** Bug 37503 has been marked as a duplicate of this bug. ***
I suggest an option such as -fdivide-checks, off by default. -std=c99 and other conformance options, for languages where INT_MIN % -1 is defined, would enable this option unless -fno-divide-checks is specified by the user. -fwrapv would enable this option unless -fno-divide-checks is specified by the user.

The option would cause checks to be inserted at gimplification time or earlier: A % B would evaluate A and B for their side effects, then check whether B is -1 and, if so, evaluate to 0 instead of carrying out the modulo operation. If flag_wrapv is set as well, similar checks would be applied to division to catch INT_MIN / -1.

If a target macro is defined that says that the implementations of the relevant RTL insn patterns will generate the desired results (0 for modulo, INT_MIN for division) without trapping, then the option would have no effect. I don't know what processors this might apply to.

libgcc functions for long long division and modulo need checking. I'd guess they can be arranged to get this right unconditionally rather than needing to call different functions in the two modes.
Note libgcc functions would only need to get it right for CPUs defining the macro, if in other cases the source checks would be inserted anyway. Or the macro could depend on the mode of the division/modulo operation.
This issue was discussed on the WG14 reflector in October 2008, and the general view was that the standard should not make INT_MIN % -1 well defined (as this would impose a significant performance cost on many programs to benefit very few) and probably didn't intend to. There is still a bug for the -fwrapv case, where clearly both INT_MIN / -1 and INT_MIN % -1 should be well defined, but probably the extra checks if implemented should only be enabled implicitly for -fwrapv, not for C standards conformance modes.
(In reply to comment #10)
> This issue was discussed on the WG14 reflector in October 2008, and the
> general view was that the standard should not make INT_MIN % -1 well defined
> (as this would impose a significant performance cost on many programs to
> benefit very few) and probably didn't intend to.

My opinion is that introducing undefined behavior for a particular case like that is a bad idea: if this case can occur in some application, then the programmer has to do a test anyway (and this is even more costly, as the test is needed for all implementations instead of being generated by the compiler only when needed), or the software could behave erratically (which is worse). If this case cannot occur, then the programmer should have a way to tell that to the compiler.
(In reply to Joseph S. Myers from comment #10)
> There is still a bug for the -fwrapv case, where clearly both INT_MIN / -1
> and INT_MIN % -1 should be well defined, but probably the extra checks if
> implemented should only be enabled implicitly for -fwrapv, not for C
> standards conformance modes.

I don't understand why it is still a bug for -fwrapv. Mathematically, INT_MIN % -1 gives 0; there is no wrapping for the modulo. So -fwrapv shouldn't introduce a change.
For -fwrapv, the mathematical values of INT_MIN / -1 and INT_MIN % -1 should be reduced using modulo arithmetic, so both operations are well-defined, and there is a bug if either operation causes SIGFPE or fails to produce the correct modulo result. Without -fwrapv, the result of INT_MIN / -1 is not representable, so "otherwise, the behavior of both a/b and a%b is undefined" (C11 and later, considered a defect fix for older standard revisions) applies to make both operations undefined, and there is no bug.

That is, there is a bug in the -fwrapv case (where the operations should give well-defined results but GCC fails to implement that), but not in the -fno-wrapv case, because both operations are undefined if either one is.
Well, you could change the definition of -fwrapv in the same way that the standard has changed. I mean that the intent of making INT_MIN / -1 undefined was *only* for a performance reason (not for a mathematical reason). Normally, -fwrapv should be as fast as without it (except for some optimizations like VRP, but I would actually expect a program based on -fwrapv to be faster in general, because the programmer does not have to care about intermediate overflows). However, if INT_MIN % -1 is defined with -fwrapv, you get a performance penalty. And ditto for INT_MIN / -1.
On Mon, 23 Aug 2021, vincent-gcc at vinc17 dot net wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=30484
>
> --- Comment #14 from Vincent Lefèvre <vincent-gcc at vinc17 dot net> ---
> Well, you could change the definition of -fwrapv in the same way that the
> standard has changed. I mean that the intent of making INT_MIN / -1
> undefined was *only* for a performance reason (not for a mathematical
> reason). Normally, -fwrapv should be as fast as without it (except for some
> optimizations like VRP, but I would actually expect a program based on
> -fwrapv to be faster in general, because the programmer does not have to
> care about intermediate overflows).

On the contrary: -fwrapv allows the compiler to make fewer assumptions and thus usually results in slower code. Given that overflow is undefined with -fno-wrapv, code generation for the actual operation can produce the same code as with -fwrapv, so -fno-wrapv should never be slower even at the machine instruction level.
The issue is that the source code assuming -fno-wrapv may be more complex, thus giving slower generated code. Here's an example, which consists in adding 3 signed integers, for which the user knows that the sum is representable, so that the only issue is a potential integer overflow in the first addition. I've used GCC 11.2.0 on x86_64.

With -fwrapv, the integer overflow is well-defined as wrapping, so that the user can write:

int f (int a, int b, int c)
{
  return a + b + c;
}

The generated code with -O3 -fwrapv has 2 instructions (the 2 additions):

        addl    %edx, %esi
        leal    (%rsi,%rdi), %eax

But without -fwrapv, one needs to make sure that one doesn't get any integer overflow. Assume that the user knows that there is a single negative number among the 3 integers, so that using this negative number in the first addition will avoid an integer overflow. So the user can write:

int f (int a, int b, int c)
{
  if (b < 0)
    return a + b + c;
  else
    return a + c + b;
}

The generated code with -O3 has 6 instructions:

        leal    (%rdi,%rdx), %eax
        addl    %esi, %edi
        addl    %edx, %edi
        addl    %esi, %eax
        testl   %esi, %esi
        cmovs   %edi, %eax

In theory, the compiler could normally optimize to produce the same code as with the source that assumes -fwrapv (here, a + b + c and a + c + b are obviously equivalent on a typical processor), but in practice, this is often not the case as shown above.
On Tue, 24 Aug 2021, vincent-gcc at vinc17 dot net wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=30484
>
> --- Comment #16 from Vincent Lefèvre <vincent-gcc at vinc17 dot net> ---
> The issue is that the source code assuming -fno-wrapv may be more complex,
> thus giving slower generated code. Here's an example, which consists in
> adding 3 signed integers, for which the user knows that the sum is
> representable, so that the only issue is a potential integer overflow in
> the first addition. I've used GCC 11.2.0 on x86_64.
>
> With -fwrapv, the integer overflow is well-defined as wrapping, so that the
> user can write:
>
> int f (int a, int b, int c)
> {
>   return a + b + c;
> }
>
> The generated code with -O3 -fwrapv has 2 instructions (the 2 additions):
>
>         addl    %edx, %esi
>         leal    (%rsi,%rdi), %eax
>
> But without -fwrapv, one needs to make sure that one doesn't get any
> integer overflow. Assume that the user knows that there is a single
> negative number among the 3 integers, so that using this negative number in
> the first addition will avoid an integer overflow. So the user can write:
>
> int f (int a, int b, int c)
> {
>   if (b < 0)
>     return a + b + c;
>   else
>     return a + c + b;
> }
>
> The generated code with -O3 has 6 instructions:
>
>         leal    (%rdi,%rdx), %eax
>         addl    %esi, %edi
>         addl    %edx, %edi
>         addl    %esi, %eax
>         testl   %esi, %esi
>         cmovs   %edi, %eax

True. The user could have written the following though:

int f (int a, int b, int c)
{
  return (unsigned)a + b + c;
}

or alternatively

int f (int a, int b, int c)
{
  return (long)a + b + c;
}

both of which produce optimal code.

> In theory, the compiler could normally optimize to produce the same code as
> with the source that assumes -fwrapv (here, a + b + c and a + c + b are
> obviously equivalent on a typical processor), but in practice, this is
> often not the case as shown above.
(In reply to Vincent Lefèvre from comment #16)
> int f (int a, int b, int c)
> {
>   if (b < 0)
>     return a + b + c;
>   else
>     return a + c + b;
> }
>
> The generated code with -O3 has 6 instructions:
>
>         leal    (%rdi,%rdx), %eax
>         addl    %esi, %edi
>         addl    %edx, %edi
>         addl    %esi, %eax
>         testl   %esi, %esi
>         cmovs   %edi, %eax
>
> In theory, the compiler could normally optimize to produce the same code as
> with the source that assumes -fwrapv (here, a + b + c and a + c + b are
> obviously equivalent on a typical processor), but in practice, this is
> often not the case as shown above.

Surprisingly, GCC can optimize this second test to 2 instructions with -fwrapv. I've reported PR102032 about the missed optimization without -fwrapv.
(In reply to rguenther@suse.de from comment #17)
> True. The user could have written the following though:
>
> int f (int a, int b, int c)
> {
>   return (unsigned)a + b + c;
> }

This code is incorrect, as it relies on implementation-defined behavior (the conversion of an out-of-range unsigned value back to int).

> or alternatively
>
> int f (int a, int b, int c)
> {
>   return (long)a + b + c;
> }

This code is incorrect too: it may yield an integer overflow when long has the same width as int, e.g. on 32-bit processors.
Re-confirmed.
operation_could_trap_helper_p is also wrong in checking only for integer_zerop on the divisor, though a literal x / -1 could be optimized to x * -1 and thus safely rewritten to unsigned arithmetic. Likewise RTL's may_trap_p_1 is wrong in the same way (that one can also look at the first operand, though). We might also want a SDIV_TRAPS_ON_OVERFLOW (mode) target hook, which is half of the work needed to fix this very bug at RTL expansion time (not sure if we want to diverge earlier).
Mine.
RTL documents no specific behavior for 'div', but says 'ss_div' "ensures that an out-of-bounds result saturates to the maximum or minimum signed value". I'm not sure we want target-specific semantics for the RTL IL, but since x86 at least has a valid 'div' for the case where overflow is undefined, we probably have to assume div by minus one might trap. Practically we already assume that, given we simplify division by -1 to negate and have to assume that division by a non-constant amount traps, since it could be a division by zero.

That means, practically speaking, the -fwrapv issue remains. Since RTL doesn't document any specific behavior, we have to generally expand signed division differently. Using b == -1 ? -a : a / b will for example generate

        cmpl    $-1, %esi
        je      .L5
        cltd
        idivl   %esi
        ...
.L5:
        negl    %eax
        ...

We'd expect the following to pass. I wonder how many targets FAIL it, and thus whether we can require that RTL sdiv has the desired overflow behavior for -fwrapv (it always appeared to me that RTL behaves as if -fwrapv). It works fine with the libgcc implementation for int128.

/* { dg-additional-options "-fwrapv" } */

#define TYPE int
#define UTYPE unsigned
//#define TYPE __int128_t
//#define UTYPE __uint128_t

TYPE __attribute__((noipa)) div (TYPE a, TYPE b)
{
  return a / b;
}

TYPE __attribute__((noipa)) neg (TYPE a)
{
  return -a;
}

int main()
{
  TYPE min = (TYPE)((UTYPE)1 << (sizeof(TYPE)*8-1));
  if (div (min, -1) != min
      || neg (min) != min)
    __builtin_abort ();
  return 0;
}
So in case we declare this a bug in the backend and we'd really want to check flag_wrapv in the machine description, we could widen + truncate, or conditionalize on the minus-one divisor. There are quite a few division patterns, though.