Bug 30484 - INT_MIN % -1 is well defined for -fwrapv but traps on x86
Summary: INT_MIN % -1 is well defined for -fwrapv but traps on x86
Status: ASSIGNED
Alias: None
Product: gcc
Classification: Unclassified
Component: target
Version: 4.1.1
Importance: P3 normal
Target Milestone: ---
Assignee: Richard Biener
URL:
Keywords: wrong-code
Duplicates: 37503
Depends on:
Blocks: 22200
 
Reported: 2007-01-16 15:19 UTC by bagnara
Modified: 2023-10-27 17:55 UTC
CC List: 9 users

See Also:
Host:
Target: x86_64-*-*
Build:
Known to work:
Known to fail:
Last reconfirmed: 2023-09-18 00:00:00


Description bagnara 2007-01-16 15:19:39 UTC
The program below shows (at all the optimization levels) a miscompilation of the remainder expression that causes INT_MIN % -1 to cause a SIGFPE on CPUs of the i386 family.

#include <limits.h>
#include <stdio.h>

int minus_one(int n) {
  return (n+1)*(n-1)-n*n;
}

void p(int x, int y) {
  int z = x % y;
  printf("%d %% %d -> %d\n", x, y, z);
}

int main(int argc, char** argv) {
  p(INT_MIN, minus_one(argc));
}
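
(For illustration: minus_one(n) computes (n+1)*(n-1) - n*n = n*n - 1 - n*n = -1, written so that the optimizer does not see a literal -1 divisor and the division is actually carried out at run time.)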

For simpler programs, the behavior depends on the ability of the optimizer to realize that the divisor is -1, in which case the compiler evaluates the remainder expression (to 0, at compile-time) and no signal is raised.

Since the remainder is always defined, this violates the C standard.

By the way, the Ariane 5 Flight 501 crash was caused by an unexpected exception (http://en.wikipedia.org/wiki/Ariane_5_Flight_501).
Comment 1 Vincent Lefèvre 2007-01-16 22:03:59 UTC
Is this specific to x86? On PowerPC (gcc 4.0.1 from Mac OS X), I get:

-2147483648 % -1 -> -2147483648

Ditto with:

#include <limits.h>
#include <stdio.h>

int main (void)
{
  volatile int i = INT_MIN, j = -1;
  printf ("%d\n", i % j);
  return 0;
}
Comment 2 Vincent Lefèvre 2007-01-16 22:10:37 UTC
-2147483648, this was on a G5, with gcc 4.0.1 under Mac OS X. On a G4 under Linux, with gcc 4.1.2 prerelease (Debian), I get 2147483647.
Comment 3 Andrew Pinski 2007-01-16 22:11:28 UTC
(In reply to comment #1)
> Is this specific to x86? On PowerPC (gcc 4.0.1 from Mac OS X), I get:

This is because the PPC ISA says for divide:
If an attempt is made to perform either of the divisions -- 0x8000_0000 / -1 or <anything> / 0, then the contents of rD are undefined, as are the contents of the LT, GT, and EQ bits of the CR0 field (if Rc = 1).  In this case, if OE = 1 then OV is set.

The 32-bit signed remainder of dividing the contents of rA by the contents of rB can be computed as follows, except in the case that the contents of rA = -2^31 and the contents of rB = -1.
divw rD, rA, rB
mullw rD, rD, rB
subf rD, rD, rA

----------------------
So the ISA in fact even mentions this case :).
Comment 4 Andrew Pinski 2007-01-16 22:15:46 UTC
(In reply to comment #3)
> So the ISA in fact even mentions this case :).
But the PowerPC Compiler Writer's Guide does not talk about that case, hmmm.
Comment 5 Joost VandeVondele 2007-01-17 07:14:43 UTC
(In reply to comment #0)
> The program below shows (at all the optimization levels) a miscompilation of
> the remainder expression that causes INT_MIN % -1 to cause a SIGFPE on CPUs of
> the i386 family.

Notice that this is language-dependent: in Fortran the equivalent of the above 'INT_MIN % -1' is undefined. So, whatever the fix for C and friends, it should not slow down Fortran programs using MOD.
Comment 6 Michael Veksler 2007-01-17 08:49:10 UTC
(In reply to comment #0)
> The program below shows (at all the optimization levels) a miscompilation of
> the remainder expression that causes INT_MIN % -1 to cause a SIGFPE on CPUs of
> the i386 family.
For the record, on Linux i386 this was my suggestion for the best-performing
work-around (fastest at least for all cases other than INT_MIN % -1):
Intercept SIGFPE, and if it comes from
  idivl
then set the remainder to 0, and the quotient to INT_MIN (in C/C++ we are
allowed to do this because INT_MIN/-1 is undefined).

There are several technical problems with this suggestion:

(1)
To avoid interference with user assembly code that expects SIGFPE in case of
INT_MIN/-1 (e.g. -ftrapv), the compiler will have to mark this 
  idivl 
in some special way (e.g. add some useless prefixes, or write something
in one of the ELF sections).

(2)
Who should intercept SIGFPE? User space or kernel?

(2.1)
User space is much more complicated, because it might interfere with
other user set SIGFPE signal handlers. libgcc would have to chain
the signal handlers.

(2.2)
If implemented in the kernel, it will take much more time for this change to
propagate to all users.  It also means that BSD, Hurd, and Cygwin will each
have to use a different fix.
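
For illustration only, here is a minimal user-space sketch of this idea for
x86-64 Linux (not part of the original suggestion): it hard-codes the
two-byte "idivl %esi" encoding and the glibc ucontext REG_* accessors, and it
checks that the divisor really is -1.  A real implementation would have to
decode the trapping instruction in general, or rely on the compiler-emitted
marker from (1).

#define _GNU_SOURCE
#include <signal.h>
#include <ucontext.h>
#include <limits.h>
#include <stdio.h>
#include <string.h>

/* After "idivl %esi" the quotient is in EAX and the remainder in EDX.  */
static void fpe_fixup (int sig, siginfo_t *si, void *ctx)
{
  ucontext_t *uc = ctx;
  unsigned char *insn = (unsigned char *) uc->uc_mcontext.gregs[REG_RIP];

  /* Only handle the 0xF7 0xFE encoding ("idivl %esi") with divisor -1;
     everything else keeps the default fatal behavior.  */
  if (si->si_code == FPE_INTDIV
      && insn[0] == 0xF7 && insn[1] == 0xFE
      && (int) uc->uc_mcontext.gregs[REG_RSI] == -1)
    {
      uc->uc_mcontext.gregs[REG_RAX] = INT_MIN;  /* quotient of INT_MIN / -1 */
      uc->uc_mcontext.gregs[REG_RDX] = 0;        /* remainder of INT_MIN % -1 */
      uc->uc_mcontext.gregs[REG_RIP] += 2;       /* skip the idivl */
      return;
    }
  signal (sig, SIG_DFL);
  raise (sig);
}

int main (void)
{
  struct sigaction sa;
  memset (&sa, 0, sizeof sa);
  sa.sa_sigaction = fpe_fixup;
  sa.sa_flags = SA_SIGINFO;
  sigaction (SIGFPE, &sa, 0);

  int q, r, a = INT_MIN, b = -1;
  /* Force the exact instruction form the handler recognizes.  */
  __asm__ volatile ("cltd\n\tidivl %%esi"
                    : "=a" (q), "=d" (r) : "0" (a), "S" (b) : "cc");
  printf ("%d / %d -> %d, %d %% %d -> %d\n", a, b, q, a, b, r);
  return 0;
}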
Comment 7 nightstrike 2008-09-12 20:32:57 UTC
*** Bug 37503 has been marked as a duplicate of this bug. ***
Comment 8 Joseph S. Myers 2008-09-12 20:39:32 UTC
I suggest an option such as -fdivide-checks, off by default.  -std=c99 and
other conformance options, for languages where INT_MIN % -1 is defined, would
enable this option unless -fno-divide-checks is specified by the user.  -fwrapv
would enable this option unless -fno-divide-checks is specified by the user.

The option would cause checks to be inserted at gimplification time or earlier:
A % B would evaluate A and B for their side effects, then check whether B is -1
and if so evaluate to 0 instead of carrying out the modulo operation.  If
flag_wrapv is set as well, similar checks would be applied to division to catch
INT_MIN / -1.
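
For illustration, a source-level sketch of the semantics of those checks (the
actual proposal would insert them during gimplification, and -fdivide-checks
is only a proposed option name):

int checked_mod (int a, int b)
{
  /* Evaluate both operands, then special-case a divisor of -1.  */
  return b == -1 ? 0 : a % b;
}

int checked_div (int a, int b)
{
  /* For the flag_wrapv case, where -a is well-defined and wraps
     INT_MIN back to INT_MIN.  */
  return b == -1 ? -a : a / b;
}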

If a target macro is defined that says that the implementations of the relevant
RTL insn patterns will generate the desired results (0 for modulo, INT_MIN
for division) without trapping, then the option would have no effect.  I don't
know what processors this might apply to.

libgcc functions for long long division and modulo need checking.  I'd guess
they can be arranged to get this right unconditionally rather than needing to
call different functions in the two modes.
Comment 9 Joseph S. Myers 2008-09-12 20:41:52 UTC
Note libgcc functions would only need to get it right for CPUs defining the
macro, if in other cases the source checks would be inserted anyway.  Or the
macro could depend on the mode of the division/modulo operation.
Comment 10 Joseph S. Myers 2009-02-21 18:45:39 UTC
This issue was discussed on the WG14 reflector in October 2008, and the general
view was that the standard should not make INT_MIN % -1 well defined (as this
would impose a significant performance cost on many programs to benefit very
few) and probably didn't intend to.

There is still a bug for the -fwrapv case, where clearly both INT_MIN / -1
and INT_MIN % -1 should be well defined, but probably the extra checks
if implemented should only be enabled implicitly for -fwrapv, not for C
standards conformance modes.
Comment 11 Vincent Lefèvre 2010-02-19 13:08:09 UTC
(In reply to comment #10)
> This issue was discussed on the WG14 reflector in October 2008, and the general
> view was that the standard should not make INT_MIN % -1 well defined (as this
> would impose a significant performance cost on many programs to benefit very
> few) and probably didn't intend to.

My opinion is that introducing undefined behavior for a particular case like this is a bad idea: if this case can occur in some application, then the programmer would have to add a test anyway (and this would even be more costly, as the test would be needed for all implementations instead of being generated by the compiler only when needed), or the software could behave erratically (which is worse). If this case cannot occur, then the programmer should have a way to tell the compiler so.
Comment 12 Vincent Lefèvre 2021-08-22 23:54:21 UTC
(In reply to Joseph S. Myers from comment #10)
> There is still a bug for the -fwrapv case, where clearly both INT_MIN / -1
> and INT_MIN % -1 should be well defined, but probably the extra checks
> if implemented should only be enabled implicitly for -fwrapv, not for C
> standards conformance modes.

I don't understand why it is still a bug for -fwrapv. Mathematically, INT_MIN % -1 gives 0; there is no wrapping for the modulo. So, -fwrapv shouldn't introduce a change.
Comment 13 jsm-csl@polyomino.org.uk 2021-08-23 19:44:34 UTC
For -fwrapv, the mathematical values of INT_MIN / -1 and INT_MIN % -1 
should be reduced using modulo arithmetic, so both operations are 
well-defined, and there is a bug if either operation causes SIGFPE or 
fails to produce the correct modulo result.
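
(For illustration with 32-bit int: the mathematical quotient of INT_MIN / -1
is 2^31, which reduced modulo 2^32 gives -2^31 = INT_MIN, and the
mathematical remainder is 0.)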

Without -fwrapv, INT_MIN / -1 is not representable, so "otherwise, the 
behavior of both a/b and a%b is undefined" (C11 and later, considered as a 
defect fix for older standard revisions) applies to make both operations 
undefined, and there is no bug.

That is, there is a bug in the -fwrapv case (when the operations should 
give well-defined results but GCC fails to implement that), but not in the 
-fno-wrapv case because both operations are undefined if either one is.
Comment 14 Vincent Lefèvre 2021-08-23 21:06:58 UTC
Well, you could change the definition of -fwrapv in the same way that the standard has changed. I mean that the intent of making INT_MIN / -1 undefined was *only* for a performance reason (not for a mathematical reason). Normally, -fwrapv should be as fast as without it (except for some optimizations like VRP, but I would actually expect a program based on -fwrapv to be faster in general, because the programmer does not have to care about intermediate overflows). However, if INT_MIN % -1 is defined with -fwrapv, you get a performance penalty. And ditto for INT_MIN / -1.
Comment 15 rguenther@suse.de 2021-08-24 06:50:04 UTC
On Mon, 23 Aug 2021, vincent-gcc at vinc17 dot net wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=30484
> 
> --- Comment #14 from Vincent Lefèvre <vincent-gcc at vinc17 dot net> ---
> Well, you could change the definition of -fwrapv in the same way that the
> standard has changed. I mean that the intent of making INT_MIN / -1 undefined
> was *only* for a performance reason (not for a mathematical reason). Normally,
> -fwrapv should be as fast as without it (except for some optimizations like
> VRP, but I would actually expect a program based on -fwrapv to be faster in
> general, because the programmer does not have to care about intermediate
> overflows).

On the contrary: -fwrapv allows the compiler to make fewer assumptions
and thus usually results in slower code.  Given that overflow is undefined
with -fno-wrapv, code generation for the actual operation can produce the
same code as with -fwrapv, so it should never be slower even at the
machine-instruction level.
Comment 16 Vincent Lefèvre 2021-08-24 08:33:07 UTC
The issue is that the source code assuming -fno-wrapv may be more complex, thus giving slower generated code. Here's an example, which consists in adding 3 signed integers, for which the user knows that the sum is representable, so that the only issue is a potential integer overflow in the first addition. I've used GCC 11.2.0 on x86_64.

With -fwrapv, the integer overflow is well-defined as wrapping, so that the user can write:

int f (int a, int b, int c)
{
  return a + b + c;
}

The generated code with -O3 -fwrapv has 2 instructions (the 2 additions):

        addl    %edx, %esi
        leal    (%rsi,%rdi), %eax

But without -fwrapv, one needs to make sure that one doesn't get any integer overflow. Assume that the user knows that there is a single negative number among the 3 integers, so that using this negative number in the first addition will avoid an integer overflow. So the user can write:

int f (int a, int b, int c)
{
  if (b < 0)
    return a + b + c;
  else
    return a + c + b;
}

The generated code with -O3 has 6 instructions:

        leal    (%rdi,%rdx), %eax
        addl    %esi, %edi
        addl    %edx, %edi
        addl    %esi, %eax
        testl   %esi, %esi
        cmovs   %edi, %eax

In theory, the compiler could normally optimize to produce the same code as with the source that assumes -fwrapv (here, a + b + c and a + c + b are obviously equivalent on a typical processor), but in practice, this is often not the case as shown above.
Comment 17 rguenther@suse.de 2021-08-24 08:38:07 UTC
On Tue, 24 Aug 2021, vincent-gcc at vinc17 dot net wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=30484
> 
> --- Comment #16 from Vincent Lefèvre <vincent-gcc at vinc17 dot net> ---
> The issue is that the source code assuming -fno-wrapv may be more complex, thus
> giving slower generated code. Here's an example, which consists in adding 3
> signed integers, for which the user knows that the sum is representable, so
> that the only issue is a potential integer overflow in the first addition. I've
> used GCC 11.2.0 on x86_64.
> 
> With -fwrapv, the integer overflow is well-defined as wrapping, so that the
> user can write:
> 
> int f (int a, int b, int c)
> {
>   return a + b + c;
> }
> 
> The generated code with -O3 -fwrapv has 2 instructions (the 2 additions):
> 
>         addl    %edx, %esi
>         leal    (%rsi,%rdi), %eax
> 
> But without -fwrapv, one needs to make sure that one doesn't get any integer
> overflow. Assume that the user knows that there is a single negative number
> among the 3 integers, so that using this negative number in the first addition
> will avoid an integer overflow. So the user can write:
> 
> int f (int a, int b, int c)
> {
>   if (b < 0)
>     return a + b + c;
>   else
>     return a + c + b;
> }
> 
> The generated code with -O3 has 6 instructions:
> 
>         leal    (%rdi,%rdx), %eax
>         addl    %esi, %edi
>         addl    %edx, %edi
>         addl    %esi, %eax
>         testl   %esi, %esi
>         cmovs   %edi, %eax

True.  The user could have written the following though:

int f (int a, int b, int c)
{
  return (unsigned)a + b + c;
}

or alternatively

int f (int a, int b, int c)
{ 
  return (long)a + b + c;
} 

both of which produce optimal code.

> In theory, the compiler could normally optimize to produce the same code as
> with the source that assumes -fwrapv (here, a + b + c and a + c + b are
> obviously equivalent on a typical processor), but in practice, this is often
> not the case as shown above.
Comment 18 Vincent Lefèvre 2021-08-24 08:47:01 UTC
(In reply to Vincent Lefèvre from comment #16)
> int f (int a, int b, int c)
> {
>   if (b < 0)
>     return a + b + c;
>   else
>     return a + c + b;
> }
> 
> The generated code with -O3 has 6 instructions:
> 
>         leal    (%rdi,%rdx), %eax
>         addl    %esi, %edi
>         addl    %edx, %edi
>         addl    %esi, %eax
>         testl   %esi, %esi
>         cmovs   %edi, %eax
> 
> In theory, the compiler could normally optimize to produce the same code as
> with the source that assumes -fwrapv (here, a + b + c and a + c + b are
> obviously equivalent on a typical processor), but in practice, this is often
> not the case as shown above.

Surprisingly, GCC can optimize this second test to 2 instructions with -fwrapv. I've reported PR102032 about the missed optimization without -fwrapv.
Comment 19 Vincent Lefèvre 2021-08-24 08:49:51 UTC
(In reply to rguenther@suse.de from comment #17)
> True.  The user could have written the following though:
> 
> int f (int a, int b, int c)
> {
>   return (unsigned)a + b + c;
> }

This code is incorrect, as it is based on implementation-defined behavior.

> or alternatively
> 
> int f (int a, int b, int c)
> { 
>   return (long)a + b + c;
> } 

This code is incorrect, and it may yield an integer overflow when long = int, e.g. on 32-bit processors.
Comment 20 Richard Biener 2023-09-18 07:36:01 UTC
Re-confirmed.
Comment 21 Richard Biener 2023-09-18 08:47:14 UTC
operation_could_trap_helper_p is also wrong in only checking for integer_zerop on the divisor, though a literal x / -1 could be optimized to x * -1 and thus
safely rewritten to unsigned arithmetic.  Likewise RTL's may_trap_p_1 is wrong
in the same way (that one can also look at the first operand, though).  We
might also want a SDIV_TRAPS_ON_OVERFLOW (mode) target hook, which is half
of the work to fix this very bug at RTL expansion time (not sure if we want
to diverge earlier).
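
For illustration, a source-level sketch of that safe rewrite for a literal -1
divisor (the unsigned-to-int conversion is implementation-defined in ISO C
but documented by GCC as modulo reduction):

int div_by_minus_one (int x)
{
  /* x / -1 is -x; negating in unsigned arithmetic cannot trap and wraps
     INT_MIN to INT_MIN, which is also the desired -fwrapv result.  */
  return (int) (0u - (unsigned int) x);
}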
Comment 22 Richard Biener 2023-09-18 08:47:32 UTC
Mine.
Comment 23 Richard Biener 2023-09-19 12:21:34 UTC
RTL documents no specific behavior for 'div' but says 'ss_div' "ensures that an out-of-bounds result saturates to the maximum or minimum signed value".

I'm not sure we want target-specific semantics for the RTL IL, but since
x86 at least has a valid 'div' for the case where overflow is undefined, we
probably have to assume that div by minus one might trap.  Practically we
already assume that, given that we simplify division by -1 to negate and
have to assume that division by a non-constant amount traps since it could
be a division by zero.

That means, practically speaking, the -fwrapv issue remains.  Since RTL
doesn't document any specific behavior, we generally have to expand signed
division differently.

Using

  b == -1 ? -a : a / b

will for example generate

        cmpl    $-1, %esi
        je      .L5
        cltd
        idivl   %esi
...

.L5:
        negl    %eax
...


We'd expect the following to pass.  I wonder how many targets FAIL it, thus
whether we can require that RTL sdiv has the desired overflow behavior for
-fwrapv (it always appeared to me that RTL behaves as if -fwrapv).  It
works fine with the libgcc implementation for int128.

/* { dg-additional-options "-fwrapv" } */

#define TYPE int
#define UTYPE unsigned

//#define TYPE __int128_t
//#define UTYPE __uint128_t

TYPE __attribute__((noipa))
div (TYPE a, TYPE b)
{
  return a / b;
}

TYPE __attribute__((noipa))
neg (TYPE a)
{
  return -a;
}

int main()
{
  TYPE min = (TYPE)((UTYPE)1 << (sizeof(TYPE)*8-1));
  if (div (min, -1) != min
      || neg (min) != min)
    __builtin_abort ();
  return 0;
}
Comment 24 Richard Biener 2023-09-19 12:28:02 UTC
So in case we declare this a bug in the backend and we really want to check
flag_wrapv in the machine description, we could widen + truncate or
conditionalize on the minus-one divisor.  There are quite a few division
patterns, though.
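
For illustration, a source-level sketch of the widen + truncate idea for
SImode under -fwrapv (the real change would live in the expanders or the
machine description; the helper names are made up, and the final
unsigned-to-int conversion relies on GCC's documented modulo behavior):

int wrapv_divsi3 (int a, int b)
{
  /* Divide in DImode, where INT_MIN / -1 does not overflow, then
     truncate the result back to SImode.  */
  return (int) (unsigned int) ((long long) a / b);
}

int wrapv_modsi3 (int a, int b)
{
  return (int) (unsigned int) ((long long) a % b);
}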