Bug 48580 - missed optimization: integer overflow checks
Summary: missed optimization: integer overflow checks
Status: NEW
Alias: None
Product: gcc
Classification: Unclassified
Component: middle-end (show other bugs)
Version: 4.6.0
: P3 enhancement
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2011-04-12 18:36 UTC by Zack Weinberg
Modified: 2014-08-24 07:05 UTC (History)
10 users (show)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2011-04-13 12:11:47


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Zack Weinberg 2011-04-12 18:36:06 UTC
To the best of my knowledge, this is the only safe way (without -fwrapv) to check whether the product of two signed integers overflowed:

bool product_does_not_overflow(signed x, signed y)
{
  unsigned tmp = x * unsigned(y);

  return signed(tmp) > 0 && tmp / x == unsigned(y);
}

(I believe C and C++ are the same in this regard but I could be wrong.  If there is a better way to write this test I would love to know about it.)

g++ 4.6 produces this assembly dump on x86-64:

_Z25product_does_not_overflowii:
	movl	%esi, %edx
	xorl	%eax, %eax
	imull	%edi, %edx
	testl	%edx, %edx
	jle	.L2
	movl	%edx, %eax
	xorl	%edx, %edx
	divl	%edi
	cmpl	%eax, %esi
	sete	%al
.L2:
	rep
	ret

but, if I understand the semantics of IMUL correctly, it could do this instead:

_Z25product_does_not_overflowii:
	xorl	%eax, %eax
	imull	%edi, %esi
	setno	%al
	ret

which is a pretty substantial micro-win, particularly in getting rid of a divide.
Comment 1 joseph@codesourcery.com 2011-04-12 20:18:13 UTC
On Tue, 12 Apr 2011, zackw at panix dot com wrote:

> To the best of my knowledge, this is the only safe way (without -fwrapv) to
> check whether the product of two signed integers overflowed:
> 
> bool product_does_not_overflow(signed x, signed y)
> {
>   unsigned tmp = x * unsigned(y);
> 
>   return signed(tmp) > 0 && tmp / x == unsigned(y);
> }

Two signed integers given that they are known to be positive, anyway.  
This may return unexpected results if either or both arguments are 
negative or zero.

I sort of think GCC should have built-in functions exposing C and C++ 
interfaces for: each basic arithmetic operation, defined to wrap; each 
basic arithmetic operation, defined to saturate; each basic arithmetic 
operation, defined to have undefined overflow; each basic arithmetic 
operation, with a separate overflow flag being set; each basic arithmetic 
operation, defined to trap on overflow.  All of these for both signed and 
unsigned and for any desired number of bits (up to the maximum number 
supported for arithmetic, so generally 1-64 bits on 32-bit configurations 
and 1-128 bits on 64-bit configurations); except for the defined-to-trap 
case, all would still have undefined behavior on division by 0.  You could 
then have optimizations mapping generic C idioms to such built-in 
operations where the target supports efficient code for the operations.  
But this rather relies on the no-undefined-overflow work being finished 
first so that some of the required operations actually exist inside GCC, 
before they can easily be exposed to the user.

> which is a pretty substantial micro-win, particularly in getting rid of a
> divide.

(If the function gets called with one constant operand, you can make it 
inline and use __builtin_constant_p to replace a divide with a range check 
on the other operand.  That's only useful for some cases of overflow 
checks, of course.)
Comment 2 Zack Weinberg 2011-04-12 20:40:41 UTC
(In reply to comment #1)
> 
> Two signed integers given that they are known to be positive, anyway.  
> This may return unexpected results if either or both arguments are 
> negative or zero.
...
> (If the function gets called with one constant operand, you can make it 
> inline and use __builtin_constant_p to replace a divide with a range check 
> on the other operand.  That's only useful for some cases of overflow 
> checks, of course.)

In the code that this is cut down from, both arguments are known to be strictly positive, but neither is constant.  (They're only signed for historical reasons, I think, but it would be a huge amount of work to change that.)

> I sort of think GCC should have built-in functions exposing C and C++ 
> interfaces for: each basic arithmetic operation, defined to wrap; each 
> basic arithmetic operation, defined to saturate; each basic arithmetic 
> operation, defined to have undefined overflow; each basic arithmetic 
> operation, with a separate overflow flag being set; each basic arithmetic 
> operation, defined to trap on overflow.  All of these for both signed and 
> unsigned and for any desired number of bits (up to the maximum number 
> supported for arithmetic, so generally 1-64 bits on 32-bit configurations 
> and 1-128 bits on 64-bit configurations); except for the defined-to-trap 
> case, all would still have undefined behavior on division by 0.  You could 
> then have optimizations mapping generic C idioms to such built-in 
> operations where the target supports efficient code for the operations.  
> But this rather relies on the no-undefined-overflow work being finished 
> first so that some of the required operations actually exist inside GCC, 
> before they can easily be exposed to the user.

So you see this as more of a tree-level than an RTL-level missed optimization, then?  Your plan sounds fine to me, although I might look for a less ambitious but more likely to get done soon approach, personally.
Comment 3 joseph@codesourcery.com 2011-04-12 20:52:48 UTC
On Tue, 12 Apr 2011, zackw at panix dot com wrote:

> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48580
> 
> --- Comment #2 from Zack Weinberg <zackw at panix dot com> 2011-04-12 20:40:41 UTC ---
> (In reply to comment #1)
> > 
> > Two signed integers given that they are known to be positive, anyway.  
> > This may return unexpected results if either or both arguments are 
> > negative or zero.
> ...
> > (If the function gets called with one constant operand, you can make it 
> > inline and use __builtin_constant_p to replace a divide with a range check 
> > on the other operand.  That's only useful for some cases of overflow 
> > checks, of course.)
> 
> In the code that this is cut down from, both arguments are known to be strictly
> positive, but neither is constant.  (They're only signed for historical
> reasons, I think, but it would be a huge amount of work to change that.)

My point in noting the need for the integers to be positive was really 
that unless the compiler knows they are positive, the transformation 
you're asking for appears to be incorrect - the semantics of your function 
are that a product with either term 0 counts as overflowing, but using a 
processor overflow flag would report it as not overflowing.

> So you see this as more of a tree-level than an RTL-level missed optimization,
> then?  Your plan sounds fine to me, although I might look for a less ambitious
> but more likely to get done soon approach, personally.

I think it's both levels, but probably tree-level for spotting the 
patterns unless it seems likely that expansion or RTL optimizations will 
produce more instances of them.  And tree-level is where we have VRP 
information, if you want the compiler to tell that the arguments must be 
strictly positive that way.
Comment 4 Zack Weinberg 2011-04-12 21:03:01 UTC
On Tue, Apr 12, 2011 at 1:52 PM, joseph at codesourcery dot com
<gcc-bugzilla@gcc.gnu.org> wrote:
>> In the code that this is cut down from, both arguments are known to be strictly
>> positive, but neither is constant.  (They're only signed for historical
>> reasons, I think, but it would be a huge amount of work to change that.)
>
> My point in noting the need for the integers to be positive was really
> that unless the compiler knows they are positive, the transformation
> you're asking for appears to be incorrect - the semantics of your function
> are that a product with either term 0 counts as overflowing, but using a
> processor overflow flag would report it as not overflowing.

Well, if the compiler didn't know that, it could still use the
overflow flag plus an extra test for either input operand being zero,
couldn't it?  The C idiom has to test for a zero result, because e.g.
0x4000_0000U * 16 wraps to zero.

(The original code does in fact check for x or y  <= 0 in a place
where VRP would notice; I should have said that instead of "known to
be strictly positive", sorry for any confusion.)
Comment 5 Zack Weinberg 2011-04-12 21:04:34 UTC
Addendum: what would *you* describe as the correct C idiom for
ensuring that the product of two signed integers was positive and did
not overflow the range of a same-sized signed integer, assuming
nothing about either multiplicand?
Comment 6 joseph@codesourcery.com 2011-04-12 21:09:53 UTC
On Tue, 12 Apr 2011, zackw at panix dot com wrote:

> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48580
> 
> --- Comment #4 from Zack Weinberg <zackw at panix dot com> 2011-04-12 21:03:01 UTC ---
> On Tue, Apr 12, 2011 at 1:52 PM, joseph at codesourcery dot com
> <gcc-bugzilla@gcc.gnu.org> wrote:
> >> In the code that this is cut down from, both arguments are known to be strictly
> >> positive, but neither is constant.  (They're only signed for historical
> >> reasons, I think, but it would be a huge amount of work to change that.)
> >
> > My point in noting the need for the integers to be positive was really
> > that unless the compiler knows they are positive, the transformation
> > you're asking for appears to be incorrect - the semantics of your function
> > are that a product with either term 0 counts as overflowing, but using a
> > processor overflow flag would report it as not overflowing.
> 
> Well, if the compiler didn't know that, it could still use the
> overflow flag plus an extra test for either input operand being zero,
> couldn't it?  The C idiom has to test for a zero result, because e.g.
> 0x4000_0000U * 16 wraps to zero.

Yes (a check for them being <= 0, that is; I think that function will 
report any case with negative operands as overflowing as well).
Comment 7 joseph@codesourcery.com 2011-04-12 21:16:41 UTC
On Tue, 12 Apr 2011, zackw at panix dot com wrote:

> Addendum: what would *you* describe as the correct C idiom for
> ensuring that the product of two signed integers was positive and did
> not overflow the range of a same-sized signed integer, assuming
> nothing about either multiplicand?

I think that's bordering complicated enough that any two people are likely 
to produce different enough code that it may not be worth detecting - the 
cases where generic C becomes complicated like that are the ones providing 
the best case for built-in operations where you can just say "multiply two 
(signed) values, check whether the result fits in 31-bit unsigned and set 
an overflow flag accordingly".
Comment 8 Richard Biener 2011-04-13 12:11:47 UTC
I too would see it as two pieces, pattern matching on trees to produce
a canonical builtin call and RTL expansion of that for optimal target
code generation (where I don't know whether we can do better than using
UNSPECs).  Note that usually you also want to use the result of the
multiplication (if it didn't overflow), and using just a single
multiplication might be even more complicated (if we need to go the
UNSPEC way).

For the latter reasons I think the builtins should be sth like
__builtin_smul_ovfl_p (multiplication-result, op0, op1), thus pass
in the multiplication result and keep the multiplication itself
in the IL to also allow for regular optimizations to work on them.
If the multiplication is just used as the builtin call argument
expansion can get rid of it.

The set of builtins with defined behavior is still useful, if not
only to allow mixing of wrapv/trapv/... operations in C source.
But it isn't exactly the form I'd like the operations to reside
in the IL.

Pattern-matching the multiplication overflow code can be tricky
because of the various ways the handling of zero operands can
be implemented.  Implementing this entirely on the RTL side seems
very tricky to me.
Comment 9 joseph@codesourcery.com 2011-04-13 12:45:35 UTC
On Wed, 13 Apr 2011, rguenth at gcc dot gnu.org wrote:

> For the latter reasons I think the builtins should be sth like
> __builtin_smul_ovfl_p (multiplication-result, op0, op1), thus pass
> in the multiplication result and keep the multiplication itself
> in the IL to also allow for regular optimizations to work on them.
> If the multiplication is just used as the builtin call argument
> expansion can get rid of it.

My inclination is that we probably want to separate the API for users from 
the built-in functions or other operations used internally - possibly 
providing a header gccarith.h with a supported API and saying the built-in 
functions are subject to change and should not be used directly in source 
code.  Maybe (inspired by the style used on C1X stdatomic.h) one could 
have operations like

arith_mul_signed (a, b, 32, ARITH_SAT)

that multiplies a and b (integers to infinite precision, producing an 
infinite precision result), saturates the result to 32-bit signed and 
returns an implementation-defined type (signed, two's complement, at least 
32 bits), or

arith_mul_signed_check (a, b, 32, ARITH_WRAP, &overflow_p)

that stores an overflow flag in the provided _Bool *.  Or the overflow 
handling (wrap, saturate, undefined, unspecified-value, trap) could be 
part of the macro/function name (putting it as a separate operand is 
inspired by the memory_order parameters in stdatomic.h).

The _check versions with explicit overflow flags could be used with any 
overflow handling other than "undefined behavior".  (So if you don't care 
about what the value is when the overflow flag is set - if you'll just 
call some error handler - then you'd use "unspecified-value" as the 
overflow handling.)  On divide by zero, both division and modulo 
operations would have undefined behavior except for the "trap" case - but 
modulo -1 would always be defined, unlike for C INT_MIN % -1 (I'm thinking 
of these operations as all being defined on their own by producing an 
infinite precision result first, but there might also be operations such 
as divmod producing multiple results).  It may also make sense to expose 
different variations of division/modulo (rounding to 0, -infinity or 
+infinity; a true modulo operation, rounding to -infinity, is one of those 
things C doesn't make easy; GCC has these as TRUNC_DIV_EXPR, 
FLOOR_DIV_EXPR, CEIL_DIV_EXPR and corresponding *_MOD_EXPR).  Starting 
with addition then increasing the set of operations gradually probably 
makes sense.

Along with all the arithmetic operations you get range check operations - 
take a value (in any integer type), and check whether it is in the 
signed/unsigned range with a given number of bits, wrapping/saturating 
etc. and setting overflow flags as needed.  These are just like the above 
but with one fewer operands.

The explicit number of bits (required to be an integer constant 
expression) is deliberate because it is sometimes useful e.g. to saturate 
to 16 bits, but integer promotions in C make it rather fragile to rely on 
16-bit operands remaining 16-bit rather than getting implicitly promoted 
to 32 bits.  A consequence is that operations with strange numbers of bits 
can be written without involving bit-field types (since the result is 
allowed to have more than the given number of bits).  I don't expect those 
to be very efficient in general, but note that ARM for example has SSAT 
and USAT instructions to saturate a value to a given number of bits (up to 
32) and set a flag if it was out of range.
Comment 10 Steven Fuerst 2011-04-13 17:44:22 UTC
There are C and x86 assembly code fragments showing how to do signed and unsigned saturating arithmetic here: http://locklessinc.com/articles/sat_arithmetic/

Although, if there were new intrinsics that allowed direct access to the overflow and carry flags from arithmetic instructions, they would be quite useful.
Comment 11 Joseph S. Myers 2011-06-20 09:46:59 UTC
*** Bug 49467 has been marked as a duplicate of this bug. ***
Comment 12 jules 2011-10-05 12:41:55 UTC
I don't much like the idea of using builtins for operations as fundamental as integer arithmetic. How about this as a straw-man suggestion: adding new qualifiers for "fat" integers-with-flags, somewhat in the spirit of the embedded-C fractional/saturating types? So you might have:

int x, y;

void foo (void)
{
  _Flagged int res;
  
  res = (_Flagged int) x + y;
  
  if (_Carry (res))
    printf ("sum carried\n");
  
  if (_Overflow (res))
    printf ("sum overflowed\n");
}

this avoids problems with global state, and allows for the programming style which (I vaguely get the impression that) people seem to want -- performing a "normal" integer operation, then querying for carry/overflow flags afterwards.

These types wouldn't be allowed to cross ABI boundaries (although of course you could use the magic builtins _Carry/_Overflow to extract values to pass to functions), and "_Overflow" would only be allowed for signed types. You could also have "_Borrow" for subtraction maybe (whose meaning would be the inverse of _Carry).

Signalling variants could look like, e.g.:

void bar (void)
{
  int res;
  
  res = (_Signalling _Flagged int) x + y;
}

Things would get awkward if you tried to use these new constructs in more-complicated expressions, I suppose. Anyway, just an idea.
Comment 13 jules 2011-10-05 13:05:47 UTC
Coming to think of it, if _Sat were allowed on plain integers too, a _Flagged _Sat int could also be queried for saturation using a similar mechanism, like:

int foo (_Sat int x, _Sat int y)
{
  return _Saturated ((_Sat _Flagged) x + y);
}

I'm probably getting ahead of myself :-).
Comment 14 joseph@codesourcery.com 2011-10-05 15:19:01 UTC
On Wed, 5 Oct 2011, jules at gcc dot gnu.org wrote:

> I don't much like the idea of using builtins for operations as fundamental as
> integer arithmetic. How about this as a straw-man suggestion: adding new
> qualifiers for "fat" integers-with-flags, somewhat in the spirit of the
> embedded-C fractional/saturating types? So you might have:

The trouble is that the nature of an operation is more a property of the 
operation than of the type - and the proliferation of types for what 
should be operations on normal types is much of the problem with what the 
embedded-C TR does.  You could have pragmas to say that a cumulative flag 
for a particular scope goes in a particular variable, and that normal 
operations have particular overflow semantics in that scope, maybe.
Comment 15 Martin von Gagern 2013-02-02 14:03:12 UTC
(In reply to comment #7)
> […] built-in operations where you can just say "multiply two 
> (signed) values, check whether the result fits in 31-bit unsigned and set 
> an overflow flag accordingly".

Would be easier to read, easier to maintain, and less difficult to detect. Sounds like an overall win. I'm very much in favor of builtins like these.

(In reply to comment #9)
> arith_mul_signed_check (a, b, 32, ARITH_WRAP, &overflow_p)

I'd rather make the overflow indicator the return value, and transfer the result to a location indicated via a pointer. I.e.

overflow_p = arith_mul_signed_check (a, b, 32, ARITH_WRAP, &a_times_b)

One reason is that one usually wants to check for overflow BEFORE using the result, i.e. the common use case would likely use the above as a condition of some branching construct. A second reason is that this would allow for a flavor which will not modify a_times_b in case of an overflow, which might be useful for some scenarios, particularly if the result is stored in the same location as one of the operands, thus overwriting the operand. So an unchecked a*=b could be transformed into the checked construct

overflow_p = arith_mul_signed_check (a, b, 32, ARITH_CAREFUL, &a)

(In reply to comment #14)
> The trouble is that the nature of an operation is more a property of the 
> operation than of the type

I agree. The main point of all of this is optimization. And in terms of optimization, one would want to examine some flag immediately after an operation setting that flag. One would act upon the flag, and then discard it. If the information were part of the type, one would require extra storage to save those flags, which would lead to a change in the size of these types, which in turn would severely impact many other things.
Comment 16 Jeffrey Walton 2013-02-02 17:01:55 UTC
(In reply to comment #15)
> I agree. The main point of all of this is optimization. And in terms of
> optimization, one would want to examine some flag immediately after an
> operation setting that flag. One would act upon the flag, and then discard it.
I somewhat disagree. A program must be correct; it should be secure; and it can be efficient.

I'm interested in "correct" and "secure". If a program silently overflows, its surely not correct. If an adversary can do something with the errant result, its not secure either.

What's the point in doing something wrong but doing it quickly?

Jeff
Comment 17 Martin von Gagern 2013-02-02 18:54:43 UTC
(In reply to comment #16)
> I somewhat disagree. A program must be correct; it should be secure;
> and it can be efficient. I'm interested in "correct" and "secure".
> If a program silently overflows, its surely not correct.

I'm not talking about silently ignoring overflows, quite the contrary. Always doing the correct thing leads to arbitrary size integers. Checking all (signed) arithmetic leads to -ftrapv. Checking some arithmetic might perhaps be achieved with the signalling types from comment #12, although semantics for mixed types might be problematic. The non-signalling versions will only improve things if one actually checks the additional information after the operation, which might easily be forgotten. Checking individual operations could also (and in my opinion better) be achieved with builtins, and in this case a warning could be issued if the return value indicating the overflow is ignored. Builtins might even allow using specific overflow semantics for code otherwise compiled with -ftrapv, thus increasing the usability of that flag.
Comment 18 Zack Weinberg 2013-02-02 21:59:37 UTC
I find it a little disappointing that what should have been a straightforward additional optimization has gotten totally derailed into bikeshedding of an enormous class of builtins which seem unlikely ever to gain any traction.  I just want the code in the original bug report to be optimized.  That would be enough.
Comment 19 Martin von Gagern 2013-02-02 22:08:09 UTC
Bug 49467 asked about builtins, and got duped here, so small wonder people wanting a builtin-colored bikeshed like I do end up here...
Comment 20 Marc Glisse 2013-05-19 13:03:49 UTC
(In reply to Zack Weinberg from comment #5)
> Addendum: what would *you* describe as the correct C idiom for
> ensuring that the product of two signed integers was positive and did
> not overflow the range of a same-sized signed integer, assuming
> nothing about either multiplicand?

The most natural way to do it depends on whether you have access to a wider type (and may rely on modular casts to signed integer types as guaranteed by gcc). For instance if you want a non-negative result that fits an int:

  return x * (unsigned long long)(y) <= __INT_MAX__;

or if you only care about signed overflow and not the sign of the result:

  long long l = x * (long long)(y);
  return l == (int) l;

The value of the overflow flags after a multiplication doesn't seem modeled in i386.md currently (apart from "clobbered"), so it won't be used, but the code generated is not too horrible.
Comment 21 Martin von Gagern 2014-08-24 07:05:47 UTC
(In reply to myself from comment #15)
> (In reply to comment #7)
> > […] built-in operations where you can just say "multiply two 
> > (signed) values, check whether the result fits in 31-bit unsigned and set 
> > an overflow flag accordingly".
> 
> Would be easier to read, easier to maintain, and less difficult to detect.
> Sounds like an overall win. I'm very much in favor of builtins like these.
> 
> I'd rather make the overflow indicator the return value, and transfer the
> result to a location indicated via a pointer.

Cross reference: bug #59708 is going in this direction, motivated by a precedence set by clang.

(In reply to Zack Weinberg from comment #18)
> I just want the code in the original bug report to be optimized.

I suggested duping bug #49467 there instead, in which case this issue here could return to the originally requested goal of detecting idioms without relying on builtins.