This is the mail archive of the
gcc@gcc.gnu.org
mailing list for the GCC project.
Re: [RFC] Detect most integer overflows.
- From: OndÅej BÃlka <neleai at seznam dot cz>
- To: Geert Bosch <bosch at adacore dot com>
- Cc: Richard Biener <richard dot guenther at gmail dot com>, Hannes Frederic Sowa <hannes at stressinduktion dot org>, "gcc at gnu dot org" <gcc at gnu dot org>
- Date: Wed, 27 Nov 2013 14:18:02 +0100
- Subject: Re: [RFC] Detect most integer overflows.
- Authentication-results: sourceware.org; auth=none
- References: <20131026192912 dot GA25428 at domone dot podge> <20131026235014 dot GF18009 at order dot stressinduktion dot org> <CAFiYyc0+wTbE1FwwLscquWvoEtM6JQw4p5qhnhBmGtVCMkx9fQ at mail dot gmail dot com> <45A6E575-8411-48BC-9353-44C5B817082A at adacore dot com> <20131109074834 dot GA6601 at domone> <DA52BBFE-C894-4DAE-917B-B97A68A7602C at adacore dot com>
On Tue, Nov 26, 2013 at 12:31:00PM -0500, Geert Bosch wrote:
> >> [...]
> >> A few things helped to make the cost small: the biggest one is that
> >> typically on of the operands is known to be negative or positive.
> >> Gigi will use Ada type information, and Natural or Positive integer
> >> variables are very common. So, if you compute X + C with C positive,
> >> you can write the conditional expression as:
> >
> > On x64 efect of this analysis is small, processor does overflow detection for you.
>
> The problem as always is that of pass ordering. If we would encapsulate
> the overflow check some kind of builtin that we'd directly very late in
> the compilation process to processor-specific instructions, then early
> optimizers cannot do their work.
>
That is flaw in gcc design, one can make these transparent to analysis by supplying
a equivalent alternative implementation. As one can supply a
implementation for early passes like in following implementation.
static inline uint64_t
mul_s (uint64_t x, uint64_t y)
{
if (__builtin_constant_p (x < (1L << 32)) && x < (1L << 32) &&
__builtin_constant_p (y < (1L << 32)) && y < (1L << 32))
return x * y;
if (__builtin_constant_p (x) && __builtin_constant_p (y))
if (x != 0 && UINT64_MAX / x < y)
return -1;
else
return x * y;
return builtin_mul_s (x, y);
}
> > By just expanding out to regular additions, comparisons and control
> > flow we can avoid this problem.
> >
> >> (if X < Integer'Last - C then X + C else raise Constraint_Error)
> >>
> >
> >> On my x86-64 this generates something like:
> >> 00 cmpl $0x7fffffff,%de
> >> 06 je 0x0000000c
> >> 08 leal 0x01(%rdi),%eax
> >> 0b ret
> >> [..]
> >
> > This has redundant compare instruction that cost a cycle and 6 bytes.
> > You can just write
> >
> > 0: 83 c7 01 add $0x1,%edi
> > 3: 71 03 jno 0x8
> > [...]
> > When you know that one operand is positive or you deal with unsigned
> > then you could replace jno with jnc which is bit faster on sandy bridge
> > processors and later as add, jnc pair is macro-fused but add jno is not.
>
> Indeed that is ideal, assuming condition flags are dead, but I think
> that the right way to do that is by late combine-like optimizations
> after instruction selection.
>
How would do you that for multiplication?
Addition is handled by gcc almost optimally, in following function gcc
does eliminate cmp but uses cmov instead of jump both on -Os and -O3
leading to larger code size.
uint64_t
add_s (uint64_t x, uint64_t y)
{
if (__builtin_expect(x + y < x, 0))
return -1;
return x + y;
}
assembly is following:
4003c0: 48 89 f8 mov %rdi,%rax
4003c3: 48 c7 c2 ff ff ff ff mov $0xffffffffffffffff,%rdx
4003ca: 48 01 f0 add %rsi,%rax
4003cd: 48 0f 42 c2 cmovb %rdx,%rax
4003d1: c3 retq
btw also when I replace -1 by x * y then cmov stays which is suboptimal
given a expect hint.
> In looking at generated code in actual programs, most checks are
> optimized away. This is more important than saving a cycle here or
> there in the much smaller number of checks that remain.
That specialization is unwarranted, this holds in general case but
calculating allocation size has different use pattern. A typical
use case is passing a size by argument or reading from io and allocating
apppropriate array.