This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.


Re: [RFC] Detect most integer overflows.


On Tue, Nov 26, 2013 at 12:31:00PM -0500, Geert Bosch wrote:
 
> >> [...]
> >> A few things helped to make the cost small: the biggest one is that
> >> typically one of the operands is known to be negative or positive.
> >> Gigi will use Ada type information, and Natural or Positive integer
> >> variables are very common.  So, if you compute X + C with C positive, 
> >> you can write the conditional expression as:
> > 
> > On x64 the effect of this analysis is small; the processor does overflow detection for you.
> 
> The problem as always is that of pass ordering. If we would encapsulate
> the overflow check in some kind of builtin that we'd lower directly, very
> late in the compilation process, to processor-specific instructions, then
> early optimizers cannot do their work.
>
That is a flaw in the gcc design; one can make these builtins transparent
to analysis by supplying an equivalent alternative implementation for the
early passes, as in the following:
#include <stdint.h>

static inline uint64_t
mul_s (uint64_t x, uint64_t y)
{
  /* If the compiler can prove both operands fit in 32 bits, the
     64-bit product cannot overflow.  */
  if (__builtin_constant_p (x < (1ULL << 32)) && x < (1ULL << 32)
      && __builtin_constant_p (y < (1ULL << 32)) && y < (1ULL << 32))
    return x * y;
  /* Both operands constant: fold the overflow check at compile time.  */
  if (__builtin_constant_p (x) && __builtin_constant_p (y))
    {
      if (x != 0 && UINT64_MAX / x < y)
        return -1;              /* saturate on overflow */
      return x * y;
    }
  /* Otherwise fall back to the overflow-checking builtin.  */
  return builtin_mul_s (x, y);
}
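
For reference, one possible out-of-line fallback that the hypothetical
builtin_mul_s could expand to (a sketch; the name is illustrative, not a
gcc internal):

uint64_t
builtin_mul_s (uint64_t x, uint64_t y)
{
  /* Saturating unsigned 64-bit multiply: return -1 (UINT64_MAX)
     when the product would overflow.  */
  if (x != 0 && UINT64_MAX / x < y)
    return -1;
  return x * y;
}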
 
> > By just expanding out to regular additions, comparisons and control
> > flow we can avoid this problem.
> > 
> >> (if X < Integer'Last - C then X + C else raise Constraint_Error)
> >> 
> > 
> >> On my x86-64 this generates something like:
> >> 00	cmpl	$0x7fffffff,%edi
> >> 06	je	0x0000000c
> >> 08	leal	0x01(%rdi),%eax
> >> 0b	ret
> >> [..]
> > 
> > This has a redundant compare instruction that costs a cycle and 6 bytes.
> > You can just write
> > 
> >   0:	83 c7 01             	add    $0x1,%edi
> >   3:	71 03                	jno    0x8
> > [...]
> > When you know that one operand is positive, or you deal with unsigned
> > types, then you can replace jno with jnc, which is a bit faster on Sandy
> > Bridge processors and later, as the add/jnc pair is macro-fused but
> > add/jno is not.
> 
> Indeed that is ideal, assuming condition flags are dead, but I think
> that the right way to do that is by late combine-like optimizations
> after instruction selection.
> 
How would you do that for multiplication?
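
For illustration, the usual expansion for a 64-bit multiply needs the high
half of the double-width product, which a late combine pass cannot recover
from ordinary compares on the 64-bit result (a sketch using GCC's unsigned
__int128 extension on x86-64):

#include <stdint.h>

/* Saturating 64-bit multiply via the double-width product.  On x86-64
   this typically compiles to a mul plus a test of the high half.  */
uint64_t
mul_s_wide (uint64_t x, uint64_t y)
{
  unsigned __int128 p = (unsigned __int128) x * y;
  if (__builtin_expect ((uint64_t) (p >> 64) != 0, 0))
    return -1;                  /* overflow: saturate */
  return (uint64_t) p;
}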

Addition is handled by gcc almost optimally: in the following function gcc
does eliminate the cmp, but it uses a cmov instead of a jump at both -Os
and -O3, leading to larger code size.

#include <stdint.h>

uint64_t
add_s (uint64_t x, uint64_t y)
{
  /* x + y wraps iff the result is smaller than an operand;
     saturate to -1 (UINT64_MAX) in that case.  */
  if (__builtin_expect (x + y < x, 0))
    return -1;
  return x + y;
}

The generated assembly is the following:

  4003c0:	48 89 f8             	mov    %rdi,%rax
  4003c3:	48 c7 c2 ff ff ff ff 	mov    $0xffffffffffffffff,%rdx
  4003ca:	48 01 f0             	add    %rsi,%rax
  4003cd:	48 0f 42 c2          	cmovb  %rdx,%rax
  4003d1:	c3                   	retq

Btw, when I replace the -1 with x * y, the cmov stays, which is suboptimal
given the expect hint.
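
That variant, for reference (same function, only the cold-path value
changed):

uint64_t
add_s2 (uint64_t x, uint64_t y)
{
  if (__builtin_expect (x + y < x, 0))
    return x * y;   /* cold path; per the observation above, gcc still
                       evaluates it unconditionally for a cmov */
  return x + y;
}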

> In looking at generated code in actual programs, most checks are
> optimized away. This is more important than saving a cycle here or
> there in the much smaller number of checks that remain.

That generalization is unwarranted: it holds in the general case, but
calculating allocation sizes has a different use pattern. A typical use
case is a size passed as an argument, or read from IO, and then used to
allocate an appropriately sized array; for such sizes no range information
is available, so the check cannot be optimized away.
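
For illustration, the pattern in question (a sketch, assuming a 64-bit
size_t; mul_s is the saturating multiply from above, and the saturated
byte count makes the malloc itself fail cleanly):

#include <stdint.h>
#include <stdlib.h>

/* n comes from an argument or from IO, so the compiler has no range
   information and the overflow check inside mul_s cannot be removed.  */
void *
alloc_array (uint64_t n, uint64_t elem_size)
{
  return malloc (mul_s (n, elem_size));   /* UINT64_MAX on overflow */
}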

