This is the mail archive of the mailing list for the GCC project.

Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [RFC] Detect most integer overflows.

On Tue, Nov 26, 2013 at 12:31:00PM -0500, Geert Bosch wrote:
> >> [...]
> >> A few things helped to make the cost small: the biggest one is that
> >> typically one of the operands is known to be negative or positive.
> >> Gigi will use Ada type information, and Natural or Positive integer
> >> variables are very common.  So, if you compute X + C with C positive, 
> >> you can write the conditional expression as:
> > 
> > On x64 the effect of this analysis is small; the processor does overflow detection for you.
> The problem as always is that of pass ordering. If we encapsulate
> the overflow check in some kind of builtin that we'd expand very late in
> the compilation process to processor-specific instructions, then early
> optimizers cannot do their work.
That is a flaw in gcc's design; one could make these builtins transparent
to analysis by supplying an equivalent alternative implementation for the
early passes, like in the following:
static inline uint64_t
mul_s (uint64_t x, uint64_t y)
{
  if (__builtin_constant_p (x < (1L << 32)) && x < (1L << 32)
      && __builtin_constant_p (y < (1L << 32)) && y < (1L << 32))
    return x * y;
  if (__builtin_constant_p (x) && __builtin_constant_p (y))
    {
      if (x != 0 && UINT64_MAX / x < y)
        return -1;
      return x * y;
    }
  return builtin_mul_s (x, y);
}
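The builtin_mul_s above is the hypothetical late-expanded builtin being proposed, so it has no definition here. Purely as an illustration (the saturate-to-(-1)-on-overflow semantics are my assumption, inferred from the constant branch), a portable stand-in could look like:

```c
#include <stdint.h>

/* Illustrative stand-in for the hypothetical builtin_mul_s: assumed
   semantics are "return (uint64_t) -1 when x * y overflows 64 bits",
   mirroring the constant-folding branch in the sketch above.  */
static uint64_t
builtin_mul_s (uint64_t x, uint64_t y)
{
  if (x != 0 && UINT64_MAX / x < y)
    return (uint64_t) -1;       /* overflow: saturate */
  return x * y;
}
```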
> > By just expanding out to regular additions, comparisons and control
> > flow we can avoid this problem.
> > 
> >> (if X < Integer'Last - C then X + C else raise Constraint_Error)
> >> 
> > 
> >> On my x86-64 this generates something like:
> >> 00	cmpl	$0x7fffffff,%edi
> >> 06	je	0x0000000c
> >> 08	leal	0x01(%rdi),%eax
> >> 0b	ret
> >> [..]
> > 
> > This has a redundant compare instruction that costs a cycle and 6 bytes.
> > You can just write
> > 
> >   0:	83 c7 01             	add    $0x1,%edi
> >   3:	71 03                	jno    0x8
> > [...]
> > When you know that one operand is positive, or you deal with unsigned
> > types, then you can replace jno with jnc, which is a bit faster on Sandy
> > Bridge processors and later, as the add/jnc pair is macro-fused but add/jno is not.
> Indeed that is ideal, assuming condition flags are dead, but I think
> that the right way to do that is by late combine-like optimizations
> after instruction selection.
How would you do that for multiplication?
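For multiplication there is no cheap flag to test after the fact in plain C. One possible sketch of a checked 64-bit multiply (assuming a target where GCC provides unsigned __int128; this is not code from the thread) is to widen and test the high half:

```c
#include <stdint.h>

/* Sketch: checked 64-bit multiply via a widening multiply.  Assumes
   unsigned __int128 support; on x86-64 the compiler can lower this to
   a mul plus a test of the high-half register.  */
static uint64_t
mul_check (uint64_t x, uint64_t y)
{
  unsigned __int128 p = (unsigned __int128) x * y;
  if (p >> 64)                  /* nonzero high half: overflow */
    return UINT64_MAX;          /* saturate, matching the -1 convention */
  return (uint64_t) p;
}
```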

Addition is handled by gcc almost optimally: in the following function gcc
does eliminate the cmp, but it uses a cmov instead of a jump on both -Os
and -O3, leading to larger code size.

uint64_t
add_s (uint64_t x, uint64_t y)
{
  if (__builtin_expect (x + y < x, 0))
    return -1;
  return x + y;
}

The assembly is the following:

  4003c0:	48 89 f8             	mov    %rdi,%rax
  4003c3:	48 c7 c2 ff ff ff ff 	mov    $0xffffffffffffffff,%rdx
  4003ca:	48 01 f0             	add    %rsi,%rax
  4003cd:	48 0f 42 c2          	cmovb  %rdx,%rax
  4003d1:	c3                   	retq

Btw, when I replace the -1 by x * y, the cmov stays, which is suboptimal
given the expect hint.

> In looking at generated code in actual programs, most checks are
> optimized away. This is more important than saving a cycle here or
> there in the much smaller number of checks that remain.

That generalization is unwarranted: it holds in the general case, but
calculating an allocation size has a different use pattern. A typical
use case is receiving a size as an argument, or reading it from IO, and
allocating an appropriately sized array.
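As a concrete (hypothetical) illustration of that pattern, with a run-time element count that must be overflow-checked before the allocation:

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical example of the allocation-size use pattern: n comes from
   an argument or from IO, so n * elem_size must be checked, and the
   check cannot be optimized away.  */
static void *
alloc_array (size_t n, size_t elem_size)
{
  if (elem_size != 0 && n > SIZE_MAX / elem_size)
    return NULL;                /* n * elem_size would overflow */
  return malloc (n * elem_size);
}
```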
