Traps for signed arithmetic overflow
Segher Boessenkool
segher@kernel.crashing.org
Fri Nov 23 20:28:00 GMT 2018
Hi!
On Fri, Nov 23, 2018 at 09:01:56PM +0100, Helmut Eller wrote:
> when compiling this example with gcc -O2 -ftrapv:
>
> long foo (long x, long y) { return x + y; }
>
> long bar (long x, long y) {
>   long z;
>   if (__builtin_add_overflow (x, y, &z))
>     __builtin_trap ();
>   return z;
> }
>
> then GCC seems to produce less efficient code for foo than for bar:
>
> foo:
>         subq    $8, %rsp
>         call    __addvdi3@PLT
>         addq    $8, %rsp
>         ret
>
> bar:
>         movq    %rdi, %rax
>         addq    %rsi, %rax
>         jo      .L9
>         rep ret
> .L9:
>         ud2
>
> I see several inefficiencies:
>
> 1.) __addvdi3 is not inlined.
It is implemented in libgcc. The x86 target code does not handle addvdi3,
only addvdi4 (the 3-operand form calls abort on overflow; the 4-operand
form jumps to the label given as its 4th operand).
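To make the difference between the two flavours concrete, here is a rough
C model. It is only a sketch: the names model_addv3, model_addv4 and
add_would_overflow are hypothetical and are not GCC or libgcc internals,
and the "jump to the 4th operand" is modelled as a boolean result that the
caller branches on.

```c
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical helper: true if x + y would not fit in a long.
   Written with comparisons only, so the check itself cannot overflow. */
static bool add_would_overflow(long x, long y)
{
    if (y > 0)
        return x > LONG_MAX - y;
    return x < LONG_MIN - y;
}

/* Models the 3-operand flavour: on overflow there is nothing the caller
   can do, the function simply calls abort(). */
long model_addv3(long x, long y)
{
    if (add_would_overflow(x, y))
        abort();
    return x + y;
}

/* Models the 4-operand flavour: overflow is reported to the caller, which
   decides where to go (the real pattern jumps to a label instead). */
bool model_addv4(long x, long y, long *sum)
{
    if (add_would_overflow(x, y))
        return true;
    *sum = x + y;
    return false;
}
```

With the second form the caller chooses what happens on overflow, which is
what lets bar end in a plain conditional jump.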
> 2.) %rsp is adjusted before calling __addvdi3. Why is that needed?
To keep the stack aligned (to 16 bytes).
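The arithmetic behind that, as a small sketch assuming the x86-64 System V
ABI (the stack pointer is a multiple of 16 right before a call, and the
call pushes an 8-byte return address). The function names and the example
addresses are made up for illustration.

```c
#include <stdint.h>

/* Value of %rsp inside the callee, right after the call instruction has
   pushed the 8-byte return address. */
uint64_t rsp_after_call(uint64_t rsp_before_call)
{
    return rsp_before_call - 8;
}

/* Bytes to subtract from %rsp so that it is a multiple of 16 again before
   the next call. The stack grows downwards, so the misalignment is simply
   the remainder modulo 16. */
uint64_t padding_before_call(uint64_t rsp_at_entry)
{
    return rsp_at_entry % 16;
}
```

On entry to foo the remainder is 8, hence the subq $8, %rsp before the
call and the matching addq afterwards.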
> 3.) Obviously __addvdi3 is not implemented as sibling-call even though
> -O2 should enable that.
It calls via the PLT. Do sibling calls via the PLT work in your ABI?
> Where should I start, if I wanted to teach GCC how to produce the same
> code for foo as for bar? Would it be enough to add a pattern to
> i386.md? There is already a pattern for "addv<mode>4", but apparently
> it's not used in this case.
As Marc says, -ftrapv is probably not the way to go.
Adding an addv<mode>3 to the i386 backend might help.
You do *not* want exactly the same code, btw: addv3 calls abort on
overflow, which is not the same as executing a ud2 instruction.
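One way to observe the difference from the outside is a sketch like the
following. It assumes a POSIX system and a compiler that provides
__builtin_trap (GCC or Clang); terminating_signal, fail_with_abort and
fail_with_trap are hypothetical names. It runs each failure path in a
child process and reports which signal killed it.

```c
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* The two failure paths being compared. */
void fail_with_abort(void) { abort(); }
void fail_with_trap(void)  { __builtin_trap(); }

/* Run fn in a child process. Returns the number of the signal that
   terminated the child, 0 if it exited normally, or -1 on error. */
int terminating_signal(void (*fn)(void))
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        fn();
        _exit(0);   /* only reached if fn returns */
    }
    int status = 0;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

abort() ends the child with SIGABRT. On x86-64 Linux the ud2 emitted for
__builtin_trap is normally reported as SIGILL; other targets may use a
different instruction and signal.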
Segher