[ Reported to the Debian BTS as report #105309.
  Please CC 105309-quiet@bugs.debian.org on replies.
  Log of report can be found at http://bugs.debian.org/105309 ]

Here's another segment that needs to have an assembler optimiser run
over it:

int foo(char c)
{
        if (c && !(c & 0x80)) {
                a();
        } else {
                b();
        }
}

produces with -O2:

   0:   55                      push   %ebp
   1:   89 e5                   mov    %esp,%ebp
   3:   83 ec 08                sub    $0x8,%esp
   6:   8a 45 08                mov    0x8(%ebp),%al
   9:   84 c0                   test   %al,%al
   b:   74 04                   je     11 <foo+0x11>
   d:   84 c0                   test   %al,%al
   f:   79 07                   jns    18 <foo+0x18>

9-f can be rewritten as:

        test %al, %al
        jg 18

  11:   e8 fc ff ff ff          call   12 <foo+0x12>
                        12: R_386_PC32  b
  16:   c9                      leave
  17:   c3                      ret
  18:   e8 fc ff ff ff          call   19 <foo+0x19>
                        19: R_386_PC32  a
  1d:   eb f7                   jmp    16 <foo+0x16>

And what purpose does this jmp serve?  Surely it can be replaced with

        leave
        ret

  1f:   90                      nop

Release:
3.0 (Debian GNU/Linux)

Environment:
System: Debian GNU/Linux (testing/unstable)
Architecture: i386
host: i386-linux
build: i386-linux
target: i386-linux
configured with: ../src/configure -v --enable-languages=c,c++,java,f77,proto,objc --prefix=/usr --infodir=/share/info --mandir=/share/man --enable-shared --with-gnu-as --with-gnu-ld --with-system-zlib --enable-long-long --enable-nls --without-included-gettext --disable-checking --enable-threads=posix --enable-java-gc=boehm --with-cpp-install-dir=bin --enable-objc-gc i386-linux

State-Changed-From-To: open->analyzed
State-Changed-Why:
    Fold needs to recognize !(x & sign) as x >= 0, at which point
    the regular simplifications take care of it.

    As for leave+ret, that may be handled by Jan's block duplication
    code on cfg-branch.  At present we only emit multiple returns
    when there is no stack frame.
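
The reasoning in the report and in the analysis can be checked by brute force. What follows is only a sketch, not GCC code: the function names are made up for illustration, and the parameter is declared `signed char` because plain `char` is signed on i386 but not on every target. It compares the condition as written in foo() with the single signed comparison c > 0, which is what "test %al,%al; jg" implements, over all 8-bit values.

```c
#include <limits.h>

/* The condition from the report, with the type made explicitly signed
   (hypothetical helper; plain char is signed on i386). */
int original_cond(signed char c)
{
    return c && !(c & 0x80);
}

/* The single signed comparison that "test %al,%al; jg" corresponds to. */
int combined_cond(signed char c)
{
    return c > 0;
}

/* Count the 8-bit values on which the two forms disagree. */
int count_mismatches(void)
{
    int mismatches = 0;
    for (int v = SCHAR_MIN; v <= SCHAR_MAX; v++) {
        signed char c = (signed char)v;
        if (original_cond(c) != combined_cond(c))
            mismatches++;
    }
    return mismatches;
}
```

If count_mismatches() returns 0, the two tests at offsets 9-f can be merged: !(c & 0x80) is c >= 0, and c != 0 together with c >= 0 is c > 0.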

State-Changed-From-To: analyzed->closed
State-Changed-Why:
    This problem has been fixed on mainline by the recent patch:

    2002-05-06  Roger Sayle  <roger@eyesopen.com>

        * fold-const.c (sign_bit_p): New function.
        (fold) [EQ_EXPR]: Use this to convert (A & C) == 0 into A >= 0
        and (A & C) != 0 into A < 0, when constant C is the sign bit
        of A's type.  Reapply fold when converting (A & C) == C into
        (A & C) != 0.

    We now join the two tests together and generate better code.
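
The identities named in the ChangeLog entry can be illustrated at the source level. This is a hedged sketch for the type int, not the fold-const.c implementation: SIGN_BIT and the helper names are invented here, and the mask is built in unsigned arithmetic on the assumption that UINT_MAX == 2 * INT_MAX + 1, as on i386.

```c
#include <limits.h>

/* Sign-bit mask for int (assumed layout: UINT_MAX == 2 * INT_MAX + 1). */
#define SIGN_BIT (~(unsigned int)INT_MAX)

/* (A & C) == 0, which the patch folds to A >= 0. */
int masked_is_zero(int a)
{
    return ((unsigned int)a & SIGN_BIT) == 0;
}

/* (A & C) != 0, which the patch folds to A < 0. */
int masked_is_nonzero(int a)
{
    return ((unsigned int)a & SIGN_BIT) != 0;
}

/* (A & C) == C, which is first rewritten as (A & C) != 0. */
int masked_equals_mask(int a)
{
    return ((unsigned int)a & SIGN_BIT) == SIGN_BIT;
}

/* Returns 1 when all three identities hold for the given value. */
int identities_hold(int a)
{
    return masked_is_zero(a) == (a >= 0)
        && masked_is_nonzero(a) == (a < 0)
        && masked_equals_mask(a) == masked_is_nonzero(a);
}
```

Calling identities_hold() on values such as 0, 1, -1, INT_MAX and INT_MIN should return 1 in each case, since the mask has a single bit set and that bit is set exactly for negative values.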

Herbert Xu writes:

This bug is mostly fixed, but one small detail remains.  The final
jump back to the leave+ret sequence is still there.  Since leave+ret
is a two-byte sequence, it makes sense to replace the jump with a
leave+ret.

Here is the -S -O2 output with gcc 3.3:

foo:
        pushl   %ebp
        movl    %esp, %ebp
        subl    $8, %esp
        cmpb    $0, 8(%ebp)
        jle     .L2
        call    a
.L3:
        leave
        ret
        .p2align 2,,3
.L2:
        call    b
        jmp     .L3

The last jmp should become

        leave
        ret

This is already fixed on mainline.  GCC now produces this:

foo:
        pushl   %ebp
        movl    %esp, %ebp
        subl    $8, %esp
        cmpb    $0, 8(%ebp)
        jle     .L2
        call    a
        leave
        ret
        .p2align 2,,3
.L2:
        call    b
        leave
        ret