Consider:

struct flags {
  unsigned f0 : 1;
  unsigned f1 : 1;
};

_Bool
foo (struct flags *p)
{
  if (p->f0)
    return 1;
  return p->f1;
}

With "cc1 -O2 -fomit-frame-pointer", I get

foo:
        movl    4(%esp), %eax
        movb    (%eax), %dl
        movb    %dl, %al
        andl    $1, %eax
        testb   %al, %al
        jne     .L7
        xorl    %eax, %eax
        testb   $2, %dl
        setne   %al
        ret
        .p2align 2,,3
.L7:
        movl    $1, %eax
        ret

Note that andl and testb are not combined into "testb $1, %al". If they were, we would not destroy %eax, so we would not need to copy %al for later use. With -march=pentium4 or -march=athlon-xp, I get a similar result.
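When comparing the code different compilers generate for this testcase, it helps to have a quick semantic check that any sequence still returns f0 || f1. The following is a minimal sketch; check_foo is a hypothetical helper name, not part of GCC's testsuite.

```c
#include <assert.h>

struct flags { unsigned f0 : 1; unsigned f1 : 1; };

/* Testcase from this report. */
_Bool
foo (struct flags *p)
{
  if (p->f0)
    return 1;
  return p->f1;
}

/* Hypothetical driver: foo must return f0 || f1 for every combination
   of the two bit-fields, whatever instruction sequence is chosen. */
void
check_foo (void)
{
  for (unsigned a = 0; a < 2; a++)
    for (unsigned b = 0; b < 2; b++)
      {
        struct flags s = { a, b };
        assert (foo (&s) == (a || b));
      }
}
```

Calling check_foo() from a small main, built at -O2 with each compiler being compared, catches a miscompilation before you look at the assembly.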
Actually, we don't need "movl $1, %eax" at the end, either.
Confirmed, combine does not simplify:

(insn 14 13 15 0 (parallel [
            (set (reg:QI 64)
                (and:QI (reg:QI 63)
                    (const_int 1 [0x1])))
            (clobber (reg:CC 17 flags))
        ]) 206 {*andqi_1} (insn_list:REG_DEP_TRUE 13 (nil))
    (expr_list:REG_UNUSED (reg:CC 17 flags)
        (nil)))

(insn 15 14 16 0 (set (reg:CCZ 17 flags)
        (compare:CCZ (reg:QI 64)
            (const_int 0 [0x0]))) 6 {*cmpqi_ccno_1} (insn_list:REG_DEP_TRUE 14 (nil))
    (expr_list:REG_DEAD (reg:QI 64)
        (nil)))
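Merging insns 14 and 15 amounts to replacing "AND with 1, then compare with 0" by a single-bit test of the form (compare (zero_extract X 1 0) 0), which is what "testb $1, %al" computes without overwriting %al. The small C model below checks exhaustively that the two forms agree on every 8-bit value. It assumes BITS_BIG_ENDIAN is 0, as on i386, and the helper names (zero_extract, and_then_compare, compare_zero_extract, check_equivalence) are hypothetical illustrations, not GCC functions.

```c
#include <assert.h>

/* Simplified model of RTL (zero_extract X LEN POS) with BITS_BIG_ENDIAN == 0
   (as on i386): LEN bits of X starting POS bits above the LSB.
   Assumes LEN < 32. */
static unsigned zero_extract (unsigned x, unsigned len, unsigned pos)
{
  return (x >> pos) & ((1u << len) - 1u);
}

/* Insns 14 and 15: (reg 64) = (and (reg 63) 1), then compare (reg 64)
   with 0.  Returns 1 when the bit is set (Z flag clear). */
static int and_then_compare (unsigned char r63)
{
  unsigned char r64 = r63 & 1;   /* the AND writes a new value */
  return r64 != 0;
}

/* Combined form: (compare (zero_extract r63 1 0) 0); r63 is only read. */
static int compare_zero_extract (unsigned char r63)
{
  return zero_extract (r63, 1, 0) != 0;
}

/* The two forms agree on every QImode value. */
void check_equivalence (void)
{
  for (unsigned v = 0; v < 256; v++)
    assert (and_then_compare ((unsigned char) v)
            == compare_zero_extract ((unsigned char) v));
}
```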
Note that PPC's resulting asm is much better:

        lwz r3,0(r3)
        li r0,1
        cmpwi cr7,r3,0
        blt- cr7,L4
        rlwinm r0,r3,2,31,31
L4:
        mr r3,r0
        blr

It can still be improved to the following, though that is a register allocator problem:

        lwz r0,0(r3)
        cmpwi cr7,r0,0
        li r3,1
        bltlr- cr7
        rlwinm r3,r0,2,31,31
        blr
This is a regression from 2.95.3:

        movl 4(%esp),%eax
        testb $1,(%eax)
        jne .L3
        movb (%eax),%al
        shrb $1,%al
        andl $1,%eax
        ret
        .p2align 4,,7
.L3:
        movl $1,%eax
        ret

And 3.0.4:

foo:
        movl 4(%esp), %eax
        testb $1, (%eax)
        je .L2
        movl $1, %eax
        ret
        .p2align 4,,7
.L2:
        movzbl (%eax), %eax
        shrb %al
        andl $1, %eax
        ret

And 3.2.3, 3.3.3, and 3.4.0:

foo:
        movl 4(%esp), %eax
        movl $1, %edx
        movzbl (%eax), %eax
        testb $1, %al
        jne .L1
        shrb %al
        movl %eax, %edx
        andl $1, %edx
.L1:
        movl %edx, %eax
        ret
        .size foo, .-foo
        .section .note.GNU-stack,"",@progbits
        .ident "GCC: (GNU) 3.4.0"
The combiner does try to combine andl and testb, but the proposed combined insn is rejected by combine_validate_cost. The cost of "andl $1, %eax" is 4 and the cost of "testb %al, %al" is 4, so the original total cost is 8. The cost of the combined insn, shown below, is 12, so the combination is rejected.

(set (reg:CCZ 17 flags)
    (compare:CCZ (zero_extract:SI (subreg:SI (reg:QI 63) 0)
            (const_int 1 [0x1])
            (const_int 0 [0x0]))
        (const_int 0 [0x0])))

We need to teach ix86_rtx_costs to treat (compare (zero_extract X (const_int 1) ...) (const_int 0)) the same as an AND.
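The cost arithmetic above can be sketched as a tiny model of the accept/reject decision. This is a hypothetical simplification, not the code of combine_validate_cost, which handles further details: it only encodes "reject if the new insn costs more than the insns it replaces". The value 4 used for an AND-priced combined insn is an assumption for illustration, not a figure taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified model of combine's cost check (NOT the GCC
   source): the combined insn replaces the two originals only if it is
   not more expensive than their summed cost. */
static bool combine_accepts (int i2_cost, int i3_cost, int new_cost)
{
  int old_cost = i2_cost + i3_cost;
  return new_cost <= old_cost;
}

/* Costs reported in this PR before the fix: andl = 4, testb = 4,
   combined compare/zero_extract = 12. */
void check_pr17931_costs (void)
{
  assert (!combine_accepts (4, 4, 12));  /* 12 > 8: rejected */
  /* Assumption: if the cost hook priced the combined insn like an AND
     (4 in this model), the combination would be accepted. */
  assert (combine_accepts (4, 4, 4));
}
```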
I'll be testing a patch shortly.
A patch has been posted at: http://gcc.gnu.org/ml/gcc-patches/2004-10/msg00986.html
Subject: Bug 17931

CVSROOT:        /cvs/gcc
Module name:    gcc
Changes by:     kazu@gcc.gnu.org        2004-10-12 17:14:43

Modified files:
        gcc            : ChangeLog
        gcc/config/i386: i386.c

Log message:
        PR rtl-optimization/17931
        * config/i386/i386.c (ix86_rtx_costs): Handle COMPARE with
        ZERO_EXTRACT in it.

Patches:
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/ChangeLog.diff?cvsroot=gcc&r1=2.5846&r2=2.5847
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/config/i386/i386.c.diff?cvsroot=gcc&r1=1.734&r2=1.735
Fixed.