17931 – [4.0 Regression] andl and testb are not combined

Summary: [4.0 Regression] andl and testb are not combined

Status:	RESOLVED FIXED

Alias:	None

Product:	gcc
Classification:	Unclassified
Component:	rtl-optimization
Version:	4.0.0

Importance:	P2 minor
Target Milestone:	4.0.0
Assignee:	Kazu Hirata

URL:
Keywords:	missed-optimization, patch

Depends on:
Blocks:

Reported:	2004-10-11 13:52 UTC by Kazu Hirata
Modified:	2004-10-12 17:57 UTC
CC List:	1 user

See Also:
Host:
Target:	i686-pc-linux-gnu
Build:
Known to work:	3.3.3 3.0.4 2.95.3 3.4.0
Known to fail:	4.0.0
Last reconfirmed:	2004-10-11 14:01:46

Description Kazu Hirata 2004-10-11 13:52:27 UTC

Consider:

struct flags {
  unsigned f0 : 1;
  unsigned f1 : 1;
};

_Bool
foo (struct flags *p)
{
  if (p->f0)
    return 1;

  return p->f1;
}

With "cc1 -O2 -fomit-frame-pointer", I get

foo:
	movl	4(%esp), %eax
	movb	(%eax), %dl
	movb	%dl, %al
	andl	$1, %eax
	testb	%al, %al
	jne	.L7
	xorl	%eax, %eax
	testb	$2, %dl
	setne	%al
	ret
	.p2align 2,,3
.L7:
	movl	$1, %eax
	ret

Note that andl and testb are not combined into "testb $1, %al".
If they were, %eax would not be destroyed, so we would not need
the extra copy of the loaded byte ("movb %dl, %al") for the later "testb $2".

With -march=pentium4 or -march=athlon-xp, I get a similar result.
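A minimal C sketch of the two sequences may make the register argument concrete. It is a hypothetical model, not GCC code: registers are plain variables, the function names are invented for illustration, and it assumes (as the "andl $1" and "testb $2" above suggest) that f0 is bit 0 and f1 is bit 1 of the byte loaded from *p.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the i686 code for foo(): registers are plain
   variables.  Assumes f0 is bit 0 and f1 is bit 1 of the loaded byte.  */

/* Current code: "andl $1, %eax" overwrites %al, so the byte has to be
   kept in %dl for the later f1 test.  */
bool seq_andl_testb(uint8_t byte)
{
    uint8_t dl = byte;          /* movb (%eax), %dl                  */
    uint8_t al = dl;            /* movb %dl, %al  (the extra copy)   */
    al &= 1;                    /* andl $1, %eax  (clobbers %al)     */
    if (al != 0)                /* testb %al, %al ; jne .L7          */
        return true;
    return (dl & 2) != 0;       /* testb $2, %dl ; setne %al         */
}

/* Combined code: "testb $1, %al" only sets the flags, so %al still
   holds the loaded byte and no copy is needed.  */
bool seq_testb_imm(uint8_t byte)
{
    uint8_t al = byte;          /* movb (%eax), %al                  */
    if ((al & 1) != 0)          /* testb $1, %al ; jne               */
        return true;
    return (al & 2) != 0;       /* testb $2, %al ; setne             */
}

/* Both sequences compute p->f0 || p->f1 for every possible byte.  */
void check_sequences_agree(void)
{
    for (unsigned v = 0; v < 256; v++) {
        bool expect = (v & 1) != 0 || (v & 2) != 0;
        assert(seq_andl_testb((uint8_t) v) == expect);
        assert(seq_testb_imm((uint8_t) v) == expect);
    }
}
```

Both versions return the same answer; the only difference is that the second never overwrites the register holding the byte, which is what removes the need for %dl.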

Comment 1 Kazu Hirata 2004-10-11 13:54:53 UTC

Actually, we don't need "movl $1, %eax" at the end, either.
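The reasoning can be checked with a small C model of %eax (hypothetical, not GCC code; eax_at_L7 and check_movl_redundant are invented names). After "andl $1, %eax" every bit of %eax except bit 0 is clear, so on the jne path to .L7 the register already holds 1, whatever the pointer left in the upper bits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of %eax on the path to .L7.  eax_in is what %eax
   held before (the pointer p); byte is the value loaded from *p.  */
uint32_t eax_at_L7(uint32_t eax_in, uint8_t byte, bool *taken)
{
    uint32_t eax = (eax_in & 0xFFFFFF00u) | byte;  /* movb %dl, %al            */
    eax &= 1;                                       /* andl $1, %eax            */
    *taken = (eax & 0xFFu) != 0;                    /* testb %al, %al ; jne .L7 */
    return eax;
}

/* Whenever the branch is taken, %eax is already 1, for every byte and
   a few arbitrary pointer values.  */
void check_movl_redundant(void)
{
    const uint32_t ptrs[] = { 0x00000000u, 0x0804A000u, 0xBFFFF3FCu };
    for (unsigned i = 0; i < sizeof ptrs / sizeof ptrs[0]; i++) {
        for (unsigned v = 0; v < 256; v++) {
            bool taken;
            uint32_t eax = eax_at_L7(ptrs[i], (uint8_t) v, &taken);
            if (taken)
                assert(eax == 1);
        }
    }
}
```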

Comment 2 Andrew Pinski 2004-10-11 14:01:45 UTC

Confirmed. combine does not simplify the following pair of insns:
(insn 14 13 15 0 (parallel [
            (set (reg:QI 64)
                (and:QI (reg:QI 63)
                    (const_int 1 [0x1])))
            (clobber (reg:CC 17 flags))
        ]) 206 {*andqi_1} (insn_list:REG_DEP_TRUE 13 (nil))
    (expr_list:REG_UNUSED (reg:CC 17 flags)
        (nil)))

(insn 15 14 16 0 (set (reg:CCZ 17 flags)
        (compare:CCZ (reg:QI 64)
            (const_int 0 [0x0]))) 6 {*cmpqi_ccno_1} (insn_list:REG_DEP_TRUE 14 (nil))
    (expr_list:REG_DEAD (reg:QI 64)
        (nil)))

Comment 3 Andrew Pinski 2004-10-11 14:06:43 UTC

Note that PPC's resulting asm is much better:
        lwz r3,0(r3)
        li r0,1
        cmpwi cr7,r3,0
        blt- cr7,L4
        rlwinm r0,r3,2,31,31
L4:
        mr r3,r0
        blr

It could still be improved to the following, though that is a register allocator problem:
        lwz r0,0(r3)
        cmpwi cr7,r0,0
        li r3,1
        bltlr- cr7
        rlwinm r3,r0,2,31,31
        blr
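For readers less familiar with PowerPC, the following C sketch models what the first listing computes. It is an illustration under stated assumptions: the big-endian bit-field layout puts f0 in the most significant bit of the loaded word and f1 in the next bit down (consistent with the sign test via cmpwi/blt and the rotate by 2); the helper names are hypothetical; and the mask helper only handles masks with mb <= me.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate a 32-bit word left by n bits.  */
uint32_t rotl32(uint32_t x, unsigned n)
{
    n &= 31;
    return n ? (x << n) | (x >> (32 - n)) : x;
}

/* PowerPC mask with bits mb..me set, in IBM numbering (bit 0 = MSB).
   Assumes mb <= me; wrap-around masks are not modeled.  */
uint32_t ppc_mask(unsigned mb, unsigned me)
{
    return (0xFFFFFFFFu >> mb) & (0xFFFFFFFFu << (31 - me));
}

/* rlwinm rA,rS,sh,mb,me: rotate left by sh, then AND with the mask.  */
uint32_t rlwinm(uint32_t rs, unsigned sh, unsigned mb, unsigned me)
{
    return rotl32(rs, sh) & ppc_mask(mb, me);
}

/* Model of the first PPC listing.  Assumes f0 is the MSB of the word
   and f1 the bit below it.  */
uint32_t foo_ppc(uint32_t word)
{
    if (word >> 31)                   /* cmpwi cr7,r3,0 ; blt- cr7,L4 */
        return 1;                     /* li r0,1                      */
    return rlwinm(word, 2, 31, 31);   /* rotating by 2 moves bit 30
                                         (f1) into bit 0, mask keeps it */
}

void check_foo_ppc(void)
{
    assert(ppc_mask(31, 31) == 0x00000001u);
    assert(rlwinm(0x40000000u, 2, 31, 31) == 1);  /* f1 set  */
    assert(foo_ppc(0x80000000u) == 1);            /* f0 set  */
    assert(foo_ppc(0x40000000u) == 1);            /* f1 set  */
    assert(foo_ppc(0x3FFFFFFFu) == 0);            /* neither */
}
```

The single rlwinm does the shift-and-mask work that the i386 listings spread over shrb and andl.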

Comment 4 Andrew Pinski 2004-10-11 14:12:41 UTC

This is a regression from 2.95.3:
        movl 4(%esp),%eax
        testb $1,(%eax)
        jne .L3
        movb (%eax),%al
        shrb $1,%al
        andl $1,%eax
        ret
        .p2align 4,,7
.L3:
        movl $1,%eax
        ret
And 3.0.4:
foo:
        movl    4(%esp), %eax
        testb   $1, (%eax)
        je      .L2
        movl    $1, %eax
        ret
        .p2align 4,,7
.L2:
        movzbl  (%eax), %eax
        shrb    %al
        andl    $1, %eax
        ret
And 3.2.3, 3.3.3, and 3.4.0:
foo:
        movl    4(%esp), %eax
        movl    $1, %edx
        movzbl  (%eax), %eax
        testb   $1, %al
        jne     .L1
        shrb    %al
        movl    %eax, %edx
        andl    $1, %edx
.L1:
        movl    %edx, %eax
        ret
        .size   foo, .-foo
        .section        .note.GNU-stack,"",@progbits
        .ident  "GCC: (GNU) 3.4.0"

Comment 5 Kazu Hirata 2004-10-11 14:47:04 UTC

The combiner does try to combine andl and testb,
but the suggested combined insn is rejected by combine_validate_cost.

The cost of "andl $1, %eax" is 4.
The cost of "testb %al, %al" is 4.
So the original total cost is 8.

The cost of the combined insn, shown below, is 12.

(set (reg:CCZ 17 flags)
    (compare:CCZ (zero_extract:SI (subreg:SI (reg:QI 63) 0)
            (const_int 1 [0x1])
            (const_int 0 [0x0]))
        (const_int 0 [0x0])))
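Why this insn is a valid replacement for the andl/testb pair follows from the semantics of zero_extract. The C sketch below is a hypothetical model (zero_extract and check_extract_matches_and are invented helper names), assuming BITS_BIG_ENDIAN is 0 as on i386, so the position operand counts from the least significant bit. Extracting one bit at position 0 and comparing it with zero asks the same question as ANDing with 1 and testing the result, which is exactly what "testb $1" does.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical C model of RTL (zero_extract X LEN POS), assuming
   BITS_BIG_ENDIAN is 0, so POS counts from the LSB.
   Valid for len >= 1 and len + pos <= 32.  */
uint32_t zero_extract(uint32_t x, unsigned len, unsigned pos)
{
    uint32_t mask = len >= 32 ? 0xFFFFFFFFu : ((1u << len) - 1);
    return (x >> pos) & mask;
}

/* (compare (zero_extract X 1 0) 0) versus (compare (and X 1) 0).
   The upper bits of the paradoxical (subreg:SI (reg:QI 63) 0) are
   arbitrary, so fill them with junk; only bit 0 is ever read.  */
void check_extract_matches_and(void)
{
    for (uint32_t v = 0; v < 256; v++) {
        uint32_t x = 0xA5A5A500u | v;   /* junk upper bits, QI value below */
        assert((zero_extract(x, 1, 0) == 0) == ((v & 1) == 0));
    }
}
```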

We need to teach ix86_rtx_costs to treat
(compare (zero_extract X (const_int 1) ...)) the same as and.
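The cost arithmetic in this comment can be written out as a short C sketch. COSTS_N_INSNS mirrors GCC's macro of the same name (one instruction is 4 units); combination_accepted is a hypothetical simplification of combine_validate_cost, assuming only that a combination is rejected when the new cost exceeds the old total (the real function handles more cases, for example unknown costs). The post-fix cost of 4 is an assumption matching "the same as and", not a figure taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Mirrors GCC's COSTS_N_INSNS: costs are in units of 4 per insn.  */
#define COSTS_N_INSNS(N) ((N) * 4)

/* Hypothetical simplification of the check in combine: reject the
   combination if the new insn costs more than the insns it replaces.  */
bool combination_accepted(int old_total, int new_cost)
{
    return new_cost <= old_total;
}

void check_pr17931_costs(void)
{
    int old_total = COSTS_N_INSNS(1)    /* andl $1, %eax  : 4 */
                  + COSTS_N_INSNS(1);   /* testb %al, %al : 4 */
    assert(old_total == 8);

    /* Before the fix, the compare of the zero_extract cost 12.  */
    assert(!combination_accepted(old_total, COSTS_N_INSNS(3)));

    /* Assumed post-fix cost: priced like a single and, i.e. 4.  */
    assert(combination_accepted(old_total, COSTS_N_INSNS(1)));
}
```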

Comment 6 Kazu Hirata 2004-10-11 15:32:34 UTC

I'll be testing a patch shortly.

Comment 7 Kazu Hirata 2004-10-12 14:11:31 UTC

A patch posted at:
http://gcc.gnu.org/ml/gcc-patches/2004-10/msg00986.html

Comment 8 GCC Commits 2004-10-12 17:21:19 UTC

Subject: Bug 17931

CVSROOT:	/cvs/gcc
Module name:	gcc
Changes by:	kazu@gcc.gnu.org	2004-10-12 17:14:43

Modified files:
	gcc            : ChangeLog 
	gcc/config/i386: i386.c 

Log message:
	PR rtl-optimization/17931
	* config/i386/i386.c (ix86_rtx_costs): Handle COMPARE with
	ZERO_EXTRACT in it.

Patches:
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/ChangeLog.diff?cvsroot=gcc&r1=2.5846&r2=2.5847
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/config/i386/i386.c.diff?cvsroot=gcc&r1=1.734&r2=1.735

Comment 9 Andrew Pinski 2004-10-12 17:57:35 UTC

Fixed.