Bug 19331 - [4.0 Regression] Inefficient code generated for bitfield assignment
Summary: [4.0 Regression] Inefficient code generated for bitfield assignment
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: middle-end (show other bugs)
Version: 4.0.0
: P3 minor
Target Milestone: 4.0.0
Assignee: Roger Sayle
URL:
Keywords: missed-optimization
Depends on:
Blocks: 19466
  Show dependency treegraph
 
Reported: 2005-01-08 18:24 UTC by Giovanni Bajo
Modified: 2005-03-14 03:58 UTC (History)
8 users (show)

See Also:
Host:
Target: i?86-*-* avr
Build:
Known to work: 3.4.4 4.1.0
Known to fail:
Last reconfirmed: 2005-01-27 00:22:51


Attachments
Testcase (365 bytes, text/plain)
2005-01-08 18:24 UTC, Giovanni Bajo
Details
Testcase (the other was wrong) (368 bytes, text/plain)
2005-01-08 18:27 UTC, Giovanni Bajo
Details
Inefficient code generated with 4.0 (340 bytes, text/plain)
2005-01-08 18:28 UTC, Giovanni Bajo
Details
Efficient code generated with 3.4 (334 bytes, text/plain)
2005-01-08 18:28 UTC, Giovanni Bajo
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Giovanni Bajo 2005-01-08 18:24:01 UTC
The attached testcase shows a code generation regression which can be seen on 
x86 but I believe it to be target indepedent.

The code is copying data between two structures, both of which contain 
bitfields. The source structure uses 1-bit bitfields of type "int", while the 
destination structure uses 1-bit bitfields of type "_Bool". At the end of the 
tree optimizers, the code looks like this:

    params.cliplmt_ch1 = (int) msg.cliplmt_ch1 != 0;
    params.cliplmt_ch2 = (int) msg.cliplmt_ch2 != 0;
    params.gate_ch1 = (int) msg.gate_ch1 != 0;
    params.gate_ch2 = (int) msg.gate_ch2 != 0;

The problem is that the comparison with zero is never removed by the 
optimizers. The generated code looks like this:

	testb	$1, %bl
	setne	%cl

In 3.4, instead, the generated code looks better:

	andl	$1, %ebx



Notice this this does not happen in C++ (using a "bool" bitfield instead 
of "_Bool"). We get a BITFIELD_REF there:

    D.1643 = BIT_FIELD_REF <msg, 8, 40>;
    params.cliplmt_ch1 = (D.1643 & 1) != 0;
    params.cliplmt_ch2 = (D.1643 & 2) != 0;
    params.gate_ch1 = (D.1643 & 4) != 0;
    params.gate_ch2 = (D.1643 & 8) != 0;

and the generated code is optimal.
Comment 1 Giovanni Bajo 2005-01-08 18:24:34 UTC
Created attachment 7900 [details]
Testcase
Comment 2 Andrew Pinski 2005-01-08 18:26:43 UTC
I think this is related to PR 18008 (or maybe even a dup).
Comment 3 Giovanni Bajo 2005-01-08 18:27:40 UTC
Created attachment 7901 [details]
Testcase (the other was wrong)
Comment 4 Giovanni Bajo 2005-01-08 18:28:19 UTC
Created attachment 7902 [details]
Inefficient code generated with 4.0
Comment 5 Andrew Pinski 2005-01-08 18:28:25 UTC
Also I think it comes changing the C front-end representation of bitfields to be more correct.
Comment 6 Giovanni Bajo 2005-01-08 18:28:46 UTC
Created attachment 7903 [details]
Efficient code generated with 3.4
Comment 7 Andrew Pinski 2005-01-08 18:37:07 UTC
Hmm, on PPC, the mainline produces much better code than 3.3.2.
3.3.2:
        lbz r6,69(r1)
        la r2,lo16(_params)(r7)
        lwz r3,48(r2)
        addi r1,r1,80
        rlwinm r5,r6,26,31,31
        rlwinm r4,r6,27,31,31
        rlwinm r0,r6,28,31,31
        rlwimi r3,r6,23,1,1
        rlwimi r3,r5,29,2,2
        rlwimi r3,r4,28,3,3
        rlwimi r3,r0,27,4,4
        stw r3,48(r2)

mainline:
        lwz r2,60(r1)
        lwz r0,48(r9)
        addi r1,r1,80
        rlwimi r0,r2,7,1,1
        rlwimi r0,r2,7,2,2
        rlwimi r0,r2,7,3,3
        rlwimi r0,r2,7,4,4
        stw r0,48(r9)
Comment 8 Andrew Pinski 2005-01-08 19:02:41 UTC
AVR also has the same problem, I have no idea where the problem is right now (I have to compare the 
initial RTL).
Comment 9 Steven Bosscher 2005-01-27 00:22:51 UTC
Not fixed by the patch for PR18008 
Comment 10 Andrew Pinski 2005-02-01 00:32:55 UTC
This changed between 20040708 and 20040709.
Which means that it was most likely caused by:
2004-07-08  Joseph S. Myers  <jsm@polyomino.org.uk>
            Neil Booth  <neil@daikokuya.co.uk>

        PR c/2511
        PR c/3325

Which is what I thought as we get optimial code from the C++ front-end for x86 though on ppc we get 
the same optimial code for both the C and C++ front-ends.
Comment 11 Andrew Pinski 2005-02-01 05:57:04 UTC
Hmm, maybe gcc should be able to optimize the following RTL better when combining them (if gcc 
does combine them):
(insn 19 18 20 0 (set (reg:CCZ 17 flags)
        (compare:CCZ (zero_extract:SI (subreg:DI (reg:QI 61 [+5 ]) 0)
                (const_int 1 [0x1])
                (const_int 0 [0x0]))
            (const_int 0 [0x0]))) 283 {*testqi_ext_3} (insn_list:REG_DEP_TRUE 16 (nil))
    (nil))
...
(insn 23 22 24 0 (set (reg:QI 68)
        (ne:QI (reg:CCZ 17 flags)
            (const_int 0 [0x0]))) 480 {*setcc_1} (insn_list:REG_DEP_TRUE 19 (nil))
    (expr_list:REG_DEAD (reg:CC 17 flags)
        (nil)))

And we should get:
(set (reg:QI 68) (zero_extract:SI (subreg:DI (reg:QI 61 [+5 ]) 0)
                                                 (const_int 1 [0x1])
                                                 (const_int 0 [0x0])))

because we know that what the compare compares against can only be 1 or 0 as it is a zero_extract of 
size 1.
Comment 12 Giovanni Bajo 2005-02-03 09:27:31 UTC
Patch posted by Roger:
http://gcc.gnu.org/ml/gcc-patches/2005-02/msg00205.html
(thanks!)
Comment 13 Giovanni Bajo 2005-03-13 15:08:48 UTC
The above link is broken because of the gcc.gnu.org hd crash. Roger, I can't 
understand if this bug was fixed or not. Can you elaborate?
Comment 14 CVS Commits 2005-03-13 22:34:14 UTC
Subject: Bug 19331

CVSROOT:	/cvs/gcc
Module name:	gcc
Changes by:	sayle@gcc.gnu.org	2005-03-13 22:34:07

Modified files:
	gcc            : ChangeLog tree.c fold-const.c 

Log message:
	PR middle-end/19331
	* tree.c (get_unwidened): Treat CONVERT_EXPR and NOP_EXPR identically.
	* fold-const.c (fold_sign_changed_comparison): Likewise.
	(fold_binary): Optimize comparisons against widened operands if
	the extension is represented by a CONVERT_EXPR, same as a NOP_EXPR.

Patches:
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/ChangeLog.diff?cvsroot=gcc&r1=2.7828&r2=2.7829
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/tree.c.diff?cvsroot=gcc&r1=1.467&r2=1.468
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/fold-const.c.diff?cvsroot=gcc&r1=1.541&r2=1.542

Comment 15 Andrew Pinski 2005-03-13 23:29:17 UTC
Fixed at least in 4.1.0.
Comment 16 CVS Commits 2005-03-14 03:48:58 UTC
Subject: Bug 19331

CVSROOT:	/cvs/gcc
Module name:	gcc
Branch: 	gcc-4_0-branch
Changes by:	sayle@gcc.gnu.org	2005-03-14 03:48:51

Modified files:
	gcc            : ChangeLog tree.c fold-const.c 

Log message:
	PR middle-end/19331
	* tree.c (get_unwidened): Treat CONVERT_EXPR and NOP_EXPR identically.
	* fold-const.c (fold_sign_changed_comparison): Likewise.
	(fold_binary): Optimize comparisons against widened operands if
	the extension is represented by a CONVERT_EXPR, same as a NOP_EXPR.

Patches:
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/ChangeLog.diff?cvsroot=gcc&only_with_tag=gcc-4_0-branch&r1=2.7592.2.46&r2=2.7592.2.47
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/tree.c.diff?cvsroot=gcc&only_with_tag=gcc-4_0-branch&r1=1.466&r2=1.466.4.1
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/fold-const.c.diff?cvsroot=gcc&only_with_tag=gcc-4_0-branch&r1=1.517&r2=1.517.2.1

Comment 17 Andrew Pinski 2005-03-14 03:58:36 UTC
Fixed.  Thanks Roger for looking into this bug and fixing it.