Bug 18008 - [4.0 Regression] Duplicate mask on bitfield insertion
Summary: [4.0 Regression] Duplicate mask on bitfield insertion
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: middle-end (show other bugs)
Version: 4.0.0
: P2 normal
Target Milestone: 4.0.0
Assignee: Richard Henderson
URL:
Keywords: missed-optimization
Depends on:
Blocks: bitfield
  Show dependency treegraph
 
Reported: 2004-10-15 09:17 UTC by Nathan Sidwell
Modified: 2005-01-27 00:12 UTC (History)
3 users (show)

See Also:
Host: i686-pc-linux-gnu
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2005-01-26 18:11:01


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Nathan Sidwell 2004-10-15 09:17:16 UTC
This code
  struct B { unsigned b : 2; };
  void store (struct B *b, int v)
  { b->b = v;}
produces this assembler
store:
        pushl  %ebp
        movl %esp, %ebp
        movl 8(%ebp), %ecx
        movb 12(%ebp), %dl
        andl $3, %edx
        andl $3, %edx
        movl (%ecx), %eax
        andl $-4, %eax
        orl  %edx, %eax
        movl %eax, (%ecx)
        leave
        ret

Those two andl $3's get inserted becuase rtl expansion generates somethine
daft. Here's the bits of rtl
(insn 13 11 14 1 (parallel [
            (set (reg:QI 60)
                (and:QI (subreg:QI (reg/v:SI 59 [ v ]) 0)
                    (const_int 3 [0x3])))
            (clobber (reg:CC 17 flags))
        ]) -1 (nil)
    (nil))

(insn 15 14 16 1 (parallel [
            (set (reg:SI 62)
                (zero_extend:SI (reg:QI 60)))
            (clobber (reg:CC 17 flags))
        ]) -1 (nil)
    (nil))

(insn 16 15 17 1 (parallel [
            (set (reg:SI 63)
                (and:SI (reg:SI 62)
                    (const_int 3 [0x3])))
            (clobber (reg:CC 17 flags))
        ]) -1 (nil)
    (nil))

huh?
Comment 1 Andrew Pinski 2004-10-15 12:45:53 UTC
Confirmed, this is a middle-end issue as the trees look good
  b->b = (<unnamed type>) (unsigned char) v;
Comment 2 Wolfgang Bangerth 2004-10-15 15:03:06 UTC
I thought one of Kazu's PRs already had this case. Someone should go through 
his PRs... 
 
W. 
Comment 3 Kazu Hirata 2004-10-15 16:42:46 UTC
No, I don't think this is a duplicate of mine.
Comment 4 Kazu Hirata 2004-10-15 16:55:14 UTC
There are at least two strange things.

1. We emit two ands in the first place.
2. The combiner does not seem to be working.
Comment 5 Kazu Hirata 2004-10-15 21:18:42 UTC
This is a regression from 3.3.3, which does not generate two consecutive andl.
Comment 6 Kazu Hirata 2004-10-16 03:46:24 UTC
All the interesting events happen in store_field.

When store_field expands "(<unnamed type>) (unsigned char) v",
it generates an AND with 3 because (<unnamed type>) is an anonymous type
of two-bit integer.

When it expands the assignment, it again generates an AND with 3.

It's probably easier to avoid the first AND if we see a narrowing conversion
(from 8-bit to 2-bit in this case).
Comment 7 Kazu Hirata 2004-10-16 03:50:12 UTC
From IRC:
<sayle> kazu: Looks like PR 18008 only has a single AND with -march=pentiumpro,
but two ANDs with -march=i386
Comment 8 Andrew Pinski 2004-10-16 04:05:30 UTC
Actually there are two ands for -march=pentiumpro also:
        movl    4(%esp), %ecx
        movzbl  8(%esp), %edx  <--- one
        movl    (%ecx), %eax
        andl    $3, %edx  <--- two
        andl    $-4, %eax <-- I have not figured out where this one comes from but it shows up on x86 
and x86-64 and also PPC
        orl     %edx, %eax
        movl    %eax, (%ecx)
        ret

PPC produces:
        lwz r0,0(r3)
        rlwinm r4,r4,0,30,31
        rlwimi r0,r4,30,0,1
        stw r0,0(r3)
        blr
which is responsible but not optimal as 3.3 produces better:
        lwz r2,0(r3)
        rlwimi r2,r4,30,0,1
        stw r2,0(r3)
        blr

(insn 11 10 12 0 0x0 (set (zero_extract:SI (reg:SI 2 r2 [120])
            (const_int 2 [0x2])
            (const_int 0 [0x0]))
        (reg/v:SI 4 r4 [119])) 106 {insvsi} (insn_list:REG_DEP_OUTPUT 10 (nil))
    (expr_list:REG_DEAD (reg/v:SI 4 r4 [119])
        (nil)))
Should be the only thing which is produced but:
(insn 14 15 16 0 (parallel [
            (set (reg:SI 4 r4 [121])
                (and:SI (reg:SI 4 r4 [ v ])
                    (const_int 3 [0x3])))
            (clobber (scratch:CC))
        ]) 100 {andsi3} (nil)
    (expr_list:REG_UNUSED (scratch:CC)
        (nil)))

(insn:TI 16 14 17 0 (set (zero_extract:SI (reg:SI 0 r0 [122])
            (const_int 2 [0x2])
            (const_int 0 [0x0])) 
        (reg:SI 4 r4 [121])) 125 {insvsi} (insn_list:REG_DEP_TRUE 14 (insn_list:REG_DEP_TRUE 15 (nil)))
    (expr_list:REG_DEAD (reg:SI 4 r4 [121])
        (nil)))

is produced, note the extra and 3 which is not needed for as the zero_extract does the work for us (I 
think this is equivalent to the extra and for -march=i386).
Comment 9 roger 2004-10-17 19:02:33 UTC
There are clearly a number of problems exposed by this PR (constant folding
issues, x86 backend issues, RTL simplification issues, etc...)  My personal
pick of which is the worst of these is that get_inner_reference returns an
SImode "inner" reference in a call to expand_assignment with a QImode source
and a QImode target.  This is enough to confuse the RTL optimizers and backends
which aren't used to this type of unexpected mode change.

For example, changing "unsigned" to "unsigned char" in the testcase generates
significantly better code.  If get_inner_reference returned a QImode "tem",
then the optimizers wouldn't have problems combining the QImode AND with the
SImode AND.  Funny that we read this field as QImode, and write it as SImode!

get_inner_reference is called with:
component_ref:QImode (indirect_ref:SImode (parm_decl:SImode)) (field_decl:SImode))
and returns the indirect_ref:SImode.  This doesn't seem correct??
Comment 10 Andrew Pinski 2004-12-23 16:05:07 UTC
Note I would not be surprised that fixing this also helps compile time as there are a couple (really a lot) 
of bit fields in GCC.
Comment 11 Paolo Bonzini 2005-01-22 10:04:45 UTC
Combine ought to merge these two

(insn 13 11 14 1 (parallel [
            (set (reg:QI 60)
                (and:QI (subreg:QI (reg/v:SI 59 [ v ]) 0)
                    (const_int 3 [0x3])))
            (clobber (reg:CC 17 flags))
        ]) -1 (nil)
    (nil))

(insn 15 14 16 1 (parallel [
            (set (reg:SI 62)
                (zero_extend:SI (reg:QI 60)))
            (clobber (reg:CC 17 flags))
        ]) -1 (nil)
    (nil))

into

(insn 15 14 16 1 (parallel [
            (set (reg:SI 62)
                (zero_extend:SI (and:QI (subreg:QI (reg/v:SI 59 [ v ]) 0)
                    (const_int 3 [0x3]))))
            (clobber (reg:CC 17 flags))
        ]) -1 (nil)
    (nil))

which can be simplified via the expand_compound_operation machinery to

(insn 15 14 16 1 (parallel [
            (set (reg:SI 62)
                (and:SI (reg/v:SI 59 [ v ]) (const_int 3 [0x3])))
            (clobber (reg:CC 17 flags))
        ]) -1 (nil)
    (nil))

and the set of (reg:SI 63) from (and:SI (reg:SI 62) (const_int 3)) should be 
taken care of by the nonzero_bits stuff.  So it's feasible to have combine fix 
it, at least.

Hum, but maybe the clobber is messing up combine?
Comment 12 Steven Bosscher 2005-01-26 01:55:15 UTC
Why do we cast "v" to "unsigned char" before casting to the bitfield 
type?  This is already in the GIMPLE dump and comes from the front end: 
 
;; Function store (store) 
 
store (bD.1118, vD.1119) 
{ 
  unsigned charD.10 D.1122; 
  <unnamed type> D.1123; 
 
  D.1122 = (unsigned charD.10) vD.1119; 
  D.1123 = (<unnamed type>) D.1122; 
  bD.1118->bD.1117 = D.1123; 
} 
 
Breakpoint 4, build_modify_expr (lhs=0x2a95895910, modifycode=NOP_EXPR, 
rhs=0x2a95988d00) 
    at c-typeck.c:3321 
3321      newrhs = convert_for_assignment (lhstype, newrhs, ic_assign, 
(gdb) p debug_generic_expr (newrhs) 
vD.1450 
$13 = void 
(gdb) next 
3323      if (TREE_CODE (newrhs) == ERROR_MARK) 
(gdb) p debug_generic_expr (newrhs) 
(<unnamed type>) (unsigned charD.10) vD.1450 
$14 = void 
(gdb)    
 
Comment 13 Steven Bosscher 2005-01-26 02:12:54 UTC
g++ produces the same code as 3.3.  The .optimized dump for cc1plus: 
 
void store(B*, int) (b, v) 
{ 
  unsigned int v.0; 
 
<bb 0>: 
  b->b = (unsigned int) v; 
  return; 
 
} 
 
Look mom, no casts! 
 
Comment 14 Richard Henderson 2005-01-26 18:11:00 UTC
Tricky.  Part the first, I thought, is that the mode of the FIELD_DECL doesn't
match the mode of the FIELD_DECL's type.  Except that doesn't actually appear to
matter.  I shall fix it anyway.

As for combine, the problem on i386 is that we merge the mem with the QImode AND,
which prevents the later combination with the SImode AND.  The problem on PowerPC
is a missing optimization in make_field_assignment.

Perhaps we can avoid the zero_extend and use a paradoxical subreg instead...
Comment 15 GCC Commits 2005-01-26 20:29:39 UTC
Subject: Bug 18008

CVSROOT:	/cvs/gcc
Module name:	gcc
Changes by:	rth@gcc.gnu.org	2005-01-26 20:29:27

Modified files:
	gcc            : ChangeLog combine.c 

Log message:
	PR middle-end/18008
	* combine.c (make_field_assignment): Simplify store to zero_extract
	from a source with an overlapping mask.

Patches:
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/ChangeLog.diff?cvsroot=gcc&r1=2.7288&r2=2.7289
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/combine.c.diff?cvsroot=gcc&r1=1.471&r2=1.472

Comment 16 GCC Commits 2005-01-26 23:18:23 UTC
Subject: Bug 18008

CVSROOT:	/cvs/gcc
Module name:	gcc
Changes by:	rth@gcc.gnu.org	2005-01-26 23:18:15

Modified files:
	gcc            : ChangeLog c-decl.c expmed.c expr.c 

Log message:
	PR middle-end/18008
	* c-decl.c (finish_struct): Set DECL_MODE after resetting a
	field's type.
	* expmed.c (store_fixed_bit_field): Create a paradoxical subreg
	if we don't need the bits above those present in the current mode.
	* expr.c (store_field): Strip conversions to odd-bit-sized types
	if the destination field width matches.

Patches:
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/ChangeLog.diff?cvsroot=gcc&r1=2.7292&r2=2.7293
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/c-decl.c.diff?cvsroot=gcc&r1=1.623&r2=1.624
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/expmed.c.diff?cvsroot=gcc&r1=1.218&r2=1.219
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/expr.c.diff?cvsroot=gcc&r1=1.773&r2=1.774

Comment 17 GCC Commits 2005-01-27 00:08:08 UTC
Subject: Bug 18008

CVSROOT:	/cvs/gcc
Module name:	gcc
Changes by:	rth@gcc.gnu.org	2005-01-27 00:07:43

Modified files:
	gcc            : ChangeLog c-decl.c expr.c 

Log message:
	For real this time...
	
	PR middle-end/18008
	* c-decl.c (finish_struct): Set DECL_MODE after resetting a
	field's type.
	* expr.c (store_field): Strip conversions to odd-bit-sized types
	if the destination field width matches.

Patches:
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/ChangeLog.diff?cvsroot=gcc&r1=2.7294&r2=2.7295
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/c-decl.c.diff?cvsroot=gcc&r1=1.625&r2=1.626
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/expr.c.diff?cvsroot=gcc&r1=1.775&r2=1.776

Comment 18 Richard Henderson 2005-01-27 00:12:48 UTC
Fixed.