This code struct B { unsigned b : 2; }; void store (struct B *b, int v) { b->b = v;} produces this assembler store: pushl %ebp movl %esp, %ebp movl 8(%ebp), %ecx movb 12(%ebp), %dl andl $3, %edx andl $3, %edx movl (%ecx), %eax andl $-4, %eax orl %edx, %eax movl %eax, (%ecx) leave ret Those two andl $3's get inserted becuase rtl expansion generates somethine daft. Here's the bits of rtl (insn 13 11 14 1 (parallel [ (set (reg:QI 60) (and:QI (subreg:QI (reg/v:SI 59 [ v ]) 0) (const_int 3 [0x3]))) (clobber (reg:CC 17 flags)) ]) -1 (nil) (nil)) (insn 15 14 16 1 (parallel [ (set (reg:SI 62) (zero_extend:SI (reg:QI 60))) (clobber (reg:CC 17 flags)) ]) -1 (nil) (nil)) (insn 16 15 17 1 (parallel [ (set (reg:SI 63) (and:SI (reg:SI 62) (const_int 3 [0x3]))) (clobber (reg:CC 17 flags)) ]) -1 (nil) (nil)) huh?
Confirmed, this is a middle-end issue as the trees look good b->b = (<unnamed type>) (unsigned char) v;
I thought one of Kazu's PRs already had this case. Someone should go through his PRs... W.
No, I don't think this is a duplicate of mine.
There are at least two strange things. 1. We emit two ands in the first place. 2. The combiner does not seem to be working.
This is a regression from 3.3.3, which does not generate two consecutive andl.
All the interesting events happen in store_field. When store_field expands "(<unnamed type>) (unsigned char) v", it generates an AND with 3 because (<unnamed type>) is an anonymous type of two-bit integer. When it expands the assignment, it again generates an AND with 3. It's probably easier to avoid the first AND if we see a narrowing conversion (from 8-bit to 2-bit in this case).
From IRC: <sayle> kazu: Looks like PR 18008 only has a single AND with -march=pentiumpro, but two ANDs with -march=i386
Actually there are two ands for -march=pentiumpro also: movl 4(%esp), %ecx movzbl 8(%esp), %edx <--- one movl (%ecx), %eax andl $3, %edx <--- two andl $-4, %eax <-- I have not figured out where this one comes from but it shows up on x86 and x86-64 and also PPC orl %edx, %eax movl %eax, (%ecx) ret PPC produces: lwz r0,0(r3) rlwinm r4,r4,0,30,31 rlwimi r0,r4,30,0,1 stw r0,0(r3) blr which is responsible but not optimal as 3.3 produces better: lwz r2,0(r3) rlwimi r2,r4,30,0,1 stw r2,0(r3) blr (insn 11 10 12 0 0x0 (set (zero_extract:SI (reg:SI 2 r2 [120]) (const_int 2 [0x2]) (const_int 0 [0x0])) (reg/v:SI 4 r4 [119])) 106 {insvsi} (insn_list:REG_DEP_OUTPUT 10 (nil)) (expr_list:REG_DEAD (reg/v:SI 4 r4 [119]) (nil))) Should be the only thing which is produced but: (insn 14 15 16 0 (parallel [ (set (reg:SI 4 r4 [121]) (and:SI (reg:SI 4 r4 [ v ]) (const_int 3 [0x3]))) (clobber (scratch:CC)) ]) 100 {andsi3} (nil) (expr_list:REG_UNUSED (scratch:CC) (nil))) (insn:TI 16 14 17 0 (set (zero_extract:SI (reg:SI 0 r0 [122]) (const_int 2 [0x2]) (const_int 0 [0x0])) (reg:SI 4 r4 [121])) 125 {insvsi} (insn_list:REG_DEP_TRUE 14 (insn_list:REG_DEP_TRUE 15 (nil))) (expr_list:REG_DEAD (reg:SI 4 r4 [121]) (nil))) is produced, note the extra and 3 which is not needed for as the zero_extract does the work for us (I think this is equivalent to the extra and for -march=i386).
There are clearly a number of problems exposed by this PR (constant folding issues, x86 backend issues, RTL simplification issues, etc...) My personal pick of which is the worst of these is that get_inner_reference returns an SImode "inner" reference in a call to expand_assignment with a QImode source and a QImode target. This is enough to confuse the RTL optimizers and backends which aren't used to this type of unexpected mode change. For example, changing "unsigned" to "unsigned char" in the testcase generates significantly better code. If get_inner_reference returned a QImode "tem", then the optimizers wouldn't have problems combining the QImode AND with the SImode AND. Funny that we read this field as QImode, and write it as SImode! get_inner_reference is called with: component_ref:QImode (indirect_ref:SImode (parm_decl:SImode)) (field_decl:SImode)) and returns the indirect_ref:SImode. This doesn't seem correct??
Note I would not be surprised that fixing this also helps compile time as there are a couple (really a lot) of bit fields in GCC.
Combine ought to merge these two (insn 13 11 14 1 (parallel [ (set (reg:QI 60) (and:QI (subreg:QI (reg/v:SI 59 [ v ]) 0) (const_int 3 [0x3]))) (clobber (reg:CC 17 flags)) ]) -1 (nil) (nil)) (insn 15 14 16 1 (parallel [ (set (reg:SI 62) (zero_extend:SI (reg:QI 60))) (clobber (reg:CC 17 flags)) ]) -1 (nil) (nil)) into (insn 15 14 16 1 (parallel [ (set (reg:SI 62) (zero_extend:SI (and:QI (subreg:QI (reg/v:SI 59 [ v ]) 0) (const_int 3 [0x3])))) (clobber (reg:CC 17 flags)) ]) -1 (nil) (nil)) which can be simplified via the expand_compound_operation machinery to (insn 15 14 16 1 (parallel [ (set (reg:SI 62) (and:SI (reg/v:SI 59 [ v ]) (const_int 3 [0x3]))) (clobber (reg:CC 17 flags)) ]) -1 (nil) (nil)) and the set of (reg:SI 63) from (and:SI (reg:SI 62) (const_int 3)) should be taken care of by the nonzero_bits stuff. So it's feasible to have combine fix it, at least. Hum, but maybe the clobber is messing up combine?
Why do we cast "v" to "unsigned char" before casting to the bitfield type? This is already in the GIMPLE dump and comes from the front end: ;; Function store (store) store (bD.1118, vD.1119) { unsigned charD.10 D.1122; <unnamed type> D.1123; D.1122 = (unsigned charD.10) vD.1119; D.1123 = (<unnamed type>) D.1122; bD.1118->bD.1117 = D.1123; } Breakpoint 4, build_modify_expr (lhs=0x2a95895910, modifycode=NOP_EXPR, rhs=0x2a95988d00) at c-typeck.c:3321 3321 newrhs = convert_for_assignment (lhstype, newrhs, ic_assign, (gdb) p debug_generic_expr (newrhs) vD.1450 $13 = void (gdb) next 3323 if (TREE_CODE (newrhs) == ERROR_MARK) (gdb) p debug_generic_expr (newrhs) (<unnamed type>) (unsigned charD.10) vD.1450 $14 = void (gdb)
g++ produces the same code as 3.3. The .optimized dump for cc1plus: void store(B*, int) (b, v) { unsigned int v.0; <bb 0>: b->b = (unsigned int) v; return; } Look mom, no casts!
Tricky. Part the first, I thought, is that the mode of the FIELD_DECL doesn't match the mode of the FIELD_DECL's type. Except that doesn't actually appear to matter. I shall fix it anyway. As for combine, the problem on i386 is that we merge the mem with the QImode AND, which prevents the later combination with the SImode AND. The problem on PowerPC is a missing optimization in make_field_assignment. Perhaps we can avoid the zero_extend and use a paradoxical subreg instead...
Subject: Bug 18008 CVSROOT: /cvs/gcc Module name: gcc Changes by: rth@gcc.gnu.org 2005-01-26 20:29:27 Modified files: gcc : ChangeLog combine.c Log message: PR middle-end/18008 * combine.c (make_field_assignment): Simplify store to zero_extract from a source with an overlapping mask. Patches: http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/ChangeLog.diff?cvsroot=gcc&r1=2.7288&r2=2.7289 http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/combine.c.diff?cvsroot=gcc&r1=1.471&r2=1.472
Subject: Bug 18008 CVSROOT: /cvs/gcc Module name: gcc Changes by: rth@gcc.gnu.org 2005-01-26 23:18:15 Modified files: gcc : ChangeLog c-decl.c expmed.c expr.c Log message: PR middle-end/18008 * c-decl.c (finish_struct): Set DECL_MODE after resetting a field's type. * expmed.c (store_fixed_bit_field): Create a paradoxical subreg if we don't need the bits above those present in the current mode. * expr.c (store_field): Strip conversions to odd-bit-sized types if the destination field width matches. Patches: http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/ChangeLog.diff?cvsroot=gcc&r1=2.7292&r2=2.7293 http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/c-decl.c.diff?cvsroot=gcc&r1=1.623&r2=1.624 http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/expmed.c.diff?cvsroot=gcc&r1=1.218&r2=1.219 http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/expr.c.diff?cvsroot=gcc&r1=1.773&r2=1.774
Subject: Bug 18008 CVSROOT: /cvs/gcc Module name: gcc Changes by: rth@gcc.gnu.org 2005-01-27 00:07:43 Modified files: gcc : ChangeLog c-decl.c expr.c Log message: For real this time... PR middle-end/18008 * c-decl.c (finish_struct): Set DECL_MODE after resetting a field's type. * expr.c (store_field): Strip conversions to odd-bit-sized types if the destination field width matches. Patches: http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/ChangeLog.diff?cvsroot=gcc&r1=2.7294&r2=2.7295 http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/c-decl.c.diff?cvsroot=gcc&r1=1.625&r2=1.626 http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/expr.c.diff?cvsroot=gcc&r1=1.775&r2=1.776
Fixed.