Hi, asides from Andrew's patch, we also have this alternate patch
using combine. With respect to the bitfield case here, the two patches
are mostly similar in effect, but we thought we'd post both for
discussion anyways.
Currently, combine detects and rejects cases where the combine target
insn is a "modification" (LHS is a SUBREG, STRICT_LOW_PART, or
ZERO_EXTRACT), plus one of the prior merged insns has a SET destination
overlap, citing wrong semantics when such two insns are merged.
For the same testcase:
struct bits {
unsigned a : 5;
unsigned b : 5;
unsigned c : 5;
unsigned d : 5;
};
struct bits
f (unsigned int a)
{
struct bits bits;
bits.a = 1;
bits.b = 2;
bits.c = 3;
bits.d = a;
return bits;
}
The RTL encountered by combine on ARM and fails to optimize is like this:
2: r138:SI = r0:SI
6: r139:SI = 1
23: r136:SI = 0 @
7: zero_extract [4:0] r136:SI = r139:SI @ See how r136 is
8: r140:SI = 2 @ modified throughout
9: zero_extract [9:5] r136:SI = r140:SI @ the insns
10: r141:SI = 3
11: zero_extract [14:10] r136:SI = r141:SI
12: zero_extract [19:15] r136:SI = r138:SI
17: r0:SI = r136:SI
20: <use r0:SI>
For cases like ARM, this patch enhances combine's handled cases by using
its own expand_field_assignment() function (currently applied for prior
merged insn patterns) to transform the combine target insn which happens
to be a modification into a larger IOR/AND expression; this can turn the
LHS into a normal register write.
For example,
change (zero_extract reg1 #5 #0) = <foo>
into (reg1) = (ior (and reg1 0xffffffe0) (and <foo> 0x0000001f))
This plus the various propagation and simplification done later in
combine increases the chances it successfully finds a new combined
bitfield operation.
The combine patch is quite small, basically a call to
expand_field_assignment() + another try at combinable_i3pat() before
giving up and returning 0, enabled when flag_expensive_optimizations
is set.
With this patch, the testcase generated code for ARM improves from:
mov r2, #1
mov r3, #0
bfi r3, r2, #0, #5
mov r2, #2
bfi r3, r2, #5, #5
mov r2, #3
bfi r3, r2, #10, #5
bfi r3, r0, #15, #5
mov r0, r3
bx lr
into
movw r3, #3137
bfi r3, r0, #15, #5
mov r0, r3
bx lr
As mentioned, for this bitfield case, Andrew's CSE patch and mine is
mostly similar in effect. From the ordering of passes however, the CSE
patch will shadow this combine one if both were applied :P
Thanks,
Chung-Lin
2010-12-09 Chung-Lin Tang <cltang@codesourcery.com>
* combine.c (combinable_i3pat): Try field expanding the combine
target insn when it modifies its LHS output, changing it into
overwriting form, before failing the combine attempt.