This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [patch, combine] Bit-field insertion optimization #2


On 12/09/10 10:21, Chung-Lin Tang wrote:
Hi, asides from Andrew's patch, we also have this alternate patch using combine. With respect to the bitfield case here, the two patches are mostly similar in effect, but we thought we'd post both for discussion anyways.

Currently, combine detects and rejects cases where the combine target
insn is a "modification" (LHS is a SUBREG, STRICT_LOW_PART, or
ZERO_EXTRACT), plus one of the prior merged insns has a SET destination
overlap, citing wrong semantics when such two insns are merged.

For the same testcase:
struct bits {
  unsigned a : 5;
  unsigned b : 5;
  unsigned c : 5;
  unsigned d : 5;
};

struct bits
f (unsigned int a)
{
  struct bits bits;
  bits.a = 1;
  bits.b = 2;
  bits.c = 3;
  bits.d = a;
  return bits;
}

The RTL encountered by combine on ARM and fails to optimize is like this:
  2: r138:SI = r0:SI
  6: r139:SI = 1
 23: r136:SI = 0                @
  7: zero_extract [4:0] r136:SI = r139:SI    @ See how r136 is
  8: r140:SI = 2                @ modified throughout
  9: zero_extract [9:5] r136:SI = r140:SI    @ the insns
 10: r141:SI = 3
 11: zero_extract [14:10] r136:SI = r141:SI
 12: zero_extract [19:15] r136:SI = r138:SI
 17: r0:SI = r136:SI
 20: <use r0:SI>

For cases like ARM, this patch enhances combine's handled cases by using
its own expand_field_assignment() function (currently applied for prior
merged insn patterns) to transform the combine target insn which happens
to be a modification into a larger IOR/AND expression; this can turn the
LHS into a normal register write.

For example,
change (zero_extract reg1 #5 #0) = <foo>
into   (reg1) = (ior (and reg1 0xffffffe0) (and <foo> 0x0000001f))

This plus the various propagation and simplification done later in
combine increases the chances it successfully finds a new combined
bitfield operation.

The combine patch is quite small, basically a call to
expand_field_assignment() + another try at combinable_i3pat() before
giving up and returning 0, enabled when flag_expensive_optimizations is set.


With this patch, the testcase generated code for ARM improves from:
        mov     r2, #1
        mov     r3, #0
        bfi     r3, r2, #0, #5
        mov     r2, #2
        bfi     r3, r2, #5, #5
        mov     r2, #3
        bfi     r3, r2, #10, #5
        bfi     r3, r0, #15, #5
        mov     r0, r3
        bx      lr
into
        movw    r3, #3137
        bfi     r3, r0, #15, #5
        mov     r0, r3
        bx      lr

As mentioned, for this bitfield case, Andrew's CSE patch and mine is mostly similar in effect. From the ordering of passes however, the CSE patch will shadow this combine one if both were applied :P

Thanks,
Chung-Lin

2010-12-09 Chung-Lin Tang <cltang@codesourcery.com>

    * combine.c (combinable_i3pat): Try field expanding the combine
    target insn when it modifies its LHS output, changing it into
    overwriting form, before failing the combine attempt.
This variant looks pretty reasonable as well. It has the minor advantage that we don't need to move the init-regs pass.

I guess I'd be interested to know which of the two generally works better and whether or not if you apply them both, does the new combine code ever trigger?

Like the cse patch, I think there should be a failure to optimize bug with the testcase and potential patches attached.
jeff



Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]